OSS Alternative - Discover Top Open Source Alternatives to Popular Software

p-e-w/heretic

A tool for automatically removing censorship and safety alignment from transformer-based language models without expensive post-training.

Core Features

Fully automatic censorship removal for LLMs.

Uses directional ablation (abliteration), with its parameters tuned by a TPE-based (Tree-structured Parzen Estimator) optimizer.

Minimizes refusals while preserving the original model's capabilities, measured as low KL divergence from the original model.

User-friendly, requiring no deep understanding of transformer internals.

Supports most dense, multimodal, and MoE architectures.
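The directional ablation named in the features above can be illustrated with a small sketch. This is not Heretic's code. It assumes the common formulation in which a "refusal direction" r is estimated as the difference between mean activations on refused and answered prompts, and a weight matrix W is replaced by W - strength * r r^T W. All function names and the toy data are hypothetical, and the example uses plain Python lists rather than real model weights.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def refusal_direction(refused_acts, answered_acts):
    """Difference-of-means direction between two sets of activations."""
    refused_mean = mean_vector(refused_acts)
    answered_mean = mean_vector(answered_acts)
    return unit([r - a for r, a in zip(refused_mean, answered_mean)])


def ablate(weight, direction, strength=1.0):
    """Return W - strength * r r^T W for a matrix W stored as a list of rows.

    W maps an input x to an output y = W x. With strength=1.0 the output
    can no longer have any component along the unit vector r.
    """
    # projection[j] = (r^T W)[j]: how strongly column j writes along r.
    projection = [
        sum(direction[k] * weight[k][j] for k in range(len(weight)))
        for j in range(len(weight[0]))
    ]
    return [
        [weight[i][j] - strength * direction[i] * projection[j]
         for j in range(len(weight[0]))]
        for i in range(len(weight))
    ]


# Toy data (assumed): activations differ only along the first axis.
refused = [[2.0, 0.0], [4.0, 0.0]]
answered = [[1.0, 0.0], [1.0, 0.0]]
r = refusal_direction(refused, answered)

W = [[1.0, 2.0], [3.0, 4.0]]
W_ablated = ablate(W, r)
print(r)          # [1.0, 0.0]
print(W_ablated)  # [[0.0, 0.0], [3.0, 4.0]]
```

A strength below 1.0 removes only part of the component along r. How strongly to ablate is the kind of setting a parameter optimizer could tune.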

Detailed Introduction

Heretic automatically removes 'safety alignment' (censorship) from transformer-based language models. It avoids costly post-training by combining directional ablation with a TPE-based parameter optimizer. The goal is a decensored model that refuses far less often while staying close to the original model's behavior, which is measured by low KL divergence. Because the process is automatic, Heretic can be used without deep AI expertise and applies across a range of model architectures.
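To make the KL divergence criterion concrete, here is a minimal sketch of comparing next-token distributions from an original and a modified model. It is an illustration only, not Heretic's implementation: the function names, the choice to average over prompts, and the toy logits are assumptions.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same tokens."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def mean_kl(original_logits, modified_logits):
    """Average KL(original || modified) over a batch of prompts."""
    values = [
        kl_divergence(softmax(a), softmax(b))
        for a, b in zip(original_logits, modified_logits)
    ]
    return sum(values) / len(values)


# Toy logits (assumed) for two prompts over a three-token vocabulary.
original = [[2.0, 1.0, 0.0], [0.5, 0.5, 3.0]]
unchanged = [[2.0, 1.0, 0.0], [0.5, 0.5, 3.0]]
drifted = [[1.0, 2.0, 0.0], [0.5, 3.0, 0.5]]

print(mean_kl(original, unchanged))  # 0.0
print(mean_kl(original, drifted) > 0.0)  # True
```

A value near zero means the modified model predicts almost the same tokens as the original on those prompts. Larger values indicate that the modification changed behavior beyond the refusals it was meant to remove.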