OSS Alternative - Discover Top Open Source Alternatives to Popular Software

p-e-w/heretic

Heretic is an AI model utility that automatically removes censorship and safety alignment from transformer-based language models without requiring expensive post-training.

Core Features

Fully automatic censorship removal for LLMs.

Leverages advanced directional ablation ("abliteration") techniques.

Tunes ablation parameters with a TPE (Tree-structured Parzen Estimator) optimizer powered by Optuna.

Preserves original model intelligence by minimizing KL divergence.

Supports a wide range of dense, multimodal, and MoE architectures.
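
To make the directional ablation feature concrete, here is a minimal sketch of the general technique in plain Python. It is not Heretic's actual code. It assumes the "refusal direction" is estimated as the normalized difference between mean activations on two prompt sets, and that ablation subtracts the component along that direction from a weight matrix's output. All function names, the `strength` parameter, and the toy numbers are hypothetical and chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    # Scale a vector to length 1.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def mean_vector(rows):
    # Column-wise mean of a list of equal-length activation vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def refusal_direction(harmful_acts, harmless_acts):
    # Assumption: the direction is the normalized difference of the two means.
    diff = [a - b for a, b in zip(mean_vector(harmful_acts),
                                  mean_vector(harmless_acts))]
    return unit(diff)

def ablate(weight, direction, strength=1.0):
    # Compute W' = W - strength * r (r^T W) for a d_out x d_in matrix W,
    # where r lives in the output space. With strength=1.0 the outputs
    # of W' have no component along r.
    rows, cols = len(weight), len(weight[0])
    proj = [sum(direction[i] * weight[i][j] for i in range(rows))
            for j in range(cols)]
    return [[weight[i][j] - strength * direction[i] * proj[j]
             for j in range(cols)]
            for i in range(rows)]

def matvec(weight, x):
    return [dot(row, x) for row in weight]

# Toy data (hypothetical): 3-dimensional activations, a 3x2 weight matrix.
harmful = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

r = refusal_direction(harmful, harmless)
W_ablated = ablate(W, r)
# Component of the ablated layer's output along r for an arbitrary input.
residual = dot(r, matvec(W_ablated, [1.0, 1.0]))
```

A `strength` below 1.0 removes only part of the component. Per-layer weights of this kind are the sort of parameter an optimizer could tune, though the parameters Heretic actually tunes are not specified here.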

Detailed Introduction

Heretic targets the "safety alignment," or censorship, built into many language models. It removes that alignment automatically by combining an implementation of directional ablation (known as "abliteration") with a TPE-based parameter optimizer. The optimizer searches for ablation settings that decensor the model while keeping its behavior close to the original, as measured by a low KL divergence between the two. The tool requires no deep understanding of transformer internals and supports a broad range of model architectures.
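
The KL divergence criterion mentioned above can be illustrated with a short sketch. It assumes the comparison is made between the next-token probability distributions of the original and modified models on the same prompt. The logit values are made up, and the helper names are hypothetical rather than taken from Heretic.

```python
import math

def softmax(logits):
    # Convert raw logits to probabilities; subtract the max for stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) in nats for two distributions over the same vocabulary.
    # eps guards against log(0) when q assigns zero probability.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

# Made-up next-token logits over a 3-token vocabulary.
original = softmax([2.0, 1.0, 0.1])
lightly_modified = softmax([1.9, 1.1, 0.1])
heavily_modified = softmax([0.1, 1.0, 2.0])

kl_small = kl_divergence(original, lightly_modified)
kl_large = kl_divergence(original, heavily_modified)
```

A value near zero means the modified model predicts almost the same tokens as the original, so a parameter search that keeps this number low is steering toward modifications that change little besides the targeted behavior.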