p-e-w/heretic
Heretic is an AI model utility that automatically removes censorship and safety alignment from transformer-based language models without requiring expensive post-training.
Detailed Introduction
Heretic is a tool for removing the "safety alignment" (censorship) built into many language models. It works automatically, combining an advanced implementation of directional ablation (also known as "abliteration") with a TPE-based (Tree-structured Parzen Estimator) parameter optimizer. This combination lets Heretic decensor a model while preserving much of its original intelligence, as indicated by a low KL divergence from the original model's outputs. The tool requires no understanding of transformer internals and supports a broad range of model architectures.
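To make the two ideas above concrete, here is a minimal sketch of directional ablation and of the KL divergence used to judge how far a modified model has drifted. It is not Heretic's actual code. It assumes the common formulation in which a "refusal direction" is the normalized difference between mean activations on two prompt sets, and that direction is projected out of a weight matrix. All function names (`refusal_direction`, `ablate`, `kl_divergence`) and the `scale` parameter are hypothetical, chosen for illustration only.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from mean harmless to mean harmful activation.

    Assumption: activations are plain lists of floats, one per prompt.
    """
    diff = [a - b for a, b in zip(mean_vector(harmful_acts),
                                  mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate(weight, direction, scale=1.0):
    """Return W - scale * r r^T W.

    `weight` is a matrix whose rows index the output (residual) dimension,
    `direction` is the unit vector r. With scale=1.0 the output of the
    modified matrix has no component along r. `scale` stands in for the
    kind of parameter an optimizer could tune (hypothetical).
    """
    rows, cols = len(weight), len(weight[0])
    # proj[j] = (r^T W)[j]
    proj = [sum(direction[i] * weight[i][j] for i in range(rows))
            for j in range(cols)]
    return [[weight[i][j] - scale * direction[i] * proj[j]
             for j in range(cols)]
            for i in range(rows)]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy usage: two 2-dimensional activations per prompt set.
r = refusal_direction([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
w_ablated = ablate([[1.0, 2.0], [3.0, 4.0]], r)
```

In this picture, an optimizer would search over values such as `scale` (and which layers to modify), trading off fewer refusals against a small `kl_divergence` between the original and modified models' next-token distributions on harmless prompts.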