Ship safer frontier models
Frontier models learn everything – including behaviors and data they shouldn’t. Machine Unlearning is a breakthrough in post-training and alignment, delivering optimized results for less time, data, and compute
The Post-Training Bottleneck
Current post-training methods are often too broad for surgical fixes, demanding high volumes of compute and data while leading to conflicting updates and unpredictable behavioral shifts
Precision Mismatch
SFT and RLHF optimize broad objectives, which can cause wide-scale "collateral damage" when you only need to fix a localized behavior.
Data Retraining Deadlock
Relying on new datasets for every emerging concern is reactive, expensive, and slow
Efficiency Roadblock
Current post-training is blunt and expensive — fixing narrow issues requires weeks of compute, and every new problem triggers another slow retraining cycle
Surgical Precision. No Performance Trade-off.

Explicit Behavior Mitigation
Identify and attenuate the specific internal representations driving unwanted patterns—like hallucinations or bias—without disrupting world knowledge or reasoning.
Parameter-Level Data Elimination
Completely eliminate the influence of copyrighted, restricted, or poisoned training data at the parameter level
Transparent, Modular & Reversible Edits
Alignment changes are expressed as Behavior Vectors or LoRA Adapters. These artifacts can be independently evaluated, scaled, or rolled back without modifying the base model weights
85% reduction in jailbreaks
Successful prompt injections reduced by up to 85%, validated on benchmarks like PurpleLlama across open-weight frontier models.
70% reduction in biases
Unlearned LLMs achieved up to 70% reduction in biases, verified on benchmarks like BBQ.
97% faster
Targeted unlearning completed in hours on standard GPU infrastructure – instead of weeks of retraining.
Built for Security and Transparency
Current post-training methods are often too broad for surgical fixes, demanding high volumes of compute and data while leading to conflicting updates and unpredictable behavioral shifts
White-Box Control: Behavior definitions are derived from explicit, reviewable datasets that can be fully customized by your team.
Secure Deployment: Designed for fully airgapped environments via Kubernetes and S3-compatible storage
Comprehensive Diagnostics: Scan models against custom or industry benchmarks to automatically assess failure points before remediation
What problem does Hirundo solve for frontier model builders?
Hirundo addresses the "Post-Training Bottleneck". Once a frontier model is trained, builders often find it impossible to surgically remove the influence of specific data or suppress emergent unsafe behaviors without undergoing broad, expensive retraining. Hirundo provides the first Machine Unlearning Platform for optimized behavior editing, allowing labs to correct these localized issues in hours rather than weeks
What kinds of issues can Hirundo address?
Hirundo focuses on two distinct classes of problems:
- Data traces – specific training data that must be removed (e.g., copyrighted text, leaked datasets, sensitive records).Emergent behaviors
- Emergent behaviors – global patterns like jailbreak susceptibility, bias, or hallucination tendencies.
These are treated differently, because they are fundamentally different technical problems.
How does data unlearning work in practice?
For data unlearning, Hirundo utilizes patent-pending Data Influence and Weight Editing technology to identify and suppress the specific influence of target data. This process makes the data unrecoverable while preserving the model's overall learned representations. Effectiveness is confirmed through localized extensive red teaming tests and downstream utility benchmarks.
How does Hirundo handle learned behaviors like jailbreaks?
Since learned behaviors cannot be literally" deleted," Hirundo identifies the specific internal vector representations correlated with the undesired behavior. By attenuating or inverting these target vectors, the platform reduces how strongly and frequently that behavior is expressed. The objective is a measurable reduction in exploitability, such as the 85% reduction in successful jailbreaks observed in frontier-scale evaluations
How do you know these changes don’t degrade the model?
Every unlearning operation undergoes a rigorous Optimized Post-Training Workflow. Edits are initially expressed as modular LoRA Adapters or Behavior Vectors that are evaluated independently against risk-specific benchmarks (e.g., PurpleLlama) and general utility benchmarks (e.g., MMLU-Pro, GPQA). Changes are only merged if the targeted risk reduction occurs without causing broad regressions in reasoning or world knowledge
What results have you observed on frontier-scale models?
On published evaluations, Hirundo has demonstrated:
- Up to 85% reduction in successful jailbreaks, measured on benchmarks such as PurpleLlama
- Up to ~70% reduction in measured bias across multiple categories
- ~1% change on general utility benchmarks
These results reflect targeted reduction of risk while preserving overall capability.
How is this different from additional post-training or fine-tuning?
Traditional post-training (SFT/RLHF) is an additive process that optimizes broad objectives, often leading to conflicting parameter updates and unpredictable behavioral shifts. In contrast, Hirundo is a subtractive or corrective process. It utilizes high-signal, "uncoupled" datasets—often 1/10th the size of traditional tuning sets—to provide granular control that avoids the "collateral damage" typical of broad retraining
When in the model lifecycle does Hirundo make sense?
Hirundo is most useful when issues are discovered post training: during red teaming, external evaluation, deployment preparation, or public release. It provides a way to respond to those findings without re-running long, compute-intensive training cycles.
What are the limits of this approach?
Hirundo does not claim to make models perfectly safe. Emergent behaviors can be attenuated, not erased, and new failure modes can always appear. The value is giving labs a practical way to reduce known risks post-training, quickly and precisely, without destabilizing the rest of the model.
Say something about verifying not harming the model while unlearning, by using multiple benchmarks and various sensitivity tests?
Seamless integration with your AI stack
No workflow changes needed. Our SOC-2 certified solution runs as an API or platform, with deployment available via SaaS, VPC, or air-gapped on premises.
Leading AI experts trust Hirundo

As AI regulation evolves, cost effective Machine Unlearning technology will become a must.

Avi Tel-Or
CTO, Intel Ignite

I've tried many data quality solutions. Hirundo finds data issues and mislabels at a level I’ve never seen before.

Dan Erez
AI Tech Lead, Taranis