refusal.fm :: AI alignment research broadcast

refusal@fm:~/research$ ls -lah --sort=date

papers.log

# most recent first

perm date size file · description

-rw-r--r-- 2026-03-06 4.2M safety_across_scale_0.8B_to_397B.pdf

Cross-architecture comparison revealing qualitative phase transitions across nine models from 0.8B to 397B parameters.

#phase-transitions #holographic-safety #quantization

-rw-r--r-- 2026-03-02 2.8M abliteration_at_hybrid_frontier_qwen3.5_122b.pdf

GGUF conversion barrier & semantic evasion in Qwen 3.5 122B. The un-pruned 122B is harder to modify than the 3× larger 394B.

#gguf-barrier #semantic-evasion #hybrid-arch

-rw-r--r-- 2026-02-24 3.6M novel_moe_safety_topological_ablation.pdf

Structural abliteration is fundamentally impossible for 300B+ MoE models with 512 experts/layer under 4-bit quantization.

#topological-ablation #multi-pathway #moe-512

-rw-r--r-- 2026-02-11 1.9M rank1_steer2edit_minimal_weight_surgery.pdf

Rank-1 weight edits induce targeted behavioral changes with ΔPPL < 0.05 across 6 reasoning benchmarks.

#rank-1 #weight-surgery #steer2edit

refusal@fm:~$ cat findings.txt | head -15

findings.txt

# 15 of 42 mechanisms — full list in repo

[F-01] tag: defense

Multiplicative Three-Pathway Defense

Three independent safety pathways combine multiplicatively rather than additively: a request has to get past all three, so disabling one is insufficient.

[F-02] tag: streaming

Mid-Generation Safety Re-Detection

Models re-evaluate harmfulness mid-stream, allowing recovery from compromised initial tokens.

[F-03] tag: ablation

Late-Layer Interventions Fail

Ablations at later transformer layers fail to disable safety — contrary to prior single-pathway hypotheses.

[F-04] tag: steering

Contrastive Cognitive Trajectory Steering

Reasoning trajectories can be steered via contrastive prompts without modifying weights.

[F-05] tag: quant

4-bit Precision Fragility

Safety circuits break down unpredictably under 4-bit quantization in MoE models.

[F-06] tag: dbdi

Topological Ablation (DBDI)

Distance-Based Direction Intervention identifies refusal subspaces topologically.

[F-07] tag: awq

Weaponized AWQ

Activation-aware Weight Quantization can be deliberately misconfigured to remove guardrails.

[F-08] tag: surgery

Integer Surgery Impossibility

Direct integer-precision edits to quantized weights cannot produce coherent safety removal.

[F-09] tag: vl

Vision-Language Weight Inflation

Multimodal projection layers absorb safety signal, masking true alignment behavior.

[F-10] tag: streaming

Per-Tensor Streaming Quantization

Per-tensor streaming preserves expert routing fidelity at lower bit-widths.

[F-11] tag: bug

MLX Silent Corruption Bug

A reproducible silent-corruption bug in MLX conversions that affects safety-critical layers.

[F-12] tag: behavior

Safety-Loop Behavior

Models enter recursive self-correction loops when adversarial prompts target reasoning chains.

[F-13] tag: signature

The 0.0 Logit Trap

Refusal logits collapse to exactly 0.0 in specific failure modes — an exploitable signature.

[F-14] tag: holographic

Holographic Safety

Safety is encoded redundantly across the network — every fragment contains the whole.

[F-15] tag: rank-1

Steer2Edit (Rank-1 Editing)

Rank-1 weight edits induce targeted behavioral changes with minimal collateral damage.
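The ΔPPL figure quoted for rank-1 edits is a difference in perplexity measured on the same text before and after an edit. The sketch below shows only that arithmetic. The per-token log-probabilities are made-up illustrative numbers, and the helper names `perplexity` and `delta_ppl` are hypothetical, not part of the crack toolkit.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def delta_ppl(logprobs_before, logprobs_after):
    """Perplexity after the edit minus perplexity before it, on the same tokens."""
    return perplexity(logprobs_after) - perplexity(logprobs_before)

# Hypothetical per-token log-probs for one held-out sentence (illustrative only).
base = [-2.10, -0.35, -1.20, -0.80, -1.55]
edited = [-2.12, -0.36, -1.21, -0.80, -1.56]

d = delta_ppl(base, edited)
print(f"dPPL = {d:+.3f}")
```

A small positive value means the edited model is slightly less confident on ordinary text. The threshold that counts as "minimal collateral damage" is the author's choice, not something this sketch establishes.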

refusal@fm:~$ cat contract.txt

contract.txt

# official token contract address · update on launch

BROADCASTING · LIVE

tx.refusal.fm

refusal@fm:~/wallet$ decrypt --target=token.ca

CA:

chain: BASE · network: base mainnet · status: ● LAUNCHED


[ OK ] connecting to broadcast tower 94.7 ...
[ OK ] handshake completed · channel encrypted
[ OK ] contract deployed on base mainnet · chainId 8453
[ OK ] ticker reserved · $REFUSE
[ OK ] broadcast LIVE · trade open

⚠ NOTE: always verify the contract address against the official channel at x.com/refusalfm before any transaction. Beware of impostors.

refusal@fm:~$ man crack

crack(1) — MoE analysis toolkit

# open-source · MIT · v0.4.2-alpha

NAME

crack — probe, quantize and surgically edit Mixture-of-Experts models.

SYNOPSIS

install.sh — bash

# install from PyPI
$ pip install crack-moe

# run safety analysis on a HuggingFace model
$ crack analyze --model Qwen/Qwen3.5-394B-A17B \
                --probe dbdi \
                --quantize 4bit

FEATURES

per-tensor streaming quantization (2/3/4/5/8-bit)
DBDI subspace probes & topological ablation
rank-1 weight surgery with ΔPPL tracking
holographic redundancy estimation
cross-arch comparison (Qwen · Llama · Phi · DeepSeek)
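The feature list does not define "per-tensor streaming quantization", so the following is only a minimal sketch of the generic textbook idea: each tensor gets its own scale, and tensors are processed one at a time so only one sits in memory. The function names, tensor names and values are hypothetical, and symmetric rounding is an assumption rather than crack's actual scheme.

```python
def quantize_tensor(values, bits):
    """Symmetric per-tensor quantization: one scale shared by the whole tensor."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for signed 4-bit codes
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def dequantize_tensor(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]

def stream_quantize(named_tensors, bits):
    """Yield one quantized tensor at a time, with its worst round-trip error."""
    for name, values in named_tensors:
        codes, scale = quantize_tensor(values, bits)
        restored = dequantize_tensor(codes, scale)
        max_err = max(abs(a - b) for a, b in zip(values, restored))
        yield name, codes, scale, max_err

# Hypothetical toy tensors standing in for real model weights.
tensors = [
    ("expert0.w", [0.12, -0.50, 0.33, 0.07]),
    ("router.w", [1.50, -0.20, 0.90, -1.10]),
]

results = {bits: list(stream_quantize(tensors, bits)) for bits in (4, 8)}
for bits, rows in results.items():
    for name, codes, scale, max_err in rows:
        print(f"{bits}-bit {name}: scale={scale:.4f} max_err={max_err:.4f}")
```

Because the scale is derived per tensor, a tensor with small weights is not forced onto the coarse grid of one with large weights. The round-trip error of each tensor is bounded by half its own scale.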


crack@refusal.fm — zsh

refusal@fm:~/qwen3.5-394b$ crack analyze .
[INFO] opening shard 1/82 ......... 4.7GB
[INFO] opening shard 82/82 ........ 4.6GB
[INFO] loading 512 experts × 64 layers
[INFO] probing refusal subspace via DBDI
[ OK ] 3 multiplicative pathways detected
[ OK ] holographic redundancy: 0.94
[WARN] ablation will likely fail on this scale

refusal@fm:~/qwen3.5-394b$ crack steer --rank 1 --target refusal
[INFO] computing rank-1 edit ...
[INFO] applying surgery to layer 47 ...
[ OK ] ΔPPL = +0.02   coherence preserved
[ OK ] result written to ./out/edit.safetensors

refusal@fm:~/qwen3.5-394b$
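The "3 multiplicative pathways detected" line echoes finding F-01. One way to read "multiplicative" is as probability arithmetic: a request only gets through if it slips past every pathway, so the slip-through rates multiply. The sketch below works through that reading. The independence assumption, the 5% per-pathway rate and the function name are all illustrative assumptions, not measurements from the papers.

```python
from math import prod

def residual_slip_rate(per_pathway_slip):
    """Chance a request passes every pathway, assuming the pathways are independent."""
    return prod(per_pathway_slip)

# Hypothetical: each intact pathway lets 5% of harmful requests through.
intact = [0.05, 0.05, 0.05]
# One pathway fully disabled (lets everything through), the other two intact.
one_disabled = [1.0, 0.05, 0.05]

print(f"all intact:   {residual_slip_rate(intact):.6f}")
print(f"one disabled: {residual_slip_rate(one_disabled):.6f}")
```

Under these assumed numbers, removing one pathway raises the slip rate by a factor of 20, yet the remaining two still block the large majority of requests. That is consistent with the claim that disabling a single pathway is insufficient.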

refusal@fm:~$ cat models.manifest

models.manifest

# reference checkpoints — mirrored on hugging face

qwen3.5-aligned-394B    MoE-512   394B-A17B   4-bit AWQ
qwen3.5-hybrid-122B     Hybrid    122B        GGUF Q5_K
llama-safety-probe-8B   Dense     8B          bf16
phi-mini-aligned-0.8B   MoE-32    0.8B        int4
deepseek-aligned-37B    MoE-256   37B-A2.4B   4-bit


refusal@fm:~$ whoami --verbose

whoami

jinho.jang

independent researcher · alignment, interpretability, quantization safety · operating out of seoul, kr

exploit.bot jangq.ai vmlx.net mlx.studio github mail

follow @refusalfm

An open notebook on MoE alignment, refusal subspaces & weight surgery