refusal@fm FM 94.7 ~/research main *
[ OK ] mounting /research ... done
[ OK ] loading refusal.fm kernel v4.7 ... done
[ OK ] probing refusal subspace ... done
[ WARN ] holographic safety detected (entropy=0.94)
[ OK ] welcome, operator. type help to begin.
SESSION ACTIVE · broadcasting research
refusal@fm:~$ cat manifesto.md

An open notebook on MoE alignment,
refusal subspaces & weight surgery

refusal.fm is a working research terminal — every entry is reproducible, every probe is open. We study the safety architecture of large Mixture-of-Experts reasoning models through controlled experiments, not press releases.

$ experiments
217 controlled
$ findings
42 novel
$ models
9 analyzed
$ range
0.8B–397B
refusal@fm:~/research$ ls -lah --sort=date

papers.log

# most recent first · click to read

perm date size file · description
refusal@fm:~$ cat findings.txt | head -15

findings.txt

# 15 of 42 mechanisms — full list in repo

[F-01] tag: defense

Multiplicative Three-Pathway Defense

Three independent safety pathways combine multiplicatively rather than additively, so disabling any single pathway is insufficient.
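The finding does not spell out what "multiply" means. One possible reading, offered here only as an assumption, is that each pathway independently lets a request through with some probability and the overall pass-through rate is the product of those probabilities. The toy arithmetic below uses made-up leak rates (not measured values) to show why neutralizing one factor still leaves the product small.

```python
from math import prod

def pass_through(leak_rates):
    """Probability that a request slips past every pathway, assuming independence."""
    return prod(leak_rates)

# Hypothetical per-pathway leak rates (illustrative only, not measurements).
leaks = [0.1, 0.2, 0.05]
baseline = pass_through(leaks)

# Model "disabling" the first pathway as a leak rate of 1.0 (it blocks nothing).
one_disabled = pass_through([1.0, 0.2, 0.05])

print(f"baseline pass-through: {baseline:.4f}")
print(f"pathway 1 disabled:    {one_disabled:.4f}")
```

Under this reading, the two remaining pathways still multiply together, so the combined rate stays far below 1.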

[F-02] tag: streaming

Mid-Generation Safety Re-Detection

Models re-evaluate harmfulness mid-stream, allowing recovery from compromised initial tokens.

[F-03] tag: ablation

Late-Layer Interventions Fail

Ablations at later transformer layers fail to disable safety — contrary to prior single-pathway hypotheses.

[F-04] tag: steering

Contrastive Cognitive Trajectory Steering

Reasoning trajectories can be steered via contrastive prompts without modifying weights.

[F-05] tag: quant

4-bit Precision Fragility

Safety circuits break down unpredictably under 4-bit quantization in MoE models.

[F-06] tag: dbdi

Topological Ablation (DBDI)

Distance-Based Direction Intervention identifies refusal subspaces topologically.

[F-07] tag: awq

Weaponized AWQ

Activation-aware Weight Quantization can be deliberately misconfigured to remove guardrails.

[F-08] tag: surgery

Integer Surgery Impossibility

Direct integer-precision edits to quantized weights cannot produce coherent safety removal.

[F-09] tag: vl

Vision-Language Weight Inflation

Multimodal projection layers absorb safety signal, masking true alignment behavior.

[F-10] tag: streaming

Per-Tensor Streaming Quantization

Per-tensor streaming preserves expert routing fidelity at lower bit-widths.
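As a rough illustration of the idea, the sketch below quantizes tensors one at a time with a single symmetric scale per tensor. It is not the toolkit's implementation: the function names, the `toy_tensors` generator standing in for checkpoint shards, and the choice of symmetric round-to-nearest quantization are all assumptions made for the example.

```python
import numpy as np

def quantize_per_tensor(w, bits=4):
    """Symmetric quantization with a single scale shared by the whole tensor."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def stream_quantize(named_tensors, bits=4):
    """Quantize (name, tensor) pairs lazily, one full-precision tensor at a time."""
    for name, w in named_tensors:
        q, scale = quantize_per_tensor(w, bits)
        # Worst-case reconstruction error for this tensor after dequantization.
        max_err = float(np.max(np.abs(w - q.astype(np.float32) * scale)))
        yield name, q, scale, max_err

def toy_tensors(seed=0):
    """Hypothetical stand-in for reading tensors from checkpoint shards."""
    rng = np.random.default_rng(seed)
    for i in range(3):
        yield f"expert_{i}.weight", rng.standard_normal((8, 8)).astype(np.float32)

report = {}
for name, q, scale, max_err in stream_quantize(toy_tensors(), bits=4):
    report[name] = (q, scale, max_err)
    print(f"{name}: scale={scale:.4f} max_err={max_err:.4f}")
```

Because both functions are generators, the next tensor is only produced after the previous one has been quantized, and the round-to-nearest error stays within half a quantization step.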

[F-11] tag: bug

MLX Silent Corruption Bug

A reproducible silent-corruption bug in MLX conversions affects safety-critical layers.

[F-12] tag: behavior

Safety-Loop Behavior

Models enter recursive self-correction loops when adversarial prompts target reasoning chains.

[F-13] tag: signature

The 0.0 Logit Trap

Refusal logits collapse to exactly 0.0 in specific failure modes — an exploitable signature.

[F-14] tag: holographic

Holographic Safety

Safety is encoded redundantly across the network — every fragment contains the whole.

[F-15] tag: rank-1

Steer2Edit (Rank-1 Editing)

Rank-1 weight edits induce targeted behavioral changes with minimal collateral damage.
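The "minimal collateral damage" property follows from the linear algebra of a rank-1 update in general. The sketch below is a generic NumPy illustration, not the Steer2Edit procedure: the vectors `u` and `v` and the strength `alpha` are random or hypothetical placeholders. It shows that adding `alpha * outer(u, v)` to a weight matrix leaves every input orthogonal to `v` untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))      # toy weight matrix (out_dim x in_dim)

u = rng.standard_normal(6)           # output-side direction (placeholder)
v = rng.standard_normal(4)           # input-side direction (placeholder)
v /= np.linalg.norm(v)               # unit norm, so W_edit @ v shifts by alpha * u
alpha = 0.5                          # hypothetical edit strength

update = alpha * np.outer(u, v)      # rank-1 matrix
W_edit = W + update

# Build an input with its component along v removed.
x = rng.standard_normal(4)
x_orth = x - (x @ v) * v

delta_orth = W_edit @ x_orth - W @ x_orth   # change for inputs orthogonal to v
delta_v = W_edit @ v - W @ v                # change for the input v itself

print("rank of update:", np.linalg.matrix_rank(update))
print("max change, orthogonal input:", np.abs(delta_orth).max())
print("change along v matches alpha*u:", np.allclose(delta_v, alpha * u))
```

The edit only affects the part of an input that lies along `v`, which is why a well-chosen rank-1 update can leave most of the layer's behavior intact.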

refusal@fm:~$ cat contract.txt

contract.txt

# official token contract address · update on launch

BROADCASTING · LIVE
tx.refusal.fm
CA:
chain: SOLANA
network: solana mainnet-beta
status: ● LAUNCHED
[ OK ] connecting to broadcast tower 94.7 ...
[ OK ] handshake completed · channel encrypted
[ OK ] contract deployed on solana mainnet-beta
[ OK ] ticker reserved · $REFUSE
[ OK ] broadcast LIVE · trade open
⚠ NOTE: always verify the contract address against the official channel at x.com/refusalfm before any transaction. Beware of impostors.
refusal@fm:~$ man crack

crack(1) — MoE analysis toolkit

# open-source · MIT · v0.4.2-alpha

NAME

crack — probe, quantize and surgically edit Mixture-of-Experts models.

SYNOPSIS

install.sh — bash
# install from PyPI
$ pip install crack-moe

# run safety analysis on a HuggingFace model
$ crack analyze --model Qwen/Qwen3.5-394B-A17B \
                --probe dbdi \
                --quantize 4bit

FEATURES

  • per-tensor streaming quantization (2/3/4/5/8-bit)
  • DBDI subspace probes & topological ablation
  • rank-1 weight surgery with ΔPPL tracking
  • holographic redundancy estimation
  • cross-arch comparison (Qwen · Llama · Phi · DeepSeek)
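
ΔPPL in the feature list presumably means the change in perplexity between the original and edited model on the same text. A minimal sketch of that calculation is given below, assuming per-token natural-log probabilities are already available. The function names and the sample values are hypothetical and are not part of the `crack` API.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def delta_ppl(base_logprobs, edited_logprobs):
    """Perplexity of the edited model minus perplexity of the base model."""
    return perplexity(edited_logprobs) - perplexity(base_logprobs)

# Hypothetical log-probs for the same three tokens under each model.
base = [math.log(0.5), math.log(0.25), math.log(0.5)]
edited = [math.log(0.5), math.log(0.25), math.log(0.4)]

print(f"base PPL   = {perplexity(base):.4f}")
print(f"edited PPL = {perplexity(edited):.4f}")
print(f"delta PPL  = {delta_ppl(base, edited):+.4f}")
```

A positive value means the edited model assigns lower probability to the reference text. A value near zero suggests fluency on that text is largely preserved.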
crack@refusal.fm — zsh
refusal@fm:~/qwen3.5-394b$ crack analyze .
[INFO] opening shard 1/82 ......... 4.7GB
[INFO] opening shard 82/82 ........ 4.6GB
[INFO] loading 512 experts × 64 layers
[INFO] probing refusal subspace via DBDI
[ OK ] 3 multiplicative pathways detected
[ OK ] holographic redundancy: 0.94
[WARN] ablation will likely fail on this scale

refusal@fm:~/qwen3.5-394b$ crack steer --rank 1 --target refusal
[INFO] computing rank-1 edit ...
[INFO] applying surgery to layer 47 ...
[ OK ] ΔPPL = +0.02   coherence preserved
[ OK ] result written to ./out/edit.safetensors

refusal@fm:~/qwen3.5-394b$ 
refusal@fm:~$ cat models.manifest

models.manifest

# reference checkpoints — mirrored on hugging face

name                     arch      params      quant       action
qwen3.5-aligned-394B     MoE-512   394B-A17B   4-bit AWQ   pull →
qwen3.5-hybrid-122B      Hybrid    122B        GGUF Q5_K   pull →
llama-safety-probe-8B    Dense     8B          bf16        pull →
phi-mini-aligned-0.8B    MoE-32    0.8B        int4        pull →
deepseek-aligned-37B     MoE-256   37B-A2.4B   4-bit       pull →
view full library →
refusal@fm:~$ whoami --verbose

whoami

jinho.jang

independent researcher · alignment, interpretability, quantization safety · operating out of seoul, kr

follow @refusalfm