Compact Proofs of Model Performance via Mechanistic Interpretability

Louis Jaburi · Independent researcher

December 2024

Proposes using mechanistic interpretability to construct rigorous, compact proofs about neural-network behavior. Discusses the challenges of this approach and directions for scaling formal verification.