Compact Proofs of Model Performance via Mechanistic Interpretability

Louis Jaburi · Independent researcher

December 2024

Proposes using mechanistic interpretability to construct rigorous, compact proofs about neural-network behavior. Discusses the challenges involved and directions for scaling this approach to formal verification.

Readings

arXiv:2406.11779
arXiv:2410.07476