Bayesian oracles and safety bounds

Yoshua Bengio · Mila, Université de Montréal

November 2024

Investigates the safety advantages of training a Bayesian oracle to estimate P(answer | question, data). Explores catastrophic-risk scenarios and failure modes, and discusses how the oracle could be used to derive conservative risk bounds.
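The quantity P(answer | question, data) can be illustrated with a toy Bayesian model average. The sketch below is hypothetical. It assumes a small, finite set of hypotheses with hand-picked priors, data likelihoods, and per-hypothesis probabilities of a harmful answer. It contrasts the averaged estimate with a conservative bound that takes the worst case over hypotheses that remain plausible under the posterior. The function names, the `alpha` plausibility threshold, and all numbers are illustrative assumptions. They are not the method presented in the talk.

```python
# Hypothetical toy example: a finite set of hypotheses h about the world.
# All priors, likelihoods, and per-hypothesis answer probabilities are made up.

def posterior(priors, likelihoods):
    """Normalise prior * likelihood into posterior weights P(h | data)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def predictive(post, p_answer_given_h):
    """Bayesian model average: sum over h of P(answer | question, h) * P(h | data)."""
    return sum(w * p for w, p in zip(post, p_answer_given_h))

def conservative_bound(post, p_answer_given_h, alpha):
    """Worst-case P(answer | question, h) over 'plausible' hypotheses.

    Assumption: a hypothesis counts as plausible if its posterior weight is at
    least alpha times the largest posterior weight (0 < alpha <= 1).
    """
    top = max(post)
    plausible = [p for w, p in zip(post, p_answer_given_h) if w >= alpha * top]
    return max(plausible)

priors = [0.5, 0.3, 0.2]        # P(h), hypothetical
likelihoods = [0.2, 0.5, 0.1]   # P(data | h), hypothetical
p_harm = [0.01, 0.05, 0.9]      # P(harmful answer | question, h), hypothetical

post = posterior(priors, likelihoods)
avg = predictive(post, p_harm)
print(f"posterior = {[round(w, 3) for w in post]}")
print(f"averaged estimate = {avg:.4f}")
print(f"bound (alpha=0.1) = {conservative_bound(post, p_harm, 0.1)}")
print(f"bound (alpha=0.5) = {conservative_bound(post, p_harm, 0.5)}")
```

With these numbers the averaged estimate is about 0.098, while the bound with `alpha=0.1` is 0.9 because a low-weight but still plausible hypothesis predicts harm with high probability. Raising `alpha` to 0.5 discards that hypothesis and the bound drops to 0.05. The threshold therefore trades caution against usefulness.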
