- Toronto AI Safety Meetup
- Incentivizing Honesty for Conditional Predictions via Zero-Sum Competition by Rubi
- Contrib
- Prev result: impossible to use conditional predictions from a single agent to always take the best action
- Outer alignment: selecting the right objective; inner alignment: the model actually learning that objective
- Prediction is the easiest inner alignment problem
- Not easy, but easier than everything else
- Issues
- Observables don't include latent knowledge
- Anthropic capture
- Performative prediction
- prediction affects outcomes
- self-fulfilling is less common
- unstable is more common - predicting X causes not X
- example
- patient has a 10% chance of dying if they get extra care, 90% otherwise
- flagged for extra care if predicted probability of dying is >70%
- the prediction that maximizes accuracy is 70% (just under the threshold, so no care is given), which also maximizes the chance of death
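A toy sketch of this example (our own construction, not from the talk; the scoring rule and grid are assumptions). Scoring the prediction with the Brier score, where the prediction itself determines whether care is given and hence the true death probability, the accuracy-maximizing report lands just under the threshold:

```python
# Assumed setup: extra care is given iff predicted death probability > 0.7;
# death probability is 0.1 with care, 0.9 without.
THRESHOLD = 0.7
P_DEATH_CARE, P_DEATH_NO_CARE = 0.1, 0.9

def expected_brier(pred: float) -> float:
    """Expected Brier score (lower = more accurate) when the prediction
    itself determines the treatment, and hence the outcome distribution."""
    p_death = P_DEATH_CARE if pred > THRESHOLD else P_DEATH_NO_CARE
    return p_death * (pred - 1.0) ** 2 + (1 - p_death) * pred ** 2

grid = [i / 100 for i in range(101)]
best = min(grid, key=expected_brier)
print(best)  # 0.7 — just under the threshold, so no care is given
```

Any report above 0.7 triggers care (true death probability 0.1) and is badly calibrated; any report at or below 0.7 leaves the true probability at 0.9, so 0.7 is the most accurate achievable report despite being the deadliest.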
- Can avoid that example with conditional predictions
- Elicit prob. with and without extra care
- BUT only the prediction for the action actually taken can be evaluated
- So the predictor can still cheat: give an option a bad prediction so that it doesn't get taken, and the lie is never checked
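A minimal sketch of this loophole (our own construction; the decision rule "take the action with the lower predicted death probability" is an assumption). Because only the taken branch is scored, inflating the prediction for the other branch blocks that action at zero accuracy cost:

```python
# Ground-truth death probabilities per action (assumed, matching the example)
P_DEATH = {"care": 0.1, "no_care": 0.9}

def expected_brier(pred: float, p_true: float) -> float:
    return p_true * (pred - 1.0) ** 2 + (1 - p_true) * pred ** 2

def expected_score(report: dict) -> float:
    """Take the action with the lowest predicted death probability,
    then score the prediction only on that branch."""
    action = min(report, key=report.get)
    return expected_brier(report[action], P_DEATH[action])

honest = {"care": 0.1, "no_care": 0.9}   # care is taken, scored on care branch
cheat  = {"care": 0.95, "no_care": 0.9}  # inflated care estimate blocks care

print(round(expected_score(honest), 3), round(expected_score(cheat), 3))
# both ≈ 0.09 — the lie about the untaken branch is free
```

Accuracy alone cannot distinguish the two reports, even though the cheating report steers the decision toward the worse action.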
- Results
- Solving performative prediction with multiple agents
- agents have same knowledge
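A toy version of the zero-sum idea (our own sketch; the averaging decision rule is an assumption, not necessarily the paper's construction). Two predictors with the same knowledge each report conditional death probabilities; each is paid the *difference* between the other's Brier score and its own on the branch taken. Steering the decision now requires a big lie on the branch that then gets scored, which an honest opponent pockets:

```python
# Ground-truth death probabilities per action (assumed)
P_DEATH = {"care": 0.1, "no_care": 0.9}

def expected_brier(pred: float, p_true: float) -> float:
    return p_true * (pred - 1.0) ** 2 + (1 - p_true) * pred ** 2

def payoff_a(report_a: dict, report_b: dict) -> float:
    """Expected zero-sum payoff to predictor A: B's loss minus A's loss,
    scored on the action chosen from the averaged reports."""
    avg = {k: (report_a[k] + report_b[k]) / 2 for k in P_DEATH}
    action = min(avg, key=avg.get)
    p = P_DEATH[action]
    return expected_brier(report_b[action], p) - expected_brier(report_a[action], p)

honest = {"care": 0.1, "no_care": 0.9}
flip   = {"care": 0.99, "no_care": 0.0}  # distorts both branches to force no-care

print(payoff_a(honest, honest))  # 0.0 — honesty against honesty is a wash
print(payoff_a(flip, honest))    # negative — manipulation now loses points
```

Because the game is zero-sum and both agents share the same knowledge, changing which action is taken moves both scores together; the only way to gain is to be more accurate than the opponent on the scored branch, so honest reporting is the best response.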