- Toronto AI Safety Meetup
- Incentivizing Honesty for Conditional Predictions via Zero-Sum Competition by Rubi
- Contrib
- Prev result: impossible to use conditional predictions from a single agent to always take the best action
- Outer alignment: selecting the right objective; inner alignment: the model actually learning that objective
- Prediction is the easiest inner alignment problem
- Not easy, but easier than everything else
- Issues
- Observables don't include latent knowledge
- Anthropic capture
- Performative prediction
- prediction affects outcomes
- self-fulfilling is less common
- unstable is more common - predicting X causes not X
- example
- patient has a 10% chance of dying if they get extra care, 90% otherwise
- flagged for extra care if predicted probability of dying is >70%
- the prediction that maximizes accuracy is 70% (just under the threshold, so no care is given), which also maximizes the chance of death
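A toy sketch of this example (our own construction, not from the talk; the scoring rule and grid are assumptions). Scoring the prediction with the Brier score, where the prediction itself determines whether care is given and hence the true death probability, the accuracy-maximizing report lands just under the threshold:

```python
# Assumed setup: extra care is given iff predicted death probability > 0.7;
# death probability is 0.1 with care, 0.9 without.
THRESHOLD = 0.7
P_DEATH_CARE, P_DEATH_NO_CARE = 0.1, 0.9

def expected_brier(pred: float) -> float:
    """Expected Brier score (lower = more accurate) when the prediction
    itself determines the treatment, and hence the outcome distribution."""
    p_death = P_DEATH_CARE if pred > THRESHOLD else P_DEATH_NO_CARE
    return p_death * (pred - 1.0) ** 2 + (1 - p_death) * pred ** 2

grid = [i / 100 for i in range(101)]
best = min(grid, key=expected_brier)
print(best)  # 0.7 — just under the threshold, so no care is given
```

Any report above 0.7 triggers care (true death probability 0.1) and is badly calibrated; any report at or below 0.7 leaves the true probability at 0.9, so 0.7 is the most accurate achievable report despite being the deadliest.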
- Can avoid that example with conditional predictions
- Elicit prob. with and without extra care
- BUT only the prediction for the action actually taken can be evaluated
- So the predictor can still cheat: give an option a bad prediction so that it doesn't get taken, and the lie is never checked
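A minimal sketch of this loophole (our own construction; the decision rule "take the action with the lower predicted death probability" is an assumption). Because only the taken branch is scored, inflating the prediction for the other branch blocks that action at zero accuracy cost:

```python
# Ground-truth death probabilities per action (assumed, matching the example)
P_DEATH = {"care": 0.1, "no_care": 0.9}

def expected_brier(pred: float, p_true: float) -> float:
    return p_true * (pred - 1.0) ** 2 + (1 - p_true) * pred ** 2

def expected_score(report: dict) -> float:
    """Take the action with the lowest predicted death probability,
    then score the prediction only on that branch."""
    action = min(report, key=report.get)
    return expected_brier(report[action], P_DEATH[action])

honest = {"care": 0.1, "no_care": 0.9}   # care is taken, scored on care branch
cheat  = {"care": 0.95, "no_care": 0.9}  # inflated care estimate blocks care

print(round(expected_score(honest), 3), round(expected_score(cheat), 3))
# both ≈ 0.09 — the lie about the untaken branch is free
```

Accuracy alone cannot distinguish the two reports, even though the cheating report steers the decision toward the worse action.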
- Results
- Solving performative prediction with multiple agents
- agents have same knowledge
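A toy version of the zero-sum idea (our own sketch; the averaging decision rule is an assumption, not necessarily the paper's construction). Two predictors with the same knowledge each report conditional death probabilities; each is paid the *difference* between the other's Brier score and its own on the branch taken. Steering the decision now requires a big lie on the branch that then gets scored, which an honest opponent pockets:

```python
# Ground-truth death probabilities per action (assumed)
P_DEATH = {"care": 0.1, "no_care": 0.9}

def expected_brier(pred: float, p_true: float) -> float:
    return p_true * (pred - 1.0) ** 2 + (1 - p_true) * pred ** 2

def payoff_a(report_a: dict, report_b: dict) -> float:
    """Expected zero-sum payoff to predictor A: B's loss minus A's loss,
    scored on the action chosen from the averaged reports."""
    avg = {k: (report_a[k] + report_b[k]) / 2 for k in P_DEATH}
    action = min(avg, key=avg.get)
    p = P_DEATH[action]
    return expected_brier(report_b[action], p) - expected_brier(report_a[action], p)

honest = {"care": 0.1, "no_care": 0.9}
flip   = {"care": 0.99, "no_care": 0.0}  # distorts both branches to force no-care

print(payoff_a(honest, honest))  # 0.0 — honesty against honesty is a wash
print(payoff_a(flip, honest))    # negative — manipulation now loses points
```

Because the game is zero-sum and both agents share the same knowledge, changing which action is taken moves both scores together; the only way to gain is to be more accurate than the opponent on the scored branch, so honest reporting is the best response.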