2024-08-15
- Toronto AI Safety Meetup
- News
- https://www.arena.education/
- hard
- Hermes 3 model
- less censored
- Paper on LLMs predicting treatment effects: https://www.treatmenteffect.app/
- SWE-bench Verified
- Agentless scaffold
- office website: https://trajectoryai.org
- Emergent capabilities
- Paper about this: https://openreview.net/pdf?id=yzkSU5zdwD
- abilities jump from can't do at all -> can do once the model reaches some size
- artifact of benchmark?
- might be because of measurement problems
- e.g. pass/fail grading of model responses
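A toy calculation can show how pass/fail grading alone could make a smooth improvement look sudden. This is a minimal sketch under stated assumptions: the per-token accuracy curve (a logistic in log10 parameter count centred at 10^9), the 20-token answer length, and the independence of token errors are all hypothetical illustration choices. They are not numbers taken from any of the linked papers.

```python
import math

def per_token_accuracy(log10_params: float) -> float:
    """Hypothetical smooth per-token accuracy: a logistic in log10(parameter count).

    Assumed toy curve with its midpoint at 10^9 parameters; not fitted to any real model.
    """
    return 1.0 / (1.0 + math.exp(-(log10_params - 9.0)))

def exact_match_rate(p_token: float, answer_len: int) -> float:
    """Pass/fail metric: the answer counts only if every token is right.

    Assumes token errors are independent, so P(all correct) = p_token ** answer_len.
    """
    return p_token ** answer_len

ANSWER_LEN = 20  # hypothetical answer length in tokens

# Compare the smooth per-token metric with the all-or-nothing metric across model sizes.
for log10_n in range(7, 14):
    p = per_token_accuracy(log10_n)
    em = exact_match_rate(p, ANSWER_LEN)
    print(f"1e{log10_n} params: per-token {p:.3f}, exact match {em:.4f}")
```

Per-token accuracy rises gradually across the whole range. The exact-match score stays near zero up to about 1e10 and then climbs quickly. That is the kind of apparent jump the "artifact of benchmark?" question is about, even though nothing discontinuous happens underneath.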
- Prompting strategies are emergent too
- Explanations
- Multi-step reasoning requires more layers
- but not with chain of thought, though
- More parameters allow more memorization
- Learning to do in-context learning (ICL) at some scale makes everything better
- once ICL gets good enough, prompting techniques become viable
- Emergence happens in humans and in nature too
- Papers
- emergence is real - https://arxiv.org/pdf/2201.11903
- emergence is not real - https://arxiv.org/pdf/2304.15004
- emergence is actually real - https://arxiv.org/pdf/2403.15796
- observational scaling laws - https://arxiv.org/pdf/2405.10938
- more research needed!