- Haize Labs Startup Spotlight
- AI brittleness
- Current evals are lacking
- Too narrow
- Too slow
- Too expensive
- Too general
- testing things I don't care about for my application
- Things to evaluate are mostly in two categories
- Reliability
- does it work for the intended use case?
- Risks
- does it do unintended things, produce harmful outputs, etc.?
- reputational risks
- What Haize Labs started doing ("haizing")
- Simulate user interactions
- Dynamic - adjusts inputs based on application responses
- generates next batch of inputs to try to exploit corner cases and failures
- then they expanded to other stuff
- Planned EOF2025 trip
- Started taking Vitamin D supplements
- 10000 IU/day in gelcaps for next 2 months
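
Side note on the "haizing" bullets above (simulate interactions, adjust inputs based on responses, generate the next batch to hit corner cases): a minimal sketch of what such a feedback loop could look like. This is my own toy illustration, not Haize Labs' actual method. Every name here (`toy_app`, `severity`, `mutate`, `adaptive_test_loop`) is hypothetical, and the "application" is a stand-in that fails on long inputs containing `!`.

```python
import random
from typing import Callable, List


def toy_app(prompt: str) -> str:
    """Hypothetical application under test (an assumption for this sketch).

    Returns "ERROR" when the prompt is both long and contains "!",
    "warn" when only one of those holds, and "ok" otherwise.
    """
    is_long = len(prompt) > 12
    has_bang = "!" in prompt
    if is_long and has_bang:
        return "ERROR"
    if is_long or has_bang:
        return "warn"
    return "ok"


def severity(response: str) -> int:
    """Map a response to how close it is to a failure (2 = failure)."""
    return {"ok": 0, "warn": 1, "ERROR": 2}[response]


def mutate(prompt: str, rng: random.Random) -> str:
    """Produce a variant of a prompt by appending, doubling, or adding punctuation."""
    choice = rng.choice(["append", "double", "punct"])
    if choice == "append":
        return prompt + rng.choice("abc ")
    if choice == "double":
        return prompt + prompt
    return prompt + rng.choice("!?.")


def adaptive_test_loop(
    app: Callable[[str], str],
    seeds: List[str],
    rounds: int = 5,
    batch_size: int = 8,
    seed: int = 0,
) -> List[str]:
    """Run batches against the app, then build the next batch from the
    inputs whose responses looked closest to failing."""
    rng = random.Random(seed)
    batch = list(seeds)
    failures: List[str] = []
    for _ in range(rounds):
        # Simulate one round of user interactions.
        results = [(p, app(p)) for p in batch]
        failures.extend(p for p, r in results if severity(r) == 2)
        # Dynamic step: rank inputs by how bad the response was.
        results.sort(key=lambda pr: severity(pr[1]), reverse=True)
        survivors = [p for p, _ in results[: max(1, batch_size // 2)]]
        # Generate the next batch by mutating the most promising inputs.
        batch = [mutate(rng.choice(survivors), rng) for _ in range(batch_size)]
    return failures


if __name__ == "__main__":
    found = adaptive_test_loop(toy_app, seeds=["hi", "hello"], rounds=6)
    print(f"{len(found)} failing inputs found")
```

The point of the sketch is only the shape of the loop: responses feed back into input generation, so later batches concentrate on whatever got closest to breaking the app. A real system would presumably replace `severity` and `mutate` with something far richer.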