2025-01-30

Haize Labs Startup Spotlight
- AI brittleness
  - LLMs giving unwanted output in edge cases
  - is Lipschitz discontinuity
- Current evals are lacking
  - Too narrow
    - e.g. regex, exact match
  - Too slow
  - Too expensive
    - e.g. human ranking
  - Too general
    - testing things I don't care for my application
- Things to evaluate are mostly in two categories
  - Reliability
    - does it work for the intended usecase?
  - Risks
    - do unintended things, produce harmful outputs, etc.
    - reputational risks
- What Haize Labs started doing ("haizing")
  - Simulate user interactions
  - Dynamic - adjusts inputs based on application responses
    - generates next batch of inputs to try to exploit corner cases and failures
- then they expanded to other stuff
Planned EOF2025 trip
Started taking Vitamin D supplements
- 10000 IU/day in gelcaps for next 2 months