- Penumbra community call
- Toronto AI Safety Meetup: Epoch AI: Can AI Scaling Continue Through 2030?
- Epoch AI is trying to provide grounded evidence about the trajectory of AI
- Training compute has been growing at ~4x per year since 2010
- Probably no hard constraint on scaling through 2030, but power is right on the line
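A quick sanity check on what that growth rate implies, as a sketch: if the ~4x/year trend continues from 2024 through 2030, frontier training compute grows by roughly 4^6 ≈ 4,000x, the same ballpark as the ~5,000x scale-up assumed in the power estimate below. The 2024 baseline year is my assumption for illustration, not a figure from the talk.

```python
# Sketch: extrapolate the ~4x/year training-compute trend out to 2030.
# The 2024 baseline year is an assumption for illustration, not from the talk.
growth_per_year = 4
years = 2030 - 2024          # 6 years of continued growth

scale_up = growth_per_year ** years
print(f"Implied compute scale-up by 2030: ~{scale_up:,}x")  # ~4,096x, close to the ~5,000x used below
```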
- Bottlenecks
- Power constraints
- Llama 3 405B required ~27 MW
- For comparison, Amazon acquired a data center campus powered by a 960 MW nuclear plant
- Expected 2030 power draw for a frontier training run (assuming a 5000x compute scale-up) is ~5.6 GW
- Assumes 4x more energy-efficient hardware, fp16->fp8 (2x efficiency gain), and 3x longer training runs (arithmetic sketched below)
- Sourcing lots of power
- No existing power plants have enough capacity; new builds take at least 3 years from permit approval
- Grid power
- Northern Virginia is the best place for data center power
- Geographically distributed training runs
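A minimal sketch of the power arithmetic behind the ~5.6 GW figure, assuming the factors above simply multiply: Llama 3 405B's ~27 MW, a ~5,000x compute scale-up, and a combined 4 × 2 × 3 = 24x reduction from more efficient hardware, fp8, and longer runs.

```python
# Sketch: the ~5.6 GW estimate from the figures above (factors assumed to multiply).
llama3_power_mw = 27      # ~27 MW for Llama 3 405B
compute_scale_up = 5000   # assumed compute scale-up by 2030

hw_efficiency_gain = 4    # 4x more energy-efficient hardware
fp8_gain = 2              # fp16 -> fp8
longer_runs = 3           # 3x longer training runs spread the same compute over more time

power_gw = llama3_power_mw * compute_scale_up / (hw_efficiency_gain * fp8_gain * longer_runs) / 1000
print(f"Projected 2030 frontier-run power draw: ~{power_gw:.1f} GW")  # ~5.6 GW
```

For scale, that is roughly six times the 960 MW nuclear capacity mentioned above.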
- Chip production
- If the largest lab gets 20% of all H100s, ~100M would need to be produced (see the sketch below)
- 1.5M-2M H100s projected shipped in 2024
- Advanced packaging
- HBM (high-bandwidth memory) chips
- Nvidia's HBM supplier is sold out until 2026
- Silicon wafers not a constraint
- AI chips use only ~5% of TSMC's 5nm production
- TSMC is worried about scaling up capacity too much and then hitting an AI winter
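A sketch of the GPU-count arithmetic: the 20% share and 100M total together imply the largest lab would need on the order of 20M H100-equivalents for a 2030-scale run, which the code below compares against the 1.5M-2M projected 2024 shipments.

```python
# Sketch: how the ~100M H100 figure relates to a 20% share and 2024 shipments.
lab_share = 0.20                      # assumed share of all H100s going to the largest lab
total_h100s_needed = 100_000_000      # total H100-equivalents that would have to exist
lab_gpus = lab_share * total_h100s_needed   # ~20M H100-equivalents for the lab's run

shipped_2024_low, shipped_2024_high = 1_500_000, 2_000_000  # projected 2024 shipments
print(f"Largest lab would need ~{lab_gpus / 1e6:.0f}M GPUs; "
      f"total production is ~{total_h100s_needed / shipped_2024_high:.0f}-"
      f"{total_h100s_needed / shipped_2024_low:.0f}x 2024 shipment levels")
```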
- Data scarcity
- Text tokens
- The internet has ~500T text tokens (~100T if you only count pre-compiled datasets, ~3,000T if you include private data)
- The largest dataset used so far is ~18T tokens
- Multimodal data
- ~10T seconds of video, ~1T seconds of audio, ~10T images
- Should allow training a 6e28-2e32 FLOP model
- Chinchilla scaling laws
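A sketch of how the token counts above map to the quoted FLOP range under Chinchilla scaling, using the standard approximations C ≈ 6·N·D and compute-optimal D ≈ 20·N, which combine to C ≈ 0.3·D². The two token counts plugged in are the ~500T indexed-web and ~3,000T private-data estimates above; reaching the upper end of the 6e28-2e32 range presumably also requires the multimodal data and/or multiple epochs.

```python
# Sketch: Chinchilla-style compute estimate from an available token budget.
# Uses C ~= 6*N*D and compute-optimal D ~= 20*N, so C ~= 0.3 * D**2.
def chinchilla_compute(tokens: float) -> float:
    """Rough compute-optimal training FLOP for a dataset of `tokens` text tokens."""
    return 0.3 * tokens ** 2

for label, tokens in [("~500T indexed web", 5e14), ("~3,000T incl. private data", 3e15)]:
    print(f"{label}: ~{chinchilla_compute(tokens):.1e} FLOP")
# ~7.5e+28 and ~2.7e+30 FLOP; multimodal data and multi-epoch training would be
# needed to approach the upper end of the 6e28-2e32 range above.
```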
- Synthetic data
- Easier to verify than to generate
- e.g. it is hard to generate good code samples, but easy to keep only the ones that compile (see the sketch after this list)
- Problems
- Model collapse
- Promising research about how to avoid model collapse
- Diminishing returns
- Lots of uncertainty
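A minimal code sketch of the "easier to verify than generate" point for code data: generate candidates however you like, then keep only the ones that pass a cheap check (here, whether they parse as valid Python). The sample strings are stand-ins for model outputs.

```python
# Sketch: filter synthetic code samples by a cheap verification step (does it compile?).

def compiles(source: str) -> bool:
    """Cheap check: does the sample parse as valid Python?"""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def filter_synthetic_code(candidates: list[str]) -> list[str]:
    """Verification is cheaper than generation: keep only samples that compile."""
    return [src for src in candidates if compiles(src)]

# Stand-in "generated" samples: one valid, one broken.
samples = ["def add(a, b):\n    return a + b\n", "def broken(:\n    pass\n"]
print(filter_synthetic_code(samples))  # keeps only the first sample
```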
- Latency wall
- Minimum time to process a data point is proportional to the number of layers, since the forward and backward passes run through layers sequentially (sketched below)
- Least problematic constraint, but inflexible
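A toy sketch of the latency-wall argument: because each optimizer step has a wall-clock floor proportional to model depth, a fixed-length training run can only fit so many sequential steps, and hence only so many tokens given a bounded batch size. Every number below is made up purely for illustration.

```python
# Toy sketch of the latency wall. All numbers are illustrative, not from the talk.
num_layers = 200             # hypothetical model depth
per_layer_latency_s = 1e-3   # hypothetical per-layer latency, incl. communication
passes_per_step = 3          # ~1 forward pass + ~2x-cost backward pass

min_step_time_s = num_layers * per_layer_latency_s * passes_per_step  # 0.6 s floor per step

run_length_s = 270 * 86_400  # hypothetical ~9-month training run
max_steps = run_length_s / min_step_time_s
tokens_per_step = 60e6       # hypothetical batch size in tokens
print(f"Max sequential steps: ~{max_steps:.1e}; max tokens processed: ~{max_steps * tokens_per_step:.1e}")
# Bigger batches or shallower models relax the cap, which is why this is the least
# problematic constraint, but the per-step floor itself is hard to engineer around.
```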