- Toronto AI Safety Meetup
- News
- Confusing Terminology
- Types of confusing terminology
- Jargon, Ambiguity, Non-Concepts, Connotations/Baggage, False Friends, Drift/Appropriation
- Confusing terms
- Responsible Scaling Policy (from audience)
- Agent
- Types
- Anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators
- Rational agent that maximizes expected value (EV); see the code sketch below
- Software that uses AI to perform tasks or make decisions autonomously on behalf of a user/organisation
- Recent usage, since GenAI took off
- Each of these types is a subset of the previous one
- Maybe GenAI agents aren't rational?
- Electrons are agents
- Humans have more agency than electrons
- Not useful though
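- A minimal sketch of the second definition (the textbook rational agent): choose the action with the highest expected value over outcomes. The actions and probabilities are made up for illustration, not from the talk.

```python
# Hypothetical outcome model: action -> list of (probability, value) pairs.
# The action names and numbers are illustrative only.
OUTCOMES = {
    "explore": [(0.5, 10.0), (0.5, -2.0)],
    "exploit": [(0.9, 3.0), (0.1, 0.0)],
}

def expected_value(outcomes):
    """Expected value of one action: sum of probability * value."""
    return sum(p * v for p, v in outcomes)

def rational_agent(outcome_model):
    """Textbook 'rational agent': pick the action that maximizes expected value."""
    return max(outcome_model, key=lambda a: expected_value(outcome_model[a]))

if __name__ == "__main__":
    print(rational_agent(OUTCOMES))  # -> "explore" (EV 4.0 vs 2.7)
```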
- Feature
- In ML: number in input vector
- In computer vision: something interesting about an image
- Usually first step in processing
- In interpretability: a vector in one of the hidden layers that corresponds to something interesting or meaningful (see sketch below)
- E.g. "golden gate" in Golden Gate Claude
- Artificial Intelligence
- Original term was very broad
- GOFAI (Good Old Fashioned AI)
- Machine Learning
- Alignment
- Meanings
- Originally: advanced AI doing specific tasks without catastrophic side effects
- Now: making AI be nice
- Other meaning: doing what the user wants, not what they ask
- Inner vs outer alignment
- Originally "friendly AI"
- Anthropomorphizing stuff
- Goals
- Anthropic's golden gate feature only activates on 2% of mentions of golden gate
- Maybe they're just extracting the feature wrong?
- Reward
- Understanding
- The book Gödel, Escher, Bach discusses the term "understanding"
- Deception
- Generally treating LLMs like humans
- FLOP/s (floating point operations per second) vs FLOPs (floating point operations); worked example below
- Supposed to use slash
- Doesn't work when talking though
- Not always used
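- The distinction matters for back-of-the-envelope compute math: total FLOPs (a quantity of work) divided by hardware FLOP/s (a rate) gives wall-clock time. The numbers below are purely illustrative:

```python
# Illustrative numbers only: a training run needing 1e21 FLOPs (total work)
# on hardware sustaining 1e15 FLOP/s (a rate) takes work / rate seconds.
total_flops = 1e21          # FLOPs: a count of floating point operations
hardware_flop_per_s = 1e15  # FLOP/s: floating point operations per second

seconds = total_flops / hardware_flop_per_s
days = seconds / 86_400
print(f"{seconds:.3g} s ~= {days:.3g} days")  # 1e+06 s ~= 11.6 days
```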
- Arms race
- Actual weapons or technology race
- Intelligence
- AI risk
- Pivotal act
- An act that changes the trajectory of AI progress a billion years in the future
- Something that simplifies future decision tree/cone
- ASI could go really well or really badly; we don't know
- Destroying the world is a bad pivotal act, would near-guarantee doom
- Harder to near-guarantee not-doom
- E.g. destroying all GPUs
- Related: Long Reflection
- Governance