Securing AI systems
- public: true
- slides: https://sm4.ca/ai-sec
- outline
- weight leaks
- why we might want to stop this
- ideas
- https://www.lesswrong.com/posts/d396HCvYG7SSqg9Hh/take-scifs-it-s-dangerous-to-go-alone
- https://www.lesswrong.com/posts/rf66R4YsrCHgWx9RG/preventing-model-exfiltration-with-upload-limits
- https://github.com/SAP-archive/ml-model-watermarking?tab=readme-ov-file
- https://www.rand.org/content/dam/rand/pubs/research_reports/RRA2800/RRA2849-1/RAND_RRA2849-1.pdf
- attributing leaks
- watermarking weights of model itself (not outputs)
- easy to bypass
- slightly randomizing weight values
- randomizing neuron order (a function-preserving permutation; toy sketch below)
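- a toy NumPy sketch of why neuron-order randomization works (shapes and names are illustrative: permuting the hidden units, and the next layer's input columns to match, preserves the function exactly):
```python
# Permuting hidden units (rows of W1, entries of b1, columns of W2)
# produces a different-looking weight file that computes the same
# function, which is what makes naive weight watermarks easy to strip.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 16)), rng.normal(size=64)  # hidden layer
W2, b2 = rng.normal(size=(4, 64)), rng.normal(size=4)    # output layer

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2        # tiny ReLU MLP

perm = rng.permutation(64)                # shuffle the 64 hidden units
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=16)
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
```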
- better method
- make large random changes to a few randomly chosen neurons (sketch below)
- a few neurons are really important; target the unimportant (but not completely useless) ones, so the mark is neither trivial to prune nor noticeable in outputs
- hard to do much better without changing how the model works, which is undesirable (it would make outputs better or worse for some users)
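- a minimal NumPy sketch of embedding such a watermark; the mean-absolute-weight importance proxy, the percentile band, and the noise scale are illustrative assumptions, not taken from the linked sources:
```python
import numpy as np

def watermark_layer(weights: np.ndarray, copy_seed: int,
                    n_marks: int = 32, scale: float = 0.5) -> np.ndarray:
    """Return a watermarked copy of one layer's (n_neurons, fan_in) weights.

    copy_seed is unique per distributed copy and doubles as the mark ID.
    """
    rng = np.random.default_rng(copy_seed)
    importance = np.abs(weights).mean(axis=1)  # crude per-neuron importance
    order = np.argsort(importance)
    # Skip near-dead neurons (trivial to prune away) and the most important
    # ones (perturbing those visibly changes behaviour); mark the middle band.
    band = order[len(order) // 10 : len(order) // 2]
    marked = rng.choice(band, size=min(n_marks, band.size), replace=False)
    out = weights.copy()
    noise = rng.normal(0.0, scale, size=(marked.size, weights.shape[1]))
    out[marked] += noise * importance[marked, None]  # large relative change
    return out
```
- this only marks one layer; a real scheme would presumably spread marks across layers so that retraining or pruning any one layer doesn't erase them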
- how to detect an attacker who can see a few different watermarked copies? (collusion, e.g. averaging the copies to blur each mark; one correlation-based idea sketched below)
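- one hedged approach: correlate the leak's weight residual against each customer's secret pattern; this sketch assumes watermark_layer from above, and the intuition that a mark averaged over k copies survives at roughly 1/sqrt(k) strength only holds if the per-copy patterns are near-orthogonal:
```python
import numpy as np

def attribute_leak(leaked: np.ndarray, original: np.ndarray,
                   copy_seeds: list[int]) -> int:
    """Return the copy_seed whose watermark best explains the leaked weights."""
    residual = (leaked - original).ravel()
    best_seed, best_score = None, -np.inf
    for seed in copy_seeds:
        # Reconstruct this customer's pattern from their seed.
        pattern = (watermark_layer(original, seed) - original).ravel()
        # Cosine similarity: an attacker averaging k copies only attenuates
        # each component pattern, it doesn't zero any of them out.
        score = float(residual @ pattern /
                      (np.linalg.norm(residual) * np.linalg.norm(pattern) + 1e-12))
        if score > best_score:
            best_seed, best_score = seed, score
    return best_seed
```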
- inner datacenter
- detecting training runs
- example: Nvidia's anti-cryptomining GPU limiter
- in 2021, Nvidia shipped GeForce drivers that detected Ethereum mining on RTX 3060 GPUs and cut the hash rate by about 50%
- done purely for market segmentation reasons
- they sold another, more expensive product line for miners (CMP, the Cryptocurrency Mining Processor)
- defeated pretty easily though
- a mistakenly released beta driver (470.05) didn't enforce the limiter
- nouveau (open-source) drivers
- enforcing the lockout in silicon itself is hard, which is why it lived in the (bypassable) drivers
- source: https://news.sophos.com/en-us/2021/02/22/nvidia-announces-official-anti-cryptomining-software-drivers/