Securing AI systems
- public: true
- slides: https://sm4.ca/ai-sec
- outline
- weight leaks
- why we might want to stop this
- ideas
- https://www.lesswrong.com/posts/d396HCvYG7SSqg9Hh/take-scifs-it-s-dangerous-to-go-alone
- https://www.lesswrong.com/posts/rf66R4YsrCHgWx9RG/preventing-model-exfiltration-with-upload-limits
- https://github.com/SAP-archive/ml-model-watermarking?tab=readme-ov-file
- https://www.rand.org/content/dam/rand/pubs/research_reports/RRA2800/RRA2849-1/RAND_RRA2849-1.pdf
- attributing leaks
- watermarking weights of model itself (not outputs)
- easy to bypass
- slightly randomizing weight values
- randomizing neuron order (a function-preserving permutation; toy sketch below)
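- a toy NumPy sketch of why neuron-order randomization works (shapes and names are illustrative: permuting the hidden units, and the next layer's input columns to match, preserves the function exactly):
```python
# Permuting hidden units (rows of W1, entries of b1, columns of W2)
# produces a different-looking weight file that computes the same
# function, which is what makes naive weight watermarks easy to strip.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 16)), rng.normal(size=64)  # hidden layer
W2, b2 = rng.normal(size=(4, 64)), rng.normal(size=4)    # output layer

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2        # tiny ReLU MLP

perm = rng.permutation(64)                # shuffle the 64 hidden units
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=16)
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
```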
- better method
- make large random changes to a few randomly chosen neurons (sketch below)
- a few neurons are really important; target the unimportant (but not completely useless) ones, so the mark is neither trivial to prune nor noticeable in outputs
- hard to do much better without changing how the model works, which is undesirable (it would make outputs better or worse for some users)
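- a minimal NumPy sketch of embedding such a watermark; the mean-absolute-weight importance proxy, the percentile band, and the noise scale are illustrative assumptions, not taken from the linked sources:
```python
import numpy as np

def watermark_layer(weights: np.ndarray, copy_seed: int,
                    n_marks: int = 32, scale: float = 0.5) -> np.ndarray:
    """Return a watermarked copy of one layer's (n_neurons, fan_in) weights.

    copy_seed is unique per distributed copy and doubles as the mark ID.
    """
    rng = np.random.default_rng(copy_seed)
    importance = np.abs(weights).mean(axis=1)  # crude per-neuron importance
    order = np.argsort(importance)
    # Skip near-dead neurons (trivial to prune away) and the most important
    # ones (perturbing those visibly changes behaviour); mark the middle band.
    band = order[len(order) // 10 : len(order) // 2]
    marked = rng.choice(band, size=min(n_marks, band.size), replace=False)
    out = weights.copy()
    noise = rng.normal(0.0, scale, size=(marked.size, weights.shape[1]))
    out[marked] += noise * importance[marked, None]  # large relative change
    return out
```
- this only marks one layer; a real scheme would presumably spread marks across layers so that retraining or pruning any one layer doesn't erase them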
- how to detect an attacker who can see a few different watermarked copies? (collusion, e.g. averaging the copies to blur each mark; one correlation-based idea sketched below)
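- one hedged approach: correlate the leak's weight residual against each customer's secret pattern; this sketch assumes watermark_layer from above, and the intuition that a mark averaged over k copies survives at roughly 1/sqrt(k) strength only holds if the per-copy patterns are near-orthogonal:
```python
import numpy as np

def attribute_leak(leaked: np.ndarray, original: np.ndarray,
                   copy_seeds: list[int]) -> int:
    """Return the copy_seed whose watermark best explains the leaked weights."""
    residual = (leaked - original).ravel()
    best_seed, best_score = None, -np.inf
    for seed in copy_seeds:
        # Reconstruct this customer's pattern from their seed.
        pattern = (watermark_layer(original, seed) - original).ravel()
        # Cosine similarity: an attacker averaging k copies only attenuates
        # each component pattern, it doesn't zero any of them out.
        score = float(residual @ pattern /
                      (np.linalg.norm(residual) * np.linalg.norm(pattern) + 1e-12))
        if score > best_score:
            best_seed, best_score = seed, score
    return best_seed
```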
- inner datacenter
- detecting training runs
- example: Nvidia's anti-cryptomining GPU limiter
- in 2021, Nvidia shipped GeForce drivers that detected Ethereum mining on RTX 3060 GPUs and cut the hash rate by about 50%
- done purely for market segmentation reasons
- they sold another, more expensive product line for miners (CMP, the Cryptocurrency Mining Processor)
- defeated pretty easily though
- a mistakenly released beta driver (470.05) didn't enforce the limiter
- nouveau (open-source) drivers
- enforcing the lockout in silicon itself is hard, which is why it lived in the (bypassable) drivers
- source: https://news.sophos.com/en-us/2021/02/22/nvidia-announces-official-anti-cryptomining-software-drivers/