r/slatestarcodex • u/nexech • 2d ago
AI Most Questionable Details in 'AI 2027' — LessWrong
https://www.lesswrong.com/posts/6Aq2FBZreyjBp6FDt/most-questionable-details-in-ai-2027
u/SoylentRox 1d ago
Just one specific criticism of this criticism: local (open-weight) models can be stripped of their restrictions fairly easily, refusals especially, as long as the underlying model is capable of performing the desired task: https://huggingface.co/perplexity-ai/r1-1776
Any model that is remotely useful for helping humans with legitimate engineering or bioscience tasks will also be useful for designing bombs, killer drones, and bioweapons, just as human engineers competent in these fields can do these things today. Models like r1 will eagerly help to the best of their abilities if you say you are red teaming and want to produce a demo of the attack.
This 'fine tuning' is effectively a lobotomy of the circuits the model uses to refuse requests and, like any lobotomy, it may have unwanted side effects: https://huggingface.co/perplexity-ai/r1-1776/discussions/254