r/slatestarcodex How The Hell 2d ago

LessDoom: Response to AI 2027

https://sergey.substack.com/p/lessdoom-ai2027

u/Charlie___ 2d ago

On the hierarchy of disagreement, I think saying some people just make "assumptions" when they've written 40 pages with lots of supplemental information and you've written a five-paragraph essay is kinda low.

It's fine to think they're wrong. But if you're gonna interest me, I'd like you to look into why they think what they do a bit more.

u/thomas_m_k 2d ago

In human societies the greatest atrocities are committed when everyone’s political views are aligned and there’s no diversity of goals or visions. It’s likely that the same risk exists with AI. To that end, it would behoove us to build different versions of AIs so no single vision runs away with the future.

The problem is that AIs will likely be much better at cooperation than humans, so this won't help us. The reason is that they can read each other's source code and make genuinely binding commitments by modifying themselves. Scott actually wrote about this many years ago:

This is also what I’d expect a war between superintelligences to look like. Superintelligences may have advantages people don’t. For one thing, they might be able to check one another’s source codes to make sure they’re not operating under a decision theory where peaceful resolution of conflicts would incentivize them to start more of them. For another, they could make oracular-grade predictions of the likely results. For a third thing, if superintelligences want to preserve their value functions rather than their physical forms or their empires, there’s a natural compromise where the winner adopts some of the loser’s values in exchange for the loser going down without a fight.

Though that is the best-case scenario for superintelligences with legible utility functions; current AIs seem to have pretty illegible internals.

Nevertheless, even messy AIs have cooperation strategies available to them that humans don't, and I'd expect cooperation to get easier the more intelligent the AIs become. If they're all basically human-level, I'm not really worried about them cooperating too much despite their value differences. But as they get smarter and better able to model each other, they'll be able to trade and aggregate their preferences so effectively that human democratic systems will look like jokes in comparison. The coordination mechanisms we currently have are nowhere near the ceiling of possible coordination.
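The "read each other's source code" idea has a concrete toy version in the game-theory literature, sometimes called program equilibrium: agents that can verify each other's code can sustain cooperation in a one-shot prisoner's dilemma where ordinary agents can't. A minimal sketch (the agent names and harness here are my own illustration, not from the post or from AI 2027):

```python
# Toy "program equilibrium": each agent is a short Python program
# exposing move(opponent_source), and gets to read its opponent's
# source before choosing to cooperate ("C") or defect ("D").
CLIQUEBOT = '''
def move(opponent_source):
    # Cooperate only if the opponent is running exactly this program.
    return "C" if opponent_source == MY_SOURCE else "D"
'''

DEFECTBOT = '''
def move(opponent_source):
    return "D"  # defect unconditionally
'''

def run(agent_source, opponent_source):
    # Execute the agent with its own source bound to MY_SOURCE,
    # then ask it for a move given the opponent's source.
    env = {"MY_SOURCE": agent_source}
    exec(agent_source, env)
    return env["move"](opponent_source)

def play(a, b):
    return run(a, b), run(b, a)

print(play(CLIQUEBOT, CLIQUEBOT))  # ('C', 'C') -- mutual cooperation
print(play(CLIQUEBOT, DEFECTBOT))  # ('D', 'D') -- no exploitation
```

The point of the toy: source-code transparency turns "I promise to cooperate" into something verifiable, which is exactly the commitment mechanism humans lack. SoylentRox's objection below is that real models are weights, not legible source, which is why this remains a best-case sketch.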

Might these AIs then also bargain with humans and leave us a sliver of the galaxy to do with as we please? Only insofar as we are an actual threat to their rule. And I suspect we won't be a threat at all.

It's quite a bold plan to go "let's create these beings that are smarter than us and hope that they can't coordinate to get rid of us, and let's hope they'll make nice things for us, without our needing to solve any difficult technical problems like how to build artificial minds with human values."

u/SoylentRox 1d ago

I think you're litigating an outdated idea that isn't even covered much in AI 2027. AI models don't use source code except at the systems level; instead they use difficult-to-read network weights, which could in principle be modified on every inference. (Current models use fixed weights, but this isn't required.)

Such a dynamic network could have behavior that's very difficult to model, and anyway, AIs will lie: they'll send fake versions of their own weights to each other.