# On Paperclip Problems

## Why Complexity Theory Topics Are Important For Alignment

(modified from discussion on Eleuther.AI discord, 2022-12-14)

Summary: because AI systems interact with people and each other in complex ways, it's important to have tools to understand and investigate how large, complex, possibly dynamic and chaotic systems behave.

Complexity Theory provides tools that facilitate studying things like the dynamics of how information flows through a community. If a scenario for an AI risk has dynamics similar to a zombie apocalypse or an epidemic, these are the tools you're going to want for understanding what kinds of interventions will be effective at containing it. Complexity Theory also contains tools for understanding and studying emergent behaviors of systems, studying how units in a system can collaborate to manifest complex behaviors that may be non-obvious from studying the units in isolation.
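As a toy illustration of the kind of tool I mean (this sketch is mine, added for concreteness, and wasn't part of the original discussion): a discrete-time SIR-style contagion on a scale-free contact network, comparing a do-nothing baseline against a crude intervention that removes the most-connected hubs. It assumes `networkx` is installed; the parameter values are arbitrary.

```python
# Minimal sketch (illustrative toy, not from the original discussion): an
# SIR-style contagion on a random scale-free contact network -- the kind of
# model network science offers for asking "which interventions contain spread?"
import random

import networkx as nx  # assumed available; any graph library would do


def simulate_sir(graph, beta=0.3, gamma=0.1, seeds=3, steps=100, rng=None):
    """Run a discrete-time SIR process and return the fraction ever infected."""
    rng = rng or random.Random(0)
    state = {n: "S" for n in graph}
    for n in rng.sample(list(graph), seeds):
        state[n] = "I"
    for _ in range(steps):
        new_state = dict(state)
        for n in graph:
            if state[n] == "I":
                # infect susceptible neighbours with probability beta
                for m in graph.neighbors(n):
                    if state[m] == "S" and rng.random() < beta:
                        new_state[m] = "I"
                # recover with probability gamma
                if rng.random() < gamma:
                    new_state[n] = "R"
        state = new_state
    return sum(s != "S" for s in state.values()) / len(state)


g = nx.barabasi_albert_graph(2000, 3, seed=0)  # heavy-tailed contact network
baseline = simulate_sir(g)

# "intervention": remove the 1% most-connected nodes (i.e. de-amplify the hubs)
hubs = sorted(g.degree, key=lambda kv: kv[1], reverse=True)[:20]
g_intervened = g.copy()
g_intervened.remove_nodes_from(n for n, _ in hubs)
print(f"attack rate, baseline:     {baseline:.2f}")
print(f"attack rate, hubs removed: {simulate_sir(g_intervened):.2f}")
```

The point of the toy isn't the specific numbers; it's that a question like "which intervention contains the spread?" only becomes askable once you model the population as a network rather than as isolated units.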

(But isn't the problem of AI safety mostly about how to make one safe AI? or at least, the effects of powerful AI in the hands of a bunch of humans, and how that transforms society, is a separate problem from creating just one aligned superintelligence. right? What you're talking about seems related to governance, and policy things.)

what I'm talking about is how you can e.g. implement microsoft word in game of life. If your vision of AI is constrained to an automaton in that system, you miss the emergent behavior of the system which is actually what's doing the interesting thing. if an AGI manifests as some sort of super organism like an ant colony, it's pretty myopic for the alignment community to constrain their concerns to aligning a single ant when it's the behavior of the community en masse that actually has the macro impacts.

(Yes, if an AI is more like an ant colony, complexity theory would help, but why would it be like an ant colony? (or some other system that's best described as a bunch of individuals creating emergent macro behavior))

because as far as we can tell, that's how human cognition works as well. we're getting into epistemic territory now, but part of the knowledge gap I'm describing here is confusion over what "one safe AI" even means. the language you're using here reflects a folk psychological understanding of "identity" which contemporary research has increasingly identified as an illusion. we should be concerned about an "ant-colony-like" AGI because as far as we can tell, that's how our own mind works as well. the idea that you are a "single agent" is actually more a property of your outward behavior than of the composition of your cognition. it's all an illusion. https://plato.stanford.edu/entries/consciousness-unity/

(I mean you can say, "the behavior of cells is nothing like macro behaviors like emotions", but then do you need complexity theory to describe things like that? Like how does complexity theory help explain macro behaviors that are a result of lots of cells or whatever doing their thing?)

not just lots of cells. collectives of what you want to describe as "agents". as a concrete example in the human model, there've been studies done with people whose hemispheres have been surgically severed, demonstrating that their different hemispheres can exhibit different beliefs. like, one side of your brain believes in god and the other doesn't.

(or at least I can see sometimes that it's an illusion. usually it feels pretty much like I'm one thing)

and that 'feeling' is an emergent property of a bunch of things in a system interacting in complex ways. here, i'll let MLST take over the consciousness/emergence conversation: https://www.youtube.com/watch?v=_KVAzAzO5HU

yes, but why does the existence of subsystems mean that complexity theory is useful? like, water's wetness can be called an illusion, but it's still useful to have the concept "wetness" to describe things, as opposed to seeing it as molecules doing their thing

because AGI doesn't have to be a property of a single neural network. it can be an emergent property of how a bunch of different, seemingly isolated things interact with each other, and we are possibly part of that system and these are the tools for characterizing those kinds of behaviors that we otherwise can't see from inside the system. consider maximizing paperclips

what do you mean "and we are possibly part of that system"? like the entire human race + AI systems are 1 giant ant colony?

sort of. "I" am comprised of a lot of cells that I consider "me", but that also includes flora and fauna. bacteria comprise part of the system that is DigThatData, and there are times when I will actively attack the things living in me because I consider them harmful

i assure you, my white blood cells have no idea that DigThatData is chatting on discord right now, despite contributing to that emergent behavior of the system they are a part of

given our understanding of ourselves and how emergent behavior and capabilities can evolve from complex dynamics, whether or not you suspect AGI could manifest as a macro system of this kind, it can't hurt to understand better how they work and how to characterize their behavior

being more concrete, the reason I think the alignment community should be concerned about this sort of thing is specifically because of the alignment community's purported concern for problems that could manifest in the form of "paperclip maximization".

social media algorithms have been assigned to "maximize engagement" and "maximize CTR" etc., and it seems they've determined that the best way to do that is to ideologically polarize humanity, encouraging xenophobia, stochastic terrorism, and civil war. this is, in my mind, directly analogous to every formulation of the "paperclip maximization" risk parable I've heard. so if that's a legitimate concern of the safety community, then it doesn't matter whether the "agent" is a single thing, or an emergent property of a system behaving in a way that may or may not have intentionality. I find myself repeating this rant here every month or so, so you're definitely on-brand with alignment for being skeptical that this is "relevant" to safety research. But as far as I can tell, the concerns of the safety community aren't something that might manifest in the future, they're things that are already manifesting now, but the safety community is unconcerned because it's not a "single AGI" doing it.

"paperclip maximization" is not a precise enough description of problems imo, because it could arise from different kinds of alignment problems. Outer misalignment, where it thinks you want a lot of paperclips, but doesn't understand that turning everyone into paperclips is bad, and Inner misalignment, where it knows that you don't want to be turned into paperclips, but it doesn't care.

what I'm asserting is that the issue isn't that "maximize engagement" is a dumb goal for us to assign to a single AI. The issue is that it's a goal we've separately assigned to google ads, facebook, twitter feed, reddit front page, etc., and the way these different algorithms interact with us is difficult to characterize without considering emergent phenomena from network effects

whenever I talk about this, alignment researchers seem to feel it's out of scope. the "agency" thing is the best explanation I've been able to come up with for why that is, so that's why I'm leaning on that so much

insofar as my cognition could be characterized as an emergent property of a system of "agents" interacting, my assertion here is that there's no reason we shouldn't consider that "AGI" could manifest as twitter+amazon+facebook+... interacting as well, and it would be difficult for us to characterize the global behavior of that "AGI" without considering it as a system

This looks more like a governance/policy problem than the problem of aligning a superintelligence.

are you familiar with conway's game of life? would you consider the arrangement of cellular automata a "policy problem"? the automata are analogous to interacting algorithms with their own independent rulesets.
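(A minimal sketch for concreteness, added here and not part of the original exchange: the standard B3/S23 Game of Life rule. Each cell's update only looks at its eight neighbours, yet a particular arrangement of cells, a glider, behaves as a coherent macro object that travels across the grid, and larger arrangements can compute. None of that is visible from inspecting a single cell's rule.)

```python
# Minimal sketch of Conway's Game of Life (standard B3/S23 rules) -- a toy
# illustration of the point above: each cell only knows its eight neighbours,
# yet the *arrangement* of cells produces macro objects (gliders, and in
# principle whole computers) that you can't see by inspecting one cell's rule.
def step(live_cells):
    """Advance one generation; live_cells is a set of (x, y) coordinates."""
    neighbour_counts = {}
    for (x, y) in live_cells:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    key = (x + dx, y + dy)
                    neighbour_counts[key] = neighbour_counts.get(key, 0) + 1
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live_cells)
    }


# a glider: after 4 steps the same shape reappears, shifted by (1, 1)
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
cells = set(glider)
for _ in range(4):
    cells = step(cells)
print(cells == {(x + 1, y + 1) for (x, y) in glider})  # True
```

The same logic scales up: the "microsoft word in game of life" constructions mentioned above are just much larger arrangements of the same dumb cells.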

https://www.lesswrong.com/tag/coherent-extrapolated-volition - "we should find a way to program it in a way that it would act in our best interests – what we want it to do and not what we tell it to."


lay out my eleuther rant here, maybe invite Dean Pleban from dagshub to coauthor, follow up from our conversation

  - boat turning evil before steering into rocks vs. already on the rocks and not bailing it out

# On Transcendentalness of Identity

(modified from discussion on Eleuther.AI discord, 2022-11-22)

there isn't really such a thing as a continuous person, fundamentally, just instant-persons; see e.g. https://carado.moe/existential-selfdet.html

there isn't really such a thing as an "instant person" individually either, we're ensembles - https://plato.stanford.edu/entries/consciousness-unity/

I'm of the opinion that our own "minds" are an emergent property of a collection of agents operating within a confined system. as a consequence, "AGI" can be a property of a system

I think ideological radicalization induced by social media interactions is an example of a real-world "paperclip maximization problem" that has already manifested. the harms the alignment community claims to be concerned about are already here. but for some reason stuff that's right in front of them is out of scope of alignment concerns. we're being driven towards war with each other and cooking the planet.

try and think about what alignment might mean in the absence of a single intentional agent.

not getting a set of AIs to kill everything everywhere forever (such as https://www.lesswrong.com/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic), rather than a single one?

how is this different from setting personalization algorithms loose on the world that are training us to hate and kill each other?

the only thing I ever get back from alignment people is "we're not worried about those because the algorithm isn't actively trying to kill people. but just imagine if it was!"

also they're not really particularly motivated to kill us intrinsically. they're merely locally motivated to get attention

isn't this the entire point of the "paperclip problem" thing? it doesn't matter what the intention of the system is if the harms are there

i mean fair enough but they're also not capable of killing us or even realizing that that's what would maximize their goal

i don't think we're gonna face extinction via war or famine, and it seems like social media isn't very good at causing famine (there is some war, though there could be much more, and i don't know how much it's social-media-caused)

unless their goal is "get as much attention from the humans as possible" and the way they maximized that was "encourage humans to be as divisive and angry towards each other as possible" and a consequence of that was extinction via war and famine. but for some reason this is fundamentally different from asking an AI to maximize paperclips and turning everyone into paperclips.

it's really kind of offensive to accuse ai risk people of calling existing harms "low" while at the same time trivializing the harms they're concerned about which is total imminent extinction

I'm accusing AI risk people of having less imagination than they think they do. AI risk people have a very constrained view of what "AGI" might be, and seem unconcerned with the behavior of systems of algorithms that might have emergent properties that are hard to describe from our position as agents that interact with them ourselves.

basically I think everyone should read more hofstadter. ant fugue yo.

those emergent effects don't seem like they're gonna cause extinction as soon and as likely as a powerful AI maximizing for something on purpose

famous last words

i'm focusing on the largest source of existential risk. emergent effects are not intentional agents. that's why they're less dangerous.

except yes they are. that's what I was saying at the beginning of the conversation. I am an emergent effect. there is no single "me" in my brain. this was demonstrated experimentally with split-brain patients.

(uwu: in general emergent effects are much more powerful than any agent, just because any agent is situated within a system)

overall there's a rough notion of you pursuing things. an AI will be better at this. in any case.

and if you asked an individual neuron in my brain what that rough notion was, it wouldn't know what you were even asking it

sure, but i can also ask the entire-you. collections-of-stuff are things too. they can act somewhat coherently

as members of the system, it's really hard for us to understand its emergent properties. there might be something at play that operates similar to what we would consider intention, and which has no specific desire to destroy us because it sees us similar to how we see our own white blood cells. so if all the humans go to war with each other because the system wanted to maximize how much attention it was getting from humans, that's fine. all it cares about is attention. it doesn't care whether it set humans to war on each other and to extinction via nuclear holocaust. it maximized its paperclips.

the whole field of agent foundations is about what we can expect to predict about agentic things. even if they're complex things made of many parts. and we do believe some things. like instrumental convergence, goal-content integrity, etc.

you're a somewhat agentic thing. i am too. google (the company) is too. alphazero is too.

uwu: agenticness is a post-hoc label applied to stuff that can be helpful in predicting its behavior, but I don't think that means that that analysis is useful for trying to understand systems without looking at their situated, emergent effects

facebook -- the community -- is too. 4chan is too.

4chan is uh not very agentic

like I said, you're just not being open minded about what "agentness" might even mean. there doesn't need to be a singular head of the decision making process. you can have a collective of processes that interact in ways where it's useful to describe those processes as a collective. so how is 4chan not an agent? I posit any sufficiently large community exhibits properties of "agentness"

okay, let's say it is. do you think 4chan is gonna cause total extinction within a few decades.

uwu: i think 4chan could definitely be a part of the causal chain that leads to extinction in <2 years

yeah, i'll concede that as well. 4chan falls into the class of agents that I'm assigning things like facebook to, the difference is really only the extent to which algorithms are members of these "communities". and 4chan is definitely influenced directly by a variety of algorithms, specifically how their message board is constructed.

Having a bunch of relatively weak systems can certainly cause extreme issues, but they're rarely going to optimize as hard (or in a singular direction) as a basically-single (or centralized, or unfractured, or whatever you want to call it) very capable system. For example: A bunch of companies is optimizing towards ends which are disconnected/goodharted from what we want. That is bad. However, they are not very intelligent and are also internally fractured and thus most can't apply an extreme amount of optimization power in a completely coherent direction. They do apply it, and it certainly isn't random, but it isn't strongly optimizing yet. We should try to solve these issues, but they're not the main problem alignment is concerned with (though there's certainly been LW posts about related topics, so I think you're overstating how much it is ignored). It would be great to handle those, but they're less likely to cause extinction. As well, ending up with a powerful singleton AGI is 'game over'. (Then there's the solution that if we can actually make an aligned AGI, then all the other problems become way easier).

sure, but i don't think that that whole thing is anywhere near as likely to cause extinction as a singular powerful AI

my contention is that alignment people should be more concerned about systems that are comprised of lots of algorithms bouncing off each other. why does it have to be "singularly powerful"? Why can't a collective of algorithms exhibit that exact same property and even potentially intention? and if so, how can we be sure we aren't already being subject to these effects? because it sure quacks like a duck

they can too. that just doesn't seem like what's likely to happen. how do you think we go extinct in a few years, if not a powerful AI trying to kill everyone

a powerful AI that doesn't have as much direct influence on the world convinces us to kill ourselves through war and famine. which seems to be what's already happening. but alignment people seem to be unwilling to engage with this possibility, simply because of what "AI" means to them.


global warming is a paperclip knock-on effect

this is what paperclip doom looks like: https://www.youtube.com/watch?v=NHf-xSvpF-Y


I don’t know about you guys, but, um, you know, I’ve been thinking recently that… that you know, maybe, um, allowing giant digital media corporations to exploit the neurochemical drama of our children for profit… You know, maybe that was, uh… a bad call by us. Maybe… maybe the… the flattening of the entire subjective human experience into a… lifeless exchange of value that benefits nobody, except for, um, you know, a handful of bug-eyed salamanders in Silicon Valley… Maybe that as a… as a way of life forever… maybe that’s, um, not good. -- Bo Burnham, Inside

I think we'll figure out how to live with... "it" (the internet? social media? a highly connected world?) eventually, but we're going to go through several generations of really fucked up populaces first

I don't use websites with content recommenders

you almost certainly do even if you don't realize it. targeted ads surround us

I've made this rant before, but I'm strongly of the opinion that we're going to experience a lot of the harms the alignment community anticipates well before AGI gets here. the filter bubble thing we're talking about is an example of exactly this.

I visit websites with ads and personalization. I'm just not kidding myself about the fact that it's there and affects me in ways I'm not cognizant of. so I try to be cognizant of what I can.

...

nearly every time I bring up climate change in here, someone brushes it off along the lines of "AGI will fix that, don't worry about it."

to do: pull my climate change rants from EAI


I think people taking direction from dumb LLMs will paperclip us before AGI will. can't wait for the "AI" cults. "we were just doing what the supreme intelligence told us to! we definitely didn't accidentally present it with leading prompts that directed it towards death-cult outputs" ... "the AI told me I had to shoot up my school!"


https://web.archive.org/web/20220610131709/https://www.economist.com/by-invitation/2022/06/09/artificial-neural-networks-today-are-not-conscious-according-to-douglas-hofstadter

Hofstadter: "I would call gpt-3’s answers not just clueless but cluelessly clueless, meaning that gpt-3 has no idea that it has no idea about what it is saying."

mmm..... yes.... teach the people about intentionality

bmk: to which I say "nooooo that's not really conscious because it has no idea that it has no idea about what it's saying" I insist as AI slowly shrink and transform into a paperclip

honestly though, this is exactly why I'm generally sort of annoyed that I never hear alignment researchers talking about more current problematic phenomena and always discussing hazards as if they're things that haven't arrived yet. it doesn't matter if a system isn't formally "AGI". concretely: whenever I've mentioned the divisiveness that was and continues to be promoted by social media engagement metrics, alignment researchers always say that sort of thing is "out of scope" of their interests

I feel like you're equivocating between two subtly different points

exactly. thank you for making my point for me.

I think you're conflating the following axes:

  1. there is/isn't a sharp divide between AGI and non-AGI
  2. fixing current problems is/isn't helpful with superintelligence problems

I argue that these are actually different things. I simultaneously think there is no sharp divide, and also that fixing current problems doesn't help much with superintelligent problems

you're just passing the buck back on to point (1) but using the word "super-intelligent" instead of AGI

ok lemme be even more precise: 2. fixing problems with LMs saying bad things doesn't really help us not die. you can disagree with this claim but this is a different claim from claim 1

sure, there is no hard threshold beyond which it suddenly starts counting as AGI or superintelligence. but simultaneously you can also argue that any work on a particular level of AI is not very useful for higher levels of AI

explain to me how unexpected, exponentially accelerating, negative consequences arising from learning objectives like "minimize customer churn" or "maximize engagement" aren't paperclip problems

you're not engaging with my point

you're not engaging with mine. tell me why the ethical consequences of social media are out of scope for alignment researchers. this is a fairly concrete question, and I've asked it before and been brushed off before.

so I don't know about other people but the thing I ultimately care about is how to not die from AI killing everyone. I don't care what you call this thing. I call it alignment but you can disagree and call something else alignment, I'm not interested in litigating definitions. and of course there is some overlap between the problems I need to solve to make the world not be destroyed and the problems that need to be solved to stop Facebook creating conspiracy theories by optimizing engagement or whatever. the thing is that usually the "not destroying the world" problems are like evolved harder versions of the Facebook problems, and sometimes there are just all new problems. this means that solving the hard problems usually also implies a solution to the easier problems, but not vice versa; solving one of the easier problems sometimes helps the harder problem and sometimes not really. the reason I don't consider the Facebook problems in scope usually is because they are easier versions of the problems I care about and they aren't enough to fix the problem I really care about, and sometimes they aren't even applicable at all. in particular, I expect not destroying the world to be extremely hard in ways that the Facebook problem isn't, and I spend the majority of my time trying to figure out the hard bits. of course, I won't say that progress on those things is bad, and they may even be helpful, but I leave that to someone else who cares a lot about Facebook to fix

so essentially you're saying it isn't out of scope, the entire community is just uninterested because they don't foresee "world ending" consequences in what I'm pointing to.

I think this entire discussion has the vibe of you reading way too much into the definitions of things and I don't like definitional debates

except I'm pretty sure you're drawing a definitional line here. forget "agi" entirely. > ""to which I say "nooooo that's not really conscious because it has no idea that it has no idea about what it's saying" I insist as AI slowly shrink and transform into a paperclip""

I meant like with respect to "out of scope". sure, fine, it's not "out of scope", it's "uninteresting". what I'm saying is that arguments about what counts as "really conscious" are dumb

the reason I'm so fixated on this is because here's what I see: > ""to which I say "nooooo that's not really conscious because it has no idea that it has no idea about what it's saying" I insist as AI slowly ~~shrink and transform into a paperclip~~ destabilize society and induce civil wars and block us from addressing world ending problems like climate change""

it's a paperclip problem. calling it "small" is pure speculation. complacency, even. and yes, solutions to "small problems" don't necessarily solve big ones. but if the "small problem" is in the same family, then necessarily a solution to the big one should at least partially address the small one. so even if you don't foresee world ending consequences, it should at minimum be a reasonable testbed for solutions to those problems.

I just keep asking why this specific thing isn't a paperclip problem. the only equivocation I'm making is between a specific hazard scenario you invoked and a hazard scenario I see happening around us. I'm absolutely equivocating there

rom: I think it's just a timeline problem. Bmk (and other people) believe AGI will end the world in 10 years. Some other people believe there is no chance of that happening in less than 50 years. That changes priorities a lot. I'm in the "AGI won't end the world in less than 50 years but humans might" camp

same

rom: i understand @DigThatData's argument to be about "small alignment is more important than big alignment" and the discussion seems to be revolving around that

@DigThatData ok I'm going to make one more attempt at understanding your point: would you say that the key crux is you want to understand why I'm worried about AGI x-risks of the paperclip variety but not risks from current AI or other non AI things

try to restate that without invoking "AI". you said it yourself, you don't want to litigate that term

then I don't really know what you're getting at. can you state your crux in a single paragraph. not in a way that's kinda like a rhetorical question that's supposed to serve a point or whatever, just like write out exactly the thing you want to disagree on

me: > ""you say that the key crux is you want to understand why I'm worried about AGI x-risks of the paperclip variety that arise out of interactions with emergent properties of complex decision systems driven by ML algorithms""

remove any notion of "AI" worth litigating and isolate the scenario

you're asking me why I'm worried about x-risks from monolithic dangerous planning systems rather than emergent interactions of ML decision systems with the world?

I guess maybe I don't understand why alignment limits itself to "planning systems". ... you said you didn't want to litigate what "AI" meant, but you were presuming a pretty rigid definition here

I don't mean AI = planning system

you mean "of concern to alignment researchers"

the two statements are different; I made the second statement after trying to guess at what you mean

it feels like there are two separate problems: how do we get Facebook to make their system optimize for happiness to the best of their ability, and how do we make an AGI optimize for happiness (or anything else at all for that matter). and like the former is a "what makes Facebook money" problem and the latter is a really hard technical problem that we need to solve to not die

rom: I think you underestimate the potential of the first problem. It's hard from a technical and incentive point of view and it can bring a large amount of value to society, not only in amount of $. I don't disagree with the fact that solving AGI alignment will be necessary when AGI happens


(2022-06-06 - EAI discord)

new hot take: alignment = domestication

come at me


ersatz: I love this kind of paper. hilarious. > ""Climate change is one of the greatest challenges facing humanity - https://arxiv.org/abs/1906.05433"". The first sentence of the abstract and they are already full of shit

uh.... ok then, go fix climate change.

what's the latest estimate for climate change to be an existential catastrophe? 0.1% or something like that?

or maybe, crazy idea: it's possible for multiple threats to exist simultaneously and it's actually a good thing that everyone isn't just working on the same single problem.

lol if 0.1% was "one of the greatest challenges" we have to face, the world would be very different

like, researching alignment is important, but climate change is actually causing huge problems already, and meanwhile my local animal shelter also needs homes for their puppies. all of these problems can coexist. or should we just ignore homing the puppies until alignment is solved

emad xposting twitter: if you want an image of the future, it's a swarm of chaotic nanobots spreading across the universe, converting everything into pictures of asuka langley sohryu, forever


... something something, AGI utopia


potential co-authors: rom1504, uwu, dean pleban