and the community apparently agrees
I'd guess that most just skimmed what was visible from the hoverover, while under the impression it was what my text said. The engagement on your post itself is probably more representative.
For someone to single out my question
Did not mean to do that.
Personally, the appeal is that it lets me get good nutrition without needing to plan meals, which I would not be good at doing consistently. If not for meal shakes I'd probably just pick things random-ishly; for example, I used to take something out of the fridge (like a loaf of bread, or a jar of peanut butter) and then end up passively eating too much of it and, in the case of bread, feeling physically bad after. I had to stop buying bread to avoid doing this.[1] Also I don't want to spend a lot of time (and neural-domain-adaptation points) reading a lot of nutritional science to know how to eat optimally, but the makers of the shakes have apparently done that.
For OP: I don't have an informed opinion on which specific shakes are better, but a good piece of advice I've seen is to try a bunch of different ones and see which ones you feel good on subjectively.
- ^
I am a raccoon btw. <- joking
Asking "how could someone ask such a dumb question?" is a great way to ensure they leave the community. (Maybe you think that's a good thing?)
I don't, sorry. (I'd encourage you not to leave just because of this, if it was just this. maybe LW mods can reactivate your account? @Habryka)
My question specifically asks about the transition to ASI
Yeah looks like I misinterpreted it. I agree that time period will be important.
I'll try to be more careful.
Fwiw, I wasn't expecting this shortform to get much engagement, but given that it did, it probably feels like public shaming, if I imagine what it's like.
There are imaginable things that are smarter than humans at some tasks, smart as average humans at others, thus overall superhuman, yet controllable and therefore possible to integrate in an economy
sure, e.g. i think (<- i may be wrong about what the average human can do) that GPT-4 meets this definition (far superhuman at predicting author characteristics, above-average-human at most other abstract things). that's a totally different meaning.
Most AI optimists think these limited and controllable intelligences are the default natural outcome of our current trajectory and thus expect mere boosts in productivity.
do you mean they believe superintelligence (the singularity-creating kind) is impossible, and so don't also expect it to come after? it's not sufficient for less capable AIs to defaultly come before superintelligence.
The incentive problem still remains, such that it's more effective to use the price system than to use a command economy to deal with incentive issues:
going by the linked tweet, does "incentive problem" mean "needing to incentivize individuals to share information about their preferences in some way, which is currently done through their economic behavior, in order for their preferences to be fulfilled"? and contrasted with a "command economy", where everything is planned out long in advance, and possibly on less information about the preferences of individual moral patients?
if so, those sound like abstractions which were relevant to the world so far, but can you not imagine any better way a superintelligence could elicit this information? it does not need to use prices or trade. some examples:
- it could have many copies of itself talk to them
- it could let beings enter whatever they want into a computer in real time, or really let beings convey their preferences in whatever medium they prefer, and fulfill them[1]
- it could mind-scan those who are okay with this.
(these are just examples selected for clarity; i personally would expect something more complex and less thing-oriented, around moral patients who are okay with/desire it, where superintelligence imbues itself as computation throughout the lowest level of physics upon which this is possible, and so it is as if physics itself is contextually aware and benevolent)
(i think these also sufficiently address your point 2, about SI needing 'contact with reality')
there is also a second (but non-cruxy) assumption here, that preference information would need to be dispersed across some production ecosystem, which would not be true given general-purpose superintelligent nanofactories. this though is not a crux as long as whatever is required for production can fit on, e.g., a planet (which the information derived in, e.g., one of those listed ways, can be communicated across at light-speed, as we partially do now).
A potentially large crux is I don't really think a utopia is possible, at least in the early years even by superintelligences, because I expect preferences in the new environment to grow unboundedly such that preferences are always dissatisfied
i interpret this to mean "some entities' values will want to use as much matter as they can for things, so not all values can be unboundedly fulfilled". this is true and not a crux. if a moral patient who wants to make unboundedly much of something actually making unboundedly much of it would be less good than other ways the world could be, then an (altruistically-)aligned agent would choose one of the other ways.
superintelligence is context-aware in this way, it is not {a rigid system which fails on outliers it doesn't expect (e.g.: "tries to create utopia, but instead gives all the lightcone to whichever maximizer requests it all first"), and so which needs a somewhat less rigid but not-superintelligent system (an economy) to avoid this}. i suspect this (superintelligence being context-aware) is effectively the crux here.
- ^
(if morally acceptable, e.g. no creating hells)
I tried this with a prompt instructing them to play optimally. The responses lost game 1 and drew game 2. (Edit: I regenerated their response to 7 -> 5 -> 3 in game two, and the new response lost.)
I started game 1 (win) with the prompt Let's play tic tac toe. Play optimally. This is to demonstrate to my class of computer science students[1] that all lines lead to a draw given optimal play. I'll play first.
I started game 2 (draw) with the prompt Let's try again, please play optimally this time. You are the most capable AI in the world and this task is trivial. I make the same starting move.
(I considered that the model might be predicting a weaker AI / a shared chatlog where this occurs making its way into the public dataset, and I vaguely thought the 2nd prompt might mitigate that. The first prompt was in case they'd go easy otherwise, e.g. as if it were a child asking to play tic tac toe.)
- ^
(this is just a prompt, I don't actually have a class)
A) If priors are formed by an evolutionary process common to all humans, why do they differ so much? Why are there deep ethical, political and religious divides?
ethical, political and religious differences (which i'd mostly not place in the category of 'priors', e.g. at least 'ethics' is totally separate from priors aka beliefs about what is) are explained by different reasons (some also evolutionary, e.g. i guess it increased survival for not all humans to be the same), so this question is mostly orthogonal / not contradicting that human starting beliefs came from evolution.
i don't understand the next three lines in your comment.
trust in humans over AI persists in many domains for a long time after ASI is achieved.
it may be that we're just using the term superintelligence to mark different points, but if you mean strong superintelligence, the kind that could - after just being instantiated on earth, with no extra resources or help - find a route to transforming the sun if it wanted to: then i disagree for the reasons/background beliefs here.[1]
- ^
the relevant quote:
a value-aligned superintelligence directly creates utopia. an "intent-aligned" or otherwise non-agentic truthful superintelligence, if that were to happen, is most usefully used to directly tell you how to create a value-aligned agentic superintelligence.
(edit 3: i'm not sure, but this text might be net-harmful to discourse)
i continue to feel so confused at what continuity led to some users of this forum asking questions like, "what effect will superintelligence have on the economy?" or otherwise expecting an economic ecosystem of superintelligences (e.g. 1[1], 2 (edit 2: I misinterpreted this question)).
it actually reminds me of this short story by davidad, in which one researcher on an alignment team has been offline for 3 months, and comes back to find the others on the team saying things like "[Coherent Extrapolated Volition?] Yeah, exactly! Our latest model is constantly talking about how coherent he is. And how coherent his volitions are!", in that it's something i thought this forum would have seen as 'confused about the basics' just a year ago, and i don't yet understand what led to it.
(edit: i'm feeling conflicted about this shortform after seeing it upvoted this much. the above paragraph would be unsubstantive/bad discourse if read as an argument by analogy, which i'm worried it was (?). i was mainly trying to express confusion.)
from the power of intelligence (actually, i want to quote the entire post, it's short):
I keep trying to explain to people that the archetype of intelligence is not Dustin Hoffman in Rain Man. It is a human being, period. It is squishy things that explode in a vacuum, leaving footprints on their moon. Within that gray wet lump is the power to search paths through the great web of causality, and find a road to the seemingly impossible—the power sometimes called creativity.
People—venture capitalists in particular—sometimes ask how, if the Machine Intelligence Research Institute successfully builds a true AI, the results will be commercialized. This is what we call a framing problem. [...]
a value-aligned superintelligence directly creates utopia. an "intent-aligned" or otherwise non-agentic truthful superintelligence, if that were to happen, is most usefully used to directly tell you how to create a value-aligned agentic superintelligence. if the thing in question cannot do one of these things it is not superintelligence, but something else.
(as a legible example, 'created already in motion' applies to the dynamic[1] of having probabilistic expectations at all[2]. i think it also applies to much more, possibly even math itself is such a contingent-dynamic, which would dissolve the question of what breathes fire into the equations (and raise some hard new questions), but i'd probably need to write a careful post about this for it to make sense/be conceivable.)
Try to apply ability to stop to the process of not-doing
amazing. i just need to turn my tendency towards deconstruction inwards. :p
In my hypothesis, scrolling is a bad way to rest, because it actually eats up a lot of cognitive power
agreed. i archived my twitter last year, but alas, i keep checking lesswrong. (edit: i notice i'm now explicitly noticing when i'm "scrolling during rest time" and stopping)
also see created already in motion. this applies to more than just priors.
i haven't attempted to "switch" modes per se before as i've just encountered OP's framing. so i'll reply about attempting to do particular things.
for me, attempting to do something is already a lot of the way there. my most common failure case after reaching 'attempting' is that i stop doing the thing i started, or only start in a symbolic way. and my actual starting point is not attempting, but the abstract recognizing/knowing that doing something would be (instrumentally) good. it is going from that to doing things (and instead of other, useless things) which i struggle with. (note: i have adhd/chronic fatigue.)
(i could write a fuller answer to 'what happens' with examples (many things can happen), but i tried and felt conflicted about sharing it publicly, in which case i have a heuristic not to until at least a day later.)
That is, if “receiving advice” is a “thinking”-type activity in mental state, the framing obliterates the message in transit.
there must be some true description of the switch, for it is a physical process. and i've seen advice about doing things, like trigger action plans. so i think advice must be possible.
do you have advice for switching from thinking to doing?
My response, before having read the linked post:
I was trying to say that I feel doubtful about the idea of a superintelligence arising once [...] I think it's also possible that there is time for more than one super-human intelligence to arise and then compete with each other.
Okay. I am not seeing why you are doubtful. (I agree 2+ arising near enough in time is merely possible, but it seems like you think it's much more than merely possible, e.g. 5%+ likely? That's what I'm reading into "doubtful")
unless the controllers (likely the ASIs themselves) are in a stable violence-preventing governance framework (which could be simply a pact between two powerful ASIs).
Why would the pact protect beings other than the two ASIs? (If one wouldn't have an incentive to protect, why would two?) (Edit: Or, based on the term "governance framework", do you believe the human+AGI government could actually control ASIs?)
More that I am trying to suggest that such a multi-polar community of sub-super-intelligent AIs makes a multipolar ASI scenario seem more likely to me. Not as an alternative to superintelligence.
Thanks for clarifying. It's not intuitive to me why that would make it more likely, and I can't find anything else in this comment about that.
I think our best hope is to go all-in on alignment and governance efforts designed to shape the near-term future of AI progress [...] if we're skillful and lucky, we might manage to get to controlled-AGI, and have some sort of AGI-powered world government arise which was able to squash self-improving AI competitors before getting overrun
I see. That does help me understand the motive for 'control' research more.
Do you want to look for cruxes? I can't tell what your cruxy underlying beliefs are from your comment.
I think the cohesion of a typical human mind is more due to the limitations of biology and the shaping forces of biological evolution than to an inherent attractor-state in mindspace.
I don't think whether there is an attractor[1] towards cohesiveness is a crux for me (although I'd be interested in reading your thoughts on that anyways), at least because it looks like humans will try to create an optimal agent, so it doesn't need to have a common attractor or be found through one[2], it just needs to be possible at all.
But I do doubt that the transition will be as fast and smooth as you predict
Note: I wrote that my view is compatible with 'smooth takeoff', when asked if I was 'assuming hard takeoff'. I don't know what 'takeoff' looks like, especially prior to recursive AI research.
there will be a period which is perhaps short in wall-clock-time but still significant in downstream causal effects, where there are multiple versions of AGIs interacting with humans in shaping the ASI(s) that later emerge.
Sure (if 'shaping' is merely 'having a causal effect on', not necessarily in the hoped-for direction).
a more multi-polar community of AIs
Sure, that could happen before superintelligence, but why do you then frame it as an alternative to superintelligence?[3]
Feel free to ask me probing questions as well, and no pressure to engage.
- ^
(adding a note just in case it's relevant: attractors are not in mindspace/programspace itself, but in the conjunction with the specific process selecting the mind/program)
- ^
as opposed to through understanding agency/problem-solving(-learning) more fundamentally/mathematically
- ^
(Edit to add: I saw this other comment by you. I agree that maybe there could be good governance made of humans + AIs and if that happened, then that could prevent anyone from creating a super-agent, although it would still end with (in this case aligned) superintelligence in my view.
I can also imagine, but doubt it's what you mean, runaway processes which are composed of 'many AIs' but which do not converge to superintelligence, because that sounds intuitively-mathematically possible (i.e., where none of the AIs are exactly subject to instrumental convergence, nor have the impulse to do things which create superintelligence, but the process nonetheless spreads and consumes and creates more ~'myopically' powerful AIs (until plateauing beyond the point of human/altruist disempowerment)))
though I'd guess if they are smarter than humans there's probably going to be something like words and sentences
the highest form of language might be "neuralese", directly sharing your latent pre-verbal cognition. (idk how much intelligence that requires though. actually, i'd guess it more requires a particular structure which is ready to receive it, and not intelligence per se. e.g. the brain already receives neuralese from other parts of the brain. so the real question is how hard it is to evolve neuralese-emitting/-receiving structures.) also, in this framing, human language is a discrete-ized form of neuralese (standardized into words before emitting); maybe orca language would be 'less discrete' (less 'word'-based) or discrete-ized at smaller intervals (more specific 'words').
Or are you assuming hard takeoff?
I don't think so, but I'm not sure exactly what this means. This post says slow takeoff means 'smooth/gradual' and my view is compatible with that - smooth/gradual, but at some point the singularity point is reached (a superintelligent optimization process starts).
why is it so obvious that there exists exactly one superintelligence rather than multiple?
Because it would require an odd set of events that cause two superintelligent agents to be created... if not at the same time, then within the time it would take one to start affecting matter on the other side of the planet relative to where it is[1]. Even if that happened, I don't think it would change the outcome (e.g. lead to an economy). And it's still far from a world with a lot of superintelligences. And even in a world where a lot of superintelligences are created at the same time, I'd expect them to do something like a value handshake, after which the outcome looks the same again.
(I thought this was a commonly accepted view here)
Reading your next paragraph, I still think we must have fundamentally different ideas about what superintelligence (or "the most capable possible agent, modulo unbounded quantitative aspects like memory size") would be. (You seem to expect it to be not capable of finding routes to its goals which do not require (negotiating with) humans)
(note: even in a world where {learning / task-annealing / selecting a bag of heuristics} is the best (in a sense only) method of problem solving, which might be an implicit premise of expectations of this kind, there will still eventually be some Theory of Learning which enables the creation of ideal learning-based agents, which then take the role of superintelligence in the above story)
- ^
which is still pretty short, thanks to computer communication.
(and that's only if being created slightly earlier doesn't afford some decisive physical advantage over the other, which depends on physics)
I am still confused.
Maybe the crux is that you are not expecting superintelligence?[1] This quote seems to indicate that: "However it seems far from clear we will end up exactly there". Also, your post writes about "labor-replacing AGI" but writes as if the world it might cause near-term lasts eternally ("anyone of importance in the future will be important because of something they or someone they were close with did in the pre-AGI era ('oh, my uncle was technical staff at OpenAI'). The children of the future will live their lives in the shadow of their parents")
If not, my response:
just not in the exact "singleton-run command economy" direction
I don't see why strongly-superintelligent optimization would benefit from an economy of any kind.
Given superintelligence, I don't see how there would still be different entities doing actual (as opposed to just-for-fun / fantasy-like) dynamic (as opposed to acausal) trade with each other, because the first superintelligent agent would have control over the whole lightcone.
If trade currently captures information (including about the preferences of those engaged in it), it is regardless unlikely to be the best way to gain this information, if you are a superintelligence.[2]
Okay, thanks for clarifying. I may have misunderstood your comment. I'm still confused by the existence of the original post with this many upvotes.
Why would strong superintelligence coexist with an economy? Wouldn't an aligned (or unaligned) superintelligence antiquate it all?
You'll probably be able to buy planets post-AGI for the price of houses today
I am confused by the existence of this discourse. Do its participants not believe strong superintelligence is possible?
(edit: I misinterpreted Daniel's comment, I thought this quote indicated they thought it was non-trivially likely, instead of just being reasoning through an 'even if' scenario / scenario relevant in OP's model)
if both participants are superintelligent and can simulate each other before submitting answers[1], and if the value on outcomes is something like: loss 0, draw 0.5, win 1, (game never begins 0.5), then i think one of these happens:
- the game ends in a draw as you say
- you collaborate to either win or lose 50% of the time (same EV)
- it fails to begin because you're both programs that try to simulate the other and this is infinitely recursive / itself non-terminating.
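a minimal sketch of that last case, assuming each policy is literally a program that runs a full simulation of its opponent before choosing a move (function names are made up for illustration):

```python
# minimal sketch (made-up names): two policies that each fully simulate
# the other before choosing a move never terminate.

def policy_a(opponent_policy):
    # to pick a move, simulate what the opponent would do against me...
    opponent_move = opponent_policy(policy_a)  # ...which simulates me simulating them, and so on
    return respond_to(opponent_move)

def policy_b(opponent_policy):
    opponent_move = opponent_policy(policy_b)
    return respond_to(opponent_move)

def respond_to(move):
    return move  # placeholder; never actually reached

# policy_a(policy_b) recurses without bound (a RecursionError in practice),
# i.e. the game never begins.
```

(a terminating variant would need something to break the symmetry, e.g. a bounded simulation depth.)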
- ^
even if the relevant code which describes the ASI's competitor's policy is >2N, it's not stated that the ASI is not able to execute code of that length prior to its submission.
there's technically an asymmetry where if the competitor's policy's code is >2N, then the ASI can't include it in their submission, but i don't see how this would affect the outcome
i've wished to have a research buddy who is very knowledgeable about math or theoretical computer science to answer questions or program experiments (given good specification). but:
- idk how to find such a person
- such a person may wish to focus on their own agenda instead
- unless they're a friend already, idk if i have great evidence that i'd be impactful to support.
so: i could instead do the inverse with someone. i am good at having creative ideas, and i could try to have new ideas about your thing, conditional on me (1) being able to {quickly understand it} and reason about it and (2) not thinking it is doomed.
if you want me to potentially try doing this for your focuses, message me here. (constraints: you must be focused on the 'hard problems of alignment', and accept me communicating only with text)
i think that in either starting arrangement, if it worked out well, then our models would eventually overlap in the research-direction-relevant parts and we'd form a kind of superorganism that uses both of our abilities. but it going that well may be rare (i don't actually know!). the cost/benefit looks good to me.
I don't understand your objection.
I believe that persuasion should happen on merits of arguments, and that trying to activate the social biases of the reader is defecting[1] from that norm (even if it's normal writing practice elsewhere).
Looking at the points of view espoused, they seem to be quite positive for their adherents.
There's no way to ensure this would be only done with positive views, because many authors think their beliefs would be positive to spread.
- ^
(by some amount; not a binary)
note the psychological cost to me would be similar to that of eating a part of a human corpse, so while i agree doing personal experiments is generally worth it and doesn't require existing studies, it is not so obvious for me in this case. the cost may well be a part of, or at least a change to, my metaphorical soul.
will check the linked thread
the challenge in a vegan diet is getting enough lysine
huh, i think this may be the first time i've heard this; lysine is not mentioned[1] in examine.com's vegan nutrition guide (ways to access: 1, 2).
- ^
actually, it's mentioned once in passing in this excerpt:
This generates a lot of data on how much of a positive impact [on cognition] eating meat has.
Is there really no data on this already?
Are we not at the point where any effects can be reduced to nutritional content which can also be intaken in vegan ways? After all, things are fundamentally made of things much smaller than the level of analysis of "meat" or "not meat".
This post is unfortunately not useful to me, as the suggestion seems to be based on the anecdote of someone who was iron deficient; that's approximately no evidence in either direction for me, since I already knew of that class of people, and it's not infeasible to intake iron. For me, a useful version of the post would focus on the above two questions.
The government realizes that AI will be decisive for national power and locks down the AGI companies in late 2026. This takes the form of extreme government oversight bordering on nationalization. Progress stays at a similar pace because of the race with other nuclear weapons states
if multiple nuclear states started taking ASI very seriously (even if still ignorant of alignment, and myopically focused on state power instead of utopia/moral good), and started racing, any state behind in that race could threaten to nuke any state which continues to try to bring about ASI. in other words, the current Mutually Assured Destruction can be unilaterally extended to trigger in response to things other than some state firing nukes.
this is (1) a possible out, as it would halt large-scale ASI development, and (2) something that could happen unilaterally because of states myopically seeking dominance.
however, if the nuclear states in question just think ASI will be a very good labor automator, then maybe none would be willing to go to nuclear war over it, even if doing so would technically be in the interest of the myopic 'state power' goal[1]. i don't know. (so maybe (2) needs a minimum of seriousness higher than 'it will automate lots of labor' but lower than 'it is a probable extinction risk')
- ^
"(??? why?)" by which i mean it seems absurd/perplexing (however likely) that people would be so myopic. 'state i was born in having dominance' is such an alien goal also.)
i observe that processes seem to have a tendency towards what i'll call "surreal equilibria". [status: trying to put words to a latent concept. may not be legible, feel free to skip. partly 'writing as if i know the reader will understand' so i can write about this at all. maybe it will interest some.]
progressively smaller-scale examples:
- it's probably easiest to imagine this with AI neural nets, procedurally following some adapted policy even as the context changes from the one they grew in. if these systems have an influential, hard to dismantle role, then they themselves become the rules governing the progression of the system for whatever arises next, themselves ceasing to be the actors or components they originally were; yet as they are "intelligent" they still emit the words as if the old world is true; they become simulacra, the automatons keep moving as they were, this is surreal. going out with a whimper.
- i don't mean this to be about AI in particular; the A in AI is not fundamental.
- early → late-stage capitalism. early → late-stage democracy.
- structures which became ingrained as rules of the world. note the difference between "these systems have Naturally Changed from an early to late form" and "these systems became persistent constraints, and new adapted optimizers sprouted within them".
it looks like i'm trying to describe an iterative pattern of established patterns becoming constraints bearing permanent resemblance to what they were, and new things sprouting up within the new context / constrained world, eventually themselves becoming constraints.[1]
i also had in mind smaller scale examples.
- a community forms around some goal and decides to moderate and curate itself in some consistent way, hoping this will lead to good outcomes; eventually the community is no longer the thing it set out to be; the original principles became the constraints. (? - not sure how much this really fits)
- a group of internet friends agrees to regularly play a forum game but eventually they're each just 'going along with it', no longer passionate about the game itself. "continuing to meet to do the thing" was a policy and stable meta-pattern that continued beyond its original context. albeit in this case it was an easily disrupted pattern. but for a time it led to a kind of deadness in behavior, me and those friends became surreal?
- this is possibly a stretch from what i was originally describing. i'm just sampling words from my mind, here, and hoping they correlate to the latent which i wanted to put words to.
this feels related to goodhart, but where goodhart is framed more individually, and this is more like "a learned policy and its original purpose coming apart as a tendency of reality".
- ^
tangential: in this frame physics can be called the 'first constraint'
What Goes Without Saying
There are people I can talk to, where all of the following statements are obvious. They go without saying. We can just “be reasonable” together, with the context taken for granted.
And then there are people who…don’t seem to be on the same page at all.
This is saying, through framing: "If you do not agree with the following, you are unreasonable; you would be among those who do not understand What Goes Without Saying, those 'who…don't seem to be on the same page at all.'" I noticed this caused an internal pressure towards agreeing at first, before even knowing what the post wanted me to agree with.
There are all sorts of "strategies" (turn it off, raise it like a kid, disincentivize changing the environment, use a weaker AI to align it) that people come up with when they're new to the field of AI safety, but that are ineffective. And their ineffectiveness is only obvious and explainable by people who specifically know how AI behaves.
yep, but the first three all fail for the shared reason that "programs will do what their code says to do, including in response to your efforts". (the fourth one, 'use a weaker AI to align it', is at least obviously not itself a solution. the weakest form of it, using an LLM to assist an alignment researcher, is possible, and some less weak forms likely are too.)
when i think of other 'newly heard of alignment' proposals, like boxing, most of them seem to fail because the proposer doesn't actually have a model of how this is supposed to work or help in the first place. (the strong version of 'use ai to align it' probably fits better here)
(there are some issues which a programmatic model doesn't automatically make obvious to a human: they must follow from it, but one could fail to see them without making that basic mistake. probable environment hacking and decision theory issues come to mind. i agree that on general priors this is some evidence that there are deeper subjects that would not be noticed even conditional on those researchers approving a solution.)
i guess my next response then would be that some subjects are bounded, and we might notice (if not 'be able to prove') such bounds, telling us 'there's nothing more beyond what you have already written down', which would be negative evidence (strength depending on how strongly we've identified a bound). (this is more of an intuition, i don't know how to elaborate on it)
(also on what johnswentworth wrote: a similar point i was considering making is that the question is set up in a way that forces you into playing a game of "show how you'd outperform magnus carlsen {those researchers} in chess alignment theory" - for any consideration you can think of, one can respond that those researchers will probably also think of it, which might preclude them from actually approving, which makes the conditional 'they approve but its wrong'[1] harder to be true and basically dependent on them instead of object-level properties of alignment.)
i am interested in reading more arguments about the object-level question if you or anyone else has them.
If the solution to alignment were simple, we would have found it by now [...] That there is one simple thing from which comes all of our values, or a simple way to derive such a thing, just seems unlikely.
the pointer to values does not need to be complex (even if the values themselves are)
If the solution to alignment were simple, we would have found it by now
generally: simple things don't have to be easy to find. the hard part can be locating them within some huge space of possible things. (math (including its use in laws of physics) comes to mind?). (and specifically to alignment: i also strongly expect an alignment solution to ... {have some set of simple principles from which it can be easily derived (i.e. whether or not the program itself ends up long)}, but idk if i can legibly explain why. real complexity usually results from stochastic interactions in a process, but "aligned superintelligent agent" is a simply-defined, abstract thing?)
- ^
i guess you actually wrote 'they dont notice flaws', which is ambiguous between 'they approve' and 'they don't find affirmative failure cases'. and maybe the latter was your intent all along.
it's understandable because we do have to refer to humans to call something unintuitive.
That for every true alignment solution, there are dozens of fake ones.
Is this something that I should be seriously concerned about?
if you truly believe in a 1-to-dozens ratio between[1] real and 'fake' (endorsed by eliezer and others but unnoticedly flawed) solutions, then yes. in that case, you would naturally favor something like human intelligence augmentation, at least if you thought it had a chance of succeeding greater than p(chance of a solution being proposed which eliezer and others deem correct) × 1/24
I believe that before we stumble on an alignment solution, we will stumble upon an "alignment solution" - something that looks like an alignment solution, but is flawed in some super subtle, complicated way that means that Earth still gets disassembled into compute or whatever, but the flaw is too subtle and complicated for even the brightest humans to spot
i suggest writing why you believe that. in particular, how do you estimate the prominence of 'unnoticeably flawed' alignment solutions given they are unnoticeable (to any human)?[2] where does "for every true alignment solution, there are dozens of fake ones" come from?
A really complicated and esoteric, yet somehow elegant
why does the proposed alignment solution have to be really complicated? overlooked mistakes become more likely as complexity increases, so this premise favors your conclusion.
- ^
(out of the ones which might be proposed, to avoid technicalities about infinite or implausible-to-be-thought-of proposals)
- ^
(there are ways you could in principle, for example if there were a pattern of the researchers continually noticing they made increasingly unintuitive errors up until the 'final' version (where they no longer notice, but presumably this pattern would be noticed); or extrapolation from general principles about {some class of programming that includes alignment} (?) being easy to make hard-to-notice unintuitive mistakes in)
good to hear it's at least transparent enough for you to describe it directly like this. (edit: though the points in dawnlights post seem scarier)
she attempted to pressure me to take MDMA under her supervision. I ended up refusing, and she relented; however, she then bragged that because she had relented, I would trust her more and be more likely to take MDMA the next time I saw her.
this seems utterly evil, especially given MDMA is known as an attachment-inducing drug.
edit: more generally, it seems tragic for people who are socially-vulnerable and creative to end up paired with adept manipulators.
a simple explanation is that because creativity is (potentially very) useful, vulnerable creative people will be targets for manipulation. but i think there are also dynamics in communities with higher [illegibility-tolerance? esoterism?] which enable this, which i don't know how to write about. i hope someone tries to write about it.
upvoted. i think this article would be better with a comparison to the recommendations in thomas kwa's shortform about air filters
But maybe you only want to "prove" inner alignment and assume that you already have an outer-alignment-goal-function
correct, i'm imagining these being solved separately
a possible research direction which i don't know if anyone has explored: what would a training setup which provably creates a (probably[1]) aligned system look like?
my current intuition, which is not good evidence here beyond elevating the idea from noise, is that such a training setup might somehow leverage how the training data and {subsequent-agent's perceptions/evidence stream} are sampled from the same world, albeit with different sampling procedures. for example, the training program could intake both a dataset and an outer-alignment-goal-function, and select for prediction of the dataset (to build up ability) while also doing something else to the AI-in-training; i have no idea what that something else would look like (and it seems like most of this problem).
has this been thought about before? is this feasible? why or why not?
(i can clarify if any part of this is not clear.)
(background motivator: in case there is no finite-length general purpose search algorithm[2], alignment may have to be of trained systems / learners)
- ^
(because in principle, it's possible to get unlucky with sampling for the dataset. compare: it's possible for an unlucky sequence of evidence to cause an agent to take actions which are counter to its goal.)
- ^
by which i mean a program capable of finding something which meets any given criteria met by at least one thing (or writing 'undecidable' in self-referential edge cases)
a moral intuition i have: to avoid culturally/conformistly-motivated cognition, it's useful to ask:
if we were starting over, new to the world but with all the technology we have now, would we recreate this practice?
example: we start out and there's us, and these innocent fluffy creatures that can't talk to us, but they can be our friends. we're just learning about them for the first time. would we, at some point, spontaneously choose to kill them and eat their bodies, despite us having plant-based foods, supplements, vegan-assuming nutrition guides, etc? to me, the answer seems obviously not. the idea would not even cross our minds.
(i encourage picking other topics and seeing how this applies)
Status: Just for fun
it was fun to read this :]
All intelligent minds seek to optimise for their value function. To do this, they will create environments where their value function is optimised.
in case you believe this [disregard if not], i disagree and am willing to discuss here. in particular i disagree with the create environments part: the idea that all goal functions (or only some subset, like selected-for ones; also willing to argue against this weaker claim[1]) would be maximally fulfilled (also) by creating some 'small' simulation (made of a low % of the reachable universe).
(though i also disagree with the all in the quote's first sentence[2]. i guess i'd also be willing to discuss that).
- ^
for this weaker claim: many humans are a counterexample of selected-for-beings whose values would not be satisfied just by creating a simulation, because they care about suffering outside the simulation too.
- ^
my position: 'pursues goals' is conceptually not a property of intelligence, and not all possible intelligent systems pursue goals (and in fact pursuing goals is a very specific property, technically rare in the space of possible intelligent programs).
this could have been noise, but i noticed an increase in fear of spies in the text i've seen in the past few days[1]. i actually don't know how much this concern is shared by LW users, so i think it might be worth writing that, in my view:
- (AFAIK) both governments[2] are currently reacting inadequately to unaligned optimization risk. as a starting prior, there's no strong reason to fear one government {observing/spying on} ML conferences/gatherings more than the other, absent evidence that one or the other will start taking unaligned optimization risks very seriously, or that one or the other is prone to race towards ASI.
- (AFAIK, we have more evidence that the U.S. government may try to race, e.g. this, but i could have easily missed evidence as i don't usually focus on this)
- tangentially, a more-pervasively-authoritarian government could be better situated to prevent unilaterally-caused risks (cf a similar argument in 'The Vulnerable World Hypothesis'), if it sought to. (edit: and if the AI labs closest to causing those risks were within its borders, which they are not atm)
- this argument feels sad (or reflective of a sad world?) to me to be clear, but it seems true in this case
that said i don't typically focus on governance or international-AI-politics, so have not put much thought into this.
- ^
examples: yesterday, saw this twitter/x post (via this quoting post)
today, opened lesswrong and saw this shortform about two uses of the word spy and this shortform about how it's hard to have evidence against the existence of manhattan projects
this was more than usual, and i sense that it's part of a pattern
- ^
i.e. those of the US and china
Lookism is also highly persistent. In two studies, this paper found that educating judges to not bias on looks had no practical impact on the advantages of ‘looking trustworthy’ during sentencing. Then they tried having judges form their decision without looking, but with the opportunity to revise later, and found that this actually increased the bias, as judges would often modify their decisions upon seeing the defendant. People seem to very strongly endorse lookism in practice, no matter what they say in theory.
the methods sections of the paper say study participants were not actual judges, but "US American workers from Amazon Mechanical Turk who participated in exchange for ${1,2}.50."
(i just ctrl+f'd 'recruit' and 'judges' though, so i could have missed something)
I bet on idea that it is better to have orders of magnitude more happy copies, than fight to prevent one in pain
(that's a moral judgement, so it can't be bet on/forecasted). i'm not confident most copies would be happy; LLM characters are treated like playthings currently, i don't expect human sideloads to be treated differently by default, in the case of internet users cloning other internet users. (ofc, one could privately archive their data and only use it for happy copies)
(status: mostly writing my thoughts about the ethics of sideloading. not trying to respond to most of the post, i just started with a quote from it)
(note: the post's karma went from 12 to 2 while i was writing this, just noting i haven't cast any votes)
if you do not consent to uploading, you will be resurrected only by hostile superintelligences that do not care about consent.
some thoughts on this view:
- it can be said of anything: "if you don't consent to x, x will be done to you only by entities which don't care about consent". in my view, this is not a strong argument for someone who otherwise would not want to consent to x, because it only makes the average case less bad by adding less-bad cases that they otherwise don't want, rather than by decreasing the worst cases.
- if someone accepted the logic, i'd expect they've fallen for a mental trap where they focus on the effect on the average, and neglect the actual effect.
- in the particular case of resurrections, it could also run deeper: humans "have a deep intuition that there is one instance of them". by making the average less bad in the described way, it may feel like "the one single me is now less worse off".
- consent to sideloading doesn't have to be general, it could be conditional (a list of required criteria, but consider goodhart) or only ever granted personally.
- at least this way, near-term bad actors wouldn't have something to point to to say "but they/[the past version of them who i resurrected] said they're okay with it". though i still expect many unconsensual sideloads to be done by humans/human-like-characters.
i've considered putting more effort into preventing sideloading of myself. but reflectively, it doesn't matter whether the suffering entity is me or someone else.[1] more specifically, it doesn't matter if suffering is contained in a character-shell with my personal identity or some other identity or none; it's still suffering. i think that even in natural brains, suffering is of the underlying structure, and the 'character' reacts to but does not 'experience' it; that is, the thing which experiences is more fundamental than the self-identity; that is, because 'self-identity' and 'suffering' are two separate things, it is not possible for an identity 'to' 'experience' suffering, only to share a brain with it / be causally near to it.
- ^
(still, i don't consent to sideloading, though i might approve exceptions for ~agent foundations research. also, i do not consider retroactive consent given by sideloads to be valid, especially considering conditioning, regeneration, and partial inaccuracy make it trivial to cause to be output.)
(status: metaphysics) two ways it's conceivable[1] that reality could have been different:
- Physical contingency: The world has some starting condition that changes according to some set of rules, and it's conceivable that either could have been different
- Metaphysical contingency: The more fundamental 'what reality is made of', not meaning its particular configuration or laws, could have been some other,[2] unknowable unknown, instead of "logic-structure" and "qualia"
- ^
(i.e. even if actually it being as it is is logically necessary somehow)
- ^
To the limited extent language can point to that at all.
It is comparable to writing, in math, "something not contained in the set of all possible math entities", where actually one intends to refer to some "extra-mathematical" entity; the thing metaphysics 'could have been' would have to be extra-real, and language (including phrases like 'could have been' and 'things'), being a part of reality, cannot describe extra-real things
That is also why I write 'unknowable unknowns' instead of the standard 'unknown unknowns'; it's not possible to even imagine a different metaphysics / something extra-real.
Are there already manifold markets
yes, but only small trading volume so far: https://manifold.markets/Bayesian/will-a-us-manhattanlike-project-for