I still expect the Singularity somewhere in the 2030s, even under that model.
Have you written up your model of AI timelines anywhere?
In what sense is Georgism "leftish"?
Fixed skull size
Artificial wombs may remove this bottleneck.
I'm suspicious of having made a mistake, because LLaMA outputs similar tokens in sequence, e.g. the Cyrillic tokens in succession, or "partiellement" repeated. Overall the text looks too coherent (?), with not enough weird Unicode symbols and encoding errors.
A trajectory produced by greedily sampling the least likely token at each step almost certainly is not the least likely trajectory (minimizing each step's probability doesn't minimize the probability of the whole sequence), and your experiment may suggest it's not even among the least likely trajectories.
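For concreteness, here's a minimal sketch of what I'm assuming the setup looks like (greedily pick the least likely next token at each step with a Hugging Face causal LM; the model name and token budget below are placeholders, not necessarily what was actually used):

```python
# Greedy "anti-sampling": at each step, append the single least likely next token.
# Assumptions: any HF causal LM works here; model name and generation length are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
for _ in range(50):
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]   # next-token logits
    worst = logits.argmin().view(1, 1)            # least likely *next* token
    input_ids = torch.cat([input_ids, worst], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

The argmin is a purely local choice: an early low-probability token can make the continuation highly predictable, so the product of per-step probabilities along this trajectory can still be far larger than along some other sequence.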
At the top of this post
(Some of this essay is a revised/shortened/reconsidered version of some of the content from these notes on the topic, which I posted on LessWrong and the EA forum -- though not on my website or substack -- last fall.)
Is there any research on how the actual impact of [the kind of AI that we currently have] lives up to the expectations from the time [shortly before we had that kind of AI but close enough that we could clearly see it coming]?
This is vague, but some not-unreasonable candidate periods for the second time would be:
- After OA Copilot, before ChatGPT (so summer-autumn 2022).
- After PaLM, before Copilot.
- After GPT-2, before GPT-3.
I'm also interested in research on historical over- and under-performance of other tech (where "we kinda saw (or could have seen) it coming") relative to expectations.
(FWIW it doesn't land on me at all.)
Did Yang's campaign backfire in some way?
I haven't looked into this in detail but I would be quite surprised if Voyager didn't do any of that?
Although I'm not sure whether what you're literally asking for is exactly what you're looking for. It seems straightforward that if you train/fine-tune a model on examples of people playing a game that involves leveraging [very helpful but not strictly necessary] resources, you are going to get an AI capable of that.
It would be more non-trivial if you got an RL agent doing that, especially if it didn't stumble into that strategy/association "I need to do X, so let me get Y first" by accident, but rather figured out that Y tends to be helpful for X via some chain of associations.
Are there any memes prevalent in the US government that make racing to AGI with China look obviously foolish?
The "let's race to AGI with China" meme landed for a reason. Is there something making the US gov susceptible to some sort of counter-meme, like the one expressed in this comment by Gwern?
This, but to the extent that people reading him haven't clearly already decided on their conclusion, it might be worth it to engage.
The purpose of a debate is not to persuade the debater, it's to persuade the audience. (Modulo that this frame is more soldier-mindset-y than truth-seeking but you know what I mean.)
I don't think this strategy works well. You shouldn't try to fight yourself. You cannot win.
I think sapphire is making a claim about the family of strategies you're discussing in the post.
Major sleep debt?
Probably either one of (or some combination of): (1) "g" is the next consonant after "c" in "cognitive"; (2) leakage from "g-factor"; (3) leakage from "general(ly good at thinking)"
(BTW first you say "CQ" and then "GQ")
- Sleep deprived a little, like stay up really late but without sleep debt: +5 GQ points.
Are you sure about the sign here?
I think I'm more prone to some kinds of creative/divergent thinking when I'm mildly to moderately sleep-deprived (at least sometimes in productive directions) but also worse at precise/formal/mathematical thinking about novel/unfamiliar stuff. So the features are pulled apart.
So the Zizian technology, which involves sleep deprivation and then having one eye closed and the other eye open (as a way to make one personality sleep), seems completely unsupported by what we know about human biology.
To the extent that they tried to ground this tech in this particular neuro stuff, then yeah, sure, but did they even? (These threads are getting long; I'm not remembering everything that was said upstream, nor am I reading all of this very carefully.)
Now that I think about it, this sounds very much like "every person is born with original sin and needs our technology sacraments to be saved from damnation".
It's not about the eyes, it's about the part of the visual field.
The image from the right half of the visual field (left part of each retina) feeds into the left hemisphere and the image from the left half of the visual field (right part of each retina) feeds into the right hemisphere.
Since in humans each eye observes both sides of the visual field, you need ~50% of each eye's fibers (each corresponding to something like a pixel) to go to each hemisphere.
In vertebrates where the overlap in the visual fields of the two eyes is minimal (e.g. horses, rabbits), each eye almost exclusively serves one half of the visual field, so the entire image from the left eye feeds into the right hemisphere, and ditto right eye -> left hemisphere.
If you think the primary bottleneck to dangerous ASI is not that, but rather something else, then what do you think it is?
So far in this thread I was mostly talking from the perspective of my model(/steelman?) of Abram's argument.
I think the primary bottleneck to dangerous ASI is the ability to develop coherent and correct understandings of arbitrary complex domains and systems
I mostly agree with this.
Still, this doesn't[1] rule out the possibility of getting an AI that understands (is superintelligent in?) one complex domain (specifically here, whatever is necessary to meaningfully speed up AIS research) (and maybe a few more, as I don't expect the space of possible domains to be that compartmentalizable), but is not superintelligent across all complex domains that would make it dangerous.
It doesn't even have to be a superintelligent reasoner about minds. Babbling up clever and novel mathematical concepts for a human researcher to prune could be sufficient to meaningfully boost AI safety (I don't think we're primarily bottlenecked on mathy stuff but it might help some people and I think that's one thing that Abram would like to see).
[1] Doesn't rule it out in itself, but perhaps you have some other assumptions that imply it's 1:1, as you say.
It's not unique to Zizians or some slightly broader rationalist circle.
Right, so one possibility is that you are doing something that is “speeding up the development of AIS-helpful capabilities” by 1 day, but you are also simultaneously speeding up “dangerous capabilities” by 1 day, because they are the same thing.
TBC, I was thinking about something like: "speed up the development of AIS-helpful capabilities by 3 days, at the cost of speeding up the development of dangerous capabilities by 1 day".
Ziz herself apparently believed that trans women were inherently more capable of accepting her "truth" for g/acc-ish reasons
@Matrice Jacobine do you have a link to where Ziz talks about that?
I think Abram is saying the following:
- Currently, AIs are lacking capabilities that would meaningfully speed up AI Safety research.
- At some point, they are gonna get those capabilities.
- However, by default, they are gonna get those AI Safety-helpful capabilities roughly at the same time as other, dangerous capabilities (or at least, not meaningfully earlier).
- In which case, we're not going to have much time to use the AI Safety-helpful capabilities to speed up AI Safety research sufficiently for us to be ready for those dangerous capabilities.
- Therefore, it makes sense to speed up the development of AIS-helpful capabilities now. Even if it means that the AIs will acquire dangerous capabilities sooner, it gives us more time to use AI Safety-helpful capabilities to prepare for dangerous capabilities.
I interpreted it as: not by "usual means", but rather something like suicide or murder.
I'm aware of high rates among LWers but it's still far from what we see among Zizians that we hear a lot about.
Ziz herself apparently believed that trans women were inherently more capable of accepting her "truth" for g/acc-ish reasons
interesting
Trans people are over-represented in the rationalist community, relative to the general population. It should be evident to anybody who hangs out with a lot of rationalists (at least in "big rationality hubs") or attends big rationality meetups. But I think I also saw some census data (either SSC/ACX or the LW census) confirming that.
To share something that might be non-public: Pasek was very much into letting timeless decision theory drive him even before he came into contact with Ziz. Of the people I've talked with in depth, Pasek might have been the one who took timeless decision theory the most seriously.
Can you share some examples of things Pasek did for TDT-ish reasons that most people either wouldn't do at all, or at least wouldn't do for TDT-ish reasons?
(I'm aware that this might be private stuff that you wouldn't like to share to any degree greater than what you've said already.)
Why is it the case that a majority of the Zizians we hear about in the news are trans/nb/queer? (If this is representative of Zizians in general, why is it true of Zizians in general?)
I believe the opposite - that the drives identified by Steve Omohundro as instrumental goals for any sufficiently advanced AI (like self-preservation, efficiency, resource acquisition) are really the only terminal goals that matter.
Even if this is ~technically true, if your [essence of self that you want to preserve] involves something like [effectively ensuring that X happens], this is at least behaviorally equivalent to having a terminal goal that is not instrumental, in the sense that instrumental convergence is not going to universally produce it in the limit.
The most naive guess is that they may be using some special type of CoT that is more effective at delivering the right answer than what you'd get by default. If their competitors saw it, they would try to replicate it (not even train on the CoT, just use it to guide their design of the CoT training procedure).
So... yeah, technically these things are connected, but sometimes the connection is real and strong, and sometimes it's just "both of them sound kinda sci-fi to me".
As far as I remember (I may be misremembering), to the extent that their memetic genealogy analysis holds water, there was a bit more common origin than I expected.
But ofc I agree that the entire thing is just an ideological carpet bombing via guilt by (often overblown) association.
What is the "industrial & philanthropic complex"? Doesn't it include... pretty much everything? I mean, most things are produced by industry, and most non-profits accept donations. Should we consider Bill Gates a part of TESCREAL? He had a software company, and he donated to charities.
I didn't mean all of industry or all of philanthropy. I was pointing at their perception that there is a cluster of futuristic privileged dudes, with factions constantly bickering, all having some views on AGI, industry-ing and philanthropy-ing.
(Possibly the first time in my life I feel like I'm overcharitably steelmanning Torres & Gebru on TESCREAL. Is this even worse than I'm giving them credit for?)
Like, what the fuck is "cosmism"? It doesn't even have a proper Wikipedia page
I understood Nora as saying that GSAI in itself is not a swiss cheese approach. This is different from saying that [the overall portfolio of AI derisking approaches, one of which is GSAI] is not a swiss cheese approach.
Here's a way to find out. (Perhaps unrealistic/intractable (IDK) but it is a way to find out.)
- Research the number of malefactors of Ziz type/magnitude per 1,000 active members, across various communities/movements.
- Identify positive outliers: communities whose malefactor-to-active-member ratio is well below average.
- Identify what accounts for this.
- If this is anything that can be replicated, replicate.
That's basically the idea behind "TESCREAL" (if we ignore the EA part): that all people who believe that one day we might have intelligent robots and fly to the stars and stuff like that must be a part of some sinister conspiracy.
Are you saying that (most) sci-fi authors who take the futures they write about seriously (i.e. "we totally might/will see that kind of stuff in decades/centuries") are TESCREAL-ists (either in the Torres & Gebru sense or in the popular imagination)?
My impression is that TESCREAL was more meant to point at some kind of ... industrial & philanthropic complex?
[Epistemic status: my model of the view that Jan/ACS/the GD paper subscribes to.]
I think this comment by Jan from 3 years ago (where he explained some of the difference in generative intuitions between him and Eliezer) may be relevant to the disagreement here. In particular:
Continuity
In my [Jan's] view, your [Eliezer's] ontology of thinking about the problem is fundamentally discrete. For example, you are imagining a sharp boundary between a class of systems "weak, won't kill you, but also won't help you with alignment" and "strong - would help you with alignment, but, unfortunately, will kill you by default". Discontinuities everywhere - “bad systems are just one sign flip away”, sudden jumps in capabilities, etc. Thinking in symbolic terms.
In my inside view, in reality, things are instead mostly continuous. Discontinuities sometimes emerge out of continuity, sure, but this is often noticeable. If you get some interpretability and oversight things right, you can slow down before hitting the abyss. Also the jumps are often not true "jumps" under closer inspection.
My understanding of Jan's position (and probably also the position of the GD paper) is that aligning AI (and other?) systems will be gradual, iterative, continuous; there's not going to be a point where a system is aligned so that we can basically delegate all the work to it and go home. Humans will have to remain in the loop, if not indefinitely, then at least for many decades.
In such a world, it is very plausible that we will get to a point where we've built powerful AIs that are (as far as we can tell) perfectly aligned with human preferences or whatever, but whose misalignment manifests only on longer timescales.
Another domain where this discrete/continuous difference in assumptions manifests itself is the shape of AI capabilities.
One position is:
If we get a single-single-aligned AGI, we will have it solve the GD-style misalignment problems for us. If it can't do that (even in the form of noticing/predicting the problem and saying "guys, stop pushing this further, at least until I/we figure out how to prevent this from happening"), then neither can we (kinda by definition of "AGI"), so thinking about this is probably pointless and we should think about problems that are more tractable.
The other position is:
What people officially aiming to create AGI will create is not necessarily going to be superhuman at all tasks. It's plausible that economic incentives will push towards "capability configurations" that are missing some relevant capabilities, e.g. relevant to researching gnarly problems that are hard to learn from the training data or even through current post-training methods. Understanding and mitigating the kind of risk the GD paper describes can be one such problem. (See also: Cyborg Periods.)
Another reason to expect this is that alignment and capabilities are not quite separate magisteria and that the alignment target can induce gaps in capabilities, relative to what one would expect from its power otherwise, as measured by, IDK, some equivalent of the g-factor. One example might be Steven's "Law of Conservation of Wisdom".
There are at least two types of people that the term "Zizian" might refer to:
- Someone who has read Sinceriously.fyi and is generally sympathetic to Ziz's philosophy.
- A member of a relatively tightly-coordinated anarchist conspiracy, that has (allegedly) planned and carried out a series of violent crimes.
Octavia is a Zizian in the first sense, but is not (to my knowledge) a Zizian in the second sense. In fact, she seems unaware or disbelieving that a network of Zizians of the second sense exists. She appears to think that there are only 'people who have benefited from reading Ziz's blog', and no coordinated criminal network to speak of.
I would be very surprised if there was no "inner Ziz crew", as inner circles around leaders / prominent figures in a community seem like a default thing that forms in movements/cultural groups.
But is it true that you don't think this inner circle is a coordinated group responsible for the murders?
She's not but to the extent that people put the AI labs in one bucket with LW/EA (TESCREAL or sth), the Annie Altman incident may cause us additional reputational damage.
(About half a year ago I had a thought along the lines of "gosh, it would be good for interp research if people doing interp were at least somewhat familiar with philosophy of mind ... not that it would necessarily teach them anything object-level useful for the kind of research they're doing but at least it would show them which chains of thought are blind alleys because they seem to be repeating some of the same mistakes as 20th century philosophers" (I don't remember what mistakes exactly but I think something to do with representations). Well, perhaps not just philosophy of mind.)
(Context: https://x.com/davidad/status/1885812088880148905 , i.e. some papers just got published that strongly question whether SAEs learn anything meaningful, just like the dead salmon study questioned the value of much of fMRI research.)
What exactly do you have in mind? Semi-regular check-ins with every member to see what they're up to, what their thinking processes are, what recently piqued their interest, what rabbit holes they've gone into?
Sam: https://www.lesswrong.com/posts/CvKnhXTu9BPcdKE4W/an-untrollable-mathematician-illustrated
@Ben Pace care to elaborate?
I'm not sure about the trust dilemma analysis.
It seems to me like it switches between two levels of abstraction.
Cooperate-Cooperate may be more desirable for both states' citizens but at the same time Defect-Cooperate may be more desirable for state A as a state qua rational actor.
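To make the two levels concrete, here's a toy sketch with made-up payoffs for state A (the numbers and the Cooperate/Defect labels are purely illustrative, not a claim about actual state preferences):

```python
# Toy payoffs to state A for (A's action, B's action) in a race-to-ASI "trust dilemma".
# All numbers are made up for illustration.

# Framing 1: aggregate welfare of A's citizens: mutual restraint is best.
citizen_welfare_A = {
    ("Cooperate", "Cooperate"): 10,
    ("Defect",    "Cooperate"):  8,
    ("Cooperate", "Defect"):     2,
    ("Defect",    "Defect"):     3,
}

# Framing 2: A as a state qua rational actor (e.g. caring about relative power):
# unilateral defection beats mutual cooperation.
state_actor_A = {
    ("Cooperate", "Cooperate"):  5,
    ("Defect",    "Cooperate"): 10,
    ("Cooperate", "Defect"):     0,
    ("Defect",    "Defect"):     3,
}

for label, payoffs in [("citizens", citizen_welfare_A), ("state actor", state_actor_A)]:
    best = max(payoffs, key=payoffs.get)
    print(f"A's best outcome under the {label} framing: {best}")
# citizens    -> ('Cooperate', 'Cooperate')
# state actor -> ('Defect', 'Cooperate')
```

Under the first matrix, mutual cooperation is the best cell; under the second, A qua actor prefers to defect while B cooperates. That's the switch in framing I have in mind.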
ASI might provide a strategic advantage of a kind which doesn't negatively impact the losers of the race, e.g. it increases GDP by x10 and locks competitors out of having an ASI.
It does negatively impact the losers, to the extent that they're interested not only in absolute wealth but also relative wealth (which I expect to be the case, although I know ~nothing about SotA modeling of states as rational actors or whatever).
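As a toy illustration of that point (the utility function and the numbers are made up, not any standard model of states as rational actors):

```python
# If a state's utility puts some weight on relative as well as absolute wealth,
# "A's GDP grows 10x, B's stays flat" still leaves B worse off.
def utility(own_gdp: float, other_gdp: float, relative_weight: float = 0.5) -> float:
    share = own_gdp / (own_gdp + other_gdp)  # share of combined wealth
    return (1 - relative_weight) * own_gdp + relative_weight * 100 * share

b_before = utility(10, 10)    # both states at GDP 10
b_after = utility(10, 100)    # A's ASI multiplies A's GDP by 10; B unchanged
print(b_before, b_after)      # 30.0 vs ~9.5: B's utility drops despite constant GDP
```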
Pages 22-23:
Arms control for AI in general is therefore unlikely to succeed. The military and civilian applications of general-purpose systems are nearly indistinguishable, and AI will likely see wide use across military and civilian society.
However, the opposite may be true of ASI development control: ASI development would likely be distinguishable from most civilian AI development, and, so long as it is not developed, unintegrated in a state’s economy.
It's not obvious to me either.
At least in the current paradigm, it seems plausible that a state project of (or, deliberately aimed at) developing ASI would yield a lot of intermediate non-ASI products that would then be dispersed into the economy or military. That's what we've been seeing until now.
Are there reasons to expect this not to continue?
One reason might be that an "ASI Manhattan Project" would want to keep their development secrets so as to minimize information leakage. But would they keep literally all useful intermediate products to themselves? Even if they reveal some X, civilians play with this X, and conclude that X is useless for the purpose of developing ASI, this might still be a valuable negative result that closes off some until-then-plausible ASI development paths.
This is one reason why I think the Manhattan Project is a poor model for a state ASI project. Intermediate results of the original Manhattan Project didn't trickle down into the economy while the project was still ongoing. I'm not claiming that people are unaware of those disanalogies, but I expect thinking in terms of an "ASI Manhattan Project" encourages overanchoring on it.
No mention of METR, Apollo, or even US AISI? (Maybe too early to pay much attention to this, e.g. maybe there'll be a full-o3 system card soon.)
Rushed bc of DeepSeek?
It seems to me that by saying this the authors wanted to communicate "this is not a place to discuss this". But I agree that the phrasing used may inaccurately (?) communicate that the authors are more uncertain/agnostic about this issue than they really are (or that they believe something like "both sides have comparably good arguments"), so I'd suggest replacing it with something like:
The probability of loss of control is beyond the scope of this report (for discussion, see: [sources]).
pro-DEI-requirements in clinical trials, so that drug companies have to have a certain number of various groups in their trial population
If this is motivated by accounting for biological diversity that might translate into different responses to a drug, then this feels like a very non-central case of DEI at best and I would expect it's not what a majority of people think of when they hear/think "DEI".
This ToV is a ToV for actors whose values are inverted relative to [shared values of humanity] on a particular dimension.
Ergo, if worldwide theocracy or totalitarianism is not a theory of victory, then human extinction is not either.