...in the last 24 hours? Or, like, a while ago in a previous context?
Well, an alternate framing is "does the big stick turn out to have the effect you want?"
I guess the actual resolution here will eventually come from seeing the final headlines and that, like, they're actually reasonable.
I'd be interested in a few more details/gears. (Also, are you primarily replying about the immediate parent, i.e. domestication of dissent, or also about the previous one)
Two different angles of curiosity I have are:
- what sort of things might you look out for, in particular, to notice if this was happening to you at OpenAI or similar?
- something like... what's your estimate of the effect size here? Do you have personal experience feeling captured by this dynamic? If so, what was it like? Or did you observe other people seeming to be captured, and what was your impression (perhaps in vague terms) of the diff that the dynamic was producing?
My take atm is "seems right that this shouldn't be a permanent norm, there are definitely costs of disclaimer-ratcheting that are pretty bad. I think it might still be the right thing to do of your own accord in some cases, which is, like, supererogatory."
I think there's maybe a weird thing with this post, where, it's trying to be the timeless, abstract version of itself. It's certainly easier to write the timeless abstract version than the "digging into specific examples and calling people out" version. But, I think the digging into specific examples is actually kind of important here – it's easy to come away with vague takeaways that disagree, where everyone nods along but then mostly thinks it's Those Other Guys who are being power seeking.
Given that it's probably 10-50x harder to write the Post With Specific Examples, I think a pretty okay outcome is actually "ship the vague post, and let discussion in the comments get into the inside-baseball details." And, then, it'd be remiss not to bring up the post-author's role in the ecosystem as an example to dig into.
They can believe in catastrophic but non-existential risks. (Like, AI causes something like the CrowdStrike outage periodically if you're not trying to prevent that.)
I think people mostly don't believe in extinction risk, so the incentive isn't nearly as real/immediate.
Part of the whole point of CEV is to discover at least some things that current humanity is confused about but would want if fully informed, with time to think. It'd be surprising to me if CEV-existing-humanity didn't turn out to want some things that many current humans are opposed to.
So, I definitely do think I've got some confirmation bias here – I know because the first thing I thought when I saw it was "man, this sure looks like the thing Eliezer was complaining about," and it was a while later, thinking it through, that I was like "this does seem like it should make you really doomy about any agent-foundations-y plans, or other attempts to sidestep modern ML and cut towards 'getting the hard problem right on the first try.'"
I did (later) think about that a bunch and integrate it into the post.
I don't know whether I think it's reasonable to say "it's additionally confirmation-bias-indicative that the post doesn't talk about general doom arguments." As Eli says, the post is mostly observing a phenomenon that seems more about planmaking than general reasoning.
(fwiw my own p(doom) is more like 'I dunno man, somewhere between 10% and 90%, and I'd need to see a lot of things going concretely right before my emotional center of mass shifted below 50%')
Yeah. I tried to get at this in the Takeaways but I like your more thorough write up here.
In the world where people had exactly $30 to spend every hour and they’d either spend it or it disappeared, would you object to calling that spending money? I feel like many of my spending intuitions would still basically transfer to that world.
Curious for details.
People varied in how much Baba-Is-You experience they had. Some of them were completely new, and did complete the first couple levels (which are pretty tutorial-like) using the same methodology I outline here, before getting to a level that was a notable challenge.
They actually did complete the first couple levels successfully, which I forgot when writing this post. This does weaken the rhetorical force, but also, the first couple levels are designed more to teach the mechanics and are significantly easier. I'll update the post to clarify this.
Some of them had played before, and were starting a new level from around where they left off.
...fwiw I think it's not grossly inaccurate.
I think MIRI did put a lot of effort into being cooperative about the situation (i.e. Don't leave your fingerprints on the future, doing the 'minimal' pivotal act that would end the acute risk period, and when thinking about longterm godlike AI, trying to figure out fair CEV sorts of things).
But, I think it was also pretty clear that "have a controllable, safe AI that's just powerful enough to take some action that prevents anyone else from building a more powerful and more dangerous AI" was not in the Overton window. I don't know what Eliezer's actual plan was, since he disclaimed "yes I know melt all the GPUs won't work", but, like, "melt all the GPUs" implies a level of power over the world that is really extreme by historical standards, even if you're trying to do the minimal thing with that power.
Also, the second section makes an argument in favor of backchaining. But that seems to contradict the first section, in which people tried to backchain and it went badly.
This didn't come across in the post, but – I think people in the experiment were mostly doing things closer to (simulated) forward chaining, and then getting stuck, and then generating the questionable assumptions. (which is also what I tended to do when I first started this experiment).
An interesting thing I learned is that "look at the board and think without fiddling around" is actually a useful skill to have even when I'm doing the more openended "solve it however seems best." It's easier to notice now when I'm fiddling around pointlessly instead of actually doing useful cognitive work.
I had a second half of this essay that felt like it was taking too long to pull together and I wasn't quite sure who I was arguing with. I decided I'd probably try to make it a second post. I generally agree it's not that obvious what lessons to take.
The beginning of the second-half/next-post was something like:
There's an age-old debate about AI existential safety, which I might summarize as the viewpoints:
1. "We only get one critical try, and most alignment research dodges the hard part of the problem, with wildly optimistic assumptions."
vs
2. "It is basically impossible to make progress on remote, complex problems on your first try. So, we need to somehow factor the problem into something we can make empirical progress on."
I started out mostly thinking through lens #1. I've updated that, actually, both views may be "hair on fire" levels of important. I have some frustrations both with some doomer-y people who seem resistant to incorporating lens #2, and with people who seem (in practice) to be satisfied with "well, iterative empiricism seems tractable, and we don't super need to incorporate frame #1."
I am interested in both:
- trying to build "engineering feedback loops" that represent the final problem as accurately as we can, and then iterating on both "solving representative problems against our current best engineered benchmarks" and "continuing to build better benchmarks." (Automating Auditing and Model Organisms of Misalignment seem like attempts at this)
- trying to develop training regimens that seem like they should help people plan better in Low-Feedback-Domains, which includes theoretical work, empirical research that's trying to keep its eye on the longterm ball better, and the invention of benchmarks a la the previous bullet.
Games I was particularly thinking of were They Are Billions, Slay The Spire. I guess also Factorio although the shape of that is a bit different.
(to be clear, these are fictional examples that don't necessarily generalize, but, when I look at the AI situation I think it-in-particular has an 'exponential difficulty' shape)
I also just realized the actual reason I do this is not because it works better, but because I felt too awkward merely turning my back.
I take it a step farther and just start walking down the sidewalk away from the road until they pass, and then turn around.
though, curious to hear an instance of it actually playing out
Woo, great. :)
Whether this works out or not for you, I quite appreciate you laying out the details. Hope it's useful for you!
Curious to hear what sort of things you end up predicting about, if you're up for sharing. :)
Maybe I'm unusual and few other readers have this problem. I suspect that's not the case, but given that I don't know, I'll just say that I find this writing style to be a little too Dark Artsy and symmetrical for my comfort.
fyi I also felt this. (Don't have much more to add. I just wanted to note it).
Quick mod note – this post seems like a pretty earnest, well intentioned version of "address a dialogue to someone who hasn't opted into it". But, it's the sort of thing I'd expect to often be kind of annoying. I haven't chatted with other mods yet about whether we want to allow this sort of thing longterm, but, flagging that we're tracking it as an edge case to think about.
I'm curating this post, both for the post itself, as well as various followup discussion in the post disclaimer and comments that I found valuable.
I think the question of "how do we quickly/efficiently train epistemic skills?" is a very important one. I'm interested in the holy grail of training full-generality epistemic skills, and I'm interested in training more specific clusters of skills (such as ones relevant for trading). I agree with kave's comment that this post equivocates between "epistemics" and "trading", but I'm generally excited for LessWrong folk to develop the art of "designing games that efficiently teach nuanced skills that can transfer".
I like rossry's attitude of "the main feedbackloop of the game should help players become unconfused".
Yeah, I read those lines, and also "Want to use your engineering skills to push the frontiers of what state-of-the-art language models can accomplish", and remain skeptical. I think OpenAI tends to equivocate in how they use the word "alignment" (or: they use it consistently, but not in a way that I consider obviously good. Like, I think the people working on RLHF a few years ago probably contributed to ChatGPT being released earlier, which I think was bad*)
*I like the part where the world feels like it's actually starting to respond to AI now, but, I think that would have happened later, with more serial-time for various other research to solidify.
(I think this is a broader difference in guesses about what research/approaches are good, which I'm not actually very confident about, esp. compared to habryka, but, is where I'm currently coming from)
I was thinking of things like the Alignment Research Science role. If they talked up "this is a superalignment role", I'd have an estimate higher than 55%.
We are seeking Researchers to help design and implement experiments for alignment research. Responsibilities may include:
- Writing performant and clean code for ML training
- Independently running and analyzing ML experiments to diagnose problems and understand which changes are real improvements
- Writing clean non-ML code, for example when building interfaces to let workers interact with our models or pipelines for managing human data
- Collaborating closely with a small team to balance the need for flexibility and iteration speed in research with the need for stability and reliability in a complex long-lived project
- Understanding our high-level research roadmap to help plan and prioritize future experiments
- Designing novel approaches for using LLMs in alignment research
You might thrive in this role if you:
- Are excited about OpenAI’s mission of building safe, universally beneficial AGI and are aligned with OpenAI’s charter
- Want to use your engineering skills to push the frontiers of what state-of-the-art language models can accomplish
- Possess a strong curiosity about aligning and understanding ML models, and are motivated to use your career to address this challenge
- Enjoy fast-paced, collaborative, and cutting-edge research environments
- Have experience implementing ML algorithms (e.g., PyTorch)
- Can develop data visualization or data collection interfaces (e.g., JavaScript, Python)
- Want to ensure that powerful AI systems stay under human control
I tend to dismiss scenarios where it's obvious, because I expect the demonstration of strong misaligned systems to inspire a strong multi-government response.
I think covid was clear-cut, and it did inspire some kind of government response, but not a particularly competent one.
I think I have some tendency to apologize the way this post warns about, and have heard the "say thank you" advice and considered it in the past. But, I'm curious to hear from anyone who's been on the receiving end of the "thank you" apology substitutes and how it feels to them.
I'm not Elizabeth and probably wouldn't have worded my thoughts quite the same, but my own position regarding your first bullet point is:
"When I see OpenAI list a 'safety' role, I'm like 55% confident that it has much to do with existential safety, and maybe 25% that it produces more existential safety than existential harm."
This was really nice to read, thank you!
Re:
I see no obvious way on the site to send Lightcone money, or to otherwise contribute to this happening again, and I would like to. What do I do?
For now, the best place is https://www.lesswrong.com/donate. (We used to link this from the sidebar but people didn't use it often enough to really justify the screen real-estate)
Minor note:
His name is Leo. As best I could tell from asking others, he’s not attached to the site, he hails from one of the adjacent properties and just likes the people. I was going to nominate him as the LessOnline mascot, but must admit that Agendra might be more appropriate.
Leo's owner is one of the maintenance-folk who help keep the venue in good repair. :)
tbh I typically find those bots annoying too.
Are there currently board members who are meaningfully separated in terms of incentive-alignment with Daniela or Dario? (I don't know that it's possible for you to answer in a way that'd really resolve my concerns, given what sort of information is possible to share. But, "is there an actual way to criticize Dario and/or Daniela in a way that will realistically be given a fair hearing by someone who, if appropriate, could take some kind of action" is a crux of mine)
I agree we can't get exact numbers here but it'd be surprising to me if modern material science wasn't capable of generating some upper/lower bounds.
As a caveat, I would suggest that if the AI is "nice" enough to spare Earth, it's likely to be nice enough to beam some reconstituted sunlight over to us.
Yeah seems right. I still find myself curious, as well as strategically interested in "man, I just really don't know how the future is likely to play out, so getting more clarity on physical limits of this sort of system feels like it helps constrain possible future scenarios." That might just be cope though.
Nod, but, this doesn't answer the actual question.
A thing I'm still not sure about reading that is "what percent of the light is getting through?". Like, how dense are the reflector modules?
Later in the paper it says "The Dyson sphere is assumed to have an efficiency of one third", which could mean "realistically you only capture about 1/3rd of the energy in the first place" or "the capturing/redirecting process loses 2/3rds of the energy."
Thinking a bit more, scenarios that seem at least kinda plausible:
- "misuse" where someone is just actively trying to use AI to commit genocide or similar. Or, we get into a humans+AI vs humans+AI war.
- the AI economy takes off, it has lots of extreme environmental impact, and it's sort of aligned but we're not very good at regulating it fast enough, but, we get it under control after a billion deaths.
(To be clear, I think there is a substantial chance of at least 1 billion people dying and that AI takeover is very bad from a longtermist perspective.)
Is there a writeup somewhere of how we're likely to get "around a billion people die" that isn't extinction, or close to it? Something about this phrasing feels weird/suspicious to me.
Like I have a few different stories for everyone dying (some sooner, or later).
I have some stories where like "almost 8 billion people" die and the AI scans the remainder.
I have some stories where the AI doesn't really succeed and maybe kills millions of people, in what is more like "a major industrial accident" than "a powerful superintelligence enacting its goals".
Technically "substantial chance of at least 1 billion people dying" can imply the middle option there, but it sounds like you mean the central example to be closer to a billion than 7.9 billion or whatever. That feels like a narrow target and I don't really know what you have in mind.
I think the idea of an island ritual is really cool. (I also think it's a fine thing to try one year even if it turns out not to make sense as a permanent thing)
One of the things that feels hesitationy/cruxy to me is that I think it requires a larger team than one might expect (even after an intuitive scale-up from the normal Summer Solstices).
I think an issue with Summer Solstice is that it requires a lot of basic logistical infrastructure to be "basically functional" / "Maslow Hierarchy level 1." And this ends up consuming most of the energy to run it. I think in recent years there hasn't been much spare capacity for planning ritual aspects.
A thing I expect with Angel Island is that you'll need even more infrastructure to be "basically functional" (handling ferry rides, making sure to have more food than usual, etc). And then developing ritual that really capitalizes on the island will be an additional large batch of effort. So I expect it to need more like 3x the number of organizers, rather than (what I might naively guess) 2x.
Curated.
The overall point here seems true and important to me.
I think I either disagree, or am agnostic about, some of the specific examples given in the Myth vs Reality section. I don't think they're loadbearing for the overall point. I may try to write those up in more detail later.
Nod.
I can't remember if I said this already, but the way I'm looking at this is "take stock of various clusters of strategy heuristics or frameworks, and think about which-if-any apply to stuff that I care about." So, less looking for universal principles, more "try on different strategic lenses and see what shakes out."
I appreciate the writeup/followup!
I maybe want to flag this is "one particular leg/trunk/ear of 'the elephant that is metastrategy'". My preferred way to intro people to it is with a full week of workshop classes that highlight different skills that interrelate with each other.
I think "have at least two plans that are pretty fleshed out and feel 'real' to you" is a major cornerstone of my personal practice, but I think the core element is "dedicate any significant fraction of time for thinking about 'how to do strategy', at all." See: "Metastrategic Brainstorming", a core building-block skill
It so happens I hadn't seen your other posts, although I think there is something that this post was aiming at, that yours weren't quite pointed at, which is laying out "this is a crux for timelines, these are the subcomponents of the crux." (But, I haven't read your posts in detail yet and thought about what else they might be good at that this post wasn't aiming for)
Curated.
This is a fairly straightforward point, but one I haven't seen written up before and I've personally been wondering a bunch about. I appreciated this post both for laying out the considerations pretty thoroughly, including a bunch of related reading, and for laying out some concrete predictions at the end.
Sure, but those areas aren’t the ones that have me interested in gaming metaphors to figure out how to solve my problems.
‘Found a startup’ is a bit more of an established process that ‘counts’ for my purposes here. There’s a lot of reading and learning I can do before getting started (compared to ‘build a functioning alignment community’). But even there I think it’s less like playing a game I’ve already studied such that the early game is memorized, and more like sitting down to play a multiplayer game for the first time, which shares structure with other games but still involves a lot of learning on the fly. (I bet this is still reasonably true on your second or third startup, though maybe not if you literally are running Y Combinator.) Though, interested in hearing from people who have run multiple to see if they think that tracks.
The "early game is what you have memorized" makes sense for literal games, but doesn't actually help much with my current use-case, which is "and this translates into real life." (when I'm thinking about these in game-form, I'm generally thinking about one-shot gaming, where you're trying hard to win your first time playing a game, such that figuring out the early game is part of the challenge)