(Nitpick: I'd find the first paragraphs much easier to read if they didn't have any of the bolding.)
rename the "provable safety" area as "provable safety modulo assumptions" area and be very explicit about our assumptions.
Very much agree. I gave some feedback along those lines when the term was coined, and am sad it didn't catch on. But of course "provable safety modulo assumptions" isn't very short and catchy...
I do like the word "guarantee" as a substitute. We can talk of formal guarantees, but also of a store guaranteeing that an item you buy will meet a certain standard. So its connotations point nicely in the direction of proof but without, as it were, "proving too much" :)
Interesting thread to return to, 4 years later.
FYI: I skimmed the post quickly and didn't realize there was a Patreon!
If you wanted to change that, you might want to put it at the very end of the post, on a new line, saying something like: "If you'd like to fund my work directly, you can do so via Patreon [here](link)."
Someone posted these quotes in a Slack I'm in. It's what Ellsberg said to Kissinger:
“Henry, there’s something I would like to tell you, for what it’s worth, something I wish I had been told years ago. You’ve been a consultant for a long time, and you’ve dealt a great deal with top secret information. But you’re about to receive a whole slew of special clearances, maybe fifteen or twenty of them, that are higher than top secret.
“I’ve had a number of these myself, and I’ve known other people who have just acquired them, and I have a pretty good sense of what the effects of receiving these clearances are on a person who didn’t previously know they even existed. And the effects of reading the information that they will make available to you.
[...]
“In the meantime it will have become very hard for you to learn from anybody who doesn’t have these clearances. Because you’ll be thinking as you listen to them: ‘What would this man be telling me if he knew what I know? Would he be giving me the same advice, or would it totally change his predictions and recommendations?’ And that mental exercise is so torturous that after a while you give it up and just stop listening. I’ve seen this with my superiors, my colleagues….and with myself.
“You will deal with a person who doesn’t have those clearances only from the point of view of what you want him to believe and what impression you want him to go away with, since you’ll have to lie carefully to him about what you know. In effect, you will have to manipulate him. You’ll give up trying to assess what he has to say. The danger is, you’ll become something like a moron. You’ll become incapable of learning from most people in the world, no matter how much experience they may have in their particular areas that may be much greater than yours.”
(link)
tbf I never realized "sic" was mostly meant to point out errors, specifically. I thought it was used to mean "this might sound extreme, but I am in fact quoting literally".
I mean that in both cases he used literally those words.
It's not epistemically poor to say these things if they're actually true.
Invalid.
Compare:
A: "So I had some questions about your finances, it seems your trading desk and exchange operate sort of closely together? There were some things that confused me..."
B: "our team is 20 insanely smart engineers"
A: "right, but i had a concern that i thought perhaps ---"
B: "if you join us and succeed you'll be a multi millionaire"
A: "...okay, but what if there's a sudden downturn ---"
B: "bull market is inevitable right now"
Maybe not false. But epistemically poor form.
(crossposted to the EA Forum)
(😭 there has to be a better way of doing this, lol)
(crossposted to the EA Forum)
I agree with many of Leopold's empirical claims, timelines, and analysis. I'm acting on them myself in my planning, as something like a mainline scenario.
Nonetheless, the piece exhibited some patterns that gave me a pretty strong allergic reaction. It made or implied claims like:
- a small circle of the smartest people believe this
- i will give you a view into this small elite group, the only ones who are situationally aware
- the inner circle longed TSMC way before you
- if you believe me, you can get 100x richer; there's still alpha, you can still be early
- This geopolitical outcome is "inevitable" (sic!)
- in the future the coolest and most elite group will work on The Project. "see you in the desert" (sic)
- Etc.
Combined with a lot of praising retweets on launch day that were clearly coordinated behind the scenes, it gives me the feeling that the piece was deliberately written to meme a narrative into existence via self-fulfilling prophecy, rather than to infer a forecast via analysis.
As a sidenote, this felt to me like an indication of how different the AI safety adjacent community is now to when I joined it about a decade ago. In the early days of this space, I expect a piece like this would have been something like "epistemically cancelled": fairly strongly decried as violating important norms around reasoning and cooperation. I actually expect that had someone written this publicly in 2016, they would've plausibly been uninvited as a speaker to any EAGs in 2017.
I don't particularly want to debate whether these epistemic boundaries were correct; I'd just like to claim that, empirically, I think they de facto would have been enforced. Though if others who have been around have a different impression of how this would've played out, I'd be curious to hear.
[censored_meme.png]
I like Review Bot and think it's good
(Sidenote: it seems Sam was kind of explicitly asking to be pressured, so your comment seems legit :)
But I also think that, had Sam not done so, I would still have really appreciated him showing up and responding to Oli's top-level post. I think it should be fine for folks from companies to show up and engage with the topic at hand (NDAs) without also having to do a general AMA about all kinds of other aspects of their strategy and policies. If Zach's questions do get heavily upvoted, though, it might suggest there's demand for some kind of Anthropic AMA event.)
Poor Review Bot, why do you get so downvoted? :(
I was around a few years ago when there were already debates about whether 80k should recommend OpenAI jobs. And that was before any of the fishy stuff leaked out, back when they were stacking up cool governance commitments like becoming a capped-profit company and having a merge-and-assist clause.
And, well, it sure seems like a mistake in hindsight how much advertising they got.
30 kW
typo
Not sure how to interpret the "agree" votes on this comment. If someone is able to share that they agree with the core claim because of object-level evidence, I am interested. (Rather than agreeing with the claim that this state of affairs is "quite sad".)
Does anyone from Anthropic want to explicitly deny that they are under an agreement like this?
(I know the post talks about some and not necessarily all employees, but am still interested).
Note that, through the grapevine, I've heard that serving some inference requests might lose OpenAI money, because they subsidise it. Not sure how this relates to boycott incentives.
That metaphor suddenly slid from chess into poker.
If AI ends up intelligent enough, and with enough manufacturing capability, to threaten nuclear deterrence, I'd expect it to also deduce any conclusions I would.
So it seems mostly a question of what the world would do with those conclusions earlier, rather than not at all.
A key exception is if later AGI would be blocked on certain kinds of manufacturing to create its destabilizing tech, and drawing attention to that now would start the serially blocking work earlier.
I have thoughts on the impact of AI on nuclear deterrence, and on the claims made about it in the post.
But I'm uncertain whether it's wise to discuss such things publicly.
Curious if folks have takes on that. (The meta question)
y'know, come to think of it... Training and inference differ massively in how much compute they consume. So after you've trained a massive system, you have a lot of compute free to do inference (modulo needing to use it to generate revenue, run your apps, etc.). This means that for large-scale, critical applications, it might in fact be feasible to tolerate a big hit (multiple OOMs) to the compute cost of your inference, if that's all that's required to get the zero-knowledge benefits, and if those are crucial.
"arguments" is perhaps a bit generous of a term...
(also, lol at this being voted into negative! Giving karma as encouragement seems like a great thing. It's the whole point of it. It's even a venerable LW tradition, and was how people incentivised participation in the annual community surveys in the elden days)
(Also the arguments of this comment do not apply to Community Notes.)
the amount of people who could write sensible arguments is small
Disagree. The quality of arguments that need debunking is often way below the average LWer's intellectual pay grade. And there are actually quite a lot of us.
Cross-posting sure seems cheap. Though I think replying to and engaging with existing discourse is easier than building a following for one's top-level posts from scratch.
Yeah, my hypothesis is something like this might work.
(Though I can totally see how it wouldn't work, and I wouldn't have thought it would a few years ago, so my intuition might just be mistaken.)
I don't think the numbers really check out on your claim. Only a small proportion of people reading this are alignment researchers. And of the remaining folks, many are probably on Twitter anyway, or otherwise have some similarly slack part of their daily schedule filled with random, low-opportunity-cost stuff.
Historically there sadly haven't been scalable ways for the average LW lurker to contribute to safety progress; now there might be a little one.
never thought I'd die fighting side by side with an elf...
If anyone signed up to Community Notes because of this post, feel free to comment below and I'll give you upvote karma :) (not agreement karma)
Yes, I've felt some silent majority patterns.
Collective action problem idea: we could run an experiment. 30 ppl opt in to writing 10 comments and liking 10 comments they think raise the sanity waterline, conditional on a total of 29 other people opting in too (a "kickstarter"). Then we see if it seemed like it made a difference.
I'd join. If anyone else is down for that, feel free to use this comment as a Schelling point and reply with your interest below.
(I'm not sure what the right number of folks is, but if we like the result we could just do another round.)
there could still be founder effects in the discourse, or particularly influential people could be engaged in the twitter discourse.
I think that's the case. Mostly the latter, some of the former.
Without commenting on the proposal itself, I think the term "eval test set" is clearer for this purpose than "closed source eval".
I'm writing a quick and dirty post because the alternative is that I wait for months and maybe not write it after all.
This is the way.
I think this is an application of a more general, very powerful principle of mechanism design: when cognitive labor is abundant, near omni-present surveillance becomes feasible.
For domestic life, this is terrifying.
But for some high stakes, arms race-style scenarios, it might have applications.
Beyond what you mentioned, I'm particularly interested in this being a game-changer for bilateral negotiation. Two parties make an agreement, consent to being monitored by an AI auditor, and verify that the auditor's design will communicate with the other party if and only if there has been a rule breach. (Beyond the rule breach, it won't be able to leak any other information. And, being an AI, it can be designed to have its memory erased, can never be recruited as a spy, etc.) However, one big challenge in building this is how two adversarial parties could ever gain enough confidence to allow such a hardware/software package into a secure facility, especially if its whole point is to have a communication channel to their adversary.
ah that makes sense thanks
Sidenote: I'm a bit confused by the name. The all caps makes it seem like an acronym. But it seems to not be?
Sure that works! Maybe use a term like "importantly misguided" instead of "correct"? (Seems easier for me to evaluate)
To anyone reading this who is considering working in alignment --
Following the recent revelations, I now believe OpenAI should be regarded as a bad faith actor. If you go work at OpenAI, I believe your work will be net negative, and will most likely be used to "safetywash" or "governance-wash" Sam Altman's mad dash to AGI. It now appears Sam Altman is at least as sketchy as SBF. Attempts to build "social capital" or "affect the culture from the inside" will not work under current leadership (indeed, what we're currently seeing are the failed results of 5+ years of such attempts). I would very strongly encourage anyone looking to contribute to stay away from OpenAI.
I recognize this is a statement, and not an argument. I don't have the time to write out the full argument. But I'm leaving this comment here, such that others can signal agreement with it.
That's more about me being interested in key global infrastructure. I've been curious about them for many years, ever since realising how significant what they're building is compared to how few folks know about them. I don't know that they have any particular generative-AI-related projects in the short term.
Anyone know folks working on semiconductors in Taiwan and Abu Dhabi, or on fiber at Tata Industries in Mumbai?
I'm currently travelling around the world and talking to folks about various kinds of AI infrastructure, and looking for recommendations of folks to meet!
If so, feel free to DM me!
(If you don't know me, I'm a dev here on LessWrong and was also part of founding Lightcone Infrastructure.)
Noting that a nicer name that's just waiting to be had, in this context, is "Future of the Lightcone Institute" :)
Two notes:
- I think the title is a somewhat obscure pun referencing the old saying that Stanford was the "Harvard of the West". If one is not familiar with that saying, I guess some of the nuance is lost in the choice of term. (I personally had never heard that saying before recently, and I'm not even quite sure I'm referencing the right "X of the West" pun)
- habryka did have a call with Nick Bostrom a few weeks back to discuss his idea for an "FHI of the West", and I'm quite confident he referred to it with that phrase on the call, too. As far as I'm aware, Nick didn't react to it with more than a bit of humor.
See this: https://www.lesswrong.com/posts/CTBta9i8sav7tjC2r/how-to-hopefully-ethically-make-money-off-of-agi
Can you CC me too?
I work from the same office as John, and dozens of LessWrong readers also happen to work there on a regular basis. We could probably set up an experiment here with many willing volunteers, and I'm interested in helping to make it happen (if it continues to seem promising after thinking more about it).
[Mod note: I edited out your email from the comment, to save you from getting spam email and similar. If you really want it there, feel free to add it back! :) ]
Mod here: most of the team were away over the weekend so we just didn't get around to processing this for personal vs frontpage yet. (All posts start as personal until approved to frontpage.) About to make a decision in this morning's moderation review session, as we do for all other new posts.
Jake himself has participated in both Zika and Shigella challenge trials.
Your civilisation thanks you 🫡
Cool idea and congrats on shipping! Installed it now and am trying it. One piece of user feedback: I found having to wait for replies a bit frictiony. Maybe you could stream responses in chunks? (I did this for a GPT-to-Slack app once. You just can't do it letter by letter, because you'll get rate limited.)
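Here's a minimal sketch of the chunked-streaming idea in Python. It assumes the model client gives you the reply as an iterable of token strings, and that each yielded chunk becomes one edit of the same chat message. The function name `chunk_stream` and the 1-second default interval are hypothetical placeholders, not any particular chat API's real rate limit. The function buffers tokens and emits the text so far only when at least `min_interval` seconds have passed since the last emit. That caps the number of edit calls no matter how fast tokens arrive.

```python
import time
from typing import Callable, Iterable, Iterator


def chunk_stream(
    tokens: Iterable[str],
    min_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield the full reply-so-far at most once per `min_interval` seconds,
    plus one final flush when the stream ends.

    Each yielded string is meant to be sent as one edit of the same chat
    message (hypothetical usage), so a fast token stream doesn't turn into
    one API call per token.
    """
    text = ""
    pending = False  # True if `text` has grown since the last yield
    last_flush = clock()
    for token in tokens:
        text += token
        pending = True
        now = clock()
        if now - last_flush >= min_interval:
            yield text
            pending = False
            last_flush = now
    if pending:
        # Send whatever arrived after the last timed flush.
        yield text


if __name__ == "__main__":
    # Fake clock so the example is deterministic: one reading to set
    # last_flush, then one reading per token.
    ticks = iter([0.0, 0.5, 1.2, 1.5, 2.0])
    chunks = chunk_stream(["Hel", "lo", " wor", "ld"], 1.0, lambda: next(ticks))
    print(list(chunks))  # ['Hello', 'Hello world']
```

Yielding the cumulative text rather than deltas fits the "edit one message in place" pattern. If your chat API only supports appending new messages, you'd yield just the new tokens instead.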