LessWrong 2.0 Reader

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (49)

[link] My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi · 2024-09-08T14:30:40.456Z · comments (18)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Anthropic's Certificate of Incorporation
Zach Stein-Perlman · 2024-06-12T13:00:30.806Z · comments (4)

The LessWrong 2022 Review
habryka (habryka4) · 2023-12-05T04:00:00.000Z · comments (43)

Why I funded PIBBSS
Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · comments (21)

Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
Thomas Kwa (thomas-kwa) · 2024-11-06T23:01:48.992Z · comments (35)

[link] Anthropic release Claude 3, claims >GPT-4 Performance
LawrenceC (LawChan) · 2024-03-04T18:23:54.065Z · comments (41)

Talent Needs of Technical AI Safety Teams
yams (william-brewer) · 2024-05-24T00:36:40.486Z · comments (64)

Current AIs Provide Nearly No Data Relevant to AGI Alignment
Thane Ruthenis · 2023-12-15T20:16:09.723Z · comments (155)

Mapping the semantic void: Strange goings-on in GPT embedding spaces
mwatkins · 2023-12-14T13:10:22.691Z · comments (31)

[link] Gender Exploration
sapphire (deluks917) · 2024-01-14T18:57:32.893Z · comments (25)

[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (19)

The Pareto Best and the Curse of Doom
Screwtape · 2024-02-21T23:10:01.359Z · comments (21)

Rationality Research Report: Towards 10x OODA Looping?
Raemon · 2024-02-24T21:06:38.703Z · comments (21)

[link] Please support this blog (with money)
Elizabeth (pktechgirl) · 2024-08-17T15:30:05.641Z · comments (3)

Four visions of Transformative AI success
Steven Byrnes (steve2152) · 2024-01-17T20:45:46.976Z · comments (22)

The Pearly Gates
lsusr · 2024-05-30T04:01:14.198Z · comments (6)

Social status part 1/2: negotiations over object-level preferences
Steven Byrnes (steve2152) · 2024-03-05T16:29:07.143Z · comments (15)

Simple versus Short: Higher-order degeneracy and error-correction
Daniel Murfet (dmurfet) · 2024-03-11T07:52:46.307Z · comments (6)

The Parable Of The Fallen Pendulum - Part 1
johnswentworth · 2024-03-01T00:25:00.111Z · comments (32)

[link] Practically A Book Review: Appendix to "Nonlinear's Evidence: Debunking False and Misleading Claims" (ThingOfThings)
tailcalled · 2024-01-03T17:07:13.990Z · comments (25)

DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (23)

The case for more ambitious language model evals
Jozdien · 2024-01-30T00:01:13.876Z · comments (30)

Introduction to French AI Policy
Lucie Philippon (lucie-philippon) · 2024-07-04T03:39:45.273Z · comments (12)

You should go to ML conferences
Jan_Kulveit · 2024-07-24T11:47:52.214Z · comments (13)

Ten arguments that AI is an existential risk
KatjaGrace · 2024-08-13T17:00:03.397Z · comments (41)

A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)

Please stop using mediocre AI art in your posts
Raemon · 2024-08-25T00:13:52.890Z · comments (24)

Scissors Statements for President?
AnnaSalamon · 2024-11-06T10:38:21.230Z · comments (31)

What I Would Do If I Were Working On AI Governance
johnswentworth · 2023-12-08T06:43:42.565Z · comments (32)

' petertodd'’s last stand: The final days of open GPT-3 research
mwatkins · 2024-01-22T18:47:00.710Z · comments (16)

[link] A primer on the current state of longevity research
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-22T17:14:57.990Z · comments (6)

Being nicer than Clippy
Joe Carlsmith (joekc) · 2024-01-16T19:44:23.893Z · comments (32)

"AI Alignment" is a Dangerously Overloaded Term
Roko · 2023-12-15T14:34:29.850Z · comments (100)

The Leopold Model: Analysis and Reactions
Zvi · 2024-06-14T15:10:03.480Z · comments (19)

Clarifying METR's Auditing Role
Beth Barnes (beth-barnes) · 2024-05-30T18:41:56.029Z · comments (1)

Attitudes about Applied Rationality
Camille Berger (Camille Berger) · 2024-02-03T14:42:22.770Z · comments (18)

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:44:24.270Z · comments (8)

OthelloGPT learned a bag of heuristics
jylin04 · 2024-07-02T09:12:56.377Z · comments (10)

[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (10)

2023 in AI predictions
jessicata (jessica.liu.taylor) · 2024-01-01T05:23:42.514Z · comments (35)

[link] Most smart and skilled people are outside of the EA/rationalist community: an analysis
titotal (lombertini) · 2024-07-12T12:13:56.215Z · comments (36)

[question] How do you feel about LessWrong these days? [Open feedback thread]
jacobjacob · 2023-12-05T20:54:42.317Z · answers+comments (281)

[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (24)

Danger, AI Scientist, Danger
Zvi · 2024-08-15T22:40:06.715Z · comments (9)

Skills I'd like my collaborators to have
Raemon · 2024-02-09T08:20:37.686Z · comments (9)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (12)

Recent comments

rictic on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

Donated. Lighthaven is incredible.

joseph-miller on You should consider applying to PhDs (soon!)

I started working on PhD applications about 12 days ago. I expect to have fairly polished applications for the first deadline on December 1, despite not working on this full time. So I think it's quite possible to do applications for the December 15 deadlines. You would need to contact your referees (and potential supervisors for UK universities) in the next couple of days.

meedstrom on Kenshō

Disclaimer: I am not sure I've done what you think of as Looking, but all your metaphors make sense to me.

If I "get" the general thing, then would you agree that aside from Fake Frameworks, experience with Focusing must help? Especially for people who haven't yet meditated much or find the idea of a "non-verbal thought" elusive.

I'm thinking of Focusing as targeting something that can also happen in meditation, but that some beginner meditators could take a long time to experience directly. It's the way that your mind can suddenly produce a new awareness or new knowledge, without any conscious chain-of-thought or verbal reasoning behind it.

Focusing hammers that home again and again: yes, there's a way, and it's right there. It gave me a lot of confidence to try the mental move of "step back and wait until I See Something" in a variety of contexts.

PS: Thank you for pointing out the purpose of koans. I had "dissolved" them, but now I see that perhaps I can try to answer them anyway!

logan-zoellner on China Hawks are Manufacturing an AI Arms Race

Because the USA has always looked at the cost of using that 'robust military superiority', which would entail the destruction of Seoul and possibly millions of deaths and the provoking of major geopolitical powers - such as a certain CCP - and decided it was not worth the candle, and blinked, and kicked the can down the road, and after about three decades of can-kicking, ran out of road.

I can't explicitly speak for the China Hawks (not being one myself), but I believe one of the working assumptions is that AGI will allow the "league of free nations" to disarm China without the messiness of millions of deaths. Probably this is supposed to work like EY's "nanobot swarm that melts all of the GPUs".

I agree that the details are a bit fuzzy, but from an external perspective "we don't publicly discuss capabilities" and "there are no adults in the room" are indistinguishable. OpenAI openly admits the plan is "we'll ask the AGI what to do". I suspect NATSEC's position is more like "amateurs discuss tactics, experts discuss logistics" (i.e. securing decisive advantage is more important than planning out exactly how to melt the GPUs).

To believe that the same group that pulled off Stuxnet and this lacks the imagination or will to use AGI-enabled weapons strikes me as naive, however.

The USA, for example, has always had 'robust military superiority' over many countries it desired to not get nukes, and yet, which did get nukes.

It's also worth noting that AGI is not a zero-to-one event but rather a hyper-exponential curve. Theoretically it may be possible to always stay far enough ahead to have a decisive advantage (unlike nukes, where even a handful is enough to establish MAD).

seth-herd on China Hawks are Manufacturing an AI Arms Race

I find this argument highly compelling. I think it's necessary to actually think through those 100 ways to prevent rivals from gaining AGI if you already have one. And to be realistic about the rate of progress that AGI will enable. We will not immediately have unstoppable nanobots. To be safe, you'd need some way to not only stop the use of Chinese and Russian nukes, but reliably keep them disabled. To prevent massive bloodshed, you'd also probably need to do the same with conventional military assets - and probably without causing massive casualties.

Diplomatic solutions are probably going to be part of any realistic plan to use AGI to prevent rival AGI - but as you say they won't be enough.

Nonproliferation efforts for nukes slowed down proliferation but didn't stop it. AGI is different in that it will fairly quickly allow nearly universal surveillance - if you can stomach deploying it, and if you don't trigger a nuclear exchange by deploying it.

The other possibly-important difference between this scenario and the history of nuclear proliferation is the presence of a smarter-than-human advisor who can say "no really human, if you fail to follow through, these will be the very likely results, and you won't like them".

I also hope that smarter-than-human advisor will say something like "look guys, you can all get vastly wealthier and longer-lived if you can just not freak out and fight each other" - and be so obviously right and convincing that humans will actually listen. The win-win solutions may just be compelling. I fully agree that no amount of sharing will prevent others from pursuing AGI - but generous sharing of technological benefits would reduce the priority of those efforts and the animosity when they're thwarted.

Now is the time to think this through carefully, before the US commits to a race.

cole-wyeth on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

This post provided far more data than I needed to donate to support a site I use constantly.

donald-hobson on A very strange probability paradox

Here is a more intuitive version of the same paradox.

Again, conditional on all dice rolls being even. But this time it's either

A) 1,000,000 consecutive 6's.

B) 999,999 consecutive 6's followed by a (possibly non-consecutive) 6.

Suppose you roll a few even numbers, followed by an extremely lucky sequence of 999,999 6's.

From the point of view of version A, the only way to continue the sequence is a single extra 6. If you roll a 4, you would need to roll a second sequence of a million 6's. And you are very unlikely to do that in the next 10 million steps. And very unlikely to go for 10 million steps without rolling an odd number.

Yes if this happened, it would add at least a million extra rolls. But the chance of that is exponentially tiny.

Whereas for B, it's quite plausible to roll 26 or 46 or 2426 instead of just 6.

Another way to think about this problem is with regular expressions. Let e=even numbers. *=0 or more.

The string "e*6e*6" matches any sequence with at least two 6's and no odd numbers.

The string "e*66" matches sequences ending in two consecutive 6's. And the string "66" matches two consecutive 6's with no room for extra even numbers before the first 6. This is the shortest.

Phrased this way it looks obvious. Every time you allow a gap for even numbers to hide in, an even number might be hiding in the gap, and that makes the sequence longer.
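The three patterns can be tried out directly. The sketch below is only an illustration: it assumes rolls are written as a string of digits, spells "e" as the character class `[246]`, and uses a hypothetical helper name `matching_patterns` that does not come from the comment above.

```python
import re

# "e" in the comment's notation: any even die face.
EVEN = "[246]"

# The three patterns discussed above, from loosest to tightest.
PATTERNS = {
    "e*6e*6": re.compile(f"{EVEN}*6{EVEN}*6"),
    "e*66": re.compile(f"{EVEN}*66"),
    "66": re.compile("66"),
}


def matching_patterns(rolls: str) -> list[str]:
    """Return the names of the patterns that match the whole roll string."""
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(rolls)]


if __name__ == "__main__":
    for rolls in ["66", "2466", "2646", "2636"]:
        print(rolls, matching_patterns(rolls))
```

Each tighter pattern accepts a subset of what the looser one accepts, and the strings it rejects are exactly the ones where extra even numbers sit in a gap.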

When you remove the conditional on the other numbers being even, then the "first" becomes important to making the sequence converge at all.
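The size of the effect can be checked numerically for a small case. The sketch below assumes a setup that is not spelled out in this comment: a fair die is rolled until two 6's have appeared (either consecutively or with gaps allowed), and we ask for the expected number of rolls conditional on every roll having been even. The function name and the 200-roll truncation are illustrative choices, not something taken from the original post.

```python
def conditional_expected_rolls(consecutive: bool, max_rolls: int = 200) -> float:
    """Approximate E[rolls until the second 6 | every roll was even].

    consecutive=True  : stop at the first "66" (two 6's in a row).
    consecutive=False : stop at the second 6, gaps allowed.
    """
    # mass[0]: probability of an all-even history that is still running and
    #          needs two more 6's (no usable 6 yet).
    # mass[1]: same, but one more 6 ends the game.
    mass = [1.0, 0.0]
    stopped_prob = 0.0    # P(game ended and every roll was even)
    weighted_rolls = 0.0  # sum over n of n * P(ended at roll n, all even)

    for n in range(1, max_rolls + 1):
        # Rolling a 6 from state 1 ends the game on roll n.
        stop_now = mass[1] / 6
        stopped_prob += stop_now
        weighted_rolls += n * stop_now

        if consecutive:
            # A 2 or 4 breaks the run of 6's and sends us back to state 0.
            next0 = (mass[0] + mass[1]) * 2 / 6
            next1 = mass[0] / 6
        else:
            # A 2 or 4 leaves the count of 6's unchanged.
            next0 = mass[0] * 2 / 6
            next1 = mass[0] / 6 + mass[1] * 2 / 6
        # Odd rolls are dropped: they violate the "all even" condition.
        mass = [next0, next1]

    return weighted_rolls / stopped_prob


if __name__ == "__main__":
    print("consecutive 6's:    ", conditional_expected_rolls(True))
    print("non-consecutive 6's:", conditional_expected_rolls(False))
```

Under these assumptions the consecutive case comes out at about 2.727 (30/11) and the non-consecutive case at 3, so allowing a gap for even numbers to hide in makes the conditional sequence longer, as argued above.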

gwern on China Hawks are Manufacturing an AI Arms Race

Just today, Deepseek claimed to match O1-preview performance--that is a two month delay.

Why is that comparison not to the much better GPT-4 o1 then, or the doubtless better o1 now?

gwern on China Hawks are Manufacturing an AI Arms Race

No, my problem with the hawks, as far as this criticism goes, is that they aren't repeatedly and explicitly saying what they will do. (They also won't do it, whatever 'it' is, even if they say they will; but we haven't even gotten that far yet.) They are continually shying away from cashing out any of their post-AGI plans, likely because they look at the actual strategies that could be executed and realize that execution is in serious doubt and so that undermines their entire paradigm. ("We will be greeted as liberators" and "we don't do nation-building" come to mind.)

Your quoted uses are a case in point of the substitution of rhetoric for substance. 'Robust military superiority' is not a decisive advantage in this sense, and is not 'conquering the world' or executing any of the strategies I mentioned; and in fact, this sort of vague bait-and-switch handwaving rhetoric, which is either wrong or deceptive about what they mean, is much of what I am criticizing: Oh, you have 'robust military superiority'? That's nice. But how does it actually stop Xi from getting AGI? Be concrete. How, exactly, do you go from eg. 'the USA has some cool new bombs and nanotech thanks to running hundreds of thousands of Von Neumann AGI instances' to 'China [and every other rival country] has no AGI program and will not for the foreseeable future'?

The USA, for example, has always had 'robust military superiority' over many countries it desired to not get nukes, and yet, which did get nukes. (If you don't like the early Cold War USSR example, then consider, say, North Korea pre-2006. The USA has always had 'robust military superiority' over the DPRK, and yet, here we are with Kim Jong Un having USA-range ICBMs and nukes. Why? Because the USA has always looked at the cost of using that 'robust military superiority', which would entail the destruction of Seoul and possibly millions of deaths and the provoking of major geopolitical powers - such as a certain CCP - and decided it was not worth the candle, and blinked, and kicked the can down the road, and after about three decades of can-kicking, ran out of road. Because the DPRK made nukes its #1 priority, ahead of lesser priorities like 'not starving to death', and it turns out that it's rather hard to compel a sovereign country - even an extremely impoverished, isolated, weak country suffering from regular famines - to not pursue its #1 priority. It's a lot easier to dissuade it from its #100 priority or something. But from #1? Difficult. Very difficult.)

All this statement means is that 'you lose even if you win': 1. You race to AGI, 'win', you gain 'robust military superiority' which means something like "the USA can conquer China or otherwise force it to credibly terminate all AGI-related activities, if it's willing to start an AGI-powered world war which will kill tens of millions of Chinese and crash the global economy (in the best case scenario)"; 2. Xi launches the national emergency crash AGI program like a 'two bombs, one satellite' program as the top national priority, the USA threatens to use its 'robust military superiority' if that AGI program is not canceled and condescendingly offers table scraps like gimped APIs, Xi says "no ur mom"... and then what? Answer: no world war starts, and the Chinese AGI program finishes on schedule as if that 'robust military superiority' never existed. (A threat both sides know will not be executed is no threat at all.) 3. ??? 4. Profit!

("arms race bros will srsly launch a global arms race by saying they'll use the robust military superiority from winning the arms race to stop rival AGI programs, and then will not stop rival AGI programs")

rohinmshah on Yonatan Cale's Shortform

Regarding the rest of the article - it seems to be mainly about making an agent that is capable at minecraft, which seems like a required first step that I ignored meanwhile (not because it's easy).

Huh. If you think of that as capabilities I don't know what would count as alignment. What's an example of alignment work that aims to build an aligned system (as opposed to e.g. checking whether a system is aligned)?

E.g. it seems like you think RLHF counts as an alignment technique -- this seems like a central approach that you might use in BASALT.

If you hope to check if the agent will be aligned with no minecraft-specific alignment training, then sounds like we're on the same page!

I don't particularly imagine this, because you have to somehow communicate to the AI system what you want it to do, and AI systems don't seem good enough yet to be capable of doing this without some Minecraft specific finetuning. (Though maybe you would count that as Minecraft capabilities? Idk, this boundary seems pretty fuzzy to me.)