LessWrong 2.0 Reader

[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)

BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (13)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (67)

[link] Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded
garrison · 2024-10-23T23:40:57.180Z · comments (1)

[link] My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi · 2024-09-08T14:30:40.456Z · comments (18)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (43)

Why I funded PIBBSS
Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · comments (21)

"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (41)

Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
Thomas Kwa (thomas-kwa) · 2024-11-06T23:01:48.992Z · comments (35)

DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (23)

Scissors Statements for President?
AnnaSalamon · 2024-11-06T10:38:21.230Z · comments (31)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (15)

[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (24)

Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)

What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)

Passages I Highlighted in The Letters of J.R.R.Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (8)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (10)

[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (13)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (14)

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)

[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (56)

[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)

[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (7)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (55)

Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (8)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (10)

Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)

Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (12)

A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (21)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (43)

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)

Singular learning theory: exercises
Zach Furman (zfurman) · 2024-08-30T20:00:03.785Z · comments (5)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)

[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)

Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (34)

There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

Recent comments

leogao on leogao's Shortform

the most valuable part of a social event is often not the part that is ostensibly the most important, but rather the gaps between the main parts.

at ML conferences, the headline keynotes and orals are usually the least useful part to go to; the random spontaneous hallway chats and dinners and afterparties are extremely valuable
when doing an activity with friends, the activity itself is often of secondary importance. talking on the way to the activity, or in the gaps between doing the activity, carries a lot of the value
at work, a lot of the best conversations happen not in scheduled 1:1s and group meetings, but in spontaneous hallway or dinner groups

yams on yams's Shortform

Cool! I think we're in agreement at a high level. Thanks for taking the extra time to make sure you were understood.

In more detail, though:

I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (e.g. warning shots, An Inconvenient Truth AI Edition, etc.), likely not worth getting into here. I also think 'if we pause it will be for stupid reasons' is a very sad take.

I think I disagree with 2 being likely, as well; probably yes, a lot of the bottleneck on development is ~make-work that goes away when you get a drop-in replacement for remote workers, and also yes, AI coding is already an accelerant // effectively doing gradient descent on gradient descent (RLing the RL'd researcher to RL the RL...) is intelligence-explosion fuel. But I think there's a big gap between the capabilities you need for politically worrisome levels of unemployment, and the capabilities you need for an intelligence explosion, principally because >30 percent of human labor in developed nations could be automated with current tech if the economics align a bit (hiring 200+k/year ML engineers to replace your 30k/year call center employee is only just now starting to make sense economically). I think this has been true of current tech since ~GPT-4, and that we haven't seen a concomitant massive acceleration in capabilities on the frontier (things are continuing to move fast, and the proliferation is scary, but it's not an explosion).

I take "depending on how concentrated AI R&D is" to foreshadow that you'd reply to the above with something like: "This is about lab priorities; the labs with the most impressive models are the labs focusing the most on frontier model development, and they're unlikely to set their sights on comprehensive automation of shit jobs when they can instead double-down on frontier models and put some RL in the RL to RL the RL that's been RL'd by the..."

I think that's right about lab priorities. However, I expect the automation wave to mostly come from middle-men, consultancies, what have you, who take all of the leftover ML researchers not eaten up by the labs and go around automating things away individually (yes, maybe the frontier moves too fast for this to be right, because the labs just end up with a drop-in remote worker 'for free' as long as they keep advancing down the tech tree, but I don't quite think this is true, because human jobs are human-shaped, and buyers are going to want pretty rigorous role-specific guarantees from whoever's selling this service, even if they're basically unnecessary, and the one-size-fits-all solution is going to have fewer buyers than the thing marketed as 'bespoke').

In general, I don't like collapsing the various checkpoints between here and superintelligence; there are all these intermediate states, and their exact features matter a lot, and we really don't know what we're going to get. 'By the time we'll have x, we'll certainly have y' is not a form of prediction that anyone has a particularly good track record making.

martin-randall on Unnatural Categories Are Optimized for Deception

I find this hypothetical about neural fireplaces curious, because the ambiguity already exists in real fireplaces; speculative fiction is not needed. Please excuse any inaccuracies in this brief history of fireplaces:

Wood-burning fireplaces
Gas-burning fireplaces
Central heating
Electric heaters
Decorative fireplaces (no heat)

The original fireplaces produced both heat and a decorative flame effect. With each new type of invention there was a question of what to do with our previous terms. We've ended up with "heaters" to refer to things that heat a room and "fireplace" to refer to things that have a decorative flame effect. Both of these things are slightly fuzzy natural categories in the sense of this post.

Except... maybe we should say that "decorative" is a privative adjective and so a "decorative fireplace" isn't really a fireplace? For the sake of the thought experiment, let's say that practical rural folk place a higher value on having a secondary heat source because it takes longer to restore electricity after a storm. Meanwhile snobby urbanites place a higher value on decorative flame effects because they value gaining status through conspicuous consumption.

I see that someone could say "well, it's not a real fireplace, is it?" in order to signal that they share the values of practical rural folks. If they're actually a snobby urbanite politician and they don't actually have those practical rural values then they are being deceptive. That would be a deception about values, not about heat sources.

If a practical rural person says "well, it's not a real fireplace, is it?", then that could indeed be a true signal of their values. But my guess is the more restrictive meaning of fireplace came first. The causal diagram is something like:

Practical Rural Values -> Categorize functional fireplaces separately to decorative fireplaces -> Use the short word "fireplace" for functional fireplaces (for communication and signaling)

Not:

Practical Rural Values -> Use the short word "fireplace" for functional fireplaces (for signaling) -> Categorize functional fireplaces separately to decorative fireplaces

Because until practical rural folks have settled on a common meaning of "fireplace", they can't reliably use that meaning to signal their values to each other or to outsiders.

Except... maybe if it got caught up in the modern culture war there could be a flood of fireplace-related memes and then everyone would have very strong opinions about the best definition of "fireplace" a few months later for no real reason? Wow, that sure would suck for the CEO of Decorative Fireplaces Inc.

daniel-kokotajlo on Daniel Kokotajlo's Shortform

I'm no musician, but music-generating AIs are already way better than I could ever be. It took me about an hour of prompting to get Suno to make this: https://suno.com/playlist/34e6de43-774e-44fe-afc6-02f9defa7e22

It's not perfect (especially: I can't figure out how to get it to create a song of the correct length, so I had to cut and paste snippets from two songs into a playlist, and that creates audible glitches/issues at the beginning, middle, and end) but overall I'm full of wonder and appreciation.

daniel-kokotajlo on Dave Kasten's AGI-by-2027 vignette

Interesting! You should definitely think more about this and write it up sometime; either you'll change your mind about timelines till superintelligence or you'll have found an interesting novel argument that may change other people's minds (such as mine).

lucid_levi_ackerman on Which things were you surprised to learn are metaphors?

The Rumbling.

kvmanthinking on You are not too "irrational" to know your preferences.

The above statement could be applied to a LOT of other posts too, not just this one.

sharmake-farah on yams's Shortform

I think 1 and 2 are actually pretty likely, but I'm a lot less confident that 3 and 4 will actually happen.

A big reason for this is that I suspect one of the reasons people aren't reacting to AI progress is they assume it won't take their job, so it will likely require massive job losses for humans to make a lot of people care about AI, and depending on how concentrated AI R&D is, there's a real possibility that AI has fully automated AI R&D before massive job losses begin in a way that matters to regular people.

gwern on Eli's shortform feed

Just ask an LLM. The author can always edit it, after all.

My suggestion for how such a feature could be done would be to copy the comment into a draft post, add an LLM-suggested title (and tags?), and alert the author, who can opt in by posting it or else delete it.

If it is sufficiently well received and people approve a lot of them, then one can explore opt-out auto-posting mechanisms, like "wait a month and if the author has still neither explicitly posted it nor deleted the draft proposal, then auto-post it".
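
The opt-in and opt-out variants described above can be sketched as a small state machine. This is only an illustrative sketch, not anything the site implements: `DraftProposal`, `propose_draft`, `resolve`, and the `suggest_title` callback (a stand-in for the LLM call) are all hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Callable, Optional


class DraftState(Enum):
    PENDING = "pending"  # waiting for the author's decision
    POSTED = "posted"
    DELETED = "deleted"


@dataclass
class DraftProposal:
    """Hypothetical record for a comment copied into a draft post."""
    author: str
    body: str
    title: str
    created_at: datetime
    state: DraftState = DraftState.PENDING


def propose_draft(
    author: str,
    comment_body: str,
    suggest_title: Callable[[str], str],  # stand-in for an LLM title suggester
    now: datetime,
) -> DraftProposal:
    """Copy a comment into a pending draft with a suggested title."""
    return DraftProposal(author, comment_body, suggest_title(comment_body), now)


def resolve(
    draft: DraftProposal,
    now: datetime,
    auto_post_after: Optional[timedelta] = None,
) -> DraftState:
    """Return the draft's state at time `now`.

    auto_post_after=None models the opt-in scheme: nothing happens
    unless the author posts or deletes the draft themselves.
    A timedelta models the opt-out scheme: a draft the author has
    left untouched is auto-posted once that much time has elapsed.
    """
    if draft.state is not DraftState.PENDING:
        return draft.state  # the author already decided
    if auto_post_after is not None and now - draft.created_at >= auto_post_after:
        draft.state = DraftState.POSTED
    return draft.state
```

Under the opt-in scheme a call such as `resolve(draft, now)` never changes anything on its own; passing `auto_post_after=timedelta(days=30)` approximates the "wait a month" rule.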

jkaufman on Secular Solstice Songbook Update

Done; thanks!