LessWrong 2.0 Reader

Humans aren't fleeb.
Charlie Steiner · 2024-01-24T05:31:46.929Z · comments (5)

How predictive processing solved my wrist pain
max_shen (makoshen) · 2024-07-04T01:56:20.162Z · comments (8)

Representation Tuning
Christopher Ackerman (christopher-ackerman) · 2024-06-27T17:44:33.338Z · comments (9)

[link] Show LW: Get a phone call if prediction markets predict nuclear war
Lorenzo (lorenzo-buonanno) · 2023-09-17T22:25:21.206Z · comments (8)

Secondary Risk Markets
Vaniver · 2023-12-11T21:52:46.836Z · comments (4)

Book Review: On the Edge: The Gamblers
Zvi · 2024-09-24T11:50:06.065Z · comments (1)

Mech Interp Challenge: September - Deciphering the Addition Model
CallumMcDougall (TheMcDouglas) · 2023-09-13T22:23:28.222Z · comments (0)

[link] Hyperreals in a Nutshell
Yudhister Kumar (randomwalks) · 2023-10-15T14:23:58.027Z · comments (27)

[link] AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks
aogara (Aidan O'Gara) · 2023-10-31T19:34:54.837Z · comments (1)

Doomsday Argument and the False Dilemma of Anthropic Reasoning
Ape in the coat · 2024-07-05T05:38:39.428Z · comments (55)

My First Post
Jaivardhan Nawani (jaivardhan-nawani) · 2023-09-06T17:42:59.469Z · comments (9)

Categories of leadership on technical teams
benkuhn · 2024-07-22T04:50:04.071Z · comments (0)

Monthly Roundup #10: September 2023
Zvi · 2023-09-06T13:20:06.560Z · comments (4)

List of strategies for mitigating deceptive alignment
joshc (joshua-clymer) · 2023-12-02T05:56:50.867Z · comments (2)

[link] Twitter thread on politics of AI safety
Richard_Ngo (ricraz) · 2024-07-31T00:00:34.298Z · comments (2)

[link] My Model of Epistemology
adamShimi · 2024-08-31T17:01:45.472Z · comments (0)

Agency in Politics
Martin Sustrik (sustrik) · 2024-07-17T05:30:01.873Z · comments (2)

[question] What is an "anti-Occamian prior"?
Zane · 2023-10-23T02:26:10.851Z · answers+comments (22)

What Helped Me - Kale, Blood, CPAP, X-tiamine, Methylphenidate
Johannes C. Mayer (johannes-c-mayer) · 2024-01-03T13:22:11.700Z · comments (12)

'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata
Mateusz Bagiński (mateusz-baginski) · 2023-11-15T16:00:48.926Z · comments (8)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (10)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

Dangers of Closed-Loop AI
Gordon Seidoh Worley (gworley) · 2024-03-22T23:52:22.010Z · comments (7)

[link] My article in The Nation — California’s AI Safety Bill Is a Mask-Off Moment for the Industry
garrison · 2024-08-15T19:25:59.592Z · comments (0)

[link] On Fables and Nuanced Charts
Niko_McCarty (niko-2) · 2024-09-08T17:09:07.503Z · comments (2)

[link] OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors
Joel Burget (joel-burget) · 2024-06-13T21:28:18.110Z · comments (10)

[link] List of Collective Intelligence Projects
Chipmonk · 2024-07-02T14:10:41.789Z · comments (9)

Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)

Economics Roundup #2
Zvi · 2024-07-02T12:40:05.908Z · comments (5)

My Detailed Notes & Commentary from Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:51.894Z · comments (16)

Open Thread – Winter 2023/2024
habryka (habryka4) · 2023-12-04T22:59:49.957Z · comments (160)

Forecasting AI (Overview)
jsteinhardt · 2023-11-16T19:00:04.218Z · comments (0)

How I select alignment research projects
Ethan Perez (ethan-perez) · 2024-04-10T04:33:08.092Z · comments (4)

[Valence series] 4. Valence & Social Status (deprecated)
Steven Byrnes (steve2152) · 2023-12-15T14:24:41.040Z · comments (19)

Index of rationalist groups in the Bay Area July 2024
Lucie Philippon (lucie-philippon) · 2024-07-26T16:32:25.337Z · comments (10)

Monthly Roundup #22: September 2024
Zvi · 2024-09-17T12:20:08.297Z · comments (10)

Proposal for improving the global online discourse through personalised comment ordering on all websites
Roman Leventov · 2023-12-06T18:51:37.645Z · comments (21)

Open consultancy: Letting untrusted AIs choose what answer to argue for
Fabien Roger (Fabien) · 2024-03-12T20:38:03.785Z · comments (5)

ARENA 2.0 - Impact Report
CallumMcDougall (TheMcDouglas) · 2023-09-26T17:13:19.952Z · comments (5)

Optimisation Measures: Desiderata, Impossibility, Proposals
mattmacdermott · 2023-08-07T15:52:17.624Z · comments (9)

Predictive model agents are sort of corrigible
Raymond D · 2024-01-05T14:05:03.037Z · comments (6)

A sketch of acausal trade in practice
Richard_Ngo (ricraz) · 2024-02-04T00:32:54.622Z · comments (4)

An explanation for every token: using an LLM to sample another LLM
Max H (Maxc) · 2023-10-11T00:53:55.249Z · comments (5)

Monthly Roundup #12: November 2023
Zvi · 2023-11-14T15:20:06.926Z · comments (5)

[link] Robin Hanson & Liron Shapira Debate AI X-Risk
Liron · 2024-07-08T21:45:40.609Z · comments (4)

[link] legged robot scaling laws
bhauth · 2024-01-20T05:45:56.632Z · comments (8)

Intransitive Trust
Screwtape · 2024-05-27T16:55:29.294Z · comments (15)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)

Recent comments

evolutionbydesign on Advice on Communicating Concisely

Actually, both.

I started an AI club at my high school last year, and I've been (slowly) trying to teach other students the basics of deep learning. They generally come out of my 15-to-20-minute explanations confused rather than understanding.
This too (I don't have a specific example in mind - I'll see if any pop up during school tomorrow)

I normally think what I'm saying is clear, but the result is that others don't understand what I mean when I finish saying it - which causes me to tack on hasty clarifications of my intentions / ideas.

cubefox on Alexander Gietelink Oldenziel's Shortform

Yeah. I think the technical term for that would be cringe.

gwern on Alexander Gietelink Oldenziel's Shortform

Sunglasses can be too cool for most people to be able to wear in the absence of a good reason. Tom Cruise can go around wearing sunglasses any time he wants, and it'll look cool on him, because he's Tom Cruise. If we tried that, we would look like dorks because we're not cool enough to pull it off, and it would backfire on us. (Maybe our mothers would think we looked cool.) This could be said of many things: Tom Cruise or Kanye West or fashionable celebrities like them can go around wearing a fedora and trench coat and it'll look cool and they'll pull it off; but if anyone else tries it...

jackson-wagner on A Rocket–Interpretability Analogy

Satellites were also plausibly a very important military technology. Since the 1960s, some applications have panned out, while others haven't. Some of the things that have worked out:

GPS satellites were designed by the Air Force in the 1980s for guiding precision weapons like JDAMs, and only later incidentally became integral to the world economy. They still do a great job guiding JDAMs, powering the style of "precision warfare" that has given the USA a decisive military advantage ever since 1991's first Iraq war.
Spy satellites were very important for gathering information on enemy superpowers, tracking army movements, etc. They were especially good for helping both nations feel more confident that their counterpart was complying with arms agreements about the number of missile silos, etc. The Cuban Missile Crisis was kicked off by U-2 spy-plane flights photographing partially-assembled missiles in Cuba. For a while, planes and satellites were both in contention as the most useful spy-photography tool, but eventually even the U-2's successor, the incredible SR-71 Blackbird, lost out to the greater utility of spy satellites.
Systems for instantly detecting the characteristic gamma-ray flashes of nuclear detonations that go off anywhere in the world (I think such systems are included on GPS satellites), and giving early warning by tracking ballistic missile launches during their boost phase (the Soviet version of this system famously misfired and almost caused a nuclear war in 1983, which was fortunately forestalled by one Lieutenant Colonel Stanislav Petrov) are obviously a critical part of nuclear deterrence / nuclear war-fighting.

Some of the stuff that hasn't:

The Air Force initially had dreams of sending soldiers into orbit, maybe even operating a military base on the moon, but could never figure out a good use for this. The Soviets even test-fired a machine-gun built into one of their Salyut space stations: "Due to the potential shaking of the station, in-orbit tests of the weapon with cosmonauts in the station were ruled out. The gun was fixed to the station in such a way that the only way to aim would have been to change the orientation of the entire station. Following the last crewed mission to the station, the gun was commanded by the ground to be fired; some sources say it was fired to depletion".
Despite some effort in the 1980s, we were unable to figure out how to make "Star Wars" missile defence systems work anywhere near well enough to defend us against a full-scale nuclear attack.
Fortunately we've never found out if in-orbit nuclear weapons, including fractional orbit bombardment weapons, are any use, because they were banned by the Outer Space Treaty. But nowadays maybe Russia is developing a modern space-based nuclear weapon as a tool to destroy satellites in low-earth orbit.

Overall, lots of NASA activities that developed satellite / spacecraft technology seem like they had a dual-use effect advancing various military capabilities. So it wasn't just the missiles. Of course, in retrospect, the entire human-spaceflight component of the Apollo program (spacesuits, life support systems, etc) turned out to be pretty useless from a military perspective. But even that wouldn't have been clear at the time!

ryan_greenblatt on A Rocket–Interpretability Analogy

(Huh, good to know this changed. I wasn't aware of this.)

mike-s-stuffnstuff on A brief theory of why we think things are good or bad

A lot of it comes down to timescales and sequences of events, long term vs short term.

"I will incur a little suffering today, but my well-being will be much better tomorrow".

"We need to get rid of the <undesired national or social group> at the cost of suffering, but our nation will have a bright future as a result".

People can be tricked into doing unspeakable evil to themselves and others if they have incorrect predictions of future well-being.

seth-herd on A Rocket–Interpretability Analogy

I very much agree that the focus on interpretability is like searching under the light. It's legible; it's a way to show that you've done something nontrivial - you did some real work on alignment. And it's generally agreed that it's progress toward alignment.

When people talk about prosaic alignment proposals, there’s a common pattern: they’ll be outlining some overcomplicated scheme, and then they’ll say “oh, and assume we have great interpretability tools, this whole thing just works way better the better the interpretability tools are”, and then they’ll go back to the overcomplicated scheme. (Credit to Evan for pointing out this pattern to me.)

- Wentworth

But it's not a way to solve alignment in itself. The idea that we'll just understand and track all of the thoughts of a superintelligent AGI is just a strange idea. I really wonder how seriously people are thinking about the impact model of that work.

And they don't need to, because it's pretty obvious that better interp is incremental progress for a lot of AGI scenarios.

This is the incentive that makes progress in academia incredibly slow: there are incentives to do legibly impressive work. There are surprisingly few incentives to actually make progress on useful theories - because it's harder to tell what would count as progress.

But if we're all working on stuff with only small marginal payoffs, who's working on actually getting beyond "overcomplicated schemes" and actually creating and working through practical, workable alignment plans?

I really wish some of the folks working on interp would devote a bit more of their time to "solving the whole problem". It looks to me like we have a really dramatic misallocation of resources happening. We are searching under the light. We need more of us feeling around in the dark where we lost those keys.

david-johnston on A brief theory of why we think things are good or bad

I can explain why I believe bachelors are unmarried: I learned that this is what the word bachelor means, I learned this because it is what bachelor means, and the fact that there's a word "bachelor" that means "unmarried man" is contingent on some unimportant accidents in the evolution of language. A) it is certainly not the result of an axiomatic game and B) if moral beliefs were also contingent on accidents in the evolution of language (I think most are not), that would have profound implications for metaethics.

Motivated belief can explain non-purely-selfish beliefs. I might believe pain is bad because I am motivated to believe it, but the belief still concerns other people. This is even more true when we go about constructing higher order beliefs and trying to enforce consistency among beliefs. Undesirable moral beliefs could be a mark against this theory, but you need more than not-purely-selfish moral beliefs.

I'm going to bow out at this point because I think we're getting stuck covering the same ground.

raymond-d on Automation collapse

Could you expand on what you mean by 'less automation'? I'm taking it to mean some combination of 'bounding the space of controller actions more', 'automating fewer levels of optimisation', 'more of the work done by humans' and maybe 'only automating easier tasks' but I can't quite tell which of these you're intending or how they fit together.

(Also, am I correctly reading an implicit assumption here that any attempts to do automated research would be classed as 'automated ai safety'?)

jbash on The Mask Comes Off: At What Price?

OpenAI charges headfirst to AGI, and succeeds in building it safely. [...] The world transforms, and OpenAI goes from previously unprofitable due to reinvestment to an immensely profitable company.

You need a case where OpenAI successfully builds safe AGI, which may even go on to build safe ASI, and the world gets transformed... but OpenAI's profit stream is nonexistent, effectively valueless, or captures a much smaller fraction than you'd think of whatever AGI or ASI produces.

Business profits (or businesses) might not be a thing at all in a sufficiently transformed world, and it's definitely not clear that preserving them is part of being safe.

In fact, a radical change in allocative institutions like ownership is probably the best case, because it makes no sense in the long term to allocate a huge share of the world's resources and production to people who happened to own some stock when Things Changed(TM). In a transformed-except-corporate-ownership-stays-the-same world, I don't see any reason such lottery winners' portion wouldn't increase asymptotically toward 100 percent, with nobody else getting anything at all.

Radical change is also a likely case^[1]. If an economy gets completely restructured in a really fundamental way, it's strange if the allocation system doesn't also change. That's never happened before.

Even without an overtly revolutionary restructuring, I kind of doubt "OpenAI owns everything" would fly. Maybe corporate ownership would stay exactly the same, but there'd be a 99.999995 percent tax rate.

[1] Contingent on the perhaps unlikely safe and transformative parts coming to pass.