LessWrong 2.0 Reader

Simulacra Levels Summary
Zvi · 2023-01-30T13:40:00.774Z · comments (12)
Compounding Resource X
Raemon · 2023-01-11T03:14:08.565Z · comments (5)
[link] Opportunity Cost Blackmail
adamShimi · 2023-01-02T13:48:51.811Z · comments (11)
A general comment on discussions of genetic group differences
anonymous8101 (petrov-amethyst) · 2023-01-14T02:11:51.890Z · comments (45)
Some of my disagreements with List of Lethalities
TurnTrout · 2023-01-24T00:25:28.075Z · comments (7)
Infohazards vs Fork Hazards
jimrandomh · 2023-01-05T09:45:28.065Z · comments (16)
AGI safety field building projects I’d like to see
Severin T. Seehrich (sts) · 2023-01-19T22:40:37.284Z · comments (27)
[link] Investing for a World Transformed by AI
PeterMcCluskey · 2023-01-01T02:47:06.004Z · comments (19)
How we could stumble into AI catastrophe
HoldenKarnofsky · 2023-01-13T16:20:05.745Z · comments (18)
Simulacra are Things
janus · 2023-01-08T23:03:26.052Z · comments (7)
[link] Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind
DragonGod · 2023-01-13T16:53:10.279Z · comments (12)
Announcing aisafety.training
JJ Hepburn (jj-hepburn) · 2023-01-21T01:01:40.580Z · comments (4)
[link] Spooky action at a distance in the loss landscape
Jesse Hoogland (jhoogland) · 2023-01-28T00:22:46.506Z · comments (4)
LW Filter Tags (Rationality/World Modeling now promoted in Latest Posts)
Ruby · 2023-01-28T22:14:32.371Z · comments (4)
Escape Velocity from Bullshit Jobs
Zvi · 2023-01-10T14:30:00.828Z · comments (18)
Movie Review: Megan
Zvi · 2023-01-23T12:50:00.873Z · comments (19)
Assigning Praise and Blame: Decoupling Epistemology and Decision Theory
adamShimi · 2023-01-27T18:16:43.025Z · comments (5)
Inverse Scaling Prize: Second Round Winners
Ian McKenzie (naimenz) · 2023-01-24T20:12:48.474Z · comments (17)
[link] [Linkpost] TIME article: DeepMind’s CEO Helped Take AI Mainstream. Now He’s Urging Caution
Akash (akash-wasil) · 2023-01-21T16:51:09.586Z · comments (2)
My first year in AI alignment
Alex_Altair · 2023-01-02T01:28:03.470Z · comments (10)
[link] Conversational canyons
Henrik Karlsson (henrik-karlsson) · 2023-01-04T18:55:04.386Z · comments (4)
[link] Evidence under Adversarial Conditions
PeterMcCluskey · 2023-01-09T16:21:07.890Z · comments (1)
Consider paying for literature or book reviews using bounties and dominant assurance contracts
Arjun Panickssery (arjun-panickssery) · 2023-01-15T03:56:07.110Z · comments (7)
My Advice for Incoming SERI MATS Scholars
Johannes C. Mayer (johannes-c-mayer) · 2023-01-03T19:25:38.678Z · comments (1)
[link] Announcing Cavendish Labs
derikk · 2023-01-19T20:15:09.035Z · comments (5)
Linear Algebra Done Right, Axler
David Udell · 2023-01-02T22:54:58.724Z · comments (6)
Dangers of deference
TsviBT · 2023-01-08T14:36:33.454Z · comments (5)
Gradient Filtering
Jozdien · 2023-01-18T20:09:20.869Z · comments (16)
Consequentialists: One-Way Pattern Traps
David Udell · 2023-01-16T20:48:56.967Z · comments (3)
What’s going on with ‘crunch time’?
rosehadshar · 2023-01-20T09:42:53.215Z · comments (6)
[link] formal alignment: what it is, and some proposals
Tamsin Leake (carado-1) · 2023-01-29T11:32:33.239Z · comments (3)
[link] Why you should learn sign language
Noah Topper (noah-topper) · 2023-01-18T17:03:24.090Z · comments (23)
[link] Paper: Superposition, Memorization, and Double Descent (Anthropic)
LawrenceC (LawChan) · 2023-01-05T17:54:37.575Z · comments (11)
Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review)
Shoshannah Tekofsky (DarkSym) · 2023-01-28T05:26:49.866Z · comments (7)
Thoughts on hardware / compute requirements for AGI
Steven Byrnes (steve2152) · 2023-01-24T14:03:39.190Z · comments (30)
How Likely is Losing a Google Account?
jefftk (jkaufman) · 2023-01-30T00:20:01.584Z · comments (11)
Critique of some recent philosophy of LLMs’ minds
Roman Leventov · 2023-01-20T12:53:38.477Z · comments (8)
Contra Common Knowledge
abramdemski · 2023-01-04T22:50:38.493Z · comments (31)
[question] Would it be good or bad for the US military to get involved in AI risk?
Grant Demaree (grant-demaree) · 2023-01-01T19:02:30.892Z · answers+comments (12)
11 heuristics for choosing (alignment) research projects
Akash (akash-wasil) · 2023-01-27T00:36:08.742Z · comments (5)
[Simulators seminar sequence] #1 Background & shared assumptions
Jan (jan-2) · 2023-01-02T23:48:50.298Z · comments (4)
Trying to isolate objectives: approaches toward high-level interpretability
Jozdien · 2023-01-09T18:33:18.682Z · comments (14)
Language models can generate superior text compared to their input
ChristianKl · 2023-01-17T10:57:10.260Z · comments (28)
[RFC] Possible ways to expand on "Discovering Latent Knowledge in Language Models Without Supervision".
gekaklam · 2023-01-25T19:03:16.218Z · comments (6)
[link] NYT: Google will “recalibrate” the risk of releasing AI due to competition with OpenAI
Michael Huang · 2023-01-22T08:38:46.886Z · comments (2)
Citability of Lesswrong and the Alignment Forum
Leon Lang (leon-lang) · 2023-01-08T22:12:02.046Z · comments (2)
How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!
StefanHex (Stefan42) · 2023-01-24T18:45:01.003Z · comments (5)
VIRTUA: a novel about AI alignment
Karl von Wendt · 2023-01-12T09:37:21.528Z · comments (12)
[Crosspost] ACX 2022 Prediction Contest Results
Scott Alexander (Yvain) · 2023-01-24T06:56:33.101Z · comments (6)
How to eat potato chips while typing
KatjaGrace · 2023-01-03T11:50:05.816Z · comments (12)