LessWrong 2.0 Reader


Spaciousness In Partner Dance: A Naturalism Demo
LoganStrohl (BrienneYudkowsky) · 2023-11-19T07:00:19.555Z · comments (5)
Reactions to the Executive Order
Zvi · 2023-11-01T20:40:02.438Z · comments (4)
Lying Alignment Chart
Zack_M_Davis · 2023-11-29T16:15:28.102Z · comments (17)
[link] Are language models good at making predictions?
dynomight · 2023-11-06T13:10:36.379Z · comments (14)
Anthropic Fall 2023 Debate Progress Update
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-11-28T05:37:30.070Z · comments (9)
On the UK Summit
Zvi · 2023-11-07T13:10:04.895Z · comments (6)
Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI"
johnswentworth · 2023-11-21T17:39:17.828Z · comments (84)
Testbed evals: evaluating AI safety even when it can’t be directly measured
joshc (joshua-clymer) · 2023-11-15T19:00:41.908Z · comments (2)
Some Rules for an Algebra of Bayes Nets
johnswentworth · 2023-11-16T23:53:11.650Z · comments (30)
[link] A framing for interpretability
Nina Rimsky (NinaR) · 2023-11-14T16:14:15.713Z · comments (5)
Alignment can improve generalisation through more robustly doing what a human wants - CoinRun example
Stuart_Armstrong · 2023-11-21T11:41:34.798Z · comments (9)
AI #39: The Week of OpenAI
Zvi · 2023-11-23T15:10:04.865Z · comments (8)
Intro to Superposition & Sparse Autoencoders (Colab exercises)
CallumMcDougall (TheMcDouglas) · 2023-11-29T12:56:21.608Z · comments (8)
[link] Why not electric trains and excavators?
bhauth · 2023-11-21T00:07:17.967Z · comments (39)
Reinforcement Via Giving People Cookies
Screwtape · 2023-11-15T04:34:21.119Z · comments (9)
AI Safety Research Organization Incubation Program - Expression of Interest
Alexandra Bos (AlexandraB) · 2023-11-21T10:23:14.204Z · comments (6)
[link] So you want to save the world? An account in paladinhood
Tamsin Leake (carado-1) · 2023-11-22T17:40:33.048Z · comments (19)
A to Z of things
KatjaGrace · 2023-11-17T05:20:03.134Z · comments (6)
Announcing New Beginner-friendly Book on AI Safety and Risk
Darren McKee · 2023-11-25T15:57:08.078Z · comments (2)
How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley (roger-d-1) · 2023-11-28T19:56:49.679Z · comments (30)
[link] A free to enter, 240 character, open-source iterated prisoner's dilemma tournament
Isaac King (KingSupernova) · 2023-11-09T08:24:43.277Z · comments (19)
Never Drop A Ball
Screwtape · 2023-11-23T04:15:35.834Z · comments (1)
Vote on worthwhile OpenAI topics to discuss
Ben Pace (Benito) · 2023-11-21T00:03:03.898Z · comments (55)
Raemon's Deliberate (“Purposeful?”) Practice Club
Raemon · 2023-11-14T18:24:19.335Z · comments (11)
Black Box Biology
GeneSmith · 2023-11-29T02:27:29.794Z · comments (30)
On OpenAI Dev Day
Zvi · 2023-11-09T16:10:06.646Z · comments (0)
"Epistemic range of motion" and LessWrong moderation
habryka (habryka4) · 2023-11-27T21:58:40.834Z · comments (3)
New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)
[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)
Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)
AI Alignment Research Engineer Accelerator (ARENA): call for applicants
CallumMcDougall (TheMcDouglas) · 2023-11-07T09:43:41.606Z · comments (0)
Genetic fitness is a measure of selection strength, not the selection target
Kaj_Sotala · 2023-11-04T19:02:13.783Z · comments (43)
It's OK to be biased towards humans
dr_s · 2023-11-11T11:59:16.568Z · comments (69)
Thoughts on open source AI
Sam Marks (samuel-marks) · 2023-11-03T15:35:42.067Z · comments (17)
AMA: Earning to Give
jefftk (jkaufman) · 2023-11-07T16:20:10.972Z · comments (8)
[link] Theories of Change for AI Auditing
Lee Sharkey (Lee_Sharkey) · 2023-11-13T19:33:43.928Z · comments (0)
AI #37: Moving Too Fast
Zvi · 2023-11-09T17:50:04.324Z · comments (5)
Game Theory without Argmax [Part 1]
Cleo Nardo (strawberry calm) · 2023-11-11T15:59:47.486Z · comments (16)
[link] Open Phil releases RFPs on LLM Benchmarks and Forecasting
LawrenceC (LawChan) · 2023-11-11T03:01:09.526Z · comments (0)
[link] OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns
Seth Herd · 2023-11-20T14:20:33.539Z · comments (28)
The Assumed Intent Bias
silentbob · 2023-11-05T16:28:03.282Z · comments (13)
The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-11-07T16:12:20.031Z · comments (20)
GPT-2030 and Catastrophic Drives: Four Vignettes
jsteinhardt · 2023-11-10T07:30:06.480Z · comments (5)
Altman firing retaliation incoming?
trevor (TrevorWiesinger) · 2023-11-19T00:10:15.645Z · comments (23)
On Overhangs and Technological Change
Roko · 2023-11-05T22:58:51.306Z · comments (19)
They are made of repeating patterns
quetzal_rainbow · 2023-11-13T18:17:43.189Z · comments (4)
Public Weights?
jefftk (jkaufman) · 2023-11-02T02:50:18.095Z · comments (19)
Job listing: Communications Generalist / Project Manager
Gretta Duleba (gretta-duleba) · 2023-11-06T20:21:03.721Z · comments (7)
Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models
Felix Hofstätter · 2023-11-08T11:37:43.997Z · comments (0)
[question] why did OpenAI employees sign
bhauth · 2023-11-27T05:21:28.612Z · answers+comments (23)