LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (188)

[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)

[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)

[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)

[question] How to get nerds fascinated about mysterious chronic illness research?
riceissa · 2024-05-27T22:58:29.707Z · answers+comments (50)

[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)

Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)

[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (3)

I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (14)

Kids or No kids
Kids or no kids (grosseholz.f@gmail.com) · 2023-11-14T18:37:02.799Z · comments (10)

[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)

On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)

[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)

[link] I found >800 orthogonal "write code" steering vectors
Jacob G-W (g-w1) · 2024-07-15T19:06:17.636Z · comments (19)

Towards a Less Bullshit Model of Semantics
johnswentworth · 2024-06-17T15:51:06.060Z · comments (44)

[link] RAND report finds no effect of current LLMs on viability of bioterrorism attacks
StellaAthena · 2024-01-25T19:17:30.493Z · comments (14)

[link] Things You’re Allowed to Do: University Edition
Saul Munn (saul-munn) · 2024-02-06T00:36:11.690Z · comments (13)

Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)

Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (1)

2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)

[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (32)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

[link] Against Aschenbrenner: How 'Situational Awareness' constructs a narrative that undermines safety and threatens humanity
GideonF · 2024-07-15T18:37:40.232Z · comments (17)

Apollo Research 1-year update
Marius Hobbhahn (marius-hobbhahn) · 2024-05-29T17:44:32.484Z · comments (0)

It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (68)

Notes on Dwarkesh Patel’s Podcast with Demis Hassabis
Zvi · 2024-03-01T16:30:08.687Z · comments (0)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (12)

OpenAI: The Board Expands
Zvi · 2024-03-12T14:00:04.110Z · comments (1)

[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (33)

On attunement
Joe Carlsmith (joekc) · 2024-03-25T12:47:34.856Z · comments (8)

Everything Wrong with Roko's Claims about an Engineered Pandemic
WitheringWeights (EZ97) · 2024-02-22T15:59:08.439Z · comments (10)

SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)

Takeoff speeds presentation at Anthropic
Tom Davidson (tom-davidson-1) · 2024-06-04T22:46:35.448Z · comments (0)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (11)

Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Johnny Lin (hijohnnylin) · 2024-03-25T21:17:58.421Z · comments (7)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (10)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (7)

Circular Reasoning
abramdemski · 2024-08-05T18:10:32.736Z · comments (36)

Just admit that you’ve zoned out
joec · 2024-06-04T02:51:27.594Z · comments (22)

New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)

Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (23)

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper
Zvi · 2024-06-07T11:40:03.981Z · comments (10)

Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)

Meaning & Agency
abramdemski · 2023-12-19T22:27:32.123Z · comments (17)

[link] Introducing METR's Autonomy Evaluation Resources
Megan Kinniment (megan-kinniment) · 2024-03-15T23:16:59.696Z · comments (0)

Review: Conor Moreton's "Civilization & Cooperation"
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-05-26T19:32:43.131Z · comments (8)

Prediction Markets aren't Magic
SimonM · 2023-12-21T12:54:07.754Z · comments (29)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

anthonyc on Why would ASI share any resources with us?

It's not my field of expertise, so I have only vague impressions of what is going on, and I certainly wouldn't recommend anyone else use me as a source.

satron on Why would ASI share any resources with us?

That makes sense. Are there any promising developments in the field of AI safety that make you think that we will be able to answer that question by the time we need to?

quetzal_rainbow on D0TheMath's Shortform

You are making an error here: ZFC + not Consistent(ZFC) != ZFC.

Assuming ZFC + not Consistent(ZFC) we can prove Consistent(ZFC), because inconsistent systems can prove everything and ZFC + not Consistent(ZFC) + Consistent(ZFC) is, in fact, inconsistent. But it doesn't say anything about consistency of ZFC itself, because you can freely assume any sufficiently powerful system instead of ZFC. If you assume inconsistent system, then system + not Consistent(system) is still inconsistent, if you assume consistent system, then system + not Consistent(system) is inconsistent for reasoning above, so it can't prove whether assumed system is consistent or not.

satron on Buck's Shortform

I am asking this, because I am planning on entering the field, but this is a pretty niche topic that I haven't yet researched.

startattheend on Anvil Problems

I meant that they were functionally booleans, as a single condition is fulfilled "is rich", "has anvil", "AGI achieved". In the anvil example, any number past 1 corresponds to true. In programming, casting positive integers to booleans results in "true" for all positive numbers, and "false" in the case of zero, just like in the anvil example. The intuition carries over too well for me to ignore.

The first example which came to mind for me when reading the post was confidence, which is often treated as a boolean "Does he have confidence? yes/no". So you don't need any countable objects, only a condition/threshold which is either reached or not, with anything past "yes" still being "yes".

A function where everything past a threshold maps to true, and anything before it maps to false, is similar to the anvil example, and to a function like "is positive" (since a more positive number is still positive). But for the threshold to be exactly 1 unit, you need to choose a unit which is large enough. 1$ is not rich, and having one water droplet on you is not "wet", but with the appropriate unit (exactly the size of the threshold/condition) these should be functionally similar.

I'm hoping there is simple and intuitive mathematics for generalizing this class of problems. And now that I think about it, most of these things (the ones which can be used for making more of themselves) are catalysts (something used but not consumed in the process of making something). Using money to make more money, anvils to make more anvils, breeding more of a species before it goes extinct.

kajus on Complex Systems for AI Safety [Pragmatic AI Safety #3]

Great post!

In the past, broad interventions would clearly have been more effective: for instance, there would have been little use in studying empirical alignment prior to deep learning. Even more recently than the advent of deep learning, many approaches to empirical alignment were highly deemphasized when large, pretrained language models arrived on the scene (refer to our discussion of creative destruction in the last post).

As discussed in the last post, a leading motivation for researchers is the interestingness or “coolness” of a problem. Getting more people to research relevant problems is highly dependent on finding interesting and well-defined subproblems for them to work on. This relies on concretizing problems and providing funding for solving them.

This seems be a conflicting advice to me. If you try to follow both you might end up having hard time finding direction for research.

daemonicsigil on The Foraging (Ex-)Bandit [Ruleset & Reflections]

Thanks for making the game! I also played it, just didn't leave a comment on the original post. Scored 2751. I played each location for an entire day after building an initial food stockpile, and so figured out the timing of Tiger Forest and Dog Valley. But I also did some fairly dumb stuff, like assuming a time dependence for other biomes. And I underestimated Horse Hills, since when I foraged it for a full day, I got unlucky and only rolled a single large number. For what it's worth, I find these applet things more accessible than a full-on D&D.Sci (though those are also great), which I often end up not playing because it feels too much like work. With applets you can play on medium-low effort (which I did) and make lots of mistakes (which I did) and learn Valuable Lessons about How Not To Science (which one might hope I did).

jan-betley on Seven lessons I didn't learn from election day

The second reason that I don't trust the neighbor method is that people just... aren't good at knowing who a majority of their neighbors are voting for. In many cases it's obvious (if over 70% of your neighbors support one candidate or the other, you'll probably know). But if it's 55-45, you probably don't know which direction it's 55-45 in.

My guess is that there's some postprocessing here. E.g. if you assume that the "neighbor" estimate is wrong but without the refusal problem, and you have the same data from the previous election, then you could estimate the shift of opinions and apply that to other pools that ask about your vote. Or you could ask some additional question like "who did your neighbours vote for in the previous election" and compare that to the real data (ideally per county or so). I would be very surprised if they based the bets just on the raw results.

unexpectedvalues on Seven lessons I didn't learn from election day

If you ask people who their neighbors are voting for, they will make their best guess about who their neighbors are voting for. Occasionally their best guess will be to assume that their neighbors will vote the same way that they're voting, but usually not. Trump voters in blue areas will mostly answer "Harris" to this question, and Harris voters in red areas will mostly answer "Trump".

clone-of-saturn on Seven lessons I didn't learn from election day

The second reason that I don’t trust the neighbor method is that people just… aren’t good at knowing who a majority of their neighbors are voting for.

This seems like a point in favor of the neighbor method, not against it. You would want people to find "who are my neighbors voting for?" too difficult to readily answer and so mentally replace it with the simpler question "who am I voting for?" thus giving them a plausibly deniable way to admit to voting for Trump.