LessWrong 2.0 Reader

[question] What epsilon do you subtract from "certainty" in your own probability estimates?
Dagon · 2024-11-26T19:13:46.795Z · answers+comments (6)

Is the mind a program?
EuanMcLean (euanmclean) · 2024-11-28T09:42:02.892Z · comments (60)

Importing Bluesky Comments
jefftk (jkaufman) · 2024-11-28T03:50:06.635Z · comments (0)

The first AGI may be a good engineer but bad strategist
Knight Lee (Max Lee) · 2024-12-09T06:34:54.082Z · comments (2)

Lenses of Control
WillPetillo · 2024-10-22T07:51:06.355Z · comments (0)

The low Information Density of Eliezer Yudkowsky & LessWrong
Felix Olszewski (quick-maths) · 2024-12-30T19:43:59.355Z · comments (7)

How I saved 1 human life (in expectation) without overthinking it
Christopher King (christopher-king) · 2024-12-22T20:53:13.492Z · comments (0)

Inverse Problems In Everyday Life
silentbob · 2024-10-15T11:42:30.276Z · comments (2)

What can we learn from insecure domains?
Logan Zoellner (logan-zoellner) · 2024-11-01T23:53:30.066Z · comments (21)

Secular Solstice Songbook Update
jefftk (jkaufman) · 2024-11-17T17:30:07.404Z · comments (2)

[question] How can we prevent AGI value drift?
Dakara (chess-ice) · 2024-11-20T18:19:24.375Z · answers+comments (5)

AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
DanielFilan · 2024-11-14T07:00:06.977Z · comments (0)

Don’t Legalize Drugs
Declan Molony (declan-molony) · 2025-01-14T06:51:14.005Z · comments (5)

Backdoors have universal representations across large language models
Amirali Abdullah (amirali-abdullah) · 2024-12-06T22:56:33.519Z · comments (0)

Dance Differentiation
jefftk (jkaufman) · 2024-11-15T02:30:07.694Z · comments (0)

[link] I, Token
Ivan Vendrov (ivan-vendrov) · 2024-11-25T02:20:35.629Z · comments (2)

[link] Disentangling Representations through Multi-task Learning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-24T13:10:26.307Z · comments (1)

Crosspost: Developing the middle ground on polarized topics
juliawise · 2024-11-25T14:39:53.041Z · comments (16)

[link] Is AI Hitting a Wall or Moving Faster Than Ever?
garrison · 2025-01-09T22:18:51.497Z · comments (3)

[question] How can humanity survive a multipolar AGI scenario?
[deleted] · 2025-01-09T20:17:40.143Z · answers+comments (8)

[question] Why is Gemini telling the user to die?
Burny · 2024-11-18T01:44:12.583Z · answers+comments (1)

[link] Do humans really learn from "little" data?
Alice Wanderland (alice-wanderland) · 2025-01-14T10:46:09.179Z · comments (2)

[question] Is AI alignment a purely functional property?
Roko · 2024-12-15T21:42:50.674Z · answers+comments (7)

[link] [Linkpost] Building Altruistic and Moral AI Agent with Brain-inspired Affective Empathy Mechanisms
Gunnar_Zarncke · 2024-11-04T10:15:35.550Z · comments (0)

Near term discussions need something smaller and more concrete than AGI
ryan_b · 2025-01-11T18:24:58.283Z · comments (0)

Low-effort review of "AI For Humanity"
Charlie Steiner · 2024-12-11T09:54:42.871Z · comments (0)

Goal: Understand Intelligence
Johannes C. Mayer (johannes-c-mayer) · 2024-11-03T21:20:02.900Z · comments (19)

[link] AISN #45: Center for AI Safety 2024 Year in Review
Corin Katzke (corin-katzke) · 2024-12-19T18:15:56.416Z · comments (0)

Registrations Open for 2024 NYC Secular Solstice & Megameetup
Joe Rogero · 2024-11-12T17:50:10.827Z · comments (0)

What You Can Give Instead of Advice
Karl Faulks (karl-faulks) · 2024-10-24T23:10:48.014Z · comments (2)

Mid-Generation Self-Correction: A Simple Tool for Safer AI
MrThink (ViktorThink) · 2024-12-19T23:41:00.702Z · comments (0)

Paraddictions: unreasonably compelling behaviors and their uses
Michael Cohn (michael-cohn) · 2024-11-22T20:53:59.479Z · comments (0)

Paper club: He et al. on modular arithmetic (part I)
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-13T11:18:44.738Z · comments (0)

[link] Progress links and short notes, 2025-01-13
jasoncrawford · 2025-01-13T18:35:21.426Z · comments (2)

[link] The lying p value
kqr · 2024-11-12T06:12:59.934Z · comments (7)

Curriculum of Ascension
andrew sauer (andrew-sauer) · 2024-11-07T23:54:18.983Z · comments (0)

Comparing the AirFanta 3Pro to the Coway AP-1512
jefftk (jkaufman) · 2024-12-16T01:40:01.522Z · comments (0)

A pragmatic story about where we get our priors
Fiora Sunshine (Fiora from Rosebloom) · 2025-01-02T10:16:54.019Z · comments (6)

Robbin's Farm Sledding Route
jefftk (jkaufman) · 2024-12-21T22:10:01.175Z · comments (1)

Lecture Series on Tiling Agents
abramdemski · 2025-01-14T21:34:03.907Z · comments (0)

LLM Psychometrics and Prompt-Induced Psychopathy
Korbinian K. (korbinian-koch) · 2024-10-18T18:11:24.256Z · comments (2)

[link] AI Prejudices: Practical Implications
PeterMcCluskey · 2024-10-19T02:19:58.695Z · comments (0)

Motte-and-Bailey: a Short Explanation
Lorec · 2024-10-23T22:29:55.074Z · comments (0)

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More
Sharat Jacob Jacob (sharat-jacob-jacob) · 2024-10-29T12:41:30.337Z · comments (0)

ML4Good (AI Safety Bootcamp) - Experience report
JanEbbing · 2024-11-05T01:18:43.554Z · comments (0)

GPT-4o Can In Some Cases Solve Moderately Complicated Captchas
dirk (abandon) · 2024-11-09T04:04:37.782Z · comments (2)

AXRP Episode 38.1 - Alan Chan on Agent Infrastructure
DanielFilan · 2024-11-16T23:30:09.098Z · comments (0)

Sideloading: creating a model of a person via LLM with very large prompt
avturchin · 2024-11-22T16:41:28.293Z · comments (4)

Reflections on ML4Good
james__p · 2024-11-25T02:40:32.586Z · comments (0)

Commenting Patterns by Platform
jefftk (jkaufman) · 2024-12-01T11:50:06.932Z · comments (0)

Recent comments

seth-herd on Thane Ruthenis's Shortform

I agree that chatbot progress is probably not existentially threatening. But it's all too short a leap to making chatbots power general agents. The labs have claimed to be willing and enthusiastic about moving to an agent paradigm. And I'm afraid that a proliferation of even weakly superhuman or even roughly parahuman agents could be existentially threatening.

I spell out my logic for how short the leap might be from current chatbots to takeover-capable AGI agents in my argument for short timelines being quite possible [LW(p) · GW(p)]. I do think we've still got a good shot of aligning that type of LLM agent AGI since it's a nearly best-case scenario. RL even in o1 is really mostly used for making it accurately follow instructions, which is at least roughly the ideal alignment goal of Corrigibility as Singular Target [LW · GW]. Even if we lose faithful chain of thought and orgs don't take alignment that seriously, I think those advantages of not really being a maximizer and having corrigibility might win out.

That, in combination with the slower takeoff, makes me tempted to believe it's actually a good thing if we forge forward, even though I'm not at all confident that this will actually get us aligned AGI or good outcomes. I just don't see a better realistic path.

declan-molony on Don’t Legalize Drugs

I did not replicate his argument in full. I merely selected interesting excerpts to comment upon.

The full essay can be read online here.

yonatan-cale-1 on Would catching your AIs trying to escape convince AI developers to slow down or undeploy?

Are you interested in having a prediction market about this that falls back on your judgement if the situation is unclear?

Something like "If it's publicly known that an AI lab 'caught the AI red-handed' (in the spirit of Redwood's Control agenda), will the lab temporarily shut down as Redwood suggested (as opposed to applying a small patch and continuing)?"

gurkenglas on I'm offering free math consultations for programmers!

Thanks, edited. If we keep this going we'll have more authors than users x)

daniel-herrmann on Chance is in the Map, not the Territory

I wouldn't say that is a clear exception. There are perfectly normal, subjective-probability ways to make sense of mixed strategies in game theory. For example, this paper by Aumann and Brandenburger provides epistemic conditions for Nash equilibria that don't require objective probabilities to randomize. From their paper:

"Mixed strategies are treated not as conscious randomizations, but as conjectures, on the part of other players, as to what a player will do." (p. 1161)

In slightly more detail:

"According to [our] view, players do not randomize; each player chooses some definite action. But other players need not know which one, and the mixture represents their uncertainty, their conjecture about his choice. This is the context of our main results, which provide sufficient conditions for a profile of conjectures to constitute a Nash equilibrium." (p. 1162)

Interestingly, this paper is very motivated by embedded agency type concerns. For example, on page 1174 they write:

"Though entirely apt, use of the term “state of the world” to include the actions of the players has perhaps caused confusion. In Savage (1954), the decision maker cannot affect the state; he can only react to it. While convenient in Savage’s one person context, this is not appropriate in the interactive, many-person world under study here. Since each player must take into account the actions of the others, the actions should be included in the description of the state. Also the plain, everyday meaning of the term “state of the world” includes one’s actions: Our world is shaped by what we do. It has been objected that prescribing what a player must do at a state takes away his freedom. This is nonsensical; the player may do what he wants. It is simply that whatever he does is part of the description of the state. If he wishes to do something else, he is heartily welcome to do it, but he thereby changes the state."

In general, getting back to reflective oracles, indeed I think that is one way that one might try to provide a formalism underlying some application of game theory! And I think it is very interesting. But, as the Aumann and Brandenburger paper shows, there are totally normal ways to do this without fundamental chance. They have some references in their paper to other papers with this perspective, and it forms one of many motivations for the approach of epistemic game theory.

And, in general, I would resist the inference from "this kind of reasoning requires the world to be a certain way" to "the world must be a certain way."

screwtape on Thinking By The Clock

(Self review) I stand by this post, I think it's an important idea, I think not enough people are using this technique, and this adds nothing but a different way of writing something that was already in the rationalist canon.

If you do not sometimes stop, start a timer, think for five minutes, come to a conclusion and then move on, I believe you are missing an important mental skill and you should fix that. This skill helps me. I have observed some of the most effective people I know personally use this skill. You should at least try it.

You know what followup work I want? I want a dozen different modes of this idea. A youtube video. The audio version is great. The fictional version in HPMOR is great. Can we get a goofy videogame that makes you use the pause button well? (I tried to get at this with Troll Timers. https://www.lesswrong.com/posts/fCg3pLZqthXsGznHP/troll-timers) [LW · GW] I should try rewriting this as a rousing speech. It'd be cool to have it as a catchy tune. Maybe someone should tiktok the sucker.

I'm not saying it's the most important idea! Just, you know, it's broadly applicable and any mistake you make by not thinking for five minutes when you are not actually under time pressure is a stupid mistake that makes beisutsukai-san disappointed in you.

If the Best Of LessWrong collection is just for things that add to the conversation, this post doesn't belong there. I'd give it a small positive vote if I could vote on it. On the other hand if nobody else has gotten a post about this concept into the Best Of LessWrong collection yet, and some newcomers might just read the Best Of LessWrong posts, then I do kinda want something on this topic to get in there.

knight-lee on The purposeful drunkard

I think there is a typo somewhere, probably because you switched whether the vectors were rows or columns.

Based on the dimensions of the matrices, it should be $X = M_{upd} \cdot S$

And $X_{cent} = M_{upd} S C$

And I think $X_{cent} X_{cent}^{T} = M_{upd} S C C^{T} S^{T} M_{upd}^{T}$

Instead of $X_{cent}^{T} X_{cent} = M_{upd}^{T} S^{T} C^{T} C S M_{upd}$

$S$ should still be upper triangular.

Though don't trust me either, I often do math in a hand-wavy fashion.

My intuition was that PCA selects the "angle" you view the data from which stretches out the data as much as possible, forcing the random walk to appear relatively straighter.

But somehow the random walk is smooth over a few data points, yet still turns back and forth over the duration of $T$. This contradicts my intuition and I have no idea what's going on.
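
For anyone who wants to check the dimensions numerically, here is a minimal NumPy sketch of the column-vector convention above. It rests on assumptions the comment does not spell out: $M_{upd}$ is taken to be a $d \times T$ matrix whose columns are the individual update steps, $S$ is a $T \times T$ upper-triangular matrix of ones (so right-multiplying by it gives cumulative sums), and $C = I - \frac{1}{T}\mathbf{1}\mathbf{1}^{T}$ is the usual centering matrix. The function name `centered_walk` and the sizes `d = 3`, `T = 50` are made up for illustration.

```python
import numpy as np

def centered_walk(m_upd):
    """Build X = M_upd S and X_cent = M_upd S C for a d x T update matrix.

    Assumed convention (not confirmed by the original post): each column of
    m_upd is one update step, so time runs along the columns.
    """
    d, T = m_upd.shape
    S = np.triu(np.ones((T, T)))           # upper triangular: column t sums steps 0..t
    C = np.eye(T) - np.ones((T, T)) / T    # centering: subtracts each row's time-mean
    X = m_upd @ S                          # positions of the walk, d x T
    X_cent = X @ C                         # centered positions, d x T
    return X, X_cent, S, C

rng = np.random.default_rng(0)
M_upd = rng.normal(size=(3, 50))           # hypothetical walk: d = 3, T = 50
X, X_cent, S, C = centered_walk(M_upd)

# d x d matrix from the comment: X_cent X_cent^T = M S C C^T S^T M^T
gram_dd = X_cent @ X_cent.T
expanded = M_upd @ S @ C @ C.T @ S.T @ M_upd.T

print(gram_dd.shape)                       # (3, 3)
print(np.allclose(gram_dd, expanded))      # True
```

Under these assumptions, `X_cent @ X_cent.T` is $d \times d$ while `X_cent.T @ X_cent` is $T \times T$, which is one quick way to see which of the two products a given convention calls for.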

curt-tigges on How do you deal w/ Super Stimuli?

I use Freedom and Limit on my computer and Stay Focused on my Android phone. The former two allow for a combination of complete blocking during certain time windows and time limits (for any website, even across browsers and even if you open an incognito window). The latter does both for my phone.

I block all social media and content during prime working hours and implement a 30-minute limit outside of that. It works pretty well. I may make it more strict because I sometimes find myself looking at Twitter, etc. when watching a TV show in the evenings.

I also use BlockTube to get rid of YouTube Shorts entirely from my web browser. They no longer show up in search results or in the menu.

Finally, I recommend the tools here, though I haven't tried all of them: https://liamrosen.com/2023/04/18/modding-social-media-to-win-the-attention-war/

cstinesublime on How do you deal w/ Super Stimuli?

I don't want to pretend that I'm someone who is immune to YouTube binges or similar behaviors. However, I am not sure why this is a problem, or what meaningful work this behavior was getting in the way of. Speaking for myself, 9 times out of 10, if I have a commitment the next morning, I won't stay up late on my computer because... I know I have a commitment at a set time. (If you forced me to hypothesize about the other 1 time in 10, I'd guess that stress-related anticipation means I couldn't sleep even if I did lie down, but that is just a wild guess.)

I'm also surprised to see how most of the solutions in the comments involve removing access to anything... doing something more productive. I think there is a difference between the nebulous guilt we feel about Opportunity Cost ("oh geez I could have used that time more effectively") and specific, tangible, realistic things we could have done but didn't. I often find that YouTube binges are a result of not being able to find those activities; the binges do not frustrate them.

I have perennially found that whatever vice (or as you call it 'hyperstimuli') I remove, I just replace it with another, but it's never a beneficial activity. (The one exception I can think of was when I stopped listening to music when I had a bout of insomnia and instead replaced it with lectures on Wittgenstein or Quantum Physics, because I figured "I might as well learn SOMETHING".)

This has caused me an incredible amount of frustration. For all the talk of "social media detox" and even the farcically named "dopamine detox" none seem to actually result in net increases in my well being.

Going back to what I said about specific, tangible, realistic alternatives: I have found that the only way to stop midway through a YouTube binge or an Instagram scroll is to be excited about a project that I have a lot of faith in my ability to complete, and a viable first step which I can do now.

This isn't fail-safe: if I'm writing a journal entry or an essay and I have to leave in 30 minutes, you bet your bottom dollar I'll be late, because I'll be so engrossed in that writing process. But that doesn't sound like a 'hyperstimuli'.

screwtape on In Defense of Parselmouths

(Self review) Do I stand by this post? Eh. Kinda sorta but I think it's incomplete.

I think there's something important in truth-telling, and getting everyone on the same page about what we mean by the truth. Since everyone will not just start telling the literal truth all the time and I don't even particularly want them to, we're going to need to have some norms and social lubricant around how to handle the things people say that aren't literal truth.

The first thing I disagree with when rereading it is that sometimes, even if someone is obviously and straightforwardly feeding me bullshit, I keep trying to tell the truth. Sometimes I try even harder to be precise and truthful. In a conversation with friends, I might say "that game's no fun" when the true and accurate statement is "I don't find that game fun." In a heated internet argument, I think it's useful to check my stance and use the latter kind of statement, even if the other person is saying things like "everyone who doesn't like that game is a moron."

Short of a complete guide to Truth, I'd settle for a practical "Here's how Screwtape regards the truth, read it and you'll understand when he'd say false things." This essay falls short of that.

I'd love more things in this genre. Meta-Honesty: Firming Up Honesty Around Its Edge Cases and The Onion Test For Personal And Institutional Honesty are both good examples of the genre. Even personal versions seem useful.

I think that makes this a replaceable essay. It would be fine in a Best Of collection, but it's not adding too much other than a few intuition pumps.