LessWrong 2.0 Reader

AI Safety Seed Funding Network - Join as a Donor or Investor
Alexandra Bos (AlexandraB) · 2024-12-16T19:30:43.812Z · comments (0)

A Principled Cartoon Guide to NVC
plex (ete) · 2025-01-07T21:01:07.904Z · comments (5)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

Announcing SPAR Summer 2024!
laurenmarie12 · 2024-04-16T08:30:31.339Z · comments (2)

Two Tales of AI Takeover: My Doubts
Violet Hour · 2024-03-05T15:51:05.558Z · comments (8)

"Full Automation" is a Slippery Metric
ozziegooen · 2024-06-11T19:56:49.855Z · comments (1)

A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)

[link] Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles
Zack_M_Davis · 2024-03-02T22:05:49.553Z · comments (25)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

Quick Thoughts on Our First Sampling Run
jefftk (jkaufman) · 2024-05-23T00:20:02.050Z · comments (3)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

[link] The Poker Theory of Poker Night
omark · 2024-04-07T09:47:01.658Z · comments (13)

[link] AI & wisdom 1: wisdom, amortised optimisation, and AI
L Rudolf L (LRudL) · 2024-10-28T21:02:51.215Z · comments (0)

End-to-end hacking with language models
tchauvin (timot.cool) · 2024-04-05T15:06:53.689Z · comments (0)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

Please Understand
samhealy · 2024-04-01T12:33:20.459Z · comments (11)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

[link] My MATS Summer 2023 experience
James Chua (james-chua) · 2024-03-20T11:26:14.944Z · comments (0)

[question] How does it feel to switch from earn-to-give?
Neil (neil-warren) · 2024-03-31T16:27:22.860Z · answers+comments (4)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

Offering Completion
jefftk (jkaufman) · 2024-06-07T01:40:02.137Z · comments (6)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

[link] GDP per capita in 2050
Hauke Hillebrandt (hauke-hillebrandt) · 2024-05-06T15:14:30.934Z · comments (8)

5. Open Corrigibility Questions
Max Harms (max-harms) · 2024-06-10T14:09:20.777Z · comments (0)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)

Scorable Functions: A Format for Algorithmic Forecasting
ozziegooen · 2024-05-21T04:14:11.749Z · comments (0)

Two Weeks Without Sweets
jefftk (jkaufman) · 2024-12-31T03:30:02.003Z · comments (0)

The new ruling philosophy regarding AI
Mitchell_Porter · 2024-11-11T13:28:24.476Z · comments (0)

You can validly be seen and validated by a chatbot
Kaj_Sotala · 2024-12-20T12:00:03.015Z · comments (3)

Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces
Matthew A. Clarke (Antigone) · 2024-12-20T15:16:51.857Z · comments (0)

Disagreement on AGI Suggests It’s Near
tangerine · 2025-01-07T20:42:43.456Z · comments (15)

D&D.Sci Hypersphere Analysis Part 1: Datafields & Preliminary Analysis
aphyer · 2024-01-13T20:16:39.480Z · comments (1)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (8)

[question] Weighing reputational and moral consequences of leaving Russia or staying
spza · 2024-02-18T19:36:40.676Z · answers+comments (24)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

Weekly newsletter for AI safety events and training programs
Bryce Robertson (bryceerobertson) · 2024-05-03T00:33:29.418Z · comments (0)

Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)

Experience Report - ML4Good AI Safety Bootcamp
Kieron Kretschmar · 2024-04-11T18:03:41.040Z · comments (0)

[link] What fuels your ambition?
Cissy · 2024-01-31T18:30:53.274Z · comments (1)

Investigating Bias Representations in LLMs via Activation Steering
DawnLu · 2024-01-15T19:39:14.077Z · comments (4)

Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)

[link] Abs-E (or, speak only in the positive)
dkl9 · 2024-02-19T21:14:32.095Z · comments (24)

AI #61: Meta Trouble
Zvi · 2024-05-02T18:40:03.242Z · comments (0)

Recent comments

seth-herd on Thane Ruthenis's Shortform

I agree that chatbot progress is probably not existentially threatening. But it's all too short a leap to making chatbots power general agents. The labs have claimed to be willing and enthusiastic about moving to an agent paradigm. And I'm afraid that a proliferation of even weakly superhuman or even roughly parahuman agents could be existentially threatening.

I spell out my logic for how short the leap might be from current chatbots to takeover-capable AGI agents in my argument for short timelines being quite possible. I do think we've still got a good shot at aligning that type of LLM agent AGI, since it's a nearly best-case scenario. RL, even in o1, is really mostly used to make it follow instructions accurately, which is at least roughly the ideal alignment goal of Corrigibility as Singular Target. Even if we lose faithful chain of thought and orgs don't take alignment that seriously, I think those advantages (not really being a maximizer, and having corrigibility) might win out.

That, in combination with the slower takeoff, makes me tempted to believe it's actually a good thing if we forge forward, even though I'm not at all confident that this will actually get us aligned AGI or good outcomes. I just don't see a better realistic path.

declan-molony on Don’t Legalize Drugs

I did not replicate his argument in full. I merely selected interesting excerpts to comment upon.

The full essay can be read online here.

yonatan-cale-1 on Would catching your AIs trying to escape convince AI developers to slow down or undeploy?

Are you interested in having a prediction market about this that falls back on your judgement if the situation is unclear?

Something like "If it's publicly known that an AI lab 'caught the AI red-handed' (in the spirit of Redwood's Control agenda), will the lab temporarily shut down as Redwood suggested (as opposed to applying a small patch and continuing)?"

gurkenglas on I'm offering free math consultations for programmers!

Thanks, edited. If we keep this going we'll have more authors than users x)

daniel-herrmann on Chance is in the Map, not the Territory

I wouldn't say that is a clear exception. There are perfectly normal, subjective-probability ways to make sense of mixed strategies in game theory. For example, this paper by Aumann and Brandenburger provides epistemic conditions for Nash equilibria that don't require players to randomize with objective probabilities. From their paper:

"Mixed strategies are treated not as conscious randomizations, but as conjectures, on the part of other players, as to what a player will do." (p. 1161)

In slightly more detail:

"According to [our] view, players do not randomize; each player chooses some definite action. But other players need not know which one, and the mixture represents their uncertainty, their conjecture about his choice. This is the context of our main results, which provide sufficient conditions for a profile of conjectures to constitute a Nash equilibrium." (p. 1162)

Interestingly, this paper is very motivated by embedded agency type concerns. For example, on page 1174 they write:

"Though entirely apt, use of the term “state of the world” to include the actions of the players has perhaps caused confusion. In Savage (1954), the decision maker cannot affect the state; he can only react to it. While convenient in Savage’s one person context, this is not appropriate in the interactive, many-person world under study here. Since each player must take into account the actions of the others, the actions should be included in the description of the state. Also the plain, everyday meaning of the term “state of the world” includes one’s actions: Our world is shaped by what we do. It has been objected that prescribing what a player must do at a state takes away his freedom. This is nonsensical; the player may do what he wants. It is simply that whatever he does is part of the description of the state. If he wishes to do something else, he is heartily welcome to do it, but he thereby changes the state."

In general, getting back to reflective oracles, I do think that is one way one might try to provide a formalism underlying some applications of game theory! And I think it is a very interesting one. But, as the Aumann and Brandenburger paper shows, there are totally normal ways to do this without fundamental chance. They have some references in their paper to other papers with this perspective, and it forms one of many motivations for the approach of epistemic game theory.

And, in general, I would resist the inference from "this kind of reasoning requires the world to be a certain way" to "the world must be a certain way."

screwtape on Thinking By The Clock

(Self review) I stand by this post. I think it's an important idea and that not enough people are using this technique, though it adds nothing but a different way of writing something that was already in the rationalist canon.

If you do not sometimes stop, start a timer, think for five minutes, come to a conclusion and then move on, I believe you are missing an important mental skill and you should fix that. This skill helps me. I have observed some of the most effective people I know personally use this skill. You should at least try it.

You know what follow-up work I want? I want a dozen different modes of this idea. A YouTube video. The audio version is great. The fictional version in HPMOR is great. Can we get a goofy video game that makes you use the pause button well? (I tried to get at this with Troll Timers. https://www.lesswrong.com/posts/fCg3pLZqthXsGznHP/troll-timers) I should try rewriting this as a rousing speech. It'd be cool to have it as a catchy tune. Maybe someone should TikTok the sucker.

I'm not saying it's the most important idea! Just, you know, it's broadly applicable and any mistake you make by not thinking for five minutes when you are not actually under time pressure is a stupid mistake that makes beisutsukai-san disappointed in you.

If the Best Of LessWrong collection is just for things that add to the conversation, this post doesn't belong there. I'd give it a small positive vote if I could vote on it. On the other hand if nobody else has gotten a post about this concept into the Best Of LessWrong collection yet, and some newcomers might just read the Best Of LessWrong posts, then I do kinda want something on this topic to get in there.

knight-lee on The purposeful drunkard

I think there is a typo somewhere, probably because you switched whether the vectors were rows or columns.

Based on the dimensions of the matrices, it should be $X = M_{\text{upd}} \cdot S$

And $X_{\text{cent}} = M_{\text{upd}} S C$

And I think $X_{\text{cent}} X_{\text{cent}}^{T} = M_{\text{upd}} S C C^{T} S^{T} M_{\text{upd}}^{T}$

Instead of $X_{\text{cent}}^{T} X_{\text{cent}} = M_{\text{upd}}^{T} S^{T} C^{T} C S M_{\text{upd}}$

$S$ should still be upper triangular.

Though don't trust me either, I often do math in a hand-wavy fashion.
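A minimal numerical sanity check of the transpose algebra in this comment, written in plain Python so it needs no libraries. The shapes are assumptions inferred from the comment, not taken from the original post: M_upd is treated as d x T, S as an upper-triangular T x T matrix, and C as the standard T x T centering matrix, with random placeholder entries. The identity being checked holds for any conformable matrices; the last lines just illustrate the "based on the dimensions" point, namely that the other ordering is not even defined when d != T.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows; shapes must be conformable."""
    if len(a[0]) != len(b):
        raise ValueError(f"shape mismatch: {len(a)}x{len(a[0])} times {len(b)}x{len(b[0])}")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def centering_matrix(n):
    # C = I - (1/n) * ones; right-multiplying by C subtracts each row's mean.
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

rng = random.Random(0)
d, T = 3, 5                                  # hypothetical sizes: d dimensions, T time steps
M_upd = random_matrix(d, T, rng)             # assumed shape d x T
S = [[rng.uniform(-1.0, 1.0) if i <= j else 0.0 for j in range(T)]
     for i in range(T)]                      # T x T, upper triangular
C = centering_matrix(T)                      # T x T

MS = matmul(M_upd, S)                        # d x T
X_cent = matmul(MS, C)                       # d x T, each row now has mean zero
lhs = matmul(X_cent, transpose(X_cent))      # X_cent X_cent^T, d x d
rhs = matmul(matmul(matmul(MS, C), transpose(C)),
             matmul(transpose(S), transpose(M_upd)))  # M S C C^T S^T M^T
print(max_abs_diff(lhs, rhs) < 1e-9)  # True

# The other ordering, M_upd^T S^T ..., is not conformable when d != T:
try:
    matmul(transpose(M_upd), transpose(S))   # (T x d) times (T x T)
except ValueError as err:
    print("not conformable:", err)  # not conformable: shape mismatch: 5x3 times 5x5
```

Because this uses random matrices, it only confirms the algebraic identity under the assumed shapes; it says nothing about which convention the original post intended.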

My intuition was that PCA selects the "angle" you view the data from which stretches out the data as much as possible, forcing the random walk to appear relatively straighter.

But somehow the random walk is smooth over a few data points, yet still turns back and forth over the duration $T$. This contradicts my intuition, and I have no idea what's going on.

curt-tigges on How do you deal w/ Super Stimuli?

I use Freedom and Limit on my computer and Stay Focused on my Android phone. The former two allow for a combination of complete blocking during certain time windows and time limits (for any website, even across browsers and even if you open an incognito window). The latter does both for my phone.

I block all social media and content during prime working hours and implement a 30-minute limit outside of that. It works pretty well. I may make it stricter, because I occasionally find myself looking at Twitter, etc. when watching a TV show in the evenings.

I also use BlockTube to get rid of YouTube Shorts entirely from my web browser. They no longer show up in search results or in the menu.

Finally, I recommend the tools here, though I haven't tried all of them: https://liamrosen.com/2023/04/18/modding-social-media-to-win-the-attention-war/

cstinesublime on How do you deal w/ Super Stimuli?

I don't want to pretend that I'm someone who is immune to YouTube binges or similar behaviors. However, I am not sure why this is a problem or what meaningful work this behavior was getting in the way of. Speaking for myself, 9 times out of 10, if I have a commitment the next morning, I won't stay up late on my computer because... I know I have a commitment at a set time. (If you forced me to hypothesize why I don't that 1 time in 10, I'd guess that stress-related anticipation means I couldn't sleep even if I did lie down, but that is just a wild guess.)

I'm also surprised to see that most of the solutions in the comments involve removing access to anything... doing something more productive. I think there is a difference between the nebulous guilt we feel about opportunity cost ("oh geez, I could have used that time more effectively") and specific, tangible, realistic things we could have done but didn't. I often find that YouTube binges are a result of not being able to find those activities; they do not frustrate them.

I have perennially found that whatever vice (or, as you call it, 'hyperstimuli') I remove, I just replace with another, and it's never a beneficial activity. (The one exception I can think of was when I had a bout of insomnia and stopped listening to music, replacing it with lectures on Wittgenstein or quantum physics, because I figured "I might as well learn SOMETHING".)

This has caused me an incredible amount of frustration. For all the talk of "social media detox" and even the farcically named "dopamine detox", none of them seem to actually result in net increases in my well-being.

Going back to what I said about specific, tangible, realistic alternatives: I have found that the only way to stop midway through a YouTube binge or an Instagram scroll is to be excited about a project that I have a lot of faith in my ability to complete, with a viable first step I can take now.

This isn't fail-safe: if I'm writing a journal entry or an essay and I have to leave in 30 minutes, you bet your bottom dollar I'll be late, because I'll be so engrossed in the writing process. But that doesn't sound like 'hyperstimuli'.

screwtape on In Defense of Parselmouths

(Self review) Do I stand by this post? Eh. Kinda sorta but I think it's incomplete.

I think there's something important in truth-telling, and getting everyone on the same page about what we mean by the truth. Since everyone will not just start telling the literal truth all the time and I don't even particularly want them to, we're going to need to have some norms and social lubricant around how to handle the things people say that aren't literal truth.

The first thing I disagree with when rereading it: sometimes, even if someone is obviously and straightforwardly feeding me bullshit, I keep trying to tell the truth. Sometimes I try even harder to be precise and truthful. In a conversation with friends, I might say "that game's no fun" when the true and accurate statement is "I don't find that game fun." In a heated internet argument, I think it's useful to check my stance and use the latter kind of statement, even if the other person is saying things like "everyone who doesn't like that game is a moron."

Short of a complete guide to Truth, I'd settle for a practical "Here's how Screwtape regards the truth, read it and you'll understand when he'd say false things." This essay falls short of that.

I'd love more things in this genre. "Meta-Honesty: Firming Up Honesty Around Its Edge Cases" and "The Onion Test For Personal And Institutional Honesty" are both good examples of the genre. Even personal versions seem useful.

I think that makes this a replaceable essay. It would be fine in a Best Of collection, but it's not adding too much other than a few intuition pumps.