LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

[link] NAO Updates, January 2025
jefftk (jkaufman) · 2025-01-10T03:37:36.698Z · comments (0)
LessWrong audio: help us choose the new voice
PeterH · 2024-12-11T02:24:37.026Z · comments (0)
Who is marketing AI alignment?
MrThink (ViktorThink) · 2025-01-19T21:37:30.477Z · comments (3)
Patent Trolling to Save the World
Double · 2025-01-17T04:13:46.768Z · comments (7)
Evolution's selection target depends on your weighting
tailcalled · 2024-11-19T18:24:53.117Z · comments (22)
Review: The Lathe of Heaven
dr_s · 2025-01-31T08:10:58.673Z · comments (0)
Improving Our Safety Cases Using Upper and Lower Bounds
Yonatan Cale (yonatan-cale-1) · 2025-01-16T00:01:49.043Z · comments (0)
[question] Why do futurists care about the culture war?
Knight Lee (Max Lee) · 2025-01-14T07:35:05.136Z · answers+comments (22)
Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker
Daniel Herrmann (Whispermute) · 2025-02-04T20:34:22.625Z · comments (4)
Boston Secular Solstice 2024: Call for Singers and Musicians
jefftk (jkaufman) · 2024-11-15T13:50:07.827Z · comments (0)
Why Isn't Tesla Level 3?
jefftk (jkaufman) · 2024-12-11T14:50:01.159Z · comments (7)
Plausibly Factoring Conjectures
Quinn (quinn-dougherty) · 2024-11-22T20:11:56.479Z · comments (1)
[question] Can we infer the search space of a local optimiser?
Lucius Bushnaq (Lblack) · 2025-02-03T10:17:01.661Z · answers+comments (5)
The Type of Writing that Pushes Women Away
Dahlia (sdjfhkj-dkjfks) · 2025-01-08T18:54:52.070Z · comments (4)
The average rationalist IQ is about 122
Rockenots (Ekefa) · 2024-12-28T15:42:07.067Z · comments (23)
Fluoridation: The RCT We Still Haven't Run (But Should)
ChristianKl · 2025-01-11T21:02:47.483Z · comments (5)
[link] Job Opening: SWE to help improve grant-making software
Ethan Ashkie (ethan-ashkie-1) · 2025-01-08T00:54:22.820Z · comments (1)
[question] Is weak-to-strong generalization an alignment technique?
cloud · 2025-01-31T07:13:03.332Z · answers+comments (1)
Electric Grid Cyberattack: An AI-Informed Threat Model
moonlightmaze · 2024-11-11T21:34:17.190Z · comments (0)
Is AI Physical?
Lauren Greenspan (LaurenGreenspan) · 2025-01-14T21:21:39.999Z · comments (5)
[question] Meal Replacements in 2025?
alkjash · 2025-01-06T15:37:25.041Z · answers+comments (9)
so you have a chronic health issue
agencypilled · 2025-01-26T19:00:29.972Z · comments (9)
The Alignment Mapping Program: Forging Independent Thinkers in AI Safety - A Pilot Retrospective
Alvin Ånestrand (alvin-anestrand) · 2025-01-10T16:22:16.905Z · comments (0)
Filled Cupcakes
jefftk (jkaufman) · 2024-11-26T03:20:08.504Z · comments (2)
SAE regularization produces more interpretable models
Peter Lai (peter-lai) · 2025-01-28T20:02:56.662Z · comments (6)
Text Posts from the Kids Group: 2018
jefftk (jkaufman) · 2024-11-23T12:50:05.325Z · comments (0)
Broken Latents: Studying SAEs and Feature Co-occurrence in Toy Models
chanind · 2024-12-30T22:50:54.964Z · comments (3)
A hierarchy of disagreement
Adam Zerner (adamzerner) · 2025-01-23T03:17:59.051Z · comments (4)
Balsa Research 2024 Update
Zvi · 2024-12-03T12:30:06.829Z · comments (0)
Can we rescue Effective Altruism?
Elizabeth (pktechgirl) · 2025-01-09T16:40:02.405Z · comments (0)
[link] Are we trying to figure out if AI is conscious?
Kristaps Zilgalvis (kristaps-zilgalvis-1) · 2025-01-27T01:05:07.001Z · comments (6)
Magnitudes: Let's Comprehend the Incomprehensible!
joec · 2024-12-01T03:08:46.503Z · comments (8)
Non-Obvious Benefits of Insurance
jefftk (jkaufman) · 2024-12-23T03:40:02.184Z · comments (5)
Open Thread Winter 2024/2025
habryka (habryka4) · 2024-12-25T21:02:41.760Z · comments (24)
Long Live the Usurper
pleiotroth · 2024-11-27T12:10:51.025Z · comments (0)
The absolute basics of representation theory of finite groups
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-08T09:47:13.136Z · comments (1)
[link] I read every major AI lab’s safety plan so you don’t have to
sarahhw · 2024-12-16T18:51:38.499Z · comments (0)
Gwerns
Tomás B. (Bjartur Tómas) · 2024-11-16T14:31:57.791Z · comments (2)
Information Versus Action
Screwtape · 2025-02-04T05:13:55.192Z · comments (0)
AXRP Episode 38.3 - Erik Jenner on Learned Look-Ahead
DanielFilan · 2024-12-12T05:40:06.835Z · comments (0)
[link] It looks like there are some good funding opportunities in AI safety right now
Benjamin_Todd · 2024-12-22T12:41:02.151Z · comments (0)
AI Strategy Updates that You Should Make
Alice Blair (Diatom) · 2025-01-27T21:10:41.838Z · comments (2)
Latent Adversarial Training (LAT) Improves the Representation of Refusal
alexandraabbas · 2025-01-06T10:24:53.419Z · comments (6)
[link] Announcement: AI for Math Fund
sarahconstantin · 2024-12-05T18:33:13.556Z · comments (9)
Grading my 2024 AI predictions
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-02T05:01:46.587Z · comments (1)
Why abandon “probability is in the mind” when it comes to quantum dynamics?
Maxwell Peterson (maxwell-peterson) · 2025-01-14T15:53:55.548Z · comments (17)
Economic Post-ASI Transition
Joel Burget (joel-burget) · 2025-01-01T22:37:31.722Z · comments (11)
A Generalization of the Good Regulator Theorem
Alfred Harwood · 2025-01-04T09:55:25.432Z · comments (6)
Definition of alignment science I like
quetzal_rainbow · 2025-01-06T20:40:38.187Z · comments (0)