LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)

[link] Are There Examples of Overhang for Other Technologies?
Jeffrey Heninger (jeffrey-heninger) · 2023-12-13T21:48:08.954Z · comments (50)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

[link] shoes with springs
bhauth · 2023-12-30T21:46:55.319Z · comments (6)

AI #48: Exponentials in Geometry
Zvi · 2024-01-18T14:20:07.869Z · comments (9)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

Does AI risk “other” the AIs?
Joe Carlsmith (joekc) · 2024-01-09T17:51:47.020Z · comments (3)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (7)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

What's next for the field of Agent Foundations?
Nora_Ammann · 2023-11-30T17:55:13.982Z · comments (23)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (14)

[link] Talk: "AI Would Be A Lot Less Alarming If We Understood Agents"
johnswentworth · 2023-12-17T23:46:32.814Z · comments (3)

[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (22)

Voting Results for the 2022 Review
Ben Pace (Benito) · 2024-02-02T20:34:59.768Z · comments (3)

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (Daniel Braun) · 2024-05-17T16:25:02.267Z · comments (10)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

Some negative steganography results
Fabien Roger (Fabien) · 2023-12-09T20:22:52.323Z · comments (5)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

minusgix on Jimrandomh's Shortform

I agree that it is easy to automatically lump the two concepts together.

I think another important part of this is that there are limited methods for most consumers to coordinate against companies to lower their prices. There's shopping elsewhere, leaving a bad review, or moral outrage. The last may have a chance of blowing up socially, such as becoming a boycott (but boycotts are often considered ineffective), or it may encourage the government to step in. In our current environment, the government often operates as the coordination method to punish companies for behaving in ways that people don't want. In a much more libertarian society we would want this replaced with other methods, so that consumers can make it harder to put themselves in a prisoner's dilemma or stag hunt against each other.

If we had common organizations for more mild coordination than the state interfering, then I believe this would improve the default mentality because there would be more options.

viliam on Jimrandomh's Shortform

I do not watch this topic closely, and have never played a game with a DLC. Speaking as an old gamer, it reminds me of the "shareware" concept, where companies e.g. released the first 10 levels of their game for free, and you could buy a full version that contained those 10 levels + 50 more levels. (In modern speech, that would make the remaining 50 levels a "DLC", kind of.)

I also see some differences:

First, the original game is not free. So you kinda pay for a product, only to be told afterwards that to enjoy the full experience, you need to pay again. Do we have this kind of "you only figure out the full price gradually, after you have already paid a part" in other businesses, and how do their customers tolerate it?

Second, somehow the entire setup works differently; I can't pinpoint it, but it feels obvious. In the days of shareware, the authors tried to make the experience of the free levels as great as possible, so that the customers would be motivated to pay for more of it. These days (but now I am speaking mostly about mobile games, that's the only kind I play recently -- so maybe it feels different there), the mechanism is more like: "the first three levels are nice, then the game gets shitty on purpose, and offers you to pay to make it playable again". For the customer, this feels like extortion, rather than "it's so great that I want more of it". Also, the usual problems with extortion: by paying once you send a strong signal that you are the kind of a person who pays when extorted, so obviously the game will soon require you to pay again, even more this time. (So unlike "get 10 levels for free, then get an offer of 50 more levels for $20", the dynamics is more like "get 20 levels, after level 10 get a surprise message that you need to pay $1 to play further, after level 13 get asked to pay $10, after level 16 get asked to pay $100, and after level 19 get asked to pay $1000 for the final level".)

The situation with desktop games is not as bad as with mobile games, as far as I know, but I can imagine gamers overreacting in order to prevent a slippery slope that would get them into the same situation.

cubefox on Jimrandomh's Shortform

This might be a possible solution to the "supply-demand paradox": sometimes things (e.g. concert or soccer tickets, new playstations) are sold at a price such that the demand far outweighs the supply. Standard economic theory predicts that the price would be increased in such cases.

viliam on Alex K. Chen's Shortform

Is there a simple way to jailbreak the models, such as asking them to talk about a hypothetical parallel universe which is exactly like ours (same biology, same history), except that in the parallel universe humans can have different abilities and competences?

viliam on avturchin's Shortform

Sounds similar to the kind of logic that makes salmonellosis 10x more frequent in America than in Europe.

On one hand, yes, the optimal number of people dying from farm-produced diseases is greater then zero, and overreaction could cause net harm.

On the other hand, it feels like the final decision should be made in some way better than "the farmers lobby declares the topic taboo, and enforces the taboo across the nation", because the one-sided incentives are obvious.

habryka4 on is it possible to comment anonymously on a post?

Yeah, the officially approved way is to just generate an alt.

(But please don't vote with multiple accounts, we will probably notice, and we will ban you if we do)

lsusr on is it possible to comment anonymously on a post?

No, but you can create an alt account.

alenoach on Are we dropping the ball on Recommendation AIs?

What do you think is the main issue preventing companies from making more ethical recommendation algorithms? Is it the difficulty of determining objectively what is accurate and ethical? Or is it more about the incentives, like an unwillingness to sacrifice addictiveness and part of their audience?

zack_m_davis on Claude Sonnet 3.5.1 and Haiku 3.5

The next major update can be Claude 4.0 (and Gemini 2.0) and after that we all agree to use actual normal version numbering rather than dating?

Date-based versions aren't the most popular, but it's not an unheard of thing that Anthropic just made up: see CalVer, as contrasted to SemVer. (For things that change frequently in small ways, it's convenient to just slap the date on it rather than having to soul-search about whether to increment the second or the third number.)

nathan-helm-burger on Claude Sonnet 3.5.1 and Haiku 3.5

For raw IQ, sure. I just mean "conversational flavor".