LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

Natural abstractions are observer-dependent: a conversation with John Wentworth
Martín Soto (martinsq) · 2024-02-12T17:28:38.889Z · comments (13)

Tort Law Can Play an Important Role in Mitigating AI Risk
Gabriel Weil (gabriel-weil) · 2024-02-12T17:17:59.135Z · comments (9)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

Your LLM Judge may be biased
Henry Papadatos (henry) · 2024-03-29T16:39:22.534Z · comments (9)

Principles For Product Liability (With Application To AI)
johnswentworth · 2023-12-10T21:27:41.403Z · comments (55)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

[link] Twitter thread on AI takeover scenarios
Richard_Ngo (ricraz) · 2024-07-31T00:24:33.866Z · comments (0)

[link] The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit
Jonathan Moregård (JonathanMoregard) · 2024-01-05T18:27:01.769Z · comments (20)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · 2023-12-04T22:58:40.005Z · comments (0)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)

Games for AI Control
charlie_griffin (cjgriffin) · 2024-07-11T18:40:50.607Z · comments (0)

Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · 2023-12-22T18:48:55.983Z · comments (11)

Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

Reflections on the Metastrategies Workshop
gw · 2024-10-24T18:30:46.255Z · comments (5)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

What is wisdom?
TsviBT · 2023-11-14T02:13:49.681Z · comments (3)

Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs
Burny · 2023-11-23T03:16:09.358Z · comments (25)

The Defence production act and AI policy
[deleted] · 2024-03-01T14:26:09.064Z · comments (0)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

Enhancing intelligence by banging your head on the wall
Bezzi · 2023-12-12T21:00:48.584Z · comments (26)

Deconfusing In-Context Learning
Arjun Panickssery (arjun-panickssery) · 2024-02-25T09:48:17.690Z · comments (1)

AI #49: Bioweapon Testing Begins
Zvi · 2024-02-01T15:30:04.690Z · comments (11)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

I’m confused about innate smell neuroanatomy
Steven Byrnes (steve2152) · 2023-11-28T20:49:13.042Z · comments (1)

On DeepMind’s Frontier Safety Framework
Zvi · 2024-06-18T13:30:21.154Z · comments (4)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (10)

Medical Roundup #2
Zvi · 2024-04-09T13:40:05.908Z · comments (18)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-01-02T18:15:54.168Z · comments (0)

UDT1.01: The Story So Far (1/10)
Diffractor · 2024-03-27T23:22:35.170Z · comments (6)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

minusgix on Jimrandomh's Shortform

I agree that it is easy to automatically lump the two concepts together.

I think another important part of this is that there are limited methods for most consumers to coordinate against companies to lower their prices. There's shopping elsewhere, leaving a bad review, or moral outrage. The last may have a chance of blowing up socially, such as becoming a boycott (but boycotts are often considered ineffective), or it may encourage the government to step in. In our current environment, the government often operates as the coordination method to punish companies for behaving in ways that people don't want. In a much more libertarian society we would want this replaced with other methods, so that consumers can make it harder to put themselves in a prisoner's dilemma or stag hunt against each other.

If we had common organizations for more mild coordination than the state interfering, then I believe this would improve the default mentality because there would be more options.

viliam on Jimrandomh's Shortform

I do not watch this topic closely, and have never played a game with a DLC. Speaking as an old gamer, it reminds me of the "shareware" concept, where companies e.g. released the first 10 levels of their game for free, and you could buy a full version that contained those 10 levels + 50 more levels. (In modern speech, that would make the remaining 50 levels a "DLC", kind of.)

I also see some differences:

First, the original game is not free. So you kinda pay for a product, only to be told afterwards that to enjoy the full experience, you need to pay again. Do we have this kind of "you only figure out the full price gradually, after you have already paid a part" in other businesses, and how do their customers tolerate it?

Second, somehow the entire setup works differently; I can't pinpoint it, but it feels obvious. In the days of shareware, the authors tried to make the experience of the free levels as great as possible, so that the customers would be motivated to pay for more of it. These days (but now I am speaking mostly about mobile games, that's the only kind I play recently -- so maybe it feels different there), the mechanism is more like: "the first three levels are nice, then the game gets shitty on purpose, and offers you to pay to make it playable again". For the customer, this feels like extortion, rather than "it's so great that I want more of it". Also, the usual problems with extortion: by paying once you send a strong signal that you are the kind of a person who pays when extorted, so obviously the game will soon require you to pay again, even more this time. (So unlike "get 10 levels for free, then get an offer of 50 more levels for $20", the dynamics is more like "get 20 levels, after level 10 get a surprise message that you need to pay $1 to play further, after level 13 get asked to pay $10, after level 16 get asked to pay $100, and after level 19 get asked to pay $1000 for the final level".)

The situation with desktop games is not as bad as with mobile games, as far as I know, but I can imagine gamers overreacting in order to prevent a slippery slope that would get them into the same situation.

cubefox on Jimrandomh's Shortform

This might be a possible solution to the "supply-demand paradox": sometimes things (e.g. concert or soccer tickets, new playstations) are sold at a price such that the demand far outweighs the supply. Standard economic theory predicts that the price would be increased in such cases.

viliam on Alex K. Chen's Shortform

Is there a simple way to jailbreak the models, such as asking them to talk about a hypothetical parallel universe which is exactly like ours (same biology, same history), except that in the parallel universe humans can have different abilities and competences?

viliam on avturchin's Shortform

Sounds similar to the kind of logic that makes salmonellosis 10x more frequent in America than in Europe.

On one hand, yes, the optimal number of people dying from farm-produced diseases is greater then zero, and overreaction could cause net harm.

On the other hand, it feels like the final decision should be made in some way better than "the farmers lobby declares the topic taboo, and enforces the taboo across the nation", because the one-sided incentives are obvious.

habryka4 on is it possible to comment anonymously on a post?

Yeah, the officially approved way is to just generate an alt.

(But please don't vote with multiple accounts, we will probably notice, and we will ban you if we do)

lsusr on is it possible to comment anonymously on a post?

No, but you can create an alt account.

alenoach on Are we dropping the ball on Recommendation AIs?

What do you think is the main issue preventing companies from making more ethical recommendation algorithms? Is it the difficulty of determining objectively what is accurate and ethical? Or is it more about the incentives, like an unwillingness to sacrifice addictiveness and part of their audience?

zack_m_davis on Claude Sonnet 3.5.1 and Haiku 3.5

The next major update can be Claude 4.0 (and Gemini 2.0) and after that we all agree to use actual normal version numbering rather than dating?

Date-based versions aren't the most popular, but it's not an unheard of thing that Anthropic just made up: see CalVer, as contrasted to SemVer. (For things that change frequently in small ways, it's convenient to just slap the date on it rather than having to soul-search about whether to increment the second or the third number.)

nathan-helm-burger on Claude Sonnet 3.5.1 and Haiku 3.5

For raw IQ, sure. I just mean "conversational flavor".