LessWrong 2.0 Reader

[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)

Making a Secular Solstice Songbook
jefftk (jkaufman) · 2024-01-23T19:40:05.055Z · comments (6)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (7)

The "context window" analogy for human minds
Ruby · 2024-02-13T19:29:10.387Z · comments (0)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

Resolving von Neumann-Morgenstern Inconsistent Preferences
niplav · 2024-10-22T11:45:20.915Z · comments (5)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

Drug development costs can range over two orders of magnitude
rossry · 2024-11-03T23:13:17.685Z · comments (0)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

(Appetitive, Consummatory) ≈ (RL, reflex)
Steven Byrnes (steve2152) · 2024-06-15T15:57:39.533Z · comments (1)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (9)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (39)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (4)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (11)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

We’re not as 3-Dimensional as We Think
silentbob · 2024-08-04T14:39:16.799Z · comments (16)

Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

On DeepMind’s Frontier Safety Framework
Zvi · 2024-06-18T13:30:21.154Z · comments (4)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (17)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

Recent comments

evan-r-murphy on Alignment Faking in Large Language Models

Agree, I'm surprised that a model which can reason about its own training process wouldn't also reason that the "secret scratchpad" might actually be surveilled and so avoid recording any controversial thoughts there. But it's lucky for us that some of these models have been willing to write interesting things on the scratchpad at least at current capability levels and below, because Anthropic has sure produced some interesting results from it (IIRC they used the scratchpad technique in at least one other paper).

oliver-daniels on Thoughts on the conservative assumptions in AI control

We initially called these experiments “control evaluations”; my guess is that it was a mistake to use this terminology, and we’ll probably start calling them “black-box plan-conservative control evaluations” or something.

It also seems worth disambiguating "conservative evaluations" from "control evaluations". In particular, as you suggest, we might want to assess scalable oversight methods under conservative assumptions. (To be fair, the distinction isn't all that clean: your oversight process can both train and monitor a policy. Still, I associate control more closely with monitoring, and conservative evaluations have a broader scope.)

edge_retainer on Orienting to 3 year AGI timelines

I've seen this take a few times about land values and I would bet against it. If society gets mega rich based on capital (and thus with more or similar inequality), I think the cultural capitals of the US (LA, NY, Bay, Chicago, Austin, etc.) and the most beautiful places (Marin/Sonoma, Jackson Hole, Park City, Aspen, Vail, Scottsdale, Florida Keys, Miami, Charleston, etc.) will continue to outpace everywhere else.

Also the idea that New York is expensive because that's where the jobs are doesn't seem particularly true to me. Companies move to these places as much because they are trying to attract talent as the other way around. I know lots of students who went to my T20 university and got remote jobs. Approximately 0 of them want to move to ugly bumfuck even if it's basically free. The suburbs/exurbs maybe, but not rural Missouri.

Now if there is a large wealth redistribution, which seems extremely unlikely given the timelines and current politics, I would agree. Also, thinking construction will get cheaper is pretty questionable. The cost of construction in the US has skyrocketed largely because of regulations, and new tech won't necessarily be able to fix this.

aynonymousprsn123 on What's Wrong With the Simulation Argument?

I have to say, quila, I'm pleasantly surprised that your response above is both plausible and logically coherent—qualities I couldn't find in any of the Reddit responses. Thank you.

However, I have concerns and questions for you.

Most importantly, I worry that if we're currently in a simulation, physics and even logic could be entirely different from what they appear to be. If all our senses are illusory, why should our false map align with the territory outside the simulation? A story like your "Mutual Anthropic Capture" offers hope: a logically sound hypothesis in which our understanding of physics is true. But why should it be? Believing that a simulation exactly matches reality sounds to me like the "privileging the hypothesis" fallacy.

By the way, I'm also somewhat skeptical of a couple of your assumptions in Mutual Anthropic Capture. Still, I think it's a good idea overall, and some subtle modifications to the idea would probably make it logically sound. I won't bother you about those small issues here, though; I'm more interested in your response to my concern above.

dmitry-vaintrob on The quantum red pill or: They lied to you, we live in the (density) matrix

To add: I think the other use of "pure state" comes from this context. Here if you have a system of commuting operators and take a joint eigenspace, the projector is mixed, but it is pure if the joint eigenvalue uniquely determines a 1D subspace; and then I think this terminology gets used for wave functions as well

jimrandomh on Elizabeth's Shortform

Epistemic belief updating: Not noticeably different.

Task stickiness: Massively increased, but I believe this is an improvement (at baseline my task stickiness is too low, so the change is in the right direction).

sharmake-farah on What are the plans for solving the inner alignment problem?

My personal ranking of impact would be regularization, then AI control (at least for automated alignment schemes), with interpretability a distant 3rd or 4th at best.

I'm pretty certain that we will do a lot better than evolution, but whether that's good enough is an empirical question for us.

embee on Embee's Shortform

Pet peeve: AI community defaulted to von Neumann as being the ultimate smart human and therefore the basis of all ASI/human intelligence comparison when the mathematician Alexander Grothendieck exists somehow.

Von Neumann arguably had the highest processor-type "horsepower" we know of plus his breadth of intellectual achievements is unparalleled.
But imo Grothendieck is a better comparison point for ASI as his intelligence, while being strangely similar to LLMs in some dimensions, arguably more closely resembles what alien-like intelligence would be:
- solving "impossible" problems through meta-language and abstractions.
- able to think deeply on his own (re-discovered measure theory alone when he was a teenager, re-discovered Poincaré results as an undergrad, apparently solved multiple PhD theses in parallel in less than a year)
- almost single-handedly built algebraic geometry (which in turn provided the blueprint for category theory), a domain which scares a part of the mathematics community to this day.
- not your typical child prodigy
- famously bad at computations: "take a prime number. 57 for instance."

Even from the AI alignment perspective, Grothendieck is fascinating.
Unaligned with "society" incentives and rewards yet having strong moral preferences, in the sense of choosing to work for a public university when he probably could have earned a higher wage elsewhere, holding hardcore communist beliefs, refusing the Fields medal in protest of the Soviet Union and on top of that choosing to be stateless.
Disappeared the moment he understood that despite all of that his discoveries were still fueling the industrial-military complex.

q-home on Q Home's Shortform

Sorry if it's not appropriate for this site. But is anybody interested in chess research? I've seen that people here might be interested in chess. For example, here's a chess post barely related to AI.

Intro

In chess, what positions have the longest forced wins? "Mate in N" positions can be split into 3 types:

1. Positions which use "tricks" to get a big number of moves before checkmate, such as cycles of repeating moves. For example, this manmade mate in 415 (see the last position) uses obvious cycles. Not to mention mates in omega.
2. Tablebase checkmates, discovered by brute force, showing absolutely incomprehensible play with no discernible logic. See this mate in 549 moves. One should assume it's based on some hidden cycles or something?
3. Positions which are similar to immortal games, where the winning variation requires a combination without any cycles. For example: Kasparov's Immortal (14 moves long combination), Stoofvlees vs. Igel (down a rook for 21 moves) - the examples lack optimal play tho.

Surprisingly, nobody seems to look for the longest mates of Type 3. Well, I did look for them and discovered some. Down below I'll explain multiple ways to define what exactly I did. I won't go into too much detail. If you want more detail, see "Research idea: the longest non-trivial middlegames". There you can also see the puzzles I've created.

My longest puzzle is 42 moves: https://lichess.org/study/sTon08Mb/JG4YGbcP Overall, I've created 7 unique puzzles. Worked a lot on 1 more (mate in 52 moves), but couldn't make it work.

Among other things, I made this absurd mate in 34 puzzle. Almost the entire board is filled with pieces (62 pieces on the board!), only two squares are empty. And despite that the position has deep content. It's kinda a miracle. I think it deserves recognition.

Definition 1

Unlike Type 1 and Type 2 mates, my mates involve many sacrifices of material. So my mates can be defined as "the longest sacrificial combinations".

Definition 2

We can come up with important metrics which make a long mate more special, harder to find, and rarer: material imbalance, number of non-check moves, amount of freedom of pieces, etc. Then we can search for the longest mates compatible with high enough values of those metrics.

Well, that's what I did.

Definition 3

This is an idea of a definition rather than a definition. But it might be important.

Take a sequential game with perfect information.
Take positions with the longest forced wins.
Out of those positions, choose positions where the defending side has the greatest control over the attacking side's optimal strategy.

My mates are an example of positions where the defending side has especially great control over the flow of the game.
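
The first two steps of Definition 3 can be sketched in code. This is only a minimal illustration under loud assumptions: it swaps chess for a toy subtraction game (players alternately take 1, 2 or 3 stones, and whoever takes the last stone wins), and the names `MOVES`, `solve` and `longest_forced_win` are hypothetical, not taken from the research described here. It does not attempt the third step (measuring the defender's control), which is left undefined above.

```python
from functools import lru_cache

# Assumption: a toy subtraction game stands in for chess.
# A move removes 1, 2 or 3 stones; taking the last stone wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def solve(n):
    """Return (wins, plies) for the player to move with n stones left.

    wins  -- True if the player to move can force a win.
    plies -- game length under optimal play: the winning side
             minimises it, the losing side maximises it.
    """
    if n == 0:
        # The previous player took the last stone, so the mover has lost.
        return (False, 0)
    # Each child result is from the opponent's point of view.
    children = [solve(n - m) for m in MOVES if m <= n]
    winning = [plies for (opp_wins, plies) in children if not opp_wins]
    if winning:
        # At least one move leaves the opponent lost: pick the fastest win.
        return (True, 1 + min(winning))
    # Every move leaves the opponent winning: delay the loss as long as possible.
    return (False, 1 + max(plies for (_, plies) in children))


def longest_forced_win(limit):
    """Among pile sizes 1..limit, return (n, plies) for the longest forced win."""
    best = None
    for n in range(1, limit + 1):
        wins, plies = solve(n)
        if wins and (best is None or plies > best[1]):
            best = (n, plies)
    return best
```

For example, `longest_forced_win(10)` looks at every pile size up to 10 and returns the first one whose forced win takes the most plies. In a real game like chess the same idea needs tablebases or engine search, since the state space is far too large for this kind of exhaustive recursion.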

Deeper meaning?

Can there be any deep meaning behind researching my type of mates? I think yes. There are two relevant things.

The first thing is hard to explain, because I'm not a mathematician. But I'll try. Math can often be seen as skipping stuff which is the most interesting to humans. For example, math can prove theorems about games in general, without explaining why a specific game is interesting or why a specific position is interesting. However, here it seems like we can define something very closely related to subjective "interestingness".
The second thing: the hardness of defining valuable things is relevant to Alignment. The definitions above reveal that maybe sometimes valuable things are easier to define than it seems.

Reception

How did the chess community receive my work?

On Reddit, some posts got a moderate amount of upvotes (enough to get into daily top). A silly middlegame position. With checkmate in 50-80 moves? (110+); Does this position set any record? (60+). Sadly the pattern didn't continue: New long non-trivial middlegame mate found. Nobody asked for this. (1).
On a computer chess forum, people mostly ignored it. I hoped they could help me find the longest attacks in computer games.
On the Discord of chess composers, a bunch of people complimented my project. But nobody showed any proactive interest (e.g. "hey, I'd like to preserve your work"). One person reacted like ~"I'm not a specialist on that type of thing, I don't know with whom you could talk about that"
On Reddit communities where you can ask mathematicians things, people told me that game theory is too abstract for tackling such things.

elifland on Charlie Steiner's Shortform

Sorry, fixed