LessWrong 2.0 Reader



[link] NAO Updates, January 2025
jefftk (jkaufman) · 2025-01-10T03:37:36.698Z · comments (0)
[link] Letter from an Alien Mind
Shoshannah Tekofsky (DarkSym) · 2024-12-27T13:20:49.277Z · comments (7)
[link] debating buying NVDA in 2019
bhauth · 2025-01-04T05:06:54.047Z · comments (0)
[link] Human-AI Complementarity: A Goal for Amplified Oversight
rishubjain · 2024-12-24T09:57:55.111Z · comments (3)
[link] Genetically edited mosquitoes haven't scaled yet. Why?
alexey · 2024-12-30T21:37:32.942Z · comments (0)
[link] PCR retrospective
bhauth · 2024-12-26T21:20:56.484Z · comments (0)
The average rationalist IQ is about 122
Rockenots (Ekefa) · 2024-12-28T15:42:07.067Z · comments (23)
[link] Job Opening: SWE to help improve grant-making software
Ethan Ashkie (ethan-ashkie-1) · 2025-01-08T00:54:22.820Z · comments (1)
[question] Meal Replacements in 2025?
alkjash · 2025-01-06T15:37:25.041Z · answers+comments (9)
Non-Obvious Benefits of Insurance
jefftk (jkaufman) · 2024-12-23T03:40:02.184Z · comments (5)
The absolute basics of representation theory of finite groups
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-08T09:47:13.136Z · comments (0)
A Generalization of the Good Regulator Theorem
Alfred Harwood · 2025-01-04T09:55:25.432Z · comments (5)
Grading my 2024 AI predictions
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-02T05:01:46.587Z · comments (1)
Broken Latents: Studying SAEs and Feature Co-occurrence in Toy Models
chanind · 2024-12-30T22:50:54.964Z · comments (3)
Turning up the Heat on Deceptively-Misaligned AI
J Bostock (Jemist) · 2025-01-07T00:13:28.191Z · comments (16)
Really radical empathy
MichaelStJules · 2025-01-06T17:46:31.269Z · comments (0)
Open Thread Winter 2024/2025
habryka (habryka4) · 2024-12-25T21:02:41.760Z · comments (9)
Whistleblowing Twitter Bot
Mckiev · 2024-12-26T04:09:45.493Z · comments (5)
Latent Adversarial Training (LAT) Improves the Representation of Refusal
alexandraabbas · 2025-01-06T10:24:53.419Z · comments (6)
Measuring Nonlinear Feature Interactions in Sparse Crosscoders [Project Proposal]
Jason Gross (jason-gross) · 2025-01-06T04:22:12.633Z · comments (0)
[link] Why OpenAI’s Structure Must Evolve To Advance Our Mission
stuhlmueller · 2024-12-28T04:24:19.937Z · comments (1)
An exhaustive list of cosmic threats
Jordan Stone (jordan-stone) · 2025-01-09T19:59:08.368Z · comments (2)
Definition of alignment science I like
quetzal_rainbow · 2025-01-06T20:40:38.187Z · comments (0)
We need a universal definition of 'agency' and related words
CstineSublime · 2025-01-11T03:22:56.623Z · comments (1)
Monthly Roundup #25: December 2024
Zvi · 2024-12-23T14:20:04.682Z · comments (3)
Fluoridation: The RCT We Still Haven't Run (But Should)
ChristianKl · 2025-01-11T21:02:47.483Z · comments (5)
[link] AI safety content you could create
Adam Jones (domdomegg) · 2025-01-06T15:35:56.167Z · comments (0)
Economic Post-ASI Transition
[deleted] · 2025-01-01T22:37:31.722Z · comments (11)
[link] Genesis
PeterMcCluskey · 2024-12-31T22:01:17.277Z · comments (0)
[link] From the Archives: a story
Richard_Ngo (ricraz) · 2024-12-27T16:36:50.735Z · comments (1)
The Alignment Mapping Program: Forging Independent Thinkers in AI Safety - A Pilot Retrospective
Alvin Ånestrand (alvin-anestrand) · 2025-01-10T16:22:16.905Z · comments (0)
Beliefs and state of mind into 2025
RussellThor · 2025-01-10T22:07:01.060Z · comments (7)
A Collection of Empirical Frames about Language Models
Daniel Tan (dtch1997) · 2025-01-02T02:49:05.965Z · comments (0)
[question] What is the most impressive game LLMs can play well?
Cole Wyeth (Amyr) · 2025-01-08T19:38:18.530Z · answers+comments (3)
Rebuttals for ~all criticisms of AIXI
Cole Wyeth (Amyr) · 2025-01-07T17:41:10.557Z · comments (11)
Incredibow
jefftk (jkaufman) · 2025-01-07T03:30:02.197Z · comments (3)
[link] Building AI safety benchmark environments on themes of universal human values
Roland Pihlakas (roland-pihlakas) · 2025-01-03T04:24:36.186Z · comments (3)
[question] What would be the IQ and other benchmarks of o3 that uses $1 million worth of compute resources to answer one question?
avturchin · 2024-12-26T11:08:23.545Z · answers+comments (2)
Coin Flip
XelaP (scroogemcduck1) · 2024-12-27T11:53:01.781Z · comments (0)
Don't fall for ontology pyramid schemes
Lorec · 2025-01-07T23:29:46.935Z · comments (7)
Can we rescue Effective Altruism?
Elizabeth (pktechgirl) · 2025-01-09T16:40:02.405Z · comments (0)
The case for pay-on-results coaching
Chipmonk · 2025-01-03T18:40:22.304Z · comments (3)
Predicting AI Releases Through Side Channels
Reworr R (reworr-reworr) · 2025-01-07T19:06:41.584Z · comments (1)
Stop Making Sense
JenniferRM · 2024-12-23T05:16:12.428Z · comments (0)
[link] Ideologies are slow and necessary, for now
Gabriel Alfour (gabriel-alfour-1) · 2024-12-23T01:57:47.153Z · comments (1)
Boston Solstice 2024 Retrospective
jefftk (jkaufman) · 2024-12-29T15:40:05.095Z · comments (0)
[link] The Legacy of Computer Science
Johannes C. Mayer (johannes-c-mayer) · 2024-12-29T13:15:28.606Z · comments (0)
Notes on Altruism
David Gross (David_Gross) · 2024-12-29T03:13:09.444Z · comments (1)
Apply to the 2025 PIBBSS Summer Research Fellowship
DusanDNesic · 2024-12-24T10:25:12.882Z · comments (0)
Guilt, Shame, and Depravity
Benquo · 2025-01-07T01:16:00.273Z · comments (10)