LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!
mako yass (MakoYass) · 2024-08-02T22:35:21.136Z · comments (2)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

Best-of-n with misaligned reward models for Math reasoning
Fabien Roger (Fabien) · 2024-06-21T22:53:21.243Z · comments (0)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

[question] What percent of the sun would a Dyson Sphere cover?
Raemon · 2024-07-03T17:27:50.826Z · answers+comments (26)

[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)

A Basic Economics-Style Model of AI Existential Risk
Rubi J. Hudson (Rubi) · 2024-06-24T20:26:09.744Z · comments (3)

How Congressional Offices Process Constituent Communication
Tristan Williams (tristan-williams) · 2024-07-02T12:38:41.472Z · comments (0)

[link] Robert Caro And Mechanistic Models In Biography
adamShimi · 2024-07-14T10:56:42.763Z · comments (5)

[link] The Living Planet Index: A Case Study in Statistical Pitfalls
Jan_Kulveit · 2024-06-24T10:05:55.101Z · comments (0)

Population ethics and the value of variety
cousin_it · 2024-06-23T10:42:21.402Z · comments (11)

Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)

Would you benefit from, or object to, a page with LW users' reacts?
Raemon · 2024-08-20T16:35:47.568Z · comments (6)

AXRP Episode 34 - AI Evaluations with Beth Barnes
DanielFilan · 2024-07-28T03:30:07.192Z · comments (0)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates
Charlie George (charlie-george) · 2024-08-27T20:44:08.683Z · comments (7)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (1)

AI #77: A Few Upgrades
Zvi · 2024-08-20T00:20:09.717Z · comments (3)

The Garden of Eden
Alexander Turok · 2024-07-22T16:07:42.509Z · comments (2)

Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes (steve2152) · 2024-06-25T17:49:01.488Z · comments (1)

[LDSL#2] Latent variable models, network models, and linear diffusion of sparse lognormals
tailcalled · 2024-08-09T19:57:56.122Z · comments (0)

[link] Libs vs Frameworks, Middle-Level Regularities vs Theories
adamShimi · 2024-07-04T19:01:59.440Z · comments (0)

[question] Money Pump Arguments assume Memoryless Agents. Isn't this Unrealistic?
Dalcy (Darcy) · 2024-08-16T04:16:23.159Z · answers+comments (6)

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
Kola Ayonrinde (kola-ayonrinde) · 2024-08-23T18:52:31.019Z · comments (2)

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.
sevdeawesome · 2024-07-01T05:50:49.498Z · comments (0)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

[link] Managing Emotional Potential Energy
adamShimi · 2024-07-10T18:20:45.640Z · comments (4)

Text Posts from the Kids Group: 2019
jefftk (jkaufman) · 2024-06-23T13:20:01.495Z · comments (0)

[link] on Science Beakers and DDT
bhauth · 2024-09-05T03:21:21.382Z · comments (12)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

[link] The Tech Industry is the Biggest Blocker to Meaningful AI Safety Regulations
garrison · 2024-08-16T19:37:28.416Z · comments (1)

[link] Profit and Value
kwang · 2024-07-17T18:06:57.048Z · comments (3)

[link] Day Zero Antivirals for Future Pandemics
Niko_McCarty (niko-2) · 2024-08-26T15:18:33.858Z · comments (2)

Can We Predict Persuasiveness Better Than Anthropic?
Lennart Finke (l-f) · 2024-08-04T14:05:33.668Z · comments (5)

[LDSL#3] Information-orientation is in tension with magnitude-orientation
tailcalled · 2024-08-10T21:58:27.659Z · comments (0)

August 2024 Time Tracking
jefftk (jkaufman) · 2024-08-24T13:50:04.676Z · comments (0)

[link] My 5-step program for losing weight
Nikita Sokolsky (nikita-sokolsky) · 2024-06-30T01:05:40.408Z · comments (20)

[link] "On the Impossibility of Superintelligent Rubik’s Cube Solvers", Claude 2024 [humor]
gwern · 2024-06-23T21:18:10.013Z · comments (6)

Monthly Roundup #21: August 2024
Zvi · 2024-08-20T00:20:08.178Z · comments (6)

[LDSL#5] Comparison and magnitude/diminishment
tailcalled · 2024-08-12T18:47:20.546Z · comments (0)

"The Singularity Is Nearer" by Ray Kurzweil - Review
Lavender (Kevin92) · 2024-07-08T21:32:27.307Z · comments (0)

AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan · 2024-08-24T22:30:02.039Z · comments (0)

[link] An ML paper on data stealing provides a construction for "gradient hacking"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-07-30T21:44:37.310Z · comments (1)

[link] [Talk transcript] What “structure” is and why it matters
Alex_Altair · 2024-07-25T15:49:00.844Z · comments (0)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (5)

[link] Hyperpolation
Gunnar_Zarncke · 2024-09-15T21:37:00.002Z · comments (4)

Consider attending the AI Security Forum '24, a 1-day pre-DEFCON event
Charlie Rogers-Smith (charlie.rs) · 2024-07-12T23:01:46.370Z · comments (0)

Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs
Winnie Yang (winnie-yang) · 2024-08-22T07:32:07.600Z · comments (0)

Instrumental vs Terminal Desiderata
Max Harms (max-harms) · 2024-06-26T20:57:17.584Z · comments (0)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

cubefox on What you know when you know nothing

If we know nothing about them, the statements could equally be true or false, and positively or negatively dependent. The same argument which makes us assume 50% probability to individual statements would also make us assume independence between statements. The possibilities cancel out, so to speak.

jessica-liu-taylor on The Obliqueness Thesis

Computationally tractable is Yudkowsky's framing and might be too limited. The kind of thing I believe is for example, an animal without a certain brain complexity will tend not to be a social animal and is therefore unlikely to have the sort of values social animals have. And animals that can't do math aren't going to value mathematical aesthetics the way human mathematicians do.

jessica-liu-taylor on The Obliqueness Thesis

Relativity to Newtonian mechanics is a warp in a straightforward sense. If you believe the layout of a house consists of some rooms connected in a certain way, but there are actually more rooms connected in different ways, getting the maps to line up looks like a warp. Basically, the closer the mapping is to a true homomorphism (in the universal algebra sense), the less warping there is, otherwise there are deviations intuitively analogous to space warps.

nathan-helm-burger on A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)

Right, but... it's a combined issue of tokenization and data cleaning. I agree you can't properly do one without the other. I just think you should do both!

Clearly the root cause is the messy data, and once you've cleaned the data it should be trivial to fix the tokenizer.

I checked Lunary's predicted tokenization for Gemma, Mistral, and Grok as well. These three models all seem to handle number tokenization cleanly, a single digit at a time. This suggests that there isn't some well considered valid reason that digits need uneven lumpy tokenization, that it's really just a mistake.

danylo-zhyrko on How I started believing religion might actually matter for rationality and moral philosophy

I am unsure whether I've grasped the mechanism behind this inner work stuff. You feel your limbs weird, and what? What implication are you trying to point to? How should your experience or insight contribute to rationality and moral philosophy? Yes, we have inherent biases but, as far as I've managed to get acquainted with the rationality discourse, they are not resolved by merely being aware of them or expanding consciousness. I cannot clearly see the added value of this inner work. Why is experimental psychology insufficient? Thanks for your answer.

paradiddle on [Intuitive self-models] 1. Preliminaries

I strongly believe that step 1 is sufficient or almost sufficient for step 2, i.e., that it's impossible to give an adequate account of human phenomenology without figuring out most of the computational aspects of consciousness.

Apologies for nitpicking, but your strong belief that step 1 is (almost) sufficient for step 2 would be more faithfully re-phrased as: it will (probably) be possible/easy to give an adequate account of human phenomenology by figuring out most of the computational aspects of consciousness. The way you phrased it (viz., "impossible...without") is equivalent to saying that step 1 is necessary for step 2, an importantly different claim (on this phrasing, something besides the computational aspects may be required). Of course, you may think it is both necessary and sufficient, I'm just pointing out the distinction.

milan-w on More Dakka

Why would pulling the lever make you more responsible of the outcome than not pulling the lever? Both are options you decide to take once you have observed the situation.

chipmonk on The case for more Alignment Target Analysis (ATA)

Her main motivation was to make Thomas’ ideas more accessible to people.

thanks Chi!!

milan-w on What you know when you know nothing

Right. Pure ignorance is not evidence.

johnswentworth on We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap

Close, but not quite what this post is saying. The core claim in this post is that our brains model the world as though there's a thing called "our values", and tries to learn about those values in the usual epistemic way.

That does not necessarily imply that there actually is a thing called "our values". A fairly precise analogy: consider a Bayesian reasoner which is hardcoded to believe it's in the Game of Life, but is in fact hooked up to my webcam. That reasoner will, for instance, try to learn about the positions of rocks and oscillators and gliders, but there are not actually any rocks or oscillators or gliders in its environment.

You are exactly right that if I say “I’m learning about blah”, then I’m strongly implying that there’s a fact-of-the-matter about blah, and that I think I'm updating in such a way that my beliefs approximately-monotonically converge towards the fact-of-the-matter. But whether there actually exists a fact-of-the-matter, and whether I'm actually converging towards it, is a separate question from whether I think there is, or whether I model the world as though there's a fact-of-the-matter.

And that's the main next question which this post sets up: when, and to what extent, is there a referent of the "values" which humans think they're learning about? Insofar as this all works like ordinary epistemics, we should be able to answer that question in much the same way that we answer any other question about when a human who thinks they're learning about blah is learning about an actual thing. Just like the example of the position of the dog in my apartment.