LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

Scientific Notation Options
jefftk (jkaufman) · 2024-05-18T15:10:02.181Z · comments (13)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

Without Fundamental Advances, Rebellion and Coup d'État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries
Roko · 2024-01-31T10:14:02.042Z · comments (34)

A Strange ACH Corner Case
jefftk (jkaufman) · 2024-02-10T03:00:05.930Z · comments (2)

A short dialogue on comparability of values
cousin_it · 2023-12-20T14:08:29.650Z · comments (7)

[question] Me & My Clone
SimonBaars (simonbaars) · 2024-07-18T16:25:40.770Z · answers+comments (22)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

NYU Code Debates Update/Postmortem
David Rein (david-rein) · 2024-05-24T16:08:06.151Z · comments (4)

Survey on the acceleration risks of our new RFPs to study LLM capabilities
Ajeya Cotra (ajeya-cotra) · 2023-11-10T23:59:52.515Z · comments (1)

[link] Solving alignment isn't enough for a flourishing future
mic (michael-chen) · 2024-02-02T18:23:00.643Z · comments (0)

[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)

Fifteen Lawsuits against OpenAI
Remmelt (remmelt-ellen) · 2024-03-09T12:22:09.715Z · comments (4)

EA Infrastructure Fund's Plan to Focus on Principles-First EA
Linch · 2023-12-06T03:24:55.844Z · comments (0)

[link] Link Collection: Impact Markets
Saul Munn (saul-munn) · 2023-12-26T09:01:48.815Z · comments (0)

AISC Project: Modelling Trajectories of Language Models
NickyP (Nicky) · 2023-11-13T14:33:56.407Z · comments (0)

My Dating Heuristic
Declan Molony (declan-molony) · 2024-05-21T05:28:40.197Z · comments (4)

[link] align your latent spaces
bhauth · 2023-12-24T16:30:09.138Z · comments (8)

An Affordable CO2 Monitor
Pretentious Penguin (dylan-mahoney) · 2024-03-21T03:06:53.255Z · comments (1)

[question] What Software Should Exist?
Tomás B. (Bjartur Tómas) · 2024-01-19T21:43:50.112Z · answers+comments (27)

[link] Goodhart's Law Example: Training Verifiers to Solve Math Word Problems
Chris_Leong · 2023-11-25T00:53:26.841Z · comments (2)

Weak vs Quantitative Extinction-level Goodhart's Law
VojtaKovarik · 2024-02-21T17:38:15.375Z · comments (1)

[link] David Burns Thinks Psychotherapy Is a Learnable Skill. Git Gud.
Morpheus · 2024-01-27T13:21:05.068Z · comments (20)

[link] AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes
aogara (Aidan O'Gara) · 2024-01-24T19:38:33.461Z · comments (1)

Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes (steve2152) · 2024-06-25T17:49:01.488Z · comments (1)

[question] Supposing the 1bit LLM paper pans out
O O (o-o) · 2024-02-29T05:31:24.158Z · answers+comments (11)

Response to Dileep George: AGI safety warrants planning ahead
Steven Byrnes (steve2152) · 2024-07-08T15:27:07.402Z · comments (7)

The economy is mostly newbs (strat predictions)
lukehmiles (lcmgcd) · 2024-02-01T19:15:49.420Z · comments (6)

[link] Found Paper: "FDT in an evolutionary environment"
the gears to ascension (lahwran) · 2023-11-27T05:27:50.709Z · comments (47)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (1)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (0)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (7)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

AI debate: test yourself against chess 'AIs'
Richard Willis · 2023-11-22T14:58:10.847Z · comments (35)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

danwil on The Median Researcher Problem

If an outsider's objective is to be taken seriously, they should write papers and submit them to peer review (e.g. conferences and journals).

Yann LeCun has gone so far to say that independent work only counts as "science" if submitted to peer review: https://x.com/ylecun/status/1795589846771147018?s=19

From my experience, professors are very open to discuss ideas and their work with anyone who seems serious, interested, and knowledgeable. Even someone inside academia will face an uphill battle if their work uses completely different methods.

cstinesublime on Johannes C. Mayer's Shortform

I'm missing a key piece of context here - when you say "doing something good" are you referring to educational or research reading; or do you mean any type of personal project which may or may not involve background research?

I may have some practical observations about note-taking which may be relevant, if I understand the context.

startattheend on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

That sounds about right. And "people sometimes feel that way" is a good explanation for the downvote in my opinion. I was arguing the object-level premises of the post because the "disagree" downvote was factually wrong, and this factual wrongness, I argue, is caused by a faulty understanding of how truth works, and this faulty understanding is most common in the western world and in educated people, and in the ideologies which correlate with western thought and academia.

If you disagree with something which is true, I think the only likely explanations are "Does not understand" and "Has a dislike of", and the bias I pointed out covers both of these possibilities (the former is a "map vs territory" issue and the latter is a "morality vs reality" issue).

I think you figured out what went wrong nicely, but in the end the disagreement remains. I still consider my point likely. If somebody comes along and tells me that they disagreed with it for other reasons, I might even argue that they're lying to themselves, as I'm way to disillusioned to think that a "will to truth" exists. I think social status, moral values and other such things are stronger motivators than people will admit even to themselves.

rob-lucas on Bigger Livers?

One reason is just that eating food is enjoyable. I limit the amount of food I eat to stay within a healthy range, but if I could increase that amount while staying healthy, I could enjoy that excess.

I think there are two aspects to the enjoyment of food. One is related to satiety. I enjoy the feeling of sating my appetite, and failing to sate it leaves me with te negative experience of craving food (negative if I don't satisfy those cravings.

But the other aspect is just the enjoyment of eating each individual bite of food. Not the separate enjoyment of sating my appetite, but just the experience of eating.*

When I was younger and much more physically active I ate very large amounts of food. I miss being able to do that. I'm just as sated now with the much smaller portions I eat, but eating a small breakfast instead of a large one is a different experience.

This probably doesn't justify some sort of risky intervention in increasing liver size. Food is enjoyable, but so are a lot of other things in life. But shifting to a higher protien diet seems like the kind of safe intervention, potentially even also healthier in other respects, that, if it has the side effect of being able to eat a little more food, could improve quality of life with minimal other costs. Potential costs I see are related to the price of protein relative to other sources of nutrition, the cost of additional food (if the point is being able to eat more, you've got spend money for that excess), and, depending on one's moral views, something related to the source of the protien being added.

*I think Kahneman's remembering vs. expereincing selves adds some confusion here as well. When we remember a meal we don't necessarily remember the enjoyment we got from every bite, but probably put more weight on the feeling of satiety and the peak experience (how good did it taste at its best?). But the experiencing self experiences every bite. How much you want to weight the remembering vs. experiencing self is a philosophical issue, but I just want to note that it comes up here.

lukehmiles on Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

I wonder if anybody has tried to quantify how much it's worth to be a swing voter. I imagine if you are the government contractor up for renewal then it's worth quite a lot, but I wonder how much of the money/benefits the average Joe sees.

I don't know much about swing state benefits except that Milwaukee, Wisconsin got their lead pipes replaced by the fed and the workers were required to be local and they say they were paid quite well https://youtube.com/watch?v=4VpwgG0P8VU

lukehmiles on The hostile telepaths problem

Aw man we used the same word for different things again

lukehmiles on The hostile telepaths problem

Your examples fit the definition quite well. Apparently this is in the dictionary now. https://www.merriam-webster.com/dictionary/gaslighting

norimori1992 on The Witness

Why would we stretch the definition of lawyer in such a way? That's not what the word "lawyer" means, either in the dictionary sense or in the sense of how people use the word. And even if you can come up with a reason to stretch it to include all those professions, what makes you think that's what Eliezer was doing?

richard_kennaway on Bigger Livers?

I'm missing something here. Why would I want a bigger liver? I mean, from this account, liver size is obviously something that the body is controlling. You list various interventions to make it bigger, which predictably have bad effects. But why would I want to change something that my body is already managing perfectly well?

The only reason I could find was this:

Athletes have higher resting metabolic rates than non-athletes; their bodies use more energy, even when they’re not exercising. That means they can eat more without getting fat.

Is that it? Why not just^[1]...not eat more? These are athletes. They eat to sustain themselves in the pursuit of athletic excellence. They can already "just" not eat more. If they couldn't, they would not be athletes.

I agree there are people, notably Eliezer, who can't "just" not eat more without being as unable to function as if they were starving. I can't see a larger liver burning up more energy helping with that.

If anyone's hackles rise at a sentence beginning "Why not just—", you're quite right. No problem can be solved by "just"...whatever it is. If it could, it would not be a problem. ↩︎

nostalgebraist on the case for CoT unfaithfulness is overstated

The "sequential calculation steps" I'm referring to are the ones that CoT adds above and beyond what can be done in a single forward pass. It's the extra sequential computation added by CoT, specifically, that is bottlenecked on the CoT tokens.

There is of course another notion of "sequential calculation steps" involved: the sequential layers of the model. However, I don't think the bolded part of this is true:

replacing the token by a dot reduces the number of serial steps the model can perform (from mn to m+n, if there are m forward passes and n layers)

If a model with N layers has been trained to always produce exactly M "dot" tokens before answering, then the number of serial steps is just N, not M+N.

One way to see this is to note that we don't actually need to run M separate forward passes. We can just pre-fill a context window containing the prompt tokens followed by M dot tokens, and run 1 forward pass on the whole thing.

Having the dots does add computation, but it's only extra parallel computation – there's still only one forward pass, just a "wider" one, with more computation happening in parallel inside each of the individually parallelizable steps (tensor multiplications, activation functions).

(If we relax the constraint that the number of dots is fixed, and allow the model to choose it based on the input, that still doesn't add much: note that we could do 1 forward pass on the prompt tokens followed by a very large number of dots, then find the first position where we would have sampled a non-dot from from output distribution, truncate the KV cache to end at that point and sample normally from there.)

If you haven't read the paper I linked in OP, I recommend it – it's pretty illuminating about these distinctions. See e.g. the stuff about CoT making LMs more powerful than versus dots adding more power withing $T C^{0}$ .