LessWrong 2.0 Reader


[link] [Paper] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

[link] Adverse Selection by Life-Saving Charities
vaishnav92 · 2024-08-14T20:46:23.662Z · comments (16)

[link] Things I learned talking to the new breed of scientific institution
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-29T14:00:14.844Z · comments (6)

D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (23)

Superintelligent AI is possible in the 2020s
HunterJay · 2024-08-13T06:03:26.990Z · comments (3)

[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (8)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

Monthly Roundup #23: October 2024
Zvi · 2024-10-16T13:50:05.869Z · comments (12)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (12)

[link] What's important in "AI for epistemics"?
Lukas Finnveden (Lanrian) · 2024-08-24T01:27:06.771Z · comments (0)

Reflections on the Metastrategies Workshop
gw · 2024-10-24T18:30:46.255Z · comments (5)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (0)

Californians, tell your reps to vote yes on SB 1047!
Holly_Elmore · 2024-08-12T19:50:09.817Z · comments (24)

[question] Implications of China's recession on AGI development?
Eric Neyman (UnexpectedValues) · 2024-09-28T01:12:36.443Z · answers+comments (3)

instruction tuning and autoregressive distribution shift
nostalgebraist · 2024-09-05T16:53:41.497Z · comments (5)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
AI Impacts (AI Imacts) · 2024-10-28T17:10:04.272Z · comments (3)

2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

[link] Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI's Trajectory”
Said Achmiz (SaidAchmiz) · 2024-11-14T23:53:34.922Z · comments (0)

Signaling with Small Orange Diamonds
jefftk (jkaufman) · 2024-11-07T20:20:08.026Z · comments (1)

Are we dropping the ball on Recommendation AIs?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-10-23T17:48:00.000Z · comments (16)

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
Kola Ayonrinde (kola-ayonrinde) · 2024-08-23T18:52:31.019Z · comments (5)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (1)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (7)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

An anti-inductive sequence
Viliam · 2024-08-14T12:28:54.226Z · comments (10)

[link] Shifting Headspaces - Transitional Beast-Mode
Jonathan Moregård (JonathanMoregard) · 2024-08-12T13:02:06.120Z · comments (9)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)


Recent comments

nevin-wetherill on Confronting the legion of doom.

I cannot explain the thoughts of others who have read this and chose not to comment.

I would've not commented had I not gone through a specific series of 'not heavily determined' mental motions.

First, I spent some time in the AI recent news rabbit hole, including an interview with Gwern wherein he spoke very beautifully about the importance of writing.

This prompted me to check back in on LessWrong, to see what people have been writing about recently. I then noticed your post, which I presumably only saw due to a low-karma-content-filter setting I'd disabled.

And this prompted me to think "maybe that's something I could do on LessWrong to dip my toes into the waters more - replying to extremely downvoted posts on the principle that there are likely people arguing in good faith and yet falling flat on LessWrong due to some difference in taste or misapprehension about what kind of community this is."

Note, I said: people arguing in good faith.

The tone of this post does not seem "good faith." At least, not at a glance.

Framing this with the language "legion of doom" is strange and feels extremely unhelpful for having a useful conversation about what is actually true in reality.

It calls to mind disclaimers in old General Semantics literature about "emotionally charged language" - stuff that pokes people in the primate instinct parts of their brain.

That would be my guess as to why this tone feels so compelling to you. It trips those wires in my head as well. It's fun being cheeky and combative - "debate me bro, my faction vs your faction, let's fight it out."

That doesn't help people actually figure out what's true in reality. It leads to a lot of wasted time running down chains of thought that have their roots in "I don't like those guys, I'm gonna destroy their arguments with my impeccable logic and then call them idiots" - which is different than thinking thoughts like "I'm curious about what is true here, and what's actually true in reality seems important and useful to know, I should try my best to figure out what that is."

What you've written here has a lot of flaws, but my guess is that this is the main one people here would want you to acknowledge or merely not repeat.

Don't use words like "legion of doom" - people who believe these specific things about AI are not actually a cult. People here will not debate you about this stuff if you taunt them by calling them a 'legion of doom.'

They will correctly recognize that this pattern-matches to an internet culture phenomenon of people going "debate me bro" - followed by a bunch of really low effort intellectual work, and a frustrating inability to admit mistakes or avoid annoying/uncomfortable tactics.

There may be people out there - I haven't met them - who believe that they are in a cult (and are happy about that) because they loudly say they agree with Eliezer about AI. They are, I think, wrong about agreeing with Eliezer. If they think this is a cult, they have done an abysmal job understanding the message. They have "failed at reading comprehension." Those people are probably not on LessWrong. You may find them elsewhere, and you can have an unproductive debate with them on a different platform where unproductive debate is sometimes celebrated.

If you want to debate the object-level content of this article with me, a good start would be showing that the points I've made so far are well-taken, or give me an account of your worldview where actually the best policy is to say things like "I'm surprised the legion of doom is so quiet?"

Mostly I don't expect this to result in a productive conversation. I've skimmed your post and if I had it on paper I'd have underlined a lot of it with some "this is wrong/weird" color of ink and drawn "?" symbols in the margins. I wrote this on a whim, which is the reason the 'legion of doom' is quiet about this post - you aren't going to catch many people's whims with this quality of bait.

lsusr on Open Thread Fall 2024

I didn't know about that. That sounds like fun!

rhollerith_dot_com on Why the 2024 election matters, the AI risk case for Harris, & what you can do to help

Some people are more concerned about S-risk than extinction risk, and I certainly don't want to dismiss them or imply that their concerns are mistaken or invalid, but I just find it a lot less likely that the AI project will lead to massive human suffering than its leading to human extinction.

the public seems pretty bought-in on AI risk being a real issue and is interested in regulation.

There's a huge gulf between people's expressing concern about AI to pollsters and the kind of regulations and shutdowns that would actually avert extinction. The people (including the "safety" people) whose careers would be set back by many years if they had to find employment outside the AI field and the people who've invested a few hundred billion into AI are a powerful lobbying group in direct opposition to the members of the general public who tell pollsters they are concerned.

I don't actually know enough about the authoritarian countries (e.g., Russia, China, Iran) to predict with any confidence how likely they are to prevent their populations from contributing to human extinction through AI. I can't help but notice though that so far it is the US and the UK that have done the most to advance the AI project. Also, the government's deciding to shut down movements and technological trends is much more normalized and accepted in Russia, China and Iran than it is in the West, particularly the US.

I don't have any prescriptions really. I just think that the OP (titled "why the 2024 election matters, the AI risk case for Harris, & what you can do to help", currently standing at 23 points) is badly thought out and badly reasoned, and I wish I had called for readers to downvote it because its main effect on LW was probably to add some politics-mindkill without adding any useful insight.

tslarm on What are Emotions?

So it doesn't make much sense to value emotions

I think this is a non sequitur. Everything you value can be described as just <dismissive reductionist description>, so the fact that emotions can too isn't a good argument against valuing them. And in this case, the dismissive reductionist description misses a crucial property: emotions are accompanied by (or identical with, depending on definitions) valenced qualia.

tsvibt on What are Emotions?

Emotions are hardwired stereotyped syndromes of hardwired blunt-force cognitive actions. E.g. fear makes your heart beat faster and puts an expression on your face and makes you consider negative outcomes more and maybe makes you pay attention to your surroundings. So it doesn't make much sense to value emotions, but emotions are good ways of telling that you value something; e.g. if you feel fear in response to X, probably X causes something you don't want, or if you feel happy when / after doing Y, probably Y causes / involves something you want.

williamkiely on Seven lessons I didn't learn from election day

That's a different question than the one I meant. Let me clarify:

Basically I was asking you what you think the probability is that Trump would win the election (as of a week before the election, since I think that matters) now that you know how the election turned out.

An analogous question would be the following:

Suppose I have two unfair coins. One coin is biased to land on heads 90% of the time (call it H-coin) and the other is biased to land on tails 90% of the time (T-coin). These two coins look the same to you on the outside. I choose one of the coins, then ask you how likely it is that the coin I chose will land on heads. You don't know whether the coin I'm holding is H-coin or T-coin, so you answer 50% (50% = 0.5*0.90 + 0.5*0.10). I then flip the coin and it lands on heads. Now I ask you, knowing that the coin landed on heads, how likely do you think it was that it would land on heads when I first tossed it? (I mean the same question by "Knowing how the election turned out, how likely do you think it was a week before the election that Trump would win?").
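The arithmetic in the coin analogy can be written out as a short Bayes-rule sketch. This is only one reading of the question, not necessarily the answer the commenter has in mind; the function names are hypothetical, and the 50/50 prior over the two coins is the assumption stated in the comment.

```python
from fractions import Fraction

# Assumptions taken from the analogy: each coin is equally likely to be chosen.
P_H_COIN = Fraction(1, 2)          # prior that the chosen coin is H-coin
P_HEADS_GIVEN_H = Fraction(9, 10)  # H-coin lands heads 90% of the time
P_HEADS_GIVEN_T = Fraction(1, 10)  # T-coin lands heads 10% of the time


def prior_heads():
    """Chance of heads before the flip, averaging over both coins."""
    return P_H_COIN * P_HEADS_GIVEN_H + (1 - P_H_COIN) * P_HEADS_GIVEN_T


def posterior_h_coin_given_heads():
    """Bayes' rule: chance the chosen coin was H-coin, given it landed heads."""
    return P_H_COIN * P_HEADS_GIVEN_H / prior_heads()


def updated_heads_chance():
    """Chance of heads for that flip, re-estimated with the posterior over coins."""
    p = posterior_h_coin_given_heads()
    return p * P_HEADS_GIVEN_H + (1 - p) * P_HEADS_GIVEN_T


print(prior_heads())                   # 1/2
print(posterior_h_coin_given_heads())  # 9/10
print(updated_heads_chance())          # 41/50
```

Under this reading, seeing heads moves the estimate for the coin's identity from 50% to 90% H-coin, and the re-estimated chance of heads for that toss from 50% to 82%.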

(Spoilers: I'd be interested in knowing your answer to this question before you read my comment on your "The value of a vote in the 2024 presidential election" EA Forum post that you linked to [EA(p) · GW(p)] to avoid getting biased by my answer/thoughts.)

deepthoughtlife on Seven lessons I didn't learn from election day

1. Kamala Harris did run a bad campaign. She was 'super popular' at the start of the campaign (assuming you can trust the polls, though you mostly can't), and 'super unpopular' losing definitively at the end of it. On September 17th, she was ahead by 2 points in polls, and in a little more than a month and a half she was down by that much in the vote. She lost so much ground. She had no good ads, no good policy positions, and was completely unconvincing to people who weren't guaranteed to vote for her from the start. She had tons of money to get out all of this, but it was all wasted.

The fact that other incumbent parties did badly is not in fact proof that she was simply doomed, because there were so many people willing to give her a chance. It was her choice to run as the candidate who 'couldn't think of a single thing' (not sure of exact quote) that she would do differently than Biden. Not a single thing!

Also, voters already punished Trump for Covid related stuff and blamed him. She was running against a person who was the Covid incumbent! And she couldn't think of a single way to take advantage of that. No one believed her that inflation was Trump's fault because she didn't even make a real case for it. It was a bad campaign.

Not taking policy positions is not a good campaign when you are mostly known for bad ones. She didn't run away very well from her unpopular positions from the past despite trying to be seen as moderate now.

I think the map you used is highly misleading. Just because there are some states that swung even more against her, doesn't mean she did well in the others. You can say that losing so many supporters in clearly left states like California doesn't matter, and neither does losing so many supporters in clearly right states like Texas, but thinking both that it doesn't matter in terms of it being a negative, and that it does matter enough that you should 'correct' the data by it is obviously bad.

2. Some polls were bad, some were not. Ho hum. But that Iowa poll was really something else. (I don't have a particular opinion on why she screwed up, aside from the fact that no one wants to be that far off if they have any pride.) She should have separately told people she thought the poll was wrong if she thought it was; did she do that? (I genuinely don't know.) I do think you should ignore her if she doesn't fix her methodology to account for nonresponse bias, because very few people actually answer polls. An interesting way might be to run a poll that just asks something like 'are you male or female?' or 'are you a Democrat or Republican?' and so on so you can figure out those variables for the given election on both separate polls and on the 'who are you voting for' polls. If those numbers don't match, something is weird about the polls.

I think it is important to note that people thought the polls would be closer this time by a lot than before (because otherwise everyone would have predicted a landslide due to them being close.) You said, "Some people went into the 2024 election fearing that pollsters had not adequately corrected for the sources of bias that had plagued them in 2016 and 2020." but I mostly heard the opposite from those who weren't staunch supporters of Trump. I think the idea of how corrections had gone before we got the results was mostly partisan. Many people were sure they had been fully fixed (or overcorrected) for bias and this was not true, so people act like they are clearly off (which they were). Most people genuinely thought this was a much closer race than it turned out to be.

The margin of being off was smaller than in the past Trump elections, I'll agree, but I think it is mostly the bias people are keying on rather than the absolute error. The polls have been heavily biased on average for the past three presidential cycles, and this time was still clearly biased (even if less so). With absolute error but no bias, you can just take more or larger polls, but with bias, especially an unknowable amount of bias, it is very hard to just improve things. Also, the 'moderate' bias is still larger than 2000, 2004, 2008, and 2012.

My personal theory is that the polls are mostly biased against Trump personally because it is more difficult to get good numbers on him due to interacting strangely with the electorate as compared to previous Republicans (perhaps because he isn't really a member of the same party they were), but obviously we don't actually know why. If the Trump realignment sticks around, perhaps they'll do better correcting for it later.

I do think part of the bias is the pollsters reacting to uncertainty about how to correct for things by going with the results they prefer, but I don't personally think that is the main issue here.

3. Your claim that 'Theo' was just lucky because neighbor polls are nonsense doesn't seem accurate. For one thing, neighbor polls aren't nonsense. They actually give you a lot more information than 'who are you voting for'. (Though they are speculative.) You can easily correct for how many neighbors someone has too and where they live using data on where people live, and you can also just ask 'what percentage of your neighbors are likely to vote for' to correct for the fact that it is different percentages of support.

As a separate point, a lot of people think the validity of neighbor polls comes from people believing that the respondents are largely revealing their own personal vote, though I have some issues with that explanation.

So, one bad poll with an extreme definition of 'neighbor' negates neighbor voting and many bad polls don't negate traditional? Also, Theo already had access to the normal polls as did everyone else. Even if a neighbor poll for some reason exaggerates the difference, as long as it is in the right direction, it is still evidence of what direction the polls are wrong in.

Keep in mind that the chance of Trump winning was much higher than traditional polls said. Just because Theo won with his bets doesn't mean you should believe he'd be right again, but claiming that it is 'just lucky' is a bad idea epistemologically, because you don't know what information he had that you don't.

4. I agree, we don't know whether or not the campaigns spent money wisely. The strengths and weaknesses of the candidates seemed to not rely much on the amount of money they spent, which likely does indicate they were somewhat wasteful on both sides, but it is hard to tell.

5. Is Trump a good candidate or a bad one? In some ways both. He is very charismatic in the sense of making everyone pay attention to him, which motivates both his potential supporters and potential foes to both become actual supporters and foes respectively. He also acts in ways his opponents find hard to counter, but turn off a significant number of people. An election with Trump in it is an election about Trump, whether that is good or bad for his chances.

I think it would be fairer to say Trump got unlucky with the election that he lost than that he was lucky to win this one. Trump was the covid incumbent who got kicked out because of it despite having an otherwise successful first term.

We don't usually call a bad opponent luck in this manner. Harris was a quasi-incumbent from a badly performing administration who was herself a laughingstock for most of the term. She was partially chosen as a reaction to Trump! (So he made his own luck, if this is luck!)

His opponent in 2016 was obviously a bad candidate too, but again, that isn't so much 'luck'. Look closely at the graph for Clinton. Her unfavorability went way up when Trump ran against her. This is also a good example of a candidate making their own 'luck'. He was effective in his campaign to make people dislike her more.

6. Yeah, money isn't the biggest deal, but it probably did help Kamala. She isn't any good at drawing attention just by existing like Trump, so she really needed it. Most people aren't always the center of attention, so money almost always does matter to an extent.

7. I agree that your opinion of Americans shouldn't really change much by being a few points different than expected in a vote either way, especially since each individual person making the judgement is almost 50% likely to be wrong anyway! If the candidates weren't identically as good, at least as many as the lower of the two were 'wrong' (if you assume one correct choice regardless of personal reasons) and it could easily be everyone who didn't vote for the lower. If they were identically as good, then it can't be that voting for one of them over the other should matter to your opinion of them. I have an opinion on which candidate was 'wrong' of course, but it doesn't really matter to the point (though I am freely willing to admit that it is the opposite of yours).
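The distinction drawn in point 2 above, that more or larger polls shrink random error but not bias, can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not a model of any real poll: the 50% true support, the 3-point bias, the poll sizes, and all function names are hypothetical values chosen for illustration.

```python
import random


def run_poll(true_support, n_respondents, bias, rng):
    """Simulate one poll; `bias` shifts the chance a respondent reports support."""
    p = true_support + bias
    hits = sum(1 for _ in range(n_respondents) if rng.random() < p)
    return hits / n_respondents


def average_of_polls(true_support, n_polls, n_respondents, bias, seed=0):
    """Average many simulated polls; random noise cancels, systematic bias does not."""
    rng = random.Random(seed)
    results = [run_poll(true_support, n_respondents, bias, rng)
               for _ in range(n_polls)]
    return sum(results) / n_polls


TRUE_SUPPORT = 0.50  # hypothetical true vote share

# 100 polls of 1000 respondents each, with and without a 3-point bias.
unbiased_avg = average_of_polls(TRUE_SUPPORT, 100, 1000, bias=0.0)
biased_avg = average_of_polls(TRUE_SUPPORT, 100, 1000, bias=-0.03)

print(f"unbiased average error: {unbiased_avg - TRUE_SUPPORT:+.4f}")
print(f"biased average error:   {biased_avg - TRUE_SUPPORT:+.4f}")
```

In this sketch the unbiased average lands close to the true value, while the biased average stays roughly 3 points off no matter how many polls are added.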

williamkiely on Seven lessons I didn't learn from election day

That makes sense, thanks.

deepthoughtlife on Seven lessons I didn't learn from election day

Some people went into the 2024 election fearing that pollsters had not adequately corrected for the sources of bias that had plagued them in 2016 and 2020.

I mostly heard the opposite, that they had overcorrected.

daniel-tan on Current safety training techniques do not fully transfer to the agent setting

This seems pretty cool! The data augmentation technique proposed seems simple and effective. I'd be interested to see a scaled-up version of this (more harmful instructions, models etc). Also would be cool to see some interpretability studies to understand how the internal mechanisms change from 'deep' alignment (and compare this to previous work, such as https://arxiv.org/abs/2311.12786, https://arxiv.org/abs/2401.01967)