LessWrong 2.0 Reader

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

[link] The Roots of Progress 2024 in review
jasoncrawford · 2025-01-01T00:02:06.441Z · comments (0)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)

Ackshually, many worlds is wrong
tailcalled · 2024-04-11T20:23:59.416Z · comments (42)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

My Dating Heuristic
Declan Molony (declan-molony) · 2024-05-21T05:28:40.197Z · comments (4)

Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)

Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann (Stuckwork) · 2024-12-19T15:59:00.036Z · comments (4)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

[question] Thoughts on Francois Chollet's belief that LLMs are far away from AGI?
O O (o-o) · 2024-06-14T06:32:48.170Z · answers+comments (17)

Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)

[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)

Mask and Respirator Intelligibility Comparison
jefftk (jkaufman) · 2024-12-07T03:20:01.585Z · comments (5)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)

[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

Evidential Correlations are Subjective, and it might be a problem
Martín Soto (martinsq) · 2024-03-07T18:37:54.105Z · comments (6)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots
Owain_Evans · 2024-03-28T02:34:21.799Z · comments (0)

AI #57: All the AI News That’s Fit to Print
Zvi · 2024-03-28T11:40:05.435Z · comments (14)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (8)

Intranasal mRNA Vaccines?
J Bostock (Jemist) · 2025-01-01T23:46:40.524Z · comments (2)

Why I think it's net harmful to do technical safety research at AGI labs
Remmelt (remmelt-ellen) · 2024-02-07T04:17:15.246Z · comments (24)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

D&D.Sci Hypersphere Analysis Part 3: Beat it with Linear Algebra
aphyer · 2024-01-16T22:44:52.424Z · comments (1)

Vote in the LessWrong review! (LW 2022 Review voting phase)
habryka (habryka4) · 2024-01-17T07:22:17.921Z · comments (9)

Evaluating Solar
jefftk (jkaufman) · 2024-02-17T21:50:04.783Z · comments (5)

[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)

[link] Let's Design A School, Part 2.1 School as Education - Structure
Sable · 2024-05-02T22:04:30.435Z · comments (3)

[link] my favourite Scott Sumner blog posts
DMMF · 2024-06-11T14:40:43.093Z · comments (0)

Talk: AI safety fieldbuilding at MATS
Ryan Kidd (ryankidd44) · 2024-06-23T23:06:37.623Z · comments (2)

Causality is Everywhere
silentbob · 2024-02-13T13:44:49.952Z · comments (12)

Ideas for Next-Generation Writing Platforms, using LLMs
ozziegooen · 2024-06-04T18:40:24.636Z · comments (4)

Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around
Johnny Lin (hijohnnylin) · 2024-01-31T06:01:27.969Z · comments (4)

[link] Manifold Markets
PeterMcCluskey · 2024-02-02T17:48:36.630Z · comments (9)

Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)

What is the best argument that LLMs are shoggoths?
JoshuaFox · 2024-03-17T11:36:23.636Z · comments (22)

Smartphone Etiquette: Suggestions for Social Interactions
Declan Molony (declan-molony) · 2024-06-04T06:01:03.336Z · comments (4)

Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)

Meetup In a Box: Year In Review
Czynski (JacobKopczynski) · 2024-02-14T01:18:28.259Z · comments (1)

[question] How are you preparing for the possibility of an AI bust?
Nate Showell · 2024-06-23T19:13:45.247Z · answers+comments (16)

Recent comments

lucas-spailier on Tax Price Gouging?

I'm never sure if people do not see the logical case for price control in a crisis, or if they see it but believe it doesn't apply.

In any case it's probably worth it to share what I think is the (rationalized, sanewashed) reasoning against price gouging. The central example to keep in mind while reading this is a peasant during a famine in 15th-century France.

Survival is at stake. There are not enough survival goods for everyone.
Rich and powerful people shouldn't have an excessively easier time surviving than others. Whether or not one thinks that makes sense at society level, it would not be accepted by the poorer majority during the crisis.
Depending on the amount of stuff available and the level of wealth inequality, survival goods might be bid up entirely out of poorer people's ability to pay. There is no inherent reason why you couldn't end up with rich people eating their fill while a poorer majority starves.
Goods coming from outside the disaster area might or might not help. If the price required to incentivize bringing these goods is higher than the already inflated price of local goods, it won't help at all. If the price is lower, it will help lower prices, but then it will act as a lower floor on the price of survival goods. There might not be enough outside goods coming in to reach this lower floor, and there is no inherent reason why this lower floor could not be too high for a majority to pay anyway.
Violence is the historical tool to incentivize richer people to accept some hardship they could pay their way out of, in order to let others survive: reduced comfort is less dangerous than the threat of an angry mob.
While the threat of violence makes it less worthwhile (because more risky) to bring relief from outside the disaster area, without it this relief might not benefit the poorer majority at all, so they have no reason to care.
Laws against price gouging, and other forms of market interference by the state in a crisis, are deemed less disruptive than riots, so they are used as a substitute when possible.

I think the above reasoning mostly holds for the central example it's meant for. It seems obvious that all the differences between 15th-century France and 21st-century America push in the direction of making it less likely to be correct. But it's not obvious (to me) that these differences are enough to invalidate the argument entirely - I haven't really tried to model it properly, so I don't have a strong opinion one way or another.

But I'm not sure it matters politically, since it's not like this is what price control enthusiasts think when they argue against price gouging; the logic has been baked into cultural expectations of 'fairness'.

Back to your proposition, I think if anything it would make things worse. From the point of view of someone on the streets during a disaster, the state would just be another rich and powerful actor driving up the cost of basic goods.

jbash on What are the chances that Superhuman Agents are already being tested on the internet?

If you had a really superhuman agent, and you wanted to hide it, why would you blow your cover by playing silly games or making obvious GitHub commits? It's already SOP for social media bots to hide behind many accounts (and use many styles). So unless you have access to a lot of investment information that's typically kept confidential...

Even in stuff like cracking into other people's computers, you'd want to avoid being extremely obvious.

yoav-ravid on Per Tribalismum ad Astra

I looked into bridging-based ranking a bit more. The term seems to have been introduced in Aviv Ovadya's paper from 2022. Aviv and his partner Luke Thorburn have a website dedicated to Bridging systems. And this article seems like a good explanation of the concept and how it's applied in community notes.

I haven't found applications to voting.

p-joao on Don’t ignore bad vibes you get from people

How often do we risk losing something important by assigning too high a priority to a bad vibe?

Personally, I try not to ignore any vibe or thought, but I also attempt to prioritize them by importance. Maybe I should start a 'bad vibes journal'—a record of every time I feel something off and then compare it to the actual outcomes. My sense is that I often misjudge, but without tracking it, I can’t really calibrate my accuracy.

michael-roe on Don’t ignore bad vibes you get from people

I will redact out the name of the person here, but it’s a moderately well known UK politician.

The question sometimes comes up as to whether X is an anti-Semite. To which people who have had direct dealings with X typically respond with something to the effect that they don’t think X has it in for Jews specifically, but they think X is a complete asshole... and then launch into telling some story of a thing X did that annoyed them. This is, to my mind, not exactly an endorsement of X’s character.

johannes-c-mayer on Exercise: Planmaking, Surprise Anticipation, and "Baba is You"

If you've tried this earnestly 3 times, after the 3rd time, I think it's fine to switch to just trying to solve the level however you want (i.e. moving your character around the screen, experimenting).

After you've failed 3 times, wouldn't it be a better exercise to just play around in the level until you get a new piece of information that you predict will allow you to formulate better plans, and then step back into planning mode again?

raemon on Don’t ignore bad vibes you get from people

Curated. This is a generally important point, which I've also learned the hard way. And I like how Kaj includes two important caveats while making it (i.e. some advice on distinguishing prejudice from bad vibes, and what sorts of people should maybe consider the opposite advice).

dzoldzaya on Don’t ignore bad vibes you get from people

I’m not saying to endorse prejudice. But my experience is that many types of prejudice feel more obvious. If someone has an accent that I associate with something negative, it’s usually pretty obvious to me that it’s their accent that I’m reacting to.
Of course, not everyone has the level of reflectivity to make that distinction. But if you have thoughts like “this person gives me a bad vibe but maybe that’s just my internalized prejudice and I should ignore it”, then you probably have enough metacognition to also notice if there’s any clear trait you’re prejudiced about, and whether you would feel the same way about other people with that trait.

It seems like the most common situation when you'd ignore bad vibes would be when a trait like this confuses your signals. When you identify a negative trait that "feels more obvious", especially if it's socially taboo to be prejudiced against (race, ethnicity/accent, LGBT-status, mental/physical disability), this can interfere with your ability to correctly interpret other evidence (including "vibes"), so that it's very easy to overcompensate the other way.

The classic example from women's self-defence classes: you enter an enclosed space (e.g. a lift) with a man of a particular ethnicity who makes you instantly nervous. You consider not getting in, but then think "oh, this must just be his ethnicity I'm reacting to", castigate yourself for your prejudice, ignore the bad vibes, get in anyway, and it turns out he was dodgy.

Or a neuro-atypical colleague suggests a small business venture in a manner that would normally raise red flags. You get "bad vibes", but you interpret this as irrational prejudice against autistic behaviour traits, so you go along with it despite your vibes. Only later do you realise that your red flags were real, and your correction for prejudice was adding unnecessary noise into your decision-making.

I don't know whether there's evidence to back this up, but my sense is that "correction for potential prejudice" would be the major source of error here, especially among people who are more reflective.

technicalities on Shallow review of technical AI safety, 2024

I hear that you and your band have sold your technical agenda and bought suits. I hear that you and your band have sold your suits and bought gemma scope rigs.

(riff on this tweet, which is a riff on the original)

ege-erdil on meemi's Shortform

Yes.