LessWrong 2.0 Reader

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (1)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (67)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (0)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (8)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (7)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

[link] Introduction to Super Powers (for kids!)
Shoshannah Tekofsky (DarkSym) · 2024-09-20T17:17:27.070Z · comments (0)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (3)

The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (13)

AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan · 2024-09-29T05:50:02.531Z · comments (0)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)

A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)

You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)

I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!
mako yass (MakoYass) · 2024-08-02T22:35:21.136Z · comments (2)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

[link] "25 Lessons from 25 Years of Marriage" by honorary rationalist Ferrett Steinmetz
CronoDAS · 2024-10-02T22:42:30.509Z · comments (2)

[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (4)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

[link] [Talk transcript] What “structure” is and why it matters
Alex_Altair · 2024-07-25T15:49:00.844Z · comments (0)

AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)

[link] Foundations - Why Britain has stagnated [crosspost]
Nathan Young · 2024-09-23T10:43:20.411Z · comments (1)

[link] The unreasonable effectiveness of plasmid sequencing as a service
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-08T02:02:55.352Z · comments (0)

[link] The Offense-Defense Balance of Gene Drives
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-27T16:47:25.976Z · comments (1)

[link] Tokyo AI Safety 2025: Call For Papers
Blaine (blaine-rogers) · 2024-10-21T08:43:38.467Z · comments (0)

Apply to the Cooperative AI PhD Fellowship by October 14th!
Lewis Hammond (lewis-hammond-1) · 2024-10-05T12:41:24.093Z · comments (0)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates
Charlie George (charlie-george) · 2024-08-27T20:44:08.683Z · comments (7)

Recent comments

davekasten on davekasten's Shortform

It seems like the current meta is to write a big essay outlining your opinions about AI (see, e.g., Gladstone Report, Situational Awareness, various essays recently by Sam Altman and Dario Amodei, even the A Narrow Path report I co-authored).

Why do we think this is the case?
I can imagine at least 3 hypotheses:
1. Just path-dependence; someone did it, it went well, others imitated

2. Essays are High Status Serious Writing, and people want to obtain that trophy for their ideas

3. This is a return to the true original meaning of an essay, under Montaigne, that it's an attempt to write thinking down when it's still inchoate, in an effort to make it more comprehensible not only to others but also to oneself. And AGI/ASI is deeply uncertain, so the essay format is particularly suited for this.

What do you think?

habryka4 on A Defense of Peer Review

An exciting recent development is community peer review, also called “open peer review.” Under this system, preprints are uploaded to a server, wherein a pool of reviewers can look them over and decide which, if any, they would like to review. Articles that have made it out of this pool are then selected for publication. This differs from the “upload PDFs to the internet” ideas because it is more structured, results in a definitive outcome, and allows gatekeeping in terms of the composition of the pool of reviewers.

That... seems like a weird framing of what is going on? Community peer review was the standard before anonymous and random peer review ended up being forced on the scientific institution, the way this article describes. Post-publication community peer review was the standard in most fields until the mid-20th century, and describing it as an exciting recent development feels like it's conceding the whole debate.

Yes, just do post-publication peer review. Let journals and authors curate which papers they think are good at the same time as everyone else gets to read them. That's what science did before various large government funding bodies demanded more objectivity in the process (with, as this article and the other articles it links to describe, great harm to the process of science).

viliam on TurnTrout's shortform feed

Morality means sometimes not following the incentives.

I am not saying that people who always follow the incentives are immoral. Maybe they are just lucky and their incentives happen to be aligned with doing the right thing. Too much luck is suspicious though.

eggsyntax on Exploring SAE features in LLMs with definition trees and token lists

My main insight from all this is that we should be thinking in terms of taxonomisation of features. Some are very token-specific, others are more nuanced and context-specific (in a variety of ways). The challenge of finding maximally activating text samples might be very different from one category of features to another.

Joseph and Johnny did some interesting work on this in 'Understanding SAE Features with the Logit Lens' [LW · GW], taxonomizing features as partition features vs suppression features vs prediction features, and using summary statistics to distinguish them.

jonas-hallgren on Liquid vs Illiquid Careers

No sorry, I meant from the perspective of the person with less legible skills.

ape-in-the-coat on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong

I suppose the participant is just supposed to lie about their credence here in order to "win".

On Tuesday your credence in Heads is supposed to be 0, but saying the true value would go against the experimental protocol unless you also said that your credence is 0 on Monday, which would also be a lie.

satchlj on Arithmetic is an underrated world-modeling technology

Calculations on Hydroelectric Energy Storage

For those interested in the numbers on pumped hydroelectric storage, we can get more energy by increasing 'head' or the distance that the weight falls, from 6 meters to up to 500 meters for some of the largest projects (and we could in theory go bigger).

Let's pick a more reasonable number like 60 meters:

MASS/house = 15 kWh/house / (9.8 m/s² × 60 m) = 91,836 kg/house = 91 m^3/house

Let's say we have a dam with ~20 meters of water level fluctuation (drawdown). Then that's 5 m^2 per house of surface area.

As a sanity check, Bath County Pumped Storage Station in VA stores about 24,000 MWh / 30 kWh/house = 800,000 houses' worth of energy.

800,000 houses * 5 m^2 = 4 km^2

The Bath County reservoir is about 1 km^2, so we're in the right range here (the reservoir has a little more drawdown and a way bigger head).
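
For anyone who wants to rerun the arithmetic above with different heads or drawdowns, here is a minimal Python sketch. It uses the comment's own figures as given (15 kWh per house for the mass estimate, 30 kWh per house for the Bath County check); the function and constant names are made up for illustration, and the water density of 1000 kg/m^3 is an assumption.

```python
G = 9.8                 # gravitational acceleration, m/s^2
J_PER_KWH = 3.6e6       # joules in one kilowatt-hour
WATER_DENSITY = 1000.0  # kg per m^3 (assumed: fresh water)


def water_mass_kg(energy_kwh, head_m):
    """Mass of water that must fall head_m metres to release energy_kwh (E = m*g*h)."""
    return energy_kwh * J_PER_KWH / (G * head_m)


def surface_area_m2(mass_kg, drawdown_m):
    """Reservoir surface area needed if the water level may vary by drawdown_m."""
    return mass_kg / WATER_DENSITY / drawdown_m


# Per-house estimate: 15 kWh stored behind a 60 m head, 20 m of drawdown.
mass_per_house = water_mass_kg(15, 60)
area_per_house = surface_area_m2(mass_per_house, 20)

# Sanity check: 24,000 MWh at 30 kWh per house, then ~5 m^2 per house.
houses = 24_000 * 1000 / 30
total_area_km2 = houses * 5 / 1e6

print(f"{mass_per_house:,.0f} kg/house, {area_per_house:.1f} m^2/house")
print(f"{houses:,.0f} houses, {total_area_km2:.1f} km^2")
```

Swapping in a 500 m head shrinks the mass per house by the same factor, which is why head matters so much here.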

charlie-steiner on Resolving von Neumann-Morgenstern Inconsistent Preferences

Nice, especially the second half.

To me, this post makes it seem like the natural place to push forward is to try and restrict incoherent policies over lotteries until they're just barely well-behaved - either on the behavior end (continuity etc) or the algorithmic end (compute budgets etc).

lorxus on D&D Sci Coliseum: Arena of Data

I'm going to start by attacking this a little on my own before I even look much at what other people have done.

Some initial observations from the SQL+Python practice this gave me a good excuse to do:

Adelon looks to have rough matchups against Elf Monks. Which we don't have. They are however soft to even level 3-4 challengers sometimes. Maybe Monks and/or Fencers have an edge on Warriors?
Bauchard seems to have particularly strong matchups against other Knights, so we don't send Velaya there. They seem a little soft to Monks and to Dwarf Ninjas and especially to Knights, so maybe Zelaya? Boots should help here.
Cadagal has precious few defeats, but one of them might be to a level 2(!) Human Warrior with fancy +3 Gauntlets. Though it seems like there's a lot of combats where some Cadagal-like fighter has +4 Boots instead? Not sure if that's the same guy.
- And on that note, the max level is 7, and the max bonus for Boots and Gauntlets both is +4.
- Max Boots (+4) is always on a level 7 Elf Ninja with +3 Gauntlets (but disappears altogether most of the way through the dataset).
- Max Gauntlets (+4) is on either a level 7 Dwarf Monk who upgraded from +1 Boots to +3 Boots halfway through, or else there's two of them. Thankfully we're not facing them.
Deepwrack poses problems. They have just as few defeats, and one of them even contradicts the ordering I derived below! Ninjas are meant to lose to Monks. Maybe the speed matters a lot in that case?
It looks like a strict advantage in level or gear - holding all else constant - means you win every time. If everything is totally identical, you win about half the time. (Which seems obvious but worth checking.)
Looking through upsets - bouts where the classes are different, the losing fighter had at least 2 levels on the winner, and the loser's gear was no better than the winner's - we generally see that:
- Fencers beat Monks and Rangers and lose to Knights, Ninjas, and Warriors
- Knights beat Fencers and Ninjas, tie(???) with Monks and Warriors, and lose (weakly) to Rangers
- Monks beat Ninjas, Rangers, and maybe Warriors, tie (?) with Knights, and lose to Fencers
- Ninjas beat Fencers and (weakly) Rangers, and lose to Knights, Monks, and Warriors
- Rangers beat Knights (weakly), Ninjas, and Warriors, tie with Fencers, and lose to Monks
- Warriors beat Fencers and Ninjas, tie(?) with Knights, and lose to Rangers and maybe Monks
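
The upset filter described above can be sketched roughly like this in Python. This is not the actual query used: the `Fighter` and `Bout` records and their field names are hypothetical stand-ins for whatever the dataset's columns are called, "gear no better" is assumed to mean that neither the loser's Boots nor the loser's Gauntlets bonus exceeds the winner's, and the sample bouts are invented purely to exercise the filter.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fighter:
    cls: str        # e.g. "Monk", "Ninja" (hypothetical field names)
    level: int
    boots: int      # Boots bonus, 0 if none
    gauntlets: int  # Gauntlets bonus, 0 if none


@dataclass(frozen=True)
class Bout:
    winner: Fighter
    loser: Fighter


def is_upset(bout, min_level_gap=2):
    """Different classes, loser at least min_level_gap levels higher, loser's gear no better."""
    w, l = bout.winner, bout.loser
    return (
        w.cls != l.cls
        and l.level - w.level >= min_level_gap
        and l.boots <= w.boots
        and l.gauntlets <= w.gauntlets
    )


def upset_counts(bouts):
    """Tally upsets by (winner class, loser class)."""
    return Counter((b.winner.cls, b.loser.cls) for b in bouts if is_upset(b))


# Invented sample data, not taken from the real dataset.
sample = [
    Bout(Fighter("Monk", 3, 1, 1), Fighter("Ninja", 5, 1, 0)),    # counts as an upset
    Bout(Fighter("Monk", 3, 1, 1), Fighter("Monk", 6, 0, 0)),     # same class
    Bout(Fighter("Ranger", 4, 0, 0), Fighter("Knight", 6, 1, 0)), # loser had better Boots
    Bout(Fighter("Fencer", 5, 2, 2), Fighter("Monk", 6, 0, 0)),   # level gap too small
]
print(upset_counts(sample))
```

Comparing the count for (A, B) against the count for (B, A) is one way to read off which class has the edge in a given pairing.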

So my current best guess (pending understanding which gear is best for which class/race) is:

Willow v Adelon, Varina v Bauchard, Xerxes v Cadagal, Yalathinel v Deepwrack.

If I had to guess what gear to give to who: Warrior v Knight is a rough matchup, so Varina's going to need the help; the rest of my assignments are based thus far on ~vibes~ for whether speed or power will matter more for the class. Thus:

Willow gets +2 Boots and +1 Gauntlets, Varina gets +4 Boots and +3 Gauntlets, Xerxes gets +1 Boots and +2 Gauntlets, and Yalathinel gets +3 Boots.

Some theories I need to test:

Race affects how good you are at a class. Elves might be best at rangering, say.
Race and/or class affect how much benefit you get out of boots and/or gauntlets. Being a warrior might mean you get full benefit from gauntlets but none from boots.
Color might affect how well classes do. Ninjas wearing red might win way less often.
- The color does not actually seem to affect ninjas all that much if at all - 6963 vs 6762 wins. Could still be a tiebreaker?
- Color doesn't affect things much overall either: 40136 vs 39961 wins.
There's some rank-ordering of class+race+level matchups, maybe an additive one.
- Alternatively there could be some nontransitive thing going on with tiebreaks sometimes from levels, races, and gear?
- On further reflection that totally seems to be what's going on here.
- Maybe there's something about the matchup ordering being sorted over (race, class)? D's loss (as a L6 Dwarf Monk) to a L4 Dwarf Ninja is... unexpected to say the least!
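
The win counts quoted in the color bullets above can be turned into a rough significance check. A minimal sketch, assuming each bout is an independent 50/50 coin flip if color has no effect (a normal approximation to the binomial, which may not be the right model for this dataset); the function name is made up:

```python
import math


def z_score_vs_even_split(wins_a, wins_b):
    """Standard deviations by which wins_a exceeds half of all bouts, under a fair-coin null."""
    n = wins_a + wins_b
    return (wins_a - n / 2) / math.sqrt(n * 0.25)


ninja_z = z_score_vs_even_split(6963, 6762)      # ninjas only
overall_z = z_score_vs_even_split(40136, 39961)  # all classes

print(f"ninjas: z = {ninja_z:.2f}, overall: z = {overall_z:.2f}")
```

Under these assumptions the ninja split comes out a bit under two standard deviations from even and the overall split under one, which fits the "not much if at all" reading.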

Wild speculation:

If you [use the +4 Boots in combat and beat Cadagal then they'll know you were] responsible [for] ????? ?????? [Boots from his/her/the] House. [You will gain its] lasting enmity, [and] [people? will?] ???????? ???? ??? ???? ?? ??? ???? ??????? ?? ?? ????? ?? ????????? ?? ??? [upon] your honor [if] ????????? ???? ?? ???? ??? ???? ??? ??? friendship ???? ?? ??? ???? ?? ??? ?????? ??????? ?? ?? ?????.
- So maybe we're OK to use the +4 Boots as long as it's not against Cadagal?
- No idea how to even guess at what's going on in that second sentence apart from "bad things will happen and everyone will hate you, you dirty thief".

declan-molony on Conversational Signposts—An Antidote to Dull Social Interactions

Sure! Here are two of my favorites.

(1) From Leil Lowndes' book:

Don't ask what they do. In the US in my experience, the most common question upon meeting someone is "what do you do?" But the problem with this is that while 65% of Americans are satisfied with their jobs, only 20% of Americans are passionate about their work. From Lowndes:

If you instead ask, "How do you enjoy spending most of your time?" It allows people to mention their job or their hobbies. And homemakers are no longer embarrassed to say, "I'm just a mom" to the question of "what do you do?"

(2) From Dale Carnegie's book:

Never disagree and say "you're wrong". I am a naturally disagreeable person. Learning about this technique hasn't made me more agreeable, I just express my disagreement differently now. From Carnegie:

Never announce, "I am going to prove so-and-so to you." That's bad. That's tantamount to saying: "I am smarter than you are and am going to make you change your mind."
"We sometimes find ourselves changing our minds without any resistance; but if we are told we are wrong, we resent the imputation and harden our hearts. We are heedless in the formation of our beliefs, but find ourselves filled with a passion for them when anyone proposes to rob us of their companionship. It is obviously not the ideas themselves that are dear to us, but our self-esteem which is threatened."—James Harvey Robinson

I've adopted a more indirect way of challenging people's beliefs. Rather than stating my disagreement, I tend to ask questions (à la the Socratic Method) to get to the root of somebody's belief. Sometimes they'll notice contradictions in their own arguments without me having to point them out.