LessWrong 2.0 Reader

On ‘Responsible Scaling Policies’ (RSPs)
Zvi · 2023-12-05T16:10:06.310Z · comments (3)

[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)

Philosophers wrestling with evil, as a social media feed
David Gross (David_Gross) · 2024-06-03T22:25:22.507Z · comments (2)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

What is the next level of rationality?
lsusr · 2023-12-12T08:14:14.846Z · comments (24)

On the lethality of biased human reward ratings
Eli Tyre (elityre) · 2023-11-17T18:59:02.303Z · comments (10)

Making Bad Decisions On Purpose
Screwtape · 2023-11-09T03:36:59.611Z · comments (8)

Highlights from Lex Fridman’s interview of Yann LeCun
Joel Burget (joel-burget) · 2024-03-13T20:58:13.052Z · comments (15)

Experiments as a Third Alternative
Adam Zerner (adamzerner) · 2023-10-29T00:39:31.399Z · comments (21)

Why the Best Writers Endure Isolation
Declan Molony (declan-molony) · 2024-07-16T05:58:25.032Z · comments (6)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.
Chipmonk · 2024-01-03T17:55:19.825Z · comments (3)

[link] Spaced repetition for teaching two-year olds how to read (Interview)
Chipmonk · 2023-11-26T16:52:58.412Z · comments (9)

The Dunning-Kruger of disproving Dunning-Kruger
kromem · 2024-05-16T10:11:33.108Z · comments (0)

“Why can’t you just turn it off?”
Roko · 2023-11-19T14:46:18.427Z · comments (25)

[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)

How to do conceptual research: Case study interview with Caspar Oesterheld
Chi Nguyen · 2024-05-14T15:09:30.390Z · comments (5)

AISC 2024 - Project Summaries
NickyP (Nicky) · 2023-11-27T22:32:23.555Z · comments (3)

[link] Urging an International AI Treaty: An Open Letter
Olli Järviniemi (jarviniemi) · 2023-10-31T11:26:25.864Z · comments (2)

[link] Web-surfing tips for strange times
eukaryote · 2024-05-31T07:10:25.805Z · comments (19)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

[link] A Good Explanation of Differential Gears
Johannes C. Mayer (johannes-c-mayer) · 2023-10-19T23:07:46.354Z · comments (4)

[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (23)

AI Pause Will Likely Backfire (Guest Post)
jsteinhardt · 2023-10-24T04:30:02.113Z · comments (6)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

2023 Prediction Evaluations
Zvi · 2024-01-08T14:40:07.377Z · comments (0)

Mission Impossible: Dead Reckoning Part 1 AI Takeaways
Zvi · 2023-11-01T12:52:29.341Z · comments (13)

Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (33)

4. Existing Writing on Corrigibility
Max Harms (max-harms) · 2024-06-10T14:08:35.590Z · comments (13)

[link] Five projects from AI Safety Hub Labs 2023
charlie_griffin (cjgriffin) · 2023-11-08T19:19:37.759Z · comments (1)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)

How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation")
Ruby · 2024-07-19T00:31:38.332Z · comments (21)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

Environmental allergies are curable? (Sublingual immunotherapy)
Chipmonk · 2023-12-26T19:05:08.880Z · comments (10)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

shortest goddamn bayes guide ever
lukehmiles (lcmgcd) · 2024-05-10T07:06:23.734Z · comments (8)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (58)

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley (roger-d-1) · 2024-01-09T20:42:28.349Z · comments (8)

[link] on neodymium magnets
bhauth · 2024-01-30T15:58:24.088Z · comments (6)

How to safely use an optimizer
Simon Fischer (SimonF) · 2024-03-28T16:11:01.277Z · comments (21)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

Thoughts on "The Offense-Defense Balance Rarely Changes"
Cullen (Cullen_OKeefe) · 2024-02-12T03:26:50.662Z · comments (4)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

Recent comments

mo-putera on What's a good book for a technically-minded 11-year old?

You mention in another comment that your kid reads the encyclopaedia for fun, in which case I don't think The Martian would be too complex, no?

I'm also reminded of how I started perusing the encyclopaedia for fun at age 7. At first I understood basically nothing (English isn't my native language), but I really liked certain pictures and diagrams and kept going back to them wanting to learn more, realising that I'd comprehend, say, 20% more each time, which taught me to chase exponential growth in comprehension. Might be worth teaching that habit.

selfmaker662 on Laziness death spirals

“My experience may not be applicable to you.”

Thanks for the note - my experience has been exactly the opposite. A classic case of the law of equal and opposite advice :)

jmh on If far-UV is so great, why isn't it everywhere?

I had a similar reaction, but given the note that far-UV doesn't have the same problem as other UV sterilization (i.e., being harmful to humans), I gather the point is about locality. UV in ducts will kill viruses in the air system, but an airborne illness spreads host-to-target before the air passes through the air system.

As such, it seems that while the in-duct UV solution would help limit spread, it's not going to do much to clean the air in the room while people are in it exhaling, coughing, sneezing, or talking.

I suspect it does little to protect the people directly next to or in front of a contagious person, but it is probably good for those practicing that old 6-foot rule (or whatever the arbitrary distancing rule was).

Just my guess though.

niplav on yams's Shortform

Apologies for the soldier mindset react, I pattern-matched to some more hostile comment. Communication is hard.

jmh on If far-UV is so great, why isn't it everywhere?

Quick comment regarding research.

If far-UV is really so great, and installing it is not that simple, I would assume that any company selling and installing it is not some small mom-and-pop operation. If that holds, why aren't the companies that want to promote and sell these systems using them and then collecting the data?

Or would that type of investment be seen as too costly even for those with a direct interest in producing the results to bolster sales and increase the size of the network/ecosystem?

jmh on Open Thread Fall 2024

I think perhaps a first one might be:

On what evidence do I conclude that what I think I know is correct/factual/true, and how strong is that evidence? To what extent have I verified that view, and just how extensively should I verify the evidence?

After that might be a similar approach to the implications or outcomes of applying actions based on what one holds as truth/fact.

I tend to think of rationality as a process rather than an endpoint. Which isn't to say that the destination is not important, but clearly without the journey the destination is just a thought or dream. That first of a thousand steps thing.

lsusr on What's a good book for a technically-minded 11-year old?

If the kid is enjoying the robot stories then that's definitely the place to start. Foundation goes well after robots.

abstractapplic on D&D Sci Coliseum: Arena of Data

Took an ML approach, got radically different results which I'm choosing to trust.

Fit a LightGBM model to the raw data, and to the data transformed by simon's stats-to-strength-and-speed model. Simon's version got slightly better results on an outsample despite having many fewer degrees of freedom and fewer chances to 'cheat' by fingerprinting exceptional fighters; I therefore used that going forward. (I also tried tweaking some of the arbitrary constants in simon's model: this invariably lowered performance, reassuring me that he got all the multipliers right.)

Iterated all possible matchups, then all possible loadouts (modulo not using the +4 boots), looking for max EV of total count of wins.

New strategy goes like this:

Against A, send U, with +3 Boots

Against B, send X, with +2 Boots and +1 Gauntlets

Against C, send V, with +3 Gauntlets

Against D, send Y, with +1 Boots and +2 Gauntlets

Notes:

The machines say this gives me ~2.6 expected victories but I'm selecting for things they liked so realistically I expect my EV somewhere in the 2-2.5 range.

If I was doing this IRL I'd move the Gauntlets from V to U, lowering EV but (almost) guaranteeing me at least one win.

My best guess about why my solution works (assuming it does) is that the "going faster than your opponent" bonus hits sharply diminishing returns around +4 speed. But that's just post hoc confabulation.
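
The exhaustive search described in this comment (every matchup, then every loadout, scored by expected wins) can be sketched roughly as below. This is only an illustration under stated assumptions: `win_prob` stands in for whatever fitted model supplies win probabilities, and the fighter names, strengths, and item bonuses in the toy example are hypothetical, not the scenario's data.

```python
from itertools import permutations

def loadouts(items, slots):
    """All ways to hand out `items` over `slots` positions, at most one each (0 = no item)."""
    padded = list(items) + [0] * slots
    return sorted(set(permutations(padded, slots)))

def best_plan(fighters, opponents, boots, gauntlets, win_prob):
    """Brute-force the matchup and loadout with the highest expected number of wins."""
    n = len(opponents)
    best_ev, best = float("-inf"), None
    for team in permutations(fighters, n):          # who faces which opponent
        for b in loadouts(boots, n):                # boots bonus per matchup
            for g in loadouts(gauntlets, n):        # gauntlets bonus per matchup
                ev = sum(win_prob(f, o, bi, gi)
                         for f, o, bi, gi in zip(team, opponents, b, g))
                if ev > best_ev:
                    best_ev, best = ev, list(zip(opponents, team, b, g))
    return best_ev, best

# Toy stand-in for a fitted model: a fighter wins if boosted strength beats the opponent's.
strength = {"U": 1, "V": 2, "A": 1, "B": 2}  # hypothetical values

def toy_win_prob(fighter, opponent, boots, gauntlets):
    return 1.0 if strength[fighter] + boots + gauntlets > strength[opponent] else 0.0

ev, plan = best_plan(["U", "V"], ["A", "B"], boots=[2], gauntlets=[], win_prob=toy_win_prob)
```

Each entry of `plan` is an (opponent, fighter, boots bonus, gauntlets bonus) tuple. Picking the maximum over many candidate plans also selects for model error in the plan's favour, which is the reason given above for expecting a realised EV below the model's estimate.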

silverflame on Bitter lessons about lucid dreaming

(source epistemic status: mostly experiential and anecdotal from a lay lucid dreamer who knows a few other lucid dreamers)

The common negative effects from my lucid dreaming experiences:
- If I'm not careful with how I exert the "influence" I have in the dream, I can "crash" the dream, usually resulting in me waking up and having trouble getting back to sleep for a bit
- When I use a lot of influence in a lucid dream, especially to extend the length of a dream, I find that I end up seeming way less rested than normal (but that has proven hard to try and quantify beyond "when in the day do I hit a point of exhaustion")

A somewhat less common negative effect I keep in mind:
- Some people I know have had issues where their nightmares became far more unpleasant after trying to learn lucid dreaming to "fight back"

roger-dearnaley on Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?

Model/Finetune | Global Mean (Cosine) Similarity
Gemma-2b/Gemma-2b-Python-codes | 0.6691
Mistral-7b/Mistral-7b-MetaMath | 0.9648

As an ex-Googler, my informed guess would be that Gemma 2B will have been distilled (or perhaps both pruned and distilled) from a larger teacher model, presumably some larger Gemini model — and that Gemma-2b and Gemma-2b-Python-codes may well have been distilled separately, as separate students of two similar-but-not-identical teacher models distilled using different teaching datasets. The fact that the global mean cosine you find here isn't ~0 shows that if so, the separate distillation processes were either warm-started from similar models (presumably a weak 2B model — a sensible way to save some distillation expense), or at least shared the same initial token embeddings/unembeddings.

Regardless of how they were created, these two Gemma models clearly differ pretty significantly, so I'm unsurprised by your subsequent discovery that the SAE basically doesn't transfer between them.

For Mistral 7B, I would be astonished if any distillation was involved; I would expect just a standard combination of fine-tuning followed by either RLHF or something along the lines of DPO. In very high dimensional space, a cosine of 0.96 means "almost identical", so clearly the instruct training here consists of fairly small, targeted changes, and I'm unsurprised that as a result the SAE transfers quite well.
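
To make the cosine intuition above concrete, here is a small sketch using random vectors rather than real model weights. The dimension and noise scale are illustrative assumptions, and this is not the metric computation from the original post. Independent random directions in a high-dimensional space have cosine near 0, while a vector perturbed by noise at roughly 29% of its scale still has cosine near 0.96.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

rng = random.Random(0)
dim = 4096  # illustrative width, not taken from either model

base = [rng.gauss(0, 1) for _ in range(dim)]
unrelated = [rng.gauss(0, 1) for _ in range(dim)]
# Small, targeted change: add noise at about 29% of the base scale.
nudged = [x + 0.29 * rng.gauss(0, 1) for x in base]

cos_unrelated = cosine(base, unrelated)  # close to 0 for independent directions
cos_nudged = cosine(base, nudged)        # roughly 1 / sqrt(1 + 0.29**2), about 0.96
```

On this reading, a similarity of 0.67 is far from the near-zero value expected of unrelated weights, yet also far from the near-identical 0.96 case.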