LessWrong 2.0 Reader

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)
Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)
Zvi's Manifold Markets House Rules
Zvi · 2023-11-13T00:28:02.147Z · comments (6)
Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)
[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)
[link] Open Phil releases RFPs on LLM Benchmarks and Forecasting
LawrenceC (LawChan) · 2023-11-11T03:01:09.526Z · comments (0)
AMA: Earning to Give
jefftk (jkaufman) · 2023-11-07T16:20:10.972Z · comments (8)
Human wanting
TsviBT · 2023-10-24T01:05:39.374Z · comments (1)
Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)
AI #37: Moving Too Fast
Zvi · 2023-11-09T17:50:04.324Z · comments (5)
Self-Blinded L-Theanine RCT
niplav · 2023-10-31T15:24:57.717Z · comments (12)
We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)
Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)
Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)
Announcing the Double Crux Bot
sanyer (santeri-koivula) · 2024-01-09T18:54:15.361Z · comments (8)
The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-11-07T16:12:20.031Z · comments (20)
Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)
The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)
Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (6)
BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)
Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)
AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)
The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)
A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)
[link] The Long-Term Future Fund is looking for a full-time fund chair
Linch · 2023-10-05T22:18:53.720Z · comments (0)
Was Releasing Claude-3 Net-Negative?
Logan Riggs (elriggs) · 2024-03-27T17:41:56.245Z · comments (5)
Schelling points in the AGI policy space
mesaoptimizer · 2024-06-26T13:19:25.186Z · comments (2)
Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat · 2023-12-06T08:16:26.846Z · comments (18)
Who Has the Best Food?
Zvi · 2023-09-05T13:40:07.593Z · comments (61)
[question] Intelligence Enhancement (Monthly Thread) 13 Oct 2023
Nicholas / Heather Kross (NicholasKross) · 2023-10-13T17:28:37.490Z · answers+comments (40)
AI #45: To Be Determined
Zvi · 2024-01-04T15:00:05.936Z · comments (4)
[link] OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns
Seth Herd · 2023-11-20T14:20:33.539Z · comments (28)
Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)
AI #28: Watching and Waiting
Zvi · 2023-09-07T17:20:10.559Z · comments (14)
[link] Bed Time Quests & Dinner Games for 3-5 year olds
Gunnar_Zarncke · 2024-06-22T07:53:38.989Z · comments (0)
Provably Safe AI: Worldview and Projects
bgold · 2024-08-09T23:21:02.763Z · comments (43)
Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (7)
Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)
How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (25)
Reflections on "Making the Atomic Bomb"
boazbarak · 2023-08-17T02:48:19.933Z · comments (7)
OpenAI-Microsoft partnership
Zach Stein-Perlman · 2023-10-03T20:01:44.795Z · comments (19)
[link] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke · 2024-05-16T13:09:39.265Z · comments (20)
The Shutdown Problem: Incomplete Preferences as a Solution
EJT (ElliottThornley) · 2024-02-23T16:01:16.378Z · comments (22)
Spatial attention as a “tell” for empathetic simulation?
Steven Byrnes (steve2152) · 2024-04-26T15:10:58.040Z · comments (12)
Some reasons why I frequently prefer communicating via text
Adam Zerner (adamzerner) · 2023-09-18T21:50:48.620Z · comments (18)
[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)
Does literacy remove your ability to be a bard as good as Homer?
Adrià Garriga-alonso (rhaps0dy) · 2024-01-18T03:43:14.994Z · comments (19)
Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (9)
Will 2024 be very hot? Should we be worried?
A.H. (AlfredHarwood) · 2023-12-29T11:22:50.200Z · comments (12)
D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset
aphyer · 2024-06-17T21:29:08.778Z · comments (11)