LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (13)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (11)

I Finally Worked Through Bayes' Theorem (Personal Achievement)
keltan · 2024-12-05T02:04:16.547Z · comments (6)

Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)

[link] Just one more exposure bro
Chipmonk · 2024-12-12T21:37:07.069Z · comments (6)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

Correct my H5N1 research ($reward)
Elizabeth (pktechgirl) · 2024-12-09T19:07:03.277Z · comments (24)

The OODA Loop -- Observe, Orient, Decide, Act
Davis_Kingsley · 2025-01-01T08:00:27.979Z · comments (2)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

[link] A toy evaluation of inference code tampering
Fabien Roger (Fabien) · 2024-12-09T17:43:40.910Z · comments (0)

DeekSeek v3: The Six Million Dollar Model
Zvi · 2024-12-31T15:10:06.924Z · comments (6)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (2)

A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (29)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

D&D.Sci Dungeonbuilding: the Dungeon Tournament
aphyer · 2024-12-14T04:30:55.656Z · comments (16)

[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (5)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)

AI #94: Not Now, Google
Zvi · 2024-12-12T15:40:06.336Z · comments (3)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (8)

[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (15)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (3)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

Recent comments

mo-putera on [Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty

They used to consider speedrunning games a guilty pleasure, but after goal-factoring their supposed guilty pleasure, they concluded that the guilt doesn't align with their actual goals and values, and that feeling bad about enjoying speedrunning doesn't serve any productive purpose. So now they enjoy speedrunning unabashedly.

adam-b on Adam B's Shortform

Predict your 2025: a website for recording probabilistic forecasts about your life and the world in the next year.

seth-herd on A Principled Cartoon Guide to NVC

That TLDR is great! I've read the NVC book through twice and taken half of an online audio course. I've also never directly benefitted from NVC communication suggestions because they're framed awkwardly.

Your distillation rings true, but I have not made it or heard it. Thank you.

I'd just add to your TLDR something like:

Make requests but don't pressure people to do things. Try to be clear about why you're asking them to do things.

You do cover this, but your TLDR is missing it. There's probably a better formulation; that's just a first random stab. That part is the nonviolent part.

It's interesting to note that LW communication usually does seem to follow those NVC principles.

quinces6l on [Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty

This was a nice read, but I don't have the slightest clue what "Goal-factored the idea of 'guilty pleasures'. Now loves speedrunning unabashedly" means.

rafael-kaufmann-nedal on Self-prediction acts as an emergent regularizer

Latecomer, but as this relates to some of my prior work on self- and other-modeling, I thought I'd comment... The consistently high task accuracy displayed on Figure D suggests that even your smallest neural network is significantly over-capacity/over-parameterized for the test dataset. Excess capacity seems to be the only way the model can take on the expensive self-modeling task (*) without losing accuracy on the main task. Indeed, this would suggest that the explanation for the regularization benefit of self-modeling here is precisely that it soaks up the excess capacity, avoiding overfitting. But obviously, you can have too much of a good thing -- as the experiments with fewer hidden layers show, the attention weight can take over the model's focus and destroy accuracy. So it seems that, if you up the problem complexity/network size knob, the "maximum allowable attention weight" that doesn't compromise accuracy will tend to zero. On the other hand, one can think of simpler tasks than fully predicting all of a layer's activations -- for example, predicting the activation signs, the maximum-minimum range, the mean activation, etc. I want to say these seem more meaningful anyway, and a way to avoid Borges's "Map of the Empire whose size was that of the Empire", no?

* BTW: Unless I missed it, the paper did not report the accuracy of the self-modeling task, only of the primary task, right? I must imagine it was far from perfect, as perfect self-modeling is only possible in trivial edge cases, right?
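
As a rough illustration of the cheaper self-modeling targets suggested above (activation signs, max-min range, mean activation), here is a minimal sketch. The function name `summary_targets` and the plain-list representation of a layer's activations are assumptions made for this example; they do not come from the paper under discussion.

```python
def summary_targets(activations: list[float]) -> dict:
    """Compute low-dimensional summaries of one layer's activations.

    Hypothetical helper: each value is a candidate auxiliary target
    that is far cheaper to predict than the full activation vector.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    return {
        # Sign of each unit: +1, -1, or 0 for an exactly-zero activation.
        "signs": [1 if a > 0 else -1 if a < 0 else 0 for a in activations],
        # Spread between the largest and smallest activation.
        "range": max(activations) - min(activations),
        # Average activation across the layer.
        "mean": sum(activations) / len(activations),
    }


targets = summary_targets([1.0, -2.0, 4.0])
```

The range and mean are single scalars regardless of layer width, so an auxiliary head predicting them would not grow with the network the way full activation prediction does.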

daniel-tan on Daniel Tan's Shortform

Note that “thinking through implications” for alignment is exactly the idea in deliberative alignment https://openai.com/index/deliberative-alignment/

Some notes

Authors claim that just prompting o1 with full safety spec already allows it to figure out aligned answers
The RL part is simply “distilling” this into the model (see also, note from a while ago on RLHF as variational inference)
Generates prompt, CoT, outcome traces from the base reasoning model. I.e., data is collected “on-policy”
Uses a safety judge instead of human labelling

nickh on Pascal's Mugging: Tiny Probabilities of Vast Utilities

What is the probability that there are not 3^^^3 anti-muggers out there who will kill 3^^^^^^3 people if I submit to the mugger? Not 0.
The original argument against Pascal's Wager does not require you to actually believe in any of the other gods, just that the probability of them existing and having the reverse utility is enough to cancel out the probability of Pascal being right.

gunnar_zarncke on Viliam's Shortform

And that is why the rewiring of the pubescent brain involves changes that enable that. As the brain can't hardwire changes to values, which are high-level learned, it has to go some other way.

Change the ground truth feedback (brain stem rewiring).
Weaken all previous connections (synaptic pruning?).
Reduce learning rate (myelination).
Something else? Changes to other hyperparameters (hormonal changes?).

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

John, you know coding theory much better than I do, so I am inclined to defer to your superior knowledge...

Now behold the awesome power of gpt-Pro

Let’s unpack the question in pieces:
1. Is ZIP (a.k.a. DEFLATE) “locally decodable” or not?
Standard ZIP files are typically not “locally decodable” in the strictest sense—i.e., you cannot start decoding exactly at the byte corresponding to your region of interest and reconstruct just that portion without doing some earlier decoding.
The underlying method, DEFLATE, is indeed based on LZ77 plus Huffman coding. LZ77 uses back‐references to prior segments of the uncompressed data, which forces you to at least partially decode from an earlier point to resolve those references.
However, many ZIP files are stored in blocks or chunks—sometimes called “deflate blocks.” So if you only need data near the start of a block (rather than in the middle of one), you can skip previous blocks and just decode that chunk. Still, you cannot trivially jump to an arbitrary byte offset inside the block without re-initializing the LZ77 state from somewhere earlier.
2. If ZIP is “based on Lempel–Ziv,” can we easily modify it for local decodability?
Because Lempel–Ziv (LZ) methods revolve around referencing earlier uncompressed substrings, one must store enough information (or replicate the references with separate “partial dictionaries”) if one wants truly local decodability.
One can certainly modify an LZ-based compression scheme so that each block is smaller, or that references never cross block boundaries, etc. That would allow a more local decoding property: you only decode from the start of the local block rather than from the entire file.
This usually comes with tradeoffs. If your blocks are small, you lose compression efficiency because you can’t take advantage of references bridging across large swaths of data. So making a compression format “locally decodable” in a fine‐grained way often lowers the overall compression ratio.
3. Is Lempel–Ziv (and thus ZIP) lossless?
Yes. Standard LZ77 (or LZ78, LZMA, etc.) is a lossless compression scheme: you can reconstruct the exact original data bit for bit. The DEFLATE algorithm used in ZIP is also lossless.
4. Does the picture change when doing lossy compression?
Yes, quite a bit. In lossy compression (like JPEG for images or MP3/AAC for audio), parts of the original data are thrown away. Often you can still decode small “chunks” of the file—e.g., many lossy audio or video formats do decode in frames or segments. But the format internals (e.g., DCT coefficients in JPEG, wavelet transforms, etc.) differ a lot from LZ-based bit-by-bit exactness.
In lossy formats that use transform blocks (like 8×8 pixel blocks in JPEG), you can decode each block locally, but you still have boundary effects, optional overlap between blocks, etc. So “local decodability” there can be simpler in some ways (each block is more independent), but it’s also less “exact” because we accept errors from the original data.
Short Summary
ZIP/DEFLATE is not strictly locally decodable because LZ77 references require that you decode from (at least) the beginning of the current block.
You can modify an LZ-based scheme (for instance, by using smaller blocks or restricting cross‐block references) to make it more locally decodable, but that typically sacrifices some compression ratio.
LZ methods and ZIP are lossless.
With lossy compression, “local decodability” can be easier in the sense that each block may be decoded without referencing data too far outside itself—but at the cost that you’re not reconstructing the original data exactly.
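
The "smaller blocks, no cross-block references" idea from the summary above can be sketched with Python's standard `zlib` module, which implements DEFLATE. This is only an illustration under stated assumptions: the block size and the helper names `compress_blocks` and `read_range` are hypothetical, and this is not how the ZIP container format itself is laid out.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size; smaller means more local, worse ratio


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Compress each fixed-size block independently.

    Every block is its own DEFLATE stream, so no back-reference
    can point into an earlier block.
    """
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def read_range(blocks: list[bytes], start: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Recover data[start:start + length], decoding only the blocks it touches."""
    if length <= 0:
        return b""
    first = start // block_size
    last = (start + length - 1) // block_size
    # Only blocks first..last are decompressed; the rest stay untouched.
    chunk = b"".join(zlib.decompress(b) for b in blocks[first:last + 1])
    offset = start - first * block_size
    return chunk[offset:offset + length]


data = bytes(range(256)) * 100
blocks = compress_blocks(data, block_size=1024)
piece = read_range(blocks, start=5000, length=300, block_size=1024)
```

Reading still starts at a block boundary, so the cost of a lookup is bounded by the block size rather than the file size. The tradeoff described above shows up when comparing the total size of `blocks` against a single `zlib.compress(data)` call for different block sizes.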

l-rudolf-l on AI Safety as a YC Startup

However, I think there is a group of people who over-optimize for Direction and neglect the Magnitude. Increasing Magnitude often comes with the risk of corrupting the Direction. For example, scaling fast often makes it difficult to hire only mission-aligned people, and it requires you to give voting power to investors that prioritize profit. To increase Magnitude can therefore feel risky: what if I end up working at something that is net-negative for the world? Therefore it might be easier for one's personal sanity to optimize for Direction, to do something that is unquestionably net-positive. But this is the easy way out, and if you want to have the highest expected value of your Impact, you cannot disregard Magnitude.

You talk here about an impact/direction v ambition/profit tradeoff. I've heard many other people talking about this tradeoff too. I think it's overrated; in particular, if you're constantly having to think about it, that's a bad sign.

It's rare that you have a continuous space of options between lots of impact and low profit, and low/negative impact and high profit.
If you do have such a continuous space of options then I think you are often just screwed and profit incentives will win.
The really important decision you make is probably a discrete choice: do you start an org trying to do X, or an org trying to do Y? Usually you can't (and even if you can, shouldn't) try to interpolate between these things, and making this high-level strategy call will probably shape your impact more than any later finetuning of parameters within that strategy.
Often, the profit incentives point towards the more-obvious, gradient-descent-like path, which is usually very crowded and leads to many "mediocre" outcomes (e.g. starting a $10M company), but the biggest things come from doing "Something Else Which Is Not That" (as is said in dath ilan). For example, SpaceX (ridiculously hard and untested business proposition) and Facebook (started out seeming very small and niche and with no clue of where the profit was).

Instead, I think the real value of doing things that are startup-like comes from:

The zero-to-one part of Peter Thiel's zero-to-one v one-to-n framework: the hardest, progress-bottlenecking things usually look like creating new things, rather than scaling existing things. For example, there is very little you can do today in American politics that is as impactful or reaches as deep into the future as founding America in the first place.
In the case of AI safety: neglectedness. Everyone wants to work at a lab instead, humans are too risk averse in general, etc. (I've heard many people in AI safety say that neglectedness is overrated. There are arguments like this one that replaceability/neglectedness considerations aren't that major: job performance is heavy-tailed, hiring is hard for orgs, etc. But such arguments seem like weirdly myopic parameter-fiddling, at least when the alternative is zero-to-one things like those discussed above. Starting big things is in fact big. Paradigm shifts matter because they're the frame that everything else takes place in. You either see this or you don't.)
To the extent you think the problem is about economic incentives or differential progress, have you considered getting your hands dirty and trying to change the actual economy or the direction of the tech tree? There are many ways to do this, including some types of policy and research. But I think the AI safety scene has a cultural bias towards things that look like research or information-gathering, and away from being "builders" in the Silicon Valley sense. One of the things that Silicon Valley does get right is that being a builder is very powerful. If the AI debate comes down to a culture/influence struggle between anti-steering, e/acc-influenced builder types and pro-steering EA-influenced academic types, it doesn't look good for the world.