LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

How to hire somebody better than yourself
lukehmiles (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)

Principled Satisficing To Avoid Goodhart
JenniferRM · 2024-08-16T19:05:27.204Z · comments (2)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

[question] "Deception Genre" What Books are like Project Lawful?
Double · 2024-08-28T17:19:52.172Z · answers+comments (20)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (32)

A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed
johnswentworth · 2024-08-22T19:19:28.940Z · comments (4)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

Debate: Get a college degree?
Ben Pace (Benito) · 2024-08-12T22:23:34.744Z · comments (14)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (8)

[link] Rowing vs steering
Saul Munn (saul-munn) · 2024-08-10T07:00:17.594Z · comments (2)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

Unit economics of LLM APIs
dschwarz · 2024-08-27T16:51:22.692Z · comments (0)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (2)

[link] Adverse Selection by Life-Saving Charities
vaishnav92 · 2024-08-14T20:46:23.662Z · comments (16)

[link] What's important in "AI for epistemics"?
Lukas Finnveden (Lanrian) · 2024-08-24T01:27:06.771Z · comments (0)

D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (23)

Superintelligent AI is possible in the 2020s
HunterJay · 2024-08-13T06:03:26.990Z · comments (3)

[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (8)

[link] [Paper] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (10)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

[link] Things I learned talking to the new breed of scientific institution
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-29T14:00:14.844Z · comments (6)

Reflections on the Metastrategies Workshop
gw · 2024-10-24T18:30:46.255Z · comments (5)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

Monthly Roundup #23: October 2024
Zvi · 2024-10-16T13:50:05.869Z · comments (12)

Californians, tell your reps to vote yes on SB 1047!
Holly_Elmore · 2024-08-12T19:50:09.817Z · comments (24)

instruction tuning and autoregressive distribution shift
nostalgebraist · 2024-09-05T16:53:41.497Z · comments (5)

2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)

[question] Implications of China's recession on AGI development?
Eric Neyman (UnexpectedValues) · 2024-09-28T01:12:36.443Z · answers+comments (3)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

anthonyc on Both-Sidesism—Where Fair & Balanced Goes Wrong

TBH one of the things I always wonder about is not so much the "sidesism" as the "both." How are people deciding what should count as a side, and why there should be two? And when should something no longer count as a side?

I mean, I get it in practice, there's nothing this self-reflective going on at all and it's all decided by inertia, FPTP voting, and revenue. I still would naively have expected more people on the audience side to have the realization that:

this didn’t seem to fit into the Headmaster’s list; and it occurred to Hermione that there might be a lot more viewpoints on the subject than just four.

tailcalled on What can we learn from insecure domains?

No, it was a lot of words that describe why your strategy of modelling stuff as more/less "dangerous" and then trying to calibrate to how much to be scared of "dangerous" stuff doesn't work.

The better strategy, if you want to pursue this general line of argument, is to make the strongest argument you can for what makes e.g. Bitcoin so dangerous and how horrible the consequences will be. Then since your sense of danger overestimates how dangerous Bitcoin will be, you can go in and empirically investigate where your intuition was wrong by seeing what predictions of your intuitive argument failed and what obstacles caused them to fail.

logan-zoellner on What can we learn from insecure domains?

That was a lot of words to say "I don't think anything can be learned here".

Personally, I think something can be learned here.

tailcalled on What can we learn from insecure domains?

You shouldn't use "dangerous" or "bad" as a latent variable because it promotes splitting. MAD and Bitcoin have fundamentally different operating principles (e.g. nuclear fission vs cryptographic pyramid schemes), and these principles lead to a mosaic of different attributes. If you ignore the operating principles and project down to a bad/good axis, then you can form some heuristics about what to seek out or avoid, but you face severe model misspecification, violating principles like realizability which are required for Bayesian inference to get reasonable results (e.g. converge rather than oscillate, and be well-calibrated rather than massively overconfident).

Once you understand the essence of what makes a domain seem dangerous to you, you can debug by looking at what obstacles this essence faced that stopped it from flowing into whatever horrors you were worried about, and then try to think through why you didn't realize those obstacles ahead of time. As you learn more about the factors relevant in those cases, maybe you will learn something that generalizes across cases, but most realistically what you learn will be about the problems with the common sense.

anthonyc on Science advances one funeral at a time

I think the issue here is not so much the disagreement or criticism as it is the mockery and ostracism. Unlike in, say, venture capital, there's much less opportunity in science for someone to try something different and exciting, get enough funding to see if it really works out, and then, if it doesn't but you were doing a good enough job trying, still be part of the community and get funding to try something else. (Yes, I know it doesn't always work that way either, but I think the odds are much better than in science)

vanessa-kosoy on Complete Feedback

I feel that this post would benefit from having the math spelled out. How is inserting a trader a way to do feedback? Can you phrase classical RL like this?

logan-zoellner on What can we learn from insecure domains?

MAD is obviously governed by completely different principles than crypto is

Maybe this is obvious to you. It is not obvious to me. I am genuinely confused what is going on here. I see what seems to be a pattern: dangerous domain -> basically okay. And I want to know what's going on.

anthonyc on Two arguments against longtermist thought experiments

I appreciate the discussion, but I can't help but be distracted by the specifics of the example scenario. In this case, it just seems obvious to me that the correct answer is to bury the waste and then invest in developing better processing solutions. There's no such thing as waste that can't be safely processed, even in principle, with a century of lead time to prepare. When I read the first few sentences, I actually thought the counterargument was going to be about uncertainty in long term impact projections.

arthur-conmy on IAPS: Mapping Technical Safety Research at AI Companies

Here are the other GDM mech interp papers missed:
We have some blog posts of comparable standard to the Anthropic circuit updates listed:
- https://www.alignmentforum.org/posts/C5KAZQib3bzzpeyrg/full-post-progress-update-1-from-the-gdm-mech-interp-team [AF · GW]
- https://www.alignmentforum.org/posts/iGuwZTHWb6DFY3sKB/fact-finding-attempting-to-reverse-engineer-factual-recall [AF · GW]
You use a very wide scope for the "enhancing human feedback" (basically any post-training paper mentioning 'align'-ing anything). So I will use a wide scope for what counts as mech interp and also include:
- https://arxiv.org/abs/2401.06102
- https://arxiv.org/abs/2304.14767
- There are a few other papers from the PAIR group as well as Mor Geva and also Been Kim, but mostly with Google Research affiliations so it seems fine to not include these as IIRC you weren't counting pre-GDM merger Google Research/Brain work

signer on Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion

Even if we can’t currently prove certain axioms, doesn’t this just reflect our epistemological limitations rather than implying all axioms are equally “true”?

It doesn't and they are fundamentally equal. The only reality is the physical one - there is no reason to complicate your ontology with platonically existing math. Math is just a collection of useful templates that may help you predict reality and that it works is always just a physical fact. Best case is that we'll know true laws of physics and they will work like some subset of math and then axioms of physics would be actually true. You can make a guess about what axioms are compatible with true physics.

Also there is Shoenfield's absoluteness theorem, which I don't understand, but which maybe prevents empirical grounding of CH?