LessWrong 2.0 Reader

[link] Conversation Visualizer
ethanmorse · 2023-12-31T01:18:01.424Z · comments (4)
{Book Summary} The Art of Gathering
Tristan Williams (tristan-williams) · 2024-04-16T10:48:41.528Z · comments (0)
Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)
[link] Quick Thoughts on Scaling Monosemanticity
Joel Burget (joel-burget) · 2024-05-23T16:22:48.035Z · comments (1)
Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)
AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)
3. Premise three & Conclusion: AI systems can affect value change trajectories & the Value Change Problem
Nora_Ammann · 2023-10-26T14:38:14.916Z · comments (4)
Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (6)
Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)
Can quantised autoencoders find and interpret circuits in language models?
charlieoneill (kingchucky211) · 2024-03-24T20:05:50.125Z · comments (4)
[question] How did you integrate voice-to-text AI into your workflow?
ChristianKl · 2023-11-20T12:01:37.696Z · answers+comments (12)
[link] AI Impacts 2023 Expert Survey on Progress in AI
habryka (habryka4) · 2024-01-05T19:42:17.226Z · comments (1)
AI #65: I Spy With My AI
Zvi · 2024-05-23T12:40:02.793Z · comments (7)
Heuristics for preventing major life mistakes
SK2 (lunchbox) · 2023-12-20T08:01:09.340Z · comments (2)
Online Dialogues Party — Sunday 5th November
Ben Pace (Benito) · 2023-10-27T02:41:00.506Z · comments (1)
Escaping Skeuomorphism
Stuart Johnson (stuart-johnson) · 2023-12-20T03:51:00.489Z · comments (0)
An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)
[link] Memo on some neglected topics
Lukas Finnveden (Lanrian) · 2023-11-11T02:01:55.834Z · comments (2)
[link] align your latent spaces
bhauth · 2023-12-24T16:30:09.138Z · comments (8)
[link] [Linkpost] Concept Alignment as a Prerequisite for Value Alignment
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-11-04T17:34:36.563Z · comments (0)
Without Fundamental Advances, Rebellion and Coup d'État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries
Roko · 2024-01-31T10:14:02.042Z · comments (34)
A Strange ACH Corner Case
jefftk (jkaufman) · 2024-02-10T03:00:05.930Z · comments (2)
When and why should you use the Kelly criterion?
Garrett Baker (D0TheMath) · 2023-11-05T23:26:38.952Z · comments (25)
Weak vs Quantitative Extinction-level Goodhart's Law
VojtaKovarik · 2024-02-21T17:38:15.375Z · comments (1)
My Dating Heuristic
Declan Molony (declan-molony) · 2024-05-21T05:28:40.197Z · comments (4)
Uncertainty in all its flavours
Cleo Nardo (strawberry calm) · 2024-01-09T16:21:07.915Z · comments (6)
[link] Found Paper: "FDT in an evolutionary environment"
the gears to ascension (lahwran) · 2023-11-27T05:27:50.709Z · comments (47)
AISC Project: Modelling Trajectories of Language Models
NickyP (Nicky) · 2023-11-13T14:33:56.407Z · comments (0)
[link] Goodhart's Law Example: Training Verifiers to Solve Math Word Problems
Chris_Leong · 2023-11-25T00:53:26.841Z · comments (2)
How to develop a photographic memory 2/3
PhilosophicalSoul (LiamLaw) · 2023-12-30T20:18:14.255Z · comments (7)
Survey on the acceleration risks of our new RFPs to study LLM capabilities
Ajeya Cotra (ajeya-cotra) · 2023-11-10T23:59:52.515Z · comments (1)
NYU Code Debates Update/Postmortem
David Rein (david-rein) · 2024-05-24T16:08:06.151Z · comments (4)
EA Infrastructure Fund's Plan to Focus on Principles-First EA
Linch · 2023-12-06T03:24:55.844Z · comments (0)
flowing like water; hard like stone
lsusr · 2024-02-20T03:20:46.531Z · comments (4)
Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (9)
Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)
Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)
An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (1)
Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)
[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)
[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)
European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)
[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)
[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)
[link] AISN #30: Investments in Compute and Military AI. Plus, Japan and Singapore’s National AI Safety Institutes
aogara (Aidan O'Gara) · 2024-01-24T19:38:33.461Z · comments (1)
Response to Dileep George: AGI safety warrants planning ahead
Steven Byrnes (steve2152) · 2024-07-08T15:27:07.402Z · comments (7)
[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)
Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)
[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)