LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

next page (older posts) →

Funny Anecdote of Eliezer From His Sister
Daniel Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (4)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (90)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (66)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (7)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (26)

Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (33)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (11)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (15)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (11)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (8)

Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (16)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (16)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (6)

[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)

[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)

Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (7)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (6)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (12)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (9)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (8)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (2)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (35)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (0)

Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm)
Ruby · 2024-04-23T03:58:43.443Z · comments (16)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (1)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (10)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (10)

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (11)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

[link] Let's Design A School, Part 1
Sable · 2024-04-23T21:50:20.937Z · comments (3)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

Spatial attention as a “tell” for empathetic simulation?
Steven Byrnes (steve2152) · 2024-04-26T15:10:58.040Z · comments (11)

Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (2)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (23)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (2)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (2)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (13)

[Aspiration-based designs] 1. Informal introduction
B Jacobs (Bob Jacobs) · 2024-04-28T13:00:43.268Z · comments (4)

Manifund Q1 Retro: Learnings from impact certs
Austin Chen (austin-chen) · 2024-05-01T16:48:33.140Z · comments (1)

[link] Book review: Deep Utopia
PeterMcCluskey · 2024-04-23T19:55:50.417Z · comments (10)

Forget Everything (Statistical Mechanics Part 1)
J Bostock (Jemist) · 2024-04-22T13:33:35.446Z · comments (5)

[link] Dequantifying first-order theories
jessicata (jessica.liu.taylor) · 2024-04-23T19:04:49.000Z · comments (9)

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (0)

Scaling of AI training runs will slow down after GPT-5
Maxime Riché (maxime-riche) · 2024-04-26T16:05:59.957Z · comments (5)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (21)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

next page (older posts) →

Archive

Recent comments

erioire on Why is AGI/ASI Inevitable?

Yeah, many people, like the majority of users on this forum, have decided to not build AGI.

Not to build AGI yet.
Many of us would love to build it as soon as we can be confident we have a realistic and mature plan for alignment, but that's a problem that's so absurdly challenging that even if aliens landed tomorrow and handed us the "secret to Friendly AI", we would have a hell of a time trying to validate that it actually was the real thing.

If one is faced with a math problem where you could be staring at the answer and know no way to unambiguously verify said answer, you are likely not capable of solving the problem until you somehow close the inferential distance separating you from understanding. Assuming the problem is solvable at all.

rosiecam on Which skincare products are evidence-based?

It says to reapply every two hours but I... do not do that 😅 I put it on in the morning and would reapply if I was spending time outside. I don't know how important the "every 2 hours" thing is

r-2 on Which skincare products are evidence-based?

How long does the Elta MD sunscreen last?

startattheend on dkornai's Shortform

I think all criticism, all shaming, all guilt tripping, all punishments and rewards directed at children - is for the purpose of driving them to do certain actions. If your children do what you think is right, there's no need to do much of anything.

A more general and correct statement would be "Pain is for the sake of change, and all change is painful". But that change is for the sake of actions. I don't think that's too much of a simplification to be useful.

I think regret, too, is connected here. And there's certainly times when it seems like pain is the problem rather than an attempt to solve it, but I think that's a misunderstanding. And while chronic pain does reduce agency, it's a constant pain and a constant reduction of agency (not cumulative). The pain persists until the problem is solved, even if the problem does not get worse. So it's the body telling the brain "Hey, do something about this, the importance is 50 units of pain", then you will do anything to solve it as long there's a path with less than 50 units of pain which leads to a solution.

The pain does limit agency, but not because it's a real limitation. It's an artificial one that the body creates to prevent you from damaging yourself. So all important agency is still possible. If the estimated consequences of avoiding the task is more painful than doing the task, you do it. But it's again the body is just estimating the cost/benefit of tasks and choosing the optimal action by making it the least painful action.

My explanation and yours are almost identical, but there's some important differences. In my view,
suffering is good, not bad. I really don't want humanity to misunderstand this one fact, it has already had profound negative consequences. It's phantom damage created to avoid real damage. An agent which is unable to feel physical pain and exhaustion would destroy itself, therefore physical pain and exhaustion are valuable and not problems to be solved. Emotions like suffering, exhaustion, annoyance, etc. function the same as physical pain, and once they get over a certain threshold they coerce you into taking an action. Physical pain comes from nerves, but emotional pain comes from your interpretation of reality. Your brain relies on you to tell what ought to be painful (so if you overestimate risk, it just believes you). And you don't get to choose all your goals yourself, your brain wants you to fulfill your needs (prioritized by the hierarchy of needs). In short, the brain makes inaction painful, while keeping actions that it deems risky painful, and then messes with the weights/thresholds according to need. Just like with hunger (not eating is painful, but if all you have is stale or even moldy bread, then you need to be very hungry before you eat, and you will eat iff pain(hunger)>pain(eating the bread)).

An increase in power/agency feels a lot like happiness though, even according to Nietzsche who I'm not confident to argue against, so I get why you'd basically think that opposite of happiness is the opposite of agency (sorry if this summary does injustice to your point)

cameron-berg on Key takeaways from our EA and alignment research surveys

There's a lot of overlap between alignment researchers and the EA community, so I'm wondering how that was handled.

Agree that there is inherent/unavoidable overlap. As noted in the post, we were generally cautious about excluding participants from either sample for reasons you mention and also found that the key results we present here are robust to these kinds of changes in the filtration of either dataset (you can see and explore this for yourself here).

With this being said, we did ask in both the EA and the alignment survey to indicate the extent to which they are involved in alignment—note the significance of the difference here:

From alignment survey:

From EA survey:

This question/result serves both as a good filtering criterion for cleanly separating out EAs from alignment researchers and also gives a pretty strong evidence that we are drawing on completely different samples across these surveys (likely because we sourced the data for each survey through completely distinct channels).

Regarding the support for various cause areas, I'm pretty sure that you'll find the support for AI Safety/Long-Termism/X-risk is higher among those most involved in EA than among those least involved. Part of this may be because of the number of jobs available in this cause area.

Interesting—I just tried to test this. It is a bit hard to find a variable in the EA dataset that would cleanly correspond to higher vs. lower overall involvement, but we can filter by number of years one has been involved involved in EA, and there is no level-of-experience threshold I could find where there are statistically significant differences in EAs' views on how promising AI x-risk is. (Note that years of experience in EA may not be the best proxy for what you are asking, but is likely the best we've got to tackle this specific question.)

Blue is >1 year experience, red is <1 year experience:

Blue is >2 years experience, red is <2 years experience:

dalcy on Dalcy's Shortform

Thoughtdump on why I'm interested in computational mechanics:

one concrete application to natural abstractions from here [LW · GW]: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
- ... but i was initially interested in reading compmech stuff not with a particular alignment relevant thread in mind but rather because it seemed broadly similar in directions to natural abstractions.
re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon reconstruction in real world noisy data. CSSR is an example of a reconstruction algorithm. apparently people did compmech stuff on real-world data, don't know how good, but effort-wise far too less invested compared to theory work
- would be interested in these reconstruction algorithms, eg what are the bottlenecks to scaling them up, etc.
tangent: epsilon transducers seem cool. if the reconstruction algorithm is good, a prototypical example i'm thinking of is something like: pick some input-output region within a model, and literally try to discover the hmm model reconstructing it? of course it's gonna be unwieldly large. but, to shift the thread in the direction of bright-eyed theorizing ...
the foundational Calculi of Emergence paper talked about the possibility of hierarchical epsilon machines, where you do epsilon machines on top of epsilon machines and for simple examples where you can analytically do this, you get wild things like coming up with more and more compact representations of stochastic processes (eg data stream -> tree -> markov model -> stack automata -> ... ?)
- this ... sounds like natural abstractions in its wildest dreams? literally point at some raw datastream and automatically build hierarchical abstractions that get more compact as you go up
- haha but alas, (almost) no development afaik since the original paper. seems cool
and also more tangentially, compmech seemed to have a lot to talk about providing interesting semantics to various information measures aka True Names [LW · GW], so another angle i was interested in was to learn about them.
- eg crutchfield talks a lot about developing a right notion of information flow - obvious usefulness in eg formalizing boundaries?
- many other information measures from compmech with suggestive semantics—cryptic order? gauge information? synchronization order? check ruro1 and ruro2 for more.

elizabeth-1 on Elizabeth's Shortform

citric acid and a polymer

ryan_greenblatt on "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case

Random error:

Exponential Takeoff:

AI's capabilities grow exponentially, like an economy or pandemic.

(Oddly, this scenario often gets called "Slow Takeoff"! It's slow compared to "FOOM".)

Actually, this isn't how people (in the AI safety community) generally use the term slow takeoff.

Quoting from the blog post by Paul:

Futurists have argued for years about whether the development of AGI will look more like a breakthrough within a small group (“fast takeoff”), or a continuous acceleration distributed across the broader economy or a large firm (“slow takeoff”).

[...]

(Note: this is not a post about whether an intelligence explosion will occur. That seems very likely to me. Quantitatively I expect it to go along these lines. So e.g. while I disagree with many of the claims and assumptions in Intelligence Explosion Microeconomics, I don’t disagree with the central thesis or with most of the arguments.)

Slow takeoff still can involve a singularity (aka an intelligence explosion).

The terms "fast/slow takeoff" are somewhat bad because they are often used to discuss two different questions:

How long does it take from the point where AI is seriously useful/important (e.g. results in 5% additional GDP growth per year in the US) to go to AIs which are much smarter than humans? (What people would normally think of as fast vs slow.)
Is takeoff discontinuous vs continuous?

And this explainer introduces a third idea:

Is takeoff exponential or does it have a singularity (hyperbolic growth)?

habryka4 on Nicky Case AI Safety explainer [Linkpost]

Done and fixed the accidental pronoun.

joseph-miller on Why I'm doing PauseAI

While I want people to support PauseAI

the small movement that PauseAI builds now will be the foundation which bootstraps this larger movement in the future

Is one of the main points of my post. If you support PauseAI today you may unleash a force which you cannot control tomorrow.