LessWrong 2.0 Reader

[link] A model of research skill
L Rudolf L (LRudL) · 2024-01-08T00:13:12.755Z · comments (6)

The Best of Don’t Worry About the Vase
Zvi · 2023-12-13T12:50:02.510Z · comments (4)

Conditional prediction markets are evidential, not causal
philh · 2024-02-07T21:52:47.476Z · comments (10)

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs
Michaël Trazzi (mtrazzi) · 2024-08-24T04:30:11.807Z · comments (0)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (16)

AI things that are perhaps as important as human-controlled AI
Chi Nguyen · 2024-03-03T18:07:24.291Z · comments (4)

A Path out of Insufficient Views
Unreal · 2024-09-24T20:00:27.332Z · comments (46)

Open Call for Research Assistants in Developmental Interpretability
Jesse Hoogland (jhoogland) · 2023-08-30T09:02:59.781Z · comments (11)

It's OK to be biased towards humans
dr_s · 2023-11-11T11:59:16.568Z · comments (69)

Genetic fitness is a measure of selection strength, not the selection target
Kaj_Sotala · 2023-11-04T19:02:13.783Z · comments (43)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (5)

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao · 2023-12-16T05:39:10.558Z · comments (5)

Protest against Meta's irreversible proliferation (Sept 29, San Francisco)
Holly_Elmore · 2023-09-19T23:40:30.202Z · comments (33)

[link] Google Gemini Announced
Jacob G-W (g-w1) · 2023-12-06T16:14:07.192Z · comments (22)

Ten Modes of Culture War Discourse
jchan · 2024-01-31T13:58:20.572Z · comments (15)

Math-to-English Cheat Sheet
nahoj · 2024-04-08T09:19:40.814Z · comments (5)

AI #27: Portents of Gemini
Zvi · 2023-08-31T12:40:05.631Z · comments (37)

Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do.
Chi Nguyen · 2024-02-23T06:10:05.881Z · comments (18)

Safe Predictive Agents with Joint Scoring Rules
Rubi J. Hudson (Rubi) · 2024-10-09T16:38:16.535Z · comments (10)

[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)

On Anthropic’s Sleeper Agents Paper
Zvi · 2024-01-17T16:10:05.145Z · comments (5)

[link] [Closed] Agent Foundations track in MATS
Vanessa Kosoy (vanessa-kosoy) · 2023-10-31T08:12:50.482Z · comments (1)

[link] Theories of Change for AI Auditing
Lee Sharkey (Lee_Sharkey) · 2023-11-13T19:33:43.928Z · comments (0)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

AI #44: Copyright Confrontation
Zvi · 2023-12-28T14:30:10.237Z · comments (13)

Dating Roundup #2: If At First You Don’t Succeed
Zvi · 2024-01-02T16:00:04.955Z · comments (29)

[link] Land Reclamation is in the 9th Circle of Stagnation Hell
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-12T13:36:27.159Z · comments (6)

Safe Stasis Fallacy
Davidmanheim · 2024-02-05T10:54:44.061Z · comments (2)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

[link] Questions are usually too cheap
Nathan Young · 2024-05-11T13:00:54.302Z · comments (19)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

On “first critical tries” in AI alignment
Joe Carlsmith (joekc) · 2024-06-05T00:19:02.814Z · comments (8)

[link] the micro-fulfillment cambrian explosion
bhauth · 2023-12-04T01:15:34.342Z · comments (5)

Monthly Roundup #17: April 2024
Zvi · 2024-04-15T12:10:03.126Z · comments (4)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)

AI #37: Moving Too Fast
Zvi · 2023-11-09T17:50:04.324Z · comments (5)

AI #40: A Vision from Vitalik
Zvi · 2023-11-30T17:30:08.350Z · comments (12)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

Human wanting
TsviBT · 2023-10-24T01:05:39.374Z · comments (1)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

AMA: Earning to Give
jefftk (jkaufman) · 2023-11-07T16:20:10.972Z · comments (8)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)

Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

The lost millennium
Ege Erdil (ege-erdil) · 2023-08-24T03:48:40.035Z · comments (14)

Recent comments

nathan-helm-burger on [Linkpost] Hawkish nationalism vs international AI power and benefit sharing

In the scope of this article, we will consider Transformative Artificial Intelligence (TAI) according to a common (albeit vague) definition where TAI is ‘an AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution’. We will not consider the impact of ASI which achieves performance at a level far above humans. The enormous power of ASI means that social interactions, economic mechanisms and geopolitical dynamics are likely to be transformed in ways so significant that current forecasting is marred by uncertainty. In particular, if one actor obtains a ‘decisive strategic advantage’, the decisions made by that actor plausibly make it very hard to predict the outcome. We believe that the relevance of this article increases with longer TAI timelines and longer transition periods between TAI and ASI. This is because slower development speeds increase the probability of multipolar outcomes and give more time for governance and collaboration to come into effect. While in a fast takeoff scenario the systemic and misuse risks will certainly be severe, the potential misalignment risks stemming from a hastily developed ASI might dwarf them.

I like your article; it has some pleasant forecasts and nice recommendations for what to do if our near future turns out to be serendipitously comfortable. It makes me feel good to read it and imagine such a future in store for us.

I also like that you are explicit about the fact that the recommendations in this article apply only to a particular subset of possible futures.

Where we get to transformative AI, but not further. Either humanity comes together on a global pause, or the technology does not allow for rapid advancement. The TAI in this possible future does not unlock the ability of defectors-from-the-pause to defeat the entire rest of the world combined.

Also, the possible weirdness is kept in check. Autonomous seafloor or subterranean robots don't kickstart a new industrial base. Local space isn't colonized by autonomous robotic probes. Humanity has immense world-changing power at its fingertips, and chooses to proceed slowly and carefully, in peaceful cooperation.

I'm working on a similar article, which aims to address a wider set of future possibilities. In it, I call the path you outline here "The Gentle Path". I'll cite your article as an example of this hopeful option.

I think someone more pessimistic than me might describe this article with an analogy along the lines of: "How we recommend arranging our new garden furniture after we win the lottery."

That would be a bit unfair. I'd give this space of future possibilities better than a 1:1000 chance. Still less than 1%, but worth keeping in mind.

anthonyc on What's a good book for a technically-minded 11-year old?

I am assuming you're trying not to overly bias the responses, but there are obviously a lot of facts about this particular 11-year-old that could very much change the recommendations. Without that context, I'm trying to think back to what I was reading at that age (in the late 90s, so not much new on my list; there may be better out there now), or what I wish I'd known to read at that age. I think at that age the things I most enjoyed and most grew from were books that taught me how to play with concepts and ideas in a principled way in order to think about them in new contexts, as well as some history of science and math.

Any of the popular math books by Robert and Ellen Kaplan or Ian Stewart, including Ian Stewart's fiction. Things like The Art of the Infinite, or Flatterland.
Carl Sagan's Cosmos or The Dragons of Eden
Basically anything by Asimov. Also Terry Pratchett if the kid is more fantasy- than sci-fi-oriented (less science, but plenty of philosophy and reasoning about the implications of new and old ideas)
Alice in Wonderland if you include context about what Lewis Carroll was really talking about. Also The Phantom Tollbooth if they haven't read it yet. Even older classics like things by Jules Verne and HG Wells can be interesting choices. If the kid has only ever read more modern sci-fi and fantasy, it can be interesting to learn how people used to think about the future or other worlds. I personally really liked Last and First Men and Star Maker; I think I read those around the end of high school, but nothing in them was particularly advanced IIRC.
It might be a little early for Gödel, Escher, Bach unless they're particularly into formal mathematics, but probably in another 2-3 years that's one to put on the list.
Plato and Aristotle. At this age the kid has technically been taught all the math and science needed to understand anything they wrote.
This one is more dependent on specific interests, but growing up I loved to cook, and lived in a house of people who loved to cook. For me, books like On Food and Cooking by Harold McGee (and later, The Food Lab by J. Kenji López-Alt and Cooking for Geeks by Jeff Potter) were basically my introduction to organic chemistry. In my sister's words, for me, the kitchen is just another lab. They also helped build some very useful life skills.
Not sure if there's a specific reason for limiting to just books, but there are also some excellent YouTube channels I'd recommend to such a kid. Veritasium, eigenchris, Vsauce, Numberphile, Isaac Arthur, and even things like TierZoo can be great. Some of the videos may be a bit advanced for now, but just skip around and explore.

green_leaf on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong

If the SB always guesses heads, she'll be correct of the time. For that reason, that is her credence.
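How often an always-heads guess is correct depends on what is counted, so a quick simulation may help readers follow the thread. The sketch below assumes the standard setup (a fair coin, one awakening on heads, two on tails) and reports the frequency both per experiment and per awakening. It does not settle which of the two the comment above intends; the function name and parameters are illustrative.

```python
import random


def simulate_sleeping_beauty(n_experiments, seed=0):
    """Return how often 'heads' is the right guess, counted two ways."""
    rng = random.Random(seed)
    heads_experiments = 0   # experiments in which the coin landed heads
    heads_awakenings = 0    # awakenings that happen after a heads flip
    total_awakenings = 0
    for _ in range(n_experiments):
        heads = rng.random() < 0.5
        # Assumed protocol: heads -> one awakening, tails -> two awakenings.
        awakenings = 1 if heads else 2
        total_awakenings += awakenings
        if heads:
            heads_experiments += 1
            heads_awakenings += awakenings
    per_experiment = heads_experiments / n_experiments
    per_awakening = heads_awakenings / total_awakenings
    return per_experiment, per_awakening


per_experiment, per_awakening = simulate_sleeping_beauty(100_000)
print(f"correct per experiment: {per_experiment:.3f}")
print(f"correct per awakening:  {per_awakening:.3f}")
```

The two numbers differ because tails experiments contribute twice as many awakenings, which is the crux of the halfer/thirder disagreement.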

david-johnston on The Hidden Complexity of Wishes

Here's a basic model of policy collapse: suppose there exist pathological policies of low prior probability (/high algorithmic complexity) such that they play the training game when it is strategically wise to do so, and when they get a good opportunity they defect in order to pursue some unknown aim.

Because they play the training game, a wide variety of training objectives will collapse to one of these policies if the system in training starts exploring policies of sufficiently high algorithmic complexity. So, according to this crude model, there's a complexity bound: stay under it and you're fine, go over it and you get pathological behaviour. Roughly, whatever desired behaviour requires the most algorithmically complex policy is the one that is most pertinent for assessing policy collapse risk (because that's the one that contributes most of the algorithmic complexity, and so it gives your first-order estimate of whether or not you're crossing the collapse threshold). So, which desired behaviour requires the most complex policy: is it, for example, respecting commonsense moral constraints, or is it inventing molecular nanotechnology?

Tangentially, the policy collapse theory does not predict outcomes that look anything like malicious compliance. It predicts that, if you're in a position of power over the AI system, your mother is saved exactly as you want her to be. If you are not in such a position then your mother is not saved at all and you get a nanobot war instead or something. That is, if you do run afoul of policy collapse, it doesn't matter if you want your system to pursue simple or complex goals, you're up shit creek either way.

roger-dearnaley on Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

I think something that might be quite illuminating for this factual recall question (as well as having potential safety uses) is editing the facts. Suppose, for this case, you take just the layer 2-6 MLPs with frozen attention, you warm-start from the existing model, add a loss function term that penalizes changes to the model away from that initialization (perhaps using a combination of L1 and L2 norms), and train that model on a dataset that consists of your full set of athlete name embeddings and their sports (in a frequency ratio matching the original training data, so with more copies of data about more famous athletes), but with one factual edit to it to change the sport of one athlete. Train until it has memorized this slightly edited set of facts reasonably well, and then look at the pattern of weights and neurons that changed significantly. Repeat several times for the same athlete, and see how consistent/stochastic the results are. Repeat for many athlete/sport edit combinations. This should give you a lot of information on how widely distributed the representation of the data on a specific athlete is, and ought to give you enough information to distinguish between not only your single step vs hashing hypotheses, but also things like bagging and combinations of different algorithms. Even more interesting would be to do this not just for an athlete's sport, but also some independent fact about them like their number.

Of course, this is computationally somewhat expensive, and you do need to find metaparameters that don't make this too expensive, while encouraging the resulting change to be as localized as possible consistent with getting a good score on the altered fact.
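As a rough illustration of the loss term described above, here is a minimal sketch in plain Python. It treats the weights as flat lists and assumes the penalty is a weighted sum of the L1 and squared L2 distance from the warm-start weights. The names `l1_coeff`, `l2_coeff`, and `threshold` are hypothetical knobs, not anything from the original post, and a real experiment would apply this to the MLP parameters in a deep-learning framework.

```python
def anchored_penalty(weights, init_weights, l1_coeff, l2_coeff):
    """Penalty for drifting away from the warm-start weights."""
    diffs = [w - w0 for w, w0 in zip(weights, init_weights)]
    l1_term = sum(abs(d) for d in diffs)   # encourages few weights to move
    l2_term = sum(d * d for d in diffs)    # discourages large moves
    return l1_coeff * l1_term + l2_coeff * l2_term


def total_loss(task_loss, weights, init_weights, l1_coeff=0.1, l2_coeff=0.01):
    """Task loss on the edited fact set plus the anchoring penalty."""
    return task_loss + anchored_penalty(weights, init_weights, l1_coeff, l2_coeff)


def changed_indices(weights, init_weights, threshold):
    """Indices of weights that moved by more than `threshold` during the edit."""
    return [
        i
        for i, (w, w0) in enumerate(zip(weights, init_weights))
        if abs(w - w0) > threshold
    ]


# Toy example: three weights, two of which moved during the edit.
init_weights = [1.0, 2.5, 1.0]
edited_weights = [1.0, 2.0, 3.0]
penalty = anchored_penalty(edited_weights, init_weights, l1_coeff=0.1, l2_coeff=0.01)
moved = changed_indices(edited_weights, init_weights, threshold=0.1)
print(penalty, moved)
```

Comparing `changed_indices` across repeated runs and across different athlete/sport edits is one simple way to quantify how localized or distributed the change is.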

radford-neal-1 on What's a good book for a technically-minded 11-year old?

I re-read "I, Robot" recently, and I don't think it's particularly good. A better Asimov is "The Gods Themselves" (but note that there is some degree of sexuality, though not of the sort I would say that an 11-year-old should be shielded from).

I'd also recommend "The Flying Sorcerers", by David Gerrold and Larry Niven. It helps if they've read some other science fiction (this is sf, not fantasy), in order to get the puns.

nathan-helm-burger on Bitter lessons about lucid dreaming

I learned about lucid dreaming at 13 and decided to try it. I practiced self-hypnosis trance states while falling asleep. Took a couple weeks of practice before getting it to work. After that, it was easy.

Too easy.

Couldn't turn it off. Every dream became a lucid dream, except no, they were lucid dreams within dreams. I'd wake up and go to school then wake up and realize I was still dreaming. Real life and dreams began to blur together. My dreams were so vivid I couldn't tell them apart reliably. My dreams would alternate between lucid and not, and nightmares crept in. It was thoroughly unnerving. So, I did the same self-hypnosis practice in reverse. A week or so of demanding normalcy again, and my dreams went back to normal.

Eventually, in grad school, I went through a similar process to stop myself from remembering my dreams, because I was so stressed out that all my dreams were nightmares.

After grad school I went back to allowing myself to occasionally remember dreams, at least fuzzily while waking or recalling the previous night's dream while drifting to sleep.

abstractapplic on D&D Sci Coliseum: Arena of Data

>only the last 12 having boots 2 and gauntlets 3 (likely post-theft)

Didn't notice that but it confirms my theory, nice.

>It seems to me that they appear both as red and black, though.

Ah, I see where the error in my code was that made me think otherwise. Strange coincidence: I thought "oh yeah a powerful wealthy elf ninja who pointedly wears black when assigned red clothes, what a neat but oddly specific 8-Bit Theater reference" and then it turned out to be a glitch.

archimedes on LLMs can learn about themselves by introspection

Seeing the distribution calibration you point out does update my opinion a bit.

I feel like there’s still a significant distinction though between adding one calculation step to the question versus asking it to model multiple responses. It would have to model its own distribution in a single pass rather than having the distributions measured over multiple passes align (which I’d expect to happen if the fine-tuning teaches it the hypothetical is just like adding a calculation to the end).

As an analogy, suppose I have a pseudorandom black box function that returns an integer. In order to approximate the distribution of its outputs mod 10, I don’t have to know anything about the function; I can just sample the function and apply mod 10 post hoc. If I want to say something about this distribution without multiple samples, then I actually have to know something about the function.
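The sampling half of this analogy can be sketched in a few lines of Python. The `black_box` function here is a hypothetical stand-in (squares of random integers); the point is only that the estimator calls it repeatedly and never looks inside it.

```python
import random
from collections import Counter


def black_box(rng):
    """Hypothetical stand-in for an opaque pseudorandom integer function."""
    return rng.randrange(1_000_000) ** 2


def empirical_mod10(sample_fn, n_samples):
    """Estimate the distribution of outputs mod 10 purely by sampling."""
    counts = Counter(sample_fn() % 10 for _ in range(n_samples))
    return {digit: counts[digit] / n_samples for digit in range(10)}


rng = random.Random(0)
dist = empirical_mod10(lambda: black_box(rng), 10_000)
for digit, freq in dist.items():
    print(digit, round(freq, 3))
```

Predicting the same distribution in a single pass, without sampling, would require knowing something about the function itself (for this stand-in, that squares never end in 2, 3, 7, or 8).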

linch on What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?

They were likely using inferior techniques to RLHF to implement ~Google corporate standards; not sure what you mean by "ethics-based," presumably they have different ethics than you (or LW) do, but intent alignment has always been about doing what the user/operator wants, not about solving ethics.