LessWrong 2.0 Reader

Scenario Forecasting Workshop: Materials and Learnings
elifland · 2024-03-08T02:30:46.517Z · comments (3)
Gemini 1.0
Zvi · 2023-12-07T14:40:05.243Z · comments (7)
On Overhangs and Technological Change
Roko · 2023-11-05T22:58:51.306Z · comments (19)
n of m ring signatures
DanielFilan · 2023-12-04T20:00:06.580Z · comments (7)
[link] on the dollar-yen exchange rate
bhauth · 2024-04-07T04:49:53.920Z · comments (21)
AI #52: Oops
Zvi · 2024-02-22T21:50:07.393Z · comments (9)
Toy models of AI control for concentrated catastrophe prevention
Fabien Roger (Fabien) · 2024-02-06T01:38:19.865Z · comments (2)
Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)
RP (Complex Bubble Tea) · 2024-02-09T07:00:45.825Z · comments (6)
Observations on Teaching for Four Weeks
ClareChiaraVincent · 2024-05-06T16:55:59.315Z · comments (14)
[link] Announcing Human-aligned AI Summer School
Jan_Kulveit · 2024-05-22T08:55:10.839Z · comments (0)
Apply to the Conceptual Boundaries Workshop for AI Safety
Chipmonk · 2023-11-27T21:04:59.037Z · comments (0)
Goal-Completeness is like Turing-Completeness for AGI
Liron · 2023-12-19T18:12:29.947Z · comments (26)
Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (11)
Paper in Science: Managing extreme AI risks amid rapid progress
JanB (JanBrauner) · 2024-05-23T08:40:40.678Z · comments (2)
The Shortest Path Between Scylla and Charybdis
Thane Ruthenis · 2023-12-18T20:08:34.995Z · comments (8)
Altman firing retaliation incoming?
trevor (TrevorWiesinger) · 2023-11-19T00:10:15.645Z · comments (23)
GPT-2030 and Catastrophic Drives: Four Vignettes
jsteinhardt · 2023-11-10T07:30:06.480Z · comments (5)
[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)
[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (12)
Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)
AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)
[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (2)
Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)
Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth · 2024-08-22T21:12:38.223Z · comments (1)
Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models
Felix Hofstätter · 2023-11-08T11:37:43.997Z · comments (0)
[link] Anthropic announces interpretability advances. How much does this advance alignment?
Seth Herd · 2024-05-21T22:30:52.638Z · comments (4)
They are made of repeating patterns
quetzal_rainbow · 2023-11-13T18:17:43.189Z · comments (4)
Public Weights?
jefftk (jkaufman) · 2023-11-02T02:50:18.095Z · comments (19)
Bounty: Diverse hard tasks for LLM agents
Beth Barnes (beth-barnes) · 2023-12-17T01:04:05.460Z · comments (31)
[question] why did OpenAI employees sign
bhauth · 2023-11-27T05:21:28.612Z · answers+comments (23)
Should rationalists be spiritual / Spirituality as overcoming delusion
Kaj_Sotala · 2024-03-25T16:48:08.397Z · comments (57)
Job listing: Communications Generalist / Project Manager
Gretta Duleba (gretta-duleba) · 2023-11-06T20:21:03.721Z · comments (7)
[link] in defense of Linus Pauling
bhauth · 2024-06-03T21:27:43.962Z · comments (8)
Notes on control evaluations for safety cases
ryan_greenblatt · 2024-02-28T16:15:17.799Z · comments (0)
Wrong answer bias
lukehmiles (lcmgcd) · 2024-02-01T20:05:38.573Z · comments (24)
[link] Chapter 1 of How to Win Friends and Influence People
gull · 2024-01-28T00:32:52.865Z · comments (5)
AI #67: Brief Strange Trip
Zvi · 2024-06-06T18:50:03.514Z · comments (6)
Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)
[LDSL#0] Some epistemological conundrums
tailcalled · 2024-08-07T19:52:55.688Z · comments (10)
So you want to work on technical AI safety
gw · 2024-06-24T14:29:57.481Z · comments (3)
The Broken Screwdriver and other parables
bhauth · 2024-03-04T03:34:38.807Z · comments (1)
AI #58: Stargate AGI
Zvi · 2024-04-04T13:10:06.342Z · comments (9)
Book Review: Righteous Victims - A History of the Zionist-Arab Conflict
Yair Halberstadt (yair-halberstadt) · 2024-06-24T11:02:03.490Z · comments (8)
An issue with training schemers with supervised fine-tuning
Fabien Roger (Fabien) · 2024-06-27T15:37:56.020Z · comments (12)
[link] DM Parenting
Shoshannah Tekofsky (DarkSym) · 2024-07-16T08:50:08.144Z · comments (4)
Why the Best Writers Endure Isolation
Declan Molony (declan-molony) · 2024-07-16T05:58:25.032Z · comments (6)
Philosophers wrestling with evil, as a social media feed
David Gross (David_Gross) · 2024-06-03T22:25:22.507Z · comments (2)
Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (6)
[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)
[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)