LessWrong 2.0 Reader

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · 2023-12-22T18:48:55.983Z · comments (11)

Medical Roundup #2
Zvi · 2024-04-09T13:40:05.908Z · comments (18)

Games for AI Control
charlie_griffin (cjgriffin) · 2024-07-11T18:40:50.607Z · comments (0)

My best guess at the important tricks for training 1L SAEs
Arthur Conmy (arthur-conmy) · 2023-12-21T01:59:06.208Z · comments (4)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-01-02T18:15:54.168Z · comments (0)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

[link] The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit
Jonathan Moregård (JonathanMoregard) · 2024-01-05T18:27:01.769Z · comments (20)

Deconfusing In-Context Learning
Arjun Panickssery (arjun-panickssery) · 2024-02-25T09:48:17.690Z · comments (1)

[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)

Striking Implications for Learning Theory, Interpretability — and Safety?
RogerDearnaley (roger-d-1) · 2024-01-05T08:46:58.915Z · comments (4)

Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs
Burny · 2023-11-23T03:16:09.358Z · comments (25)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)

UDT1.01: The Story So Far (1/10)
Diffractor · 2024-03-27T23:22:35.170Z · comments (6)

On DeepMind’s Frontier Safety Framework
Zvi · 2024-06-18T13:30:21.154Z · comments (4)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (1)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (7)

AI #49: Bioweapon Testing Begins
Zvi · 2024-02-01T15:30:04.690Z · comments (11)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

Your LLM Judge may be biased
Henry Papadatos (henry) · 2024-03-29T16:39:22.534Z · comments (9)

[link] Twitter thread on AI takeover scenarios
Richard_Ngo (ricraz) · 2024-07-31T00:24:33.866Z · comments (0)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)

What is wisdom?
TsviBT · 2023-11-14T02:13:49.681Z · comments (3)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (10)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · 2023-12-04T22:58:40.005Z · comments (0)

Enhancing intelligence by banging your head on the wall
Bezzi · 2023-12-12T21:00:48.584Z · comments (26)

[link] Dark Skies Book Review
PeterMcCluskey · 2023-12-29T18:28:59.352Z · comments (3)

The Defence production act and AI policy
[deleted] · 2024-03-01T14:26:09.064Z · comments (0)

Principles For Product Liability (With Application To AI)
johnswentworth · 2023-12-10T21:27:41.403Z · comments (55)

The Evolution of Humans Was Net-Negative for Human Values
Zack_M_Davis · 2024-04-01T16:01:10.037Z · comments (1)

Good job opportunities for helping with the most important century
HoldenKarnofsky · 2024-01-18T17:30:03.332Z · comments (0)

[link] Who is Sam Bankman-Fried (SBF) really, and how could he have done what he did? - three theories and a lot of evidence
spencerg · 2023-11-11T01:04:22.747Z · comments (28)

Recent comments

dagon on Quantum Immortality: A Perspective if AI Doomers are Probably Right

If quantum immortality is true

This is a big if. It may be true (though it also implies that events as unlikely as Boltzmann Brains are true as well), but it's not true in a way that has causal impact on my current predicted experiences. If so, then the VAST VAST MAJORITY of universes don't contain me in the first place, and the also-extreme majority of those that do will have me die.

Assume quantum uncertainty affects how the coins land. I survive the night only if I correctly guess the 10th digit of π and/or all seven coins land heads, otherwise I will be killed in my sleep.

In a literal experiment, where a human researcher kills you based on their observations of coins and calculation of pi, I don't think you should be confident of surviving the night. If you DO survive, you don't learn much about uncorrelated probabilities - there's a near-infinite number of worlds, and fewer and fewer of them will contain you.

I guess this is a variant of option (1) - Deny that QI is meaningful. You don't give up on probability - you can estimate a (1/2)^7 * 1/10 = 0.00078 chance of surviving.
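The arithmetic in that estimate can be checked with a short sketch. It assumes seven independent fair coins and a blind guess that is uniform over the ten digits; the variable names are illustrative only. The quoted setup says "and/or", so the sketch also shows the "at least one condition holds" reading for comparison, which gives a much larger number than the "both conditions hold" reading used in the comment.

```python
from fractions import Fraction

# Assumption: seven independent fair coins and a uniform blind guess
# over the ten possible digits, as in the thought experiment quoted above.
p_all_heads = Fraction(1, 2) ** 7      # 1/128
p_digit_guess = Fraction(1, 10)        # 1/10

# "and" reading: both conditions must hold (the figure quoted in the comment).
p_and = p_all_heads * p_digit_guess

# "or" reading: at least one condition holds (inclusion-exclusion).
p_or = p_all_heads + p_digit_guess - p_all_heads * p_digit_guess

print(float(p_and))  # 0.00078125
print(float(p_or))   # 0.10703125
```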

dagon on The Case Against Moral Realism

I think there's a much simpler case against it: show me the instrument readings, or at least tell me the unit of measure.

thomas-kwa on Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

I mention exactly this in paragraph 3.

startattheend on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

I worded that a bit badly; I meant I had a hard time thinking of better (meaning kinder) explanations, not better (meaning more likely) explanations. Across all websites I've been on in my life, I have posted more than 100000 comments (resulting in many interactions), so while things like psychoanalyzing people, assuming intentions, and making stereotypes are "bad", I simply have too much training data, and too few incorrect guesses, not to do this. I do, however, intentionally overestimate people (since I want to talk to intelligent people, I give people the benefit of the doubt for as long as possible), but this means that mistakes are attributed to their intentions, personality or values, rather than to carelessness or superficial heuristics. In this situation, I've assumed that they're offended by the idea that traditional societies rival the scientific method in some situations. But it may be something more superficial like "I find short comments to be effortless", "somebody else already said that" or "I didn't understand your explanation and I consider it your fault". But like I said in another comment, I remember the first downvotes being disagreements (red X) rather than regular downvotes, so I took it as meaning "this is wrong" rather than "I don't like this comment". Not that any of this matters very much, admittedly.

micahcarroll on Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback

User feedback training reliably leads to emergent manipulation in our experimental scenarios, suggesting that it can lead to it in real user feedback settings too.

kenoubi on Requirements for a Basin of Attraction to Alignment

Sorry, I think it's entirely possible that this is just me not knowing or understanding some of the background material, but where exactly does this diverge from justifying the AI pursuing a goal of maximizing the inclusive genetic fitness of its creators? Which clearly either isn't what humans actually want (there are things humans can do to make themselves have more descendants that no humans, including the specific ones who could take those actions, want to take, because of godshatter) or is just circular (who knows what will maximize inclusive genetic fitness in an environment that is being created, in large part, by the decision of how to promote inclusive genetic fitness?). At some point, your writing started talking about "design goals", but I don't understand why tools / artifacts constructed by evolved creatures, that happen to increase the inclusive genetic fitness of the evolved creatures who constructed them by means other than the design goals of those who constructed them, wouldn't be favored by evolution, and thus part of the "purpose" of the evolved creatures in constructing them; and this doesn't seem like an "error" even in the limit of optimal pursuit of inclusive genetic fitness, this seems to be just what optimal pursuit of IGF would actually do. In other words, I don't want a very powerful human-constructed optimizer to pursue the maximization of human IGF, and I think hardly any other humans do either; but I don't understand in detail why your argument doesn't justify AI pursuit of maximizing human IGF, to the detriment of what humans actually value.

saul-munn on Saul Munn's Shortform

Active Recall and Spaced Repetition are Different Things

Epistemic status: splitting hairs.

There’s been a lot of recent work on memory. This is great, but popular communication of that progress consistently mixes up active recall and spaced repetition. That consistently bugged me — hence this piece.

If you already have a good understanding of active recall and spaced repetition, skim sections I and II, then skip to section III.

Note: this piece doesn’t meticulously cite sources, and will probably be slightly out of date in a few years. I link some great posts that have far more technical substance at the end, if you’re interested in learning more & actually reading the literature.

I. Active Recall

When you want to learn some new topic, or review something you’ve previously learned, you have different strategies at your disposal. Some examples:

Watch a YouTube video on the topic.
Do practice problems.
Review notes you’d previously taken.
Try to explain the topic to a friend.
etc

Some of these boil down to “stuff the information into your head” (YouTube video, reviewing notes) and others boil down to “do stuff that requires you to use/remember the information” (doing practice problems, explaining to a friend). Broadly speaking, the second category — doing stuff that requires you to actively recall the information — is way, way more effective.

That’s called “active recall.”

II. (Efficiently) Spaced Repetition

After you learn something, you’re likely to forget it pretty quickly:

Fortunately, reviewing the thing you learned pushes you back up to 100% retention, and this happens each time you “repeat” a review:

That’s a lot better!

…but that’s also a lot of work. You have to review the thing you learned in intervals, which takes time/effort. So, how can you do the fewest repetitions while keeping your retention as high as possible? In other words, what should be the size of the intervals? Should you space them out every day? Every week? Should you change the size of the spaces between repetitions? How?

As it turns out, efficiently spacing out repetitions of reviews is a pretty well-studied problem. The answer is “riiiight before you’re about to forget it:”

Generally speaking, you should do a review right before it crosses some threshold for retention. What that threshold actually is depends on some fiddly details, but the central idea remains the same: repeating a review riiight before you hit that threshold is the most efficient spacing possible.

This is called (efficiently) spaced repetition. Systems that use spaced repetitions — software, methods, etc — are called “spaced repetition systems” or “SRS.”
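The "review right before you cross the threshold" rule can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular SRS works: it assumes retention decays exponentially since the last review, R(t) = exp(-t / S), that each review resets retention to 100%, and that each review multiplies the "stability" S by a fixed factor. The function name and all parameters are hypothetical.

```python
import math

def review_schedule(initial_stability_days, threshold, growth, n_reviews):
    """Return the day offsets at which each review would fall.

    Assumes retention decays as R(t) = exp(-t / S) since the last review,
    where S is a "stability" in days, and that each review resets
    retention to 100% and multiplies S by `growth`. Both are illustrative
    assumptions, not claims from the post.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    stability = initial_stability_days
    day = 0.0
    schedule = []
    for _ in range(n_reviews):
        # Solve exp(-t / S) = threshold for t: the moment retention
        # reaches the threshold.
        interval = -stability * math.log(threshold)
        day += interval
        schedule.append(day)
        # Assumed effect of a review: the memory becomes more stable.
        stability *= growth
    return schedule

for day in review_schedule(initial_stability_days=1.0, threshold=0.9,
                           growth=2.0, n_reviews=5):
    print(f"review on day {day:.2f}")
```

In this toy model the gaps between reviews grow by the same factor as the stability, which is why the intervals stretch out over time. A higher threshold shortens every interval, and a lower one lengthens them.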

III. The difference

Active recall and spaced repetition are independent strategies. One of them (active recall) is a method for reviewing material; the other (efficiently spaced repetition) is a method for deciding when to review. You can use one, the other, or both:

Examples of their independence:

You could listen to a lecture on a topic once now, and again a year from now (not active recall, very inefficiently spaced repetition)
You could watch YouTube videos on a topic in efficiently spaced intervals (not active recall, yes spaced repetition)
You could quiz yourself with flashcards once, then never again (yes active recall, no spaced repetition)
You could do flashcards on something in efficiently spaced intervals (both spaced repetition and active recall).

IV. Implications

Why does this matter?

Mostly, it doesn’t, and I’m just splitting hairs. But occasionally, it’s prohibitively difficult to use one method, but still quite possible to use the other. In these cases, the right thing to do isn’t to give up on both — it’s to use the one that works!

For example, you can do a bit of efficiently spaced repetition when learning people’s names, by saying their name aloud:

immediately after learning it (“hi, my name’s Alice” “nice to meet you, Alice!”)
partway through the conversation (“but i’m still not sure of the proposal. what do you think, Alice?”)
at the end of the conversation (“thanks for chatting, Alice!”)
that night (“who did I meet today? oh yeah, Alice!”)

…but it’s a lot more difficult to use active recall to remember people’s names. (The closest I’ve gotten is to try to first bring into my mind’s eye what their face looks like, then to try to remember their name.)

Another example in the opposite direction: learning your way around a city in a car. It’s really easy to do active recall: have Google Maps opened on your phone and ask yourself what the next direction is each time before you look down; guess what the next street is going to be before you get there; etc. But it’s much more difficult to efficiently space your reviews out: review timing ends up mostly in the hands of your travel schedule.

For more on the topic of deliberately using memory systems to quickly learn the geography of a new place, see this post.

startattheend on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

That makes sense, I just evaluated the comment in isolation. But I believe that the first few downvotes were as "incorrect" (the red X) rather than regular downvotes (down arrow), which is why the feedback struck me as simply mistaken (as the comment is not false).

I've noticed, by the way, that most comments posted tend to get downvoted initially and then return to 0 over time. There may be a few regular, highly active users with high standards or something, and more casual users with lower standards who balance them out over time. I've gone to -10 and back before.

bensenberner on Open Thread Fall 2024

Hi! I joined LW in order to post a research paper that I wrote over the summer, but I figured I'd post here first to describe a bit of the journey that led to this paper.

I got into rationality around 14 years ago when I read a blog called "you are not so smart", which pushed me to audit potential biases in myself and others, and to try and understand ideas/systems end-to-end without handwaving.

I studied computer science at university, partially because I liked the idea that with enough time I could understand any code (unlike essays, where investigating bibliographies for the sources of claims might lead to dead ends), and also because software pays well. I specialized in machine learning because I thought that algorithms that could make accurate predictions based on patterns in the world that were too complex for people to hardcode were cool. I had this sense that somewhere, someone must understand the "first principles" behind how to choose a neural network architecture, or that there was some way of reverse-engineering what deep learning models learned. Later I realized that there weren't really first principles regarding optimizing training, and that spending time trying to hardcode priors into models representing high-dimensional data was less effective than just getting more data (and then never understanding what exactly the model had learned).

I did a couple of Kaggle competitions and wanted to try industrial machine learning. I took a SWE job on a data-heavy team at a tech company working on the ETLs powering models, and then did some backend work which took me away from large datasets for a couple of years. I decided to read through recent deep learning textbooks and re-implement research papers at a self-directed programming retreat. Eventually I was able to work on a large-scale recommendation system, but I still felt a long way from the cutting edge, which had evolved to GPT-4. At this point, my initial fascination with the field had become tinged with concern, as I saw people (including myself) beginning to rely on language model outputs as if they were true without consulting primary sources. I wanted to understand what language models "knew" and whether we could catch issues with their "reasoning."

I considered grad school, but I figured I'd have a better application if I understood how ChatGPT was trained, and how far we'd progressed in reverse engineering neural networks' internal representations of their training data.

I participated in the AI Safety Fundamentals course, which covered both of these topics, focusing particularly on the mechanistic interpretability section. I worked through parts of the ARENA curriculum, found an opportunity to collaborate on a research project, and decided to commit to it over the summer, which led to the paper I mentioned at the beginning! I'll link to it below.

abandon on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

I can't think of better explanations.

Why do you expect difficulty thinking of explanations to correlate with the only one you can think of being correct? It seems obvious to me that if you have a general issue with thinking of explanations, the ones you do think of will also be worse than average.