LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey (Lee_Sharkey) · 2024-07-18T14:15:50.248Z · comments (18)

[link] My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi · 2024-09-08T14:30:40.456Z · comments (18)

Anthropic's Certificate of Incorporation
Zach Stein-Perlman · 2024-06-12T13:00:30.806Z · comments (4)

Talent Needs of Technical AI Safety Teams
yams (william-brewer) · 2024-05-24T00:36:40.486Z · comments (64)

BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (12)

The LessWrong 2022 Review
habryka (habryka4) · 2023-12-05T04:00:00.000Z · comments (43)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

[link] Anthropic release Claude 3, claims >GPT-4 Performance
LawrenceC (LawChan) · 2024-03-04T18:23:54.065Z · comments (41)

Why I funded PIBBSS
Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · comments (20)

Current AIs Provide Nearly No Data Relevant to AGI Alignment
Thane Ruthenis · 2023-12-15T20:16:09.723Z · comments (155)

Mapping the semantic void: Strange goings-on in GPT embedding spaces
mwatkins · 2023-12-14T13:10:22.691Z · comments (31)

The Pareto Best and the Curse of Doom
Screwtape · 2024-02-21T23:10:01.359Z · comments (21)

Rationality Research Report: Towards 10x OODA Looping?
Raemon · 2024-02-24T21:06:38.703Z · comments (21)

[link] Gender Exploration
sapphire (deluks917) · 2024-01-14T18:57:32.893Z · comments (25)

[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (19)

Four visions of Transformative AI success
Steven Byrnes (steve2152) · 2024-01-17T20:45:46.976Z · comments (22)

How much to update on recent AI governance moves?
habryka (habryka4) · 2023-11-16T23:46:01.601Z · comments (5)

Social status part 1/2: negotiations over object-level preferences
Steven Byrnes (steve2152) · 2024-03-05T16:29:07.143Z · comments (15)

[link] Practically A Book Review: Appendix to "Nonlinear's Evidence: Debunking False and Misleading Claims" (ThingOfThings)
tailcalled · 2024-01-03T17:07:13.990Z · comments (25)

The Pearly Gates
lsusr · 2024-05-30T04:01:14.198Z · comments (6)

The Parable Of The Fallen Pendulum - Part 1
johnswentworth · 2024-03-01T00:25:00.111Z · comments (32)

Simple versus Short: Higher-order degeneracy and error-correction
Daniel Murfet (dmurfet) · 2024-03-11T07:52:46.307Z · comments (6)

The case for more ambitious language model evals
Jozdien · 2024-01-30T00:01:13.876Z · comments (30)

Ten arguments that AI is an existential risk
KatjaGrace · 2024-08-13T17:00:03.397Z · comments (41)

Introduction to French AI Policy
Lucie Philippon (lucie-philippon) · 2024-07-04T03:39:45.273Z · comments (12)

Please stop using mediocre AI art in your posts
Raemon · 2024-08-25T00:13:52.890Z · comments (24)

Being nicer than Clippy
Joe Carlsmith (joekc) · 2024-01-16T19:44:23.893Z · comments (32)

Experiences and learnings from both sides of the AI safety job market
Marius Hobbhahn (marius-hobbhahn) · 2023-11-15T15:40:32.196Z · comments (4)

A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)

What I Would Do If I Were Working On AI Governance
johnswentworth · 2023-12-08T06:43:42.565Z · comments (32)

[link] A primer on the current state of longevity research
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-22T17:14:57.990Z · comments (6)

' petertodd'’s last stand: The final days of open GPT-3 research
mwatkins · 2024-01-22T18:47:00.710Z · comments (16)

You should go to ML conferences
Jan_Kulveit · 2024-07-24T11:47:52.214Z · comments (13)

Attitudes about Applied Rationality
Camille Berger (Camille Berger) · 2024-02-03T14:42:22.770Z · comments (18)

Clarifying METR's Auditing Role
Beth Barnes (beth-barnes) · 2024-05-30T18:41:56.029Z · comments (1)

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:44:24.270Z · comments (8)

The Leopold Model: Analysis and Reactions
Zvi · 2024-06-14T15:10:03.480Z · comments (19)

[link] Please support this blog (with money)
Elizabeth (pktechgirl) · 2024-08-17T15:30:05.641Z · comments (2)

"AI Alignment" is a Dangerously Overloaded Term
Roko · 2023-12-15T14:34:29.850Z · comments (100)

OthelloGPT learned a bag of heuristics
jylin04 · 2024-07-02T09:12:56.377Z · comments (10)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (10)

[question] How do you feel about LessWrong these days? [Open feedback thread]
jacobjacob · 2023-12-05T20:54:42.317Z · answers+comments (281)

2023 in AI predictions
jessicata (jessica.liu.taylor) · 2024-01-01T05:23:42.514Z · comments (35)

[link] Most smart and skilled people are outside of the EA/rationalist community: an analysis
titotal (lombertini) · 2024-07-12T12:13:56.215Z · comments (36)

Stuxnet, not Skynet: Humanity's disempowerment by AI
Roko · 2023-11-04T22:23:55.428Z · comments (24)

[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)

New LessWrong feature: Dialogue Matching
jacobjacob · 2023-11-16T21:27:16.763Z · comments (22)

Picking Mentors For Research Programmes
Raymond D · 2023-11-10T13:01:14.197Z · comments (8)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)

Demystifying "Alignment" through a Comic
milanrosko · 2024-06-09T08:24:22.454Z · comments (19)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

megasilverfist on Where to Draw the Boundaries?

So, there is a legitimate complaint here. It's true that sailors in the ancient world had a legitimate reason to want a word in their language whose extension was {salmon, guppies, sharks, dolphins, ...}. (And modern scholars writing a translation for present-day English speakers might even translate that word as fish, because most members of that category are what we would call fish.) It indeed would not necessarily be helping the sailors to tell them that they need to exclude dolphins from the extension of that word, and instead include dolphins in the extension of their word for {monkeys, squirrels, horses ...}. Likewise, most modern biologists have little use for a word that groups dolphins and guppies together.

Ok, but salmon and guppies are more closely related to dolphins than sharks. Like I get where you are going with this, but "fish" is barely a natural category and it isn't obviously more of one than all descendants of the last common ancestor of the actinopterygians. Even if you limit it to marine descendants it still lets you predict bone vs cartilaginous skeletal system.

localdeity on Matt Goldenberg's Short Form Feed

The political version of the question isn't functionally the same as the skin cream version, because the former isn't a randomized intervention—cities that decided to add gun control laws seem likely to have other crime-related events and law changes at the same time, which could produce a spurious result in either direction. So it's quite reasonable to say "My opinion is determined by my priors and the evidence didn't appreciably affect my position."

harfe on If we had known the atmosphere would ignite

This is not a formal definition.

Your English sentence has no apparent connection to mathematical objects, which would be necessary for a rigorous and formal definition.

rotatingpaguro on Occupational Licensing Roundup #1

Related, I have a vague understanding on how product safety certification works in EU, and there are multiple private companies doing the certification in every state.

nc-1 on IAPS: Mapping Technical Safety Research at AI Companies

I would have found it helpful in your report for there to be a ROSES-type diagram or other flowchart showing the steps in your paper collation. This would bring it closer in line with other scoping reviews and would have made it easier to understand your methodology.

oscard on IAPS: Mapping Technical Safety Research at AI Companies

Thanks for that list of papers/posts. For most of the papers you linked, they’re not included because they did not feature in either of our search strategies: (1) titles containing specific keywords that we searched for on arXiv; (2) the paper is linked on the company’s website. I agree this is a limitation of our methodology. We won't add these papers in now as that would be somewhat ad hoc, and inconsistent between the companies.

Re the blog posts from Anthropic and what counts as a paper, I agree this is a tricky demarcation problem. We included the 'Circuit Updates' because it was linked to as a 'paper' on the Anthropic website. Even if GDM has a higher bar for what counts as a 'paper' than Anthropic, I think we don't really want to be adjudicating this, so I feel comfortable just deferring to each company about what counts as a paper for them.

remmelt-ellen on If we had known the atmosphere would ignite

For an overview of why such a guarantee would turn out impossible, suggest taking a look at Will Petillo's post Lenses of Control [AF · GW].

remmelt-ellen on If we had known the atmosphere would ignite

Defining alignment (sufficiently rigorous so that a formal proof of (im)possibility of alignment is conceivable) is a hard thing!

It's less hard than you think, if you use a minimal-threshold definition of alignment:

That "AGI" continuing to exist, in some modified form, does not result eventually in changes to world conditions/contexts that fall outside the ranges that existing humans could survive under.

christiankl on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

When it comes to Buddhist practice, it's worth noting that practicing techniques by the book is not how Buddhism was practiced for most of the time in the last 2500 years. It was mostly an oral tradition and as such the knowledge that's passed down from teacher to student evolves over time in various ways.

Many modern Buddhist tradition put much more emphasis on meditation in contrast to ritualized behavior.

In Buddhism (and in Christanity for that matter) for thousands of years meditation was largely done in monasteries and not by lay-people. In many Buddhist communities "lay-people aren't supposed to meditate" is something you could call "ancient wisdom".

In someone convinces you in a Western context that following some practice is ancient wisdom, they are likely doing a lot of picking and choosing in a way that does not make it clear how ancient the thing they are promoting actually happens to be.

gunnar_zarncke on [Intuitive self-models] 2. Conscious Awareness

I think your explanation in section 8.5.2 resolves our disagreement nicely. You refer to S(X) thoughts that "spawn up" successive thoughts that eventually lead to X (I'd say X') actions shortly after (or much later). While I was referring to S(X) that cannot give rise to X immediately. I think the difference was that you are more lenient with what X can be, such that S(X) can be about an X that is happening much later, which wouldn't work in my model of thoughts.

Explicit (self-reflective) desire
Statement: “I want to be inside.”
Intuitive model underlying that statement: There’s a frame (§2.2.3 [LW · GW]) “X wants Y” (§3.3.4 [LW · GW]). This frame is being invoked, with X as the homunculus, and Y as the concept of “inside” as a location / environment.
How I describe what’s happening using my framework: There’s a systematic pattern (in this particular context), call it P, where self-reflective thoughts concerning the inside, like “myself being inside” or “myself going inside”, tend to trigger positive valence. That positive valence is why such thoughts arise in the first place, and it’s also why those thoughts tend to lead to actual going-inside behavior.
In my framework, that’s really the whole story. There’s this pattern P. And we can talk about the upstream causes of P—something involving innate drives and learned heuristics in the brain. And we can likewise talk about the downstream effects of P—P tends to spawn behaviors like going inside, brainstorming how to get inside, etc. But “what’s really going on” (in the “territory” of my brain algorithm) is a story about the pattern P, not about the homunculus. The homunculus only arises secondarily, as the way that I perceive the pattern P (in the “map” of my intuitive self-model).