LessWrong 2.0 Reader


In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)

New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (64)

[link] A Chess-GPT Linear Emergent World Representation
Adam Karvonen (karvonenadam) · 2024-02-08T04:25:15.222Z · comments (14)

On the future of language models
owencb · 2023-12-20T16:58:28.433Z · comments (17)

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)

Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L (LRudL) · 2024-07-08T22:24:38.441Z · comments (28)

[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)

SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (16)

[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (56)

Nonlinear’s Evidence: Debunking False and Misleading Claims
KatWoods (ea247) · 2023-12-12T13:16:12.008Z · comments (171)

Catching AIs red-handed
ryan_greenblatt · 2024-01-05T17:43:10.948Z · comments (22)

Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)

[link] The Witness
Richard_Ngo (ricraz) · 2023-12-03T22:27:16.248Z · comments (5)

[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)

[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)

Dreams of AI alignment: The danger of suggestive names
TurnTrout · 2024-02-10T01:22:51.715Z · comments (59)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)

Sorry for the downtime, looks like we got DDosd
habryka (habryka4) · 2024-12-02T04:14:30.209Z · comments (13)

[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (8)

LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (5)

The Big Nonprofits Post
Zvi · 2024-11-29T16:10:06.938Z · comments (10)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom (Jbloom) · 2024-02-02T06:54:53.392Z · comments (37)

Response to nostalgebraist: proudly waving my moral-antirealist battle flag
Steven Byrnes (steve2152) · 2024-05-29T16:48:29.408Z · comments (29)

What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)

Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)

[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)

On Dwarkesh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)

Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)

A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (16)

General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)

[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (24)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (8)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (10)

[link] The Minority Coalition
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (7)

Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (18)

[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)

[Valence series] 1. Introduction
Steven Byrnes (steve2152) · 2023-12-04T15:40:21.274Z · comments (14)

[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)

[link] "Deep Learning" Is Function Approximation
Zack_M_Davis · 2024-03-21T17:50:36.254Z · comments (28)

Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)

OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)

Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)

Dialogue introduction to Singular Learning Theory
Olli Järviniemi (jarviniemi) · 2024-07-08T16:58:10.108Z · comments (14)

Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)


Recent comments

viliam on Orca communication project - seeking feedback (and collaborators)

I'm interested in learning about how different languages are structured, especially Esperanto/Ido and Lojban

I don't think it will help you to communicate with orcas, but okay.

Esperanto/Ido are more regular than natural languages, simply because languages gradually collect things that are not strictly necessary, such as synonyms, different declensions for different classes of words, or taking one word from one language and then another related word from a different language. For example, in English, compare the etymologies of "see" and "visible". But the concepts are related, so wouldn't it be easier to just say "see-able" instead? If you remove these kinds of irregularities (each of them sounds like not a big deal, but those "not big deals" accumulate)... you end up with a language that is 10x easier to learn and remember, while being able to express the same concepts. But it's essentially still the same thing, only simpler.

I am less familiar with Lojban, but I think the original idea was to make it more precise, kinda like a computer language. But the actual design decisions seem to me more like "Hollywood rationality" or "cargo cult"; making yourself superficially sound more like a computer does not necessarily give you computer-like clarity or efficiency. For example, all nouns have to be exactly 5 letters long. Uhm, interesting, but what improvement exactly do you think is achieved by that? Or, the original version required you to specify all parameters for words; for example, you couldn't say "go" without specifying where from, where to, by what means, through what, and when (or something like that, maybe I got some of the parameters wrong), in a given order. Uhm, interesting, but what if you do not want to specify some of those things; like, if you translate from English to Lojban, and the original text did not contain that information? So you say things like "I am going from unspecified to school through unspecified by unspecified at unspecified". I guess it is nice to be reminded what exactly is unspecified, but if you talk like this all the time, it becomes pretty annoying. So the language was updated to contain something like prepositions, but reinvented badly: instead of specifying the relation, they specify the numeric order of the parameter, so the sentence now sounds like "I am going #2 school", because the #2 parameter for "to go" is "where do you go to" (but for a different verb, the destination could be the #1 or #3 parameter, so you need to remember the exact order of the parameters for each verb separately). Ironically, if we follow the analogy with programming languages, what we would need here is the named parameters from Python. (But that is basically reinventing prepositions.) And so on; it seems to me that the language design contains many ideas that sound impressive, but the actual use... uhm, I am not sure whether anyone actually uses the language, so you would have to ask them.
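
To make the named-parameter analogy concrete, here is a minimal Python sketch. The function `go` and its parameter names and order are invented for this illustration; they are not the real Lojban place structure.

```python
# Hypothetical illustration: the parameter names and their order are
# assumptions made for this example, not actual Lojban grammar.
def go(goer, destination=None, origin=None, route=None, means=None):
    """Describe a trip, skipping every parameter that was left unspecified."""
    parts = [f"{goer} is going"]
    if destination is not None:
        parts.append(f"to {destination}")
    if origin is not None:
        parts.append(f"from {origin}")
    if route is not None:
        parts.append(f"through {route}")
    if means is not None:
        parts.append(f"by {means}")
    return " ".join(parts)


# Positional style: the speaker must remember the slot order and pad the
# unused slots with explicit "unspecified" placeholders.
positional = go("Alice", "school", None, None, "bus")

# Named style: only the relevant parameters are mentioned, in any order.
named = go("Alice", means="bus", destination="school")

print(positional)
print(named)
```

Both calls describe the same trip, but the named form does not require remembering which slot holds the destination, which is roughly the job a preposition does.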

But most likely, this will all be irrelevant for orcas. Their languages may be regular or irregular, with fixed or random word order, or maybe with some categories that do not exist in human languages.

lucie-philippon on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

My bad, I mistook Mieux Donner for an older organisation that was trying to set this up.

I checked online, and it does not seem that it's possible to get the deduction for non-profits outside the EU even through a proxy, except if their action is related to France or is humanitarian.

Source: https://www.centre-francais-fondations.org/dons-transnationaux/

yair-halberstadt on Drexler's Nanotech Software

I would have concerns about suitably generic, flexible and sensitive humanoid robots, yes.

towards_keeperhood on Orca communication project - seeking feedback (and collaborators)

Currently we basically don't have any datasets where it's labelled which orca says what. When I listen to recordings, I cannot distinguish voices, though idk, it's possible that people who have listened a lot more can. I think just unsupervised voice clustering would probably not work very accurately. I'd guess it's probably possible to get data on who said what by using an array of hydrophones to infer the location of the sound, but we need very accurate position inference because different orcas are often just 1-10 m apart, and for this we might need to get/infer decent estimates of how water temperature varies by depth, and generally there have not yet been attempts to get high precision through this method. (It's definitely harder in water than in air.)

Yeah, basically I initially also had rough thoughts in this direction, but I think the create-and-teach-a-language way is probably a lot faster.

I think the Earth Species Project is trying to use AI to decode animal communication, though they don't focus on orcas in particular but on many species, including e.g. beluga whales. I didn't look into it a lot, but it seems possible I could do something like this in a smarter and more promising way, though it would probably still take long.
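
A rough sketch of the hydrophone idea, under strong simplifying assumptions: a flat 2D geometry, a constant sound speed of 1500 m/s (real speed varies with temperature and depth, which is exactly the difficulty mentioned above), noise-free arrival times, and a brute-force grid search. The hydrophone positions and function names are made up for illustration.

```python
import math

SOUND_SPEED = 1500.0  # m/s; assumed constant here, which real water is not


def arrival_times(source, hydrophones, speed=SOUND_SPEED):
    """Travel time of a call from `source` to each hydrophone."""
    return [math.dist(source, h) / speed for h in hydrophones]


def locate(hydrophones, times, extent=50.0, step=0.5, speed=SOUND_SPEED):
    """Grid-search the 2D point whose predicted arrival-time differences
    (relative to the first hydrophone) best match the measured ones."""
    measured = [t - times[0] for t in times]
    best, best_err = None, math.inf
    n = int(round(2 * extent / step))
    for i in range(n + 1):
        x = -extent + i * step
        for j in range(n + 1):
            y = -extent + j * step
            pred = arrival_times((x, y), hydrophones, speed)
            err = sum(((p - pred[0]) - m) ** 2 for p, m in zip(pred, measured))
            if err < best_err:
                best, best_err = (x, y), err
    return best


# Hypothetical square array of four hydrophones, 30 m on a side.
hydrophones = [(0.0, 0.0), (30.0, 0.0), (0.0, 30.0), (30.0, 30.0)]
true_source = (12.5, 7.0)
times = arrival_times(true_source, hydrophones)
estimate = locate(hydrophones, times)
print(estimate)
```

Only the time differences between hydrophones are used, since the moment the call was emitted is unknown. With real recordings, timing noise and an inaccurate sound-speed profile would shift the estimate, which is why separating animals only a few metres apart is hard.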

romeostevensit on Which Biases are most important to Overcome?

Is-ought confabulation
Means-ends confabulation
Scope sensitivity
Fundamental attribution error
Attribute substitution
Ambiguity aversion
Reasoning from consequences

viliam on Linkpost: Rat Traps by Sheon Han in Asterisk Mag

Yeah, I am not even sure what the point of the article was. What is the thing we are supposed to update about? Writing in a different style, or changing our opinions (about what exactly?), or finding completely new topics to talk about so that we are not boring the article author, or...?

daniele-de-nuntiis on A Meritocracy of Taste

From my experience they're getting pretty good, depends on the social but IG reels or YT can keep me entertained with nothing-content for hours

simon on Do simulacra dream of digital sheep?

The argument presented by Aaronson is that, since it would take as much computation to convert the rock/waterfall computation into a usable computation as it would to just do the usable computation directly, the rock/waterfall isn't really doing the computation.

I find this argument rather convincing, as we are talking about a possible internal property here, and not about the external relation with the rest of the world (which we already agree is rather useless).

self on [deleted]

(I've since found https://www.lesswrong.com/rationality, which does the job.)

davidmanheim on Do simulacra dream of digital sheep?

As with OP, I strongly recommend Aaronson, who explains why waterfalls aren't doing computation in ways that refute the rock example you discuss: https://www.scottaaronson.com/papers/philos.pdf