LessWrong 2.0 Reader

Near-mode thinking on AI
Olli Järviniemi (jarviniemi) · 2024-08-04T20:47:28.085Z · comments (8)

[link] Investigating the Chart of the Century: Why is food so expensive?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-16T13:21:23.596Z · comments (25)

Parasites (not a metaphor)
lukehmiles (lcmgcd) · 2024-08-08T20:07:13.593Z · comments (17)

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey (Lee_Sharkey) · 2024-07-18T14:15:50.248Z · comments (18)

Efficient Dictionary Learning with Switch Sparse Autoencoders
Anish Mudide (anish-mudide) · 2024-07-22T18:45:53.502Z · comments (19)

Ten arguments that AI is an existential risk
KatjaGrace · 2024-08-13T17:00:03.397Z · comments (41)

Introduction to French AI Policy
Lucie Philippon (lucie-philippon) · 2024-07-04T03:39:45.273Z · comments (12)

[link] A primer on the current state of longevity research
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-22T17:14:57.990Z · comments (6)

Why I funded PIBBSS
Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · comments (14)

You should go to ML conferences
Jan_Kulveit · 2024-07-24T11:47:52.214Z · comments (13)

[link] My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi · 2024-09-08T14:30:40.456Z · comments (17)

Skills from a year of Purposeful Rationality Practice
Raemon · 2024-09-18T02:05:58.726Z · comments (7)

What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)

OthelloGPT learned a bag of heuristics
jylin04 · 2024-07-02T09:12:56.377Z · comments (10)

Please stop using mediocre AI art in your posts
Raemon · 2024-08-25T00:13:52.890Z · comments (24)

[link] Please support this blog (with money)
Elizabeth (pktechgirl) · 2024-08-17T15:30:05.641Z · comments (2)

[link] Most smart and skilled people are outside of the EA/rationalist community: an analysis
titotal (lombertini) · 2024-07-12T12:13:56.215Z · comments (36)

Danger, AI Scientist, Danger
Zvi · 2024-08-15T22:40:06.715Z · comments (9)

In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)

Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (1)

[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L (LRudL) · 2024-07-08T22:24:38.441Z · comments (28)

A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (16)

LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (4)

[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)

[link] The Minority Faction
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (5)

[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)

Dialogue introduction to Singular Learning Theory
Olli Järviniemi (jarviniemi) · 2024-07-08T16:58:10.108Z · comments (11)

[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (3)

On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)

[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (1)

[link] Against Aschenbrenner: How 'Situational Awareness' constructs a narrative that undermines safety and threatens humanity
GideonF · 2024-07-15T18:37:40.232Z · comments (17)

Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)

[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (33)

[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)

SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)

It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (71)

New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)

Circular Reasoning
abramdemski · 2024-08-05T18:10:32.736Z · comments (36)

AI #73: Openly Evil AI
Zvi · 2024-07-18T14:40:05.770Z · comments (20)

Covert Malicious Finetuning
Tony Wang (tw) · 2024-07-02T02:41:51.698Z · comments (4)

[link] I found >800 orthogonal "write code" steering vectors
Jacob G-W (g-w1) · 2024-07-15T19:06:17.636Z · comments (19)

Singular learning theory: exercises
Zach Furman (zfurman) · 2024-08-30T20:00:03.785Z · comments (3)

[link] Re: Anthropic's suggested SB-1047 amendments
RobertM (T3t) · 2024-07-27T22:32:39.447Z · comments (13)

I'm a bit skeptical of AlphaFold 3
Oleg Trott (oleg-trott) · 2024-06-25T00:04:41.274Z · comments (14)

Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (21)

Dragon Agnosticism
jefftk (jkaufman) · 2024-08-01T17:00:06.434Z · comments (61)

[link] Detecting Genetically Engineered Viruses With Metagenomic Sequencing
jefftk (jkaufman) · 2024-06-27T14:01:34.868Z · comments (10)

[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)

Recent comments

jessica-liu-taylor on The Obliqueness Thesis

"as important as ever": no, because our potential influence is lower, and the influence isn't on things shaped like our values, there has to be a translation, and the translation is different from the original.

CEV: while it addresses "extrapolation" it seems broadly based on assuming the extrapolation is ontologically easy, and "our CEV" is an unproblematic object we can talk about (even though we need logical uncertainty over it, and logical induction has additional free parameters in the limit). I'm really trying to respond to orthogonality not CEV though.

from a practical perspective: notice that I am not behaving like Eliezer Yudkowsky. I am not saying the Orthogonality Thesis is true and important to ASI. I am not saying a project of aligning superintelligence with human values is a priority. I am not taking research approaches that assume a Diagonal/Orthogonal factorization. I left MIRI because I didn't like their security policies, I thought discussion of abstract research ideas was more important. I am not calling for a global AI shutdown so this project (which is in my view confused) can be completed. I am actually against AI regulation on the margin (I don't have a full argument for this, it's a political matter at this point).

I think practicality looks more like having near-term preferences related to modest intelligence increases (as with current humans vs humans with neural nets; how do neural nets benefit or harm you, practically?), and not expecting your preferences to extend into the distant future with many ontology changes, so don't worry about grabbing hold of the whole future etc, think about how to reduce value drift while accepting intelligence increases on the margin. This is a bit like CEV except CEV is in a thought experiment instead of reality.

The "Models of ASI should start with realism" bit IS about practicalities, namely, I think focusing on first forecasting absent a strategy of what to do about the future is practical with respect to any possible influence on the far future; practically, I think your attempted jump to practicality (which might be related to philosophical pragmatism) is impractical in this context.

It occurs to me that maybe you mean something like "Our current (non-extrapolated) values are our real values, and maybe it's impossible to build or become a superintelligence that shares our real values so we'll have to choose between alignment and superintelligence." Is this close to your position?

Close. Alignment of already-existing human values with superintelligence is impossible (I think) because of the arguments given. That doesn't mean humans have no preferences indirectly relating to superintelligence (especially, we have preferences about modest intelligence increases, and there's some iterative process).

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

ChatGPT4 generates social psychology hypotheses that are rated as original as those proposed by human experts https://x.com/BogdanIonutCir2/status/1836720153444139154

daniel-c on Lucius Bushnaq's Shortform

Noted, that does seem a lot more tractable than using natural latents to pin down details of CEV by itself

raemon on Skills from a year of Purposeful Rationality Practice

Yeah I had vaguely remembered this story but not the details.

kshitij-sachan on The Geometric Expectation

X

Extreme nit: you probably meant for this to be lowercase. I love this series!

oumuamua on Slave Morality: A place for every man and every man in his place

This is amazing to me, frankly. Thank you for your comment. I mean, God bless you, but I can’t view slave morality as I’ve described it as anything other than pathetic. I also value kindness, but, for instance, I admire people like Elon Musk infinitely more than the kind, soft person whose life will ultimately be close to meaningless, and I’d still admire him if he were much crueller and more sadistic than he actually is.

We are at the brink of changing the course of the universe forever, even thinking about the downtrodden too much feels outright immoral to me at this point.

seth-herd on The alignment stability problem

As per our discussions on our other posts, I don't think we can say that value learning in itself solves the problem. The issue of whether the ASI's interpretation of its central goal or instructions might change is not automatically solved by adopting that approach. The value mutability problem you link to is a separate issue. I'm not addressing here whether human values might change, but whether an AGI's interpretations of its central goal/values might change.

lblack on Lucius Bushnaq's Shortform

My claim is that the natural latents the AI needs to share for this setup are not about the details of what a 'CEV' is. They are about what researchers mean when they talk about initializing, e.g., a physics simulation with the state of the Earth at a specific moment in time.

m-y-zuo on How does someone prove that their general intelligence is above average?

I am not asking about ‘true’ general intelligence? Or whatever that implies.

If you're not sure, I am asking about the term commonly called ‘general intelligence’, sometimes also known as the ‘general mental ability factor’ or ‘g-factor’, in mainstream academic papers, such as those found in pedagogy, memetics, genetics, etc.

See: https://scholar.google.com/scholar?hl=en&as_sdt=0%252C5&q=“general+intelligence”&btnG=

Where many many thousands of researchers over the last few decades are referring to this.

Here is a direct quote by a pretty well known expert among intelligence researchers, writing in 2004:

“During the past few decades, the word intelligence has been attached to an increasing number of different forms of competence and accomplishment: emotional intelligence, football intelligence, and so on. Researchers in the field, however, have largely abandoned the term, together with their old debates over what sorts of abilities should and should not be classified as part of intelligence. Helped by the advent of new technologies for researching the brain, they have increasingly turned their attention to a century-old concept of a single overarching mental power. They call it simply g, which is short for the general mental ability factor. The g factor is a universal and reliably measured distinction among humans in their ability to learn, reason, and solve problems. It corresponds to what most people mean when they describe some individuals as smarter than others, and it's well measured by IQ (intelligence quotient) tests, which assess high-level mental skills such as the ability to draw inferences, see similarities and differences, and process complex information of virtually any kind. Understanding g's biological basis in the brain is the new frontier in intelligence research today. The g factor was discovered by the first mental testers, who found that people who scored well on one type of mental test tended to score well on all of them. Regardless of their contents (words, numbers, pictures, shapes), how they are administered (individually or in groups; orally, in writing, or pantomimed), or what they're intended to measure (vocabulary, mathematical reasoning, spatial ability), all mental tests measure mostly the same thing. This common factor, g, can be distilled from scores on any broad set of cognitive tests, and it takes the same form among individuals of every age, race, sex, and nation yet studied. In other words, the g factor exists independently of schooling, paper-and-pencil tests, and culture.”

seth-herd on The alignment stability problem

I think my terminology isn't totally clear. By "goals" in that statement, I mean what we mean by "values" in humans. The two are used in overlapping and mostly interchangeable ways in my writing.

Humans aren't sufficiently intelligent to be all that internally consistent.
In many cases of humans changing goals, I'd say they're actually changing subgoals, while their central goal (be happy/satisfied/joyous) remains the same. This may be described as changing goals while keeping the same values.
Note 'in the short term' (I think you're quoting Bostrom? The context isn't quite clear). In the long term, with increasing intelligence and self-awareness, I'd expect some of people's goals to change as they become more self-aware and work toward more internal coherence (e.g., many people change their goal of eating delicious food when they realize it conflicts with their more important goal of being happy and living a long life).

Yes, humans may change exactly that way. A friend said he'd gotten divorced after getting a CPAP to solve his sleep apnea: "When we got married, we were both sad and angry people. Now I'm not." But that's because we're pretty random and biologically determined.