LessWrong 2.0 Reader

Alignment Implications of LLM Successes: a Debate in One Act
Zack_M_Davis · 2023-10-21T15:22:23.053Z · comments (50)

Book Review: Going Infinite
Zvi · 2023-10-24T15:00:02.251Z · comments (110)

AI companies aren't really using external evaluators
Zach Stein-Perlman · 2024-05-24T16:01:21.184Z · comments (15)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (87)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (50)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (30)

What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (31)

Announcing MIRI’s new CEO and leadership team
Gretta Duleba (gretta-duleba) · 2023-10-10T19:22:11.821Z · comments (52)

Thoughts on responsible scaling policies and regulation
paulfchristiano · 2023-10-24T22:21:18.341Z · comments (33)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (90)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)

The Best Lay Argument is not a Simple English Yud Essay
J Bostock (Jemist) · 2024-09-10T17:34:28.422Z · comments (9)

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Rohin Shah (rohinmshah) · 2024-08-20T16:22:45.888Z · comments (33)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (103)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (27)

The ‘strong’ feature hypothesis could be wrong
lewis smith (lsgos) · 2024-08-02T14:33:58.898Z · comments (17)

AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (7)

Superbabies: Putting The Pieces Together
sarahconstantin · 2024-07-11T20:40:05.036Z · comments (37)

[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (32)

The Great Data Integration Schlep
sarahconstantin · 2024-09-13T15:40:02.298Z · comments (12)

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res · 2023-11-24T17:37:43.020Z · comments (83)

OpenAI: Fallout
Zvi · 2024-05-28T13:20:04.325Z · comments (25)

[link] The Lighthaven Campus is open for bookings
habryka (habryka4) · 2023-09-30T01:08:12.664Z · comments (18)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (37)

Towards more cooperative AI safety strategies
Richard_Ngo (ricraz) · 2024-07-16T04:36:29.191Z · comments (130)

We're Not Ready: thoughts on "pausing" and responsible scaling policies
HoldenKarnofsky · 2023-10-27T15:19:33.757Z · comments (33)

[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (5)

Maybe Anthropic's Long-Term Benefit Trust is powerless
Zach Stein-Perlman · 2024-05-27T13:00:47.991Z · comments (21)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

Pay Risk Evaluators in Cash, Not Equity
Adam Scholl (adam_scholl) · 2024-09-07T02:37:59.659Z · comments (19)

This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (34)

Funny Anecdote of Eliezer From His Sister
Noah Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (6)

Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (17)

Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk
1a3orn · 2023-11-02T18:20:29.569Z · comments (79)

[link] Sam Altman fired from OpenAI
LawrenceC (LawChan) · 2023-11-17T20:42:30.759Z · comments (75)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (99)

What's Going on With OpenAI's Messaging?
ozziegooen · 2024-05-21T02:22:04.171Z · comments (13)

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (55)

Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (35)

The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (64)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)

My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (70)

Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)

Announcing Timaeus
Jesse Hoogland (jhoogland) · 2023-10-22T11:59:03.938Z · comments (15)

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage
orthonormal · 2024-08-06T02:32:41.364Z · comments (25)

Labs should be explicit about why they are building AGI
peterbarnett · 2023-10-17T21:09:20.711Z · comments (16)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

Recent comments

jessica-liu-taylor on The Obliqueness Thesis

"as important as ever": no, because our potential influence is lower, and the influence isn't on things shaped like our values, there has to be a translation, and the translation is different from the original.

CEV: while it addresses "extrapolation" it seems broadly based on assuming the extrapolation is ontologically easy, and "our CEV" is an unproblematic object we can talk about (even though we need logical uncertainty over it, and logical induction has additional free parameters in the limit). I'm really trying to respond to orthogonality not CEV though.

from a practical perspective: notice that I am not behaving like Eliezer Yudkowsky. I am not saying the Orthogonality Thesis is true and important to ASI, I am instead saying intelligence/values are Oblique and probably nearly Diagonal (though it's unclear what I mean by "nearly"). I am not saying a project of aligning superintelligence with human values is a priority. I am not taking research approaches that assume a Diagonal/Orthogonal factorization. I left MIRI because I didn't like their security policies, I thought discussion of abstract research ideas was more important. I am not calling for a global AI shutdown so this project (which is in my view confused) can be completed. I am actually against AI regulation on the margin (I don't have a full argument for this, it's a political matter at this point).

I think practicality looks more like having near-term preferences related to modest intelligence increases (as with current humans vs humans with neural nets; how do neural nets benefit or harm you, practically?), and not expecting your preferences to extend into the distant future with many ontology changes. So don't worry about grabbing hold of the whole future, etc.; think about how to reduce value drift while accepting intelligence increases on the margin. This is a bit like CEV except CEV is in a thought experiment instead of reality.

The "Models of ASI should start with realism" bit IS about practicalities, namely, I think focusing on first forecasting absent a strategy of what to do about the future is practical with respect to any possible influence on the far future; practically, I think your attempted jump to practicality (which might be related to philosophical pragmatism) is impractical in this context.

It occurs to me that maybe you mean something like "Our current (non-extrapolated) values are our real values, and maybe it's impossible to build or become a superintelligence that shares our real values so we'll have to choose between alignment and superintelligence." Is this close to your position?

Close. Alignment of already-existing human values with superintelligence is impossible (I think) because of the arguments given. That doesn't mean humans have no preferences indirectly relating to superintelligence (especially, we have preferences about modest intelligence increases, and there's some iterative process).

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

ChatGPT4 generates social psychology hypotheses that are rated as original as those proposed by human experts https://x.com/BogdanIonutCir2/status/1836720153444139154

daniel-c on Lucius Bushnaq's Shortform

Noted, that does seem a lot more tractable than using natural latents to pin down details of CEV by itself

raemon on Skills from a year of Purposeful Rationality Practice

Yeah I had vaguely remembered this story but not the details.

kshitij-sachan on The Geometric Expectation

X

Extreme nit: you probably meant for this to be lowercase. I love this series!

oumuamua on Slave Morality: A place for every man and every man in his place

This is amazing to me, frankly. Thank you for your comment. I mean, God bless you, but I can’t view slave morality as I’ve described it as anything other than pathetic. I also value kindness, but, for instance, I admire people like Elon Musk infinitely more than the kind, soft person whose life will ultimately be close to meaningless, and I’d still admire him if he was much crueller and more sadistic than he actually is.

We are on the brink of changing the course of the universe forever; even thinking about the downtrodden too much feels outright immoral to me at this point.

seth-herd on The alignment stability problem

As per our discussions on our other posts, I don't think we can say that value learning in itself solves the problem. The issue of whether the ASI's interpretation of its central goal or instructions might change is not automatically solved by adopting that approach. The value mutability problem you link to is a separate issue. I'm not addressing here whether human values might change, but whether an AGI's interpretations of its central goal/values might change.

lblack on Lucius Bushnaq's Shortform

My claim is that the natural latents the AI needs to share for this setup are not about the details of what a 'CEV' is. They are about what researchers mean when they talk about initializing, e.g., a physics simulation with the state of the Earth at a specific moment in time.

m-y-zuo on How does someone prove that their general intelligence is above average?

I am not asking about ‘true’ general intelligence? Or whatever that implies.

If you're not sure, I am asking about the term commonly called ‘general intelligence’, sometimes also known as the ‘general mental ability factor’ or ‘g-factor’, in mainstream academic papers, such as those found in pedagogy, memetics, genetics, etc.

See: https://scholar.google.com/scholar?hl=en&as_sdt=0%252C5&q=“general+intelligence”&btnG=

Where many many thousands of researchers over the last few decades are referring to this.

Here is a direct quote by a pretty well known expert among intelligence researchers, writing in 2004:

“During the past few decades, the word intelligence has been attached to an increasing number of different forms of competence and accomplishment: emotional intelligence, football intelligence, and so on. Researchers in the field, however, have largely abandoned the term, together with their old debates over what sorts of abilities should and should not be classified as part of intelligence. Helped by the advent of new technologies for researching the brain, they have increasingly turned their attention to a century-old concept of a single overarching mental power. They call it simply g, which is short for the general mental ability factor. The g factor is a universal and reliably measured distinction among humans in their ability to learn, reason, and solve problems. It corresponds to what most people mean when they describe some individuals as smarter than others, and it's well measured by IQ (intelligence quotient) tests, which assess high-level mental skills such as the ability to draw inferences, see similarities and differences, and process complex information of virtually any kind. Understanding g's biological basis in the brain is the new frontier in intelligence research today. The g factor was discovered by the first mental testers, who found that people who scored well on one type of mental test tended to score well on all of them. Regardless of their contents (words, numbers, pictures, shapes), how they are administered (individually or in groups; orally, in writing, or pantomimed), or what they're intended to measure (vocabulary, mathematical reasoning, spatial ability), all mental tests measure mostly the same thing. This common factor, g, can be distilled from scores on any broad set of cognitive tests, and it takes the same form among individuals of every age, race, sex, and nation yet studied. In other words, the g factor exists independently of schooling, paper-and-pencil tests, and culture.”

seth-herd on The alignment stability problem

I think my terminology isn't totally clear. By "goals" in that statement, I mean what we mean by "values" in humans. The two are used in overlapping and mostly interchangeable ways in my writing.

Humans aren't sufficiently intelligent to be all that internally consistent
In many cases of humans changing goals, I'd say they're actually changing subgoals, while their central goal (be happy/satisfied/joyous) remains the same. This may be described as changing goals while keeping the same values.
Note 'in the short term' (I think you're quoting Bostrom? The context isn't quite clear). In the long term, with increasing intelligence and self-awareness, I'd expect some of people's goals to change as they become more self-aware and work toward more internal coherence (e.g., many people change their goal of eating delicious food when they realize it conflicts with their more important goal of being happy and living a long life).

Yes, humans may change exactly that way. A friend said he'd gotten divorced after getting a CPAP to solve his sleep apnea: "When we got married, we were both sad and angry people. Now I'm not." But that's because we're pretty random and biologically determined.