LessWrong 2.0 Reader

MATS Summer 2023 Retrospective
utilistrutil · 2023-12-01T23:29:47.958Z · comments (34)
Reactions to the Executive Order
Zvi · 2023-11-01T20:40:02.438Z · comments (4)
Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)
Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (11)
Send us example gnarly bugs
Beth Barnes (beth-barnes) · 2023-12-10T05:23:00.773Z · comments (10)
ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)
Reward hacking behavior can generalize across tasks
Kei · 2024-05-28T16:33:50.674Z · comments (5)
Lying Alignment Chart
Zack_M_Davis · 2023-11-29T16:15:28.102Z · comments (17)
Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (21)
[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (37)
Value fragility and AI takeover
Joe Carlsmith (joekc) · 2024-08-05T21:28:07.306Z · comments (5)
[link] AI takeoff and nuclear war
owencb · 2024-06-11T19:36:24.710Z · comments (6)
On Claude 3.0
Zvi · 2024-03-06T18:50:04.766Z · comments (5)
The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (19)
Universal Love Integration Test: Hitler
Raemon · 2024-01-10T23:55:35.526Z · comments (65)
[link] Are language models good at making predictions?
dynomight · 2023-11-06T13:10:36.379Z · comments (14)
My guess at Conjecture's vision: triggering a narrative bifurcation
Alexandre Variengien (alexandre-variengien) · 2024-02-06T19:10:42.690Z · comments (12)
[link] Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds"
mattmacdermott · 2024-02-29T13:59:34.959Z · comments (19)
On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)
AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo
Zvi · 2023-09-21T12:00:06.616Z · comments (8)
AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (4)
[question] What could a policy banning AGI look like?
TsviBT · 2024-03-13T14:19:07.783Z · answers+comments (23)
[link] The problems with the concept of an infohazard as used by the LW community [Linkpost]
Noosphere89 (sharmake-farah) · 2023-12-22T16:13:54.822Z · comments (43)
[link] Claude 3.5 Sonnet
Zach Stein-Perlman · 2024-06-20T18:00:35.443Z · comments (41)
Grief is a fire sale
Nathan Young · 2024-03-04T01:11:06.882Z · comments (1)
Vote on Anthropic Topics to Discuss
Ben Pace (Benito) · 2024-03-06T19:43:47.194Z · comments (55)
[link] The Offense-Defense Balance Rarely Changes
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-09T15:21:23.340Z · comments (23)
Analogies between scaling labs and misaligned superintelligent AI
scasper · 2024-02-21T19:29:39.033Z · comments (5)
Luck based medicine: angry eldritch sugar gods edition
Elizabeth (pktechgirl) · 2023-09-19T04:40:06.334Z · comments (14)
[link] MIRI's June 2024 Newsletter
Harlan · 2024-06-14T23:02:23.721Z · comments (18)
Neural uncertainty estimation review article (for alignment)
Charlie Steiner · 2023-12-05T08:01:32.723Z · comments (3)
(Not) Derailing the LessOnline Puzzle Hunt
Error · 2024-06-04T01:28:31.688Z · comments (2)
Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)
Thane Ruthenis · 2023-12-22T20:19:13.865Z · comments (14)
A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (15)
[link] The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
jessicata (jessica.liu.taylor) · 2024-03-27T19:59:27.893Z · comments (33)
On the UK Summit
Zvi · 2023-11-07T13:10:04.895Z · comments (6)
Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (8)
Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (9)
Anthropic Fall 2023 Debate Progress Update
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-11-28T05:37:30.070Z · comments (9)
Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)
SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)
Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI"
johnswentworth · 2023-11-21T17:39:17.828Z · comments (84)
Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-12-16T05:49:23.672Z · comments (3)
[question] How to talk about reasons why AGI might not be near?
Kaj_Sotala · 2023-09-17T08:18:31.100Z · answers+comments (19)
Companies' safety plans neglect risks from scheming AI
Zach Stein-Perlman · 2024-06-03T15:00:20.236Z · comments (4)
On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 2024-05-21T17:30:04.332Z · comments (4)
[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (10)
Mistakes people make when thinking about units
Isaac King (KingSupernova) · 2024-06-25T03:39:20.138Z · comments (14)
[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)
The World in 2029
Nathan Young · 2024-03-02T18:03:29.368Z · comments (37)