LessWrong 2.0 Reader

[link] What Depression Is Like
Sable · 2024-08-27T17:43:22.549Z · comments (21)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (8)

The Good Life in the face of the apocalypse
Elizabeth (pktechgirl) · 2023-10-16T22:40:15.200Z · comments (8)

Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)

[Paper] All's Fair In Love And Love: Copy Suppression in GPT-2 Small
CallumMcDougall (TheMcDouglas) · 2023-10-13T18:32:02.376Z · comments (4)

Coup probes: Catching catastrophes with probes trained off-policy
Fabien Roger (Fabien) · 2023-11-17T17:58:28.687Z · comments (7)

[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (11)

[link] Palworld development blog post
bhauth · 2024-01-28T05:56:19.984Z · comments (12)

AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (3)

[link] "The Heart of Gaming is the Power Fantasy", and Cohabitive Games
Raemon · 2023-10-08T21:02:33.526Z · comments (49)

My Criticism of Singular Learning Theory
Joar Skalse (Logical_Lunatic) · 2023-11-19T15:19:16.874Z · comments (56)

Refusal mechanisms: initial experiments with Llama-2-7b-chat
Andy Arditi (andy-arditi) · 2023-12-08T17:08:01.250Z · comments (7)

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
keith_wynroe · 2024-07-02T13:17:16.352Z · comments (7)

Bostrom Goes Unheard
Zvi · 2023-11-13T14:11:07.586Z · comments (9)

Why you should be using a retinoid
GeneSmith · 2024-08-19T03:07:41.722Z · comments (53)

[link] New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 2024-05-21T11:00:41.794Z · comments (17)

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (32)

An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers
Yitz (yitz) · 2024-01-17T09:48:07.930Z · comments (11)

Fluent, Cruxy Predictions
Raemon · 2024-07-10T18:00:06.424Z · comments (10)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (13)

OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (23)

Announcing Athena - Women in AI Alignment Research
Claire Short (claire-short) · 2023-11-07T21:46:41.741Z · comments (2)

Survey of 2,778 AI authors: six parts in pictures
KatjaGrace · 2024-01-06T04:43:34.590Z · comments (1)

Studying The Alien Mind
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-12-05T17:27:28.049Z · comments (10)

The Gemini Incident
Zvi · 2024-02-22T21:00:04.594Z · comments (19)

3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)

Self-Referential Probabilistic Logic Admits the Payor's Lemma
Yudhister Kumar (randomwalks) · 2023-11-28T10:27:29.029Z · comments (14)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)

[link] The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
EJT (ElliottThornley) · 2023-10-23T21:00:48.398Z · comments (22)

[question] How have you become more hard-working?
Chi Nguyen · 2023-09-25T12:37:39.860Z · answers+comments (40)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?"
Joe Carlsmith (joekc) · 2023-11-15T17:16:42.088Z · comments (26)

Thomas Kwa's research journal
Thomas Kwa (thomas-kwa) · 2023-11-23T05:11:08.907Z · comments (1)

Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (45)

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

I'm a Former Israeli Officer. AMA
Yovel Rom · 2023-10-10T08:33:51.557Z · comments (70)

Spaciousness In Partner Dance: A Naturalism Demo
LoganStrohl (BrienneYudkowsky) · 2023-11-19T07:00:19.555Z · comments (5)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (2)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (13)

Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)

[Valence series] 2. Valence & Normativity
Steven Byrnes (steve2152) · 2023-12-07T16:43:49.919Z · comments (5)

EU policymakers reach an agreement on the AI Act
tlevin (trevor) · 2023-12-15T06:02:44.668Z · comments (7)

Some Vacation Photos
johnswentworth · 2024-01-04T17:15:01.187Z · comments (0)

[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (11)

Attention SAEs Scale to GPT-2 Small
Connor Kissane (ckkissane) · 2024-02-03T06:50:22.583Z · comments (4)

OpenAI: Leaks Confirm the Story
Zvi · 2023-12-12T14:00:04.812Z · comments (9)

The Parable Of The Fallen Pendulum - Part 2
johnswentworth · 2024-03-12T21:41:30.180Z · comments (8)

Reward hacking behavior can generalize across tasks
Kei · 2024-05-28T16:33:50.674Z · comments (5)

MATS Summer 2023 Retrospective
utilistrutil · 2023-12-01T23:29:47.958Z · comments (34)

Recent comments

cubefox on Ozyrus's Shortform

May I ask whether there are any plans to fix this rendering/loading bug, which occurs with Firefox? It affects unread/uncached posts opened in a background tab.

alexander-gufan on On Measuring Intellectual Performance - personal experience and several thoughts

Of course I do.

ricraz on The case for more Alignment Target Analysis (ATA)

It seems very plausible to me that alignment targets in practice will evolve out of things like the OpenAI Model Spec. If anyone has suggestions for how to improve that, please DM me.

cubefox on What's the Deal with Logical Uncertainty?

In theories of Bayesianism, the axioms of probability theory are conventionally assumed to say that all logical truths have probability one, and that the probability of a disjunction of mutually inconsistent statements is the sum of their probabilities. These correspond to the second and third Kolmogorov axioms.

If one then, e.g., regards the Peano axioms as certain, then all theorems of Peano arithmetic must also be certain, because they are just logical consequences of those axioms. And all statements which can be disproved in Peano arithmetic must then have probability zero. So the above version of the Kolmogorov axioms assumes that we are logically omniscient. This form of Bayesianism therefore doesn't allow us to assign anything like 0.5 probability to the googolth digit of pi being odd: we must assign 1 if it's odd, or 0 if it's even.

I think the simple solution is to not talk about logical tautologies and contradictions when expressing the Kolmogorov axioms for a theory of subjective Bayesianism. Instead, we should talk about what we actually know a priori, not about tautologies which we merely could know a priori (if we were logically omniscient). Then the second and third Kolmogorov axioms say that statements we actually know a priori have to be assigned probability 1, and that disjunctions of statements actually known a priori to be mutually inconsistent have to be assigned the sum of their probabilities.

Then we are allowed to assign probabilities less than 1 to statements where we don't actually know that they are tautologies, e.g. 0.5 to "the googolth digit of pi is odd" even if this happens to be, unbeknownst to us, a theorem of Peano arithmetic.
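
A compact way to see the difference between the two readings is to write them side by side. This is a sketch under stated assumptions, not the commenter's own notation: $P$ is a credence function over sentences, $\vDash A$ means "$A$ is a logical truth", and $K$ is a placeholder introduced here for the set of statements the agent actually knows a priori. The last block spells out the step from "the Peano axioms are certain" to "every theorem is certain".

```latex
% Requires amsmath (align*) and amssymb (\vDash).
% Conventional, logically omniscient form of axioms 2 and 3:
\begin{align*}
\text{(A2)}\quad & \vDash A \;\Rightarrow\; P(A) = 1\\
\text{(A3)}\quad & \vDash \neg(A \wedge B) \;\Rightarrow\; P(A \vee B) = P(A) + P(B)
\end{align*}
% Proposed form; K is a hypothetical name for the statements actually known a priori:
\begin{align*}
\text{(A2$'$)}\quad & A \in K \;\Rightarrow\; P(A) = 1\\
\text{(A3$'$)}\quad & \neg(A \wedge B) \in K \;\Rightarrow\; P(A \vee B) = P(A) + P(B)
\end{align*}
% Why (A2)-(A3) force certainty about theorems. Let G be the conjunction of the
% finitely many Peano axioms used in a proof of \varphi, and assume P(G) = 1.
% G \vee (G \to \varphi) is a tautology, so (A2) gives it probability 1.
\begin{align*}
\vDash G \to \varphi \;&\Rightarrow\; P(G \to \varphi) = 1\\
&\Rightarrow\; P(\varphi) \ge P\bigl(G \wedge (G \to \varphi)\bigr)
  = P(G) + P(G \to \varphi) - P\bigl(G \vee (G \to \varphi)\bigr)
  = 1 + 1 - 1 = 1
\end{align*}
```

Under (A2′) and (A3′), the first step of this derivation only goes through when $G \to \varphi$ is itself in $K$, so an agent that has not yet found a proof of $\varphi$ is not forced by it to assign $\varphi$ probability 1.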

rotatingpaguro on On Measuring Intellectual Performance - personal experience and several thoughts

A very interesting problem is measuring something like general intelligence. I’m not going to delve deeply into this topic but simply want to draw attention to an idea that is often implied, though rarely expressed, in the framing of such a problem: the assumption that an "intelligence level," whatever it may be, corresponds to some inherent properties of a person and can be measured through their manifestations. Moreover, we often talk about measurements with a precision of a few percentage points, which suggests that, in theory, the measurement should be based on very stable indicators.
What’s fascinating is that this assumption receives very little scrutiny, whereas for "mechanical" parameters of the human body (such as physical performance), we know that such parameters, aside from a person’s potential, depend heavily on numerous external factors and on what that person has been doing over the past couple of weeks.

Do you know about the g-factor?

faul_sname on RLHF is the worst possible thing done when facing the alignment problem

Unfortunately, until the world has acted to patch up some terrible security holes in society, we are all in a very fragile state.

Agreed.

I have been working on AI Biorisk Evals with SecureBio for nearly a year now.

I appreciate that. I also really like the NAO project, which is also a SecureBio thing. Good work y'all!

As models increase in general capabilities, so too do they incidentally get more competent at assisting with the creation of bioweapons. It is my professional opinion that they are currently providing non-zero uplift over a baseline of 'bad actor with web search, including open-access scientific papers'.

Yeah, if your threat model is "AI can help people do more things, including bad things", that is a valid threat model and seems correct to me. That said, my world model has a giant gaping hole where one would expect an explanation for why "people can do lots of things" hasn't already led to a catastrophe (it's not like the bio risk threat model needs AGI assistance; a couple of undergrad courses and some lab experience reproducing papers should be quite sufficient).

In any case, I don't think RLHF makes this problem actively worse, and it could plausibly help a bit, though obviously the help is of the form "adds a trivial inconvenience to destroying the world".

Some people don't believe that we will get to Powerful AI Agents before we've already arrived at other world states that make it unlikely we will continue to proceed on a trajectory towards Powerful AI Agents.

If you replace "a trajectory towards powerful AI agents" with "a trajectory towards powerful AI agents that was foreseen in 2024 and could be meaningfully changed in predictable ways by people in 2024 using information that exists in 2024" that's basically my position.

sharmake-farah on RLHF is the worst possible thing done when facing the alignment problem

As someone who does disagree with this:

End-state-focused Value-aligned Highly-Effective-Optimizer AI Agents (hereafter Powerful AI Agents) with the primary value of win-at-all-costs are extremely dangerous.

I think the disagreement point is this:

Some say that intelligent agents being dangerously powerful is good actually, and won't be a risky situation

I have several cruxes/reasons for why I disagree:

1. I think I have quite a lot less probability on unrestrained AI conflict than @tailcalled does.
2. I disagree with the assumption of this:

But nobody has come up with an acceptable end goal for the world, because any goal we can come up with tends to want to consume everything, which destroys humanity.

I think a large part of the reason the search came up empty is that people were attempting to solve the problem too far ahead of the actual risk.

Also, I think it matters a lot which specific utility function is in play here, analogously to how picking a specific number or function for a variable N tells you a lot more about what will happen than just reasoning about the variable in the abstract.

3. I think the world isn't as offense-dominant as LWers/EAs tend to think: while some degree of offense outpacing defense is quite plausible, I don't think it's as extreme as "defense is essentially pointless."

christiankl on Inquisitive vs. adversarial rationality

You define your terms when you say:

An inquisitive (aka inquisitorial) legal system is one in which the judge acts as both judge and prosecutor, personally digging into the facts before ruling.

In the German system, digging into the facts before the ruling is part of the judge's job. Judges do it from a neutral perspective, but digging into the facts is part of what they are supposed to do. In Anglo-Saxon common law, on the other hand, it's the job of each party to a legal case to lay out all the facts that they think support their side, and it's not the judge's job to dig into facts that neither side presented.

raytaylor on Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

It was nice to see C. S. Lewis as a reminder that we've kinda been here before.

One of the things which helped groups during the fight for the Nuclear Test Ban Treaty in the US was Joanna Macy's "despair work", which was developed from individual grief work.

Joanna started out in intelligence and has been facing X-risks and slow government action alongside others since the 1970s; she built a network of people doing that, and she still does. She did a lot in Chernobyl, and did some of the earliest longtermism and deep-time work on nuclear waste storage.

Her despair work has been adapted for climate change and rainforest protection, so I'm sure it could be adapted for AI and other Xrisks/Srisks too, and even tougher goals like achieving universal veganism, instituting rational policymaking or "dealing with parents" ;-)

Trainers in despair work:
https://workthatreconnects.org/find-a-facilitator/, or ask me, or Dr Chris Johnstone for recommendations.

Trainer's Manual for groups (recommended):
- Coming Back to Life, Joanna Macy

Books:
- Despair and Empowerment in the Nuclear Age, Joanna Macy
- Active Hope, Chris Johnstone and Joanna Macy

More recent video:
(be ready to filter some of the 1970s vocabulary; they're both confident with intense emotion)

oklogic on My Critique of Effective Altruism

As a full-throated defender of pulling the lever (given the traditional assumptions, such as no audience, complete knowledge of each outcome, and the productivity of the people on the tracks), I see numerous issues with your proposals:

1.) Vague alternative: You seem to be pushing towards some form of virtue ethics/basic intuitionism, but there are numerous problems with this approach. Besides determining whose basic intuitions count and whose don't, or which virtues are important, there are very real problems when these virtues conflict. For instance, imagine you are walking at night and trying to cross a street. The sign says red, but no cars are around. Do you jaywalk? In this circumstance, one is forced to make a decision that pits two virtues/intuitions against each other. The beauty of utilitarianism is that it allows us to choose in these circumstances.

2.) Subjective Morality: Yes, utilitarianism may not be "objective" in the sense that there is no intrinsic reason to value human flourishing, but I believe utilitarianism to be the viewpoint that most closely conforms to what most people value. To illustrate why this matters, imagine you need to decide what color to paint a room. Nobody has very strong opinions, but most people in your household prefer blue. Blue might not be "objectively" the best, but if most of the people in your household like it the most, there is little reason not to choose it. We are all individually going to seek what we value, so we might as well collectively agree to a system which reflects the preferences of most people.

3.) Altruism in Disguise:

Another thing to notice is that virtue ethics can be a form of effective altruism when practiced in specific ways. In general, bettering yourself as a person by becoming more rational, less biased, etc., will in fact make the world a better place, and taking time to form meaningful relationships, engage in leisure, etc. can actually increase productivity in the long run.

You also seem to advocate for fundamental changes in society, changes I am not sure I would agree with, but if your proposed changes are indeed the best way to increase the general happiness of the population, they would be, by definition, the goal of the EA movement. I think a lot of people look at the recent stuff with SBF and AI research and conclude that the EA movement is only concerned with lofty existential-risk scenarios, but there is a lot more to it than that.