LessWrong 2.0 Reader

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach
Ben Smith (ben-smith) · 2024-09-03T05:28:24.549Z · comments (2)

Interview with Robert Kralisch on Simulators
WillPetillo · 2024-08-26T05:49:15.543Z · comments (0)

My experience applying to MATS 6.0
mic (michael-chen) · 2024-07-18T19:02:21.849Z · comments (3)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (10)

[link] My lukewarm take on GLP-1 agonists
George3d6 · 2024-08-26T12:34:27.929Z · comments (0)

[question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-09-04T12:40:07.678Z · answers+comments (6)

[link] How (and why) to get tested for CMV
Metacelsus · 2024-07-15T20:06:05.649Z · comments (0)

[link] Non-Transactional Compliments
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:42:16.471Z · comments (0)

Four ways I've made bad decisions
Sodium · 2024-07-14T22:18:47.630Z · comments (1)

All the Following are Distinct
Gianluca Calcagni (gianluca-calcagni) · 2024-08-02T16:35:51.815Z · comments (3)

The Residual Expansion: A Framework for thinking about Transformer Circuits
Daniel Tan (dtch1997) · 2024-08-02T11:04:56.347Z · comments (13)

An information-theoretic study of lying in LLMs
Annah (annah) · 2024-08-02T10:06:39.312Z · comments (0)

[question] If AI is in a bubble and the bubble bursts, what would you do?
Remmelt (remmelt-ellen) · 2024-08-19T10:56:03.948Z · answers+comments (5)

Probabilistic Logic <=> Oracles?
Yudhister Kumar (randomwalks) · 2024-07-01T05:36:35.350Z · comments (0)

Juneberry Puffs
jefftk (jkaufman) · 2024-06-21T18:50:01.677Z · comments (0)

AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (0)

The new UK government's stance on AI safety
Elliot_Mckernon (elliot) · 2024-07-31T15:23:59.235Z · comments (0)

Activation Pattern SVD: A proposal for SAE Interpretability
Daniel Tan (dtch1997) · 2024-06-28T22:12:48.789Z · comments (2)

Determining the power of investors over Frontier AI Labs is strategically important to reduce x-risk
Lucie Philippon (lucie-philippon) · 2024-07-25T01:12:20.518Z · comments (7)

My career exploration: Tools for building confidence
lynettebye · 2024-09-13T11:37:55.843Z · comments (0)

Room Available in Boston Group House
NoSignalNoNoise (AspiringRationalist) · 2024-07-23T02:55:59.602Z · comments (1)

Toward a taxonomy of cognitive benchmarks for agentic AGIs
Ben Smith (ben-smith) · 2024-06-27T23:50:11.714Z · comments (0)

[link] Holomorphic surjection theorem (Picard's little theorem)
dkl9 · 2024-07-21T13:24:18.300Z · comments (0)

Simulation-aware causal decision theory: A case for one-boxing in CDT
kongus_bongus · 2024-08-09T18:09:20.013Z · comments (11)

D&D.Sci: Whom Shall You Call? [Evaluation and Ruleset]
abstractapplic · 2024-07-17T22:34:25.111Z · comments (2)

Slave Morality: A place for every man and every man in his place
Martin Sustrik (sustrik) · 2024-09-19T04:20:04.491Z · comments (5)

Automating LLM Auditing with Developmental Interpretability
htlou · 2024-09-04T15:50:04.337Z · comments (0)

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)
Declan Molony (declan-molony) · 2024-09-10T05:54:47.000Z · comments (12)

[link] Announcing the AI Forecasting Benchmark Series | July 8, $120k in Prizes
ChristianWilliams · 2024-07-02T22:33:23.512Z · comments (0)

Announcing the Ultimate Jailbreaking Championship
InnerHufflepuff (grayswan) · 2024-09-04T00:35:31.234Z · comments (1)

Cat Sustenance Fortification
jefftk (jkaufman) · 2024-07-31T02:30:04.898Z · comments (7)

Are LLMs on the Path to AGI?
Davidmanheim · 2024-08-30T03:14:04.710Z · comments (2)

Being hella lost as rationality practice
Rachel Shu (wearsshoes) · 2024-06-24T23:50:59.991Z · comments (0)

Antitrust as Controlled Creative Destruction
Martin Sustrik (sustrik) · 2024-07-10T16:40:01.503Z · comments (2)

[link] Does robustness improve with scale?
ChengCheng (ccstan99) · 2024-07-25T20:55:53.359Z · comments (0)

Against Explosive Growth
c.trout (ctrout) · 2024-09-04T21:45:03.120Z · comments (1)

Rabin's Paradox
Charlie Steiner · 2024-08-14T05:40:25.572Z · comments (39)

Longevity: A critical look at "Loss of epigenetic information as a cause of mammalian aging"
Anna Crow · 2024-07-24T01:40:57.634Z · comments (2)

Emergence, The Blind Spot of GenAI Interpretability?
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2024-08-10T10:07:53.654Z · comments (5)

[link] Benefits of Psyllium Dietary Fiber in Particular
Brendan Long (korin43) · 2024-08-28T18:13:23.891Z · comments (6)

[link] AI x Human Flourishing: Introducing the Cosmos Institute
Brendan McCord (brendan-mccord) · 2024-09-05T18:23:32.690Z · comments (5)

Stacked Laptop Monitor Update
jefftk (jkaufman) · 2024-07-15T09:40:06.075Z · comments (3)

Primary Perceptive Systems
ChristianKl · 2024-08-15T11:26:01.667Z · comments (2)

[question] Looking to interview AI Safety researchers for a book
jeffreycaruso · 2024-08-24T19:57:33.119Z · answers+comments (0)

[link] The Ap Distribution
criticalpoints · 2024-08-24T21:45:35.029Z · comments (3)

[link] Verification methods for international AI agreements
Akash (akash-wasil) · 2024-08-31T14:58:10.986Z · comments (1)

Funding for work that builds capacity to address risks from transformative AI
abergal · 2024-08-14T23:52:09.922Z · comments (0)

[question] Looking for intuitions to extend bargaining notions
ProgramCrafter (programcrafter) · 2024-08-24T05:00:13.995Z · answers+comments (0)

[link] The Best Bits From Build, Baby, Build
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-11T14:09:10.131Z · comments (0)

[question] Building an Inexpensive, Aesthetic, Private Forum
Aaron Graifman (aaron-graifman) · 2024-09-09T17:10:42.677Z · answers+comments (15)

Recent comments

rotatingpaguro on On Measuring Intellectual Performance - personal experience and several thoughts

A very interesting problem is measuring something like general intelligence. I’m not going to delve deeply into this topic but simply want to draw attention to an idea that is often implied, though rarely expressed, in the framing of such a problem: the assumption that an "intelligence level," whatever it may be, corresponds to some inherent properties of a person and can be measured through their manifestations. Moreover, we often talk about measurements with a precision of a few percentage points, which suggests that, in theory, the measurement should be based on very stable indicators.
What’s fascinating is that this assumption receives very little scrutiny, while in cases where we talk about "mechanical" parameters of the human body (such as physical performance), we know that such parameters, aside from a person’s potential, depend heavily on numerous external factors and on what that person has been doing over the past couple of weeks.

Do you know about the g-factor?

faul_sname on RLHF is the worst possible thing done when facing the alignment problem

Unfortunately, until the world has acted to patch up some terrible security holes in society, we are all in a very fragile state.

Agreed.

I have been working on AI Biorisk Evals with SecureBio for nearly a year now.

I appreciate that. I also really like the NAO project, which is also a SecureBio thing. Good work y'all!

As models increase in general capabilities, so too do they incidentally get more competent at assisting with the creation of bioweapons. It is my professional opinion that they are currently providing non-zero uplift over a baseline of 'bad actor with web search, including open-access scientific papers'.

Yeah, if your threat model is "AI can help people do more things, including bad things" that is a valid threat model and seems correct to me. That said, my world model has a giant gaping hole where one would expect an explanation for why "people can do lots of things" hasn't already led to a catastrophe (it's not like the bio risk threat model needs AGI assistance, a couple undergrad courses and some lab experience reproducing papers should be quite sufficient).

In any case, I don't think RLHF makes this problem actively worse, and it could plausibly help a bit though obviously the help is of the form "adds a trivial inconvenience to destroying the world".

Some people don't believe that we will get to Powerful AI Agents before we've already arrived at other world states that make it unlikely we will continue to proceed on a trajectory towards Powerful AI Agents.

If you replace "a trajectory towards powerful AI agents" with "a trajectory towards powerful AI agents that was foreseen in 2024 and could be meaningfully changed in predictable ways by people in 2024 using information that exists in 2024" that's basically my position.

sharmake-farah on RLHF is the worst possible thing done when facing the alignment problem

As someone who does disagree with this:

End-state-focused Value-aligned Highly-Effective-Optimizer AI Agents (hereafter Powerful AI Agents) with the primary value of win-at-all-costs are extremely dangerous.

I think the disagreement point is this:

Some say that intelligent agents being dangerously powerful is good actually, and won't be a risky situation

I have several cruxes/reasons for why I disagree:

1. I think I have quite a lot less probability on unrestrained AI conflict than @tailcalled does.
2. I disagree with the assumption of this:

But nobody has come up with an acceptable end goal for the world, because any goal we can come up with tends to want to consume everything, which destroys humanity.

Because I think that a lot of the reason the search came up empty is that people were attempting to solve the problem too far ahead of the actual risk.

Also, I think it matters a lot which specific utility function is involved here, analogously to how picking a specific number or function from a variable N tells you a lot more about what will happen than just reasoning about the variable in the abstract.

3. I think the world isn't as offense-dominant as LWers/EAs tend to think, and that while some level of offense outpacing defense is quite plausible, I don't think it's as extreme as "defense is essentially pointless."

charbel-raphael on The case for more Alignment Target Analysis (ATA)

I believe we should not create a Sovereign AI. Developing a goal-directed agent of this kind will always be too dangerous. Instead, we should aim for a scenario similar to CERN, where powerful AI systems are used for research in secure labs, but not deployed in the economy.

I don't want AIs to take over.

christiankl on Inquisitive vs. adversarial rationality

You define your terms when you say:

An inquisitive (aka inquisitorial) legal system is one in which the judge acts as both judge and prosecutor, personally digging into the facts before ruling.

In the German system, digging into the facts before the ruling is part of the job of the judge. They are doing it from a neutral perspective, but digging into facts is part of what they are supposed to do. In Anglo-Saxon common law, on the other hand, it's the job of both parties to a legal case to lay out all the facts that they think support their side, and it's not the judge's job to dig into facts that neither of the sides presented.

raytaylor on Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

It was nice to see C S Lewis as a reminder we've kinda been here before.

One of the things which helped groups during the fight for the Nuclear Test Ban Treaty in the US was Joanna Macy's "despair work", which was developed from individual grief work.

Joanna started in intelligence and has been facing X-risks and slow actions of governments with others since the 1970s, and built a network of people doing that, and she still does. She did a lot in Chernobyl, and did some of the earliest longtermism and deep time work on nuclear waste storage.

Her despair work has been adapted for climate change and rainforest protection, so I'm sure it could be adapted for AI and other Xrisks/Srisks too, and even tougher goals like achieving universal veganism, instituting rational policymaking or "dealing with parents" ;-)

Trainers in despair work:
https://workthatreconnects.org/find-a-facilitator/, or ask me, or Dr Chris Johnstone for recommendations.

Trainer's Manual for groups (recommended):
- Coming Back to Life, Joanna Macy

Books:
- Despair and Empowerment in the Nuclear Age, Joanna Macy
- Active Hope, Chris Johnstone and Joanna Macy

More recent video:
(be ready to filter some of the 1970s vocabulary; they're both confident with intense emotion)

oklogic on My Critique of Effective Altruism

As a full-throated defender of pulling the lever (given traditional assumptions such as the lack of an audience, complete knowledge of each outcome, and the productivity of the people on the tracks), I see numerous issues with your proposals:

1.) Vague alternative: You seem to be pushing towards some form of virtue ethics/basic intuitionism, but there are numerous problems with this approach. Besides determining whose basic intuitions count and whose don't, or which virtues are important, there are very real problems when these virtues conflict. For instance, imagine you are walking at night, and are trying to cross a street. The sign says red, but no cars are around. Do you jaywalk? In this circumstance, one is forced to make a decision which pits two virtues/intuitions against each other. The beauty of utilitarianism is that it allows us to choose in these circumstances.

2.) Subjective Morality: Yes, utilitarianism may not be "objective" in the sense that there is no intrinsic reason to value human flourishing, but I believe utilitarianism to be the viewpoint which most closely conforms to what most people value. To illustrate why this matters, imagine you need to decide what color to paint a room. Nobody has very strong opinions, but most people in your household prefer the color blue. Yes, blue might not be "objectively" the best, but if most of the people in your household like the color blue the most, there is little reason not to choose it. We are all individually going to seek what we value, so we might as well collectively agree to a system which reflects the preferences of most people.

3.) Altruism in Disguise:

Another thing to notice is that virtue ethics can be a form of effective altruism when practiced in specific ways. In general, bettering yourself as a person by becoming more rational, less biased, etc., will in fact make the world a better place, and giving time to form meaningful relationships, engage in leisure, etc. can actually increase productivity in the long run.

You also seem to advocate for fundamental changes in society, changes I am not sure I would agree with, but if your proposed changes are indeed the best way to increase the general happiness of the population, it would be, by definition, the goal of the EA movement. I think a lot of people look at the recent stuff with SBF and AI research and come to think the EA movement is only concerned with lofty existential risk scenarios, but there is a lot more to it than that.

raemon on We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap

I felt a bit confused by this bit:

At some point I sit down and think about escamoles. Yeah, ants are kinda gross, but on reflection I don’t think I endorse that reaction to escamoles. I can see why my reward system would generate an “ew, gross” signal, but I model that reward as being the result of two decoupled things: either a hardcoded aversion to insects, or my actual values. I know that I am automatically averse to putting insects in my mouth so it's less likely that the negative reward is evidence of my values in this case; the signal is explained away in the usual epistemic sense by some cause other than my values. So, I partially undo the value-downgrade I had assigned to escamoles in response to the “ew, gross” reaction. I might still feel some disgust, but I consciously override that disgust to some extent.
That last example is particularly interesting, since it highlights a nontrivial prediction of this model. Insofar as reward is treated as evidence about values, and our beliefs about values update in the ordinary epistemic manner, we should expect all the typical phenomena of epistemic updating to carry over to learning about our values. Explaining-away is one such phenomenon. What do other standard epistemic phenomena look like, when carried over to learning about values using reward as evidence?

I feel like this sort of makes sense but don't quite parse why this counted as "explaining away." How do I know my hardcoded reactions aren't values?

cubefox on What you know when you know nothing

Ludwig Wittgenstein tried to solve this problem in an a priori fashion, that is, without a refined model of the world, with a theory of "logical atomism". In the Tractatus he postulated that there must be "atomic" propositions. For example, the proposition that Bob is a bachelor is clearly not atomic (but complex) because it can be decomposed into the proposition that Bob is a man and that Bob is unmarried. And those are arguably themselves sort of complex statements, since the concepts of a man and of marriage themselves allow for definitions from simpler terms. But at some point, we hit undefinable, primitive terms, presumably those which refer directly to particular sense data or the like. Wittgenstein then argued that these atomic propositions have to be regarded as being independent and having probability 1/2.

Or more precisely, he came up with the concept of truth tables, and counted the fraction of the rows in which the conditions of the truth tables are satisfied. Each row he assumed to be a priori equally likely when only atomic propositions are involved. So for atomic propositions P and Q, the complex proposition "P and Q" has probability 1/4 (only one out of four rows makes this proposition true, namely "true and true"), and the complex proposition "P or Q" has probability 3/4 (three out of four rows in the truth table make a disjunction true: all except "false or false").

This turns out to be equivalent to assuming that all atomic propositions have a) probability 1/2 and are b) independent of each other.
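The row-counting rule above can be checked with a minimal sketch. The helper name `truth_table_probability` is made up for illustration and is not from the Tractatus or any library; it assumes every row of the truth table is equally likely, as described above.

```python
from itertools import product


def truth_table_probability(formula, num_atoms):
    """Return the fraction of truth-table rows on which `formula` is true.

    `formula` is a function of `num_atoms` booleans (hypothetical helper,
    assuming all rows are a priori equally likely).
    """
    rows = list(product([True, False], repeat=num_atoms))
    satisfied = sum(1 for row in rows if formula(*row))
    return satisfied / len(rows)


# Row counting for two atomic propositions P and Q.
p_and = truth_table_probability(lambda p, q: p and q, 2)
p_or = truth_table_probability(lambda p, q: p or q, 2)

# The same values from "probability 1/2 each, and independent".
p_and_independent = 0.5 * 0.5
p_or_independent = 1 - (1 - 0.5) * (1 - 0.5)

print(p_and, p_or)  # 0.25 0.75
```

Both routes give 1/4 for the conjunction and 3/4 for the disjunction, which is the equivalence stated above for this two-proposition case.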

However, logical atomism was later broadly abandoned for various reasons. One is that it is hard to define what an atomic proposition is. For example, I can't assume that "this particular spot in my visual field is blue" is atomic, because it is incompatible with the statement "this particular spot in my visual field is yellow". The same spot can't be both blue and yellow, even though that wouldn't be a logical contradiction. The two statements are therefore not independent, so they can't be atomic. But it is hard to think of anything more primitive than qualia predicates like colors.

raemon on Skills from a year of Purposeful Rationality Practice

Yeah. I'm not entirely sure what you mean by the metaphor, but I use a similar metaphor.

I think, ideally, you'll have at least 3 plans and at least 3 different frames for "what are your plans trying to accomplish". i.e. the 3 plans are solving somewhat different problems. And this is specifically getting at the "being able to move around freely in 3 dimensions" thingy.

The actual top-level blogpost I plan to write is probably going to be called "3 plans, 2 frames, and a crux", which is based on what actually feels reasonable to ask of people in most circumstances. i.e:

1. Come up with at least 3 different plans.
2. At least two of them should look at the problem differently.
3. If it's obvious which plan is your favorite, figure out what observations you could make later on that might change your mind towards one of the other two.

When you're fluent at this sort of thing, you should have lots of different plans and backup plans in mind such that you have way more than 3, but, this felt like a reasonable bid given that people will probably find all these steps cumbersome and annoying at first.