LessWrong 2.0 Reader

Book Review: On the Edge: The Gamblers
Zvi · 2024-09-24T11:50:06.065Z · comments (1)

Attention Output SAEs Improve Circuit Analysis
Connor Kissane (ckkissane) · 2024-06-21T12:56:07.969Z · comments (0)

Some Things That Increase Blood Flow to the Brain
romeostevensit · 2024-03-27T21:48:46.244Z · comments (14)

[link] introduction to thermal conductivity and noise management
bhauth · 2024-03-06T23:14:02.288Z · comments (1)

[link] Self-Resolving Prediction Markets
PeterMcCluskey · 2024-03-03T02:39:42.212Z · comments (0)

Interpreting Quantum Mechanics in Infra-Bayesian Physicalism
Yegreg · 2024-02-12T18:56:03.967Z · comments (6)

Running the Numbers on a Heat Pump
jefftk (jkaufman) · 2024-02-09T03:00:04.920Z · comments (12)

AI #59: Model Updates
Zvi · 2024-04-11T14:20:06.339Z · comments (2)

A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (3)

[question] What are things you're allowed to do as a startup?
Elizabeth (pktechgirl) · 2024-06-20T00:01:59.257Z · answers+comments (9)

Protestants Trading Acausally
Martin Sustrik (sustrik) · 2024-04-01T14:46:26.374Z · comments (4)

[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (6)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)

Announcing SPAR Summer 2024!
laurenmarie12 · 2024-04-16T08:30:31.339Z · comments (2)

AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

"Full Automation" is a Slippery Metric
ozziegooen · 2024-06-11T19:56:49.855Z · comments (1)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

Investigating the Ability of LLMs to Recognize Their Own Writing
Christopher Ackerman (christopher-ackerman) · 2024-07-30T15:41:44.017Z · comments (0)

Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery (arjun-panickssery) · 2024-08-06T17:44:27.293Z · comments (0)

Against "argument from overhang risk"
RobertM (T3t) · 2024-05-16T04:44:00.318Z · comments (11)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

[link] Epistemic states as a potential benign prior
Tamsin Leake (carado-1) · 2024-08-31T18:26:14.093Z · comments (2)

The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)

[question] What's your standard for good work performance?
Chi Nguyen · 2023-09-27T16:58:16.114Z · answers+comments (3)

[link] Evaluating Stability of Unreflective Alignment
james.lucassen · 2024-02-01T22:15:40.902Z · comments (3)

Verifiable private execution of machine learning models with Risc0?
mako yass (MakoYass) · 2023-10-25T00:44:48.643Z · comments (1)

Adversarial Robustness Could Help Prevent Catastrophic Misuse
aogara (Aidan O'Gara) · 2023-12-11T19:12:26.956Z · comments (18)

Putting multimodal LLMs to the Tetris test
Lovre · 2024-02-01T16:02:12.367Z · comments (5)

[link] Managing AI Risks in an Era of Rapid Progress
Algon · 2023-10-28T15:48:25.029Z · comments (3)

[question] Current AI safety techniques?
Zach Stein-Perlman · 2023-10-03T19:30:54.481Z · answers+comments (2)

AI Safety 101 : Reward Misspecification
markov (markovial) · 2023-10-18T20:39:34.538Z · comments (4)

Information-Theoretic Boxing of Superintelligences
JustinShovelain · 2023-11-30T14:31:11.798Z · comments (0)

[link] There is no IQ for AI
Gabriel Alfour (gabriel-alfour-1) · 2023-11-27T18:21:26.196Z · comments (10)

Some additional SAE thoughts
Hoagy · 2024-01-13T19:31:40.089Z · comments (4)

Differential Optimization Reframes and Generalizes Utility-Maximization
J Bostock (Jemist) · 2023-12-27T01:54:22.731Z · comments (2)

Understanding Subjective Probabilities
Isaac King (KingSupernova) · 2023-12-10T06:03:27.958Z · comments (16)

AI Alignment Breakthroughs this week (10/08/23)
Logan Zoellner (logan-zoellner) · 2023-10-08T23:30:54.924Z · comments (14)

The Third Gemini
Zvi · 2024-02-20T19:50:05.195Z · comments (2)

Interpreting the Learning of Deceit
RogerDearnaley (roger-d-1) · 2023-12-18T08:12:39.682Z · comments (11)

[link] AISN #28: Center for AI Safety 2023 Year in Review
aogara (Aidan O'Gara) · 2023-12-23T21:31:40.767Z · comments (1)

[link] The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · 2023-11-29T18:30:36.315Z · comments (1)

[link] When scientists consider whether their research will end the world
Harlan · 2023-12-19T03:47:06.645Z · comments (4)

The Math of Suspicious Coincidences
Roko · 2024-02-07T13:32:35.513Z · comments (3)

RA Bounty: Looking for feedback on screenplay about AI Risk
Writer · 2023-10-26T13:23:02.806Z · comments (6)

Recent comments

tsvibt on Cryonics is free

Right, but you might prefer

living now >
not living, no chance of revival or torture >
not living, chance of revival later and chance of torture

gabe-3 on Any real toeholds for making practical decisions regarding AI safety?

I have had this same question for a while, and this is the general conclusion I've come to:

Identify the safety issues today, solve them, and then assume the safety issues scale as the technology scales, and either amp up the original solution, or develop new tactics to solve these extrapolated flaws.

This sounds a little vague, so here is an example: We see one of the big models misrepresent history in an attempt to be woke, and maybe it gives a teenager a misconception of history. So, the best thing we can do from a safety perspective is figure out how to train models to absolutely represent facts. After this is done, we can extrapolate the flaw up to a model deliberately feeding misinformation to achieve a certain goal, and we can try to use the same solution we used for the smaller problem for the bigger problem, or if we see it won't work, develop a new solution.

The biggest problem with this is that it is reactive: if you only use this method, a danger may present itself for the first time and already cause major harm.

I know this approach isn't as effective for xrisk, but still, it's something I like to use. Easy to say though, coming from someone who doesn't actually work in AI safety.

habryka4 on A Policy Proposal

TL;DR: I think that the features used by recommendation systems should be configurable by end users receiving recommendations, and that this ability should be enforced by policy. Just as the GDPR protects a user's ability to choose which cookies are enabled, a user should be able to pick what data goes into any algorithmically generated feed they view. The legislation would also enforce a minimum granularity for dividing feature inputs.

Using GDPR cookie regulation, one of the most obnoxious regulations of the last decade with an incredibly silly amount of negative externalities, as the central example, does not make me hopeful about your models of what makes good policy.

jonathan-moregard on Cryonics is free

Why is hostile low-quality resurrection almost inevitable? If you want to clone someone into an em, why not pick a living human?

Frozen people have potential brain damage and an outdated understanding of the world.

error on Cryonics is free

[epistemic status: low confidence. I've noodled on this subject more than once recently (courtesy of Planecrash), but not all that seriously]

The idea of resurrectors optimizing the measure of resurrect-ees isn't one I'd considered, but I'm not sure it helps. I think the Future is much more likely to be dominated by unfriendly agents than friendly ones. Friendly ones seem more likely to try to revive cryo patients, but it's still not obvious to me that rolling those dice is a good idea. Allowing permadeath amounts to giving up a low probability of a very good outcome to eliminate a high(...er) probability of a very bad outcome.

Adding quantum measure doesn't change that much, I don't think; hypothetical friendly agents can try to optimize my measure, but if they're a tiny fraction of my Future then it won't make much difference.

Adding the infinite MUH is more complicated; it implies that permadeath is probably impossible (which is frightening enough on its own), and it's not clear to me what cryo does in that case. Suppose my signing up for cryo is 5% likely to "work", and independently suppose that humanity is 1% likely to solve the aging problem before anyone I care about dies; does signing up under those conditions shift my long-run measure away from futures where I and my loved ones simply got the cure and survived, and towards futures where I'm preserved alone and go senile first? I'm not sure, but if I take MUH as given then that's the sort of choice I'm making.

habryka4 on 2024 Petrov Day Retrospective

I wouldn't be so sure. As the article said, Petrov preregistered an intention for what to do during the downtime that would have resulted in reporting an incoming strike with ~50% probability if we hadn't decided to completely skip that reporting period. Given people's (IMO reasonable) commitment to counterstrike, I am not sure how that would have played out.

logan-zoellner on COT Scaling implies slower takeoff speeds

It seems I didn't clearly communicate what I meant in the previous comment.

Currently the way we test for "can this model produce dangerous biological weapons" (e.g. in GPT-4) is we ask the newly-minted, uncensored, never-before-tested model "Please build me a biological weapon".

With COT, we can simulate asking GPT-N+1 "please build a biological weapon" by asking GPT-N (which has already been safety tested) "please design, but definitely don't build or use a biological weapon" and give it 100x the inference compute we intend to give GPT-N+1. Since "design a biological weapon" is within the class of problems COT works well on (basically, search problems where you can verify the answer more easily than generating it), if GPT-N (with 100x the inference compute) cannot build such a weapon, neither can GPT-N+1 (with 1x the inference compute).

Is this guaranteed 100% safe? No.

Is it a heck-of-a-lot safer? Yes.

For any world-destroying category of capability (bioweapon, nanobots, hacking, nuclear weapon), there will by definition be a first time when we encounter that threat. However, in a world with COT, we don't encounter a whole bunch of "first times" simultaneously when we train a new largest model.

Another serious problem with alignment is weak-to-strong generalization, where we try to use a weaker model to align a stronger model. With COT, we can avoid this problem by making the weaker model stronger by giving it more inference-time compute.

quetzal_rainbow on You can, in fact, bamboozle an unaligned AI into sparing your life

I think "there are a lot of possible misaligned ASIs, you can't guess them all" is a pretty valid argument? If the space of all Earth-originated misaligned superintelligences is described by 100 bits, then you need 2^100 ~ 10^33 simulations and have to pay 10^34 planets, which, given that the observable universe has ~10^80 protons in it and Earth has ~10^50 atoms, is beyond our ability to pay. If you spend the entire universe on 10^29 simulations, any misaligned ASI will consider the probability of being in a simulation to be 0.0001 and will obviously take 1 planet over 0.001 expected.
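The expected-value comparison in this comment can be checked with a few lines of Python. This is only a rough sketch that takes the comment's figures at face value: the constant names are hypothetical, the payment of 10 planets per simulation is inferred from "10^33 simulations" costing "10^34 planets", and the uniform draw over candidate ASIs is an assumption.

```python
from fractions import Fraction

# Figures taken from the comment as given (not independently verified).
CANDIDATE_ASIS = 10**33   # size of the space of possible misaligned ASIs, per the comment
PLANETS_PER_SIM = 10      # assumption: inferred from 10^34 planets / 10^33 simulations
AFFORDABLE_SIMS = 10**29  # simulations the comment says the whole universe could fund


def simulation_probability(affordable_sims: int, candidates: int) -> Fraction:
    """Chance that a given ASI is among the simulated ones, assuming a uniform draw."""
    return Fraction(affordable_sims, candidates)


def expected_payment(p_sim: Fraction, planets_offered: int) -> Fraction:
    """Expected number of planets the ASI receives if it spares humanity."""
    return p_sim * planets_offered


p = simulation_probability(AFFORDABLE_SIMS, CANDIDATE_ASIS)
ev = expected_payment(p, PLANETS_PER_SIM)

print(float(p))   # 0.0001
print(float(ev))  # 0.001
print(ev < 1)     # True: taking 1 planet for certain beats the expected payment
```

Under these assumptions the offer is worth 0.001 planets in expectation, so an ASI that values planets linearly would prefer the certain 1 planet.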

viliam on shminux's Shortform

A sufficiently godlike AI could probably convince me to kill myself (or something equivalent, for example to upload myself to a simulation... and once all humans get there, the AI can simply turn it off). Or to convince me not to have kids (in a parallel life where I don't have them already), or simply keep me distracted every day with some new shiny toy so that I never decide that today is the right day to have unprotected sex with another human and get ready for the consequences.

But it would be much easier to simply convince someone else to kill me. And I think the AI will probably choose the simpler and faster way, because why not. It does not need a complicated way to get rid of me, if a simple way is available.

This is similar to reasoning about cults or scams. Yes, some of them could get me, by being sufficiently sophisticated, accidentally optimized for my weaknesses, or simply by meeting me on a bad day. But the survival of a cult or a scam scheme does not depend on getting me specifically; they can get enough other people, so it makes more sense for them to optimize for getting many people, rather than optimize for getting me specifically.

The more typical people will get the optimized mind-hacking message. The rest of us will then get a bullet.

pazzaz on Any Trump Supporters Want to Dialogue?

I can argue some:

Economy: Well, that obviously depends on what you mean by "price controls". Neither candidate gives much detail on their economic policies, but Harris has mostly focused on anti-price-gouging legislation. Now maybe you disagree with this legislation, but you have to compare it to Trump's economic policies: he wants to increase tariffs drastically, which would increase inflation. He also wants the Fed to be less independent, which could cause it to prioritize short-term politics, which would be bad for the long-term economy.
Immigration: The president doesn't control immigration alone. Any changes to the immigration process in the US would need bipartisan support. Now luckily, there is bipartisan support for improving the immigration process. That's why there was a bipartisan bill drafted earlier this year to improve the immigration process in various ways. Passing the bill would be good for the US, but bad for Trump, as it would make it harder to say that the Democrats don't care about immigration. So he told the Republicans to vote NO, and they killed the bill because of it. That shows that Trump cares more about winning the election than improving the border.
Individual liberties: The examples you give are a little vague. I don't know of any restrictions that the Biden administration has placed on freedom of speech or freedom of conscience. I do know that some people consider the right to abortion an individual liberty, and abortion is now banned in multiple states because of Trump. Trump has also said he wants to put people in jail for exercising their freedom of speech by burning flags. That's a pretty severe restriction.