LessWrong 2.0 Reader

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

[link] There is no IQ for AI
Gabriel Alfour (gabriel-alfour-1) · 2023-11-27T18:21:26.196Z · comments (10)

Verifiable private execution of machine learning models with Risc0?
mako yass (MakoYass) · 2023-10-25T00:44:48.643Z · comments (2)

[link] One: a story
Richard_Ngo (ricraz) · 2023-10-10T00:18:31.604Z · comments (0)

Adversarial Robustness Could Help Prevent Catastrophic Misuse
aogara (Aidan O'Gara) · 2023-12-11T19:12:26.956Z · comments (18)

[link] AISN #28: Center for AI Safety 2023 Year in Review
aogara (Aidan O'Gara) · 2023-12-23T21:31:40.767Z · comments (1)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

RA Bounty: Looking for feedback on screenplay about AI Risk
Writer · 2023-10-26T13:23:02.806Z · comments (6)

Running the Numbers on a Heat Pump
jefftk (jkaufman) · 2024-02-09T03:00:04.920Z · comments (12)

Sparse MLP Distillation
slavachalnev · 2024-01-15T19:39:02.926Z · comments (3)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

The Math of Suspicious Coincidences
Roko · 2024-02-07T13:32:35.513Z · comments (3)

[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)

Information-Theoretic Boxing of Superintelligences
JustinShovelain · 2023-11-30T14:31:11.798Z · comments (0)

Glomarization FAQ
Zane · 2023-11-15T20:20:49.488Z · comments (5)

[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)

AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)

A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)

Some additional SAE thoughts
Hoagy · 2024-01-13T19:31:40.089Z · comments (4)

Announcing SPAR Summer 2024!
laurenmarie12 · 2024-04-16T08:30:31.339Z · comments (2)

Interpreting Quantum Mechanics in Infra-Bayesian Physicalism
Yegreg · 2024-02-12T18:56:03.967Z · comments (6)

I played the AI box game as the Gatekeeper — and lost
datawitch · 2024-02-12T18:39:35.777Z · comments (52)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

[link] When scientists consider whether their research will end the world
Harlan · 2023-12-19T03:47:06.645Z · comments (4)

[link] The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · 2023-11-29T18:30:36.315Z · comments (1)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)

[link] How "Pause AI" advocacy could be net harmful
Tamsin Leake (carado-1) · 2023-12-26T16:19:20.724Z · comments (10)

Differential Optimization Reframes and Generalizes Utility-Maximization
J Bostock (Jemist) · 2023-12-27T01:54:22.731Z · comments (2)

The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

Two Tales of AI Takeover: My Doubts
Violet Hour · 2024-03-05T15:51:05.558Z · comments (8)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (7)

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith (joekc) · 2023-11-29T16:32:30.068Z · comments (1)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

Deception Chess: Game #2
Zane · 2023-11-29T02:43:22.375Z · comments (17)

Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)

Results from the Turing Seminar hackathon
Charbel-Raphaël (charbel-raphael-segerie) · 2023-12-07T14:50:38.377Z · comments (1)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

Please Understand
samhealy · 2024-04-01T12:33:20.459Z · comments (11)

Quick Thoughts on Our First Sampling Run
jefftk (jkaufman) · 2024-05-23T00:20:02.050Z · comments (3)

[question] Weighing reputational and moral consequences of leaving Russia or staying
spza · 2024-02-18T19:36:40.676Z · answers+comments (24)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (6)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] What fuels your ambition?
Cissy · 2024-01-31T18:30:53.274Z · comments (1)

[question] [link] Is Bjorn Lomborg roughly right about climate change policy?
yhoiseth · 2023-09-27T20:06:30.722Z · answers+comments (14)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

Recent comments

elityre on Using Dangerous AI, But Safely?

Ok. So I haven't thought through these proposals in much detail, and I don't claim any confident take, but my first response is "holy fuck, that's a lot of complexity. It really seems like there will be some flaw in our control scheme that we don't notice, if we're stacking a bunch of clever ideas like this one on top of each other."

This is not at all to be taken as a disparagement of the authors. I salute them for their contribution. We should definitely explore ideas like these, and test them, and use the best ideas we have at AGI time.

But my intuitive first order response is "fuck."

chipmonk on Ayn Rand’s model of “living money”; and an upside of burnout

Oh! The metaphor I've been using with my clients for the thing I think you're pointing at is reputation.

If the mind is a group (in this case a group of pattern predictors, but please also imagine it as a group of people), then ask yourself: How does a group of people (with no dictator) make a decision?

Well, they talk. They make bids.

Can one person use "willpower" and force the group to make a decision a particular way? Yes, if they make a strong enough bid and the rest of the group lets them. Why would the rest of the group let them? Reputation. But if they do that too many times with poor results, they lose their reputation and won't be able to dictate to the group anymore. "Willpower" lost.

I suspect this happens in the mind among pattern predictors, too. (I believe @Kaj_Sotala has written about this somewhere wrt Global Workspace Theory? I found this tweet in the meantime.) If a certain part of your mind loses reputation with the other parts, that part won't be able to make competitive bids anymore. That part's "willpower" has decreased.

lukehmiles on lukehmiles's Shortform

Yes.

habryka4 on lukehmiles's Shortform

What do you mean by "applied research org"? Like, applied alignment research?

lukehmiles on lukehmiles's Shortform

I do think I could put a good team together and make decent contributions quickly.

lukehmiles on lukehmiles's Shortform

I can only find capabilities jobs right now. I would be interested in starting a tiny applied research org or something. How hard is it to get funding for that? I don't have a strong relevant public record, but I did quite a lot of work at METR and elsewhere.

ryan_greenblatt on 5 ways to improve CoT faithfulness

On the other hand, your version of Daniel's protocol loses this property.

I'm talking about training only the Face, not training the policy (shoggoth) at all with the proposal I'm imagining.

And, these should clearly be separate models such that the training of one doesn't generalize to the other.

So, making the Face more deceptive doesn't kill the canary? The question is whether we end up with a highly misaligned policy (and this is clear from the chain of thought such that we can discover this). I think Daniel was expecting in the baseline proposal that you end up with a Face that does at least some censoring/mild deception due to training incentives (I think we already see a bit of this with o1).

Maybe your concern is that it is plausible that the policy won't be misaligned, but the Face will be and this will screw us over. This seems plausible and is worth worrying about (with both proposals). However, I thought that a core part of the hope is that the Face isn't smart enough to cleverly scheme (unless this scheming happens in the CoT itself). And indeed, if the Face can cleverly scheme without CoT, what is stopping the policy from doing this too?

(It might also be possible to reduce the extent to which the initial Face has control over the world. E.g., you can initialize one Face to use for the first part of training (including the proposal I was mentioning) and then train another later on to imitate the earlier Face on a subpart of the distribution you inspect with the hope that it doesn't inherit learned misalignment. You could also deploy a different Face than the one you use for training if you aren't doing online training.)

yix on College technical AI safety hackathon retrospective - Georgia Tech

Thanks again, Esben, for collaborating with us! I can confidently say that the above is super valuable advice for any AI safety hackathon organizers; it's consistent with our experiences.

In the context of a college campus hackathon, I'd especially stress focusing on preparing starter materials and making submission requirements clear early on!

nosignalnonoise on Heresies in the Shadow of the Sequences

I like what you're doing, but I feel like the heresies you propose are too tame.

Here are some more radical heresies to consider:

Most people are far more bottlenecked on some combination of akrasia and prospective memory than on the accuracy of their models of the world. Rationalists in particular would be better off devoting effort to actually doing the obvious things than to understanding the world better.
Self-deception is very instrumentally useful in a large fraction of the real-world situations we find ourselves in, and we should use more of it.
Mormons seem to be especially good at coordinating on good lifestyle choices, so we should all consider becoming Mormon.
Among groups of 10+ people, it's usually more useful to get everyone working on implementing the same plan than it is to come up with the best plan.
Intelligence (of the sort measured by exams and IQ tests) is only moderately important to success.

elityre on Lao Mein's Shortform

But he helped found OpenAI, and recently founded another AI company.

I think Elon's strategy of "telling the world not to build AGI, and then going to start another AGI company himself" is much less dumb / ethically fraught than people often assume.