LessWrong 2.0 Reader

A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)

Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces
Matthew A. Clarke (Antigone) · 2024-12-20T15:16:51.857Z · comments (0)

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

Call for evaluators: Participate in the European AI Office workshop on general-purpose AI models and systemic risks
Tom DAVID (tom-david) · 2024-11-27T02:54:16.263Z · comments (0)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

Why Aligning an LLM is Hard, and How to Make it Easier
RogerDearnaley (roger-d-1) · 2025-01-23T06:44:04.048Z · comments (3)

[link] Our new video about goal misgeneralization, plus an apology
Writer · 2025-01-14T14:07:21.648Z · comments (0)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

Elon Musk and Solar Futurism
transhumanist_atom_understander · 2024-12-21T02:55:28.554Z · comments (27)

Abstract Mathematical Concepts vs. Abstractions Over Real-World Systems
Thane Ruthenis · 2025-02-18T18:04:46.717Z · comments (10)

AI Safety Seed Funding Network - Join as a Donor or Investor
Alexandra Bos (AlexandraB) · 2024-12-16T19:30:43.812Z · comments (0)

[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)

[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)

"Full Automation" is a Slippery Metric
ozziegooen · 2024-06-11T19:56:49.855Z · comments (1)

[link] A High Level Closed-Door Session Discussing DeepSeek: Vision Trumps Technology
Cosmia_Nebula · 2025-01-30T09:53:16.152Z · comments (1)

[link] Anthropic CEO calls for RSI
Andrea_Miotti (AndreaM) · 2025-01-29T16:54:24.943Z · comments (10)

Extending control evaluations to non-scheming threats
joshc (joshua-clymer) · 2025-01-12T01:42:54.614Z · comments (1)

AI #59: Model Updates
Zvi · 2024-04-11T14:20:06.339Z · comments (2)

[question] What are things you're allowed to do as a startup?
Elizabeth (pktechgirl) · 2024-06-20T00:01:59.257Z · answers+comments (9)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)

Two Tales of AI Takeover: My Doubts
Violet Hour · 2024-03-05T15:51:05.558Z · comments (8)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

Announcing SPAR Summer 2024!
laurenmarie12 · 2024-04-16T08:30:31.339Z · comments (2)

AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)

[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)

[link] Hunting for AI Hackers: LLM Agent Honeypot
Reworr R (reworr-reworr) · 2025-02-12T20:29:32.269Z · comments (0)

Per Tribalismum ad Astra
Martin Sustrik (sustrik) · 2025-01-19T06:50:07.763Z · comments (5)

Quantum without complication
Optimization Process · 2025-01-16T08:53:11.347Z · comments (2)

AI #103: Show Me the Money
Zvi · 2025-02-13T15:20:07.057Z · comments (9)

Disagreement on AGI Suggests It’s Near
tangerine · 2025-01-07T20:42:43.456Z · comments (15)

Nonpartisan AI safety
Yair Halberstadt (yair-halberstadt) · 2025-02-10T14:55:50.913Z · comments (4)

Notes on Occam via Solomonoff vs. hierarchical Bayes
JesseClifton · 2025-02-10T17:55:14.689Z · comments (7)

Agents don't have to be aligned to help us achieve an indefinite pause.
Hastings (hastings-greer) · 2025-01-25T18:51:03.523Z · comments (0)

Why you maybe should lift weights, and How to.
samusasuke · 2025-02-12T05:15:32.011Z · comments (29)

Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)

5. Open Corrigibility Questions
Max Harms (max-harms) · 2024-06-10T14:09:20.777Z · comments (0)

Essaying Other Plans
Screwtape · 2024-03-06T22:59:06.240Z · comments (4)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

[link] Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles
Zack_M_Davis · 2024-03-02T22:05:49.553Z · comments (25)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (8)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)


Recent comments

algon on You should use Consumer Reports

Yep, that sounds sensible. I sometimes use Consumer Reports in my usual method for buying something in product class X. My usual method is:
1) Check what's recommended on forums/subreddits that care about the quality of X.
2) Compare the rating distribution of an instance of X to other members of X.
3) Check high-quality reviews. This requires either finding someone you trust to do this, or looking at things like Consumer Reports.

a1987dm on Fuzzing LLMs sometimes makes them reveal their secrets

The LLM analog of in vino veritas

lorec on Lorec's Shortform

I expect any tests to show unambiguously that it's "not being replaced at all and citations[/mentions] chaotically swirling". If I understand Evans correctly, these were all random eminent figures he picked, not selected to be falling out of fashion - and they do seem to be a pretty broad sample of the "old prestigious standard names" space.

The stand mixer is a clever analogy; I didn't previously have experience with the separation thing.

I presume you've seen Is Clickbait Destroying Our General Intelligence?, and probably Hanson's cultural evolution / cultural drift frame. I wonder if you're familiar with Callard's Distant Signals paradigm [ transcript available on episode page ], which I think is the most illuminating of the three.

Besides just the cost of ~instantaneous ~omnicast communication dropping to ~zero, I see a role for the fall of the gold standard in all this. See e.g. U.S. per capita energy usage since ~1970, international fertility since ~1970. My theory [ which I really need to make a more legible graphic for ] is that when people don't "own their money" and have to track the effects of distant inflation-adjusts from the Fed, inflation volatility [ the destructive macroeconomic thing this OP on /r/badeconomics is saying couldn't possibly be happening due to the fall of the gold standard ] goes way up. Incentives in the market for ideas are ultimately material [yes, virtual status goods influence material wealth, but it also goes the other way around], so the market for materials influences the market for ideas, and vice versa, in a vicious spiral of decline. Is the theory.

remmelt-ellen on If AI is in a bubble and the bubble bursts, what would you do?

Update: back up to 50% chance.

Noting Microsoft’s cancelling of data center deals, and the fact that the ‘AGI’ labs are still losing cash and, with DeepSeek, are competing increasingly on a commodity product.

luke-stebbing on Why Can't We Hypothesize After the Fact?

Despite the simplicity prior, “this nontrivial program implements the behavior I want” almost always fails to be true because the program contains bugs. Even after you fix the bugs, it still almost always fails to be true because it contains even more bugs.

False scientific theories don’t seem to fail in quite the same ways, and I’m not sure how much of that is due to differences in the search processes versus the search spaces (e.g. physics seems quite different from Python).

nadroj on Fuzzing LLMs sometimes makes them reveal their secrets

Another work on fuzzing is Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals by @joshc, @kh4dien, and Severin Field. Fuzzing was the most effective strategy they tested for telling which of a pair of LLMs has been finetuned to be misaligned (when you only have access to inputs where they produce the same outputs).

account on Fuzzing LLMs sometimes makes them reveal their secrets

A thing that might be worth trying: quantize the deceptive models down, and see what that does to their truthfulness.

Hypothesis: acting deceptively is a more complex behavior for an LLM than being truthful. Thus, anything that cripples an LLM's ability to act in complex ways is going to make it more truthful. Quantization would have that effect too.

That method might, then, lose power on more capable LLMs, or in cases of deeper deceptive behaviors. Also, if you want to check for deception in extremely complex tasks, the LLM's ability to perform the task might fall off a cliff long before deception does.

christian-z-r on You can just wear a suit

I did the same during my last year of high school and first year in university! I even got a second hand trouser press so I could make sure the creases were sharp.

However, I stopped in the end. I think most observers thought I was either some sort of neoreactionary or just a plain weirdo. Almost nobody asked me why I was wearing it (this was Scandinavia, so it is pretty normal not to ask other people why they are doing weird things; we are a bunch of quiet, private people). At some point I realized I was spending mental energy on the entire thing and was less able to relax in social situations.

Then, I changed to wearing a shirt + sweater + tweed jacket, with slightly longer hair and more stubble. This still gave me a certain style satisfaction, but in a more muted way. And finally, by now, I always just dress to stand out as little as possible, since I have found that not thinking about my own appearance gives me more energy for interacting socially with others. Also, after having been in full dress to a couple of Renaissance fairs, I started realizing that modern people are actually doing a lot of work to make sure they all wear period-appropriate clothing, and it started to seem that me not doing so was like a reenactor putting a 14th-century houppelande on when everyone else was wearing 15th-century clothing.

cubefox on Kei's Shortform

The 200K token context window is a significant bottleneck.

Gemini Pro has a 2 million token context window, so I assume it would do significantly better. (I wonder why no other model has come close to the Gemini context window size. I have to assume not all algorithmic breakthroughs are replicated a few months later by other models.)

mattmacdermott on Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

I agree that it can be possible to turn such a system into an agent. I think the original comment is defending a stronger claim that there's a sort of no free lunch theorem: either you don't act on the outputs of the oracle at all, or it's just as much of an agent as any other system.

I think the stronger claim is clearly not true. The worrying thing about powerful agents is that their outputs are selected to cause certain outcomes, even if you try to prevent those outcomes. So depending on the actions you're going to take in response to its outputs, its outputs have to be different. But the point of an oracle is to not have that property: its outputs are decided by a criterion (something like truth) that is independent of the actions you're going to take in response [1]. So if you respond differently to the outputs, they cause different outcomes. Assuming you've succeeded at building the oracle to specification, it's clearly not the case that the oracle has the worrying property of agents just because you act on its outputs.

I don't disagree that by either hooking the oracle up in a scaffolded feedback loop with the environment, or getting it to output plans, you could extract more agency from it. Of the two, I think the scaffolding can in principle easily produce dangerous agency in the same way long-horizon RL can, but that the version where you get it to output a plan is much less worrying (I can argue for that in a separate comment if you like).

[1] I'm ignoring the self-fulfilling prophecy case here.