LessWrong 2.0 Reader

[question] What are things you're allowed to do as a startup?
Elizabeth (pktechgirl) · 2024-06-20T00:01:59.257Z · answers+comments (9)

The Math of Suspicious Coincidences
Roko · 2024-02-07T13:32:35.513Z · comments (3)

Interpreting Quantum Mechanics in Infra-Bayesian Physicalism
Yegreg · 2024-02-12T18:56:03.967Z · comments (6)

[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)

AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)

[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)

Information-Theoretic Boxing of Superintelligences
JustinShovelain · 2023-11-30T14:31:11.798Z · comments (0)

[link] The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · 2023-11-29T18:30:36.315Z · comments (1)

[link] Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles
Zack_M_Davis · 2024-03-02T22:05:49.553Z · comments (19)

[link] When scientists consider whether their research will end the world
Harlan · 2023-12-19T03:47:06.645Z · comments (4)

Putting multimodal LLMs to the Tetris test
Lovre · 2024-02-01T16:02:12.367Z · comments (5)

Protestants Trading Acausally
Martin Sustrik (sustrik) · 2024-04-01T14:46:26.374Z · comments (4)

A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

[link] AISN #28: Center for AI Safety 2023 Year in Review
aogara (Aidan O'Gara) · 2023-12-23T21:31:40.767Z · comments (1)

Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery (arjun-panickssery) · 2024-08-06T17:44:27.293Z · comments (0)

A more systematic case for inner misalignment
Richard_Ngo (ricraz) · 2024-07-20T05:03:03.500Z · comments (4)

"Full Automation" is a Slippery Metric
ozziegooen · 2024-06-11T19:56:49.855Z · comments (1)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

Understanding Subjective Probabilities
Isaac King (KingSupernova) · 2023-12-10T06:03:27.958Z · comments (16)

Adversarial Robustness Could Help Prevent Catastrophic Misuse
aogara (Aidan O'Gara) · 2023-12-11T19:12:26.956Z · comments (18)

RA Bounty: Looking for feedback on screenplay about AI Risk
Writer · 2023-10-26T13:23:02.806Z · comments (6)

[question] What's your standard for good work performance?
Chi Nguyen · 2023-09-27T16:58:16.114Z · answers+comments (3)

Interpreting the Learning of Deceit
RogerDearnaley (roger-d-1) · 2023-12-18T08:12:39.682Z · comments (10)

Sparse MLP Distillation
slavachalnev · 2024-01-15T19:39:02.926Z · comments (3)

AI Safety 101 : Reward Misspecification
markov (markovial) · 2023-10-18T20:39:34.538Z · comments (4)

Some additional SAE thoughts
Hoagy · 2024-01-13T19:31:40.089Z · comments (4)

[link] There is no IQ for AI
Gabriel Alfour (gabriel-alfour-1) · 2023-11-27T18:21:26.196Z · comments (10)

[link] Evaluating Stability of Unreflective Alignment
james.lucassen · 2024-02-01T22:15:40.902Z · comments (3)

Taking features out of superposition with sparse autoencoders more quickly with informed initialization
Pierre Peigné (pierre-peigne) · 2023-09-23T16:21:42.799Z · comments (8)

Announcing SPAR Summer 2024!
laurenmarie12 · 2024-04-16T08:30:31.339Z · comments (2)

Running the Numbers on a Heat Pump
jefftk (jkaufman) · 2024-02-09T03:00:04.920Z · comments (12)

Verifiable private execution of machine learning models with Risc0?
mako yass (MakoYass) · 2023-10-25T00:44:48.643Z · comments (1)

AI Alignment Breakthroughs this week (10/08/23)
Logan Zoellner (logan-zoellner) · 2023-10-08T23:30:54.924Z · comments (14)

[link] Managing AI Risks in an Era of Rapid Progress
Algon · 2023-10-28T15:48:25.029Z · comments (3)

The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)

Differential Optimization Reframes and Generalizes Utility-Maximization
J Bostock (Jemist) · 2023-12-27T01:54:22.731Z · comments (2)

AI #59: Model Updates
Zvi · 2024-04-11T14:20:06.339Z · comments (2)

[question] Current AI safety techniques?
Zach Stein-Perlman · 2023-10-03T19:30:54.481Z · answers+comments (2)

The Third Gemini
Zvi · 2024-02-20T19:50:05.195Z · comments (2)

Dishonorable Gossip and Going Crazy
Ben Pace (Benito) · 2023-10-14T04:00:35.591Z · comments (31)

Let's talk about Impostor syndrome in AI safety
Igor Ivanov (igor-ivanov) · 2023-09-22T13:51:18.482Z · comments (4)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

[link] The Poker Theory of Poker Night
omark · 2024-04-07T09:47:01.658Z · comments (13)

Investigating Bias Representations in LLMs via Activation Steering
DawnLu · 2024-01-15T19:39:14.077Z · comments (4)

[link] One: a story
Richard_Ngo (ricraz) · 2023-10-10T00:18:31.604Z · comments (0)

[question] [link] Is Bjorn Lomborg roughly right about climate change policy?
yhoiseth · 2023-09-27T20:06:30.722Z · answers+comments (14)

Recent comments

ektimo on How Often Does Taking Away Options Help?

Code here,

The link to code isn't working for me

benjy_forstadt on Another argument against utility-centric alignment paradigms

There is a difference between the claim that powerful agents are approximately well-described as being expected utility maximizers (which may or may not be true) and the claim that AGI systems will have an explicit utility function the moment they’re turned on, and maximize that function from that moment on.

I think this is the assumption OP is pointing out: “most of the book's discussion of AI risk frames the AI as having a certain set of goals from the moment it's turned on, and ruthlessly pursuing those to the best of its ability”. “From the moment it’s turned on” is pretty important, because it rules out value learning as a solution.

habryka4 on Bogdan Ionut Cirstea's Shortform

In contrast, Sakana's AI scientist cost on average $15/paper and $0.50/review.

Seriously, the Sakana AI stuff is basically total bogus, as I've pointed out on like 4 other threads (and also as Scott Alexander recently pointed out). Please stop citing it as a thing that produces anything close to fully formed scientific papers. Its output is really not better than just prompting o1 yourself. Of course, o1 and even Sonnet and GPT-4 are very impressive, but there is no update to be made after you've played around with that.

I agree that ML capabilities are under-elicited, but the Sakana AI stuff really is very little evidence on that, besides someone being good at marketing and setting up some scaffolding that produces fake prestige signals.

ektimo on ektimo's Shortform

How about a voting system where everyone is given 1000 Influence Tokens to spend across all the items on the ballot? This lets voters exert more influence on the things they care more about. Has anyone tried something like this?

(There could be tweaks like if people are avoiding spending on winners it could redistribute margin of victory, or if avoiding spending on losers it could redistribute tokens when losing, etc. but I'm not sure how much that would happen. The more interesting thing may be how does it influence everyone's sense of what they are doing?)
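A minimal sketch of how such a token ballot might be tallied, assuming each ballot item offers named options, each voter spends non-negative whole tokens on the options they prefer, and the option with the most tokens wins. The names `BUDGET`, `validate`, `tally`, and `winners`, the ballot layout, and the winner rule are illustrative assumptions; none of them come from the comment, and the redistribution tweaks are not modelled.

```python
from collections import defaultdict

BUDGET = 1000  # tokens per voter, as proposed in the comment


def validate(ballot):
    """Check one ballot of the assumed form {item: {option: tokens}}.

    Raises ValueError if any amount is negative or non-integer,
    or if the ballot spends more than BUDGET in total.
    Returns the number of tokens spent.
    """
    spent = 0
    for options in ballot.values():
        for tokens in options.values():
            if not isinstance(tokens, int) or tokens < 0:
                raise ValueError("token amounts must be non-negative integers")
            spent += tokens
    if spent > BUDGET:
        raise ValueError(f"ballot spends {spent} tokens, budget is {BUDGET}")
    return spent


def tally(ballots):
    """Sum the tokens spent on every option of every item."""
    totals = defaultdict(lambda: defaultdict(int))
    for ballot in ballots:
        validate(ballot)
        for item, options in ballot.items():
            for option, tokens in options.items():
                totals[item][option] += tokens
    return {item: dict(options) for item, options in totals.items()}


def winners(totals):
    """Pick the option with the most tokens per item (assumed rule; ties go to the first option seen)."""
    return {item: max(options, key=options.get) for item, options in totals.items()}


# Hypothetical ballots: one voter cares a lot about the park, two care more about the mayor.
ballots = [
    {"park": {"yes": 900}, "mayor": {"A": 100}},
    {"park": {"no": 300}, "mayor": {"B": 700}},
    {"park": {"no": 500}, "mayor": {"A": 500}},
]
totals = tally(ballots)
print(totals)
print(winners(totals))
```

In this made-up example "yes" wins the park item 900 to 800 even though two of the three voters spent tokens on "no", which is the intensity-weighting effect the proposal is after.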

faul_sname on Another argument against utility-centric alignment paradigms

And this is where the fundamental AGI-doom arguments – all these coherence theorems, utility-maximization frameworks, et cetera – come in. At their core, they're claims that any "artificial generally intelligent system capable of autonomously optimizing the world the way humans can" would necessarily be well-approximated as a game-theoretic agent. Which, in turn, means that any system that has the set of capabilities the AI researchers ultimately want their AI models to have, would inevitably have a set of potentially omnicidal failure modes.

If you drop the "artificially" from the claim, you are left with a claim that any "generally intelligent system capable of autonomously optimizing the world the way humans can" would necessarily be well-approximated as a game-theoretic agent. Do you endorse that claim, or do you think that there is some particular reason a biological or hybrid generally intelligent system capable of autonomously optimizing the world the way a human or an organization based on humans might not be well-approximated as a game-theoretic agent?

Because humans sure don't seem like paperclipper-style utility maximizers to me.

redman on Making Eggs Without Ovaries

Tech available in 2-5 years for 150k (or 50k in India?) sounds good to me. I know someone who would 100% do that today if the offer were available. I'm going to follow your blog for news, keep up the work, plenty of people would really like to see you succeed.

yanni-kyriacos on yanni's Shortform

AI Safety (in the broadest possible sense, i.e. including ethics & bias) is going to be taken very seriously soon by Government decision makers in many countries. But without high quality talent staying in their home countries (i.e. not moving to UK or US), there is a reasonable chance that x/c-risk won’t be considered problems worth trying to solve. X/c-risk sympathisers need seats at the table. IMO local AIS movement builders should be thinking hard about how to keep talent local (this is an extremely hard problem to solve).

habryka4 on Wei Dai's Shortform

Isn't the basic idea of Constitutional AI just having the AI provide its own training feedback using written instruction? My guess is there was a substantial amount of self-evaluation in the o1 training with complicated written instructions, probably kind of similar to a constitution (though this is just a guess).

gb on Economics Roundup #3

But the OP explicitly said (as quoted in the parent) that the proposal allows for refunds if the basis is not (fully) realized, which would cover the situation you’re describing.

antanaclasis on Economics Roundup #3

Scenario: you have equity worth (say) $100 million in expectation, but of no realized value at the moment.

You are forced to pay unrealized gains tax on that amount, and so are now $25 million in the hole. Even if you avoid this crashing you immediately (such as by getting a loan), if your equity goes to $0 you’re still out for the $25 million you paid, with no assets to back it.

The fact that this could be counted as a prepayment for a hypothetical later unrealized gain doesn’t help you: you can’t actually get your money back.
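The arithmetic in this scenario can be sketched as follows. The 25% rate is only inferred from the $100 million and $25 million figures, a zero cost basis is assumed, and the `refundable` switch is a rough, hypothetical model of the refund provision mentioned in the reply above rather than the actual proposal's rule.

```python
def net_position(unrealized_gain, tax_rate, final_value, refundable=False):
    """Holder's position after paying tax on an unrealized gain.

    Assumes a zero cost basis, so the whole paper value is taxable gain.
    If refundable is True, tax paid on the part of the gain that never
    materialized is returned (a hypothetical model of a refund rule).
    """
    tax_paid = unrealized_gain * tax_rate
    refund = 0
    if refundable:
        # Gain that was taxed but never realized, floored at zero.
        lost_gain = max(unrealized_gain - final_value, 0)
        refund = lost_gain * tax_rate
    return final_value - tax_paid + refund


gain = 100_000_000   # paper value of the equity
rate = 0.25          # assumed rate implied by the $25 million tax bill

# Equity goes to $0 and nothing is refunded: the tax payment is a pure loss.
print(net_position(gain, rate, final_value=0))

# Same outcome under a full refund of tax on the unrealized part.
print(net_position(gain, rate, final_value=0, refundable=True))
```

Under these assumptions the holder ends $25 million down without a refund and at zero with one, so the disagreement between the two comments comes down to whether the refund is actually available.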