LessWrong 2.0 Reader

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (6)

Balsa Update and General Thank You
Zvi · 2023-12-12T20:30:03.980Z · comments (8)

[question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?
Zvi · 2024-03-11T14:55:05.128Z · answers+comments (2)

On OpenAI Dev Day
Zvi · 2023-11-09T16:10:06.646Z · comments (0)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (9)

Originality vs. Correctness
alkjash · 2023-12-06T18:51:49.531Z · comments (17)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (6)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

0th Person and 1st Person Logic
Adele Lopez (adele-lopez-1) · 2024-03-10T00:56:14.446Z · comments (28)

"Epistemic range of motion" and LessWrong moderation
habryka (habryka4) · 2023-11-27T21:58:40.834Z · comments (3)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

An Actually Intuitive Explanation of the Oberth Effect
Isaac King (KingSupernova) · 2024-01-10T20:23:17.216Z · comments (33)

[link] Results from an Adversarial Collaboration on AI Risk (FRI)
Josh Rosenberg (josh-rosenberg) · 2024-03-11T20:00:24.642Z · comments (3)

Raemon's Deliberate (“Purposeful?”) Practice Club
Raemon · 2023-11-14T18:24:19.335Z · comments (11)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (9)

Some negative steganography results
Fabien Roger (Fabien) · 2023-12-09T20:22:52.323Z · comments (5)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

Does AI risk “other” the AIs?
Joe Carlsmith (joekc) · 2024-01-09T17:51:47.020Z · comments (3)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (7)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

What's next for the field of Agent Foundations?
Nora_Ammann · 2023-11-30T17:55:13.982Z · comments (23)

[link] shoes with springs
bhauth · 2023-12-30T21:46:55.319Z · comments (6)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

AI #48: Exponentials in Geometry
Zvi · 2024-01-18T14:20:07.869Z · comments (9)

[link] Are There Examples of Overhang for Other Technologies?
Jeffrey Heninger (jeffrey-heninger) · 2023-12-13T21:48:08.954Z · comments (50)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

Recent comments

tailcalled on Scissors Statements for President?

Kind of. First, the big exception: If you manage to enforce global authoritarianism, you can stockpile surplus indefinitely, basically tiling the world with charged-up batteries. But what's the point of that?

Secondly, "waste/signaling cascade" is kind of in the eye of the beholder. If a forest is standing in some region, is it wasting sunlight that could've been used on farming? Even in a very literal sense, you could say the answer is yes since the trees are competing in a zero-sum game for height. But without that competition, you wouldn't have "trees" at all, so calling it a waste is a value judgement that trees are worthless. (Which of course you are entitled to make, but this is clearly a disagreement with the people who like solarpunk.)

But yeah, ultimately I'm kind of thinking of life as entropy maximization. The surplus has to be used for something, the question is what. If you've got nothing to use it for, then it makes sense for you to withdraw, but then it's not clear why to worry that other people are fighting over it.

annasalamon on Scissors Statements for President?

Or: by seeing themselves, and a voter for the other side, as co-victims of an optical illusion, designed to trick each of them into being unable to find another's areas of true seeing. And by working together to figure out how the illusion works, while seeing it as a common enemy.

But my specific hypothesis here is that the illusion works by misconstruing the other voter's "Robert can see a problem with candidate Y" as "Robert can't see the problem with candidate X", and that if you focus on trying to decode first the illusion won't kick in as much.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Okay, fair enough. "Rich idiot" was meant more tongue-in-cheek; that's not what I intended.

james-stephen-brown on GPTs are Predictors, not Imitators

This was a fascinating, original idea as usual. I loved the notion of a brilliant, condescending sort of robot capable of doing a task perfectly who chooses (in order to demonstrate its own artistry) to predict and act out how we would get it wrong.

It did make me wonder, though, whether when we reframe something like this for GPTs it's also important to apply the reframing to our own human intelligence to determine if the claim is distinct; in this case, asking the question "are we imitators, simulators or predictors?". It might be possible to make the case that we are also predictors inasmuch as our consciousness projects an expectation of the results of our behaviour onto the world, an idea well explained by cognitive scientist Andy Clark.

I agree, though, it would be remarkable if GPTs did end up thinking the way we do. And ironically, if they don't think the way we do, and instead begin to do away with the inefficiencies of predicting and playing out human errors, that would put us in the position of doing the hard work of predicting how they will act.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Yes, this is possible. It smells a bit of 4d-chess. As far as I can tell he already had finalized his position by the time the WSJ interview came out.

I've dug a little deeper and it seems he did do a bunch of research on polling data. I was a bit too rash to say he had no inside information whatsoever. Plausibly he had some. The degree of inside information he would need is very high. It seems he did a similar Kelly bet calculation, since he reports his all-things-considered probability to be 80-90%:

"With so much money on the line, Théo said he is feeling nervous, though he believes Trump has an 80%-90% chance to win the election.

"A surprise can always occur," Théo told The Journal."

I have difficulty believing one can get this kind of certainty for an all-things-considered probability for something as noisy and tight as a US presidential election. [but he won both the electoral college and popular vote bets]

christopher-king on How to Give in to Threats (without incentivizing them)

Another problem is, do you know how to formulate/formalize a version of LDT so that we can mathematically derive the game outcomes that you suggest here?

There is a no-free-lunch theorem for this. LDT (and everything else) can be irrational.

annasalamon on Scissors Statements for President?

By parsing the other voter as "against X" rather than "for Y", and then inquiring into how they see X as worth being against, and why, while trying really hard to play taboo and avoid ontological buckets.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

The true probability would be more like >90% considering other factors like opportunity costs, transaction costs, counterparty risk, unforeseen black swans of various kinds, etc.

Bear in mind this is all-things-considered probability, not just in-model probability, i.e. this would have to integrate the fact that most other observers (especially those with strongly calibrated predictions) very strongly disagree*. Certainly, in some cases this is possible, but one would need quite overwhelming evidence that one had a huge edge.

I agree one can reject Kelly betting. That's pretty crazy risky, but plausibly the case for people like Elon or Theo. The question is whether the rest of us (with presumably more reasonably cautious attitudes) should take his win as much epistemic evidence. I think not. From our perspective, his manic risk-loving wouldn't be much evidence for rational expectations.

*Didn't the Kelly formula already integrate the fact that other people think differently? No, this is an additional piece of information one has to integrate. Kelly betting gives you an implicit risk-averseness even conditioning on your beliefs being true (on average).

EDIT: Indeed, it seems Theo the French Whale might have done a Kelly bet estimate too; he reports his true probability at 80-90%. Perhaps he did have private information.

"For example, a hypothetical sale of Théo's 47 million shares for Trump to win the election would execute at an estimated average price of just $0.02, according to Polymarket, which would represent a 96% loss for the trader. Théo paid an average price of about $0.56 cents for the 47 million shares.

Meanwhile, a hypothetical sale of Théo's nearly 20 million shares for Trump to win the popular vote would execute at an average price of less than a 10th of a penny, according to Polymarket, representing a near-total loss.

With so much money on the line, Théo said he is feeling nervous, though he believes Trump has an 80%-90% chance to win the election.

"A surprise can always occur," Théo told The Journal."
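The Kelly reasoning in this comment can be made concrete with a small sketch. It assumes a simple binary contract that costs `price` and pays 1 if the event happens, and it ignores the opportunity costs, transaction costs, and counterparty risk mentioned above. The function names and the example inputs are hypothetical, chosen only for illustration; the 0.56 price is the average price from the quoted article, and the fraction of wealth actually staked is not known from this thread.

```python
def kelly_fraction(p: float, price: float) -> float:
    """Kelly-optimal share of bankroll to spend on a binary contract.

    p     -- your probability that the event happens
    price -- cost of a contract that pays 1 if the event happens
    Net odds are b = (1 - price) / price, so the standard Kelly rule
    f = p - (1 - p) / b simplifies to (p - price) / (1 - price).
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Never bet when the contract is priced at or above your belief.
    return max(0.0, (p - price) / (1.0 - price))


def implied_probability(fraction: float, price: float) -> float:
    """Belief that would make `fraction` the Kelly-optimal stake.

    This inverts kelly_fraction for fractions in [0, 1].
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return price + fraction * (1.0 - price)


if __name__ == "__main__":
    # Hypothetical: a bettor who believes 85% and buys at 0.56.
    print(kelly_fraction(0.85, 0.56))
    # Hypothetical: what belief would justify staking half of one's wealth?
    print(implied_probability(0.5, 0.56))
```

Under these assumptions, the larger the share of wealth staked at a given price, the higher the belief a Kelly bettor must hold, which is the sense in which a very large bet implies a very high stated probability.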

annasalamon on Scissors Statements for President?

Huh. Is your model that surpluses are all inevitably dissipated in some sort of waste/signaling cascade? This seems wrong to me but also like it's onto something.

annasalamon on Scissors Statements for President?

I like your conjecture about Susan's concern about giving Robert steam.

I am hoping that if we decode the meme structure better, Susan could give herself and Robert steam re: "maybe I, Susan, am blind to some thing, B, that matters" without giving steam to "maybe A doesn't matter, maybe Robert doesn't have a blind spot there." Like, maybe we can make a more specific "try having empathy right at this part" request that doesn't confuse things the same way. Or maybe we can make a world where people who don't bother to try that look like schmucks who aren't memetically savvy, or something. I think there might be room for something like this?