LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (17)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (1)

[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (13)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)

[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (21)

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)

The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)

Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)

[link] Introduction to Super Powers (for kids!)
Shoshannah Tekofsky (DarkSym) · 2024-09-20T17:17:27.070Z · comments (0)

[question] What percent of the sun would a Dyson Sphere cover?
Raemon · 2024-07-03T17:27:50.826Z · answers+comments (26)

I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!
mako yass (MakoYass) · 2024-08-02T22:35:21.136Z · comments (2)

[link] Robert Caro And Mechanistic Models In Biography
adamShimi · 2024-07-14T10:56:42.763Z · comments (5)

[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

How Congressional Offices Process Constituent Communication
Tristan Williams (tristan-williams) · 2024-07-02T12:38:41.472Z · comments (0)

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.
sevdeawesome · 2024-07-01T05:50:49.498Z · comments (0)

[link] Foundations - Why Britain has stagnated [crosspost]
Nathan Young · 2024-09-23T10:43:20.411Z · comments (1)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

Would you benefit from, or object to, a page with LW users' reacts?
Raemon · 2024-08-20T16:35:47.568Z · comments (6)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (2)

[LDSL#2] Latent variable models, network models, and linear diffusion of sparse lognormals
tailcalled · 2024-08-09T19:57:56.122Z · comments (0)

[question] Money Pump Arguments assume Memoryless Agents. Isn't this Unrealistic?
Dalcy (Darcy) · 2024-08-16T04:16:23.159Z · answers+comments (6)

AI #77: A Few Upgrades
Zvi · 2024-08-20T00:20:09.717Z · comments (3)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

sodium on Mira Murati leaves OpenAI/ OpenAI to remove non-profit control

Also from WSJ

abstractapplic on D&D.Sci: Whom Shall You Call? [Evaluation and Ruleset]

Thanks for a good one

I'm glad you feel that way about this scenario. I wish I did . . .

(For future reference, on the off-chance you haven't seen it: there's a compilation of all the past scenarios here [LW · GW], handily rated by quality and steamrollability.)

One thing that perhaps would make it easier was if the web interactive could tell whether or not your selection was the optimal one directly, and possibly how higher your expected price was than the optimal price (I first plugged mine in, then had to double check with your table out here)

. . . huh. I feel conflicted about this on aesthetic grounds - like, Reality doesn't come with big flashing signs saying "EV-maxxing solution reached!" when you reach an EV-maxxing solution - but it does sound both convenient to have and easy to set up. Might try adding this functionality to the interactive for the next one; would be curious to hear what anyone else who happens to be reading this comment thinks.

Anyway, greetings, and looking forward to seeing the next one.

Good to have you on board!

abramdemski on Wei Dai's Shortform

yyyep

interstice on A Path out of Insufficient Views

Some beliefs can be worse or better at predicting what we observe, this is not the same thing as popularity.

anon-user on Superintelligence Can't Solve the Problem of Deciding What You'll Do

Ability to predict how outcome depends on inputs + ability to compute the inverse of the prediction formula + ability to select certain inputs => ability to determine the output (within limits of what the influencing the inputs can accomplish). The rest is just an ontological difference on what language to use to describe this mechanism. I know that if I place a kettle on a gas stove and turn on the flame, I will get the boiling water, and we colloquially describe this as bowling the water. I do not know all the intricacies of the processes inside the water, and I am not directly controlling individual heat exchange subprocesses inside the kettle, but if would be silly to argue that I am not controlling the outcome of the water getting boiled.

anon-user on A Nonconstructive Existence Proof of Aligned Superintelligence

Perhaps you are missing the point of what I am saying here somewhat? The issue is is not the scale of the side-effect of a computation, it's the fact that the side-effect exists, so any accurate mathematical abstraction of an actual real-world ASI must be prepared to deal with solving a self-referential equation.

anon-user on Four Levels of Voting Methods

I think it's important to further refine the accuracy criterion - I think another very important criterion (particularly given today's state of US politics) is how conducive the voting system towards consensus-building vs polarization. In other words, not only pure accuracy matters, but the direction of the error as well. That is, an error towards a more extreme candidate is IMHO a lot more harmful than an equally sized error towards a more consensus candidate.

anthonyc on Is cybercrime really costing trillions per year?

In one sense, you're right, it is obviously correct. *Iff* you can actually do the calculation well, honestly, and convincingly, that is.

In practice, it's really hard to do that in a way that is consistent and principled. Most who try end up succumbing to various forms of motivated reasoning. And even when you do manage it, you have to make a lot of assumptions and extrapolations that get you really wide error bars, and a result that no one is going to believe unless they already want to believe your conclusion.

The other problem is you can't assume the analysis still holds if any of all those assumptions change. Two people, each with credible proposals to reduce the risk and cost of cybercrime in that sense, they can both make similar cost and benefit claims, but clearly effects are not additive; your estimate defines a max not a sum. This is always strictly the case, but if you use a narrower analysis than you can often treat them as approximately independent. If you want to make real-world decisions, you should include a sensitivity analysis as well.

sil-ver on [Intuitive self-models] 2. Conscious Awareness

Were you using this demo?

I’m skeptical of the hypothesis that the color phi phenomenon is just BS. It doesn’t seem like that kind of psych result. I think it’s more likely that this applet is terribly designed.

Yes -- and yeah, fair enough. Although-

I think I got some motion illusion?

-remember that the question isn't "did I get the vibe that something moves". We already know that a series of frames gives the vibe that something moves. The question is whether you remember having seen the red circle halfway across before seeing the blue circle.

zack_m_davis on The Sun is big, but superintelligences will not spare Earth a little sunlight

The claim is pretty clearly intended to be about relative material, not absolute number of pawns: in the end position of the second game, you have three pawns left and Stockfish has two; we usually don't describe this as Stockfish having giving up six pawns. (But I agree that it's easier to obtain resources from an adversary that values them differently, like if Stockfish is trying to win and you're trying to capture pawns.)