LessWrong 2.0 Reader

[link] Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?
gwern · 2023-07-03T00:48:47.131Z · comments (54)
Alignment Grantmaking is Funding-Limited Right Now
johnswentworth · 2023-07-19T16:49:08.811Z · comments (67)
Accidentally Load Bearing
jefftk (jkaufman) · 2023-07-13T16:10:00.806Z · comments (14)
Yes, It's Subjective, But Why All The Crabs?
johnswentworth · 2023-07-28T19:35:36.741Z · comments (15)
Self-driving car bets
paulfchristiano · 2023-07-29T18:10:01.112Z · comments (41)
Ways I Expect AI Regulation To Increase Extinction Risk
1a3orn · 2023-07-04T17:32:48.047Z · comments (32)
[link] Cultivating a state of mind where new ideas are born
Henrik Karlsson (henrik-karlsson) · 2023-07-27T09:16:42.566Z · comments (18)
Consciousness as a conflationary alliance term for intrinsically valued internal experiences
Andrew_Critch · 2023-07-10T08:09:48.881Z · comments (46)
Grant applications and grand narratives
Elizabeth (pktechgirl) · 2023-07-02T00:16:25.129Z · comments (20)
[link] [Linkpost] Introducing Superalignment
beren · 2023-07-05T18:23:18.419Z · comments (68)
Towards Developmental Interpretability
Jesse Hoogland (jhoogland) · 2023-07-12T19:33:44.788Z · comments (8)
Cryonics and Regret
MvB (martin-von-berg) · 2023-07-24T09:16:01.456Z · comments (34)
My "2.9 trauma limit"
Raemon · 2023-07-01T19:32:14.805Z · comments (31)
Jailbreaking GPT-4's code interpreter
nikola (nikolaisalreadytaken) · 2023-07-13T18:43:54.484Z · comments (22)
OpenAI Launches Superalignment Taskforce
Zvi · 2023-07-11T13:00:06.232Z · comments (40)
Rationality !== Winning
Raemon · 2023-07-24T02:53:59.764Z · comments (49)
Brain Efficiency Cannell Prize Contest Award Ceremony
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-07-24T11:30:10.602Z · comments (12)
When can we trust model evaluations?
evhub · 2023-07-28T19:42:21.799Z · comments (9)
[link] The Goddess of Everything Else - The Animation
Writer · 2023-07-13T16:26:25.552Z · comments (4)
The Seeker’s Game – Vignettes from the Bay
Yulia · 2023-07-09T19:32:58.717Z · comments (18)
[link] Neuronpedia
Johnny Lin (hijohnnylin) · 2023-07-26T16:29:28.884Z · comments (51)
[link] Introducing Fatebook: the fastest way to make and track predictions
Adam B (adam-b) · 2023-07-11T15:28:13.798Z · comments (34)
How LLMs are and are not myopic
janus · 2023-07-25T02:19:44.949Z · comments (14)
[link] Even Superhuman Go AIs Have Surprising Failure Modes
AdamGleave · 2023-07-20T17:31:35.814Z · comments (21)
Why was the AI Alignment community so unprepared for this moment?
Ras1513 · 2023-07-15T00:26:29.769Z · comments (65)
Going Crazy and Getting Better Again
Evenstar · 2023-07-02T18:55:25.790Z · comments (10)
Reducing sycophancy and improving honesty via activation steering
Nina Rimsky (NinaR) · 2023-07-28T02:46:23.122Z · comments (16)
“Reframing Superintelligence” + LLMs + 4 years
Eric Drexler · 2023-07-10T13:42:09.739Z · comments (8)
QAPR 5: grokking is maybe not *that* big a deal?
Quintin Pope (quintin-pope) · 2023-07-23T20:14:33.405Z · comments (15)
[link] Winners of AI Alignment Awards Research Contest
Akash (akash-wasil) · 2023-07-13T16:14:38.243Z · comments (3)
[link] Introducing bayescalc.io
Adele Lopez (adele-lopez-1) · 2023-07-07T16:11:12.854Z · comments (29)
Ten Levels of AI Alignment Difficulty
Sammy Martin (SDM) · 2023-07-03T20:20:21.403Z · comments (12)
Measuring and Improving the Faithfulness of Model-Generated Reasoning
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-07-18T16:36:34.473Z · comments (13)
[link] Priorities for the UK Foundation Models Taskforce
Andrea_Miotti (AndreaM) · 2023-07-21T15:23:34.029Z · comments (4)
Consider Joining the UK Foundation Model Taskforce
Zvi · 2023-07-10T13:50:05.097Z · comments (12)
Anthropic Observations
Zvi · 2023-07-25T12:50:03.178Z · comments (1)
A transcript of the TED talk by Eliezer Yudkowsky
Mikhail Samin (mikhail-samin) · 2023-07-12T12:12:34.399Z · comments (13)
Views on when AGI comes and on strategy to reduce existential risk
TsviBT · 2023-07-08T09:00:19.735Z · comments (31)
When Someone Tells You They're Lying, Believe Them
ymeskhout · 2023-07-14T00:31:48.168Z · comments (3)
Fixed Point: a love story
Richard_Ngo (ricraz) · 2023-07-08T13:56:54.807Z · comments (2)
Why it's so hard to talk about Consciousness
Rafael Harth (sil-ver) · 2023-07-02T15:56:05.188Z · comments (152)
Apollo Neuro Results
Elizabeth (pktechgirl) · 2023-07-30T18:40:05.213Z · comments (16)
[question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?
lukemarks (marc/er) · 2023-07-08T11:42:38.625Z · answers+comments (28)
BCIs and the ecosystem of modular minds
beren · 2023-07-21T15:58:27.081Z · comments (14)
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Buck · 2023-07-26T17:02:56.456Z · comments (18)
Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs
davidad · 2023-07-22T18:09:03.816Z · comments (2)
Underwater Torture Chambers: The Horror Of Fish Farming
omnizoid · 2023-07-26T00:27:15.490Z · comments (49)
[UPDATE: deadline extended to July 24!] New wind in rationality’s sails: Applications for Epistea Residency 2023 are now open
Jana Meixnerová (Epistea) · 2023-07-11T11:02:28.705Z · comments (7)
Sapient Algorithms
Valentine · 2023-07-17T16:30:01.350Z · comments (15)
[link] A $10k retroactive grant for VaccinateCA
Austin Chen (austin-chen) · 2023-07-27T18:14:44.305Z · comments (0)