LessWrong 2.0 Reader

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

[link] New fast transformer inference ASIC — Sohu by Etched
lukehmiles (lcmgcd) · 2024-06-26T09:56:08.649Z · comments (9)

[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (25)

[link] Cooperation and Alignment in Delegation Games: You Need Both!
Oliver Sourbut · 2024-08-03T10:16:51.716Z · comments (0)

Bad lessons learned from the debate
bayesyatina · 2024-06-26T11:54:44.953Z · comments (5)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

My covid-related beliefs and questions
Severin T. Seehrich (sts) · 2024-07-23T03:27:09.348Z · comments (0)

Utilitarianism and the replaceability of desires and attachments
MichaelStJules · 2024-07-27T01:57:42.419Z · comments (2)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (9)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)

The Great Bootstrap
KristianRonn · 2024-10-11T19:46:51.752Z · comments (0)

Broadly human level, cognitively complete AGI
p.b. · 2024-08-06T09:26:13.220Z · comments (0)

Sequence overview: Welfare and moral weights
MichaelStJules · 2024-08-15T04:22:32.567Z · comments (0)

[link] Kinds of Motivation
Sable · 2024-07-13T15:52:44.432Z · comments (2)

[link] Spherical cow
dkl9 · 2024-11-11T03:10:27.788Z · comments (0)

[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)

Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

Relativity Theory for What the Future 'You' Is and Isn't
FlorianH (florian-habermacher) · 2024-07-29T02:01:17.736Z · comments (48)

[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

What does a Gambler's Verity world look like?
ErioirE (erioire) · 2024-07-25T22:03:56.447Z · comments (6)

[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

Making Beliefs Pay Rent
Screwtape · 2024-07-28T17:59:52.101Z · comments (2)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

[link] The AI regulator’s toolbox: A list of concrete AI governance practices
Adam Jones (domdomegg) · 2024-08-10T21:15:09.265Z · comments (1)

[link] Free Will, Determinism, And Choice
Zero Contradictions · 2024-07-06T06:34:41.495Z · comments (3)

Thoughts to niplav on lie-detection, truthfwl mechanisms, and wealth-inequality
Emrik (Emrik North) · 2024-07-11T18:55:46.687Z · comments (8)

AirBnB Baking
jefftk (jkaufman) · 2024-07-10T12:50:03.381Z · comments (1)

Funding for programs and events on global catastrophic risk, effective altruism, and other topics
abergal · 2024-08-14T23:59:48.146Z · comments (0)

A Brief Explanation of AI Control
Aaron_Scher · 2024-10-22T07:00:56.954Z · comments (1)

[link] Consciousness As Recursive Reflections
Gunnar_Zarncke · 2024-10-05T20:00:53.053Z · comments (3)

[question] What are some good ways to form opinions on controversial subjects in the current and upcoming era?
notfnofn · 2024-10-27T14:33:53.960Z · answers+comments (21)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

[link] [Linkpost] Hawkish nationalism vs international AI power and benefit sharing
jakub_krys (kryjak) · 2024-10-18T18:13:19.425Z · comments (5)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM
Winnie Yang (winnie-yang) · 2024-08-28T08:41:38.967Z · comments (2)

Not all biases are equal - a study of sycophancy and bias in fine-tuned LLMs
jakub_krys (kryjak) · 2024-11-11T23:11:15.233Z · comments (0)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)

[question] What makes one a "rationalist"?
mathyouf · 2024-10-08T20:25:21.812Z · answers+comments (5)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

Join my new subscriber chat
sarahconstantin · 2024-11-06T02:30:11.059Z · comments (0)

[question] Does a time-reversible physical law/Cellular Automaton always imply the First Law of Thermodynamics?
Noosphere89 (sharmake-farah) · 2024-08-30T15:12:28.823Z · answers+comments (11)

Recent comments

benito on Sabotage Evaluations for Frontier Models

Not cynical enough! They make billions of dollars and for most of the time they've done this there have been little-to-no people with serious political power or prestige in the world who hold the position that it's obviously doomed, so I think it's pretty easy to come up with a rationalization that lets you go ahead and build some of the most incredible and powerful things humanity has ever built.

satron on Sabotage Evaluations for Frontier Models

If the default path is AIs taking over control from humans, then what is the current plan at leading AI labs? Surely all the work they put into AI safety is done to prevent exactly such scenarios. I would find it quite hard to believe that a large group of people would vigorously do something if they believed that their efforts would be in vain.

mondsemmel on Lao Mein's Shortform

Just as one example, OpenAI was against SB 1047, whereas Musk was for it. I'm not optimistic about regulation being enough to save us, but presumably it would be helpful, and some AI companies like OpenAI were against even the limited regulations of SB 1047. Plus SB 1047 also included stuff like whistleblower protections, and that's the kind of thing that could help policymakers make better decisions in the future.

mondsemmel on Lao Mein's Shortform

I'm sympathetic to Musk being genuinely worried about AI safety. My problem is that one of his first actions after learning about AI safety was to found OpenAI, and that hasn't worked out very well. Not just due to Altman; even the "Open" part was a highly questionable goal. Hopefully Musk's future actions in this area would have positive EV, but still.

leon-lang on johnswentworth's Shortform

What’s your opinion on the possible progress of systems like AlphaProof, o1, or Claude with computer use?

johnswentworth on johnswentworth's Shortform

I don't expect that to be particularly relevant. The data wall is still there; scaling just compute has considerably worse returns than the curves we've been on for the past few years, and we're not expecting synthetic data to be anywhere near sufficient to bring us close to the old curves.

unexpectedvalues on Seven lessons I didn't learn from election day

I don't really know, sorry. My memory is that 2023 was already pretty bad for incumbent parties (e.g. the right-wing ruling party in Poland lost power), but I'm not sure.

alexander-gietelink-oldenziel on Lao Mein's Shortform

How would removing Sam Altman significantly reduce extinction risk? Conditional on AI alignment being hard and Doom likely, the exact identity of the Shoggoth Summoner seems immaterial.

benito on Sabotage Evaluations for Frontier Models

Yes, it does imply that the default path is permanent-disempowerment or extinction.

johnswentworth on The Median Researcher Problem

unless you additionally posit an additional mechanism like fields with terrible replication rates have a higher standard deviation than fields without them

Why would that be relevant?