LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Good job opportunities for helping with the most important century
HoldenKarnofsky · 2024-01-18T17:30:03.332Z · comments (0)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

[link] Claude 3 Opus can operate as a Turing machine
Gunnar_Zarncke · 2024-04-17T08:41:57.209Z · comments (2)

Introduce a Speed Maximum
jefftk (jkaufman) · 2024-01-11T02:50:04.284Z · comments (28)

[link] Scaling laws for dominant assurance contracts
jessicata (jessica.liu.taylor) · 2023-11-28T23:11:07.631Z · comments (5)

Deeply Cover Car Crashes?
jefftk (jkaufman) · 2023-12-10T22:20:01.133Z · comments (31)

[link] "Model UN Solutions"
Arjun Panickssery (arjun-panickssery) · 2023-12-08T23:06:33.490Z · comments (5)

AI #47: Meet the New Year
Zvi · 2024-01-13T16:20:10.519Z · comments (7)

Please Bet On My Quantified Self Decision Markets
niplav · 2023-12-01T20:07:38.284Z · comments (6)

A Socratic dialogue with my student
lsusr · 2023-12-05T09:31:05.266Z · comments (14)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures (Workshop @ EA Hotel!)
Sahil · 2024-11-01T17:24:09.957Z · comments (2)

[link] My Model of Epistemology
adamShimi · 2024-08-31T17:01:45.472Z · comments (0)

Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (1)

What Helped Me - Kale, Blood, CPAP, X-tiamine, Methylphenidate
Johannes C. Mayer (johannes-c-mayer) · 2024-01-03T13:22:11.700Z · comments (12)

Video and transcript of presentation on Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-10-08T22:30:38.054Z · comments (1)

[link] On Fables and Nuanced Charts
Niko_McCarty (niko-2) · 2024-09-08T17:09:07.503Z · comments (2)

List of strategies for mitigating deceptive alignment
joshc (joshua-clymer) · 2023-12-02T05:56:50.867Z · comments (2)

Secondary Risk Markets
Vaniver · 2023-12-11T21:52:46.836Z · comments (4)

Proposal for improving the global online discourse through personalised comment ordering on all websites
Roman Leventov · 2023-12-06T18:51:37.645Z · comments (21)

Open Thread – Winter 2023/2024
habryka (habryka4) · 2023-12-04T22:59:49.957Z · comments (160)

Book Review: On the Edge: The Gamblers
Zvi · 2024-09-24T11:50:06.065Z · comments (1)

[Valence series] 4. Valence & Social Status (deprecated)
Steven Byrnes (steve2152) · 2023-12-15T14:24:41.040Z · comments (19)

Predictive model agents are sort of corrigible
Raymond D · 2024-01-05T14:05:03.037Z · comments (6)

Open consultancy: Letting untrusted AIs choose what answer to argue for
Fabien Roger (Fabien) · 2024-03-12T20:38:03.785Z · comments (5)

Sparse Autoencoders: Future Work
Logan Riggs (elriggs) · 2023-09-21T15:30:47.198Z · comments (5)

'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata
Mateusz Bagiński (mateusz-baginski) · 2023-11-15T16:00:48.926Z · comments (8)

ARENA 2.0 - Impact Report
CallumMcDougall (TheMcDouglas) · 2023-09-26T17:13:19.952Z · comments (5)

[link] AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks
aogara (Aidan O'Gara) · 2023-10-31T19:34:54.837Z · comments (1)

[link] Hyperreals in a Nutshell
Yudhister Kumar (randomwalks) · 2023-10-15T14:23:58.027Z · comments (27)

Dangers of Closed-Loop AI
Gordon Seidoh Worley (gworley) · 2024-03-22T23:52:22.010Z · comments (9)

[question] What is an "anti-Occamian prior"?
Zane · 2023-10-23T02:26:10.851Z · answers+comments (22)

An explanation for every token: using an LLM to sample another LLM
Max H (Maxc) · 2023-10-11T00:53:55.249Z · comments (5)

How I select alignment research projects
Ethan Perez (ethan-perez) · 2024-04-10T04:33:08.092Z · comments (4)

My Detailed Notes & Commentary from Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:51.894Z · comments (16)

Forecasting AI (Overview)
jsteinhardt · 2023-11-16T19:00:04.218Z · comments (0)

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs
Jan Wehner · 2024-07-14T10:37:21.544Z · comments (5)

[link] OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors
Joel Burget (joel-burget) · 2024-06-13T21:28:18.110Z · comments (10)

Categories of leadership on technical teams
benkuhn · 2024-07-22T04:50:04.071Z · comments (0)

Empirical vs. Mathematical Joints of Nature
Elizabeth (pktechgirl) · 2024-06-26T01:55:22.858Z · comments (1)

Representation Tuning
Christopher Ackerman (christopher-ackerman) · 2024-06-27T17:44:33.338Z · comments (9)

[link] My article in The Nation — California’s AI Safety Bill Is a Mask-Off Moment for the Industry
garrison · 2024-08-15T19:25:59.592Z · comments (0)

How predictive processing solved my wrist pain
max_shen (makoshen) · 2024-07-04T01:56:20.162Z · comments (8)

[link] List of Collective Intelligence Projects
Chipmonk · 2024-07-02T14:10:41.789Z · comments (9)

Agency in Politics
Martin Sustrik (sustrik) · 2024-07-17T05:30:01.873Z · comments (2)

Monthly Roundup #22: September 2024
Zvi · 2024-09-17T12:20:08.297Z · comments (10)

Doomsday Argument and the False Dilemma of Anthropic Reasoning
Ape in the coat · 2024-07-05T05:38:39.428Z · comments (55)

A sketch of acausal trade in practice
Richard_Ngo (ricraz) · 2024-02-04T00:32:54.622Z · comments (4)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

abramdemski on 5 ways to improve CoT faithfulness

Ah, yep, this makes sense to me.

abramdemski on 5 ways to improve CoT faithfulness

Yeah, this is a good point, which doesn't seem addressed by any idea so far.

abramdemski on o1 is a bad idea

Chess is like a bounded, mathematically described universe where all the instrumental convergence stays contained, and only accomplishes a very limited instrumentality in our universe (IE chess programs gain a limited sort of power here by being good playmates).

LLMs touch on the real world far more than that, such that MCTS-like skill at navigating "the LLM world" in contrast to chess sounds to me like it may create a concerning level of real-world-relevant instrumental convergence.

michaeldickens on OpenAI Email Archives (from Musk v. Altman)

OP did the work to collect these emails and put them into a post. When people do work for you, you shouldn't punish them by giving them even more work.

datawitch on Ayn Rand’s model of “living money”; and an upside of burnout

Oh wow, this is almost exactly how I model my internal mind. I didn't realize it was a real thing other people has arrived at. Is there a name for this?

abramdemski on 5 ways to improve CoT faithfulness

Still, IMO, exploiting the frozen planner through adversarial inputs in a single step seems pretty unlikely to be a fruitful strategy for the optimized planner. If the optimized planner is simply trying to accurately convey information to the frozen planner, probably >99% of that information is best to convey through human-understandable text.

Well, I'm not sure. As you mention, it depends on the step size. It also depends on how vulnerable to adversarial inputs LLMs are and how easy they are to find. I haven't looked into the research on this, but it sounds empirically checkable. If there are lots of adversarial inputs which have a wide array of impacts on LLM behavior, then it would seem very plausible that the optimized planner could find useful ones without being specifically steered in that direction.

This is especially likely to hold under the following conditions:

We can also combine this with other proposals, such as paraphrasing.

curiousmeta on Positive Bias: Look Into the Dark

This cognitive phenomenon is usually lumped in with “confirmation bias.” However, it seems to me that the phenomenon of trying to test positive rather than negative examples, ought to be distinguished from the phenomenon of trying to preserve the belief you started with. “Positive bias” is sometimes used as a synonym for “confirmation bias,” and fits this particular flaw much better.

Subtle distinction I almost missed here. Worth expanding.

abramdemski on 5 ways to improve CoT faithfulness

Yeah, you're right, I no longer think it's an interesting proposal.

koratkar on Gwerns

For next time: https://www.lesswrong.com/posts/LfrNFfJFcqnG9WuFf/activated-charcoal-for-hangover-prevention-way-more-than-you [LW · GW] ;)

bjartur-tomas on Gwerns

I was really sleep deprived and slightly intoxicated yesterday and wrote this. It was amusing to me, at least, in the state I was in.