LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

The "context window" analogy for human minds
Ruby · 2024-02-13T19:29:10.387Z · comments (0)
Predictive model agents are sort of corrigible
Raymond D · 2024-01-05T14:05:03.037Z · comments (6)
Empirical vs. Mathematical Joints of Nature
Elizabeth (pktechgirl) · 2024-06-26T01:55:22.858Z · comments (1)
ARENA 2.0 - Impact Report
CallumMcDougall (TheMcDouglas) · 2023-09-26T17:13:19.952Z · comments (5)
Book review: The Quincunx
cousin_it · 2024-06-05T21:13:55.055Z · comments (12)
[link] My article in The Nation — California’s AI Safety Bill Is a Mask-Off Moment for the Industry
garrison · 2024-08-15T19:25:59.592Z · comments (0)
Debate: Is it ethical to work at AI capabilities companies?
Ben Pace (Benito) · 2024-08-14T00:18:38.846Z · comments (21)
[link] My Model of Epistemology
adamShimi · 2024-08-31T17:01:45.472Z · comments (0)
[link] Twitter thread on politics of AI safety
Richard_Ngo (ricraz) · 2024-07-31T00:00:34.298Z · comments (2)
Index of rationalist groups in the Bay Area July 2024
Lucie Philippon (lucie-philippon) · 2024-07-26T16:32:25.337Z · comments (10)
[link] Twitter thread on AI takeover scenarios
Richard_Ngo (ricraz) · 2024-07-31T00:24:33.866Z · comments (0)
Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)
[link] On Fables and Nuanced Charts
Niko_McCarty (niko-2) · 2024-09-08T17:09:07.503Z · comments (2)
What Helped Me - Kale, Blood, CPAP, X-tiamine, Methylphenidate
Johannes C. Mayer (johannes-c-mayer) · 2024-01-03T13:22:11.700Z · comments (12)
Dangers of Closed-Loop AI
Gordon Seidoh Worley (gworley) · 2024-03-22T23:52:22.010Z · comments (7)
My Detailed Notes & Commentary from Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:51.894Z · comments (16)
[link] OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors
Joel Burget (joel-burget) · 2024-06-13T21:28:18.110Z · comments (10)
How predictive processing solved my wrist pain
max_shen (makoshen) · 2024-07-04T01:56:20.162Z · comments (8)
Open consultancy: Letting untrusted AIs choose what answer to argue for
Fabien Roger (Fabien) · 2024-03-12T20:38:03.785Z · comments (5)
Introduce a Speed Maximum
jefftk (jkaufman) · 2024-01-11T02:50:04.284Z · comments (28)
'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata
Mateusz Bagiński (mateusz-baginski) · 2023-11-15T16:00:48.926Z · comments (8)
[Valence series] 4. Valence & Social Status (deprecated)
Steven Byrnes (steve2152) · 2023-12-15T14:24:41.040Z · comments (19)
Sparse Autoencoders: Future Work
Logan Riggs (elriggs) · 2023-09-21T15:30:47.198Z · comments (5)
[link] AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks
aogara (Aidan O'Gara) · 2023-10-31T19:34:54.837Z · comments (1)
Secondary Risk Markets
Vaniver · 2023-12-11T21:52:46.836Z · comments (4)
List of strategies for mitigating deceptive alignment
joshc (joshua-clymer) · 2023-12-02T05:56:50.867Z · comments (2)
Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley (roger-d-1) · 2024-01-11T12:56:29.672Z · comments (4)
Copyright Confrontation #1
Zvi · 2024-01-03T15:50:04.850Z · comments (7)
AI Impacts Survey: December 2023 Edition
Zvi · 2024-01-05T14:40:06.156Z · comments (6)
Direction of Fit
NicholasKees (nick_kees) · 2023-10-02T12:34:24.385Z · comments (0)
What I Learned (Conclusion To "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-20T21:24:37.464Z · comments (0)
Wireheading and misalignment by composition on NetHack
pierlucadoro · 2023-10-27T17:43:41.727Z · comments (4)
[link] math terminology as convolution
bhauth · 2023-10-30T01:05:11.823Z · comments (1)
AI #56: Blackwell That Ends Well
Zvi · 2024-03-21T12:10:05.412Z · comments (16)
[link] AI governance needs a theory of victory
Corin Katzke (corin-katzke) · 2024-06-21T16:15:46.560Z · comments (6)
Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley (gworley) · 2024-07-04T19:04:16.089Z · comments (10)
Intransitive Trust
Screwtape · 2024-05-27T16:55:29.294Z · comments (15)
[link] Why Yudkowsky is wrong about "covalently bonded equivalents of biology"
titotal (lombertini) · 2023-12-06T14:09:15.402Z · comments (40)
[link] legged robot scaling laws
bhauth · 2024-01-20T05:45:56.632Z · comments (8)
Doomsday Argument and the False Dilemma of Anthropic Reasoning
Ape in the coat · 2024-07-05T05:38:39.428Z · comments (55)
Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments
Radford Neal · 2023-12-07T03:33:16.149Z · comments (25)
Trying to deconfuse some core AI x-risk problems
habryka (habryka4) · 2023-10-17T18:36:56.189Z · comments (13)
Unpicking Extinction
ukc10014 · 2023-12-09T09:15:41.291Z · comments (10)
LessWrong: After Dark, a new side of LessWrong
So8res · 2024-04-01T22:44:04.449Z · comments (5)
[link] Robin Hanson & Liron Shapira Debate AI X-Risk
Liron · 2024-07-08T21:45:40.609Z · comments (4)
How I select alignment research projects
Ethan Perez (ethan-perez) · 2024-04-10T04:33:08.092Z · comments (4)
An explanation for every token: using an LLM to sample another LLM
Max H (Maxc) · 2023-10-11T00:53:55.249Z · comments (5)
If You Can Climb Up, You Can Climb Down
jefftk (jkaufman) · 2024-07-30T00:00:06.295Z · comments (9)
We’re not as 3-Dimensional as We Think
silentbob · 2024-08-04T14:39:16.799Z · comments (16)
[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)