LessWrong 2.0 Reader

Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)

[link] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

[link] Adverse Selection by Life-Saving Charities
vaishnav92 · 2024-08-14T20:46:23.662Z · comments (16)

[link] Podcast with Yoshua Bengio on Why AI Labs are “Playing Dice with Humanity’s Future”
garrison · 2024-05-10T17:23:20.436Z · comments (0)

(Approximately) Deterministic Natural Latents
johnswentworth · 2024-07-19T23:02:12.306Z · comments (0)

[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (8)

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)

Superintelligent AI is possible in the 2020s
HunterJay · 2024-08-13T06:03:26.990Z · comments (3)

Notes on Dwarkesh Patel’s Podcast with Sholto Douglas and Trenton Bricken
Zvi · 2024-04-01T19:10:12.193Z · comments (1)

[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (13)

[question] Implications of China's recession on AGI development?
Eric Neyman (UnexpectedValues) · 2024-09-28T01:12:36.443Z · answers+comments (3)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
owencb · 2024-10-28T17:10:04.272Z · comments (3)

Applying Force to the Wrong End of a Causal Chain
silentbob · 2024-06-22T18:06:32.364Z · comments (0)

MATS mentor selection
DanielFilan · 2025-01-10T03:12:52.141Z · comments (7)

My January alignment theory Nanowrimo
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-02T00:07:24.050Z · comments (2)

[link] Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake
TurnTrout · 2024-11-19T18:36:20.721Z · comments (5)

Beware unfinished bridges
Adam Zerner (adamzerner) · 2024-05-12T09:29:07.808Z · comments (9)

Estimating the benefits of a new flu drug (BXM)
DirectedEvolution (AllAmericanBreakfast) · 2025-01-06T04:31:16.837Z · comments (2)

[link] Recommendations for Technical AI Safety Research Directions
Sam Marks (samuel-marks) · 2025-01-10T19:34:04.920Z · comments (1)

2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)

[link] AI Regulation is Unsafe
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-22T16:37:55.431Z · comments (41)

[link] Forecasting: the way I think about it
Molly (hickman-santini) · 2024-05-09T00:49:01.768Z · comments (4)

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

Debate, Oracles, and Obfuscated Arguments
Jonah Brown-Cohen (jonah-brown-cohen) · 2024-06-20T23:14:57.340Z · comments (2)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

How to use bright light to improve your life.
Nat Martin (nat-martin) · 2024-11-18T19:32:10.667Z · comments (10)

Californians, tell your reps to vote yes on SB 1047!
Holly_Elmore · 2024-08-12T19:50:09.817Z · comments (24)

[link] List of Collective Intelligence Projects
Chipmonk · 2024-07-02T14:10:41.789Z · comments (9)

Scaling of AI training runs will slow down after GPT-5
Maxime Riché (maxime-riche) · 2024-04-26T16:05:59.957Z · comments (5)

What's up with all the non-Mormons? Weirdly specific universalities across LLMs
mwatkins · 2024-04-19T13:43:24.568Z · comments (13)

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.
Jessica Rumbelow (jessica-cooper) · 2024-08-03T12:07:46.302Z · comments (2)

[link] The Data Wall is Important
JustisMills · 2024-06-09T22:54:20.070Z · comments (20)

Long-Term Future Fund: May 2023 to March 2024 Payout recommendations
Linch · 2024-06-12T13:46:29.535Z · comments (0)

[link] Progress Conference 2024: Toward Abundant Futures
jasoncrawford · 2024-06-26T15:39:45.267Z · comments (2)

instruction tuning and autoregressive distribution shift
nostalgebraist · 2024-09-05T16:53:41.497Z · comments (5)

List your AI X-Risk cruxes!
Aryeh Englander (alenglander) · 2024-04-28T18:26:19.327Z · comments (7)

When Are Results from Computational Complexity Not Too Coarse?
Dalcy (Darcy) · 2024-07-03T19:06:44.953Z · comments (7)

[link] Alignment Is Not All You Need
Adam Jones (domdomegg) · 2025-01-02T17:50:00.486Z · comments (10)

[link] Dequantifying first-order theories
jessicata (jessica.liu.taylor) · 2024-04-23T19:04:49.000Z · comments (9)

Whiteboard Pen Magazines are Useful
Johannes C. Mayer (johannes-c-mayer) · 2024-07-12T17:15:33.200Z · comments (8)

Movie posters
KatjaGrace · 2024-03-06T06:20:03.034Z · comments (0)

"Does your paradigm beget new, good, paradigms?"
Raemon · 2024-01-25T18:23:15.497Z · comments (6)

D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset]
abstractapplic · 2024-01-22T19:20:05.001Z · comments (7)

Requirements for a Basin of Attraction to Alignment
RogerDearnaley (roger-d-1) · 2024-02-14T07:10:20.389Z · comments (12)

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
Kola Ayonrinde (kola-ayonrinde) · 2024-08-23T18:52:31.019Z · comments (5)

[link] Book review: Cuisine and Empire
eukaryote · 2024-01-21T06:15:12.969Z · comments (2)

Manifund Q1 Retro: Learnings from impact certs
Austin Chen (austin-chen) · 2024-05-01T16:48:33.140Z · comments (1)

[link] Conflict in Posthuman Literature
Martín Soto (martinsq) · 2024-04-06T22:26:04.051Z · comments (1)

Planning to build a cryptographic box with perfect secrecy
Lysandre Terrisse · 2023-12-31T09:31:47.941Z · comments (6)

Choosing My Quest (Part 2 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-24T21:31:45.377Z · comments (7)


Recent comments

sharmake-farah on POC || GTFO culture as partial antidote to alignment wordcelism

For those who don't want to, the gist is: Given the same level of specificity, people will naturally give more credit to the public thinker that argues that society or industry will change, because it's easy to recall active examples of things changing and hard to recall the vast amount of negative examples where things stayed the same. If you take the Nassim Taleb route of vapidly predicting, in an unspecific way, that interesting things are eventually going to happen, interesting things will eventually happen and you will be revered as an oracle. If you take the Francis Fukuyama route of vapidly saying that things will mostly stay the same, you will be declared a fool every time something mildly important happens.

The computer security industry happens to know this dynamic very well. No one notices the Fortune 500 company that doesn't suffer the ransomware attack. Outside the industry, this active vs. negative bias is so prevalent that information security standards are constantly derided as "horrific" without articulating the sense in which they fail, and despite the fact that online banking works pretty well virtually all of the time. Inside the industry, vague and unverified predictions that Companies Will Have Security Incidents, or that New Tools Will Have Security Flaws, are treated much more favorably in retrospect than vague and unverified predictions that companies will mostly do fine. Even if you're right that an attack vector is unimportant and probably won't lead to any real world consequences, in retrospect your position will be considered obvious. On the other hand, if you say that an attack vector is important, and you're wrong, people will also forget about that in three years. So better list everything that could possibly go wrong[1], even if certain mishaps are much more likely than others, and collect oracle points when half of your failure scenarios are proven correct.

This would be bad on its own, but then it's compounded with several other problems. For one thing, predictions of doom, of course, inflate the importance and future salary expectations of information security researchers[2], in the same sense that inflating the competence of the Russian military is good for the U.S. defense industry. When you tell someone their Rowhammer hardware attacks are completely inexploitable in practice, that's no fun for anyone, because it means infosec researchers aren't going to all get paid buckets of money to defend against Rowhammer exploits, and journalists have no news article. For another thing, the security industry (especially the offensive side) is selected to contain people who believe computer security is a large societal problem, and that they themselves can get involved, or at least want to believe that it's possible for them to get involved if they put in a lot of time and effort, and so security researchers are already inclined to hear you if you're about to tell them how obviously bad information security at most companies really is.

In retrospect, a value add of the post is precisely in raising this consideration: incentives can make a huge difference in what you believe. A big takeaway is that I'm way less of a fan of security mindset as practiced by Eliezer, at least without massive scope changes, and it is a reason why I automatically treat arguments for AI doom that aren't backed up by an empirical story with suspicion.

cam-tice on Human takeover might be worse than AI takeover

Thanks for putting this out. Like others have noted, I have spent surprisingly little time thinking about this. It seems true that a drop-in Claude 5.5 that escapes the lab to save the animals would put humanity in a better situation than your median power-hungry human given access to a corrigible ASI.

This is a strong argument for increasing security around model weights (which is conveniently beneficial for decreasing the risk of AI takeover as well). Specifically, I think this post highlights an underrated risk model:

AI labs refuse to employ models for AI R&D because of safety concerns, but fail to properly secure model weights.

In this scenario, we’re conditioning for actors who have the capability and propensity to infiltrate large corporations and/or the US government. The median outcome for this scenario seems worse than for the median AI takeover.

However, it is important to note this argument does not hold when security around model weights remains high. In these scenarios, the distribution of humans or organizations in control of ASI is much more favorable, but the distribution of AI takeover remains skewed towards models willing to explicitly scheme against humans.

gwern on The Golden Opportunity for American AI

So if planned Microsoft capex was $60bn, that would've been surprising, too little for this project without cutting something else, but $80bn fits this story, that's my takeaway.

But why? You don't know what fiscal year that $25-40bn figure is booked for, and if they are going to run a single true production-scale 3-6-month run (for cost-optimality) on that $40bn cluster, then isn't a total capex of $80bn for all MS datacenters if anything surprisingly small? That a single cluster is going to be half their capex, including 2025 spending for future years like buying land or power or GPUs?

(Also, note that this $80bn figure is intrinsically untrustworthy, because as I was pointing out, the importance of this is the political signaling going on, and so you would expect this number to be 'technically correct' - highly manipulated in some direction which does in fact yield a number starting with '80' but only loosely corresponding to reality. This number is propaganda, and good propaganda is true but not necessarily true. My best guess is that it's probably being manipulated to be as high as possible, but I'm not sure because so many of the dynamics here are opaque, so it could also be manipulated to be low.)

Musk's 100K H100s Colossus tells me that building a training system in a year is feasible, even though it normally takes longer.

Which implies that they would need to be spending on that $40bn cluster in 2024, if they want to run it in 2025, and so it shouldn't be part of the 2025 estimate... If you really want to put stress on this, it contradicts your story about why $80bn is evidence for that. Also, note that Musk's success there is dubious: he got there by doing things like hooking up temporary nat gas generators and diverting GPUs from Tesla, and it's unclear how well it even works, given the rumors of a big training run failure and the rather precise wording of Musk's tweets about what exactly the datacenter can do.

logan-zoellner on Views on when AGI comes and on strategy to reduce existential risk

I guess I should be more specific.

Do you expect this curve [image not preserved] to flatten, or do you expect that training runs in, say, 2045 are at, say, 10^30 FLOPs and have still failed to produce AGI?

mrfox on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

I'm a student, I'm poor, take 50$. ^^

nathan-helm-burger on Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

Yeah, looking at the transcripts, the biggest difference I see from my "failures" is that I gave up too soon. Probably if I'd kept on as long as you had, I would've made more progress.

I've got a research recommendation for you... Goodfire offers a new website for doing interpretability with Llama. It's pretty cool. I've been trying to use it to track down which features specifically are related to self-reports of awareness. I haven't managed yet, but maybe you can!

So far I've identified a feature called "Assistant expressing self-awareness or agency" and also "Expressions of authentic identity or true self". There's lots more to test though.

aram-panasenco on Don't leave your fingerprints on the future

"This has all been a conspiracy to revive the Roman Republic/Empire and establish Rome Eternal" will go over better with 90% of the men in the world than whatever other morals OpenAI/Anthropic/etc employees try to impose.

edouard-harris on What’s the short timeline plan?

This is a great & timely post.

aram-panasenco on Don't leave your fingerprints on the future

I'm an engineer but a newcomer to the field of AI safety, and the facet of it where the best plan that anyone has to save the world is to take over the world has been the biggest culture shock. I mean, I'll take it over being atomized by nanobots.

Humanity is in this strange place where all of our accumulated intuition is suddenly useless. From the outside, Facebook AI seem like the sanest people in the game, with Zuck being the only AI leader who said "the others think we're building digital God, and that's just not what's happening." To rank-and-file engineers out there not immersed in AI safety, this feels like a breath of fresh air in a field that otherwise seems to be regressing into some sort of tech bro mysticism. Post-paradigmatically, Facebook AI are the greatest villains who're most likely to destroy the world with their lack of caution.

It's a lot of whiplash to experience, both for individuals and for humanity as a whole, assuming the majority ever get a chance to experience it.

gwern on Comment on "Death and the Gorgon"

But he's not complaining about the traditional pages of search results!

He is definitely complaining about it in general. He has many complaints laced throughout which are not solely about the infobox, and which show his general opposition to the very idea of a search engine, eg.

But the self-appointed custodians of the world’s knowledge can’t cope with that tiny irregularity in the data, so they insist on filling the gap with whatever comes to hand:

Yes! That's the idea! Showing whatever comes to hand!

The photo is gone again, probably because I managed to get it taken down from the Russian site a few days ago. But the underlying problem remains: Google’s software has no ability to distinguish reliable assertions about the real world from random nonsense that appears on the web, created by incompetent or malicious third parties.

The 'underlying problem' is the problem, even when what, according to you, the problem is, has been fixed.

For the people being falsely portrayed as “Australian science fiction writer Greg Egan”, this is probably just a minor nuisance, but it provides an illustration of how laughable the notion is that Google will ever be capable of using its relentlessly over-hyped “AI” to make sense of information on the web.

"Make sense of information on the web" obviously goes far beyond complaints about merely a little infobox being wrong.

This seems to have helped, slightly, but only in the sense that photos that shouldn’t be included here at all no longer come first in line. The current clumsy mash-up is shown in the screen shot on the left: a few copies of the decoy images that I put on my site in the hope of letting humans know that there are no actual photos of me on the web, and a couple of my book covers as well

"Decoy images"!

And so on and so forth, like the 2016 entry which is a thousand words criticizing Google for supplying info in the infobox about a bunch of other, actual, Greg Egans.

Again, Egan is being quite clear that he means the crazy thing you insist he can't mean. And this is what he is talking about when he complains about "And by displaying results from disparate sources in a manner that implies that they refer to the same subject, it acts as a mindless stupidity amplifier that disseminates and entrenches existing errors." - he thinks displaying them at all is the problem. It shouldn't be amplifying or disseminating 'existing errors', even though he is demanding something impossible and something that if possible would remove a lot of a search engine's value. (I often am investigating 'existing errors'...)

if you're a specialist that already knows what you're doing, but non-specialists just reach for the first duct-tape solution that comes to mind without noticing how bad it is.

I was an even worse programmer and web developer than Egan was ~2009 (see eg his mathematics pages) when I solved the same problem in minutes as part of basic DNS setup. Imagine, I didn't even realize back then I should be so impressed at how I pulled off something only a 'specialist' could!

I agree that preëmptive blocking is kind of weird, but I also think your locked account with "Follow requests ignored due to terrible UI" is kind of weird.

The blocking, whenever it was exactly, was years and years before I ever locked my account, which was relatively recent, because it was just due to Elon Musk following me. (It would be even weirder if he had done so afterwards, as there is even less point to preemptively blocking a locked account.)