LessWrong 2.0 Reader


Executive Director for AIS France - Expression of interest
gergogaspar (gergo-gaspar) · 2024-12-19T08:14:54.023Z · comments (0)

CCing Mailing Lists on External Communication
jefftk (jkaufman) · 2024-12-04T22:00:02.038Z · comments (0)

Force Sequential Output with SCP?
jefftk (jkaufman) · 2024-11-09T12:40:06.098Z · comments (4)

[link] Anthropic teams up with Palantir and AWS to sell AI to defense customers
Matrice Jacobine · 2024-11-09T11:50:34.050Z · comments (0)

[link] Testing Genetic Engineering Detection with Spike-Ins
jefftk (jkaufman) · 2024-10-22T17:20:54.947Z · comments (0)

[link] Markets Are Information - Beating the Sportsbooks at Their Own Game
JJXW · 2024-11-07T20:58:43.389Z · comments (1)

The Bayesian Conspiracy Live Recording
Eneasz · 2024-11-06T16:25:13.380Z · comments (0)

how to rapidly assimilate new information
dhruvmethi · 2024-10-24T02:18:00.648Z · comments (3)

[link] Corrigibility should be an AI's Only Goal
PeterMcCluskey · 2024-12-29T20:25:17.922Z · comments (1)

Derivative AT a discontinuity
Alok Singh (OldManNick) · 2024-10-24T02:48:24.573Z · comments (5)

Testing "True" Language Understanding in LLMs: A Simple Proposal
MtryaSam · 2024-11-02T19:12:34.710Z · comments (2)

Notes from Copenhagen Secular Solstice 2024
Søren Elverlin (soren-elverlin-1) · 2024-12-22T15:08:20.848Z · comments (0)

[question] Has Someone Checked The Cold-Water-In-Left-Ear Thing?
Maloew (maloew-valenar) · 2024-12-28T20:15:35.951Z · answers+comments (0)

[link] Ideologies are slow and necessary, for now
Gabriel Alfour (gabriel-alfour-1) · 2024-12-23T01:57:47.153Z · comments (1)

Near- and medium-term AI Control Safety Cases
Martín Soto (martinsq) · 2024-12-23T17:37:48.860Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

Post-Quantum Investing: Dump Crypto for Index Funds and Real Estate?
G (g-1) · 2024-12-11T11:59:11.062Z · comments (5)

Not all biases are equal - a study of sycophancy and bias in fine-tuned LLMs
jakub_krys (kryjak) · 2024-11-11T23:11:15.233Z · comments (0)

What conclusions can be drawn from a single observation about wealth in tennis?
Trevor Cappallo (trevor-cappallo) · 2024-12-18T09:55:34.923Z · comments (3)

[question] Cryonics considerations: how big of a problem is ischemia?
kman · 2024-12-04T04:45:06.629Z · answers+comments (1)

Consider tabooing "I think"
Adam Zerner (adamzerner) · 2024-11-12T02:00:08.433Z · comments (2)

[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (25)

On Intentionality, or: Towards a More Inclusive Concept of Lying
Cornelius Dybdahl (Kalciphoz) · 2024-10-18T10:37:32.201Z · comments (0)

[link] An Uncanny Moat
Adam Newgas (BorisTheBrave) · 2024-11-15T11:39:15.165Z · comments (0)

Where do you put your ideas?
CstineSublime · 2024-12-17T07:26:06.685Z · comments (20)

Reanalyzing the 2023 Expert Survey on Progress in AI
AI Impacts (AI Imacts) · 2024-12-16T06:10:04.563Z · comments (0)

[question] Is my distinctiveness evidence for being in a simulation?
AynonymousPrsn123 · 2025-01-06T21:27:13.280Z · answers+comments (42)

[link] The Dissolution of AI Safety
Roko · 2024-12-12T10:34:14.253Z · comments (44)

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)

Valence Need Not Be Bounded; Utility Need Not Synthesize
Lorec · 2024-11-20T01:37:20.911Z · comments (0)

[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)

[question] why won't this alignment plan work?
KvmanThinking (avery-liu) · 2024-10-10T15:44:59.450Z · answers+comments (7)

AI Safety Outreach Seminar & Social (online)
Linda Linsefors · 2025-01-08T13:25:23.192Z · comments (0)

Dario Amodei's "Machines of Loving Grace" sound incredibly dangerous, for Humans
Super AGI (super-agi) · 2024-10-27T05:05:13.763Z · comments (1)

Don't Dismiss on Epistemics
ggex · 2024-11-19T00:44:05.329Z · comments (3)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)

[question] Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong
DragonGod · 2024-10-16T10:20:22.133Z · answers+comments (67)

Thoughts On the Nature of Capability Elicitation via Fine-tuning
Theodore Chapman · 2024-10-15T08:39:19.909Z · comments (0)

[link] It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
Gerard Boxo (gerard-boxo) · 2024-10-14T17:04:57.010Z · comments (0)

The grass is always greener in the environment that shaped your values
Karl Faulks (karl-faulks) · 2024-11-17T18:00:15.852Z · comments (0)

[link] Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Jonathan N (derpyplops) · 2024-11-05T01:01:08.083Z · comments (0)

[link] Nerdtrition: simple diets via spreadsheet abuse
dkl9 · 2024-10-27T21:45:15.117Z · comments (0)

[question] What are the primary drivers that caused selection pressure for intelligence in humans?
Towards_Keeperhood (Simon Skade) · 2024-11-07T09:40:20.275Z · answers+comments (15)

Quantum Immortality: A Perspective if AI Doomers are Probably Right
avturchin · 2024-11-07T16:06:08.106Z · comments (53)

Proactive 'If-Then' Safety Cases
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-18T21:16:37.237Z · comments (0)

[question] Why don't we currently have AI agents?
ChristianKl · 2024-12-26T15:26:35.682Z · answers+comments (10)

New UChicago Rationality Group
Noah Birnbaum (daniel-birnbaum) · 2024-11-08T21:20:34.485Z · comments (0)

[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)

Favorite colors of some LLMs.
weightt an (weightt-an) · 2024-12-31T21:22:58.494Z · comments (3)

[link] Riffing on Machines of Loving Grace
an1lam · 2025-01-01T01:06:45.122Z · comments (0)


Recent comments

gurkenglas on When AI 10x's AI R&D, What Do We Do?

Account settings let you set mentions to notify you by email :)

gwern on bilalchughtai's Shortform

An example implementation of this feature is Gwern.net's "link-bibliographies" (eg). We extract all URLs from the Markdown, filter, turn them into a list with the available metadata like title/author/tags/abstract, and because we assign IDs to all links, we can also provide a reverse/backlink '↑' to the context in the page where the link was used (and a link might be used multiple times). Wikipedia links are included, but stuffed into a sublist at the end because they would drown out the regular links. It uses dynamic/lazy transclusion, so it doesn't cost anything if you never look at it, but if you want to print out the page or something, that should also be doable as they get loaded then by the print-mode.

We also plan to include a second version of the link-bibliography, a 'browsing history' version, which quietly logs each link you interact with in a big list at the end of the page (we'll probably put it before the full link-bibliography). So the standard full link-bibliography provides all the URLs, but the browsing-history would provide just the shortlist of URLs you interact with, in the order you interacted with them. The idea is that you could more freely move in and out of popups if you didn't have the anxiety of 'losing' them, because there's an append-only log, and after reading a page, you might skim the browsing-history and open up some of them for further reading or to jog your memory about what you were reading at one point. Since the links are in temporal order, it should be easy to reconstruct your train of thought at any point as you were reading. (You could also use it to create a sort of 'custom bibliography', where you pop up a small subset of links focused on some particular claim or thesis, and you can save that to PDF or something.) Since it's all transcludes, the browsing-history is also effectively free (you already paid the cost of downloading & rendering each entry when you popped it up the first time).
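The pipeline described in this comment (extract URLs from Markdown, filter, assign IDs for backlinks, push Wikipedia links into a trailing sublist, and keep an append-only browsing history) can be illustrated with a rough sketch. This is not Gwern.net's actual code: the names `build_link_bibliography`, `Entry`, and `BrowsingHistory`, the `link-N` ID scheme, the simple regex, and the substring test for Wikipedia are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Matches inline Markdown links of the form [text](url).
# Simplification: reference-style links and URLs containing ")" are not handled.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


@dataclass
class Entry:
    url: str
    title: str
    # One ID per use of the link in the page, so each use can get a backlink.
    anchors: list = field(default_factory=list)


def build_link_bibliography(markdown, metadata=None):
    """Return (regular, wikipedia) lists of entries in order of first use.

    `metadata` is a hypothetical url -> title mapping; when a URL is absent,
    the link text is used as the title.
    """
    metadata = metadata or {}
    entries = {}
    for n, match in enumerate(LINK_RE.finditer(markdown), start=1):
        text, url = match.group(1), match.group(2)
        # Filter step: keep only absolute web links.
        if not url.startswith(("http://", "https://")):
            continue
        entry = entries.setdefault(url, Entry(url, metadata.get(url, text)))
        entry.anchors.append(f"link-{n}")
    # Wikipedia links go into a separate sublist so they do not drown out the rest.
    regular = [e for e in entries.values() if "wikipedia.org" not in e.url]
    wikipedia = [e for e in entries.values() if "wikipedia.org" in e.url]
    return regular, wikipedia


class BrowsingHistory:
    """Append-only log of the links a reader interacted with, in temporal order."""

    def __init__(self):
        self._log = []

    def record(self, url):
        self._log.append(url)

    def entries(self):
        # Return a copy so callers cannot rewrite the log.
        return list(self._log)
```

A link used several times yields one entry with several anchors, which is what makes one backlink per use possible; the history deliberately keeps repeats, since the order of interaction is the point.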

koki on Koki's Shortform

Regarding the GPU export restrictions, the U.S. government is trying to outcompete China. However, it seems that Big Tech CEOs, including Sam Altman, have something else in mind. They figure that as long as the U.S. can develop ASI within the next five years—before China recovers from the export restrictions—that’s all that matters. And once it’s built, they don’t really care what happens next: even if X/S-risks arise, it’s still better if it’s their own country that causes them rather than China. If the whole world is going to be dragged into it, they’d rather do it themselves.

magic9mushroom on What’s the short timeline plan?

There is no war in the run-up to AGI that would derail the project, e.g. by necessitating that most resources be used for capabilities instead of safety research.

Assuming short timelines, I think it’s likely impossible to reach my desired levels of safety culture.

I feel obliged to note that a nuclear war, by dint of EMPs wiping out the power grid, would likely remove private AI companies as a thing for a while, thus deleting their current culture. It would also lengthen timelines.

Certainly not ideal in its own right, though.

technicalities on Shallow review of technical AI safety, 2024

Done, thanks!

jbash on When AI 10x's AI R&D, What Do We Do?

Sorry; I'm not in the habit of reading the notifications, so I didn't see the "@" tag.

I don't have a good answer (which doesn't change the underlying bad prospects for securing the data). I think I'd tend to prefer "mitigating risks after potential model theft", because I believe "convince key actors" is fundamentally futile. The kind of security you'd need, if it's possible, would basically shut them down. Which is equivalent to abandoning the "key actor" role to whoever does not implement that kind of security.

Unfortunately, "key actors" would also have to be convinced to "mitigate risks", which they're unlikely to do because that would require them to accept that their preventative measures are probably going to fail. So even the relatively mild "go ahead and do it, but don't expect it to work" is probably not going to happen.

jbash on What are some scenarios where an aligned AGI actually helps humanity, but many/most people don't like it?

Well, OK, but you also said "actually helps humanity", which assumes some kind of outside view. And you used "aligned" without specifying any particular one of the conflicting visions of "alignment" that are out there.

I absolutely agree that "aligned with whom" is a huge issue. It's one of the things that really bugs me about the word.

I do also agree that there are going to be irreconcilable differences, and that, barring mind surgery to change their opinions, many people will be unhappy with whatever happens. That applies no matter what an AI does, and in fact no matter what anybody who's "in charge" does. It applies even if nobody is in charge. But if somebody is in charge, it's guaranteed that a lot of people will be very angry at that somebody. Sometimes all you can change is who is unhappy.

For example, a whole lot of Christians, Muslims, and possibly others believe that everybody who doesn't wholeheartedly accept their religion is not only wrong, but also going to suffer in hell for eternity. Those religions are mutually contradictory at their cores. And a probably smaller but still large number of atheists believe that all religion is mindrot that intrinsically reduces the human dignity of anybody who accepts it.

You can't solve that, no matter how smart you are. Favor one view and the other view loses. Favor none, and the other views say that a bunch of people are seriously harmed, even if it's voluntary. It doesn't even matter how you favor a view. Gentle persuasion is still a problem. OK, technically you can avoid people being mad about it after the fact by extreme mind surgery, but you can't reconcile their original values. You can prevent violent conflict by sheer force, but you can't remove the underlying issue.

Still, a lot of the approaches you describe are pretty ham-handed even if you agree with the underlying values. Some of the desired outcomes you list even sound to me like good ideas... but you ought to be able to work toward those goals, even achieve them, without doing it in a way that pisses off the maximum possible number of people. So I guess I'm reacting to the extreme framing and the extreme measures. I don't think the Taliban actively want people to be mad.

[Edited unusually heavily after posting because apparently I can't produce coherent, low-typo text in the morning]

meedstrom on CFAR Takeaways: Andrew Critch

Basically agree, but not a useful comment.

I'd nuance that: being alive and energetic is fun -- but when my body no longer grants energy, it's like death already. Say I'm trying to take notes about the content of this thread, but I'm so tired I barely produce anything. If the terms of my body are such that I must first do a timeskip to tomorrow to get more energy, then I want the timeskip.

I guess I understand becoming sleep-deprived and staying up anyway if you don't notice your IQ dropping...

mikbp on Is Musk still net-positive for humanity?

Oh, it is probably my mistake XD I'm also not native. I meant increase, not that it is the maximum it could be, sorry.

aram-panasenco on AGI Ruin: A List of Lethalities

I really appreciate this post, as much as it's making me feel that I and everyone I care about have terminal cancer with only 12-60 months to live.

I found the idea that a pivotal act is necessary especially valuable and expanded on it in my post [Is AI Alignment Enough?](https://www.lesswrong.com/posts/tdrK7r4QA3ifbt2Ty/is-ai-alignment-enough)