LessWrong 2.0 Reader

On the 2nd CWT with Jonathan Haidt
Zvi · 2024-04-05T17:30:05.223Z · comments (3)
[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)
[link] Impact in AI Safety Now Requires Specific Strategic Insight
MiloSal (milosal) · 2024-12-29T00:40:53.780Z · comments (1)
Book Summary: Zero to One
bilalchughtai (beelal) · 2024-12-29T16:13:52.922Z · comments (2)
How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots
Owain_Evans · 2024-03-28T02:34:21.799Z · comments (0)
[link] my favourite Scott Sumner blog posts
DMMF · 2024-06-11T14:40:43.093Z · comments (0)
AI #57: All the AI News That’s Fit to Print
Zvi · 2024-03-28T11:40:05.435Z · comments (14)
[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)
Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
claudia.biancotti · 2024-11-18T09:38:35.723Z · comments (4)
[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)
Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)
Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)
Preface
Allison Duettmann (allison-duettmann) · 2025-01-02T18:59:46.290Z · comments (1)
Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)
Ackshually, many worlds is wrong
tailcalled · 2024-04-11T20:23:59.416Z · comments (42)
[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)
What is the best argument that LLMs are shoggoths?
JoshuaFox · 2024-03-17T11:36:23.636Z · comments (22)
AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)
[link] Secret US natsec project with intel revealed
Nathan Helm-Burger (nathan-helm-burger) · 2024-05-25T04:22:11.624Z · comments (0)
[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)
Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)
Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (8)
Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)
$250K in Prizes: SafeBench Competition Announcement
ozhang (oliver-zhang) · 2024-04-03T22:07:41.171Z · comments (0)
Smartphone Etiquette: Suggestions for Social Interactions
Declan Molony (declan-molony) · 2024-06-04T06:01:03.336Z · comments (4)
My Dating Heuristic
Declan Molony (declan-molony) · 2024-05-21T05:28:40.197Z · comments (4)
SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)
Ideas for Next-Generation Writing Platforms, using LLMs
ozziegooen · 2024-06-04T18:40:24.636Z · comments (4)
Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann (Stuckwork) · 2024-12-19T15:59:00.036Z · comments (4)
[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)
[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)
Mask and Respirator Intelligibility Comparison
jefftk (jkaufman) · 2024-12-07T03:20:01.585Z · comments (5)
[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)
Intranasal mRNA Vaccines?
J Bostock (Jemist) · 2025-01-01T23:46:40.524Z · comments (2)
[link] overengineered air filter shelving
bhauth · 2024-11-08T22:04:39.987Z · comments (2)
Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)
Trying Bluesky
jefftk (jkaufman) · 2024-11-17T02:50:04.093Z · comments (17)
Action derivatives: You’re not doing what you think you’re doing
PatrickDFarley · 2024-11-21T16:24:04.044Z · comments (0)
[link] Introducing the Anthropic Fellows Program
Miranda Zhang (miranda-zhang) · 2024-11-30T23:47:29.259Z · comments (0)
AI #93: Happy Tuesday
Zvi · 2024-12-04T00:30:06.891Z · comments (2)
[link] Let's Design A School, Part 2.1 School as Education - Structure
Sable · 2024-05-02T22:04:30.435Z · comments (3)
Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)
[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)
[question] Thoughts on Francois Chollet's belief that LLMs are far away from AGI?
O O (o-o) · 2024-06-14T06:32:48.170Z · answers+comments (17)
LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)
Bayesian inference without priors
DanielFilan · 2024-04-24T23:50:08.312Z · comments (8)
Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)
Vote in the LessWrong review! (LW 2022 Review voting phase)
habryka (habryka4) · 2024-01-17T07:22:17.921Z · comments (9)
The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)
[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)