LessWrong 2.0 Reader

[link] Thoughts on seed oil
dynomight · 2024-04-20T12:29:14.212Z · comments (114)

[link] My hour of memoryless lucidity
Eric Neyman (UnexpectedValues) · 2024-05-04T01:40:56.717Z · comments (16)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (15)

Funny Anecdote of Eliezer From His Sister
Daniel Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (4)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (75)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (91)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (14)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (36)

Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (34)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (15)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (43)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (11)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (14)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (6)

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (8)

[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (10)

[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (4)

Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (16)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (9)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (9)

Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (10)

[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)

[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (7)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (5)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (1)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (12)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (9)

Introducing AI-Powered Audiobooks of Rational Fiction Classics
Askwho · 2024-05-04T17:32:49.719Z · comments (12)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (9)

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)

Take the wheel, Shoggoth! (LW frontpage algorithm experiments)
Ruby · 2024-04-23T03:58:43.443Z · comments (16)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (2)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (35)

Deep Honesty
Aletheophile (aletheo) · 2024-05-07T20:31:48.734Z · comments (6)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (10)

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (12)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot_Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

[link] Let's Design A School, Part 1
Sable · 2024-04-23T21:50:20.937Z · comments (3)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

Spatial attention as a “tell” for empathetic simulation?
Steven Byrnes (steve2152) · 2024-04-26T15:10:58.040Z · comments (11)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (12)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

Recent comments

itay-dreyfus on Designing for a single purpose

That's interesting, thanks.

Reminds me of Small Giants, which is a very similar concept:
https://museapp.com/podcast/24-small-giants/

akash-wasil on RobertM's Shortform

I haven't followed this in great detail, but I do remember hearing from many AI policy people (including people at the UKAISI) that such commitments had been made.

It's plausible to me that this was an example of "miscommunication" rather than "explicit lying." I hope someone who has followed this more closely provides details.

But note that I personally think that AGI labs have a responsibility to dispel widely-believed myths. It would shock me if OpenAI/Anthropic/Google DeepMind were not aware that people (including people in government) believed that they had made this commitment. If you know that a bunch of people think you committed to sending them your models, and your response is "well technically we never said that but let's just leave it ambiguous and then if we defect later we can just say we never committed", I still think it's fair for people to be disappointed in the labs.

(I do think this form of disappointment should not be conflated with "you explicitly said X and went back on it", though.)

review-bot on Reconsider the anti-cavity bacteria if you are Asian

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

Contra both the 'doomers' and the 'optimists' on (not) pausing. Rephrased: RSPs (done right) seem right.

Contra 'doomers'. Oversimplified, 'doomers' (e.g. PauseAI, FLI's letter, Eliezer) ask(ed) for pausing now or even earlier (e.g. the Pause Letter). I expect this would be / have been very much suboptimal, even purely in terms of solving technical alignment. For example, Some thoughts on automating alignment research suggests that timing the pause so that we can use automated AI safety research could result in '[...] each month of lead that the leader started out with would correspond to 15,000 human researchers working for 15 months.' We clearly don't have such automated AI safety R&D capabilities now, which suggests that pausing later, when AIs are closer to having the required automated AI safety R&D capabilities, would be better. At the same time, current models seem very unlikely to be x-risky (e.g. they're still very bad at passing dangerous capabilities evals), which is another reason to think pausing now would be premature.

Contra 'optimists'. I'm less sure here, but the vibe I'm getting from e.g. AI Pause Will Likely Backfire (Guest Post) is roughly 'no pause ever', largely based on arguments that current systems seem easy to align / control. While I agree that current systems do seem easy to align / control, and I could even see this holding all the way up to ~human-level automated AI safety R&D, I can easily see scenarios where things get scary quickly around that time without any pause. For example, arguments similar to those about the scalability of automated AI safety R&D suggest that automated AI capabilities R&D could also be scaled up significantly. Concretely, figures like those in Before smart AI, there will be many mediocre or specialized AIs suggest that very large populations of ~human-level automated AI capabilities researchers could be deployed (e.g. 100x larger than the current [human] population of AI researchers). Given that, even with the current relatively small population, algorithmic progress seems to double LM capabilities ~every 8 months, it seems like algorithmic progress could be much faster with 100x larger populations, potentially leading to new setups (e.g. new AI paradigms, new architectures, new optimizers, synthetic data, etc.) which could quite easily break the properties that make current systems seem relatively easy / safe to align. In this scenario, pausing to get this right (especially since automated AI safety R&D would also be feasible) seems like it could be crucial.

jiao-bu on Dating Roundup #3: Third Time’s the Charm

There is something I have been exploring since getting back into the dating market in the USA after more than a decade of blessed expatriatism; I am currently seeing people and exploring all this.

Culturally, what are women supposed to do for men? No stative verbs (am/is/are/was/were/be/being/been), no nouns, no adjectives: what are the top 5 action verbs that women should be doing for a man, such that if she isn't, there should be a good reason, or maybe he's going to leave? Or even 5 or 6 important ones, or even mundane-but-expected ones? I can think of a list with regard to men, some of which are simple, like holding the door or bringing flowers, and some of which are complex (like the point above about flowing well with money)... but what verbs are totally important and expected for women to do?

I think it's a disservice to women not to have some explicit expectations, or even to set bars. But the answers could also just be in my own blind spot. I'm curious and I hope the question is appropriate here.

tigerlily on How would you navigate a severe financial emergency with no help or resources?

Thanks, I already tried DataAnnotation a few months back. It's true that it's difficult to get assignments there after you finish the first one or two. They supposedly have tons of work for people who specialize in certain tech roles, but that obviously won't apply to most people. There is also virtually no way to contact anyone who works at DataAnnotation if you have questions. But I have made a few dollars there.

darklight on Cooperation is optimal, with weaker agents too - tldr

I already tried discussing a very similar concept I call Superrational Signalling in this post. It got almost no attention, and I have doubts that Less Wrong is receptive to such ideas.

I also tried programming a game-theoretic simulation to test the idea, which you can find here along with code and an explanation. I haven't gotten around to making a full post about it, though (just a shortform).

mako-yass on Cohabitive Games so Far

(I'm aware of most of these games)

I made it pretty clear in the article that it isn't about purely cooperative games. (Though I wonder if they'd be easier to adapt. Cooperative + complications seems closer to the character of a cohabitive game than competitive + non-zero-sum score goals do...)

Gloomhaven seems to be, and describes itself as, a cooperative game. What competitive elements are you referring to?

The third tier is worth talking about. I think these sorts of games might, if you played them enough, teach the same skills, but I think you'd have to play them for a long time. My expectation is that basically all of them end with a ranking: as you said, first, second, third. The ranking isn't scored (i.e., we aren't told that being second is half as good as being first), so there's not much clarity about how much players should value each place, which is one obstacle to learning. Rankings also keep the game zero-sum on net, and zero-sum dynamics between first and second, or between first and the alliance, hold the focus of your attention most of the time. The fewer or more limited the mutually beneficial deals are, the less social learning there will be. Zero-sum dynamics need to be discussed in cohabitive games, but the games will support more efficient learning if those dynamics are reduced.
And there really are a lot of people who think that the game humans are playing in the real world is zero-sum, that all real games are zero-sum, so I also suspect that these sorts of games might never teach the skill: to teach it, you have to show players a way out of that mindset, and all these games do is reinforce it.

competitive [...] not usually permanent alliances are critical to victory: Diplomacy, Twilight Imperium (all of them), Cosmic Encounter

This category is really interesting, because the alliances expire and have to be remade multiple times per game, and I've been meaning to play some games from this category. But they're also a lot more foggy: the agreements are of poor quality, and they invite only limited amounts of foresight and social creativity. In contrast, writing good legislation in the real world seems to require more social creativity than we can currently produce.

slapstick on Dating Roundup #3: Third Time’s the Charm

Very interesting post! I enjoyed it! Just had some thoughts about the poly section.

If you are polyamorous, and you meet someone plausibly 25% better, or even someone 0% better (I mean the person you are with is pretty good, no?) you are honor bound to try and make it happen.

I'm not sure why you'd be honour bound to make that work. Maybe the phrasing is just hyperbolic, but I don't think refraining from pursuing a romantic relationship damages your poly honour.

Most people are not hyper-skilled in anything. Certainly they are not hyper-skilled in communication, emotional regulation and self-awareness.

If you define "hyper-skilled" as "way more skilled than average", then what you're saying is true by definition. If it's not defined relative to everyone else in a given culture, I think you can certainly say most people are hyper-skilled at communication, emotional regulation and self-awareness in the ways their culture requires of them.

For example, most people in highly religious/authoritarian cultures are adept at those social skills which prevent them from being ostracized and condemned. Not reacting violently to insults would be considered hyper skilled in some cultures whereas it's the minimum in others.

With that in mind, I don't think polyamory is as unrealistic or as demanding in its requirements as you make it out to be. People tend to become hyper-skilled socially when it's a requirement for what they're doing, and when it's normalized within their culture. If other structures are in place to replace the requirement for those particular skills, they won't develop.

Polyamory probably selects for people who are socially skilled in the ways that help with polyamory, but being polyamorous also helps to develop those skills.

I think it's fair to say that for many or most people it would be too costly to try to switch from monogamy towards polyamory when they've already been highly invested in developing their monogamy toolbox. I think that's very different from saying only a small percentage of people have the capacity/potential to flourish being poly.

Scott then follows up with a highlights from the comments, where the arguments against polyamory seem convincing

I read most of the comments, and I think pretty much all of the arguments against polyamory are coming from monogamous people with very limited or no experience with polyamory or polyamorous people. Not to say that discredits their arguments, but I'm typically pretty sceptical of arguments about lifestyles that are widely considered distasteful, coming from people who are far removed from those lifestyles, based on a couple of anecdotes, if any.

Monogamous people are also already having way fewer children, and the type of person deciding to be polyamorous probably correlates pretty strongly with the type of person already deciding not to have kids. I don't think there are really good arguments that kids of poly people will be worse off; most of those arguments refer to practices which aren't essential to being poly. Many of the arguments appeal to reference classes that aren't particularly applicable to a scenario where things are being done intentionally and with care, as opposed to as a result of scarcity, neglect, and unforeseen challenging circumstances.

davidmanheim on How do top AI labs vet architecture/algorithm changes?

Most of this seems to be subsumed in the general question of how you do research, and there's a lot of advice, but it's (ironically) not at all a science. From my limited understanding of what goes on in the research groups inside these companies, it's a combination of research intuition, small-scale testing, checking with others and discussing the new approach, validating your ideas, and getting buy-in from people higher up that it's worth your and their time to try the new idea. Which is the same as research generally.

At that point, I'll speculate and assume whatever idea they have is validated in smaller but still relatively large settings. For things like sample efficiency, they might, say, train a GPT-3-size model, which now costs only a fraction of the researcher's salary to do. (Yes, I'm sure they all have very large compute budgets for their research.) If the results are still impressive, I'm sure there is lots more discussion and testing before actually using the method in training the next round of frontier models that cost huge amounts of money, and those decisions are ultimately made by the teams building those models, and by management.