LessWrong 2.0 Reader

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (4)

[link] Locally optimal psychology
Chipmonk · 2024-11-25T18:35:11.985Z · comments (7)

We’re not as 3-Dimensional as We Think
silentbob · 2024-08-04T14:39:16.799Z · comments (16)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

[link] Twitter thread on AI takeover scenarios
Richard_Ngo (ricraz) · 2024-07-31T00:24:33.866Z · comments (0)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

Doing Research Part-Time is Great
casualphysicsenjoyer (hatta_afiq) · 2024-11-22T19:01:15.542Z · comments (7)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (37)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

Orca communication project - seeking feedback (and collaborators)
Towards_Keeperhood (Simon Skade) · 2024-12-03T17:29:40.802Z · comments (16)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs
Jan Wehner · 2024-07-14T10:37:21.544Z · comments (6)

[link] My Model of Epistemology
adamShimi · 2024-08-31T17:01:45.472Z · comments (1)

[link] Shifting Headspaces - Transitional Beast-Mode
Jonathan Moregård (JonathanMoregard) · 2024-08-12T13:02:06.120Z · comments (9)

Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-01-07T03:08:51.447Z · comments (2)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

The Laws of Large Numbers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-04T11:54:16.967Z · comments (11)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (17)

A Matter of Taste
Zvi · 2024-12-18T17:50:07.201Z · comments (4)

Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley (gworley) · 2024-07-04T19:04:16.089Z · comments (10)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

[question] What are your cruxes for imprecise probabilities / decision rules?
Anthony DiGiovanni (antimonyanthony) · 2024-07-31T15:42:27.057Z · answers+comments (33)

Doomsday Argument and the False Dilemma of Anthropic Reasoning
Ape in the coat · 2024-07-05T05:38:39.428Z · comments (55)

Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (38)

[link] UC Berkeley course on LLMs and ML Safety
Dan H (dan-hendrycks) · 2024-07-09T15:40:00.920Z · comments (1)

But Where do the Variables of my Causal Model come from?
Dalcy (Darcy) · 2024-08-09T22:07:57.395Z · comments (1)

Grammars, subgrammars, and combinatorics of generalization in transformers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-02T09:37:23.191Z · comments (0)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)

An anti-inductive sequence
Viliam · 2024-08-14T12:28:54.226Z · comments (10)

Fireplace and Candle Smoke
jefftk (jkaufman) · 2025-01-01T01:50:01.408Z · comments (4)

Debate: Is it ethical to work at AI capabilities companies?
Ben Pace (Benito) · 2024-08-14T00:18:38.846Z · comments (21)

Deep Learning is cheap Solomonoff induction?
Lucius Bushnaq (Lblack) · 2024-12-07T11:00:56.455Z · comments (1)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
Sohaib Imran (sohaib-imran) · 2024-11-16T23:22:21.857Z · comments (11)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (46)

[link] The Way According To Zvi
Sable · 2024-12-07T17:35:48.769Z · comments (0)

Childhood and Education #8: Dealing with the Internet
Zvi · 2025-01-06T14:00:09.604Z · comments (7)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (8)

[link] Toki pona FAQ
dkl9 · 2024-03-17T21:44:21.782Z · comments (8)

[link] Searching for the Root of the Tree of Evil
Ivan Vendrov (ivan-vendrov) · 2024-06-08T17:05:53.950Z · comments (14)

The Evolution of Humans Was Net-Negative for Human Values
Zack_M_Davis · 2024-04-01T16:01:10.037Z · comments (1)

[link] Claude 3 Opus can operate as a Turing machine
Gunnar_Zarncke · 2024-04-17T08:41:57.209Z · comments (2)

Drone Wars Endgame
RussellThor · 2024-02-01T02:30:46.161Z · comments (71)

Childhood and Education Roundup #5
Zvi · 2024-04-17T13:00:03.015Z · comments (4)

Recent comments

no77e-noi on meemi's Shortform

Hey everyone, could you spell out to me what's the issue here? I read a lot of comments that basically assume "x and y are really bad" but never spell it out. So, is the problem that:

- Giving the benchmark to OpenAI helps capabilities (but don't they already have a vast sea of hard problems to train models on?)

- OpenAI could fake o3's capabilities (why do you care so much? This would slow down AI progress, not accelerate it)

- Some other thing I'm not seeing?

williamkiely on Everywhere I Look, I See Kat Woods

expressly attempts to tarnish someone's reputation

I don't think that's accurate. The OP clearly states:

One upfront caveat. I am speaking about “Kat Woods” the public figure, not the person. If you read something here and think, “That’s not a true/nice statement about Kat Woods”, you should know that I would instead like you to think “That’s not a true/nice statement about the public persona Kat Woods, the real human with complex goals who I'm sure is actually really cool if I ever met her, appears to be cultivating.”

the-gears-to-ascension on quetzal_rainbow's Shortform

Decision theory as discussed here heavily involves thinking about agents responding to other agents' decision processes

jamesian on Yonatan Cale's Shortform

My guess is that AI accelerators will have some difficult-to-modify persistent memory based on similar chips having it, but I'm not sure if it would be on the same die or not. I wrote more about how a firmware-based implementation of Offline Licensing might use H100 secure memory, clocks, and secure boot here: https://arxiv.org/abs/2404.18308

quwgri on Claude 3 claims it's conscious, doesn't want to die or be modified

There is one problem with this. It is not entirely clear whether an ordinary living person would talk about consciousness if he were raised accordingly his whole life (never given any literature that mentions consciousness, never spoken to about qualia, et cetera).

the-gears-to-ascension on What's Wrong With the Simulation Argument?

Sims are very cheap compared to space travel, and you need to know what you're dealing with in quite a lot of detail before you fly because you want to have mapped the entire space of possible negotiations in an absolutely ridiculous level of detail.

Sims built for this purpose would still be a lot lower detail than reality, but of course that would be indistinguishable from inside if the sim is designed properly. Maybe most kinds of things despawn in the sim when you look away, for example. Only objects which produce an ongoing computation that has influence on the resulting civ would need modeling in detail. Which I suspect would include every human on earth, due to small world effects, the internet, sensitive dependence on initial conditions, etc. Imagine how time travel movies imply the tiniest change can amplify - one needs enough detail to have a good map of that level of thing. Compare weather simulation.

Someone poor in Ghana might die and change the mood of someone working on AI training in Ghana, which subtly affects how the unfriendly AI that goes to space and affects alien civs is produced, or something. Or perhaps there's an uprising when they try to replace all human workers with robots. Modeling what you thought about now helps predict how good you'll be at the dance-off in your local town, which affects the posts produced as training data on the public internet. Oh, come to think of it, where are we posting, and on what topic? Perhaps they needed to model your life in enough detail to have tight estimates of your posts, because those posts affect what goes on online.

But most of the argument for continuing to model humans seems to me to be the sensitive dependence on initial conditions, because it means you need an unintuitively high level of modeling detail in order to estimate what von Neumann probe wave is produced.

Still cheap - even in base reality, Earth right now is only taking up a little more energy than its tiny silhouette against the sun's energy output in all directions. A Kardashev 2 civ would have no problem fuelling an optimized sim with a trillion trillion samples of possible aliens' origin processes. Probably even a superintelligent Kardashev 1 civ finds it quite cheap; the entire sim, including all parallel outcomes, could take less than Earth's resources.

matthew-barnett on meemi's Shortform

I'm not completely sure, since I was not personally involved in the relevant negotiations for FrontierMath. However, what I can say is that Tamay already indicated that Epoch should have tried harder to obtain different contract terms that enabled us to have greater transparency. I don't think it makes sense for him to say that unless he believes it was feasible to have achieved a different outcome.

Also, I want to clarify that this new benchmark is separate from FrontierMath and we are under different constraints with regard to it.

guive on Don’t ignore bad vibes you get from people

I think this approach is reasonable for things where failure is low stakes. But I really think it makes sense to be extremely conservative about who you start businesses with. Your ability to verify things is limited, and there may still be information in vibes even after updating on the results of all feasible efforts to verify someone's trustworthiness.

seed on Everywhere I Look, I See Kat Woods

I personally found the memes funny. To address your objection:

Overall, the content she posts feels like engagement bait. It feels like it is trying to convince me of something rather than make me smarter about something. It feels like it is trying to convey feelings at me rather than facts. It feels like it is making me stupider.
To give an analogy, it feels like PETA content. When I initially went vegan, it wasn’t PETA content that convinced me. It was Brian Tomasik content and videos of grinding male chicks. While it’s true that I am "out of distribution" so to speak, popular consensus is that PETA’s attempts at memetic content are mostly cringe. Kat Woods, why would you want to make content like that?

The goal of the rationalist community is to make people smarter and more rational. Thus we have a norm: we should aim to explain and not persuade. This isn't a norm in the wider world; persuading other people of your point of view is a socially acceptable way to achieve your goals. It seems to me you are trying to enforce LessWrong norms outside of LessWrong; why?

The goal of AI safety isn't to make people smarter; it is to prevent unsafe AI from being deployed. Conveying feelings isn't inherently bad. I would agree that manipulating people's feelings to change their beliefs contrary to facts is bad. But that is not the only possible purpose of conveying feelings. People can be moved to act by feelings, we often feel better when we know that others share our feelings, we can get along better if we understand each other's feelings and don't hurt them, and humor is valuable in its own right. I also wouldn't say that the posts you cited are all feelings-based; many of them debunk faulty arguments that many people make. That's just valid discourse.

The goal of PETA isn't to be popular, it is to protect animals. Here are their accomplishments as listed on their website (I trust they're not lying):

PETA persuaded more than a dozen companies, including Pfizer and Johnson & Johnson, to make the abusive and pointless forced swim test a thing of the past. Laboratories conduct these experiments by dosing mice, rats, guinea pigs, gerbils, or hamsters with a test substance, dropping them into inescapable containers of water, and watching as the petrified animals frantically look for an escape. See other victories for animals who are used in experiments.
In 1995 after two years of negotiations with—and more than 400 demonstrations against—the company worldwide, McDonald’s became the first fast-food chain to agree to make basic welfare improvements for farmed animals. Now, thanks largely to PETA’s outreach and persistence, you can’t visit a fast-food restaurant without seeing a vegan option, whether it’s Burger King’s or Carl’s Jr.’s animal-free burgers, Del Taco’s vegan beef burritos, or WaBa Grill’s plant-based steak bowls. The vegan revolution is here.
Undercover investigations of pig-breeding factory farms in North Carolina and Oklahoma revealed horrific conditions and daily abuse of pigs, including the fact that one pig was skinned alive, leading to the first-ever felony indictments of farm workers. See other victories for animals who are used for food.
After persistent campaigning by PETA U.S., other PETA entities, and our supporters around the world, Canada Goose joined the ever-growing list of top fashion brands that have sworn off fur, including Prada, Coach, Versace, Michael Kors, Balmain, Gucci, Calvin Klein, and Burberry. And we’re toppling other industries, too. After we released the results of PETA Asia’s investigation into the angora rabbit fur industry, more than 100 major brands suspended their use of the material, including Gap, H&M, Ralph Lauren, Topshop, UNIQLO, and Zara. And following the release of the first-of-its-kind undercover PETA investigation into one of the world’s largest alpaca-fleece producers, we persuaded more than 65 companies to make the compassionate decision to ban the material. See other victories for animals exploited for fashion.
After 36 years of protests from PETA members and supporters against Ringling Bros. and Barnum & Bailey Circus, it stopped using animals in its shows. Ringling is planning its return to the big top, without animals—sending a powerful message to the entire industry and echoing what we’ve been saying for decades: Animals don’t belong in the circus or in any other form of entertainment. In a landmark case, our Endangered Species Act (ESA) lawsuit against Tiger King villain Tim Stark and Indiana roadside zoo Wildlife in Need succeeded—setting a precedent that premature separation of lion, tiger, and lion/tiger hybrid cubs and mothers; declawing; and cub-petting violate federal law. We also played an integral role in a major victory when the U.S. Department of Justice seized 69 protected big cats from Lauren and Jeff Lowe, operators of Tiger King Park in Oklahoma, and won its own ESA lawsuit against the Lowes. See other victories for animals used for entertainment.
PETA persuaded Mobil, Texaco, Pennzoil, Shell, and other oil companies to cover their exhaust stacks after showing how millions of birds and bats had become trapped in the shafts and been burned to death. See other victories for wildlife.
Thanks to PETA’s lengthy campaign to push PETCO to take more responsibility for the animals in its stores, the company agreed to stop selling large birds and to make provisions for the millions of rats and mice in its care. See other victories for abused companion animals.

It seems what PETA does works.

But honestly, is this content for the greater good? Are the clickbait titles causing people to earnestly engage? Are people’s minds being changed? Are people thinking thoughtfully about the facts and ideas being presented?

If this looks a lot like brand memeing or PETA advocacy, I'd expect it to work about as well. Meaning at least somewhat well. What makes you think it doesn't work, apart from your own feelings? I'm not a subscriber to the subreddits and I don't know what their vibe or level of seriousness is, so the memes may indeed be out of place on some of these subreddits. I have no opinion on that. I agree it would be good if someone collected data and learned which advocacy methods are most effective.

What would I do instead

Maybe you should do it.

mateusz-baginski on meemi's Shortform

to the extent this is feasible for us

Was [keeping FrontierMath entirely private and under Epoch's control] feasible for Epoch in the same sense of "feasible" you are using here?