LessWrong 2.0 Reader


Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)
[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)
[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)
[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (4)
But Where do the Variables of my Causal Model come from?
Dalcy (Darcy) · 2024-08-09T22:07:57.395Z · comments (1)
Representation Tuning
Christopher Ackerman (christopher-ackerman) · 2024-06-27T17:44:33.338Z · comments (4)
[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)
Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)
[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)
Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)
Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)
Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)
[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)
AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)
DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)
Response to Dileep George: AGI safety warrants planning ahead
Steven Byrnes (steve2152) · 2024-07-08T15:27:07.402Z · comments (7)
Appraising aggregativism and utilitarianism
Cleo Nardo (strawberry calm) · 2024-06-21T23:10:37.014Z · comments (10)
[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)
[question] Me & My Clone
SimonBaars (simonbaars) · 2024-07-18T16:25:40.770Z · answers+comments (22)
Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)
Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)
[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)
[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)
RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (6)
[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)
Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)
Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)
LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)
Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)
Talk: AI safety fieldbuilding at MATS
Ryan Kidd (ryankidd44) · 2024-06-23T23:06:37.623Z · comments (2)
[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)
[question] How are you preparing for the possibility of an AI bust?
Nate Showell · 2024-06-23T19:13:45.247Z · answers+comments (16)
The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)
Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)
[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)
[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)
An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)
Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)
[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)
[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)
Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)
Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)
[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)
[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)
Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)
Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)
The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)
A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)
[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (19)