LessWrong 2.0 Reader

Text Posts from the Kids Group: 2019
jefftk (jkaufman) · 2024-06-23T13:20:01.495Z · comments (0)

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
Kola Ayonrinde (kola-ayonrinde) · 2024-08-23T18:52:31.019Z · comments (2)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates
Charlie George (charlie-george) · 2024-08-27T20:44:08.683Z · comments (7)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (1)

The Garden of Eden
Alexander Turok · 2024-07-22T16:07:42.509Z · comments (2)

[LDSL#2] Latent variable models, network models, and linear diffusion of sparse lognormals
tailcalled · 2024-08-09T19:57:56.122Z · comments (0)

[link] Libs vs Frameworks, Middle-Level Regularities vs Theories
adamShimi · 2024-07-04T19:01:59.440Z · comments (0)

AI #77: A Few Upgrades
Zvi · 2024-08-20T00:20:09.717Z · comments (3)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes (steve2152) · 2024-06-25T17:49:01.488Z · comments (1)

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.
sevdeawesome · 2024-07-01T05:50:49.498Z · comments (0)

[link] Day Zero Antivirals for Future Pandemics
Niko_McCarty (niko-2) · 2024-08-26T15:18:33.858Z · comments (2)

[link] My 5-step program for losing weight
Nikita Sokolsky (nikita-sokolsky) · 2024-06-30T01:05:40.408Z · comments (20)

Can We Predict Persuasiveness Better Than Anthropic?
Lennart Finke (l-f) · 2024-08-04T14:05:33.668Z · comments (5)

o1-preview is pretty good at doing ML on an unknown dataset
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-09-20T08:39:49.927Z · comments (0)

[link] on Science Beakers and DDT
bhauth · 2024-09-05T03:21:21.382Z · comments (12)

[LDSL#3] Information-orientation is in tension with magnitude-orientation
tailcalled · 2024-08-10T21:58:27.659Z · comments (0)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

Monthly Roundup #21: August 2024
Zvi · 2024-08-20T00:20:08.178Z · comments (6)

[link] Profit and Value
kwang · 2024-07-17T18:06:57.048Z · comments (3)

August 2024 Time Tracking
jefftk (jkaufman) · 2024-08-24T13:50:04.676Z · comments (0)

[link] The Tech Industry is the Biggest Blocker to Meaningful AI Safety Regulations
garrison · 2024-08-16T19:37:28.416Z · comments (1)

[link] "On the Impossibility of Superintelligent Rubik’s Cube Solvers", Claude 2024 [humor]
gwern · 2024-06-23T21:18:10.013Z · comments (6)

[link] An ML paper on data stealing provides a construction for "gradient hacking"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-07-30T21:44:37.310Z · comments (1)

[link] [Talk transcript] What “structure” is and why it matters
Alex_Altair · 2024-07-25T15:49:00.844Z · comments (0)

"The Singularity Is Nearer" by Ray Kurzweil - Review
Lavender (Kevin92) · 2024-07-08T21:32:27.307Z · comments (0)

AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan · 2024-08-24T22:30:02.039Z · comments (0)

[LDSL#5] Comparison and magnitude/diminishment
tailcalled · 2024-08-12T18:47:20.546Z · comments (0)

Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs
Winnie Yang (winnie-yang) · 2024-08-22T07:32:07.600Z · comments (0)

[link] Hyperpolation
Gunnar_Zarncke · 2024-09-15T21:37:00.002Z · comments (4)

Consider attending the AI Security Forum '24, a 1-day pre-DEFCON event
Charlie Rogers-Smith (charlie.rs) · 2024-07-12T23:01:46.370Z · comments (0)

[link] Podcast: "How the Smart Money teaches trading with Ricki Heicklen" (Patrick McKenzie interviewing)
rossry · 2024-07-11T22:49:06.633Z · comments (2)

[link] Four Randomized Control Trials In Economics
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-08T15:59:23.250Z · comments (1)

[link] [Linkpost] 'The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery'
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-15T21:32:59.979Z · comments (1)

[link] Podcast: Elizabeth & Austin on "What Manifold was allowed to do"
Austin Chen (austin-chen) · 2024-06-28T22:10:41.607Z · comments (0)

[link] The Great Organism Theory of Evolution
rogersbacon · 2024-08-10T12:26:02.434Z · comments (0)

Ransomware Payments Should Require a Sin Tax
Brian Bien (brian-bien) · 2024-07-22T21:16:29.029Z · comments (10)

[question] Have people given up on iterated distillation and amplification?
Chris_Leong · 2024-07-19T12:23:04.625Z · answers+comments (1)

Instrumental vs Terminal Desiderata
Max Harms (max-harms) · 2024-06-26T20:57:17.584Z · comments (0)

My decomposition of the alignment problem
Daniel C (harper-owen) · 2024-09-02T00:21:08.359Z · comments (22)

Fully booked - LessWrong Community weekend
jt · 2024-07-16T17:15:51.753Z · comments (2)

A necessary Membrane formalism feature
ThomasCederborg · 2024-09-10T21:33:09.508Z · comments (6)

Failure Modes of Teaching AI Safety
Eleni Angelou (ea-1) · 2024-06-25T19:07:46.826Z · comments (0)

Simon DeDeo on Explore vs Exploit in Science
Elizabeth (pktechgirl) · 2024-09-10T03:40:08.311Z · comments (0)

[question] What should we do about COVID in 2024?
ChristianKl · 2024-08-04T10:57:24.140Z · answers+comments (2)

Scaling Laws and Likely Limits to AI
Davidmanheim · 2024-08-18T17:19:46.597Z · comments (0)

[link] Anthropic is being sued for copying books to train Claude
Remmelt (remmelt-ellen) · 2024-08-31T02:57:27.092Z · comments (4)

Announcing the PIBBSS Symposium '24!
DusanDNesic · 2024-09-03T11:19:47.568Z · comments (0)

Tokenized SAEs: Infusing per-token biases.
tdooms · 2024-08-04T09:17:46.755Z · comments (20)

Why Reflective Stability is Important
Johannes C. Mayer (johannes-c-mayer) · 2024-09-05T15:28:19.913Z · comments (2)


Recent comments

nathan-helm-burger on A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)

🤢 But whyyyyyyyy?!

lao-mein on A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)

There is likely a reason for this: if you feed numbers you found on the internet into an LLM digit by digit, it's going to destroy the embeddings of those numbers. A lot of things found in scrapes are just... extremely long sequences of numbers. The tradeoff may be numeracy (can do basic multiplication) vs natural language performance (won't start spitting out Minecraft debug logs in the middle of conversation).

sharmake-farah on GPT-o1

If I did have issues with Janus World, it's probably overestimating how much anthropomorphic reasoning gets us (to be clear I think a lot of people underestimate the power of anthropomorphic reasoning on LLMs), combined with them being far too sensational/mystical for my taste, which leads them to overrate the possibility of deceptive alignment IMO.

My biggest difference in models is probably that I use less anthropomorphic reasoning on LLMs than Janus World does.

gwern on Counting arguments provide no evidence for AI doom

That's interesting. I admit I've never really tried the 'spare tokens' trick seriously on any LLMs, but if it can get the S-poem in 3 samples with the spare token trick, maybe I've underestimated it. (I wonder how it would stack with the o1-preview/mini chain-of-thought? The example transcripts are rather verbose, so maybe those provide all of the 'spare token' effect by default.)

lao-mein on A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)

You're really not going to like the fact that the GPT4o tokenizer has every single number below 1000 tokenized. It's not a hand-crafted feature, since the token_ids are all over the place. I think they had to manually remove larger number tokens (there are none above 999).

I feel like I need a disclaimer like the South Park episode. This is what is actually inside the tokenizer.

['37779', '740'],
['47572', '741'],
['48725', '742'],
['49191', '743'],
['46240', '744'],
['44839', '745'],
['47433', '746'],
['42870', '747'],
['39478', '748'],
['44712', '749'],

They also have plenty of full width numbers (generally only used in Chinese and Japanese to not mess with spacing) and numbers in other languages in there.

['14334', '十'],
['96681', '十一'],
['118633', '十三'],
['138884', '十九'],
['95270', '十二'],
['119007', '十五'],
['107205', '十八'],
['180481', '十四']
['42624', '零'],
['14053', '０'],
['49300', '００'],
['10888', '１'],
['64980', '１０'],
['141681', '１００'],
['113512', '１１'],
['101137', '１２'],
['123326', '１３'],
['172589', '１４'],
['126115', '１５'],
['171221', '１６']
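To illustrate why full-width numbers can end up with their own tokens: they are different code points from their ASCII counterparts, so they are different strings unless something folds them together first. The sketch below uses Python's standard `unicodedata` module and NFKC normalization. It does not reproduce the tokenizer, and whether any tokenizer pipeline applies such normalization is an assumption, not something stated in this comment.

```python
import unicodedata

ascii_number = "100"
# Full-width "１００", written with escapes so the code points are explicit.
fullwidth_number = "\uff11\uff10\uff10"

# Different code points, so the raw strings are not equal.
print(ascii_number == fullwidth_number)

# NFKC compatibility normalization folds full-width digits to ASCII digits.
folded = unicodedata.normalize("NFKC", fullwidth_number)
print(folded == ascii_number)

# Python's int() accepts Unicode decimal digits directly.
print(int(fullwidth_number))
```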

Maybe they use a different tokenizer for math problems? Maybe the multi-digit number tokenizers are only used in places where there are a lot of id numbers? Nope. Looks like they were just raw-dogging it. If anyone is wondering why GPTs are so bad at basic multiplication, this is why.

Colin Fraser on X: "Here's a similar experiment I just tried. The fact that this works even a little bit completely blows my mind and confuses me greatly. If you asked me if this would work at all I would say definitely not. https://t.co/E4knpf7JoZ"

If you've ever wondered "wow, why is GPT4o specifically better at math when the number of digits is divisible by 3?", wonder no more. It's the tokenizer. Again.
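A minimal sketch of the digits-divisible-by-3 effect, assuming (this is not confirmed in the comment) that digit runs are split greedily from the left into groups of at most three digits. The regex `\d{1,3}` is a simplified stand-in for that behavior, and both function names are made up for illustration.

```python
import re


def chunk_digits(number: str) -> list[str]:
    """Greedy left-to-right split into groups of up to 3 digits (assumed tokenizer behavior)."""
    return re.findall(r"\d{1,3}", number)


def place_value_groups(number: str) -> list[str]:
    """Split into thousands groups from the right, the way '1,234' is written."""
    head = len(number) % 3
    groups = [number[:head]] if head else []
    groups += [number[i:i + 3] for i in range(head, len(number), 3)]
    return groups


for n in ["1234", "123456", "1234567"]:
    # Show both splits and whether the chunks line up with the thousands groups.
    print(n, chunk_digits(n), place_value_groups(n),
          chunk_digits(n) == place_value_groups(n))
```

Under this assumption, for numbers longer than three digits the chunks line up with the thousands groups only when the digit count is a multiple of 3; otherwise every chunk straddles a place-value boundary.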

sharmake-farah on The Information: OpenAI shows 'Strawberry' to feds, races to launch it

The big answer, now that we know that o1 was made using Q*/Strawberry, is essentially that Strawberry/Q* did 2 very important things:

It cracked the code on how to make a General Purpose Search that scales with more compute, and in particular the model can now adaptively think for longer on harder problems.

In essence, OpenAI figured out how to implement General Purpose Search scalably:

https://www.lesswrong.com/posts/6mysMAqvo9giHC4iX/what-s-general-purpose-search-and-why-might-we-expect-to-see

It unlocked a new inference scaling law, which in particular means that more compute can reliably solve more problems at inference.

This makes AI capabilities harder to contain, since it's easier to have large inference runs than large training runs.

viliam on Why good things often don’t lead to better outcomes

Maybe related: Evaporation of improvements

thane-ruthenis on GPT-o1

I expect there are still significant differences between your model and the "LLM Whisperer" model, though I notice I'm not quite sure what you'd say they are. Mind highlighting any cruxes you see?

tapatakt on Tapatakt's Shortform

About possible backlashes from unsuccessful communication.

I hoped for some examples like "anti-war movies have unintentionally boosted military recruitment", which is the only example I could remember myself.

I asked Claude the same question, and it gave me these examples:

Scared Straight programs: These programs, designed to deter juvenile delinquency by exposing at-risk youth to prison life, have been shown to actually increase criminal behavior in participants.
The "Just Say No" anti-drug campaign: While well-intentioned, some research suggests this oversimplified message may have increased drug use among certain groups by triggering a "forbidden fruit" effect.

The others were not very relevant, mostly along the lines of "the harm of this oversimplified communication was in the oversimplification".

The common thread in the two relevant examples and my own example about anti-war movies is, I think, "try to ensure you don't make the bad thing look cool". Got it.

But is that all? Are there any examples that don't come down to this?

nathan-helm-burger on A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)

Wow, just wow. Sure seems like Gwern has it spot on here that OpenAI engineers are being rushed and sloppy about this.

Kinda pisses me off, considering that I applied to work for OpenAI between GPT-2 and GPT-3, and this isn't the sort of mistake I would make. Yet, they rejected my application. (note: given the current state of OpenAI, I wouldn't apply today!)

Building my own small LLMs around that time for practice, the low frequency tokens and the tokenization of digits were among the first things I checked!

I tried a quick search for Anthropic to see if they are doing the same nonsense. Found this site: https://lunary.ai/anthropic-tokenizer and this repo: https://github.com/javirandor/anthropic-tokenizer

Lunary shows Anthropic as not only tokenizing groups of digits together, but also sometimes digits with a space in between!? Is that true? I'm flabbergasted if so. I'm going to look into this more.