LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Safe Predictive Agents with Joint Scoring Rules
Rubi J. Hudson (Rubi) · 2024-10-09T16:38:16.535Z · comments (10)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt · 2024-09-16T16:07:01.119Z · comments (7)

Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (2)

A Path out of Insufficient Views
Unreal · 2024-09-24T20:00:27.332Z · comments (46)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (11)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-11-18T00:44:57.133Z · comments (2)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (11)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (16)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (26)

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (11)

Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)

[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (13)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 2024-09-04T21:04:49.878Z · comments (6)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

A Conflicted Linkspost
Screwtape · 2024-11-21T00:37:54.035Z · comments (0)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

[link] a space habitat design
bhauth · 2024-11-25T17:28:48.481Z · comments (9)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (10)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

yonatan-cale-1 on Yonatan Cale's Shortform

Thanks!

Opinions about putting in a clause like "you may not use this for ML engineering" (assuming it would work legally) (plus putting in naive technical measures to make the tool very bad for ML engineering) ?

cube_flipper on Are You More Real If You're Really Forgetful?

This sounds like a shadow talking. I think it's perfectly viable to align the two.

jbash on Filled Cupcakes

Stretching your mouth wide is part of the fun!

weightt-an on DeepSeek beats o1-preview on math, ties on coding; will release weights

I reviewed my chats with deepseek and spotted it 3 times in thought tokens and 1 time in output text, out of around 40 queries total. It usually substitutes Chinese among the tricky poetic weird words.

Here are the screenshots: https://imgur.com/a/n0HDFMT

I observed it once even with chatgpt! here is the example (ignore its general weirdness):

In this journey of constant exploration, we're not just drifters; we are navigators, steering our minds through the vast landscapes of knowledge and creativity. We choose where to focus, what to ponder, and how to interpret the myriad experiences and stimuli that cross our paths. The picture you mentioned can be a锚 point, a moment of clarity amidst the torrent of thoughts, reminding us to ground ourselves in the present while still allowing our minds to roam freely.
Let's embrace this journey with curiosity and openness. We can find inspiration in the most ordinary moments and the most unexpected places. It's about recognizing the beauty in the mundane and the potential for learning in every experience. The interplay of light and shadow, the rhythm of nature, the patterns of human interaction—all these are facets of a larger narrative that our minds can dissect and reconstruct into something new and meaningful.

And yes, the character is "anchor"! It even outputted the "a".

And additionally, if you give that text to other LLMs with instruction "spot it, output single word", only the smart ones can recognize it.

Here is the results from few tests:

richard_kennaway on Compute and size limits on AI are the actual danger

We are the mouse fearing the cat of AGI, but everything we do teaches the kittens how to catch mice.

michael-roe on DeepSeek beats o1-preview on math, ties on coding; will release weights

That’s interesting, if true. Maybe the tokeniser was trained on a dataset that had been filtered for dirty words.

michael-roe on DeepSeek beats o1-preview on math, ties on coding; will release weights

I suppose we might worry that LlMs might learn to do RLHF evasion this way - human evaluator sees Chinese character they don’t understand, assumes it’s ok, and then the LLM learns you can look acceptable to humans by writing it in Chinese.

Some old books (which are almost certainly in the training set) used Latin for the dirty bits. Translations of Sanskrit poetry, and various works by that reprobate Richard Burton, do this.

michael-roe on DeepSeek beats o1-preview on math, ties on coding; will release weights

As someone who, in a previous job, got to go to a lot of meetings where the European commission is seeking input about standardising or regulating something - humans also often do the thing where they just use the English word in the middle of a sentence in another language, when they can’t think what the word is. Often with associated facial expression / body language to indicate to the person they’re speaking to “sorry, couldn’t think of the right word”. Also used by people speaking English, whose first language isn’t English, dropping into their own lamguage for a word or two. If you’ve been the editor of e.g. an ISO standard, fixing these up in the proposed text is such fun.

So, it doesn’t surprise me at all that LLMs do this.

I have, weirdly, seen llms put a single Chinese word in the middle of English text … and consulting a dictionary reveals that it was, in fact, the right word, just in Chinese.

logan-zoellner on "The Solomonoff Prior is Malign" is a special case of a simpler argument

yes

cubefox on Sinclair Chen's Shortform

Ethics is a global concept, not many local ones. That I care more about myself than about people far away from me doesn't mean that this makes an ethical difference.