LessWrong 2.0 Reader

Looking for Goal Representations in an RL Agent - Update Post
CatGoddess · 2024-08-28T16:42:19.367Z · comments (0)

[question] What should we do about COVID in 2024?
ChristianKl · 2024-08-04T10:57:24.140Z · answers+comments (2)

Tokenized SAEs: Infusing per-token biases.
tdooms · 2024-08-04T09:17:46.755Z · comments (20)

[question] Karma votes: blind to or accounting for score?
cata · 2024-06-22T21:40:34.143Z · answers+comments (4)

Finding Deception in Language Models
Esben Kran (esben-kran) · 2024-08-20T09:42:13.060Z · comments (4)

PSA: Consider alternatives to AUROC when reporting classifier metrics for alignment
rpglover64 (alex-rozenshteyn) · 2024-06-24T17:53:28.705Z · comments (1)

The Bar for Contributing to AI Safety is Lower than You Think
Chris_Leong · 2024-08-16T15:20:19.055Z · comments (1)

[link] Compression Moves for Prediction
adamShimi · 2024-09-14T17:51:12.004Z · comments (0)

[link] AI existential risk probabilities are too unreliable to inform policy
Oleg Trott (oleg-trott) · 2024-07-28T00:59:59.497Z · comments (5)

Bryan Johnson and a search for healthy longevity
NancyLebovitz · 2024-07-27T15:28:13.117Z · comments (17)

Computational Complexity as an Intuition Pump for LLM Generality
aribrill (Particleman) · 2024-06-25T20:25:36.751Z · comments (6)

"Real AGI"
Seth Herd · 2024-09-13T14:13:24.124Z · comments (18)

[link] Green and golden: a meditation
Richard_Ngo (ricraz) · 2024-08-18T01:36:43.613Z · comments (0)

[link] Imbue (Generally Intelligent) continue to make progress
Nathan Helm-Burger (nathan-helm-burger) · 2024-06-26T20:41:18.413Z · comments (0)

"Which Future Mind is Me?" Is a Question of Values
dadadarren · 2024-08-09T18:17:09.884Z · comments (12)

[link] To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-19T16:13:55.835Z · comments (1)

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)
Linda Linsefors · 2024-08-23T14:18:24.327Z · comments (2)

[question] Self-censoring on AI x-risk discussions?
Decaeneus · 2024-07-01T18:24:15.759Z · answers+comments (2)

Why I'm bearish on mechanistic interpretability: the shards are not in the network
tailcalled · 2024-09-13T17:09:25.407Z · comments (35)

[question] Is this voting system strategy proof?
Donald Hobson (donald-hobson) · 2024-09-06T20:44:46.691Z · answers+comments (9)

What program structures enable efficient induction?
Daniel C (harper-owen) · 2024-09-05T10:12:14.058Z · comments (4)

[link] Why Swiss watches and Taylor Swift are AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T13:23:27.033Z · comments (11)

Initial Experiments Using SAEs to Help Detect AI Generated Text
Aaron_Scher · 2024-07-22T05:16:20.516Z · comments (0)

[link] How to choose what to work on
jasoncrawford · 2024-09-18T20:39:12.316Z · comments (2)

OpenAI Boycott Revisit
Jake Dennie · 2024-07-22T01:44:55.094Z · comments (2)

[link] Pronouns are Annoying
ymeskhout · 2024-09-18T13:30:04.620Z · comments (18)

Games of My Childhood: The Troops
Kaj_Sotala · 2024-07-08T11:20:03.033Z · comments (0)

Travel Buffer
jefftk (jkaufman) · 2024-07-06T02:20:02.723Z · comments (3)

[link] Minimalist And Maximalist Type Systems
adamShimi · 2024-07-05T16:25:59.448Z · comments (6)

[link] The Dumbification of our smart screens
Itay Dreyfus (itay-dreyfus) · 2024-07-04T06:32:36.672Z · comments (0)

Training a Sparse Autoencoder in < 30 minutes on 16GB of VRAM using an S3 cache
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-08-24T07:39:00.057Z · comments (0)

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach
Ben Smith (ben-smith) · 2024-09-03T05:28:24.549Z · comments (2)

[link] My lukewarm take on GLP-1 agonists
George3d6 · 2024-08-26T12:34:27.929Z · comments (0)

All the Following are Distinct
Gianluca Calcagni (gianluca-calcagni) · 2024-08-02T16:35:51.815Z · comments (3)

The Residual Expansion: A Framework for thinking about Transformer Circuits
Daniel Tan (dtch1997) · 2024-08-02T11:04:56.347Z · comments (13)

An information-theoretic study of lying in LLMs
Annah (annah) · 2024-08-02T10:06:39.312Z · comments (0)

[link] Jonothan Gorard:The territory is isomorphic to an equivalence class of its maps
Daniel C (harper-owen) · 2024-09-07T10:04:47.840Z · comments (18)

Interview with Robert Kralisch on Simulators
WillPetillo · 2024-08-26T05:49:15.543Z · comments (0)

[link] AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering
Corin Katzke (corin-katzke) · 2024-07-29T17:50:52.454Z · comments (1)

[link] Why good things often don’t lead to better outcomes
DMMF · 2024-09-19T16:37:07.778Z · comments (1)

[link] Meta Alignment: Communication Wack-a-Mole
Bridgett Kay (bridgett-kay) · 2024-06-22T20:12:16.412Z · comments (2)

[link] Announcing The Techno-Humanist Manifesto: A new philosophy of progress for the 21st century
jasoncrawford · 2024-07-08T16:33:02.194Z · comments (4)

[question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-09-04T12:40:07.678Z · answers+comments (6)

My experience applying to MATS 6.0
mic (michael-chen) · 2024-07-18T19:02:21.849Z · comments (3)

Podcasts: AGI Show, Consistently Candid, London Futurists
KatjaGrace · 2024-06-23T13:50:03.676Z · comments (0)

[link] CultFrisbee
Gauraventh (aryangauravyadav) · 2024-08-11T21:36:36.550Z · comments (3)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (10)

[link] How (and why) to get tested for CMV
Metacelsus · 2024-07-15T20:06:05.649Z · comments (0)

[link] Non-Transactional Compliments
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:42:16.471Z · comments (0)

[link] AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-14T23:23:26.296Z · comments (1)

Recent comments

darklight on Darklight's Shortform

So, a while back I came up with an obscure idea I called the Alpha Omega Theorem and posted it on the Less Wrong forums. Since there's only one post about it, it shouldn't be something that LLMs would know about. So in the past, I'd ask them "What is the Alpha Omega Theorem?", and they'd always make up some nonsense about a mathematical theory that doesn't actually exist. More recently, Google Gemini and Microsoft Bing Chat would use search to find my post and use that as the basis for their explanation. However, I only have the free versions of ChatGPT and Claude, so they don't have access to the Internet and would make stuff up.

A couple of days ago I tried the question on ChatGPT again, and GPT-4o correctly said that there isn't a widely known concept of that name in math or science, and basically said it didn't know. Claude still makes up a nonsensical math theory. Today I also tried telling Google Gemini not to use search, and it also said it did not know rather than making stuff up.

I'm actually pretty surprised by this. Looks like OpenAI and Google figured out how to reduce hallucinations somehow.

three-monkey-mind on Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more

If even one out of every ten accessibility advocates/experts/etc. did these things, then all these bugs would’ve been fixed years ago.

Maybe you're aware of an order of magnitude more accessibility advocates than I am, but I come across all sorts of well-written blog posts explaining this or that bug, which browser (or other software) it happens in, and how to work around it. That covers most of the bullet points, although the write-up might not be in the project's bug tracker of choice.

What people aren't doing, as far as I have seen, is starting pooled-funds bug bounties for these things. People pass the collection plate for childhood cancer (especially, I'm told, since September is Childhood Cancer Awareness Month), but not for bug fixes.

This is not unreasonable: most people are unwilling to set aside the cost of a new cell phone to fix a single bug, especially one they generally run into only at their day job.

And there are a lot of accessibility bugs out there, some of which are quite old. I can only assume that accessibility bugs aren't treated massively more seriously than anything else in the WebKit or Firefox Bugzillas.

While the world would be a better place if bug-bounty collection plates were more popular, I can see why they're not as popular as I'd like.

gunnar_zarncke on [Intuitive self-models] 1. Preliminaries

…So that’s all that’s needed. If any system has both a capacity for endogenous action (motor control, attention control, etc.), and a generic predictive learning algorithm, that algorithm will be automatically incentivized to develop generative models about itself (both its physical self and its algorithmic self), in addition to (and connected to) models about the outside world.

Yes, and there are many different classes of such models. Most of them are boring because the prediction of the agent's effect on the environment is limited (small effect or low data rate) or simple (linear-ish or more-is-better-like).

But the self-models of social animals will quickly grow complex because the prediction of the action on the environment includes elements in the environment - other members of the species - that themselves predict the actions of other members.

You don't mention it, but I think Theory of Mind or Empathic Inference plays a large role in the specific flavor of human self-models.

silasbarta on Austin, TX Petrov Day and Potluck 2024

FYI there’s some traffic backup on westbound William Cannon just before the turn into the neighborhood at James Ranch Rd.

gb on Inquisitive vs. adversarial rationality

Not for this kind of fact, I’m afraid – my experience is that in answering questions like these, LLMs typically do no better than an educated guess. There are just way too many people stating their educated legal guesses as fact in the corpus, so it gets hard to distinguish.

sil-ver on [Intuitive self-models] 1. Preliminaries

Mhh, I think "it's not possible to solve (1) without also solving (2)" is equivalent to "every solution to (1) also solves (2)", which is equivalent to "(1) is sufficient for (2)". I did take some liberty in rephrasing step (2) from "figure out what consciousness is" to "figure out its computational implementation".
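The first equivalence in this comment can be checked formally. Below is a minimal Lean 4 sketch, assuming hypothetical predicates `solves1` and `solves2` over some type `S` of candidate solutions, and reading "it's not possible to solve (1) without also solving (2)" as "there is no solution of (1) that fails to solve (2)". The right-hand side, "every solution to (1) also solves (2)", is what "(1) is sufficient for (2)" means. Only core Lean is used (`Classical.byContradiction` for the classical direction).

```lean
-- Hypothetical predicates: `solves1 s` means candidate `s` solves problem (1),
-- `solves2 s` means it solves problem (2).
theorem not_without_iff_every {S : Type} (solves1 solves2 : S → Prop) :
    (¬ ∃ s, solves1 s ∧ ¬ solves2 s) ↔ (∀ s, solves1 s → solves2 s) :=
  ⟨-- If no solution of (1) fails (2), then any solution of (1) solves (2)
   -- (this direction needs classical reasoning).
   fun h s h1 => Classical.byContradiction fun h2 => h ⟨s, h1, h2⟩,
   -- If every solution of (1) solves (2), a counterexample is impossible.
   fun h ⟨s, h1, h2⟩ => h2 (h s h1)⟩
```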

michael-pearce on Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs

On the question of quantizing different feature activations differently: computing the description length from the entropy of a feature activation's probability distribution is flexible enough to distinguish different types of distributions. For example, a binary distribution has an entropy of at most one bit (exactly one bit when both outcomes are equally likely), while more continuous distributions typically have larger entropies.

In our methodology, the effective float precision matters because it sets the bin width for the histogram of a feature's activations, which is then used to compute the entropy. We used the same effective float precision for all features; we found it by rounding activations to different precisions until the reconstruction or cross-entropy loss changed by some amount.
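The histogram-entropy calculation described in this comment can be sketched in a few lines of Python. This is a minimal illustration, not the MDL-SAE authors' code: the function name `activation_entropy_bits` is hypothetical, and quantizing by rounding each activation to the nearest multiple of `bin_width` is an assumed stand-in for "effective float precision". It shows the two cases from the comment: an on/off feature that fires half the time comes out at exactly one bit, and a feature spread over many bins comes out higher.

```python
import math
from collections import Counter


def activation_entropy_bits(activations, bin_width):
    """Entropy (in bits) of a feature's quantized activation histogram.

    Hypothetical helper: each activation is rounded to the nearest multiple
    of `bin_width` (an assumed proxy for effective float precision), and the
    Shannon entropy of the resulting empirical distribution is returned.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not activations:
        return 0.0
    # Count how many activations land in each bin index.
    bins = Counter(round(a / bin_width) for a in activations)
    n = len(activations)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


# A binary (on/off) feature that fires half the time.
binary = [0.0, 1.0] * 50
print(activation_entropy_bits(binary, bin_width=0.01))  # 1.0

# A more continuous feature spread evenly over 100 bins.
continuous = [i / 100 for i in range(100)]
print(activation_entropy_bits(continuous, bin_width=0.01))  # ≈ 6.64, i.e. log2(100)
```

A coarser `bin_width` merges bins and lowers the entropy, which is why the choice of precision feeds directly into the description length.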

christiankl on Inquisitive vs. adversarial rationality

When it comes to trying to understand basic facts like how legal systems work LLMs make it easy to get an overview.

review-bot on [April Fools'] Definitive confirmation of shard theory

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

ymeskhout on Pronouns are Annoying

The whole point of the sentence was to demonstrate how bad ambiguity can get with pronouns, and this exchange demonstrates my point exactly. The issue might be that you're making some (very reasonable) assumptions without noticing that they narrow the range of possible interpretations. The only unambiguous part of the sentence is "John told Mark"; every other "he" can be either John or Mark.

Edit: my apologies if my tone came across as rude; it was not intentional. All of us necessarily make reasonable assumptions to narrow ambiguity in our day-to-day conversations, and it can be hard to completely jettison the habit.