LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

[question] Any real toeholds for making practical decisions regarding AI safety?
lemonhope (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

Concrete Methods for Heuristic Estimation on Neural Networks
Oliver Daniels (oliver-daniels-koch) · 2024-11-14T05:07:55.240Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

Bay Winter Solstice 2024: song leading auditions
tcheasdfjkl · 2024-11-10T23:59:08.199Z · comments (0)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (3)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)

Scientific Notation Options
jefftk (jkaufman) · 2024-05-18T15:10:02.181Z · comments (13)

[link] Solving alignment isn't enough for a flourishing future
mic (michael-chen) · 2024-02-02T18:23:00.643Z · comments (0)

The economy is mostly newbs (strat predictions)
lemonhope (lcmgcd) · 2024-02-01T19:15:49.420Z · comments (6)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

A Strange ACH Corner Case
jefftk (jkaufman) · 2024-02-10T03:00:05.930Z · comments (2)

[link] David Burns Thinks Psychotherapy Is a Learnable Skill. Git Gud.
Morpheus · 2024-01-27T13:21:05.068Z · comments (20)

Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes (steve2152) · 2024-06-25T17:49:01.488Z · comments (1)

NYU Code Debates Update/Postmortem
David Rein (david-rein) · 2024-05-24T16:08:06.151Z · comments (4)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

[question] Supposing the 1bit LLM paper pans out
O O (o-o) · 2024-02-29T05:31:24.158Z · answers+comments (11)

[link] AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes
aogara (Aidan O'Gara) · 2024-01-24T19:38:33.461Z · comments (1)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

[question] Me & My Clone
SimonBaars (simonbaars) · 2024-07-18T16:25:40.770Z · answers+comments (22)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

Bayesian inference without priors
DanielFilan · 2024-04-24T23:50:08.312Z · comments (8)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

Geometric Utilitarianism (And Why It Matters)
StrivingForLegibility · 2024-05-12T03:41:21.342Z · comments (2)

AI #57: All the AI News That’s Fit to Print
Zvi · 2024-03-28T11:40:05.435Z · comments (14)

How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots
Owain_Evans · 2024-03-28T02:34:21.799Z · comments (0)

Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)

[link] my favourite Scott Sumner blog posts
DMMF · 2024-06-11T14:40:43.093Z · comments (0)

[link] Forecasting future gains due to post-training enhancements
elifland · 2024-03-08T02:11:57.228Z · comments (2)

Evidential Correlations are Subjective, and it might be a problem
Martín Soto (martinsq) · 2024-03-07T18:37:54.105Z · comments (6)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

Why I think it's net harmful to do technical safety research at AGI labs
Remmelt (remmelt-ellen) · 2024-02-07T04:17:15.246Z · comments (24)

[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)

[link] Manifold Markets
PeterMcCluskey · 2024-02-02T17:48:36.630Z · comments (9)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (8)

Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

michaeldickens on Announcing turntrout.com, my new digital home

Do you think a 3-state dark mode selector is better than a 1-state (where "auto" is the only state)? My website is 1-state, on the assumption that auto will work for almost everyone and it lets me skip the UI clutter of having a lighting toggle that most people won't use.

Also, I don't know if the site has been updated but it looks to me like turntrout.com's two modes aren't dark and light, they're auto and light. When I set Firefox's appearance to dark or auto, turntrout.com's dark mode appears dark, but when I set Firefox to light, turntrout.com appears light. turntrout.com's light mode appears to be light regardless of my Firefox setting.

justinpombrio on "It's a 10% chance which I did 10 times, so it should be 100%"

However it is true that doing something with a 10% success rate 10 times will net you an average of 1 success.

For the easier to work out case of doing something with a 50% success rate 2 times:

25% chance of 0 successes
50% chance of 1 success
25% chance of 2 successes

Gives an average of 1 success.

Of course this only matters for the sort of thing where 2 successes is better than 1 success:

10% chance of finding a monogamous partner 10 times yields 0.63 monogamous partners in expectation.
10% chance of finding a polyamorous partner 10 times yields 1.00 polyamorous partners in expectation.

boris-kashirin on "It's a 10% chance which I did 10 times, so it should be 100%"

e^3 is ~20, so for large n you get 95% of success by doing 3n attempts.

linch on A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Agreed, I was trying to convey something that I think is underrated succinctly, obviously going to miss some nuances.

zach-stein-perlman on A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Open Philanthropy's AI safety work tends toward large grants in the high hundreds of thousands or low millions, meaning individuals and organizations with lower funding needs won't be funded by them

This is directionally true but projects seeking less money should still apply to OP if relevant; a substantial minority of recent OP AI safety and GCR capacity building grants are <$100K.

philip_b on "It's a 10% chance which I did 10 times, so it should be 100%"

Nice. I have a suggestion how to improve the article. Put a clearly stated theorem somewhere in the middle, in its own block, like in academic math articles.

myles-h on What are Emotions?

Wow, thank you so much. This is a lens I totally hadn't considered.

You can see in the post how I was confused how evolution played a part in "imbuing" material terminal goals into humans. I was like, "but kinetic sculptures were not in the ancestral environment?"

It sounds like rather than imbuing humans with material goals, it has imbued a process by which humans create their own.

I would still define material goals as simply terminal goals which are not defined by some qualia, but it is fascinating that this is what material goals look like in humans.

This also, as you say, makes it harder to distinguish between emotional and material goals in humans, since our material goals are ultimately emotionally derived. In particular, it makes it difficult to distinguish between an instrumental goal to an emotional terminal goal, and a learned material goal created from reinforced prediction of its expected emotional reward.

E.g. the difference between someone wanting a cookie because it will make them feel good, and someone wanting money as a terminal goal because their brain frequently predicted that money would lead to feeling good.

I still make this distinction between material and emotional goals because this isn't the only way that material goals play out among all agents. For example, my thermostat has simply been directly imbued with the goal of maintaining a temperature. I can also imagine this is how material goals play out in most insects.

Other emotions, like fear, anger, etc. are different. They can be thought of as "tilts"' to our cognitive landscape. Even learning that we're experiencing them is tricky. That's why emotional awareness is a subject to learn about, not just something we're born knowing. We need to learn to "feel the tilt". Elevated heart rate might signal fear, anger, or excitement; noticing it or finding other cues are necessary to understand how we're tilted, and how to correct for it if we want to act rationally. Those sorts of emotions "tilt the landscape" of our cognition by making different thoughts and actions more likely, like thoughts of how someone's actions were unfair or physical attacks when we're angry.

This makes a lot of sense. Yeah I was definitely simplifying all emotions to just their qualia effect, without considering their other physiological effects which define them. So I guess in this post when I say "emotion", I really mean "qualia".

But I'm pretty sure that predicted reward is pretty synonymous with what we call "values".

Just to clarify, are you using "reward" here to also mean "positive (or a lack of negative) qualia". Or is this reinforcement mechanism recursive by which we might learn to value something because of its predicted reward, but that reward is also a learned value.... and so on where the base case is an emotional reward. If so, how deep can it go?

benito on Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Hm, but I note others at the time felt it clear that this would exacerbate the competition (1, 2).

eggsyntax on LLMs Look Increasingly Like General Reasoners

(responding separately to your predictions)

1.

The world models of LLMs are impoverished in weird ways compared to humans, due to blind spots in the training data. An example would be tactile sensations

Sure. I expect LLMs to have substantial imbalances in capability (relative to human). That's already the case; they're better at some things than humans, and much worse than humans at others.

Solving some of the blind spots is critical for further capability gains.

This I'm much more skeptical about. I could imagine such a blind spot getting in the way of certain capabilities gains, but it seems very unlikely to me that any of them -- with the possible exception of general reasoning, which is why I'm so focused on investigating it -- would get in the way of capabilities gains in general.

2.

To elicit further capability gains, it will become necessary to turn to data which is less well-suited for transformer architecture.

Do you have any sense of what data would be less well-suited to transformers? So far it's worked well with language (tokens are subwords), images (tokens are patches), video (tokens are 'spacetime patches'), and even RL environments (tokens are...I'm not sure, actually). You mention tactile sensations, but are those less serializable than videos with resolutions far greater than the size of a patch token?

This will lead to escalating compute requirements, the effects of which will already become apparent in 2025.

Is there a way to distinguish this outcome from the escalating compute requirements we're seeing anyway?

3.

As a result, there will be even stronger incentives for...Combining different ML architectures, including transformers, and classical software into compound systems.

This seems likely either way, and I agree that it seems likely to result in substantially greater capabilities than LLMs alone.

“LLMs plus some scaffolding” will not be an accurate description of the systems that solve the next batch of hard problems.

What qualifies as the next batch of hard problems? There are still benchmarks that LLMs have made progress on but which are far from saturated; do you mean those? Or something much more ambitious?

Developing completely new architecture, with a certain chance of another "Attention Is All You Need"...The likelihood and necessity of this is obviously a crux

This seems likely to happen at some point either way; it'd be awfully surprising if transformers were the Final Form of AI.

4.

Automated original ML research will turn out to be one of the hard problems that require 3.a or b.

This would certainly cause me to update in your direction, although we may not get clear evidence, since I expect scaffolding to come into play whether or not it's strictly necessary.

Transformer architecture will not create its own scaffolding or successor.

I'm confused about the first part of that. I could ask Claude-3.5-Sonnet now to create a scaffolding system based on previous work in that area, and I'd expect it to be able to do it with some poking. Do you maybe mean something like, 'will not invent on its own important new scaffolding techniques'?

I suspect it'll be hard to say whether it creates its own successor. AI researchers are already using LLMs to help with their work, so it gets a tiny bit of credit for advances; I expect them to help more and more as they advance. I also expect that successor designs will come work by lots of different researchers, each of them using LLMs to varying degrees.

notfnofn on "It's a 10% chance which I did 10 times, so it should be 100%"

has come up from time to time for me