LessWrong 2.0 Reader

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

The Foraging (Ex-)Bandit [Ruleset & Reflections]
abstractapplic · 2024-11-14T20:16:21.535Z · comments (3)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Inferential Game: The Foraging (Ex-)Bandit
abstractapplic · 2024-11-11T16:59:42.058Z · comments (4)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

Concrete Methods for Heuristic Estimation on Neural Networks
Oliver Daniels (oliver-daniels-koch) · 2024-11-14T05:07:55.240Z · comments (0)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde (kola-ayonrinde) · 2024-10-30T22:50:45.642Z · comments (0)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

Bay Winter Solstice 2024: song leading auditions
tcheasdfjkl · 2024-11-10T23:59:08.199Z · comments (0)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (3)

[question] Any real toeholds for making practical decisions regarding AI safety?
lemonhope (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

Option control
Joe Carlsmith (joekc) · 2024-11-04T17:54:03.073Z · comments (0)

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
claudia.biancotti · 2024-11-18T09:38:35.723Z · comments (4)

[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)

[link] overengineered air filter shelving
bhauth · 2024-11-08T22:04:39.987Z · comments (2)

[question] Which things were you surprised to learn are metaphors?
Gordon Seidoh Worley (gworley) · 2024-11-22T03:46:02.845Z · answers+comments (17)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

Trying Bluesky
jefftk (jkaufman) · 2024-11-17T02:50:04.093Z · comments (16)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

Action derivatives: You’re not doing what you think you’re doing
PatrickDFarley · 2024-11-21T16:24:04.044Z · comments (0)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (8)

The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
Sahil · 2024-11-07T05:27:20.276Z · comments (1)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

[link] A Theory of Equilibrium in the Offense-Defense Balance
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-15T13:51:33.376Z · comments (6)

[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)

Thoughts after the Wolfram and Yudkowsky discussion
Tahp · 2024-11-14T01:43:12.920Z · comments (13)

[link] Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2024-11-11T16:13:26.504Z · comments (6)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)

How to put California and Texas on the campaign trail!
Yair Halberstadt (yair-halberstadt) · 2024-11-06T06:08:25.673Z · comments (4)

The new ruling philosophy regarding AI
Mitchell_Porter · 2024-11-11T13:28:24.476Z · comments (0)

No Electricity in Manchuria
winstonBosan · 2024-11-19T01:11:58.661Z · comments (0)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)

Recent comments

raemon on Raemon's Shortform

Motif coming up for me: a lot of skill ceilings are much higher than you might think, and worth investing in.

Some skills that you can be way better at:

Listening to people, and hearing what they're actually trying to say, and gaining value from it
Noticing subtle things that are important. You can learn to notice like 5 different things happening inside you or around you that occurred in <1 second.
Being concrete, in ways that help you resolve confusion and gain momentum on solving problems.

daniel-murfet on The Queen’s Dilemma: A Paradox of Control

Is restricting human agency fine if humans have little control over where it is restricted and to what degree?

daniel-murfet on The Queen’s Dilemma: A Paradox of Control

Re: your first point. I think I'm still a bit confused here, and that's partly why I wanted to write this down and have people poke at it. Following Sen (but maybe I'm misunderstanding him), I'm not completely convinced I know how to factor human agency into "winning". One part of me wants to say that whatever notion of agency I have, in some sense it's a property of world states and in principle I could extract it with enough monitoring of my brain or whatever, and then any prescribed tradeoff between "measured sense of agency" and "score" is something I could give to the machine as a goal.

So then I end up with the machine giving me the precise amount of leeway that lets me screw up the game just right for my preferences.

I don't see a fundamental problem with that, but it's also not the part of the metaphor that seems most interesting to me. What I'm more interested in is human inferiority as a pattern, and the way that pattern pervades the overall system and translates into computational structure, perhaps in surprising and indirect ways.

rvnnt on Hierarchical Agency: A Missing Piece in AI Alignment

A related pattern-in-reality that I've had on my todo-list to investigate is something like "cooperation-enforcing structures". Things like

legal systems, police
immune systems (esp. in suppressing cancer)
social norms, reputation systems, etc.

I'd been approaching this from a perspective of "how defeating Moloch can happen in general" and "how might we steer Earth to be less Moloch-fucked"; not so much AI safety directly.

Do you think a good theory of hierarchical agency would subsume those kinds of patterns-in-reality? If yes: I wonder if their inclusion could be used as a criterion/heuristic for narrowing down the search for a good theory?

john-fisher on John Fisher's Shortform

Genuinely curious - what do you think is most likely to go wrong? I imagine there would be quite a lot of corporate pushback via lobbying... is that what you mean?

john-fisher on John Fisher's Shortform

Not sure I understand... Can you elaborate?

daniel-murfet on The Queen’s Dilemma: A Paradox of Control

I'll reply in a few branches. Re: stochastic chess. I think there's a difference between a metaphor and a toy model; this is a metaphor, and the ingredients are chosen to illustrate in a microcosm some features I think are relevant about the full picture. The speed differential and some degree of stochasticity are aspects of human intervention in AI systems that seem meaningful to me.

I do agree that if one wanted to isolate the core phenomena here mathematically and study them, chess might not be the right toy model.

will-taylor on Counting AGIs

while this paradigm of 'training a model that's an agi, and then running it at inference' is one way we get to transformative agi, i find myself thinking that probably WON'T be the first transformative AI, because my guess is that there are lots of tricks using lots of compute at inference to get not quite transformative ai to transformative ai.

Agreed that this is far from the only possibility, and we have some discussion of increasing inference time to make the final push up to generality in the bit beginning "If general intelligence is achievable by properly inferencing a model with a baseline of capability that is lower than human-level..." We did a bit more thinking around this topic which we didn't think was quite core to the post, so Connor has written it up on his blog here: https://arcaderhetoric.substack.com/p/moravecs-sea

and i doubt that these tricks can funge against train time compute, as you seem to be assuming in your analysis.

Our method 5 is intended for this case: we'd use an appropriate 'capabilities per token' multiplier to account for needing extra inference time to reach human level.

zy on Daniel Kokotajlo's Shortform

It is interesting; I am only half a musician, but I wonder what a true musician thinks about the music generation quality generally. This also reminds me of the Silicon Valley show's music similarity tool for checking copyright issues; that might be really useful nowadays lmao

will-taylor on Counting AGIs

Our pleasure!

I'm not convinced a first-generation AGI would be "super expert level in most subjects". I think it's more likely they'd be extremely capable in some areas but below human level in others. (This does mean the 'drop-in worker' comparison isn't perfect, as presumably people would use them for the stuff they're really good at rather than any task.) See the section which begins "As of 2024, AI systems have demonstrated extremely uneven capabilities" for more discussion of this and some relevant links. I agree on the knowledge access and communication speed, but think they're still likely to suffer from hallucination (if they're LLM-like), which could prove limiting for really difficult problems with lots of steps.