LessWrong 2.0 Reader

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

AI Safety as a YC Startup
Lukas Petersson (lukas-petersson-1) · 2025-01-08T10:46:29.042Z · comments (9)

AI #99: Farewell to Biden
Zvi · 2025-01-16T14:20:05.768Z · comments (5)

Predict 2025 AI capabilities (by Sunday)
Jonas V (Jonas Vollmer) · 2025-01-15T00:16:05.034Z · comments (3)

[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (21)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

Detect Goodhart and shut down
Jeremy Gillen (jeremy-gillen) · 2025-01-22T18:45:30.910Z · comments (17)

The OODA Loop -- Observe, Orient, Decide, Act
Davis_Kingsley · 2025-01-01T08:00:27.979Z · comments (2)

Luck Based Medicine: No Good Very Bad Winter Cured My Hypothyroidism
Elizabeth (pktechgirl) · 2024-12-08T20:10:02.651Z · comments (3)

A Novel Emergence of Meta-Awareness in LLM Fine-Tuning
rife (edgar-muniz) · 2025-01-15T22:59:46.321Z · comments (31)

Estimates of GPU or equivalent resources of large AI players for 2024/5
CharlesD · 2024-11-28T23:01:58.522Z · comments (7)

[link] Discursive Warfare and Faction Formation
Benquo · 2025-01-09T16:47:31.824Z · comments (3)

A Conflicted Linkspost
Screwtape · 2024-11-21T00:37:54.035Z · comments (0)

Correct my H5N1 research
Elizabeth (pktechgirl) · 2024-12-09T19:07:03.277Z · comments (25)

Tax Price Gouging?
jefftk (jkaufman) · 2025-01-17T14:10:03.395Z · comments (20)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (10)

[link] Just one more exposure bro
Chipmonk · 2024-12-12T21:37:07.069Z · comments (6)

Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)

I Finally Worked Through Bayes' Theorem (Personal Achievement)
keltan · 2024-12-05T02:04:16.547Z · comments (6)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

[link] Preference Inversion
Benquo · 2025-01-02T18:15:52.938Z · comments (46)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)

Announcement: Learning Theory Online Course
Yegreg · 2025-01-20T19:55:57.598Z · comments (11)

A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (29)

Introducing the WeirdML Benchmark
Håvard Tveit Ihle (havard-tveit-ihle) · 2025-01-16T11:38:17.056Z · comments (13)

[link] A toy evaluation of inference code tampering
Fabien Roger (Fabien) · 2024-12-09T17:43:40.910Z · comments (0)

Role embeddings: making authorship more salient to LLMs
Nina Panickssery (NinaR) · 2025-01-07T20:13:16.677Z · comments (0)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (2)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Logits, log-odds, and loss for parallel circuits
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-20T09:56:26.031Z · comments (0)

DeekSeek v3: The Six Million Dollar Model
Zvi · 2024-12-31T15:10:06.924Z · comments (6)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

AI #94: Not Now, Google
Zvi · 2024-12-12T15:40:06.336Z · comments (3)

[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)

D&D.Sci Dungeonbuilding: the Dungeon Tournament
aphyer · 2024-12-14T04:30:55.656Z · comments (16)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (3)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (15)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

Considerations on orca intelligence
Towards_Keeperhood (Simon Skade) · 2024-12-29T14:35:16.445Z · comments (5)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

Finding Features Causally Upstream of Refusal
Daniel Lee (daniel-lee) · 2025-01-14T02:30:04.321Z · comments (5)

Meta Pivots on Content Moderation
Zvi · 2025-01-17T14:20:06.727Z · comments (3)

[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (5)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (13)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

Recent comments

shankar-sivarajan on Is there such a thing as an impossible protein?

Another possible interpretation of the titular question: an amino acid sequence with a fixed stable functional configuration but one that it cannot naturally reach because some intermediate stage of the folding is forbidden. I suspect such a thing is possible, and one might even be able to synthesize the final structure (in pieces, perhaps?).

My first thought was knotted proteins, but somehow those actually exist in nature (how?!): link.

benito on Announcing Dialogues

I am sad they're not getting as much use. I have wondered if they would work well as part of the comment section UI, where if you're having a back-and-forth with someone, the site instead offers you "Would you like to have a dialogue instead?" with a single button.

yegreg on Announcement: Learning Theory Online Course

If it's just questions about the notation, please also feel free to ask here whatever is unclear, I'm happy to clarify (I understand that I kept the wording of the questions quite brief).

deluks917 on Tail SP 500 Call Options

We are clearly looking at things differently. That's fine. But if two people see things differently I don't think it's wise to map what they are saying into your ontology.

jacques-thibodeau on jacquesthibs's Shortform

I'm currently in the Catalyze Impact AI safety incubator program. I'm working on creating infrastructure for automating AI safety research. This startup is attempting to fill a gap in the alignment ecosystem and looking to build with the expectation of under 3 years left to automated AI R&D. This is my short timelines plan [LW · GW].

I'm looking to talk (for feedback) to anyone interested in the following:

AI control
Automating math to tackle problems as described in Davidad's Safeguarded AI programme.
High-assurance safety cases [LW · GW]
How to robustify society in a post-AGI world
Leverage large amounts of inference-time compute to make progress on alignment research
Short timelines
Profitability while still reducing overall x-risk

If you're interested in chatting or giving feedback, please DM me!

artifex0 on Mechanisms too simple for humans to design

One interesting example of humans managing to do this kind of compression in software: .kkrieger is a fully-functional first person shooter game with varied levels, detailed textures and lighting, multiple weapons and enemies and a full soundtrack. Replicating it in a modern game engine would probably produce a program at least a gigabyte large, but because of some incredibly clever procedural generation, .kkrieger managed to do it in under 100kb.

daniel-tan on Daniel Tan's Shortform

Some tech stacks / tools / resources for research. I have used most of these and found them good for my work.

TODO: check out https://www.lesswrong.com/posts/6P8GYb4AjtPXx6LLB/tips-and-code-for-empirical-research-workflows#Part_2__Useful_Tools [LW · GW]

Finetuning open-source language models.

Docker images: Nvidia CUDA latest image as default, or framework-specific image (e.g. Axolotl)
Orchestrating cloud instances: Runpod
- Connecting to cloud instances: Paramiko
- Transferring data: SCP
Launching finetuning jobs: Axolotl
- Efficient tensor ops: FlashAttention, xFormers
- Multi-GPU training: DeepSpeed
- [Supports writing custom CUDA kernels in Triton]
Monitoring ongoing jobs: Weights and Biases
Storing saved model checkpoints: Huggingface
Serving the trained checkpoints: vLLM.
- [TODO: look into llama-cpp-python and similar things for running on worse hardware]

Finetuning OpenAI language models.

End-to-end experiment management: openai-finetuner

Evaluating language models.

Running standard benchmarks: Inspect
Running custom evals: [janky framework which I might try to clean up and publish at some point]

AI productivity tools.

Programming: Cursor IDE
Thinking / writing: Claude
- Plausibly DeepSeek is now better
More extensive SWE: Devin
[TODO: look into agent workflows, OpenAI operator, etc]

Basic SWE

Managing virtual environments: PDM
Dependency management: UV
Versioning: Semantic release
Linting: Ruff
Testing: Pytest
CI: Github Actions
Repository structure: PDM
Repository templating: PDM
Building wheels for distribution: PDM
[TODO: set up a cloud development workflow]

Research communication.

Quick updates: Google Slides
Extensive writing: Google Docs, Overleaf
- Some friends have recommended Typst
Making figures: Google Draw, Excalidraw

ryan_greenblatt on ryan_greenblatt's Shortform

I think you are correct with respect to my estimate of and the associated model I was using. Sorry about my error here. I think I was fundamentally confusing a few things in my head when writing out the comment.

I think your refactoring of my strategy is correct, and I tried to check it myself, though I don't feel as confident in this as I feel in my approach being incorrect.

Your estimate doesn't account for the conversion between algorithmic improvement and labor efficiency, but it is easy to add this in by just changing the historical algorithmic efficiency improvement of 3.5x/year to instead be the adjusted effective labor efficiency rate and then solving identically. I was previously thinking the relationship was that labor efficiency was around the same as algorithmic efficiency, but I now think this is more likely to be around $\text{algo\_efficiency}^{2}$ based on Tom's comment [LW(p) · GW(p)].

Plugging this in, we'd get:

$\frac{\lambda}{\beta} (1 - p) = \frac{r}{q} (1 - p) = \frac{\ln(3.5^{2})}{0.4 \ln(4) + 0.6 \ln(1.6)} (1 - 0.4) = 2 \, \frac{\ln(3.5)}{\ln(2.3)} (1 - 0.4) = 2 \cdot 1.5 \cdot 0.6 = 1.8$

(In your comment you said $\frac{\ln(3.5)}{\ln(2.3)} = 1.6$, but I think the arithmetic is a bit off here and the answer is closer to 1.5.)
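
For anyone who wants to check the arithmetic, here is a minimal sketch that plugs in the numbers from the formula above. The function name and parameter names are my own labels, not anything from the original model, and the code only reproduces the calculation as written (it assumes the denominator is the weighted sum of logs shown there).

```python
import math

def adjusted_ratio(algo_gain=3.5, labor_exponent=2.0,
                   weights=(0.4, 0.6), factors=(4.0, 1.6), p=0.4):
    """Evaluate (r / q) * (1 - p) with the numbers quoted in the comment.

    All parameter names are hypothetical labels for the quantities
    appearing in the formula; they are not taken from the original model.
    """
    # r: log of the algorithmic efficiency gain raised to the labor exponent
    r = math.log(algo_gain ** labor_exponent)
    # q: weighted sum of logs, i.e. the log of a weighted geometric mean (~ln 2.3)
    q = sum(w * math.log(f) for w, f in zip(weights, factors))
    return r / q * (1 - p), q

result, q = adjusted_ratio()
print(f"exp(q)             = {math.exp(q):.3f}")
print(f"ln(3.5) / q        = {math.log(3.5) / q:.3f}")
print(f"(r / q) * (1 - p)  = {result:.3f}")
```

Running this gives a ratio $\frac{\ln(3.5)}{\ln(2.3)}$ that rounds to 1.5 rather than 1.6, and a final value that rounds to 1.8.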

poignardazur on When Is Insurance Worth It?

Not necessarily.

I think any method that calculates the value/utility of your wealth as a timeless function of utility per amount will be pretty disconnected from how people behave in practice. It doesn't account for people making plans and having to scrap them because an accident cost them their savings, for instance.

(But then again, I'm not an economist, maybe there are timeless frameworks that account for that.)

johnswentworth on The Case Against AI Control Research

To be clear, I am not claiming that this failure mode is very likely very hard to resolve. Just harder than "run it twice on the original question and a rephrasing/transformation of the question".