LessWrong 2.0 Reader


Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (8)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

Option control
Joe Carlsmith (joekc) · 2024-11-04T17:54:03.073Z · comments (0)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde (kola-ayonrinde) · 2024-10-30T22:50:45.642Z · comments (0)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)

[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)

[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)

AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan · 2024-09-29T05:50:02.531Z · comments (0)

Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2024-11-11T16:13:26.504Z · comments (6)

You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)

A suite of Vision Sparse Autoencoders
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-10-27T04:05:20.377Z · comments (0)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (3)

How to put California and Texas on the campaign trail!
Yair Halberstadt (yair-halberstadt) · 2024-11-06T06:08:25.673Z · comments (4)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)

The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (13)

[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)

[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)

Abstractions are not Natural
Alfred Harwood · 2024-11-04T11:10:09.023Z · comments (21)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)

[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)

[link] Introduction to Super Powers (for kids!)
Shoshannah Tekofsky (DarkSym) · 2024-09-20T17:17:27.070Z · comments (0)

The new ruling philosophy regarding AI
Mitchell_Porter · 2024-11-11T13:28:24.476Z · comments (0)

[link] overengineered air filter shelving
bhauth · 2024-11-08T22:04:39.987Z · comments (2)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

[link] "25 Lessons from 25 Years of Marriage" by honorary rationalist Ferrett Steinmetz
CronoDAS · 2024-10-02T22:42:30.509Z · comments (2)

AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)

[link] Tokyo AI Safety 2025: Call For Papers
Blaine (blaine-rogers) · 2024-10-21T08:43:38.467Z · comments (0)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

[question] How should vegans think about Methionine needs?
ChristianKl · 2024-11-10T09:28:47.655Z · answers+comments (1)

Winning isn't enough
JesseClifton · 2024-11-05T11:37:39.486Z · comments (14)

Apply to the Cooperative AI PhD Fellowship by October 14th!
Lewis Hammond (lewis-hammond-1) · 2024-10-05T12:41:24.093Z · comments (0)

Rashomon - A newsbetting site
ideasthete · 2024-10-15T18:15:02.476Z · comments (8)

[link] A Defense of Peer Review
Niko_McCarty (niko-2) · 2024-10-22T16:16:49.982Z · comments (1)

Would you benefit from, or object to, a page with LW users' reacts?
Raemon · 2024-08-20T16:35:47.568Z · comments (6)

Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs
Winnie Yang (winnie-yang) · 2024-08-22T07:32:07.600Z · comments (1)

[link] The Offense-Defense Balance of Gene Drives
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-27T16:47:25.976Z · comments (1)

Recent comments

milan-w on Thoughts after the Wolfram and Yudkowsky discussion

I asked GPT-4o to perform a web search for podcast appearances by Yudkowsky. It dug up these two lists (apparently autogenerated from scraped data). When I asked it to use these lists as a starting point to look for high-quality debates, the best we could find, after some further elicitation and wrangling, was this moderated panel discussion featuring Yudkowsky, Liv Boeree, and Joscha Bach. There's also the Yudkowsky vs. George Hotz debate on Lex Fridman, and the time Yudkowsky explained AI risk to the streamer and political commentator known as Destiny. I have watched none of the three debates I just mentioned, but I know that Hotz is a heavily vibes-based (rather than object-level-based) thinker, and that Destiny has no background in AI risk but has good epistemics. I think he probably offered reasonable-at-first-approximation-yet-mostly-uninformed pushback.

cole-wyeth on Heresies in the Shadow of the Sequences

Perhaps Legg-Hutter intelligence.
I'm not sure how much the goal matters - probably the details depend on the utility function you want to optimize. I think you can do about as well as possible by carving out a utility function module and designing the rest uniformly to pursue the objectives of that module. But perhaps this comes at a fairly significant cost (i.e. you'd need a somewhat larger computer to get the same performance if you insist on doing it this way).
...And yes, there does exist a computer program which is remarkably good at just chess and nothing else, but that's not the kind of thing I'm talking about here.
Yes, the I/O channels should be fixed along with the hardware.

olli-jaerviniemi on The Parable of the Dagger

dagon on nikola's Shortform

Hmm. I think there are two dimensions to the advice (what is a reasonable distribution of timelines to have, vs what should I actually do). It's perfectly fine to have some humility about one while still giving opinions on the other. "If you believe Y, then it's reasonable to do X" can be a useful piece of advice. I'd normally mention that I don't believe Y, but for a lot of conversations, we've already had that conversation, and it's not helpful to repeat it.

tomcatfish on Drawing Less Wrong: Should You Learn to Draw?

People seeing this in the future: Check out Draw a Box for some low-level mechanical stuff.

milan-w on Heresies in the Shadow of the Sequences

Though there are elegant and still practical specifications for intelligent behavior, the most intelligent agent that runs on some fixed hardware has completely unintelligible cognitive structures and in fact its source code is indistinguishable from white noise.

What does "most intelligent agent" mean?
Don't you think we'd also need to specify "for a fixed (basket of) tasks"?
Are the I/O channels fixed along with the hardware?

yams on yams's Shortform

I (and maybe you) have historically underrated the density of people with religious backgrounds in secular hubs. Most of these people don't 'think differently', in a structural sense, from their forebears; they just don't believe in that God anymore.

The hallmark here is a kind of naive enlightenment approach that ignores ~200 years of intellectual history (and a great many thinkers from before that period, including canonical philosophers they might claim to love/respect/understand). This type of thing.

They're no less tribal or dogmatic, or more critical, than the place they came from. They just vote the other way and can maybe talk about one or two levels of abstraction beyond the stereotype they identify against (although they can't really think about those levels).

You should still be nice to them, and honest with them, but you should understand what you're getting into.

The mere biographical detail of having a religious background or being religious isn't a strong mark against someone's thinking on other topics, but it is a sign you may be talking to a member of a certain meta-intellectual culture, and need to modulate your style. I have definitely had valuable conversations with people who firmly belong in this category, and would not categorically discourage engagement. Just don't be so surprised when the usual jutsu falls flat!

mako-yass on nikola's Shortform

Timelines are a result of a person's intuitions about a technical milestone being reached in the future; it is super obviously impossible for us to have a consensus about that kind of thing.

Talking only synchronises beliefs if you have enough time to share all of the relevant information, and with technical matters, you usually don't.

alok-singh on Derivative AT a discontinuity

added some open circles

alok-singh on Derivative AT a discontinuity

I adjusted H to use Heaviside's 1/2 convention, good catch.