LessWrong 2.0 Reader

[link] Information dark matter
Logan Kieller (logan-kieller) · 2024-10-01T15:05:41.159Z · comments (4)

[link] Twitter thread on open-source AI
Richard_Ngo (ricraz) · 2024-07-31T00:26:11.655Z · comments (6)

DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (44)

Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (26)

[question] How unusual is the fact that there is no AI monopoly?
Viliam · 2024-08-16T20:21:51.012Z · answers+comments (15)

[link] A computational complexity argument for many worlds
jessicata (jessica.liu.taylor) · 2024-08-13T19:35:10.116Z · comments (15)

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

[link] NAO Updates, Fall 2024
jefftk (jkaufman) · 2024-10-18T00:00:04.142Z · comments (2)

An argument that consequentialism is incomplete
cousin_it · 2024-10-07T09:45:12.754Z · comments (27)

Investigating the Ability of LLMs to Recognize Their Own Writing
Christopher Ackerman (christopher-ackerman) · 2024-07-30T15:41:44.017Z · comments (0)

Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

Music in the AI World
Martin Sustrik (sustrik) · 2024-08-16T04:20:01.706Z · comments (8)

RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (10)

[LDSL#6] When is quantification needed, and when is it hard?
tailcalled · 2024-08-13T20:39:45.481Z · comments (0)

[link] Epistemic states as a potential benign prior
Tamsin Leake (carado-1) · 2024-08-31T18:26:14.093Z · comments (2)

Extracting SAE task features for in-context learning
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-12T20:34:13.747Z · comments (1)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (23)

Balancing Label Quantity and Quality for Scalable Elicitation
Alex Mallen (alex-mallen) · 2024-10-24T16:49:00.939Z · comments (1)

[link] What is it like to be psychologically healthy? Podcast ft. DaystarEld
Chipmonk · 2024-10-05T19:14:04.743Z · comments (8)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (12)

[LDSL#1] Performance optimization as a metaphor for life
tailcalled · 2024-08-08T16:16:27.349Z · comments (4)

[link] Concrete benefits of making predictions
Jonny Spicer (jonnyspicer) · 2024-10-17T14:23:17.613Z · comments (5)

Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery (arjun-panickssery) · 2024-08-06T17:44:27.293Z · comments (0)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (1)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (1)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (7)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (1)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (5)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

Recent comments

abe-frei-pearson on Big tech transitions are slow (with implications for AI)

This argument seems to be one by analogy: steam engine : industrial revolution :: ??? : machine learning. But as you can see, there's a term in the analogy I don't understand. Is ??? ChatGPT? LLMs? Transformers? AlexNet? The internet? Digital computers? Something that hasn't yet been invented?

niplav on Zach Stein-Perlman's Shortform

Two others that come to mind:

Metaculus (used to be better though)
lobste.rs (quite specialized)
Quanta Magazine has some good comments, e.g. this article has the original researcher showing up & clarifying some questions in the comments

notfnofn on A Logical Proof for the Emergence and Substrate Independence of Sentience

Proof is a really strong word and (in my opinion) inappropriate in this context. This is about to become an extremely important question, and we should be careful to avoid overconfidence. I've personally found this comment chain to be an enlightening discussion of the complexity of this issue (but of course this is something that has been discussed endlessly elsewhere).

As a separate issue, let's say I write down the rule set for an automaton that will slowly grow and eventually emulate every finite string of finite grids of black and white pixels. This is not hard to do. Does it require a substrate to become conscious or is the rule set itself conscious? What if I actually run it in a corner of the universe that slowly uses surrounding matter to allow its output to grow larger and larger?
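One way to see why such a rule set is "not hard to do" is a dovetailed enumeration. The sketch below is a hypothetical Python illustration, not the commenter's construction: the names `grids_up_to` and `all_grid_sequences` are made up for this example, and it lists outputs stage by stage rather than simulating a physical automaton that grows into surrounding matter.

```python
from itertools import count, islice, product
from typing import Iterator, Tuple

# A grid is a tuple of rows; each pixel is 0 (white) or 1 (black).
Grid = Tuple[Tuple[int, ...], ...]


def grids_up_to(n: int) -> Iterator[Grid]:
    """Yield every black/white grid with 1..n rows and 1..n columns."""
    for h in range(1, n + 1):
        for w in range(1, n + 1):
            for bits in product((0, 1), repeat=h * w):
                # Cut the flat bit tuple into h rows of width w.
                yield tuple(bits[r * w:(r + 1) * w] for r in range(h))


def all_grid_sequences() -> Iterator[Tuple[Grid, ...]]:
    """Dovetailed enumeration of every finite sequence of finite grids.

    Stage n emits all sequences of length 1..n whose grids are at most n x n.
    A sequence of length L whose largest grid is h x w first appears at stage
    max(L, h, w), so every finite sequence is eventually produced. Later stages
    repeat earlier items, which does not matter for the argument.
    """
    for n in count(1):
        grids = list(grids_up_to(n))
        for length in range(1, n + 1):
            yield from product(grids, repeat=length)


if __name__ == "__main__":
    # Print the first few "frames sequences" the enumerator produces.
    for seq in islice(all_grid_sequences(), 3):
        print(seq)
```

The output size explodes quickly (stage 3 alone has 682 distinct grids), which matches the comment's picture of something that only "slowly" reaches any particular pattern.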

anthonyc on Big tech transitions are slow (with implications for AI)

There are also many ways the maximum-likelihood model could be consistent with very rapid near-term change.

One is that, as in past transitions, the faster growth isn't an exponential: it gets faster and then eventually peters out, like any S-curve. If you look at the world from 1700 to now, the industrial revolution is a sum of many such individual curves, but even so, the fastest years/decades of global growth were ~50x faster than the slowest. If you compress 1000x growth into a couple of decades and assume a similar distribution of growth rates, then it matters a great deal whether 2024 is year 1, year 5, or something else. We could be 7 years into a two-decade transition that began with the transformer architecture, or two decades into a fifty-year transition that started with some other machine learning advance, and either would be consistent with both the OP and "Things are about to move ridiculously fast."

In other words: sustained faster-than-population economic growth didn't show up in Britain until a century or so after the industrial revolution began, peak global growth came a century or so after that, and in recent years the largest remaining countries have been catching up even faster, even while growth in the UK, US, and EU is slower than past peaks. If this were transitional year 7 of 20, peak growth in the industrial revolution was 5-10%/yr, and this transition is 10x faster, then it's plausible to expect 1-year economic doubling times in each of several years between now and the early 2030s.
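The doubling-time arithmetic can be checked with a short sketch. It assumes constant compound growth and reads "10x faster" as running the same growth path in one tenth of the calendar time (so doubling times shrink by 10x); both the reading and the function names are assumptions for illustration, not the commenter's model.

```python
import math


def doubling_time_years(annual_rate: float) -> float:
    """Years to double under constant compound growth (0.05 means 5%/yr)."""
    return math.log(2) / math.log1p(annual_rate)


def compressed_doubling_time(annual_rate: float, speedup: float) -> float:
    """Doubling time if the same growth path plays out `speedup` times faster."""
    return doubling_time_years(annual_rate) / speedup


if __name__ == "__main__":
    SPEEDUP = 10  # the comment's hypothetical "10x faster" transition
    for peak in (0.05, 0.10):  # the comment's 5-10%/yr peak growth range
        print(
            f"peak {peak:.0%}/yr: doubling every {doubling_time_years(peak):.1f} yr; "
            f"at {SPEEDUP}x speed, every {compressed_doubling_time(peak, SPEEDUP):.2f} yr"
        )
```

Under this reading the doubling time comes out at roughly 0.7 to 1.4 years. Multiplying the annual rate itself by 10 (50-100%/yr) gives a similar range of about 1 to 1.7 years, so the "about one year" conclusion does not hinge on which reading is used.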

The OP seems to assume we're in year 1 or so out of 20-50, and that the most significant or fastest changes will happen near the end of that window. I'm not quite sure why I should agree with those assumptions.

sharmake-farah on Jimrandomh's Shortform

The issue is that all cryptography depends on one-way functions, so any ability to break a cryptographic algorithm that depends on one-way functions in a scalable way means you have defeated almost all of cryptography in practice.

So in one sense, a mathematical advance on a one-way function underlying a symmetric key algorithm would be disastrous for overall cryptographic prospects.

olli-savolainen on johnswentworth's Shortform

So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you'd be able to avoid that loss, which is meaningful! ...in a tiny fraction of all embryos. On average, you'd just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.

That is relevant for pre-implantation diagnosis by parents and for gene therapy at the population level. But for Kwisatz Haderach breeding purposes those costs are immaterial. There the main bottleneck is iterating selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right? We would not be interested in the effect of the ugliness, only in getting it out.
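A back-of-the-envelope reading of the quoted figures, assuming (as a simplification the quote does not state) that "2% of the cohort" is the chance a given embryo carries one of the identified hits, and that each hit costs about 2 IQ points:

```latex
\mathbb{E}[\text{IQ points saved per selected embryo}] \approx p \cdot \Delta = 0.02 \times 2 = 0.04
```

This is the sense in which the screen matters a lot for the rare carrier but adds almost nothing on average.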

measure on Claude Sonnet 3.5.1 and Haiku 3.5

If you're reading this direct, this text is the last one that is wise like what's written between.

This sounds like it tried to encode something steganographically in the message? Maybe that accounts for some of the bizarre language.

cole-wyeth on Cole Wyeth's Shortform

I'm starting a Google group for anyone who wants to see occasional updates on my Sherlockian Abduction Master List. It occurred to me that anyone interested in the project would currently have to check the list to see any new observational cues (added infrequently); also, some people outside of LessWrong are interested.

akash-wasil on Anthropic rewrote its RSP

Henry from SaferAI claims that the new RSP is weaker and vaguer than the old RSP. Do others have thoughts on this claim? (I haven't had time to evaluate yet.)

Main Issue: Shift from precise definitions to vague descriptions.
The primary issue lies in Anthropic's shift away from precisely defined capability thresholds and mitigation measures. The new policy adopts more qualitative descriptions, specifying the capability levels they aim to detect and the objectives of mitigations, but it lacks concrete details on the mitigations and evaluations themselves. This shift significantly reduces transparency and accountability, essentially asking us to accept a "trust us to handle it appropriately" approach rather than providing verifiable commitments and metrics.

More from him:

Example: Changes in capability thresholds.
To illustrate this change, let's look at a capability threshold:

1️⃣ Version 1 (V1): AI Security Level 3 (ASL-3) was defined as "The model shows early signs of autonomous self-replication ability, as defined by a 50% aggregate success rate on the tasks listed in [Appendix on Autonomy Evaluations]."

2️⃣ Version 2 (V2): ASL-3 is now defined as "The ability to either fully automate the work of an entry-level remote-only researcher at Anthropic, or cause dramatic acceleration in the rate of effective scaling" (quantified as an increase of approximately 1000x in a year).

In V2, the thresholds are no longer defined by quantitative benchmarks. Anthropic now states that they will demonstrate that the model's capabilities are below these thresholds when necessary. However, this approach is susceptible to shifting goalposts as capabilities advance.

🔄 Commitment Changes: Dilution of mitigation strategies.
A similar trend is evident in their mitigation strategies. Instead of detailing specific measures, they focus on mitigation objectives, stating they will prove these objectives are met when required. This change alters the nature of their commitments.

💡 Key Point: Committing to robust measures and then diluting them significantly is not how genuine commitments are upheld.
The general direction of these changes is concerning. By allowing more leeway to decide if a model meets thresholds, Anthropic risks prioritizing scaling over safety, especially as competitive pressures intensify.

I was expecting the RSP to become more specific as technology advances and their risk management process matures, not the other way around.

gunnar_zarncke on [Intuitive self-models] 2. Conscious Awareness

I mean this (my summary of the Libet experiments and their replications):

Brain activity detectable with EEG (the Readiness Potential) begins anywhere from 350 ms to several seconds (depending on the experiment and measurement resolution) before the person consciously feels the intention to act (a voluntary motor movement).
Subjects report becoming aware of their intention to act (via clock tracking) about 200 ms before the action itself (e.g., pressing a button). The 200 ms figure seems relatively fixed, but cognitive load can delay it.

To give a specific quote:

Matsuhashi and Hallett: Our result suggests that the perception of intention rises through multiple levels of awareness, starting just after the brain initiates movement.
[...]
1. The first detected event in most subjects was the onset of BP. They were not aware of the movement genesis at this time, even if they were alerted by tones.
2. As the movement genesis progressed, the awareness state rose higher and after the T time, if the subjects were alerted, they could consciously access awareness of their movement genesis as intention. The late BP began within this period.
3. The awareness state rose even higher as the process went on, and at the W time it reached the level of meta-awareness without being probed. In Libet et al.’s clock task, subjects could memorize the clock position at this time.
4. Shortly after that, the movement genesis reached its final point, after which the subjects could not veto the movement any more (P time).
[...]
We studied the immediate intention directly preceding the action. We think it best to understand movement genesis and intention as separate phenomena, both measurable. Movement genesis begins at a level beyond awareness and over time gradually becomes accessible to consciousness as the perception of intention.

Now, I think you'd say that what they measured wasn't S(A) but something else that is causally related, but then you are moving farther away from patterns we can observe in the brain. And your theory still has to explain the subclass of those S(A) that they did measure. The participants apparently thought these to be their decisions S(A) about their actions A.