LessWrong 2.0 Reader

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)
[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)
$250K in Prizes: SafeBench Competition Announcement
ozhang (oliver-zhang) · 2024-04-03T22:07:41.171Z · comments (0)
Decent plan prize announcement (1 paragraph, $1k)
lukehmiles (lcmgcd) · 2024-01-12T06:27:44.495Z · comments (19)
[link] In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley
mrtreasure · 2023-12-06T02:02:32.004Z · comments (3)
[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)
Changing Contra Dialects
jefftk (jkaufman) · 2023-10-26T17:30:10.387Z · comments (2)
Economics Roundup #1
Zvi · 2024-03-26T14:00:06.332Z · comments (4)
[link] OpenAI Superalignment: Weak-to-strong generalization
Dalmert · 2023-12-14T19:47:24.347Z · comments (3)
[question] Impressions from base-GPT-4?
mishka · 2023-11-08T05:43:23.001Z · answers+comments (25)
If a little is good, is more better?
DanielFilan · 2023-11-04T07:10:05.943Z · comments (16)
Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)
Testing for consequence-blindness in LLMs using the HI-ADS unit test.
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-11-24T23:35:29.560Z · comments (2)
[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)
Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)
Control Symmetry: why we might want to start investigating asymmetric alignment interventions
domenicrosati · 2023-11-11T17:27:10.636Z · comments (1)
[question] What ML gears do you like?
Ulisse Mini (ulisse-mini) · 2023-11-11T19:10:11.964Z · answers+comments (4)
Weeping Agents
pleiotroth · 2024-06-06T12:18:54.978Z · comments (2)
Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)
Defense Against The Dark Arts: An Introduction
Lyrongolem (david-xiao) · 2023-12-25T06:36:06.278Z · comments (36)
[link] Was Partisanship Good for the Environmental Movement?
Jeffrey Heninger (jeffrey-heninger) · 2024-05-15T17:30:54.796Z · comments (0)
A Basic Economics-Style Model of AI Existential Risk
Rubi J. Hudson (Rubi) · 2024-06-24T20:26:09.744Z · comments (3)
[question] What percent of the sun would a Dyson Sphere cover?
Raemon · 2024-07-03T17:27:50.826Z · answers+comments (26)
A conceptual precursor to today's language machines [Shannon]
Bill Benzon (bill-benzon) · 2023-11-15T13:50:51.226Z · comments (6)
[link] Cellular respiration as a steam engine
dkl9 · 2024-02-25T20:17:38.788Z · comments (1)
5 psychological reasons for dismissing x-risks from AGI
Igor Ivanov (igor-ivanov) · 2023-10-26T17:21:48.580Z · comments (6)
Anomalous Concept Detection for Detecting Hidden Cognition
Paul Colognese (paul-colognese) · 2024-03-04T16:52:52.568Z · comments (3)
[link] Eric Schmidt on recursive self-improvement
nikola (nikolaisalreadytaken) · 2023-11-05T19:05:15.416Z · comments (3)
2. Premise two: Some cases of value change are (il)legitimate
Nora_Ammann · 2023-10-26T14:36:53.511Z · comments (7)
Foresight Institute: 2023 Progress & 2024 Plans for funding beneficial technology development
Allison Duettmann (allison-duettmann) · 2023-11-22T22:09:16.956Z · comments (1)
Scientific Method
Andrij “Androniq” Ghorbunov (andrij-androniq-ghorbunov) · 2024-02-18T21:06:45.228Z · comments (4)
My Alignment "Plan": Avoid Strong Optimisation and Align Economy
VojtaKovarik · 2024-01-31T17:03:34.778Z · comments (9)
Distinctions when Discussing Utility Functions
ozziegooen · 2024-03-09T20:14:03.592Z · comments (7)
An evaluation of Helen Toner’s interview on the TED AI Show
PeterH · 2024-06-06T17:39:40.800Z · comments (2)
[link] AI Alignment [Progress] this Week (11/05/2023)
Logan Zoellner (logan-zoellner) · 2023-11-07T13:26:21.995Z · comments (0)
[link] Compensating for Life Biases
Jonathan Moregård (JonathanMoregard) · 2024-01-09T14:39:14.229Z · comments (6)
How Congressional Offices Process Constituent Communication
Tristan Williams (tristan-williams) · 2024-07-02T12:38:41.472Z · comments (0)
[question] Would you have a baby in 2024?
martinkunev · 2023-12-25T01:52:04.358Z · answers+comments (76)
[link] Let's Design A School, Part 2.3 School as Education - The Curriculum (Phase 2, Specific)
Sable · 2024-05-15T20:58:50.981Z · comments (0)
Evolution did a surprisingly good job at aligning humans...to social status
Eli Tyre (elityre) · 2024-03-10T19:34:52.544Z · comments (37)
Utility is not the selection target
tailcalled · 2023-11-04T22:48:20.713Z · comments (1)
[link] Scenario planning for AI x-risk
Corin Katzke (corin-katzke) · 2024-02-10T00:14:11.934Z · comments (12)
Paper Summary: The Koha Code - A Biological Theory of Memory
jakej (jake-jenks) · 2023-12-30T22:37:13.865Z · comments (2)
[question] Could there be "natural impact regularization" or "impact regularization by default"?
tailcalled · 2023-12-01T22:01:46.062Z · answers+comments (6)
A bet on critical periods in neural networks
kave · 2023-11-06T23:21:17.279Z · comments (1)
[link] Extinction Risks from AI: Invisible to Science?
VojtaKovarik · 2024-02-21T18:07:33.986Z · comments (7)
[link] "25 Lessons from 25 Years of Marriage" by honorary rationalist Ferrett Steinmetz
CronoDAS · 2024-10-02T22:42:30.509Z · comments (2)
Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)
the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)
[link] Clickbait Soapboxing
DaystarEld · 2024-03-13T14:09:29.890Z · comments (15)