LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (19)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

Monthly Roundup #22: September 2024
Zvi · 2024-09-17T12:20:08.297Z · comments (8)

Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)

[link] On Fables and Nuanced Charts
Niko_McCarty (niko-2) · 2024-09-08T17:09:07.503Z · comments (2)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (4)

[link] My Model of Epistemology
adamShimi · 2024-08-31T17:01:45.472Z · comments (0)

[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)

[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)

DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)

Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (2)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (4)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

[link] Epistemic states as a potential benign prior
Tamsin Leake (carado-1) · 2024-08-31T18:26:14.093Z · comments (2)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (4)

Proveably Safe Self Driving Cars
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (20)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (16)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

Would you benefit from, or object to, a page with LW users' reacts?
Raemon · 2024-08-20T16:35:47.568Z · comments (6)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates
Charlie George (charlie-george) · 2024-08-27T20:44:08.683Z · comments (7)

AI #77: A Few Upgrades
Zvi · 2024-08-20T00:20:09.717Z · comments (3)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (1)

August 2024 Time Tracking
jefftk (jkaufman) · 2024-08-24T13:50:04.676Z · comments (0)

Monthly Roundup #21: August 2024
Zvi · 2024-08-20T00:20:08.178Z · comments (6)

[link] Day Zero Antivirals for Future Pandemics
Niko_McCarty (niko-2) · 2024-08-26T15:18:33.858Z · comments (2)

[link] on Science Beakers and DDT
bhauth · 2024-09-05T03:21:21.382Z · comments (12)

Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs
Winnie Yang (winnie-yang) · 2024-08-22T07:32:07.600Z · comments (0)

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
Kola Ayonrinde (kola-ayonrinde) · 2024-08-23T18:52:31.019Z · comments (2)

[link] Hyperpolation
Gunnar_Zarncke · 2024-09-15T21:37:00.002Z · comments (4)

AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan · 2024-08-24T22:30:02.039Z · comments (0)

A necessary Membrane formalism feature
ThomasCederborg · 2024-09-10T21:33:09.508Z · comments (6)

My decomposition of the alignment problem
Daniel C (harper-owen) · 2024-09-02T00:21:08.359Z · comments (22)

Simon DeDeo on Explore vs Exploit in Science
Elizabeth (pktechgirl) · 2024-09-10T03:40:08.311Z · comments (0)

[link] Anthropic is being sued for copying books to train Claude
Remmelt (remmelt-ellen) · 2024-08-31T02:57:27.092Z · comments (4)

Looking for Goal Representations in an RL Agent - Update Post
CatGoddess · 2024-08-28T16:42:19.367Z · comments (0)

Why Reflective Stability is Important
Johannes C. Mayer (johannes-c-mayer) · 2024-09-05T15:28:19.913Z · comments (2)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

johnswentworth on We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap

How accurate is the summary I have presented above?

Basically accurate.

Where do values, as opposed to beliefs-about-values, come from?

That is the right next question to ask. Humans have a map of their values, and can update that map in response to rewards in order to "learn about values", but still leaves the question of when/whether there's any "real values" which the map represents, and what kind-of-things those "real values" are.

A few parts of an answer:

"human values" are not one monolithic thing; we value lots of different stuff, and different parts of our value-estimates can separately represent "a real thing" or fail to represent "a real thing".
we don't yet understand what it means for part of our value-estimates to represent "a real thing", but it probably works pretty similarly to epistemic representation more generally - e.g. my belief about the position of the dog in my apartment represents a real thing (even if the position itself is wrong) exactly when there is in fact a dog in my apartment at all.

adam_scholl on Why I funded PIBBSS

Given both my personal experience with LLMs and my reading of the role that empirical engagement has historically played in non-paradigmatic research, I tend to advocate for a methodology which incorporates immediate feedback loops with present day deep learning systems over the classical "philosophy -> math -> engineering" deconfusion/agent foundations paradigm.

I'm curious what your read of the history is, here? My impression is that most important paradigm-forming work so far has involved empirical feedback somehow, but often in ways exceedingly dissimilar from/illegible to prevailing scientific and engineering practice.

I have a hard time imagining scientists like e.g. Darwin, Carnot, or Shannon describing their work as depending much on "immediate feedback loops with present day" systems. So I'm curious whether you think PIBBSS would admit researchers like these into your program, were they around and pursuing similar strategies today?

t3t on AI #82: The Governor Ponders

If you include Facebook & Google (i.e. the entire orgs) as "frontier AI companies", then 6-figures. If you only include Deepmind and FAIR (and OpenAI and Anthropic), maybe order of 10-15k, though who knows what turnover's been like. Rough current headcount estimates:

Deepmind: 2600 (as of May 2024, includes post-Brain-merge employees)

Meta AI (formerly FAIR): ~1200 (unreliable sources; seems plausible, but is probably an implicit undercount since they almost certainly rely a lot of various internal infrastructure used by all of Facebook's engineering departments that they'd otherwise need to build/manage themselves.)

OpenAI: >1700

Anthropic: >500 (as of May 2024)

So that's a floor of ~6k current employees.

jessica-liu-taylor on We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap

I discussed something similar in the "Human brains don't seem to neatly factorize" section of the Obliqueness [LW · GW] post. I think this implies that, even assuming the Orthogonality Thesis, humans don't have values that are orthogonal to human intelligence (they'd need to not respond to learning/reflection to be orthogonal in this fashion), so there's not a straightforward way to align ASI with human values by plugging in human values to more intelligence.

raemon on AI #82: The Governor Ponders

Over 125 current & former employees of frontier AI companies have called on @CAGovernor to #SignSB1047.

I know this is a political statement that isn't optimizing for such things, but, I am pretty interested in knowing "what actually is the denominator of people who meaningfully count as 'employees of frontier AI companies?". If the answer is 10s of thousands then, well, that is indeed a tiny number. But I think the number might be something more like 1000-3000?

bruce-schechter on The Lens That Sees Its Flaws

Well, yes. But scientists need to have optimism that their experiments will lead somewhere, entrepeneurs have to be optimistic about there projects (and I'm optimistic that this remark will not get me kicked off this site). Without optimism great projects would not be undertaken.

ozyrus on Ozyrus's Shortform

I don’t know if it’s a place for this, but at some point it became impossible to open an article in new tab from Chrome on IPhone - clicking on article title from “all posts” just opens the article. Really ruins my LW reading experience. Couldn’t quickly find a way to send this feedback to a right place either, so I guess this is a quick take now.

review-bot on [New LW Feature] "Debates"

The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

bruce-schechter on The Lens That Sees Its Flaws

Isn't the 20th century's apparent low death toll from homicide and war just a matter of percentages? The absolute number of deaths from these things is much greater in the 20th century. I think the absolute number matters too.

aprilsr on We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap

This post definitely resolved some confusions for me. There are still a whole lot of philosophical issues, but it's very nice to have a clearer model of what's going on with the initial naïve conception of value.