LessWrong 2.0 Reader


A list of all the deadlines in Biden's Executive Order on AI
Valentin Baltadzhiev (valentin-baltadzhiev) · 2023-11-01T17:14:31.074Z · comments (2)
[link] Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI
Erich_Grunewald · 2023-12-21T17:24:16.713Z · comments (2)
Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)
Geometric Utilitarianism (And Why It Matters)
StrivingForLegibility · 2024-05-12T03:41:21.342Z · comments (2)
Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret (Adrià R. Moret) · 2023-12-02T14:07:29.992Z · comments (31)
Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around
Johnny Lin (hijohnnylin) · 2024-01-31T06:01:27.969Z · comments (4)
Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)
Agent membranes/boundaries and formalizing “safety”
Chipmonk · 2024-01-03T17:55:21.018Z · comments (46)
AI #57: All the AI News That’s Fit to Print
Zvi · 2024-03-28T11:40:05.435Z · comments (14)
Bayesian inference without priors
DanielFilan · 2024-04-24T23:50:08.312Z · comments (8)
How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots
Owain_Evans · 2024-03-28T02:34:21.799Z · comments (0)
[link] Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles
Zack_M_Davis · 2024-03-02T22:05:49.553Z · comments (22)
[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)
Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (49)
[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)
SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)
[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)
Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (7)
Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (69)
LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)
5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (8)
Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)
AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan · 2024-09-29T05:50:02.531Z · comments (0)
Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)
[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)
[link] Introduction to Super Powers (for kids!)
Shoshannah Tekofsky (DarkSym) · 2024-09-20T17:17:27.070Z · comments (0)
[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)
A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)
[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)
[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)
You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)
[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (3)
The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (13)
Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)
Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)
Testing for consequence-blindness in LLMs using the HI-ADS unit test.
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-11-24T23:35:29.560Z · comments (2)
[link] OpenAI Superalignment: Weak-to-strong generalization
Dalmert · 2023-12-14T19:47:24.347Z · comments (3)
Beta Tester Request: Rallypoint Bounties
lukemarks (marc/er) · 2024-05-25T09:11:11.446Z · comments (4)
Changing Contra Dialects
jefftk (jkaufman) · 2023-10-26T17:30:10.387Z · comments (2)
[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)
[link] Executive Dysfunction 101
DaystarEld · 2024-05-23T12:43:13.785Z · comments (1)
[question] What ML gears do you like?
Ulisse Mini (ulisse-mini) · 2023-11-11T19:10:11.964Z · answers+comments (4)
[question] Impressions from base-GPT-4?
mishka · 2023-11-08T05:43:23.001Z · answers+comments (25)
[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)
Decent plan prize winner & highlights
lukehmiles (lcmgcd) · 2024-01-19T23:30:34.242Z · comments (2)
Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)
The Drowning Child
Tomás B. (Bjartur Tómas) · 2023-10-22T16:39:53.016Z · comments (8)
$250K in Prizes: SafeBench Competition Announcement
ozhang (oliver-zhang) · 2024-04-03T22:07:41.171Z · comments (0)
Control Symmetry: why we might want to start investigating asymmetric alignment interventions
domenicrosati · 2023-11-11T17:27:10.636Z · comments (1)
Economics Roundup #1
Zvi · 2024-03-26T14:00:06.332Z · comments (4)