LessWrong 2.0 Reader

[link] EPUBs of MIRI Blog Archives and selected LW Sequences
mesaoptimizer · 2023-10-26T14:17:11.538Z · comments (6)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

AI #38: Let’s Make a Deal
Zvi · 2023-11-16T19:50:05.442Z · comments (2)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

[link] rapid growth
Chipmonk · 2024-06-05T00:43:51.501Z · comments (0)

Incidental polysemanticity
Victor Lecomte (victor-lecomte) · 2023-11-15T04:00:00.000Z · comments (7)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

2023 LessWrong Community Census, Request for Comments
Screwtape · 2023-11-01T16:32:19.102Z · comments (37)

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

Childhood and Education Roundup #4
Zvi · 2024-01-30T13:50:06.033Z · comments (10)

Job Listing: Managing Editor / Writer
Gretta Duleba (gretta-duleba) · 2024-02-21T23:41:26.818Z · comments (2)

[question] Where is the Town Square?
Gretta Duleba (gretta-duleba) · 2024-02-13T03:53:18.205Z · answers+comments (8)

[link] Non-alignment project ideas for making transformative AI go well
Lukas Finnveden (Lanrian) · 2024-01-04T07:23:13.658Z · comments (1)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

My intellectual journey to (dis)solve the hard problem of consciousness
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-06T09:32:41.612Z · comments (41)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (4)

The Case for Predictive Models
Rubi J. Hudson (Rubi) · 2024-04-03T18:22:20.243Z · comments (7)

Ambiguity in Prediction Market Resolution is Still Harmful
aphyer · 2024-07-31T20:32:40.217Z · comments (17)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

Why does generalization work?
Martín Soto (martinsq) · 2024-02-20T17:51:10.424Z · comments (16)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (3)

[link] How bad is chlorinated water?
bhauth · 2023-12-13T18:00:12.640Z · comments (18)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

The Next ChatGPT Moment: AI Avatars
kolmplex (luke-man) · 2024-01-05T20:14:10.074Z · comments (10)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (50)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

[question] What rationality failure modes are there?
Ulisse Mini (ulisse-mini) · 2024-01-19T09:12:57.924Z · answers+comments (11)

[link] Surgery Works Well Without The FDA
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-26T13:31:29.968Z · comments (28)

Koan: divining alien datastructures from RAM activations
TsviBT · 2024-04-05T18:04:57.280Z · comments (10)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (31)

Unit economics of LLM APIs
dschwarz · 2024-08-27T16:51:22.692Z · comments (0)

Take SCIFs, it’s dangerous to go alone
latterframe · 2024-05-01T08:02:38.067Z · comments (1)

[link] Rowing vs steering
Saul Munn (saul-munn) · 2024-08-10T07:00:17.594Z · comments (2)

How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (7)

A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed
johnswentworth · 2024-08-22T19:19:28.940Z · comments (4)

MonoPoly Restricted Trust
ymeskhout · 2024-01-02T23:02:55.066Z · comments (37)

Debate: Get a college degree?
Ben Pace (Benito) · 2024-08-12T22:23:34.744Z · comments (14)

[link] Soviet comedy film recommendations
Nina Panickssery (NinaR) · 2024-06-09T23:40:58.536Z · comments (11)

Sherlockian Abduction Master List
Cole Wyeth (Amyr) · 2024-07-11T20:27:00.000Z · comments (60)

Trust as a bottleneck to growing teams quickly
benkuhn · 2024-07-13T18:00:04.579Z · comments (3)

Estimating efficiency improvements in LLM pre-training
Daan · 2024-01-19T19:32:45.124Z · comments (3)

[link] AI Girlfriends Won't Matter Much
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-23T15:58:30.308Z · comments (22)

[link] We Need Major, But Not Radical, FDA Reform
Maxwell Tabarrok (maxwell-tabarrok) · 2024-02-24T16:54:33.061Z · comments (12)

US Presidential Election: Tractability, Importance, and Urgency
kuhanj · 2024-05-29T23:52:22.420Z · comments (2)

Case studies on social-welfare-based standards in various industries
HoldenKarnofsky · 2024-06-20T13:33:44.780Z · comments (0)

Recent comments

randomwalks on What's a good book for a technically-minded 11-year old?

but only the dialogues?

actually, it probably needs a re-ordering. place the really terse stuff in an appendix, put the dialogues in the beginning, etc.

rogerdearnaley on The Mask Comes Off: At What Price?

Actual humans aren't "aligned" with each other, and they may not be consistent enough that you can say they're always "aligned" with themselves.

Completely agreed; see, for example, my post "3. Uploading", which makes this point at length.

Anyway, even if the approach did work, that would just mean that "its own ideas" were that it had to learn about and implement your (or somebody's?) values, and also that its ideas about how to do that are sound. You still have to get that right before the first time it becomes uncontrollable. One chance, no matter how you slice it.

simon on D&D Sci Coliseum: Arena of Data

Inspired by abstractapplic's machine learning and wanting to get some experience in Julia, I got Claude (3.5 Sonnet) to write me an XGBoost implementation in Julia. It took a long time, especially with some bugfixing (it took a long time to find that a feature matrix was the wrong shape, a problem with insufficient type explicitness, I think). Still way, way faster than doing it myself! Not sure I'm learning all that much Julia, but I am learning how to get Claude to write it for me, I hope.

Anyway, I used a simple model that

only takes into account 8 * sign(speed difference) + power difference, as in the comment this is a reply to

and a full model that

takes into account all the available features including the base data, the number the simple model uses, and intermediate steps in the calculation of that number (that would be, iirc: power (for each), speed (for each), speed difference, power difference, sign(speed difference))
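
As a minimal sketch of the features described above (in Python rather than the Julia actually used, and with hypothetical function and key names), the simple model's single input is 8 * sign(speed difference) + power difference, and the full model additionally sees the raw stats and the intermediate steps. The stat lines printed for abstractapplic's matchups later in this comment are consistent with this formula:

```python
def sign(x: int) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def effective_power_difference(speed_a: int, speed_b: int,
                               power_a: int, power_b: int,
                               speed_weight: int = 8) -> int:
    """Simple-model metric: speed_weight * sign(speed diff) + power diff."""
    speed_diff = speed_a - speed_b
    power_diff = power_a - power_b
    return speed_weight * sign(speed_diff) + power_diff


def matchup_features(speed_a: int, speed_b: int,
                     power_a: int, power_b: int) -> dict:
    """Feature set for the full model, as listed (from memory) in the comment.

    The key names are illustrative; the original feature names are not given.
    """
    speed_diff = speed_a - speed_b
    power_diff = power_a - power_b
    return {
        "speed_a": speed_a,
        "speed_b": speed_b,
        "power_a": power_a,
        "power_b": power_b,
        "speed_diff": speed_diff,
        "power_diff": power_diff,
        "sign_speed_diff": sign(speed_diff),
        "effective_power_diff": effective_power_difference(
            speed_a, speed_b, power_a, power_b),
    }


# Speed 18 vs 14, power 11 vs 18 (Uzben Grimblade vs House Adelon Champion)
print(effective_power_difference(18, 14, 11, 18))
```

The weight of 8 is kept as a parameter only so the sketch is easy to vary; the comment itself fixes it at 8.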

Results:

Rank 1
Full model scores: Red: 94.0%, Black: 94.9%
Combined full model score: 94.4%
Simple model scores: Red: 94.3%, Black: 94.6%
Combined simple model score: 94.5%

Matchups:
Varina Dourstone (+0 boots, +3 gauntlets) vs House Cadagal Champion
Willow Brown (+3 boots, +0 gauntlets) vs House Adelon Champion
Xerxes III of Calantha (+2 boots, +2 gauntlets) vs House Deepwrack Champion
Zelaya Sunwalker (+1 boots, +1 gauntlets) vs House Bauchard Champion

This is the top-scoring result with either the simplified model or the full model. It was found by a full search of every valid item and hero combination available against the house champions.

It is also my previously posted proposal for the solution, which I found without machine learning. That is reassuring. (Though I suppose there is some chance that my feeding the models this predictor, if it's good enough, might make them glom on to it and fail to find some hard-to-learn additional pattern.)

My theory though is that giving the models the useful metric mostly just helps them - they don't need to learn the metric from the data, and I mostly think that if there was a significant additional pattern the full model would do better.

(for Cadagal, I haven't changed the champion's boots to +4, though I don't expect that to make a significant difference)

As far as I can tell, the full model doesn't do significantly better and does worse in some ways. (I don't know much about how to evaluate this, though. Claude's metrics, including a test set log loss of 0.2527 for the full model and 0.2511 for the simple model, are for a separately generated version, and I am not all that confident that those are actually the same models, though they "should be" up to the restricted training set if Claude was doing it right.)
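
For reference, here is a minimal sketch of how a test set log loss like the ones quoted above is usually computed, assuming the standard binary cross-entropy definition (whether Claude's script computed exactly this is an assumption, and the function name is hypothetical). Lower is better, and always predicting 50% gives ln 2, about 0.693:

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy between 0/1 outcomes and predicted win probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip so that log(0) never occurs for overconfident predictions.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Toy outcomes and predictions, purely illustrative (not from the scenario data).
outcomes = [1, 0, 1, 1]
print(log_loss(outcomes, [0.5, 0.5, 0.5, 0.5]))
print(log_loss(outcomes, [0.9, 0.2, 0.8, 0.7]))
```

Because the metric is an average over fights, a gap as small as 0.2527 versus 0.2511 can easily sit within the noise of a modest test set.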

But the red/black variations seen below for the full model seem likely to me (given my prior that red and black are likely to be symmetrical) to be an indication that what the full model is finding that isn't in the simple model is at least partially overfitting. Though actually, if it's overfitting a lot, maybe it's surprising that the test set log loss isn't a lot worse than found (though it is at least worse than the simple model's)? Hmm, what if there are actual red/black differences? (Something to look into perhaps, as well as trying to duplicate abstractapplic's report regarding sign(speed difference) not exhausting the benefits of speed info ... but for now I'm more likely to leave the machine learning aside and switch to looking at distributions of gladiator characteristics, I think.)

Predictions for individual matchups for my and abstractapplic's solutions:

My matchups:

Varina Dourstone (+0 boots, +3 gauntlets) vs House Cadagal Champion (+2 boots, +3 gauntlets)
Full Model: Red: 91.1%, Black: 96.7%
Simple Model: Red: 94.3%, Black: 94.6%

Willow Brown (+3 boots, +0 gauntlets) vs House Adelon Champion (+3 boots, +1 gauntlets)
Full Model: Red: 94.3%, Black: 95.1%
Simple Model: Red: 94.3%, Black: 94.6%

Xerxes III of Calantha (+2 boots, +2 gauntlets) vs House Deepwrack Champion (+3 boots, +2 gauntlets)
Full Model: Red: 95.2%, Black: 93.7%
Simple Model: Red: 94.3%, Black: 94.6%

Zelaya Sunwalker (+1 boots, +1 gauntlets) vs House Bauchard Champion (+3 boots, +2 gauntlets)
Full Model: Red: 95.3%, Black: 93.9%
Simple Model: Red: 94.3%, Black: 94.6%

(all my matchups have 4 effective power difference in my favour as noted in an above comment)

abstractapplic's matchups:

Matchup 1:
Uzben Grimblade (+3 boots, +0 gauntlets) vs House Adelon Champion (+3 boots, +1 gauntlets)

Win Probabilities:
Full Model: Red: 72.1%, Black: 62.8%
Simple Model: Red: 65.4%, Black: 65.7%

Stats:
Speed: 18 vs 14 (diff: 4)
Power: 11 vs 18 (diff: -7)
Effective Power Difference: 1
--------------------------------------------------------------------------------

Matchup 2:
Xerxes III of Calantha (+2 boots, +1 gauntlets) vs House Bauchard Champion (+3 boots, +2 gauntlets)

Win Probabilities:
Full Model: Red: 46.6%, Black: 43.9%
Simple Model: Red: 49.4%, Black: 50.6%

Stats:
Speed: 16 vs 12 (diff: 4)
Power: 13 vs 21 (diff: -8)
Effective Power Difference: 0
--------------------------------------------------------------------------------

Matchup 3:
Varina Dourstone (+0 boots, +3 gauntlets) vs House Cadagal Champion (+2 boots, +3 gauntlets)

Win Probabilities:
Full Model: Red: 91.1%, Black: 96.7%
Simple Model: Red: 94.3%, Black: 94.6%

Stats:
Speed: 7 vs 25 (diff: -18)
Power: 22 vs 10 (diff: 12)
Effective Power Difference: 4
--------------------------------------------------------------------------------

Matchup 4:
Yalathinel Leafstrider (+1 boots, +2 gauntlets) vs House Deepwrack Champion (+3 boots, +2 gauntlets)

Win Probabilities:
Full Model: Red: 35.7%, Black: 39.4%
Simple Model: Red: 34.3%, Black: 34.6%

Stats:
Speed: 20 vs 15 (diff: 5)
Power: 9 vs 18 (diff: -9)
Effective Power Difference: -1
--------------------------------------------------------------------------------

Overall Statistics:
Full Model Average: Red: 61.4%, Black: 60.7%
Simple Model Average: Red: 60.9%, Black: 61.4%

joneedssleep on JoNeedsSleep's Shortform

The distinction between inner and outer alignment is quite unnatural. For example, even the concept of reward hacking implies a twofold failure: a reward that is not robust enough against exploitation, and a model that develops instrumental capabilities so as to find a way to trick the reward. Indeed, in the case of reward hacking, it's worth noting that, depending on the autonomy of the system in question, we could classify the misalignment as either inner or outer. At its core, this distinction comes out of the policy <-> reward scheme of RL, though prediction <-> loss function in SL can be similarly characterized; I doubt this framing generalizes well to other engineering choices.

john-huang on Could randomly choosing people to serve as representatives lead to better government?

>Do you think there is a situation where selected random people do not want to be in office/leadership and want to pursue their own passion/career and thus due to this reason may do a bad job? Is this mandatory?

I think a robust way to design the assembly (or multiple assemblies like with Bouricius's model) is to have many different people serving different term lengths. Some people may serve a term of only a couple days or weeks. Others might serve for years.

For short-term service, I would make that mandatory. Everyone is required to come.

For long term service, maybe those should be voluntary.

As far as incentives go, there's a range of enforcement options for "mandatory" service. Perhaps you can just pay a big fine, as a percentage of your income, as an alternative to service. There probably ought to be mechanisms to defer service so you can time things a bit better with your life circumstances.

The typical Citizens' Assembly will also offer benefits such as child care, parental care.

A high paying salary will encourage the lower and middle class to participate.

I have trouble coming up with ways to help small business owners to participate though. Could a small business owner drop their work for an entire year, even if it was well paid -- especially if the small business is so small there are no managers to cover their role? Perhaps there could be alternatives for them, such as part time work coupled with work-from-home.

>What are some nuances about population and diversity? (I am not sure yet)

I have yet to hear about a case where deliberative decision-making techniques were tried and failed due to excessive diversity or cultural factors. I'm not an expert on the latest and greatest research here, so I may be wrong. I do know that deliberation experiments have been performed all around the world, including East Asia, Africa, and India.

An example deliberative poll was performed in Uganda, paper linked here:

https://direct.mit.edu/daed/article/146/3/140/27163/Applying-Deliberative-Democracy-in-Africa-Uganda-s

I haven't fully read this yet. Note that James Fishkin is the guy that performs and advocates for these "deliberative polls".

simon on D&D Sci Coliseum: Arena of Data

Very interesting, this would certainly cast doubt on

my simplified model

But so far I haven't been noticing

any effects not accounted for by it.

After reading your comments I've been getting Claude to write up an XGBoost implementation for me. I should have made this reply comment when I started, but I will post my results under my own comment chain.

I have not tried (but should try) to duplicate your findings, or fail to do so. I haven't been testing quite the same thing.

george-ingebretsen on Overcoming Bias Anthology

How difficult would it be to turn this into an epub or pdf? Is there word of that coming soon?

jozdien on BIG-Bench Canary Contamination in GPT-4

Thanks for testing this! This is very interesting.

I'm still operating under the assumption that it does know the string, and is hallucinating, or that it can't retrieve it under these conditions. But slightly alarming that it apologises for lying, then lies again.

Agreed. Seems plausible that it does recognize the string as the BIG-Bench canary (though on questioning it's very evasive, so I wouldn't say it's extremely likely). I'm leaning toward it having "forgotten" the actual string though ("forgotten" because it's plausible it can be extracted with the right prompt, but there isn't a good word for describing a larger elicitation gap).

jozdien on BIG-Bench Canary Contamination in GPT-4

Maybe like 10%?

benito on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

Mm, I think if the question is "what accounts for the differences between the EA and rationalist movements today, wrt number of adherents, reputation, amount of influence, achievements" I would assign credit in the ratio of ~1:3 to differences in (values held by individuals):systems. Where systems are roughly: how the organizations are set up, how funding and information flows through the ecosystem.

I can think about that question if it seems relevant, but the initial claim of Elizabeth's was "I believe there are ways to recruit college students responsibly. I don't believe the way EA is doing it really has a chance to be responsible". So I was trying to give an account of the root cause there.

Also — and I recognize that I'm saying something relatively trivial here — the root cause of a problem in a system can of course be any seemingly minor part of it. Just because I'm saying one part of the system is causing problems (the culture's values) doesn't mean I'm saying that's what's primarily responsible for the output. The current cause of a software company's current problems might be the slow speed with which PR reviews are happening, but this shouldn't be mistaken for the claim that the credit allocation for the company's success is primarily that it can do PR reviews fast.

So to repeat, I'm saying that IMO the root cause of irresponsible movement growth and ponzi-scheme-like recruitment strategies was a lack of what I consider very important values, like dialogue and candor and respecting other people's sense-making and courage and so on, rather than an explanation more like 'those doing recruitment had poor feedback loops so had a hard time knowing what tradeoffs to make' (my paraphrase of your suggestion).

I would have to think harder about which specific values I believe caused this particular issue, but that's my broad point.