LessWrong 2.0 Reader

[link] Was Partisanship Good for the Environmental Movement?
Jeffrey Heninger (jeffrey-heninger) · 2024-05-15T17:30:54.796Z · comments (0)

[link] Let's Design A School, Part 2.3 School as Education - The Curriculum (Phase 2, Specific)
Sable · 2024-05-15T20:58:50.981Z · comments (0)

Defense Against The Dark Arts: An Introduction
Lyrongolem (david-xiao) · 2023-12-25T06:36:06.278Z · comments (36)

[question] Would you have a baby in 2024?
martinkunev · 2023-12-25T01:52:04.358Z · answers+comments (76)

Distinctions when Discussing Utility Functions
ozziegooen · 2024-03-09T20:14:03.592Z · comments (7)

[link] Alignment work in anomalous worlds
Tamsin Leake (carado-1) · 2023-12-16T19:34:26.202Z · comments (4)

Anomalous Concept Detection for Detecting Hidden Cognition
Paul Colognese (paul-colognese) · 2024-03-04T16:52:52.568Z · comments (3)

Evolution did a surprising good job at aligning humans...to social status
Eli Tyre (elityre) · 2024-03-10T19:34:52.544Z · comments (37)

An evaluation of Helen Toner’s interview on the TED AI Show
PeterH · 2024-06-06T17:39:40.800Z · comments (2)

Weeping Agents
pleiotroth · 2024-06-06T12:18:54.978Z · comments (2)

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

[link] The absence of self-rejection is self-acceptance
Chipmonk · 2023-12-21T21:54:52.116Z · comments (1)

Building Trust in Strategic Settings
StrivingForLegibility · 2023-12-28T22:12:24.024Z · comments (0)

Best-of-n with misaligned reward models for Math reasoning
Fabien Roger (Fabien) · 2024-06-21T22:53:21.243Z · comments (0)

My Alignment "Plan": Avoid Strong Optimisation and Align Economy
VojtaKovarik · 2024-01-31T17:03:34.778Z · comments (9)

[link] Secret US natsec project with intel revealed
Nathan Helm-Burger (nathan-helm-burger) · 2024-05-25T04:22:11.624Z · comments (0)

Utility is not the selection target
tailcalled · 2023-11-04T22:48:20.713Z · comments (1)

[link] Clickbait Soapboxing
DaystarEld · 2024-03-13T14:09:29.890Z · comments (15)

aintelope project update
Gunnar_Zarncke · 2024-02-08T18:32:00.000Z · comments (2)

Population ethics and the value of variety
cousin_it · 2024-06-23T10:42:21.402Z · comments (11)

5 psychological reasons for dismissing x-risks from AGI
Igor Ivanov (igor-ivanov) · 2023-10-26T17:21:48.580Z · comments (6)

A brief review of China's AI industry and regulations
Elliot Mckernon (elliot) · 2024-03-14T12:19:00.775Z · comments (0)

Foresight Institute: 2023 Progress & 2024 Plans for funding beneficial technology development
Allison Duettmann (allison-duettmann) · 2023-11-22T22:09:16.956Z · comments (1)

[link] AI Alignment [Progress] this Week (11/05/2023)
Logan Zoellner (logan-zoellner) · 2023-11-07T13:26:21.995Z · comments (0)

[link] The Living Planet Index: A Case Study in Statistical Pitfalls
Jan_Kulveit · 2024-06-24T10:05:55.101Z · comments (0)

A conceptual precursor to today's language machines [Shannon]
Bill Benzon (bill-benzon) · 2023-11-15T13:50:51.226Z · comments (6)

[link] Compensating for Life Biases
Jonathan Moregård (JonathanMoregard) · 2024-01-09T14:39:14.229Z · comments (6)

[link] Eric Schmidt on recursive self-improvement
nikola (nikolaisalreadytaken) · 2023-11-05T19:05:15.416Z · comments (3)

A bet on critical periods in neural networks
kave · 2023-11-06T23:21:17.279Z · comments (1)

[question] Could there be "natural impact regularization" or "impact regularization by default"?
tailcalled · 2023-12-01T22:01:46.062Z · answers+comments (6)

Scientific Method
Andrij “Androniq” Ghorbunov (andrij-androniq-ghorbunov) · 2024-02-18T21:06:45.228Z · comments (4)

2. Premise two: Some cases of value change are (il)legitimate
Nora_Ammann · 2023-10-26T14:36:53.511Z · comments (7)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

[link] "25 Lessons from 25 Years of Marriage" by honorary rationalist Ferrett Steinmetz
CronoDAS · 2024-10-02T22:42:30.509Z · comments (2)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates
Charlie George (charlie-george) · 2024-08-27T20:44:08.683Z · comments (7)

AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)

[link] The unreasonable effectiveness of plasmid sequencing as a service
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-08T02:02:55.352Z · comments (0)

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.
sevdeawesome · 2024-07-01T05:50:49.498Z · comments (0)

Rashomon - A newsbetting site
ideasthete · 2024-10-15T18:15:02.476Z · comments (8)

[link] A Defense of Peer Review
Niko_McCarty (niko-2) · 2024-10-22T16:16:49.982Z · comments (1)

Apply to the Cooperative AI PhD Fellowship by October 14th!
Lewis Hammond (lewis-hammond-1) · 2024-10-05T12:41:24.093Z · comments (0)

[link] The Offense-Defense Balance of Gene Drives
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-27T16:47:25.976Z · comments (1)

[link] Tokyo AI Safety 2025: Call For Papers
Blaine (blaine-rogers) · 2024-10-21T08:43:38.467Z · comments (0)

[link] Foundations - Why Britain has stagnated [crosspost]
Nathan Young · 2024-09-23T10:43:20.411Z · comments (1)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

D&D.Sci Hypersphere Analysis Part 4: Fine-tuning and Wrapup
aphyer · 2024-01-18T03:06:39.344Z · comments (5)

[question] To what extent is the UK Government's recent AI Safety push entirely due to Rishi Sunak?
Stephen Fowler (LosPolloFowler) · 2023-10-27T03:29:28.465Z · answers+comments (4)

Bent or Blunt Hoods?
jefftk (jkaufman) · 2024-01-09T20:10:11.545Z · comments (0)

From the outside, American schooling is weird
Jacob G-W (g-w1) · 2024-03-28T22:45:30.485Z · comments (4)

[question] How much fraud is there in academia?
ChristianKl · 2023-11-16T11:50:41.544Z · answers+comments (10)

Recent comments

jozdien on BIG-Bench Canary Contamination in GPT-4

It seems like such an obviously stupid thing to do that my priors aren't very high (though you're right in that they're slightly higher because it's OpenAI). I think it's telling however that neither Claude nor Gemini shy away from revealing the canary string.

hmys on BIG-Bench Canary Contamination in GPT-4

What is the probability they intentionally fine tuned to hide canary contamination?

Seems like an obviously very silly thing to do. But with things like the NDA, my prior on OpenAI being deceptive to their own detriment is not that low.

I'm pretty sure it wouldn't forget the string.

jozdien on BIG-Bench Canary Contamination in GPT-4

This doesn't guarantee it, you're right. But the obvious way to filter canaried data out is to simply remove any document that has the string inside it. To further filter for articles that only talk about the canary instead of intending to use it seems like a needlessly expensive task, given how few such articles there would be.

Further, given that one use of the canary is to tell whether contamination has happened by checking if the model knows the string, not filtering such articles out would still not be great.

All that said however, I think the fact that GPT-4 had memorized BIG-Bench tasks - which certainly used the canary - means that the contamination could have happened from anything that had the canary in it.

jimrandomh on kryptoklob.io/misc/The+(False)+Accusation

Nope, that's more than enough. Caleb Ditchfield, you are seriously mentally ill, and your delusions are causing you to exhibit a pattern of unethical behavior. This is not a place where you will be able to find help or support with your mental illness. Based on skimming your Twitter history, I believe your mental illness is caused by (or exacerbated by) abusing Adderall.

You have already been banned from Lighthaven. I'm extending the ban to LW too.

gunnar_zarncke on [Intuitive self-models] 6. Awakening / Enlightenment / PNSE

After reading all the 2.6 and 3.3 sections again, I think the answer to why the homunculus is attention-grabbing is because it involves "continuous self-surprise" in the same way an animate object (mouse...) is. A surprise that is present as a proprioceptive signal or felt sense. With PNSE, your brain has learned to predict the internal S(X) mental objects and this signal well enough that the remaining surprisingness of the mental processes would be more like the gears contraption from 3.3.2, where "the surprising feeling that I feel would be explained away by a different ingredient in my intuitive model of the situation—namely, my own unfamiliarity with all the gears inside the contraption and how they fit together." And as such, it is easier to tune out: The mind is doing its usual thing. Process as usual.

witheringweights on Information vs Assurance

Me: I dunno, probably around 9 pm. [At this point, I’ve merely offered some information; I think most people would not interpret this as an assurance, and would not blame me much if I show up to the party at 8:30 or 10:00 or even skip it altogether.]

Assuming the conversation doesn't delve further into this, if I were your friend I'd actually be very surprised if you didn't show up. The question 'At what time are you going?' assumes that you're going, however uncertain the details. If you wish to convey the idea of 'you might not see me at all' your answer should explicitly include 'but I might not go' because without that clause you're agreeing to attend, at least at some point.

To be clear, I agree with the gist of the piece. I just find it funny how even such a short convo could lead to a quite dramatic misunderstanding.

michael-roe on What is the alpha in one bit of evidence?

In any case, as a researcher currently working in this area, I am putting a big bet on moderate badness happening (in that I could be working on something else, and my time has value).

michael-roe on What is the alpha in one bit of evidence?

Also, there is counterparty risk if you bet on everyone dying.

(Yeah, yeah, you can bet on something like other people's belief in the impending apocalypse going up before it actually happens.)

“Rapid takeoff” hypotheses are particularly hard to bet on.

sustrik on What's a good book for a technically-minded 11-year old?

Yes, I am seeing that as well. Technical/philosophical stuff is fine, but the psychology in adult fiction is too complex for an 11-year-old to enjoy.

lalartu on The Personal Implications of AGI Realism

A cell line being immortal doesn't prove that an immortal brain is possible any more than a microbe strain being immortal does.