LessWrong 2.0 Reader


Scattered thoughts on what it means for an LLM to believe
TheManxLoiner · 2024-11-06T22:10:29.429Z · comments (4)
Apply to be a mentor in SPAR!
agucova · 2024-11-05T21:32:45.797Z · comments (0)
Using Narrative Prompting to Extract Policy Forecasts from LLMs
Max Ghenis (MaxGhenis) · 2024-11-05T04:37:52.004Z · comments (0)
If I care about measure, choices have additional burden (+AI generated LW-comments)
avturchin · 2024-11-15T10:27:15.212Z · comments (11)
Project Adequate: Seeking Cofounders/Funders
Lorec · 2024-11-17T03:12:12.995Z · comments (7)
Educational CAI: Aligning a Language Model with Pedagogical Theories
Bharath Puranam (bharath-puranam) · 2024-11-01T18:55:26.993Z · comments (1)
[link] Is P(Doom) Meaningful? Bayesian vs. Popperian Epistemology Debate
Liron · 2024-11-09T23:39:30.039Z · comments (0)
Bellevue Library Meetup - Nov 23
Cedar (xida-ren) · 2024-11-09T23:05:02.452Z · comments (3)
Effects of Non-Uniform Sparsity on Superposition in Toy Models
Shreyans Jain (shreyans-jain) · 2024-11-14T16:59:43.234Z · comments (3)
Towards a Clever Hans Test: Unmasking Sentience Biases in Chatbot Interactions
glykokalyx · 2024-11-10T22:34:58.956Z · comments (0)
[question] Is OpenAI net negative for AI Safety?
Lysandre Terrisse · 2024-11-02T16:18:02.859Z · answers+comments (0)
Some Comments on Recent AI Safety Developments
testingthewaters · 2024-11-09T16:44:58.936Z · comments (0)
Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)
Germany-wide ACX Meetup
Fernand0 · 2024-11-17T10:08:54.584Z · comments (0)
[link] Entropic strategy in Two Truths and a Lie
dkl9 · 2024-11-21T22:03:28.986Z · comments (2)
[question] What (if anything) made your p(doom) go down in 2024?
Satron · 2024-11-16T16:46:43.865Z · answers+comments (6)
Visualizing small Attention-only Transformers
WCargo (Wcargo) · 2024-11-19T09:37:42.213Z · comments (0)
[question] Noticing the World
EvolutionByDesign (bioluminescent-darkness) · 2024-11-04T16:41:44.696Z · answers+comments (1)
[question] What are the primary drivers that caused selection pressure for intelligence in humans?
Towards_Keeperhood (Simon Skade) · 2024-11-07T09:40:20.275Z · answers+comments (15)
What are Emotions?
Myles H (zarsou9) · 2024-11-15T04:20:27.388Z · comments (13)
[question] How might language influence how an AI "thinks"?
bodry (plosique) · 2024-10-30T17:41:04.460Z · answers+comments (0)
LDT (and everything else) can be irrational
Christopher King (christopher-king) · 2024-11-06T04:05:36.932Z · comments (6)
(draft) Cyborg software should be open (?)
AtillaYasar (atillayasar) · 2024-11-01T07:24:51.966Z · comments (5)
San Francisco ACX Meetup “First Saturday”
Nate Sternberg (nate-sternberg) · 2024-10-28T05:05:36.757Z · comments (0)
Antonym Heads Predict Semantic Opposites in Language Models
Jake Ward (jake-ward) · 2024-11-15T15:32:14.102Z · comments (0)
Distributed espionage
margetmagenta · 2024-11-04T19:43:33.316Z · comments (0)
Reducing x-risk might be actively harmful
MountainPath · 2024-11-18T14:25:07.127Z · comments (5)
[link] Higher Order Signs, Hallucination and Schizophrenia
Nicolas Villarreal (nicolas-villarreal) · 2024-11-02T16:33:10.574Z · comments (0)
Beyond Gaussian: Language Model Representations and Distributions
Matt Levinson · 2024-11-24T01:53:38.156Z · comments (0)
Enabling New Applications with Today's Mechanistic Interpretability Toolkit
ananya_joshi · 2024-10-25T17:53:23.960Z · comments (0)
[link] Paradigm Shifts—change everything... except almost everything
James Stephen Brown (james-brown) · 2024-11-23T18:34:13.088Z · comments (0)
[link] Both-Sidesism—When Fair & Balanced Goes Wrong
James Stephen Brown (james-brown) · 2024-11-02T03:04:03.820Z · comments (15)
Your memory eventually drives confidence in each hypothesis to 1 or 0
Crazy philosopher (commissar Yarrick) · 2024-10-28T09:00:27.084Z · comments (6)
[link] AI Safety at the Frontier: Paper Highlights, October '24
gasteigerjo · 2024-10-31T00:09:33.522Z · comments (0)
Interview with Bill O’Rourke - Russian Corruption, Putin, Applied Ethics, and More
JohnGreer · 2024-10-27T17:11:28.891Z · comments (0)
[link] Some Preliminary Notes on the Promise of a Wisdom Explosion
Chris_Leong · 2024-10-31T09:21:11.623Z · comments (0)
Which AI Safety Benchmark Do We Need Most in 2025?
Loïc Cabannes (loic-cabannes) · 2024-11-17T23:50:56.337Z · comments (2)
Root node of my posts
AtillaYasar (atillayasar) · 2024-11-19T20:09:02.973Z · comments (0)
Gothenburg LW/ACX meetup
Stefan (stefan-1) · 2024-10-29T20:40:22.754Z · comments (0)
aspirational leadership
dhruvmethi · 2024-11-20T16:07:43.507Z · comments (0)
Breaking beliefs about saving the world
Oxidize · 2024-11-15T00:46:03.693Z · comments (3)
MIT FutureTech are hiring ‍a Product and Data Visualization Designer
peterslattery · 2024-11-13T14:48:06.167Z · comments (0)
[link] Sparks of Consciousness
Charlie Sanders (charlie-sanders) · 2024-11-13T04:58:27.222Z · comments (0)
Don't want Goodhart? — Specify the variables more
YanLyutnev (YanLutnev) · 2024-11-21T22:43:48.362Z · comments (2)
The boat
RomanS · 2024-11-22T12:56:45.050Z · comments (0)
Agenda Manipulation
Pazzaz · 2024-11-09T14:13:33.729Z · comments (0)
[question] Poll: what’s your impression of altruism?
David Gross (David_Gross) · 2024-11-09T20:28:15.418Z · answers+comments (4)
Truth Terminal: A reconstruction of events
crvr.fr (crdevio) · 2024-11-17T23:51:21.279Z · comments (1)
Modeling AI-driven occupational change over the next 10 years and beyond
2120eth · 2024-11-12T04:58:26.741Z · comments (0)
'Meta', 'mesa', and mountains
Lorec · 2024-10-31T17:25:53.635Z · comments (0)