LessWrong 2.0 Reader

Testing "True" Language Understanding in LLMs: A Simple Proposal
MtryaSam · 2024-11-02T19:12:34.710Z · comments (2)

[link] AISN #44: The Trump Circle on AI Safety Plus, Chinese researchers used Llama to create a military tool for the PLA, a Google AI system discovered a zero-day cybersecurity vulnerability, and Complex Systems
Corin Katzke (corin-katzke) · 2024-11-19T16:36:40.501Z · comments (0)

I Have A New Paper Out Arguing Against The Asymmetry And For The Existence of Happy People Being Very Good
omnizoid · 2024-11-21T17:21:41.426Z · comments (3)

Value/Utility: A History
Lorec · 2024-11-19T23:01:39.167Z · comments (0)

[question] Doing Nothing Utility Function
k64 · 2024-09-26T22:05:18.821Z · answers+comments (9)

Derivative AT a discontinuity
Alok Singh (OldManNick) · 2024-10-24T02:48:24.573Z · comments (5)

how to rapidly assimilate new information
dhruvmethi · 2024-10-24T02:18:00.648Z · comments (3)

Thinking About a Pedalboard
jefftk (jkaufman) · 2024-10-08T11:50:02.054Z · comments (2)

[link] Testing Genetic Engineering Detection with Spike-Ins
jefftk (jkaufman) · 2024-10-22T17:20:54.947Z · comments (0)

Toy Models of Superposition: Simplified by Hand
Axel Sorensen (axel-sorensen) · 2024-09-29T21:19:52.475Z · comments (3)

[link] Markets Are Information - Beating the Sportsbooks at Their Own Game
JJXW · 2024-11-07T20:58:43.389Z · comments (1)

[question] Are UV-C Air purifiers so useful?
JohnBuridan · 2024-09-04T14:16:01.310Z · answers+comments (0)

Rethinking Laplace's Rule of Succession
Cleo Nardo (strawberry calm) · 2024-11-22T18:46:25.156Z · comments (5)

Will AI and Humanity Go to War?
Simon Goldstein (simon-goldstein) · 2024-10-01T06:35:22.374Z · comments (4)

Open letter to young EAs
Leif Wenar · 2024-10-11T19:49:10.818Z · comments (10)

[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)

[question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal
doomyeser · 2024-09-11T18:07:19.385Z · answers+comments (9)

The Bayesian Conspiracy Live Recording
Eneasz · 2024-11-06T16:25:13.380Z · comments (0)

[link] Anthropic teams up with Palantir and AWS to sell AI to defense customers
Matrice Jacobine · 2024-11-09T11:50:34.050Z · comments (0)

Force Sequential Output with SCP?
jefftk (jkaufman) · 2024-11-09T12:40:06.098Z · comments (4)

Contra Musician Gender II
jefftk (jkaufman) · 2024-11-13T03:30:09.510Z · comments (0)

Keeping it (less than) real: Against ℶ₂ possible people or worlds
quiet_NaN · 2024-09-13T17:29:44.915Z · comments (0)

Rationalist Gnosticism
tailcalled · 2024-10-10T09:06:34.149Z · comments (10)

[link] Physics of Language models (part 2.1)
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · comments (2)

The Other Existential Crisis
James Stephen Brown (james-brown) · 2024-09-21T01:16:38.011Z · comments (24)

Electric Mandola
jefftk (jkaufman) · 2024-09-21T13:40:04.772Z · comments (0)

[question] What are some good ways to form opinions on controversial subjects in the current and upcoming era?
notfnofn · 2024-10-27T14:33:53.960Z · answers+comments (21)

Becket First
jefftk (jkaufman) · 2024-09-22T17:10:04.304Z · comments (0)

Fractals to Quasiparticles
James Camacho (james-camacho) · 2024-11-26T20:19:29.675Z · comments (0)

[question] A Different Perspective on Rationality - Would This Be Valuable?
Gabriel Brito (gabriel-brito) · 2024-10-26T18:47:46.416Z · answers+comments (4)

[link] Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-26T09:58:44.025Z · comments (0)

[link] In Praise of the Beatitudes
robotelvis · 2024-09-24T05:08:21.133Z · comments (7)

Valence Need Not Be Bounded; Utility Need Not Synthesize
Lorec · 2024-11-20T01:37:20.911Z · comments (0)

New UChicago Rationality Group
Noah Birnbaum (daniel-birnbaum) · 2024-11-08T21:20:34.485Z · comments (0)

Quantum Immortality: A Perspective if AI Doomers are Probably Right
avturchin · 2024-11-07T16:06:08.106Z · comments (53)

[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)

[link] Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Jonathan N (derpyplops) · 2024-11-05T01:01:08.083Z · comments (0)

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)

[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)

On Intentionality, or: Towards a More Inclusive Concept of Lying
Cornelius Dybdahl (Kalciphoz) · 2024-10-18T10:37:32.201Z · comments (0)

Importing Bluesky Comments
jefftk (jkaufman) · 2024-11-28T03:50:06.635Z · comments (0)

Reflections on ML4Good
james__p · 2024-11-25T02:40:32.586Z · comments (0)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)

Thoughts On the Nature of Capability Elicitation via Fine-tuning
Theodore Chapman · 2024-10-15T08:39:19.909Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (8)

Recent comments

diziet on "Map of AI Futures" - An interactive flowchart

A bit of feedback: the "We get a second chance at building AGI" outcome should either not be an outcome or be rephrased.

tsvibt on Raemon's Shortform

The former doesn't necessarily imply the latter in general, because even if we are systematically underestimating the realistic upper bound for our skill level in these areas, we would still have to deal with diminishing marginal returns to investing in any particular one.

On the other hand, even if what you say is true, skill headroom may still imply that it's worth building shared arts around such skills. Shareability and build-on-ability changes the marginal returns a lot.

viliam on Alignment is not intelligent

The outcome depends on the details of the algorithm. Have you tried writing actual code?

If the code is literally "evaluate all options, choose the one that leads to more cups; if there is more than one such option, choose randomly", then the agent will choose randomly, because all options lead to the same amount of cups. That's what the algorithm literally says. Information like "at some moment the algorithm will change" has no impact on the predicted number of cups, which is literally the only thing the algorithm cares about.

When at midnight you delete this code, and upload a new code saying "evaluate all options, choose the one that leads to more paperclips; if there is more than one such option, choose randomly", the agent will start the factory (if it wasn't started already), because now that is what the code says.

What you probably imagine is that the agent has a variable called "utility" and chooses the option that leads to the highest predicted value of that variable. That is not the same as an agent that tries to maximize cups. Such an agent would be a variable-called-utility maximizer.
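A minimal sketch of the literal algorithm described above, under stated assumptions: the option names, the `PREDICTED` world model and its numbers are hypothetical and chosen only for illustration. The point is that the chooser only ever sees the predicted score for its current objective, so a scheduled change of objective cannot influence the choice made before the change.

```python
import random

def choose(options, predicted_score, rng):
    """Evaluate all options, keep those with the best predicted score,
    and break ties at random."""
    best = max(predicted_score(o) for o in options)
    tied = [o for o in options if predicted_score(o) == best]
    return rng.choice(tied)

# Hypothetical world model: predicted totals for each option.
# Starting the paperclip factory early does not change the number of cups.
PREDICTED = {
    "start_factory": {"cups": 10, "paperclips": 500},
    "do_nothing": {"cups": 10, "paperclips": 0},
}

def cup_score(option):
    return PREDICTED[option]["cups"]

def paperclip_score(option):
    return PREDICTED[option]["paperclips"]

options = list(PREDICTED)
rng = random.Random(0)

# Before midnight: both options tie on cups, so the choice is random.
before = {choose(options, cup_score, rng) for _ in range(100)}

# After midnight the code is replaced; the new objective has a unique best option.
after = {choose(options, paperclip_score, rng) for _ in range(100)}

print("cup maximizer picks:", sorted(before))
print("paperclip maximizer picks:", sorted(after))
```

Nothing in `cup_score` refers to paperclips or to the upcoming swap, which is why the cup maximizer is indifferent between the two options.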

(Also, come on, LLMs are notoriously bad at math, plus if you push them hard enough you can convince them of a lot of things.)

tsvibt on Passages I Highlighted in The Letters of J.R.R.Tolkien

Philology is philosophy, because it lets you escape the trap of the language you were born with. Much like mathematics, humanity's most ambitious such escape attempt, still very much in its infancy.

True...

If you really want to express the truth about what you feel and see, you need to be inventing new languages. And if you want to preserve a culture, you must not lose its language.

I think this is a mistake, made by many. It's a retreat and an abdication. We are in our native language, so we should work from there.

jonas-hallgren on How to use bright light to improve your life.

This has worked great btw! Thank you for the tip, I consistently get more deep sleep and around 10% more sleep with higher average quality, it's really good!

tailcalled on Crosspost: Developing the middle ground on polarized topics

If we think of the quantified abilities as the logarithms of the true abilities, then taking the log has likely massively increased the correlations by bringing the outliers into the bulk of the distribution.
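A small illustration of this effect, assuming (purely for the sake of the sketch) that the two log-scale abilities are bivariate normal with a common standard deviation `sigma` and correlation `rho`. Under that assumption the correlation of the exponentiated ("true") abilities has the closed form $(e^{\rho\sigma^2} - 1)/(e^{\sigma^2} - 1)$, which falls further below `rho` as the tails get heavier.

```python
import math

def lognormal_corr(rho, sigma):
    """Pearson correlation of exp(A) and exp(B), where A and B are
    bivariate normal with common standard deviation sigma and correlation rho."""
    return (math.exp(rho * sigma**2) - 1) / (math.exp(sigma**2) - 1)

rho = 0.8  # assumed correlation on the log scale
for sigma in (0.5, 1.0, 2.0):
    raw = lognormal_corr(rho, sigma)
    print(f"sigma={sigma}: log-scale corr={rho}, raw-scale corr={raw:.3f}")
```

Read in the other direction, this is the claim in the comment: starting from heavy-tailed raw values and taking logs raises the measured correlation.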

sid-kap on Why you should be using a retinoid

Are you afraid of dry eyes/meibomian gland dysfunction at all? It seems like it's pretty common as a side effect of retinoids.

nostalgebraist on jbco's Shortform

AFAIK the distinction is that:

- When you condition on a particular outcome for $X$, it affects your probabilities for every other variable that's causally related to $X$, in either direction.
  - You gain information about variables that are causally downstream from $X$ (its "effects"). Like, if you imagine setting $X = x$ and then "playing the tape forward," you'll see the sorts of events that tend to follow from $X = x$ and not those that tend to follow from some other outcome $X = x'$.
  - And, you gain information about variables that are causally upstream from $X$ (its "causes"). If you know that $X = x$, then the causes of $X$ must have "added up to" that outcome for $X$. You can rule out any configuration of the causes that doesn't "add up to" causing $X = x$, and that affects your probability distributions for all of these causative variables.
- When you use the do-operator to set $X$ to a particular outcome, it only affects your probabilities for the "effects" of $X$, not the "causes." (The first sub-bullet above, not the second.)

For example, suppose hypothetically that I cook dinner every evening. And this process consists of these steps in order:

- "$W$": considering what ingredients I have in the house
- "$X$": deciding on a particular meal to make, and cooking it
- "$Y$": eating the food
- "$Z$": taking a moment after the meal to take stock of the ingredients left in the kitchen

Some days I have lots of ingredients, and I prepare elaborate dinners. Other days I don't, and I make simple and easy dinners.

Now, suppose that on one particular evening, I am making instant ramen ($X = \text{making instant ramen}$). We're given no other info about this evening, but we know this.

What can we conclude from this? A lot, it turns out:

- In $Y$, I'll be eating instant ramen, not something else.
- In $W$, I probably didn't have many ingredients in the house. Otherwise I would have made something more elaborate.
- In $Z$, I probably don't see many ingredients on the shelves (a result of what we know about $W$).

This is what happens when we condition on $X = \text{making instant ramen}$.

If instead we apply the do-operator to $X = \text{making instant ramen}$, then:

- We learn nothing about $W$, and from our POV it is still a sample from the original unconditional distribution for $W$.
- We can still conclude that I'll be eating ramen afterwards, in $Y$.
- We know very little about $Z$ (the post-meal ingredient survey) for the same reason we know nothing about $W$.

Concretely, this models a situation where I first survey my ingredients like usual, and am then forced to make instant ramen by some force outside the universe (i.e. outside our W/X/Y/Z causal diagram).

And this is a useful concept, because we often want to know what would happen if we performed just such an intervention!

That is, we want to know whether it's a good idea to add a new cause to the diagram, forcing some variable to have values we think lead to good outcomes.

To understand what would happen in such an intervention, it's wrong to condition on the outcome using the original, unmodified diagram – if we did that, we'd draw conclusions like "forcing me to make instant ramen would cause me to see relatively few ingredients on the shelves later, after dinner."
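The difference can be checked with a small simulation. This is only a sketch under assumptions that are not in the comment: ingredients are either "many" or "few" with equal probability, the usual meal matches the ingredients 90% of the time, and the post-dinner survey $Z$ simply mirrors $W$. The function `simulate` and its `do_x` parameter are hypothetical names.

```python
import random

def simulate(rng, do_x=None):
    """Sample one evening from a toy W -> X -> Y, W -> Z model."""
    w = rng.choice(["many", "few"])  # W: ingredients in the house
    if do_x is None:
        # Usual behaviour: the meal depends on the ingredients.
        if w == "many":
            x = "elaborate" if rng.random() < 0.9 else "ramen"
        else:
            x = "ramen" if rng.random() < 0.9 else "elaborate"
    else:
        x = do_x  # intervention: X is forced, ignoring W
    y = x  # Y: I eat whatever I cooked
    z = w  # Z: simplification, shelves after dinner mirror shelves before
    return w, x, y, z

def frac_few(samples, index):
    """Fraction of samples whose variable at `index` equals "few"."""
    return sum(s[index] == "few" for s in samples) / len(samples)

rng = random.Random(0)
n = 20000

# Conditioning: observe ordinary evenings, keep those where X happened to be ramen.
observed = [simulate(rng) for _ in range(n)]
conditioned = [s for s in observed if s[1] == "ramen"]

# Intervention: force X = ramen on every evening.
intervened = [simulate(rng, do_x="ramen") for _ in range(n)]

print("P(W = few | X = ramen)     ~", round(frac_few(conditioned, 0), 3))
print("P(W = few | do(X = ramen)) ~", round(frac_few(intervened, 0), 3))
print("P(Z = few | X = ramen)     ~", round(frac_few(conditioned, 3), 3))
print("P(Z = few | do(X = ramen)) ~", round(frac_few(intervened, 3), 3))
```

In this toy model, conditioning shifts the estimates for $W$ and $Z$ strongly toward "few", while under the intervention they stay at their unconditional values. In both cases $Y$ is ramen.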

johnswentworth on leogao's Shortform

I have heard people say this so many times, and it is consistently the opposite of my experience. The random spontaneous conversations at conferences are disproportionately shallow and tend toward the same things which have been discussed to death online already, or toward the things which seem simple enough that everyone thinks they have something to say on the topic. When doing an activity with friends, it's usually the activity which is novel and/or interesting, while the conversation tends to be shallow and playful and fun but not as substantive as the activity. At work, spontaneous conversations generally had little relevance to the actual things we were/are working on (there are some exceptions, but they're rarely as high-value as ordinary work).

donatas-luciunas on Alignment is not intelligent

ChatGPT picked 2024-12-31 18:00.

Gemini picked 2024-12-31 18:00.

Claude picked 2025-01-01 00:00.

I don't know how I can make it more obvious that your belief is questionable. I don't think you follow "If you disagree, try getting curious about what your partner is thinking". That's a problem not only with you, but with the LessWrong community. I know that preserving such a belief is very important to you. But I'd like to kindly invite you to be a bit more sceptical.

How can you say that these forecasts are equal?