LessWrong 2.0 Reader

Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga · 2024-09-28T18:29:49.088Z · comments (0)

[link] Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Jonathan N (derpyplops) · 2024-11-05T01:01:08.083Z · comments (0)

[link] Nerdtrition: simple diets via spreadsheet abuse
dkl9 · 2024-10-27T21:45:15.117Z · comments (0)

[link] Universal dimensions of visual representation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-28T10:38:58.396Z · comments (0)

[question] Does a time-reversible physical law/Cellular Automaton always imply the First Law of Thermodynamics?
Noosphere89 (sharmake-farah) · 2024-08-30T15:12:28.823Z · answers+comments (11)

Join my new subscriber chat
sarahconstantin · 2024-11-06T02:30:11.059Z · comments (0)

[link] Checking public figures on whether they "answered the question": quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

[link] Consciousness As Recursive Reflections
Gunnar_Zarncke · 2024-10-05T20:00:53.053Z · comments (3)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)

Sequence overview: Welfare and moral weights
MichaelStJules · 2024-08-15T04:22:32.567Z · comments (0)

Funding for programs and events on global catastrophic risk, effective altruism, and other topics
abergal · 2024-08-14T23:59:48.146Z · comments (0)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

Prediction markets and Taxes
Edmund Nelson (edmund-nelson) · 2024-11-01T17:39:35.191Z · comments (7)

[link] Thinking LLMs: General Instruction Following with Thought Generation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-15T09:21:22.583Z · comments (0)

[link] [Linkpost] Hawkish nationalism vs international AI power and benefit sharing
jakub_krys (kryjak) · 2024-10-18T18:13:19.425Z · comments (5)

The Great Bootstrap
KristianRonn · 2024-10-11T19:46:51.752Z · comments (0)

A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

Testing "True" Language Understanding in LLMs: A Simple Proposal
MtryaSam · 2024-11-02T19:12:34.710Z · comments (2)

Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM
Winnie Yang (winnie-yang) · 2024-08-28T08:41:38.967Z · comments (2)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

Broadly human level, cognitively complete AGI
p.b. · 2024-08-06T09:26:13.220Z · comments (0)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (9)

[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)

[link] Species as Canonical Referents of Super-Organisms
Yudhister Kumar (randomwalks) · 2024-10-18T07:49:52.944Z · comments (8)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

A Brief Explanation of AI Control
Aaron_Scher · 2024-10-22T07:00:56.954Z · comments (1)

[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

[question] What makes one a "rationalist"?
mathyouf · 2024-10-08T20:25:21.812Z · answers+comments (5)

Goal: Understand Intelligence
Johannes C. Mayer (johannes-c-mayer) · 2024-11-03T21:20:02.900Z · comments (8)

Increasing the Span of the Set of Ideas
Jeffrey Heninger (jeffrey-heninger) · 2024-09-13T15:52:39.132Z · comments (1)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

Recent comments

steve2152 on Could orcas be (trained to be) smarter than humans? 

Possibly related: Could we use current AI methods to understand dolphins? (and the comments on it)

daniel-kokotajlo on Daniel Kokotajlo's Shortform

Thanks, I hadn't considered that. So as per my argument, there's some threshold of density above which it's easier to attack underground; as per your argument, there's some threshold of density where 'indirect fires' of large tunnel-destroying bombs become practical. It's unclear which threshold comes first, but I'd guess it's the former.

davekasten on Daniel Kokotajlo's Shortform

I think the thing that you're not considering is that when tunnels are more prevalent and more densely packed, the incentives to use the defensive strategy of "dig a tunnel, then set off a very big bomb in it that collapses many tunnels" get far higher. It wouldn't always be infantry combat; it would often be a subterranean equivalent of indirect fires.

viliam on Review: “The Case Against Reality”

All of the stuff we take for granted as making up the world we inhabit — objects, dimensions, qualities, time, causality — are, says Hoffman, not objectively real.

I expect these words to be followed by a redefinition of what "objectively real" means.

There’s really just no resemblance between the interface and the underlying reality at all.

Well, maybe no resemblance of... uhm... the substrate, for example we could be simulated on a computer, and the stuff our universe is made of could be completely different from the stuff the computer is made of.

But there is also the resemblance of... form, like if we experience something in our universe, it is probably in some way represented in the simulated computer. Like, the things we perceive as the laws of physics are in some way a consequence of something in the simulating software. Maybe not 1:1 directly; it could be the case that something we perceive as a basic law could be just some statistical average of the true underlying law.

What I am trying to say is that if our perceptions were utterly disconnected from the true reality, then we wouldn't be able to make any kinds of predictions, and probably couldn't even live.

So what is reality composed of? Hoffman’s bet is that it’s consciousness all the way down.

That would be the consciousness of someone who is really obsessed with calculating zillions of numbers precisely. Some kind of Divine Autistic Nerd.

tsvibt on Advisors for Smaller Major Donors?

Well, anyone who wants could pay me to advise them about giving to decrease X-risk by creating smarter humans. Funders less constrained by PR would of course be advantaged in that area.

measure on How to put California and Texas on the campaign trail!

I remember there was a movement a while back to have states agree to award their electors to the national popular vote winner, but I'm not sure what came of that.

anthony-digiovanni on Winning isn't enough

The key claim is: You can’t evaluate which beliefs and decision theory to endorse just by asking “which ones perform the best?” Because the whole question is what it means to systematically perform better, under uncertainty. Every operationalization of “systematically performing better” we’re aware of is either:

Incomplete — like “avoiding dominated strategies”, which leaves a lot unconstrained;
A poorly motivated proxy for the performance we actually care about — like “doing what’s worked in the past”; or
Secretly smuggling in nontrivial non-pragmatic assumptions — like “doing what’s worked in the past, not because that’s what we actually care about, but because past performance predicts future performance”

This is what we meant to convey with this sentence: “On any way of making sense of those words, we end up either calling a very wide range of beliefs and decisions ‘rational’, or reifying an objective that has nothing to do with our terminal goals without some substantive assumptions.”

(I can't tell from your comment whether you agree with all of that. If this was all obvious to you, great! But we’ve often had discussions where someone appealed to “which ones perform the best?” in a way that misses these points.)

steve2152 on Against empathy-by-default

Hmm, maybe we should distinguish two things:

(A) I find the feeling of picking up the tofu with the fork to be intrinsically satisfying: it feels satisfying and empowering to feel the tines of the fork slide into the tofu.
(B) I don’t care at all about the feeling of the fork sliding into the tofu; instead I feel motivated to pick up tofu with the fork because I’m hungry and tofu is yummy.

For (A), the analogy to picking up feta is logically sound—this is legitimate evidence that picking up the feta will also feel intrinsically satisfying. And accordingly, my brain, having made the analogy, correctly feels motivated to pick up feta.

For (B), the analogy to picking up feta is irrelevant. The dimension along which I’m analogizing (how the fork slides in) is unrelated to the dimension which constitutes the source of my motivation (tofu being yummy). And accordingly, if I like the taste of tofu but dislike feta, then I will not feel motivated to pick up the feta, not even a little bit, let alone to the point where it’s determining my behavior.

The lesson here (I claim) is that our brain algorithms are sophisticated enough to not just note whether an analogy target has good or bad vibes, but rather whether the analogy target has good or bad vibes for reasons that legitimately transfer back to the real plan under consideration.

So circling back to empathy, if I were a sociopath, then “Ahmed getting punched” might still kinda remind me of “me getting punched”, but the reason I dislike “me getting punched” is because it’s painful, whereas “Ahmed getting punched” is not painful. So even if “me getting punched” momentarily popped into my sociopathic head, I would then immediately say to myself “ah, but that’s not something I need to worry about here”, and whistle a tune and carry on with my day.

Remember, empathy is a major force. People submit to torture and turn their lives upside down over feelings of empathy. If you want to talk about phenomena like “something unpleasant popped into my head momentarily, even if it doesn’t really have anything to do with this situation”, then OK maybe that kind of thing might have a nonzero impact on motivation, but even if it does, it’s gonna be tiny. It’s definitely not up to the task of explaining such a central part of human behavior, right?

sunwillrise on Winning isn't enough

some people say that "winning is about not playing dominated strategies"

I do not believe this statement. As in, I do not currently know of a single person, associated either with LW or with decision-theory academia, who says "not playing dominated strategies is entirely action-guiding." So, as Raemon pointed out, "this post seems like it’s arguing with someone but I’m not sure who."

In general, I tend to mildly disapprove of phrases like "a widely-used strategy", "we often encounter claims" etc., without any direct citations to the individuals who are purportedly making these mistakes. If it really was that widely used, surely it would be trivial for the authors to quote a few examples off the top of their head, no? What does it say about them that they didn't?

tailcalled on Scissors Statements for President?

Electoral candidates can only be very bad because the country is very big and strong, which can only be the case because there's a lot of people, land, capital and institutions.

Noticing that two candidates for leading these resources are both bad is kind of useless without some other opinion on what form the resources should enter. A simple option would be that the form of the resources should lessen, e.g. that people should work less. The first step to this is to move away from Keynesianism. But if you take that to its logical conclusion, it implies e/acc replacement of humanity, VHEM, mass suicide, or whatever. It's not surprising that this is unpopular.

So that raises the question: What's some direction that the form of societal resources could be shifted towards that would be less confusing than a scissor statement candidate?

Because without an answer to this question, I'm not sure we even need elaborate theories on scissor statements.