LessWrong 2.0 Reader


[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (9)

Not all biases are equal - a study of sycophancy and bias in fine-tuned LLMs
jakub_krys (kryjak) · 2024-11-11T23:11:15.233Z · comments (0)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

Funding for programs and events on global catastrophic risk, effective altruism, and other topics
abergal · 2024-08-14T23:59:48.146Z · comments (0)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

Sequence overview: Welfare and moral weights
MichaelStJules · 2024-08-15T04:22:32.567Z · comments (0)

[link] Spherical cow
dkl9 · 2024-11-11T03:10:27.788Z · comments (0)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

[link] Consciousness As Recursive Reflections
Gunnar_Zarncke · 2024-10-05T20:00:53.053Z · comments (3)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

[question] What makes one a "rationalist"?
mathyouf · 2024-10-08T20:25:21.812Z · answers+comments (5)

[question] Does a time-reversible physical law/Cellular Automaton always imply the First Law of Thermodynamics?
Noosphere89 (sharmake-farah) · 2024-08-30T15:12:28.823Z · answers+comments (11)

The Great Bootstrap
KristianRonn · 2024-10-11T19:46:51.752Z · comments (0)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

[link] Thinking LLMs: General Instruction Following with Thought Generation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-15T09:21:22.583Z · comments (0)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM
Winnie Yang (winnie-yang) · 2024-08-28T08:41:38.967Z · comments (2)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

[link] [Linkpost] Hawkish nationalism vs international AI power and benefit sharing
jakub_krys (kryjak) · 2024-10-18T18:13:19.425Z · comments (5)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)

Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)

[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

2025 Q1 Pivotal Research Fellowship (Technical & Policy)
Tobias H (clearthis) · 2024-11-12T10:56:24.858Z · comments (0)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

Increasing the Span of the Set of Ideas
Jeffrey Heninger (jeffrey-heninger) · 2024-09-13T15:52:39.132Z · comments (1)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

[link] AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels
Corin Katzke (corin-katzke) · 2024-10-28T16:03:39.258Z · comments (0)

Introducing Kairos: a new AI safety fieldbuilding organization (the new home for SPAR and FSP)
agucova · 2024-10-25T21:59:08.782Z · comments (0)

The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (1)

[question] Is School of Thought related to the Rationality Community?
Shoshannah Tekofsky (DarkSym) · 2024-10-15T12:41:33.224Z · answers+comments (6)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

Does “Ultimate Neartermism” via Eternal Inflation dominate Longtermism in expectation?
Jordan Arel · 2024-08-17T22:28:21.849Z · comments (1)


Recent comments

quetzal_rainbow on D0TheMath's Shortform

You are making an error here: ZFC + not Consistent(ZFC) != ZFC.

Assuming ZFC + not Consistent(ZFC), we can prove Consistent(ZFC), because inconsistent systems can prove everything, and ZFC + not Consistent(ZFC) + Consistent(ZFC) is, in fact, inconsistent. But that doesn't say anything about the consistency of ZFC itself, because you can freely substitute any sufficiently powerful system for ZFC. If you assume an inconsistent system, then system + not Consistent(system) is still inconsistent; if you assume a consistent system, then system + not Consistent(system) is inconsistent by the reasoning above. So it can't prove whether the assumed system is consistent or not.

satron on Buck's Shortform

I am asking this because I am planning on entering the field, but this is a pretty niche topic that I haven't yet researched.

startattheend on Anvil Problems

I meant that they were functionally booleans, as a single condition is fulfilled "is rich", "has anvil", "AGI achieved". In the anvil example, any number past 1 corresponds to true. In programming, casting positive integers to booleans results in "true" for all positive numbers, and "false" in the case of zero, just like in the anvil example. The intuition carries over too well for me to ignore.

The first example which came to mind for me when reading the post was confidence, which is often treated as a boolean "Does he have confidence? yes/no". So you don't need any countable objects, only a condition/threshold which is either reached or not, with anything past "yes" still being "yes".

A function where everything past a threshold maps to true, and anything before it maps to false, is similar to the anvil example, and to a function like "is positive" (since a more positive number is still positive). But for the threshold to be exactly 1 unit, you need to choose a unit which is large enough. 1$ is not rich, and having one water droplet on you is not "wet", but with the appropriate unit (exactly the size of the threshold/condition) these should be functionally similar.
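The threshold idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the $1,000,000 "rich" cutoff are hypothetical, and amounts are assumed to be non-negative (in Python, negative integers also cast to True, which the anvil intuition does not cover).

```python
# Illustrative sketch only; the names and the $1,000,000 cutoff are hypothetical.

def meets_threshold(amount: float, threshold: float) -> bool:
    """True once amount reaches threshold; going further past it changes nothing."""
    return amount >= threshold

def in_threshold_units(amount: float, threshold: float) -> int:
    """Re-express a non-negative amount in whole units the size of the threshold."""
    return int(amount // threshold)

# Counts of anvils: zero casts to False, any positive count casts to True.
anvil_counts = [0, 1, 2, 10]
has_anvil = [bool(n) for n in anvil_counts]

# With a unit as large as the threshold, "is rich" becomes the same cast.
RICH_THRESHOLD = 1_000_000  # assumed cutoff in dollars, for illustration
wealth = [1, 999_999, 1_000_000, 5_000_000]
is_rich_direct = [meets_threshold(w, RICH_THRESHOLD) for w in wealth]
is_rich_by_cast = [bool(in_threshold_units(w, RICH_THRESHOLD)) for w in wealth]

print(has_anvil)
print(is_rich_direct == is_rich_by_cast)
```

Measuring in threshold-sized units is what makes "at least one unit" and "condition met" coincide, so the direct comparison and the integer-to-boolean cast agree on every amount in the list.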

I'm hoping there is simple and intuitive mathematics for generalizing this class of problems. And now that I think about it, most of these things (the ones which can be used for making more of themselves) are catalysts (something used but not consumed in the process of making something). Using money to make more money, anvils to make more anvils, breeding more of a species before it goes extinct.

kajus on Complex Systems for AI Safety [Pragmatic AI Safety #3]

Great post!

In the past, broad interventions would clearly have been more effective: for instance, there would have been little use in studying empirical alignment prior to deep learning. Even more recently than the advent of deep learning, many approaches to empirical alignment were highly deemphasized when large, pretrained language models arrived on the scene (refer to our discussion of creative destruction in the last post).

As discussed in the last post, a leading motivation for researchers is the interestingness or “coolness” of a problem. Getting more people to research relevant problems is highly dependent on finding interesting and well-defined subproblems for them to work on. This relies on concretizing problems and providing funding for solving them.

This seems to be conflicting advice to me. If you try to follow both, you might end up having a hard time finding a direction for research.

daemonicsigil on The Foraging (Ex-)Bandit [Ruleset & Reflections]

Thanks for making the game! I also played it, just didn't leave a comment on the original post. Scored 2751. I played each location for an entire day after building an initial food stockpile, and so figured out the timing of Tiger Forest and Dog Valley. But I also did some fairly dumb stuff, like assuming a time dependence for other biomes. And I underestimated Horse Hills, since when I foraged it for a full day, I got unlucky and only rolled a single large number. For what it's worth, I find these applet things more accessible than a full-on D&D.Sci (though those are also great), which I often end up not playing because it feels too much like work. With applets you can play on medium-low effort (which I did) and make lots of mistakes (which I did) and learn Valuable Lessons about How Not To Science (which one might hope I did).

jan-betley on Seven lessons I didn't learn from election day

The second reason that I don't trust the neighbor method is that people just... aren't good at knowing who a majority of their neighbors are voting for. In many cases it's obvious (if over 70% of your neighbors support one candidate or the other, you'll probably know). But if it's 55-45, you probably don't know which direction it's 55-45 in.

My guess is that there's some postprocessing here. E.g. if you assume that the "neighbor" estimate is wrong but without the refusal problem, and you have the same data from the previous election, then you could estimate the shift of opinions and apply that to other polls that ask about your vote. Or you could ask some additional question like "who did your neighbours vote for in the previous election" and compare that to the real data (ideally per county or so). I would be very surprised if they based the bets just on the raw results.

unexpectedvalues on Seven lessons I didn't learn from election day

If you ask people who their neighbors are voting for, they will make their best guess about who their neighbors are voting for. Occasionally their best guess will be to assume that their neighbors will vote the same way that they're voting, but usually not. Trump voters in blue areas will mostly answer "Harris" to this question, and Harris voters in red areas will mostly answer "Trump".

clone-of-saturn on Seven lessons I didn't learn from election day

The second reason that I don’t trust the neighbor method is that people just… aren’t good at knowing who a majority of their neighbors are voting for.

This seems like a point in favor of the neighbor method, not against it. You would want people to find "who are my neighbors voting for?" too difficult to readily answer and so mentally replace it with the simpler question "who am I voting for?" thus giving them a plausibly deniable way to admit to voting for Trump.

jimmy on [Intuitive self-models] 4. Trance

I'm the person JenniferRM mentioned. I'm also a physics guy, and got into studying/practicing hypnosis in ~2010/2011. I kinda moved on from "hypnosis" and drifted up the abstraction ladder, but still working on similar things and working on tying them together.

Anyway, here are my thoughts.

Suppose I really want her to be spinning clockwise in my mind. What might I do?

What worked for me is to focus on the foot alone and ignore the broader context so that I had a "clean slate" without "confirmatory experience" blocking my desired conclusion. When looking at the foot alone I experience it as oscillating rather than rotating (which I guess it technically is), and from there I can "release" it into whichever spin I intend by just kinda imagining that this is what's going on.

On the one hand, shifting intuitive models is surprisingly hard! You can’t necessarily just want to have a particular intuitive model, and voluntarily make that happen.

I actually disagree with this. It certainly seems hard, but the difficulty is largely illusory and pretty much disappears once you stop trying to walk through the wall and notice the front door.

The problem is that "wanting to have a particular model" isn't the thing that matters. You can want to have a particular model all you want, and you can even think the model is true all you want, but you're still talking about the statement itself not about the reality to which the statement refers. Even if you convince someone that their fear is irrational and they'd be better off not being scared, you've still only convinced them that their fear is irrational and they'd be better off not being scared. If you want to convince them that they are safe -- and therefore change their fear response itself -- then you need to convince them that they're safe. It's the difference between looking at yourself from the third person and judging whether your beliefs are correct or not, vs looking at the world from the first person and seeing what is there. If you want to change the third person perspective, then you can look at which models are desirable and why. If you want to change the first person models themselves, you have to look to the world and see what's there.

This doesn't really work with the spinning dancer because "Which way is the dancer spinning?" doesn't have an answer, but this is an artificial issue which doesn't exist in the real world. You still have to figure out "Is this safe enough to be worth doing?" and that's not always trivial, but the problem of "How do I change this irrational fear?" (for example) is. The answer is "By attending to the question of whether it is actually safe".

I don't deny that there's "skill" to it, but most of the skill IME is a meta skill of knowing what to even aim for rather than aiming well. Once you start attending to "Is it safe enough?", then when the answer is actually obvious the intuitive models just change. I can give a whole bunch of examples of this if you want, where people were stuck unable to change their responses and the problem just melts away with this redirection. Even stuff that you'd think would be resistant to change like physical pain can change essentially instantly. I've had it take as little as a single word.

Again we see that the subject is made to feel that his body is out of control, and becomes subject to a high-status person. Some hypnotists sit you down, ask you to stare upwards into their eyes and suggest that your eyelids are wanting to close—which works because looking upwards is tiring, and because staring up into a high-status person’s eyes makes you feel inferior.

This isn't exactly wrong, but I want to push back on the implication that this is the central or most important thing here.

The central thing, IMO, is a willingness to try on another person's worldview even though it clashes with your own. It doesn't require "inferiority"/"high status"/"control" except in the extremely minimal sense that they might know something important that you don't, and that seeing it for yourself might change your behavior. That alone will get you inhibition of all the normal stuff and an automatic (albeit tentative) acceptance of worldview-dissonant perspectives (e.g. name amnesia). It helps if the person has reason to respect and trust you which is kinda like "high status", but not really because it can just as easily happen with people on equal social standing in neutral contexts.

Similarly, hypnosis has very little to do with sleep and eye fatigue/closure is not the important part of eye contact. The important part of eye contact is that it's incredibly communicative. You can convey with eye contact things which you can't convey with words. "I see you". "Seeing you doesn't cause conflict in me". "I see you seeing me see you" and so on, to name a few. All the things you need to communicate to show someone that your perspective is safe and worthy of experiencing are best communicated with the eyes. And perhaps equally important it is a bid for attention, by holding your own.

So far, this isn’t a trance; I’m just describing a common social dynamic. Specifically, if I’m not in a hypnotic trance, the sequence of thoughts in the above might look like a three-step process:
[...]
i.e., in my intuitive model, first, the hypnotist exercises his free will with the intention of me standing; second, I (my homunculus) exercise my own free will with the intention of standing; and third, I actually stand. In this conceptualization, it’s my own free will / vitalistic force / wanting (§3.3.4) that causes me to stand. So this is not a trance.

It's important to note that while this self reflective narrative is indeed different in the way you describe, the underlying truth often is not. In the hypnosis literature this is known as "cold control theory", because it's the same control without the usual Higher Order Thoughts (HOT).

In "common social dynamics" we explain it as "I chose to", but what is actually happening a lot of the time is the speaker is exercising their free will through your body, and you're not objecting because it matches your narrative. The steps aren't actually in series, and you didn't choose to do it so much as you chose to not decline to do it.

These "higher order thoughts" do change some things, but turn out to be relatively unimportant and the better hypnotists usually don't bother too much with them and instead just address the object level. This is also why you get hypnotists writing books subtitled "there's no such thing as hypnosis" and stuff like that.

The short version is: If I have a tune in my head, then I’m very unlikely to simultaneously recall a memory of a different tune. Likewise, if I’m angry right now, then I’m less likely to recall past memories where I felt happy and forgiving, and vice-versa.

As far as I can tell, there are several different things going on with amnesia. I agree that this is one of them, and I'm not sure if I've seen anyone else notice this, so it's cool to see someone point it out.

The "null hypothesis", though, any time it comes to hypnosis is that it's all just response to suggestion. You "know" that being hypnotized involves amnesia, and you believe you're hypnotized, so you experience what you expect. There's an academic hypnosis researcher I talk to sometimes who doesn't even believe "hypnotic trance" is real in any fundamental sense and thinks that all the signs of trance are the result of suggestion.

I don't believe suggestion is all that's going on, but it really is sufficient for amnesia. The answer to Yudkowsky's old question of "Do we believe everything we're told?" is indeed "Yes" -- if we don't preemptively push it away or actively remember to unbelieve later. Back when I was working this stuff out I did a fun experiment where I'd come up with an excuse to get people to not preemptively reject what I was about to say, then I'd suggest amnesia for this conversation and that they'd laugh when I scratch my nose, and then I'd distract them so that the suggestion could take effect before they had a chance to unbelieve it. The excuse was something like "I know this is ridiculous so I don't expect you to believe it, but hear me out and let me know if you understand" -- which is tricky because they think the fact that we "agreed" that they won't believe it means they actually aren't believing it when they say "I understand", even though the full statement is "I understand [that I will laugh when you scratch your nose and have no idea why]". They still had awareness that this belief is wrong and would therefore act to stop themselves from acting on it, which is why the unexpected distraction was necessary in order to get their mind off of it long enough for it to work.

unexpectedvalues on Seven lessons I didn't learn from election day

Ah, I think I see. Would it be fair to rephrase your question as: if we "re-rolled the dice" a week before the election, how likely was Trump to win?

My answer is probably between 90% and 95%. Basically the way Trump loses is to lose some of his supporters or have way more late deciders decide on Harris. That probably happens if Trump says something egregiously stupid or offensive (on the level of the Access Hollywood tape), or if some really bad news story about him comes out, but not otherwise.