LessWrong 2.0 Reader




[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (6)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

How to hire somebody better than yourself
lukehmiles (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)

Principled Satisficing To Avoid Goodhart
JenniferRM · 2024-08-16T19:05:27.204Z · comments (2)

[link] Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI's Trajectory”
Said Achmiz (SaidAchmiz) · 2024-11-14T23:53:34.922Z · comments (0)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

AI #72: Denying the Future
Zvi · 2024-07-11T15:00:05.865Z · comments (8)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

[link] AI Rights for Human Safety
Simon Goldstein (simon-goldstein) · 2024-08-01T23:01:07.252Z · comments (6)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (5)

We ran an AI safety conference in Tokyo. It went really well. Come next year!
Blaine (blaine-rogers) · 2024-07-17T06:55:39.620Z · comments (1)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas (gytis-daujotas) · 2024-08-01T21:08:38.800Z · comments (6)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

[question] "Deception Genre" What Books are like Project Lawful?
Double · 2024-08-28T17:19:52.172Z · answers+comments (20)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (50)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

Ambiguity in Prediction Market Resolution is Still Harmful
aphyer · 2024-07-31T20:32:40.217Z · comments (17)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

Trust as a bottleneck to growing teams quickly
benkuhn · 2024-07-13T18:00:04.579Z · comments (3)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)



Recent comments

quetzal_rainbow on D0TheMath's Shortform

You are making an error here: ZFC + not Consistent(ZFC) != ZFC.

Assuming ZFC + not Consistent(ZFC), we can prove Consistent(ZFC), because inconsistent systems can prove everything, and ZFC + not Consistent(ZFC) + Consistent(ZFC) is, in fact, inconsistent. But this doesn't say anything about the consistency of ZFC itself, because you can freely substitute any sufficiently powerful system for ZFC. If you assume an inconsistent system, then system + not Consistent(system) is still inconsistent; if you assume a consistent system, then system + not Consistent(system) is inconsistent by the reasoning above. So it can't prove whether the assumed system is consistent or not.
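The step "inconsistent systems can prove everything" is the principle of explosion (ex falso quodlibet). A minimal Lean 4 sketch, treating "Consistent(ZFC)" as an opaque proposition `Con` (an illustration of the logical step only, not a formalization of ZFC or of the consistency predicate):

```lean
-- Ex falso quodlibet: from a proposition and its negation, any Q follows.
example (P Q : Prop) (h : P) (hn : ¬P) : Q := absurd h hn

-- The comment's case: once a theory has both ¬Con and Con,
-- it proves any statement Q whatsoever (including Con itself).
example (Con Q : Prop) : (¬Con ∧ Con) → Q :=
  fun ⟨hn, h⟩ => absurd h hn
```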

satron on Buck's Shortform

I am asking this, because I am planning on entering the field, but this is a pretty niche topic that I haven't yet researched.

startattheend on Anvil Problems

I meant that they were functionally booleans, as a single condition is either fulfilled or not: "is rich", "has anvil", "AGI achieved". In the anvil example, any number of 1 or more corresponds to true. In programming, casting positive integers to booleans results in "true" for all positive numbers, and "false" in the case of zero, just like in the anvil example. The intuition carries over too well for me to ignore.

The first example which came to mind for me when reading the post was confidence, which is often treated as a boolean "Does he have confidence? yes/no". So you don't need any countable objects, only a condition/threshold which is either reached or not, with anything past "yes" still being "yes".

A function where everything past a threshold maps to true, and anything before it maps to false, is similar to the anvil example, and to a function like "is positive" (since a more positive number is still positive). But for the threshold to be exactly 1 unit, you need to choose a unit which is large enough. $1 is not rich, and having one water droplet on you is not "wet", but with the appropriate unit (exactly the size of the threshold/condition) these should be functionally similar.
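The threshold-as-boolean idea maps directly onto code. A minimal Python sketch, assuming nonnegative amounts and a positive threshold; the function names and the $1,000,000 "rich" threshold are hypothetical. Measuring an amount in units of exactly one threshold and casting that count to `bool` gives the same answer as an explicit `>=` comparison:

```python
def holds(amount: float, threshold: float) -> bool:
    """Explicit threshold predicate: True once amount reaches threshold."""
    return amount >= threshold


def holds_in_threshold_units(amount: float, threshold: float) -> bool:
    """Same predicate via a count-to-bool cast (assumes amount >= 0, threshold > 0)."""
    if amount < 0 or threshold <= 0:
        raise ValueError("expected amount >= 0 and threshold > 0")
    # How many whole thresholds fit into the amount: 0 below it, >= 1 at or past it.
    units = int(amount // threshold)
    # Casting the count to bool: 0 -> False, any positive count -> True.
    return bool(units)


RICH = 1_000_000  # hypothetical threshold for "is rich", in dollars

print([bool(n) for n in range(4)])                # [False, True, True, True]
print(holds_in_threshold_units(1, RICH))          # False: $1 is not rich
print(holds_in_threshold_units(3_500_000, RICH))  # True
```

Choosing the unit to equal the threshold is what makes "one anvil" and "one million dollars" behave the same way under the cast.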

I'm hoping there is simple and intuitive mathematics for generalizing this class of problems. And now that I think about it, most of these things (the ones which can be used for making more of themselves) are catalysts (something used but not consumed in the process of making something). Using money to make more money, anvils to make more anvils, breeding more of a species before it goes extinct.

kajus on Complex Systems for AI Safety [Pragmatic AI Safety #3]

Great post!

In the past, broad interventions would clearly have been more effective: for instance, there would have been little use in studying empirical alignment prior to deep learning. Even more recently than the advent of deep learning, many approaches to empirical alignment were highly deemphasized when large, pretrained language models arrived on the scene (refer to our discussion of creative destruction in the last post).

As discussed in the last post, a leading motivation for researchers is the interestingness or “coolness” of a problem. Getting more people to research relevant problems is highly dependent on finding interesting and well-defined subproblems for them to work on. This relies on concretizing problems and providing funding for solving them.

This seems like conflicting advice to me. If you try to follow both, you might end up having a hard time finding a direction for research.

daemonicsigil on The Foraging (Ex-)Bandit [Ruleset & Reflections]

Thanks for making the game! I also played it, just didn't leave a comment on the original post. Scored 2751. I played each location for an entire day after building an initial food stockpile, and so figured out the timing of Tiger Forest and Dog Valley. But I also did some fairly dumb stuff, like assuming a time dependence for other biomes. And I underestimated Horse Hills, since when I foraged it for a full day, I got unlucky and only rolled a single large number. For what it's worth, I find these applet things more accessible than a full-on D&D.Sci (though those are also great), which I often end up not playing because it feels too much like work. With applets you can play on medium-low effort (which I did) and make lots of mistakes (which I did) and learn Valuable Lessons about How Not To Science (which one might hope I did).

jan-betley on Seven lessons I didn't learn from election day

The second reason that I don't trust the neighbor method is that people just... aren't good at knowing who a majority of their neighbors are voting for. In many cases it's obvious (if over 70% of your neighbors support one candidate or the other, you'll probably know). But if it's 55-45, you probably don't know which direction it's 55-45 in.

My guess is that there's some postprocessing here. E.g., if you assume that the "neighbor" estimate is wrong but without the refusal problem, and you have the same data from the previous election, then you could estimate the shift in opinions and apply it to other polls that ask about your own vote. Or you could ask an additional question like "who did your neighbours vote for in the previous election?" and compare that to the real data (ideally per county or so). I would be very surprised if they based their bets just on the raw results.
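The second correction can be sketched in a few lines of Python. This is a hypothetical illustration: the `CountyPoll` fields, the turnout weights, and the assumption that the neighbor question's bias stays constant between elections are ours, not the commenter's or any pollster's actual method.

```python
from dataclasses import dataclass


@dataclass
class CountyPoll:
    # Hypothetical per-county inputs, as shares of the vote for one candidate.
    neighbor_prev: float  # "Who did your neighbours vote for last time?"
    neighbor_now: float   # "Who are your neighbours voting for now?"
    actual_prev: float    # Official result from the previous election


def corrected_estimate(poll: CountyPoll) -> float:
    # Bias of the neighbor question, measured against known past results.
    bias = poll.neighbor_prev - poll.actual_prev
    # Assume the same bias persists and remove it from the current answer.
    return poll.neighbor_now - bias


def aggregate(polls: list[CountyPoll], weights: list[float]) -> float:
    # Weighted average of corrected county estimates (e.g. weighted by turnout).
    total = sum(weights)
    return sum(w * corrected_estimate(p) for p, w in zip(polls, weights)) / total


county = CountyPoll(neighbor_prev=0.48, neighbor_now=0.50, actual_prev=0.52)
print(corrected_estimate(county))  # roughly 0.54: the question understated this candidate last time
```

The key design assumption is stability: if the neighbor question's bias shifts between elections, this correction inherits the error.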

unexpectedvalues on Seven lessons I didn't learn from election day

If you ask people who their neighbors are voting for, they will make their best guess about who their neighbors are voting for. Occasionally their best guess will be to assume that their neighbors will vote the same way that they're voting, but usually not. Trump voters in blue areas will mostly answer "Harris" to this question, and Harris voters in red areas will mostly answer "Trump".

clone-of-saturn on Seven lessons I didn't learn from election day

The second reason that I don’t trust the neighbor method is that people just… aren’t good at knowing who a majority of their neighbors are voting for.

This seems like a point in favor of the neighbor method, not against it. You would want people to find "who are my neighbors voting for?" too difficult to readily answer and so mentally replace it with the simpler question "who am I voting for?" thus giving them a plausibly deniable way to admit to voting for Trump.

jimmy on [Intuitive self-models] 4. Trance

I'm the person JenniferRM mentioned. I'm also a physics guy, and I got into studying/practicing hypnosis in ~2010/2011. I've kinda moved on from "hypnosis" and drifted up the abstraction ladder, but I'm still working on similar things and on tying them together.

Anyway, here are my thoughts.

Suppose I really want her to be spinning clockwise in my mind. What might I do?

What worked for me is to focus on the foot alone and ignore the broader context so that I had a "clean slate" without "confirmatory experience" blocking my desired conclusion. When looking at the foot alone I experience it as oscillating rather than rotating (which I guess it technically is), and from there I can "release" it into whichever spin I intend by just kinda imagining that this is what's going on.

On the one hand, shifting intuitive models is surprisingly hard! You can’t necessarily just want to have a particular intuitive model, and voluntarily make that happen.

I actually disagree with this. It certainly seems hard, but the difficulty is largely illusory and pretty much disappears once you stop trying to walk through the wall and notice the front door.

The problem is that "wanting to have a particular model" isn't the thing that matters. You can want to have a particular model all you want, and you can even think the model is true all you want, but you're still talking about the statement itself, not about the reality to which the statement refers. Even if you convince someone that their fear is irrational and they'd be better off not being scared, you've still only convinced them that their fear is irrational and they'd be better off not being scared. If you want to convince them that they are safe, and therefore change their fear response itself, then you need to convince them that they're safe. It's the difference between looking at yourself from the third person and judging whether your beliefs are correct or not, vs looking at the world from the first person and seeing what is there. If you want to change the third person perspective, then you can look at which models are desirable and why. If you want to change the first person models themselves, you have to look to the world and see what's there.

This doesn't really work with the spinning dancer because "Which way is the dancer spinning?" doesn't have an answer, but this is an artificial issue which doesn't exist in the real world. You still have to figure out "Is this safe enough to be worth doing?" and that's not always trivial, but the problem of "How do I change this irrational fear?" (for example) is trivial. The answer is "By attending to the question of whether it is actually safe".

I don't deny that there's "skill" to it, but most of the skill IME is a meta skill of knowing what to even aim for rather than aiming well. Once you start attending to "Is it safe enough?", then when the answer is actually obvious the intuitive models just change. I can give a whole bunch of examples of this if you want, where people were stuck unable to change their responses and the problem just melts away with this redirection. Even stuff that you'd think would be resistant to change like physical pain can change essentially instantly. I've had it take as little as a single word.

Again we see that the subject is made to feel that his body is out of control, and becomes subject to a high-status person. Some hypnotists sit you down, ask you to stare upwards into their eyes and suggest that your eyelids are wanting to close—which works because looking upwards is tiring, and because staring up into a high-status person’s eyes makes you feel inferior.

This isn't exactly wrong, but I want to push back on the implication that this is the central or most important thing here.

The central thing, IMO, is a willingness to try on another person's worldview even though it clashes with your own. It doesn't require "inferiority"/"high status"/"control" except in the extremely minimal sense that they might know something important that you don't, and that seeing it for yourself might change your behavior. That alone will get you inhibition of all the normal stuff and an automatic (albeit tentative) acceptance of worldview-dissonant perspectives (e.g. name amnesia). It helps if the person has reason to respect and trust you which is kinda like "high status", but not really because it can just as easily happen with people on equal social standing in neutral contexts.

Similarly, hypnosis has very little to do with sleep, and eye fatigue/closure is not the important part of eye contact. The important part of eye contact is that it's incredibly communicative. You can convey with eye contact things which you can't convey with words. "I see you". "Seeing you doesn't cause conflict in me". "I see you seeing me see you" and so on, to name a few. All the things you need to communicate to show someone that your perspective is safe and worthy of experiencing are best communicated with the eyes. And, perhaps equally important, it is a bid for attention, by holding your own.

So far, this isn’t a trance; I’m just describing a common social dynamic. Specifically, if I’m not in a hypnotic trance, the sequence of thoughts in the above might look like a three-step process:
[...]
i.e., in my intuitive model, first, the hypnotist exercises his free will with the intention of me standing; second, I (my homunculus) exercise my own free will with the intention of standing; and third, I actually stand. In this conceptualization, it’s my own free will / vitalistic force / wanting (§3.3.4) that causes me to stand. So this is not a trance.

It's important to note that while this self-reflective narrative is indeed different in the way you describe, the underlying truth often is not. In the hypnosis literature this is known as "cold control theory", because it's the same control without the usual Higher Order Thoughts (HOT).

In "common social dynamics" we explain it as "I chose to", but what is actually happening a lot of the time is the speaker is exercising their free will through your body, and you're not objecting because it matches your narrative. The steps aren't actually in series, and you didn't choose to do it so much as you chose to not decline to do it.

These "higher order thoughts" do change some things, but turn out to be relatively unimportant and the better hypnotists usually don't bother too much with them and instead just address the object level. This is also why you get hypnotists writing books subtitled "there's no such thing as hypnosis" and stuff like that.

The short version is: If I have a tune in my head, then I’m very unlikely to simultaneously recall a memory of a different tune. Likewise, if I’m angry right now, then I’m less likely to recall past memories where I felt happy and forgiving, and vice-versa.

As far as I can tell, there are several different things going on with amnesia. I agree that this is one of them, and I'm not sure if I've seen anyone else notice this, so it's cool to see someone point it out.

The "null hypothesis", though, any time it comes to hypnosis is that it's all just response to suggestion. You "know" that being hypnotized involves amnesia, and you believe you're hypnotized, so you experience what you expect. There's an academic hypnosis researcher I talk to sometimes who doesn't even believe "hypnotic trance" is real in any fundamental sense and thinks that all the signs of trance are the result of suggestion.

I don't believe suggestion is all that's going on, but it really is sufficient for amnesia. The answer to Yudkowsky's old question of "Do we believe everything we're told?" is indeed "Yes", if we don't preemptively push it away or actively remember to unbelieve it later. Back when I was working this stuff out, I did a fun experiment: I'd come up with an excuse to get people not to preemptively reject what I was about to say, then I'd suggest amnesia for the conversation and that they'd laugh when I scratched my nose, and then I'd distract them so that the suggestion could take effect before they had a chance to unbelieve it. The excuse was something like "I know this is ridiculous so I don't expect you to believe it, but hear me out and let me know if you understand". This is tricky because they think the fact that we "agreed" they won't believe it means they actually aren't believing it when they say "I understand", even though the full statement is "I understand [that I will laugh when you scratch your nose and have no idea why]". They were still aware that this belief was wrong and would therefore act to stop themselves from acting on it, which is why the unexpected distraction was necessary to get their mind off it long enough for it to work.

unexpectedvalues on Seven lessons I didn't learn from election day

Ah, I think I see. Would it be fair to rephrase your question as: if we "re-rolled the dice" a week before the election, how likely was Trump to win?

My answer is probably between 90% and 95%. Basically the way Trump loses is to lose some of his supporters or have way more late deciders decide on Harris. That probably happens if Trump says something egregiously stupid or offensive (on the level of the Access Hollywood tape), or if some really bad news story about him comes out, but not otherwise.