LessWrong 2.0 Reader

How I'd like alignment to get done (as of 2024-10-18)
TristanTrim · 2024-10-18T23:39:03.107Z · comments (2)

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More
Sharat Jacob Jacob (sharat-jacob-jacob) · 2024-10-29T12:41:30.337Z · comments (0)

Can startups be impactful in AI safety?
Esben Kran (esben-kran) · 2024-09-13T19:00:33.306Z · comments (0)

Goal: Understand Intelligence
Johannes C. Mayer (johannes-c-mayer) · 2024-11-03T21:20:02.900Z · comments (19)

[link] AI Prejudices: Practical Implications
PeterMcCluskey · 2024-10-19T02:19:58.695Z · comments (0)

The current state of RSPs
Zach Stein-Perlman · 2024-11-04T16:00:42.630Z · comments (0)

Amoeba roles in tech
Sindhu Shivaprasad (sindhu-shivaprasad) · 2024-10-04T17:25:46.568Z · comments (0)

Editing at the Take Level
jefftk (jkaufman) · 2024-09-24T11:30:04.914Z · comments (1)

ML4Good (AI Safety Bootcamp) - Experience report
JanEbbing · 2024-11-05T01:18:43.554Z · comments (0)

Updating the NAO Simulator
jefftk (jkaufman) · 2024-10-30T13:50:06.908Z · comments (0)

Spooky Recommendation System Scaling
phdead · 2024-10-31T22:00:51.728Z · comments (0)

Motte-and-Bailey: a Short Explanation
Lorec · 2024-10-23T22:29:55.074Z · comments (0)

[link] Comparing Forecasting Track Records for AI Benchmarking and Beyond
ChristianWilliams · 2024-09-25T21:01:15.975Z · comments (0)

[link] A primer on ML in antibody engineering
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-23T17:03:07.628Z · comments (0)

Beyond Defensive Technology
ejk64 · 2024-10-14T11:34:24.595Z · comments (1)

On epistemic autonomy
sanyer (santeri-koivula) · 2024-08-31T18:50:43.377Z · comments (0)

Self location for LLMs by LLMs: Self-Assessment Checklist.
weightt an (weightt-an) · 2024-09-26T19:57:31.707Z · comments (0)

Switching to a Yamaha P-121 Keyboard
jefftk (jkaufman) · 2024-10-02T02:20:02.284Z · comments (0)

[link] [Linkpost] Building Altruistic and Moral AI Agent with Brain-inspired Affective Empathy Mechanisms
Gunnar_Zarncke · 2024-11-04T10:15:35.550Z · comments (0)

Conversational Signposts—An Antidote to Dull Social Interactions
Declan Molony (declan-molony) · 2024-10-22T05:37:56.175Z · comments (6)

[link] Intention-to-Treat (Re: How harmful is music, really?)
kqr · 2024-09-18T18:44:41.128Z · comments (0)

Substituting Talkbox for Breath Controller
jefftk (jkaufman) · 2024-10-27T19:10:03.768Z · comments (0)

[link] Anthropic - The case for targeted regulation
anaguma · 2024-11-05T07:07:48.174Z · comments (0)

[question] Has Anyone Here Consciously Changed Their Passions?
Spade · 2024-09-09T01:36:26.197Z · answers+comments (12)

[link] AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety?
Corin Katzke (corin-katzke) · 2024-08-21T18:09:33.284Z · comments (0)

[link] The Computational Complexity of Circuit Discovery for Inner Interpretability
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-17T13:18:46.378Z · comments (2)

[link] OpenAI’s cybersecurity is probably regulated by NIS Regulations
Adam Jones (domdomegg) · 2024-10-25T11:06:38.392Z · comments (2)

Palisade is hiring: Exec Assistant, Content Lead, Ops Lead, and Policy Lead
Charlie Rogers-Smith (charlie.rs) · 2024-10-09T00:04:03.837Z · comments (0)

[link] AISafety.info: What are Inductive Biases?
Algon · 2024-09-19T17:26:24.581Z · comments (4)

What is Randomness?
martinkunev · 2024-09-27T17:49:42.704Z · comments (2)

Switching to a 4GB SD
jefftk (jkaufman) · 2024-09-23T11:20:05.432Z · comments (1)

[link] How harmful is music, really?
dkl9 · 2024-09-17T14:53:25.426Z · comments (6)

[question] How Should We Use Limited Time to Maximize Long-Term Impact?
queelius · 2024-10-12T20:02:46.801Z · answers+comments (3)

[question] LW resources on childhood experiences?
nahir91595 · 2024-10-14T17:04:07.810Z · answers+comments (7)

Festival Stats 2024
jefftk (jkaufman) · 2024-11-12T02:00:04.831Z · comments (0)

Crafting Polysemantic Transformer Benchmarks with Known Circuits
Evan Anders (evan-anders) · 2024-08-23T22:03:15.288Z · comments (0)

A Policy Proposal
phdead · 2024-09-29T20:45:34.745Z · comments (4)

Keyboard Gremlins
jefftk (jkaufman) · 2024-09-20T02:30:07.140Z · comments (0)

Just How Good Are Modern Chess Computers?
nem · 2024-09-19T18:57:21.254Z · comments (1)

On agentic generalist models: we're essentially using existing technology the weakest and worst way you can use it
Yuli_Ban · 2024-08-28T01:57:17.387Z · comments (2)

[question] Does life actually locally *increase* entropy?
tailcalled · 2024-09-16T20:30:33.148Z · answers+comments (27)

[link] Book Review: Replacing Guilt - On Having Something to Fight For
Cole Killian (cole-killian) · 2024-11-03T19:47:35.093Z · comments (0)

[question] Where should I look for information on gut health?
FinalFormal2 · 2024-08-20T19:44:30.632Z · answers+comments (10)

[link] When to join a respectability cascade
B Jacobs (Bob Jacobs) · 2024-09-24T07:54:16.051Z · comments (1)

[question] I want a good multi-LLM API-powered chatbot
rotatingpaguro · 2024-09-08T09:40:52.736Z · answers+comments (3)

Making a Pedalboard
jefftk (jkaufman) · 2024-10-25T00:10:09.149Z · comments (0)

[Job Ad] MATS is hiring!
Jana (jana) · 2024-10-09T02:17:04.651Z · comments (0)

[question] What's a good book for a technically-minded 11-year old?
Martin Sustrik (sustrik) · 2024-10-19T06:05:12.178Z · answers+comments (32)

Request for advice: Research for Conversational Game Theory for LLMs
Rome Viharo (rome-viharo) · 2024-10-16T17:53:30.243Z · comments (0)

Derivative AT a discontinuity
Alok Singh (OldManNick) · 2024-10-24T02:48:24.573Z · comments (5)

Recent comments

quetzal_rainbow on D0TheMath's Shortform

You are making an error here: ZFC + not Consistent(ZFC) != ZFC.

Assuming ZFC + not Consistent(ZFC) we can prove Consistent(ZFC), because inconsistent systems can prove everything and ZFC + not Consistent(ZFC) + Consistent(ZFC) is, in fact, inconsistent. But it doesn't say anything about the consistency of ZFC itself, because you can freely substitute any sufficiently powerful system for ZFC. If you assume an inconsistent system, then system + not Consistent(system) is still inconsistent; if you assume a consistent system, then system + not Consistent(system) is inconsistent by the reasoning above. So it can't prove whether the assumed system is consistent or not.

satron on Buck's Shortform

I am asking this, because I am planning on entering the field, but this is a pretty niche topic that I haven't yet researched.

startattheend on Anvil Problems

I meant that they were functionally booleans, as a single condition is fulfilled "is rich", "has anvil", "AGI achieved". In the anvil example, any number past 1 corresponds to true. In programming, casting positive integers to booleans results in "true" for all positive numbers, and "false" in the case of zero, just like in the anvil example. The intuition carries over too well for me to ignore.

The first example which came to mind for me when reading the post was confidence, which is often treated as a boolean "Does he have confidence? yes/no". So you don't need any countable objects, only a condition/threshold which is either reached or not, with anything past "yes" still being "yes".

A function where everything past a threshold maps to true, and anything before it maps to false, is similar to the anvil example, and to a function like "is positive" (since a more positive number is still positive). But for the threshold to be exactly 1 unit, you need to choose a unit which is large enough. Having $1 is not "rich", and having one water droplet on you is not "wet", but with the appropriate unit (exactly the size of the threshold/condition) these should be functionally similar.

I'm hoping there is simple and intuitive mathematics for generalizing this class of problems. And now that I think about it, most of these things (the ones which can be used for making more of themselves) are catalysts (something used but not consumed in the process of making something). Using money to make more money, anvils to make more anvils, breeding more of a species before it goes extinct.
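The threshold idea in the comment above can be sketched in a few lines of Python. This is only an illustration: the function names and the 1,000,000-dollar cutoff for "rich" are hypothetical and do not come from the comment.

```python
def meets_condition(amount: float, threshold: float) -> bool:
    """True once amount reaches the threshold; anything beyond is still True."""
    return amount >= threshold


def in_threshold_units(amount: float, threshold: float) -> float:
    """Rescale so that the threshold itself is exactly 1 unit."""
    return amount / threshold


# Python's built-in integer-to-boolean cast: zero is False,
# every positive integer is True, as in the anvil example.
anvils = [0, 1, 2, 7]
has_anvil = [bool(n) for n in anvils]

# Hypothetical cutoff: call someone "rich" at 1_000_000 dollars.
RICH_THRESHOLD = 1_000_000
is_rich_with_one_dollar = meets_condition(1, RICH_THRESHOLD)
is_rich_with_two_million = meets_condition(2_000_000, RICH_THRESHOLD)

# Measured in units the size of the threshold, "rich" starts at 1 unit.
units = in_threshold_units(2_000_000, RICH_THRESHOLD)
```

Once the amount is expressed in threshold-sized units, "is rich" behaves like "has at least one anvil": every value of 1 unit or more maps to the same answer.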

kajus on Complex Systems for AI Safety [Pragmatic AI Safety #3]

Great post!

In the past, broad interventions would clearly have been more effective: for instance, there would have been little use in studying empirical alignment prior to deep learning. Even more recently than the advent of deep learning, many approaches to empirical alignment were highly deemphasized when large, pretrained language models arrived on the scene (refer to our discussion of creative destruction in the last post).

As discussed in the last post, a leading motivation for researchers is the interestingness or “coolness” of a problem. Getting more people to research relevant problems is highly dependent on finding interesting and well-defined subproblems for them to work on. This relies on concretizing problems and providing funding for solving them.

This seems to be conflicting advice to me. If you try to follow both, you might end up having a hard time finding a direction for research.

daemonicsigil on The Foraging (Ex-)Bandit [Ruleset & Reflections]

Thanks for making the game! I also played it, just didn't leave a comment on the original post. Scored 2751. I played each location for an entire day after building an initial food stockpile, and so figured out the timing of Tiger Forest and Dog Valley. But I also did some fairly dumb stuff, like assuming a time dependence for other biomes. And I underestimated Horse Hills, since when I foraged it for a full day, I got unlucky and only rolled a single large number. For what it's worth, I find these applet things more accessible than a full-on D&D.Sci (though those are also great), which I often end up not playing because it feels too much like work. With applets you can play on medium-low effort (which I did) and make lots of mistakes (which I did) and learn Valuable Lessons about How Not To Science (which one might hope I did).

jan-betley on Seven lessons I didn't learn from election day

The second reason that I don't trust the neighbor method is that people just... aren't good at knowing who a majority of their neighbors are voting for. In many cases it's obvious (if over 70% of your neighbors support one candidate or the other, you'll probably know). But if it's 55-45, you probably don't know which direction it's 55-45 in.

My guess is that there's some postprocessing here. E.g. if you assume that the "neighbor" estimate is wrong but without the refusal problem, and you have the same data from the previous election, then you could estimate the shift of opinions and apply that to other polls that ask about your vote. Or you could ask some additional question like "who did your neighbours vote for in the previous election" and compare that to the real data (ideally per county or so). I would be very surprised if they based the bets just on the raw results.
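One way the postprocessing guessed at above might look, as a rough sketch: treat the "neighbor" answers as biased but consistently so, take the change between two elections as the opinion shift, and add it to the previous real result. All function names and numbers here are made up for illustration; nothing is known about what any actual bettor did.

```python
def opinion_shift(neighbor_prev: float, neighbor_now: float) -> float:
    """Change in the 'neighbor' estimate between two elections (percentage points)."""
    return neighbor_now - neighbor_prev


def neighbor_bias(actual_prev: float, neighbor_prev: float) -> float:
    """How far the previous 'neighbor' estimate was from the real result."""
    return neighbor_prev - actual_prev


def projected_share(actual_prev: float, neighbor_prev: float, neighbor_now: float) -> float:
    """Previous real result plus the shift seen in the 'neighbor' answers."""
    return actual_prev + opinion_shift(neighbor_prev, neighbor_now)


# Hypothetical percentages for one candidate, not real polling data.
actual_prev = 47.0    # real vote share in the previous election
neighbor_prev = 50.0  # 'neighbor' estimate before the previous election
neighbor_now = 53.0   # 'neighbor' estimate before the current election

bias = neighbor_bias(actual_prev, neighbor_prev)
projection = projected_share(actual_prev, neighbor_prev, neighbor_now)
```

The sketch assumes the bias of the "neighbor" question is the same in both elections, which is exactly the kind of assumption that comparing against real per-county data would test.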

unexpectedvalues on Seven lessons I didn't learn from election day

If you ask people who their neighbors are voting for, they will make their best guess about who their neighbors are voting for. Occasionally their best guess will be to assume that their neighbors will vote the same way that they're voting, but usually not. Trump voters in blue areas will mostly answer "Harris" to this question, and Harris voters in red areas will mostly answer "Trump".

clone-of-saturn on Seven lessons I didn't learn from election day

The second reason that I don’t trust the neighbor method is that people just… aren’t good at knowing who a majority of their neighbors are voting for.

This seems like a point in favor of the neighbor method, not against it. You would want people to find "who are my neighbors voting for?" too difficult to readily answer and so mentally replace it with the simpler question "who am I voting for?" thus giving them a plausibly deniable way to admit to voting for Trump.

jimmy on [Intuitive self-models] 4. Trance

I'm the person JenniferRM mentioned. I'm also a physics guy, and got into studying/practicing hypnosis in ~2010/2011. I kinda moved on from "hypnosis" and drifted up the abstraction ladder, but still working on similar things and working on tying them together.

Anyway, here are my thoughts.

Suppose I really want her to be spinning clockwise in my mind. What might I do?

What worked for me is to focus on the foot alone and ignore the broader context so that I had a "clean slate" without "confirmatory experience" blocking my desired conclusion. When looking at the foot alone I experience it as oscillating rather than rotating (which I guess it technically is), and from there I can "release" it into whichever spin I intend by just kinda imagining that this is what's going on.

On the one hand, shifting intuitive models is surprisingly hard! You can’t necessarily just want to have a particular intuitive model, and voluntarily make that happen.

I actually disagree with this. It certainly seems hard, but the difficulty is largely illusory and pretty much disappears once you stop trying to walk through the wall and notice the front door.

The problem is that "wanting to have a particular model" isn't the thing that matters. You can want to have a particular model all you want, and you can even think the model is true all you want, but you're still talking about the statement itself not about the reality to which the statement refers. Even if you convince someone that their fear is irrational and they'd be better off not being scared, you've still only convinced them that their fear is irrational and they'd be better off not being scared. If you want to convince them that they are safe -- and therefore change their fear response itself -- then you need to convince them that they're safe. It's the difference between looking at yourself from the third person and judging whether your beliefs are correct or not, vs looking at the world from the first person and seeing what is there. If you want to change the third person perspective, then you can look at which models are desirable and why. If you want to change the first person models themselves, you have to look to the world and see what's there.

This doesn't really work with the spinning dancer because "Which way is the dancer spinning?" doesn't have an answer, but this is an artificial issue which doesn't exist in the real world. You still have to figure out "Is this safe enough to be worth doing?" and that's not always trivial, but the problem of "How do I change this irrational fear?" (for example) is. The answer is "By attending to the question of whether it is actually safe".

I don't deny that there's "skill" to it, but most of the skill IME is a meta skill of knowing what to even aim for rather than aiming well. Once you start attending to "Is it safe enough?", then when the answer is actually obvious the intuitive models just change. I can give a whole bunch of examples of this if you want, where people were stuck unable to change their responses and the problem just melts away with this redirection. Even stuff that you'd think would be resistant to change like physical pain can change essentially instantly. I've had it take as little as a single word.

Again we see that the subject is made to feel that his body is out of control, and becomes subject to a high-status person. Some hypnotists sit you down, ask you to stare upwards into their eyes and suggest that your eyelids are wanting to close—which works because looking upwards is tiring, and because staring up into a high-status person’s eyes makes you feel inferior.

This isn't exactly wrong, but I want to push back on the implication that this is the central or most important thing here.

The central thing, IMO, is a willingness to try on another person's worldview even though it clashes with your own. It doesn't require "inferiority"/"high status"/"control" except in the extremely minimal sense that they might know something important that you don't, and that seeing it for yourself might change your behavior. That alone will get you inhibition of all the normal stuff and an automatic (albeit tentative) acceptance of worldview-dissonant perspectives (e.g. name amnesia). It helps if the person has reason to respect and trust you which is kinda like "high status", but not really because it can just as easily happen with people on equal social standing in neutral contexts.

Similarly, hypnosis has very little to do with sleep and eye fatigue/closure is not the important part of eye contact. The important part of eye contact is that it's incredibly communicative. You can convey with eye contact things which you can't convey with words. "I see you". "Seeing you doesn't cause conflict in me". "I see you seeing me see you" and so on, to name a few. All the things you need to communicate to show someone that your perspective is safe and worthy of experiencing are best communicated with the eyes. And perhaps equally important it is a bid for attention, by holding your own.

So far, this isn’t a trance; I’m just describing a common social dynamic. Specifically, if I’m not in a hypnotic trance, the sequence of thoughts in the above might look like a three-step process:
[...]
i.e., in my intuitive model, first, the hypnotist exercises his free will with the intention of me standing; second, I (my homunculus) exercise my own free will with the intention of standing; and third, I actually stand. In this conceptualization, it’s my own free will / vitalistic force / wanting (§3.3.4) that causes me to stand. So this is not a trance.

It's important to note that while this self reflective narrative is indeed different in the way you describe, the underlying truth often is not. In the hypnosis literature this is known as "cold control theory", because it's the same control without the usual Higher Order Thoughts (HOT).

In "common social dynamics" we explain it as "I chose to", but what is actually happening a lot of the time is the speaker is exercising their free will through your body, and you're not objecting because it matches your narrative. The steps aren't actually in series, and you didn't choose to do it so much as you chose to not decline to do it.

These "higher order thoughts" do change some things, but turn out to be relatively unimportant and the better hypnotists usually don't bother too much with them and instead just address the object level. This is also why you get hypnotists writing books subtitled "there's no such thing as hypnosis" and stuff like that.

The short version is: If I have a tune in my head, then I’m very unlikely to simultaneously recall a memory of a different tune. Likewise, if I’m angry right now, then I’m less likely to recall past memories where I felt happy and forgiving, and vice-versa.

As far as I can tell, there are several different things going on with amnesia. I agree that this is one of them, and I'm not sure if I've seen anyone else notice this, so it's cool to see someone point it out.

The "null hypothesis", though, any time it comes to hypnosis is that it's all just response to suggestion. You "know" that being hypnotized involves amnesia, and you believe you're hypnotized, so you experience what you expect. There's an academic hypnosis researcher I talk to sometimes who doesn't even believe "hypnotic trance" is real in any fundamental sense and thinks that all the signs of trance are the result of suggestion.

I don't believe suggestion is all that's going on, but it really is sufficient for amnesia. The answer to Yudkowsky's old question of "Do we believe everything we're told?" is indeed "Yes" -- if we don't preemptively push it away or actively remember to unbelieve later. Back when I was working this stuff out I did a fun experiment where I'd come up with an excuse to get people to not pre-emptively reject what I was about to say, then I'd suggest amnesia for this conversation and that they'd laugh when I scratch my nose, and then I'd distract them so that the suggestion could take effect before they had a chance to unbelieve it. The excuse was something like "I know this is ridiculous so I don't expect you to believe it, but hear me out and let me know if you understand" -- which is tricky because they think the fact that we "agreed" that they won't believe it means they actually aren't believing it when they say "I understand", even though the full statement is "I understand [that I will laugh when you scratch your nose and have no idea why]". They still had awareness that this belief is wrong and would therefore act to stop themselves from acting on it, which is why the unexpected distraction was necessary in order to get their mind off of it long enough for it to work.

unexpectedvalues on Seven lessons I didn't learn from election day

Ah, I think I see. Would it be fair to rephrase your question as: if we "re-rolled the dice" a week before the election, how likely was Trump to win?

My answer is probably between 90% and 95%. Basically the way Trump loses is to lose some of his supporters or have way more late deciders decide on Harris. That probably happens if Trump says something egregiously stupid or offensive (on the level of the Access Hollywood tape), or if some really bad news story about him comes out, but not otherwise.