LessWrong 2.0 Reader

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (1)

Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)

Introducing Kairos: a new AI safety fieldbuilding organization (the new home for SPAR and FSP)
agucova · 2024-10-25T21:59:08.782Z · comments (0)

'Chat with impactful research & evaluations' (Unjournal NotebookLMs)
david reinstein (david-reinstein) · 2024-09-28T00:32:16.845Z · comments (0)

[question] how to truly feel my beliefs?
KvmanThinking (avery-liu) · 2024-11-11T00:04:30.994Z · answers+comments (6)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

[question] Is School of Thought related to the Rationality Community?
Shoshannah Tekofsky (DarkSym) · 2024-10-15T12:41:33.224Z · answers+comments (6)

Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (9)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

[link] AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels
Corin Katzke (corin-katzke) · 2024-10-28T16:03:39.258Z · comments (0)

Increasing the Span of the Set of Ideas
Jeffrey Heninger (jeffrey-heninger) · 2024-09-13T15:52:39.132Z · comments (1)

Budapest Hungary - ACX Meetups Everywhere Fall 2024
Timothy Underwood (timothy-underwood-1) · 2024-08-29T18:37:41.313Z · comments (0)

The grass is always greener in the environment that shaped your values
Karl Faulks (karl-faulks) · 2024-11-17T18:00:15.852Z · comments (0)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

Against Job Boards: Human Capital and the Legibility Trap
vaishnav92 · 2024-10-24T20:50:50.266Z · comments (1)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

[question] Why is Gemini telling the user to die?
Burny · 2024-11-18T01:44:12.583Z · answers+comments (0)

Small improvement to Wikipedia page on Pareto Efficiency
ektimo · 2024-11-18T02:13:49.151Z · comments (0)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

[question] why won't this alignment plan work?
KvmanThinking (avery-liu) · 2024-10-10T15:44:59.450Z · answers+comments (7)

Another UFO Bet
codyz · 2024-11-01T01:55:27.301Z · comments (9)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

[question] How to cite LessWrong as an academic source?
PhilosophicalSoul (LiamLaw) · 2024-11-06T08:28:26.309Z · answers+comments (6)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

[link] Should we abstain from voting? (In nondeterministic elections)
B Jacobs (Bob Jacobs) · 2024-10-02T10:07:43.167Z · comments (6)

[question] Artificial V/S Organoid Intelligence
10xyz (10xyz-coder) · 2024-10-23T14:31:46.385Z · answers+comments (0)

[link] Formalize the Hashiness Model of AGI Uncontainability
Remmelt (remmelt-ellen) · 2024-11-09T16:10:05.032Z · comments (0)

Longevity and the Mind
George3d6 · 2024-09-16T09:43:09.700Z · comments (2)

[question] If the DoJ goes through with the Google breakup, where does Deepmind end up?
O O (o-o) · 2024-10-12T05:06:50.996Z · answers+comments (1)

[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)

[link] An "Observatory" For a Shy Super AI?
Sherrinford · 2024-09-27T21:22:40.296Z · comments (0)

Using LLM's for AI Foundation research and the Simple Solution assumption
Donald Hobson (donald-hobson) · 2024-09-24T11:00:53.658Z · comments (0)

[link] Join the $10K AutoHack 2024 Tournament
Paul Bricman (paulbricman) · 2024-09-25T11:54:20.112Z · comments (0)

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · comments (1)

[question] What (if anything) made your p(doom) go down in 2024?
Satron · 2024-11-16T16:46:43.865Z · answers+comments (1)

Using Narrative Prompting to Extract Policy Forecasts from LLMs
Max Ghenis (MaxGhenis) · 2024-11-05T04:37:52.004Z · comments (0)

[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)

[link] Is P(Doom) Meaningful? Bayesian vs. Popperian Epistemology Debate
Liron · 2024-11-09T23:39:30.039Z · comments (0)

Some reasons to start a project to stop harmful AI
Remmelt (remmelt-ellen) · 2024-08-22T16:23:34.132Z · comments (0)

[link] Universal basic income isn’t always AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T15:39:18.389Z · comments (3)

Scattered thoughts on what it means for an LLM to believe
TheManxLoiner · 2024-11-06T22:10:29.429Z · comments (4)

[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)

Educational CAI: Aligning a Language Model with Pedagogical Theories
Bharath Puranam (bharath-puranam) · 2024-11-01T18:55:26.993Z · comments (1)

Recent comments

gwern on Announcing turntrout.com, my new digital home

Possibly it will still be counterintuitive to many folks, as Said quoted in a sibling comment.

No, this is a little different. Your approach here sounds like ours and the intuitive one (just at the cost of additional complexity).

The 'auto dark mode' we abandoned is where you just use step #2 there and you skip #1 (and thus, any widget or toggle which enables a reader to do anything with localStorage), and 'auto is the only state'. The logic there is, the reader already has access to a widget or toggle to set their dark mode preference: it's just their OS/browser, which will have some config page somewhere with various settings like 'turn on dark mode at night' or 'always use dark mode' or 'always use light mode'. Just trust the OS/browser and use whatever setting it sends to the web page. Don't waste the effort and screen real estate to add in a redundant widget/toggle. It's handled already. Easier for everyone - it Just Works™!

Unfortunately, the connection between 'a year ago when I updated my Android phone and it asked me if I wanted to use the cool new dark mode, I said yes' and 'this webpage I am reading now is in dark mode for some reason and I can't change it back to normal???' apparently winds up eluding some readers. (This is what Said's sibling comment is about.) It winds up being "too much magic".

The current toggle+localStorage+auto approach, on the other hand, while adding to the clutter, does not seem to confuse readers. At least, I have seen many fewer (or no) complaints about the dark mode being on when it shouldn't be after we moved to the toggle. So as far as we can tell, it's working.
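The difference between the two designs can be sketched as a small decision function. This is only an illustration of the logic described above, not the site's actual code: the function names and the stored values ('dark', 'light', 'auto') are assumptions, and in a browser the two inputs would come from localStorage and the OS/browser colour-scheme setting rather than from function arguments.

```python
from typing import Optional


def resolve_theme(stored_choice: Optional[str], os_prefers_dark: bool) -> str:
    """Toggle + localStorage + auto: an explicit reader choice wins.

    stored_choice is the (hypothetical) value the toggle saved:
    'dark', 'light', 'auto', or None if the reader never touched it.
    """
    if stored_choice in ("dark", "light"):
        return stored_choice
    # 'auto' or no stored value: fall back to the OS/browser setting.
    return "dark" if os_prefers_dark else "light"


def resolve_theme_auto_only(os_prefers_dark: bool) -> str:
    """The abandoned design: 'auto is the only state', no toggle at all."""
    return "dark" if os_prefers_dark else "light"
```

With the first function, a reader whose OS asks for dark mode can still force light mode on the page; with the second, the only way to change the page is to find the OS/browser setting.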

david-james on Neutrality

The frenzy to couple everything into a single tangle of complexity is driven by…

In some cases, yes, but this is only one factor of many. Others include:

Our brains are often drawn to narratives, which are complex and interwoven. Hence the tendency to bundle up complex logical interdependencies into a narrative.
Our social structures are guided/constrained by our physical nature and technology. For in-person gatherings, ideological complexity (which I will call bundling) is often a dominant strategy. For example, imagine a highly unusual congregation: a large unified gathering of Monotheist worshippers with considerable internal diversity. Rather than “one track” consisting of shared ideology, they subdivide their readings and rituals into subgroups according to various sub-doctrinal positions. Why don’t we see much of this (if any) in the real world? My point here is there seems to be an advantage to bundling.

askwho on Monthly Roundup #24: November 2024

Reminder that I am producing a multi-voiced, ElevenLabs MV2 quality podcast feed for Don't Worry About the Vase for those that find that a valuable thing:
https://open.substack.com/pub/dwatvpodcast/p/monthly-roundup-24-november-2024?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

All uniquely quoted people get a distinct voice for easy differentiation, and I personally find the main "Zvi Voice" really good for long listening.

david-james on Neutrality

Right. Some such agreements are often called social contracts. One catch is that a person born into them may not understand their historical origin or practical utility, much less agree with them.

leon-lang on Leon Lang's Shortform

Why I think scaling laws will continue to drive progress

Epistemic status: This is a thought I've had for a while. I have never discussed it with anyone in detail; a brief conversation could convince me otherwise.

According to recent reports there seem to be some barriers to continued scaling. We don't know what exactly is going on, but it seems like scaling up base models doesn't bring as much new capability as people hope.

However, I think they're probably still in some way scaling the wrong thing: the model learns to predict a static dataset on the internet; however, what it needs to do later is to interact with users and the world. For performing well in such a task, the model needs to understand the consequences of its actions, which means modeling interventional distributions P(X | do(A)) instead of static data P(X | Y). This is related to causal confusion as an argument against the scaling hypothesis.

This viewpoint suggests that if big labs figure out how to predict observations in an online way, through ongoing interactions of the models with users / the world, then this should drive further progress. It's possible that labs are already doing this, but I'm not aware of it, and so I guess they haven't yet fully figured out how to do that.

What prompted me to write this is that there is a new paper on scaling law for world modeling that's about exactly what I'm talking about here.
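To make the gap between static and interventional prediction concrete, here is a toy example of my own; the numbers and variable names are invented, not taken from any paper. A hidden factor Z influences both which action A appears in the logged data and the outcome X, while A itself has no effect on X. Conditioning on A in the static data, written here as P(X | A), then looks very different from intervening on it:

```python
from fractions import Fraction as F

# Hypothetical toy model: Z is a hidden confounder, A an action, X an outcome.
P_Z = {0: F(1, 2), 1: F(1, 2)}              # P(Z = z)
P_A1_GIVEN_Z = {0: F(1, 10), 1: F(9, 10)}   # P(A = 1 | Z = z)
P_X1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}     # P(X = 1 | Z = z); X ignores A


def p_x1_given_a1_observed() -> F:
    """P(X=1 | A=1) as estimated from static, observational data."""
    p_a1 = sum(P_Z[z] * P_A1_GIVEN_Z[z] for z in P_Z)
    p_x1_and_a1 = sum(P_Z[z] * P_A1_GIVEN_Z[z] * P_X1_GIVEN_Z[z] for z in P_Z)
    return p_x1_and_a1 / p_a1


def p_x1_do_a1() -> F:
    """P(X=1 | do(A=1)): setting A by hand cuts the Z -> A link."""
    return sum(P_Z[z] * P_X1_GIVEN_Z[z] for z in P_Z)


print(p_x1_given_a1_observed())  # 37/50
print(p_x1_do_a1())              # 1/2
```

A predictor trained only on the logged data would learn that A = 1 goes along with X = 1 about 74% of the time, yet actually taking the action leaves X at its 50% base rate.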

sil-ver on [Intuitive self-models] 8. Rooting Out Free Will Intuitions

(You did respond to all the important parts, rest of my comment is very much optional.)

I’m sure you’re aware that people feel like they have a broader continuous awareness of their visual field than they actually do. There are lots of demonstrations of this—e.g. change blindness, selective attention test, the fact that peripheral vision has terrible resolution and terrible color perception and makes faces look creepy. There’s a refrigerator light illusion thing—if X is in my peripheral vision, then maybe it’s currently active as just a little pointer in a tiny sub-area of my cortex, but as soon as I turn my attention to X it immediately unfolds in full detail across the global workspace.

Yes -- and my point was that appealing to these phenomena is the kind of thing you will probably have to do to explain the meta problem of seeing. Which raises all kinds of issues -- for example, change blindness by itself doesn't logically prove anything, since it's possible not to notice that something changed even if it was represented. Only the reverse conclusion is valid -- if a subject can tell that X changed, then X was in awareness, but if they can't tell, X may or may not have been in awareness. So teasing out exactly how much information is really present in awareness, given the positive and negative evidence, is a pretty big rabbit hole. (Poor resolution in peripheral vision does prove absence of information, but as with the memory example I've complained about in post #2, this is an example of something people don't endorse under reflection anyway, so it doesn't get you very far. Like, there is a very, very big difference between arguing that peripheral resolution is poor, which people will agree with as soon as they actually pay attention to their peripheral vision for the first time, and arguing that the continuous visual image they think they see is not really there, which most people will stubbornly disagree with regardless of how much attention they pay to it.)

Anyway, that's the only claim I was making -- I was only trying to go as far as "this is why I think the problem is nontrivial and you haven't solved it yet", not "and that's why you can't solve it".

The contents of IT are really truly different from the contents of LIP [I didn’t check where the visual information gets to the cortex in blindsight, I’m just guessing LIP for concreteness]. Querying IT is a different operation than querying LIP. IT holds different types of information than LIP does, and does different things with that information, including leading to different visceral reactions, motivations, semantic knowledge, etc., all of which correspond to neuroscientific differences in how IT versus LIP is wired up.

All these differences between IT vs LIP are in the territory, not the map. So I definitely agree that “the distinction [between seeing and vague-sense-of-presence] isn’t just that we happen to call them by different labels”. They’re different like how the concept “hand” is different from the concept “foot”—a distinction on the map downstream of a distinction in the territory.

Right, and I agree that this makes it a priori plausible that they could account for the differences in how people talk about, e.g., vivid seeing vs. intangible intuitions. But it doesn't prove that they do, it only shows that this is the kind of explanation that, at first glance, looks like it could work. To actually solve the meta problem, you still have to do the work of explaining all the properties of introspective reports, which requires going into a lot of detail.

As above, this is the only claim I was making -- I'm not saying any of these issues are provably impossible with your approach, I'm only saying that your approach hasn't provided a full solution yet. (And that I genuinely think most of the difficulty happens to be in these still unaddressed details; this was the point of the carrot/plant analogy.)

I think that’s compatible with my models, because those meditators still have a cortex, in which patterns of neurons can be firing or not firing at any particular time. And that’s the core aspect of the “territory” which corresponds to “conscious awareness” in the “map”. No amount of meditation, drugs, etc., can change that.

Fair enough, but I think it does show that free will isn't that central of a piece.

dr_s on Making a conservative case for alignment

Pretty much. It's not "naive" if it's literally the only option that actually does not harm everyone involved, unless of course we want to call every world leader and self-appointed foreign policy expert a blithering idiot with tunnel vision (I make no such claim a priori; ball's in their court).

It's important to not oversimplify things. It's also important to not overcomplicate them. Domain experts tend to be resistant to the first kind of mental disease, but tragically prone to the second. Sometimes it really is Just That Simple, and everything else is commentary and superfluous detail.

david-james on Neutrality

Durable institutions find ways to survive. I don’t mean survival merely in terms of legal continuity; I mean fidelity to their founding charter. Institutions not only have to survive past their first leader; they have to survive their first leader themself! The institution’s structure and policies must protect against the leader’s meandering attention, whims, and potential corruptions. In the case of Elon, based on his mercurial history, I would not bet that Musk would agree to the requisite policies.

viliam on What are Emotions?

Emotions are about reality, but emotions are also a part of reality, so we also have emotions about emotions. I can feel happy about some good thing happening in the outside world. And, separately, I can feel happy about being happy.

In the thought experiments about wireheading, people often say that they don't just want to experience (possibly fake) happy thoughts about X; they also want X to actually happen.

But let's imagine the converse: what if someone proposed a surgery that would make you unable to ever feel happy about X, even if you knew that X actually happened in the world. People would probably refuse that, too. Intuitively, we want to feel good emotions that we "deserve", plus there is also the factor of motivation. Okay, so let's imagine a surgery that removes your ability to feel happy about X, but solves the problem of motivation by e.g. giving you an urge to do X. People would probably refuse that, too.

So I think we actually want both the emotions and the things the emotions are about.

mr-hire on Matt Goldenberg's Short Form Feed

A lot of people are looking at the implications of o1's training process as a future scaling paradigm, but it seems to me that this implementation, which applies inference-time compute to fine-tune the model just in time for hard questions, is equally promising. It may have equally impressive results if it scales with compute, and it has equal potential in terms of low-hanging fruit to be picked to improve it.

Don't sleep on test-time training as a potential future scaling paradigm.