LessWrong 2.0 Reader

[question] Using hex to get murder advice from GPT-4o
Laurence Freeman (laurence-freeman) · 2024-11-13T18:30:23.475Z · answers+comments (5)
[link] Testing Genetic Engineering Detection with Spike-Ins
jefftk (jkaufman) · 2024-10-22T17:20:54.947Z · comments (0)
Rationalist Gnosticism
tailcalled · 2024-10-10T09:06:34.149Z · comments (10)
[link] Species as Canonical Referents of Super-Organisms
Yudhister Kumar (randomwalks) · 2024-10-18T07:49:52.944Z · comments (8)
[question] Doing Nothing Utility Function
k64 · 2024-09-26T22:05:18.821Z · answers+comments (9)
The Other Existential Crisis
James Stephen Brown (james-brown) · 2024-09-21T01:16:38.011Z · comments (24)
[link] Physics of Language models (part 2.1)
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · comments (2)
Prediction markets and Taxes
Edmund Nelson (edmund-nelson) · 2024-11-01T17:39:35.191Z · comments (7)
A Dialogue on Deceptive Alignment Risks
Rauno Arike (rauno-arike) · 2024-09-25T16:10:12.294Z · comments (0)
[question] A Different Perspective on Rationality - Would This Be Valuable?
Gabriel Brito (gabriel-brito) · 2024-10-26T18:47:46.416Z · answers+comments (4)
Keeping it (less than) real: Against ℶ₂ possible people or worlds
quiet_NaN · 2024-09-13T17:29:44.915Z · comments (0)
how to rapidly assimilate new information
dhruvmethi · 2024-10-24T02:18:00.648Z · comments (3)
[link] Apply to Aether - Independent LLM Agent Safety Research Group
RohanS · 2024-08-21T09:47:11.493Z · comments (0)
Will AI and Humanity Go to War?
Simon Goldstein (simon-goldstein) · 2024-10-01T06:35:22.374Z · comments (4)
Testing "True" Language Understanding in LLMs: A Simple Proposal
MtryaSam · 2024-11-02T19:12:34.710Z · comments (2)
[link] Anthropic teams up with Palantir and AWS to sell AI to defense customers
Matrice Jacobine · 2024-11-09T11:50:34.050Z · comments (0)
The Bayesian Conspiracy Live Recording
Eneasz · 2024-11-06T16:25:13.380Z · comments (0)
Open letter to young EAs
Leif Wenar · 2024-10-11T19:49:10.818Z · comments (10)
Derivative AT a discontinuity
Alok Singh (OldManNick) · 2024-10-24T02:48:24.573Z · comments (5)
Thinking About a Pedalboard
jefftk (jkaufman) · 2024-10-08T11:50:02.054Z · comments (2)
Contra Musician Gender II
jefftk (jkaufman) · 2024-11-13T03:30:09.510Z · comments (0)
[link] In Praise of the Beatitudes
robotelvis · 2024-09-24T05:08:21.133Z · comments (7)
[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)
Force Sequential Output with SCP?
jefftk (jkaufman) · 2024-11-09T12:40:06.098Z · comments (4)
Fundamental Uncertainty: Epilogue
Gordon Seidoh Worley (gworley) · 2024-11-16T00:57:48.823Z · comments (0)
[question] Are UV-C Air purifiers so useful?
JohnBuridan · 2024-09-04T14:16:01.310Z · answers+comments (0)
Toy Models of Superposition: Simplified by Hand
Axel Sorensen (axel-sorensen) · 2024-09-29T21:19:52.475Z · comments (3)
[link] Why do firms choose to be inefficient?
Nicholas D. (nicholas-d) · 2024-08-28T18:39:41.664Z · comments (4)
AGI's Opposing Force
SimonBaars (simonbaars) · 2024-08-16T04:18:06.900Z · comments (2)
[question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal
doomyeser · 2024-09-11T18:07:19.385Z · answers+comments (9)
Electric Mandola
jefftk (jkaufman) · 2024-09-21T13:40:04.772Z · comments (0)
Becket First
jefftk (jkaufman) · 2024-09-22T17:10:04.304Z · comments (0)
[question] What do you expect AI capabilities may look like in 2028?
nonzerosum · 2024-08-23T16:59:53.007Z · answers+comments (5)
[link] Markets Are Information - Beating the Sportsbooks at Their Own Game
JJXW · 2024-11-07T20:58:43.389Z · comments (1)
[link] Molecular dynamics data will be essential for the next generation of ML protein models
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-26T14:50:23.790Z · comments (0)
[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)
[question] Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong
DragonGod · 2024-10-16T10:20:22.133Z · answers+comments (67)
Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan Huang (sprout_ust) · 2024-09-28T09:53:19.658Z · comments (0)
[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)
Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)
[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (25)
[link] Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Jonathan N (derpyplops) · 2024-11-05T01:01:08.083Z · comments (0)
HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)
On Intentionality, or: Towards a More Inclusive Concept of Lying
Cornelius Dybdahl (Kalciphoz) · 2024-10-18T10:37:32.201Z · comments (0)
[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)
LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (8)
[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)
Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)
[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)
[link] Can AI agents learn to be good?
Ram Rachum (ram@rachum.com) · 2024-08-29T14:20:04.336Z · comments (0)