LessWrong 2.0 Reader

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)

[link] It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
Gerard Boxo (gerard-boxo) · 2024-10-14T17:04:57.010Z · comments (0)

[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)

Project Adequate: Seeking Cofounders/Funders
Lorec · 2024-11-17T03:12:12.995Z · comments (4)

An open response to Wittkotter and Yampolskiy
Donald Hobson (donald-hobson) · 2024-09-24T22:27:21.987Z · comments (0)

[question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?
Double · 2024-09-05T00:35:39.504Z · answers+comments (9)

[question] What are some positive developments in AI safety in 2024?
Satron · 2024-11-15T10:32:39.541Z · answers+comments (0)

[link] [Linkpost] Hawkish nationalism vs international AI power and benefit sharing
jakub_krys (kryjak) · 2024-10-18T18:13:19.425Z · comments (5)

[question] What are some good ways to form opinions on controversial subjects in the current and upcoming era?
notfnofn · 2024-10-27T14:33:53.960Z · answers+comments (21)

A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)

[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)

[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (9)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

[link] Thinking LLMs: General Instruction Following with Thought Generation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-15T09:21:22.583Z · comments (0)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

[link] Consciousness As Recursive Reflections
Gunnar_Zarncke · 2024-10-05T20:00:53.053Z · comments (3)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

A Brief Explanation of AI Control
Aaron_Scher · 2024-10-22T07:00:56.954Z · comments (1)

Not all biases are equal - a study of sycophancy and bias in fine-tuned LLMs
jakub_krys (kryjak) · 2024-11-11T23:11:15.233Z · comments (0)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

[question] What makes one a "rationalist"?
mathyouf · 2024-10-08T20:25:21.812Z · answers+comments (5)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM
Winnie Yang (winnie-yang) · 2024-08-28T08:41:38.967Z · comments (2)

The Great Bootstrap
KristianRonn · 2024-10-11T19:46:51.752Z · comments (0)

[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)

[link] Spherical cow
dkl9 · 2024-11-11T03:10:27.788Z · comments (0)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

Join my new subscriber chat
sarahconstantin · 2024-11-06T02:30:11.059Z · comments (0)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)

[question] Does a time-reversible physical law/Cellular Automaton always imply the First Law of Thermodynamics?
Noosphere89 (sharmake-farah) · 2024-08-30T15:12:28.823Z · answers+comments (11)

[question] How to cite LessWrong as an academic source?
PhilosophicalSoul (LiamLaw) · 2024-11-06T08:28:26.309Z · answers+comments (6)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

Introducing Kairos: a new AI safety fieldbuilding organization (the new home for SPAR and FSP)
agucova · 2024-10-25T21:59:08.782Z · comments (0)

[question] Why would ASI share any resources with us?
Satron · 2024-11-13T23:38:36.535Z · answers+comments (5)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

[question] why won't this alignment plan work?
KvmanThinking (avery-liu) · 2024-10-10T15:44:59.450Z · answers+comments (7)

Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (9)

2025 Q1 Pivotal Research Fellowship (Technical & Policy)
Tobias H (clearthis) · 2024-11-12T10:56:24.858Z · comments (0)

Recent comments

lillybaeum on [Intuitive self-models] 8. Rooting Out Free Will Intuitions

I wonder how domination and submission relate to these concepts.

Note that d/s doesn't necessarily need to have a sexual connotation, although it nearly always does.

My understanding of the appeal of submission is that the ideal submissive state is one where the dominant partner is anticipating the needs and desires of the submissive partner, supplies these needs and desires, and reassures or otherwise convinces the submissive that they are capable of doing so, and will actively do so for the duration of the scene.

After reading your series, I'd assume what is happening here is a number of things all related to the belief in the homunculus and the constant valence calculations that the brain performs in order to survive and thrive in society.

You have no need to try to fight for dominance or be 'liked' or 'admired'. The dominant partner is your superior, and the dominant partner likes and admires you completely.
You have no need to plan things and determine their valence -- the dominant will anticipate any needs, desires and responsibilities, and take care of them for you.
You have no need to maintain a belief in your own 'willpower', 'identity', 'ego', etc... for the duration of the scene, you wear the mask of 'the obedient submissive'.

All things considered, it's absolutely no surprise that 'subspace' is an appealing place to be; it's sort of a shortcut to the truth you're describing. I wouldn't be surprised if some people even have an experience bordering on nirodha samapatti during a particularly deep, extensive scene, where they have little memory of the experience afterwards. I'm also not surprised that hypnodomination, a combination of d/s and trance, is so common, given that the two states are so similar.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Is true Novelty a Mirage?

One view on novelty is that it's a mirage. Novelty is 'just synthesis of existing work, plus some randomness.'

I don't think that's correct. I think true novelty is more subtle than that. Yes, sometimes novel art forms or scientific ideas come from noisily mixing existing ideas. But does that describe all forms of novelty?

A reductio ad absurdum of the novelty-as-mirage point of view is that all art forms that have appeared since the dawn of time are simply noised versions of cave paintings. This seems absurd.

Consider AlphaGo. Does AlphaGo just noisily mix human experts? No, AlphaGo works on a different principle and, I would venture, strictly outcompetes anything based on averaging or smoothing over human experts.

AlphaGo is based on a different principle than averaging over existing data. Instead, AlphaGo starts with an initial guess at what good play looks like, perhaps imitated from previous games. It then plays out to long horizons, prunes the strategies that did poorly, and upweights the strategies that did well. It iteratively amplifies, refines, and distills. I strongly suspect that approximately this modus operandi underlies much of human creativity as well.
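The loop described above (start from an initial guess, play out to a long horizon, prune what did poorly, upweight what did well, repeat) can be sketched in a few lines. This is a toy illustration only, not AlphaGo's actual algorithm, which combines neural networks with Monte Carlo tree search. The names `rollout`, `amplify_and_distill`, and `prune_below`, and the idea of giving each strategy a fixed per-step win probability, are assumptions made up for this sketch.

```python
import random


def rollout(strength, horizon, rng):
    """Play one long-horizon game; return the fraction of steps won.

    `strength` is a made-up per-step win probability for a strategy.
    """
    wins = sum(rng.random() < strength for _ in range(horizon))
    return wins / horizon


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def amplify_and_distill(strengths, rounds=20, rollouts=30, horizon=50,
                        prune_below=0.01, seed=0):
    """Toy amplify/refine/distill loop over a fixed set of strategies."""
    rng = random.Random(seed)
    # Initial guess: every strategy is equally likely to be good.
    policy = normalize([1.0] * len(strengths))
    for _ in range(rounds):
        # Amplify: estimate each surviving strategy's value from rollouts.
        values = [
            sum(rollout(s, horizon, rng) for _ in range(rollouts)) / rollouts
            if p > 0 else 0.0
            for s, p in zip(strengths, policy)
        ]
        # Refine: upweight strategies in proportion to how well they did.
        policy = normalize([p * v for p, v in zip(policy, values)])
        # Distill: prune strategies whose weight has become negligible.
        policy = normalize([p if p >= prune_below else 0.0 for p in policy])
    return policy


if __name__ == "__main__":
    print(amplify_and_distill([0.3, 0.5, 0.7]))
```

Nothing here averages over the starting strategies: the final policy concentrates on whichever strategy the rollouts favour, which is the contrast with "noisy mixing" drawn above.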

True novelty is based on both the synthesis and refinement of existing work.

jkaufman on Trying Bluesky

That's still an algorithm, it's just a very simple one.

Personally, I prefer to have the posts I see be the product of a sophisticated algorithm (ex: there are some people I follow who post a lot, and for those people I would like to only see their best posts) but I want it to be one that is in my interest.

notfnofn on D0TheMath's Shortform

It's completely valid. And we can simplify it further to:

not Consistent(ZFC) -> not Consistent(ZFC + not Consistent(ZFC))

because if a set of axioms is already inconsistent, then it's inconsistent with anything added. But you still won't be able to actually derive a contradiction from this.
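The step used here is monotonicity of provability: adding axioms never removes theorems. A short sketch, where the notation $\mathrm{Con}(T)$ for "T is consistent" and $T \vdash \bot$ for "T proves a contradiction" is assumed for this note rather than taken from the thread:

```latex
\begin{align*}
\neg\mathrm{Con}(T) &\iff T \vdash \bot, \\
T \subseteq T' \ \text{and}\ T \vdash \bot &\implies T' \vdash \bot, \\
\text{so, taking } T = \mathrm{ZFC},\ T' = \mathrm{ZFC} + \neg\mathrm{Con}(\mathrm{ZFC}):\quad
\neg\mathrm{Con}(\mathrm{ZFC}) &\implies \neg\mathrm{Con}\bigl(\mathrm{ZFC} + \neg\mathrm{Con}(\mathrm{ZFC})\bigr).
\end{align*}
```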

Edit: I think the right thing to do here is look at models of PA + not Consistent(PA). I can't find a nice treatment of this at the moment, but here's a possibly wrong one by someone who was learning the subject at the time: https://angyansheng.github.io/blog/a-theory-that-proves-its-own-inconsistency

quetzal_rainbow on D0TheMath's Shortform

Okay, I kinda understand where I'm wrong spiritually-intuitively, but now I don't understand where I'm wrong formally. Like, which inference in the chain

not Consistent(ZFC) -> some subsets of ZFC don't have a model -> some subsets of ZFC + not Consistent(ZFC) don't have a model -> not Consistent(ZFC + not Consistent(ZFC))

is actually invalid?

soli on OpenAI Email Archives (from Musk v. Altman)

Why/how are you so sure that OpenAI made things much worse (in the long run)?

rotatingpaguro on AI #90: The Wall

I see your proposed condition for meaningful debate as bureaucracy that adds friction rather than value.

cousin_it on OpenAI Email Archives (from Musk v. Altman)

"Temptations are bound to come, but woe to anyone through whom they come." Or to translate from New Testament into something describing the current situation: you should accept that AI will come, but you shouldn't be the one who hastens its coming.

Yes, this approach sounds very simple and naive. The people in this email exchange rejected it and went for a more sophisticated one: join the arms race and try to steer it. By now we see that these ultra-smart and ultra-rich people made things much worse than if they'd followed the "do no evil" approach. If this doesn't prove the "do no evil" approach, I'm not sure what will.

startattheend on Breaking beliefs about saving the world

I like this post, but I have some problems with it. Don't take it too hard, as I'm not the average LW reader. I think your post is quite in line with what most people here believe (but you're quite ambitious in the tasks you give yourself, so you might get downvoted as a result of minor mistakes and incompleteness resulting from that). I'm just an anomaly who happened to read your post.

By bringing attention to tactical/emotionally pulling patterns of suffering, people will recognize it in their own life, and we will create an unfulfilled desire that only we have the solution for.

I think this might make suffering worse. Suffering is subjective, so if you make people believe that they should be suffering, or that suffering is justified, they may suffer needlessly. For example, poverty doesn't make people as dissatisfied with life as relative poverty does. It's when people compare themselves to others and realize that they could have it better that they start disliking what they have at the moment. If you create ideals, then people will work towards achieving them, but they will also suffer from the gap between the current state and what's ideal. You may argue "the reward redeems the suffering and makes it bearable", and yes, but only as long as people believe that they're getting closer to the goal. Most positive emotion we experience is a result of feeling ourselves moving towards our goals.

Personal concurrent life-satisfaction is possible in-spite of punishment/suffering when punishment/suffering is perceived as a necessary sacrifice for an impending reward.

Yes, which is why one should not reduce "suffering" but "the causes of unproductive suffering". Just like one shouldn't avoid "pain", but "actions which are painful and without benefit". The conclusion of "Man's Search for Meaning" was that suffering is bearable as long as it has meaning, and that only meaningless suffering is unbearable. I've personally felt this as well. One of the times I was the most happy, I was also the most depressed. But that might just have been a mixed episode, as is known from bipolar disorder.
I'm nitpicking, but I believe it's important to state that "suffering" isn't a fundamental issue. If I touch a flame and burn my hand, then the flame is the issue, not the pain. In fact, the pain is protecting me from touching the flame again. Suffering is good for survival, for the same reason that pain is good for survival. The proof is that evolution made us suffer, that those who didn't suffer didn't pass on their genes.

We are products of EA

I'm not sure this is true? EA seems to be the opposite of Darwinism, and survival of the fittest has been the standard until recently (everyone suddenly cares about reducing negative emotions and unfairness, to an almost pathological degree). But even if various forces helped me avoid suffering, would that really be a good thing?

I personally grew the most as a person as a result of suffering. You're probably right that you were the least productive when you didn't eat, but suffering is merely a signal that change is necessary, and when you experience great suffering, you become open to the idea of change. It's not uncommon that somebody hits rock bottom and turns their life around for the better as a result. But while suffering is bearable, we can continue enduring, until we suffer the death of a thousand papercuts (or the death of the boiling frog, by our own hands).
That said, growth is usually a result of internal pressure, in which an inconsistency inside oneself finally snaps, so that one can focus on a single direction with determination. It's like a fever - the body almost kills itself, so that something harmful to it can die sooner.

We are still in trouble if the average human is as stupid as I am.

Are you sure suffering is caused by a lack of intelligence, and not by too much intelligence? ('Forbidden fruit' argument) And that we suffer from a lack of tech rather than from an abundance of tech? (As Ted Kaczynski and the Amish seem to think)
Many animals are thriving despite their lack of intelligence. Any problem more complicated than "Get water, food and shelter. Find a mate, and reproduce" is a fabricated problem. It's because we're more intelligent than animals that we fabricate more difficult problems. And if something was within our ability, we'd not consider it a problem, which is why we always fabricate problems which are beyond our current capacities, which is how we trick ourselves into growth and improvement. Growth and improvement which somehow resulted in us being so powerful that we can destroy ourselves. Horseshoe crabs seem content with themselves, and even after 400 million years they just do their own thing. Some of them seem endangered now, but that's because of us?

Bureaucracy

Caused by too much centralization, I think. Merging structures into fewer, bigger structures causes an overhead which doesn't seem to be worth it. Decentralizing everything may actually save the world, or at least decrease the feedback loop which causes a few entities to hog all the resources.

Moloch

Caused by too much information and optimization, and therefore unlikely to be solved with information and optimization. My take here is the same as with intelligence and tech. Why hasn't Moloch killed us sooner? I believe it's because the conditions for Moloch weren't yet reached (optimal strategies weren't visible, as the world wasn't legible and transparent enough), in which case going back might be better than going forwards.

The tools you wish to use to solve human extinction are, from my perspective, what is currently leading us towards extinction. You can add AGI to this list of things if you want.

gerardus-mercator on Claude seems to be smarter than LessWrong community

So, if I understand you correctly, you now agree that a paperclip-maximizing agent won't utterly disregard paperclips relative to survival, because that would be suboptimal for its utility function.
However, if a paperclip-maximizing agent utterly disregarded paperclips relative to investigating the possibility of an objective goal, that would also be suboptimal for its utility function.
It sounds to me like you're saying that the intelligent agent will just disregard optimization of its utility function and instead investigate the possibility of an objective goal.
However, I don't agree with that. I don't see why an intelligent agent would do that if its utility function didn't already include a term for objective goals.
Again, I think a toy example might help to illustrate your position.
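For comparison, here is a minimal toy sketch of the position argued in this comment (not of the other poster's position). An agent splits one unit of effort between making paperclips, securing its survival, and investigating a possible objective goal, and picks the split that maximizes its utility. Every number and name below (`expected_utility`, `best_allocation`, `objective_weight`, the survival formula, the square-root payoff) is an assumption invented for illustration.

```python
import math
from itertools import product


def expected_utility(make, survive, investigate, objective_weight=0.0, horizon=10):
    """Expected utility of one effort split (the three shares sum to 1).

    Assumed toy model: the agent survives each step with probability
    0.5 + 0.5 * survive and, while alive, produces `make` paperclips per step.
    Investigating an objective goal pays off only through `objective_weight`,
    i.e. only if the utility function already contains a term for it.
    """
    p_alive = 0.5 + 0.5 * survive
    clips = sum(make * p_alive ** t for t in range(1, horizon + 1))
    return clips + objective_weight * math.sqrt(investigate)


def best_allocation(objective_weight=0.0, steps=10):
    """Grid-search the (make, survive, investigate) split with highest utility."""
    candidates = [
        (i / steps, j / steps, (steps - i - j) / steps)
        for i, j in product(range(steps + 1), repeat=2)
        if i + j <= steps
    ]
    return max(
        candidates,
        key=lambda a: expected_utility(*a, objective_weight=objective_weight),
    )


if __name__ == "__main__":
    print("pure paperclip utility:", best_allocation(objective_weight=0.0))
    print("with an objective-goal term:", best_allocation(objective_weight=5.0))
```

In this sketch the pure paperclip maximizer spends effort on both paperclips and survival but none on investigation, because any effort moved from investigation to paperclips strictly raises its utility. Investigation gets effort only once `objective_weight` is positive.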