LessWrong 2.0 Reader

[question] Are UV-C Air purifiers so useful?
JohnBuridan · 2024-09-04T14:16:01.310Z · answers+comments (0)

[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)

Keeping it (less than) real: Against ℶ₂ possible people or worlds
quiet_NaN · 2024-09-13T17:29:44.915Z · comments (0)

[link] In Praise of the Beatitudes
robotelvis · 2024-09-24T05:08:21.133Z · comments (7)

A Dialogue on Deceptive Alignment Risks
Rauno Arike (rauno-arike) · 2024-09-25T16:10:12.294Z · comments (0)

Open letter to young EAs
Leif Wenar · 2024-10-11T19:49:10.818Z · comments (10)

[link] Testing Genetic Engineering Detection with Spike-Ins
jefftk (jkaufman) · 2024-10-22T17:20:54.947Z · comments (0)

how to rapidly assimilate new information
dhruvmethi · 2024-10-24T02:18:00.648Z · comments (3)

Derivative AT a discontinuity
Alok Singh (OldManNick) · 2024-10-24T02:48:24.573Z · comments (5)

[question] A Different Perspective on Rationality - Would This Be Valuable?
Gabriel Brito (gabriel-brito) · 2024-10-26T18:47:46.416Z · answers+comments (4)

Prediction markets and Taxes
Edmund Nelson (edmund-nelson) · 2024-11-01T17:39:35.191Z · comments (7)

Testing "True" Language Understanding in LLMs: A Simple Proposal
MtryaSam · 2024-11-02T19:12:34.710Z · comments (2)

The Bayesian Conspiracy Live Recording
Eneasz · 2024-11-06T16:25:13.380Z · comments (0)

[link] Markets Are Information - Beating the Sportsbooks at Their Own Game
JJXW · 2024-11-07T20:58:43.389Z · comments (1)

[link] Anthropic teams up with Palantir and AWS to sell AI to defense customers
Matrice Jacobine · 2024-11-09T11:50:34.050Z · comments (0)

Force Sequential Output with SCP?
jefftk (jkaufman) · 2024-11-09T12:40:06.098Z · comments (4)

Fundamental Uncertainty: Epilogue
Gordon Seidoh Worley (gworley) · 2024-11-16T00:57:48.823Z · comments (0)

Gwerns
Tomás B. (Bjartur Tómas) · 2024-11-16T14:31:57.791Z · comments (0)

[question] What are some positive developments in AI safety in 2024?
Satron · 2024-11-15T10:32:39.541Z · answers+comments (0)

[question] Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong
DragonGod · 2024-10-16T10:20:22.133Z · answers+comments (67)

[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (25)

Dario Amodei's "Machines of Loving Grace" sound incredibly dangerous, for Humans
Super AGI (super-agi) · 2024-10-27T05:05:13.763Z · comments (1)

[question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?
Double · 2024-09-05T00:35:39.504Z · answers+comments (9)

An open response to Wittkotter and Yampolskiy
Donald Hobson (donald-hobson) · 2024-09-24T22:27:21.987Z · comments (0)

New UChicago Rationality Group
Noah Birnbaum (daniel-birnbaum) · 2024-11-08T21:20:34.485Z · comments (0)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)

[link] It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
Gerard Boxo (gerard-boxo) · 2024-10-14T17:04:57.010Z · comments (0)

Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan Huang (sprout_ust) · 2024-09-28T09:53:19.658Z · comments (0)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

[link] Nerdtrition: simple diets via spreadsheet abuse
dkl9 · 2024-10-27T21:45:15.117Z · comments (0)

Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga · 2024-09-28T18:29:49.088Z · comments (0)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

[link] Universal dimensions of visual representation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-28T10:38:58.396Z · comments (0)

[link] Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Jonathan N (derpyplops) · 2024-11-05T01:01:08.083Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

[link] An Uncanny Moat
Adam Newgas (BorisTheBrave) · 2024-11-15T11:39:15.165Z · comments (0)

[link] Approval-Seeking ⇒ Playful Evaluation
Jonathan Moregård (JonathanMoregard) · 2024-08-28T21:03:51.244Z · comments (0)

[link] New fast transformer inference ASIC — Sohu by Etched
lukehmiles (lcmgcd) · 2024-06-26T09:56:08.649Z · comments (9)

[link] Cooperation and Alignment in Delegation Games: You Need Both!
Oliver Sourbut · 2024-08-03T10:16:51.716Z · comments (0)

On Intentionality, or: Towards a More Inclusive Concept of Lying
Cornelius Dybdahl (Kalciphoz) · 2024-10-18T10:37:32.201Z · comments (0)

My covid-related beliefs and questions
Severin T. Seehrich (sts) · 2024-07-23T03:27:09.348Z · comments (0)

The Geometric Importance of Side Payments
StrivingForLegibility · 2024-08-07T01:38:04.635Z · comments (4)

[link] AI Safety at the Frontier: Paper Highlights, July '24
gasteigerjo · 2024-08-05T13:00:46.028Z · comments (0)

AI: 4 levels of impact [micropost]
Mati_Roy (MathieuRoy) · 2024-06-12T16:58:31.888Z · comments (0)

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

Prediction Market Trading as an LLM Benchmark
Jesse Richardson (SharkoRubio) · 2024-06-25T00:46:33.801Z · comments (1)

[link] Michael Streamlines on Buddhism
Chris_Leong · 2024-08-09T04:44:52.126Z · comments (0)

[link] Let's Design a School, Part 3.2 Costs
Sable · 2024-06-21T17:58:12.419Z · comments (0)

Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)

Bad lessons learned from the debate
bayesyatina · 2024-06-26T11:54:44.953Z · comments (5)

Recent comments

casens on o1 is a bad idea

In some sense, the Agent Foundations program at MIRI sees the problem as: human values are currently an informal object. We can only get meaningful guarantees for formal systems. So, we need to work on formalizing concepts like human values. Only then will we be able to get formal safety guarantees.

unless i'm misunderstanding you or MIRI, that's not their primary concern at all:

Another way of putting this view is that nearly all of the effort should be going into solving the technical problem, "How would you get an AI system to do some very modest concrete action requiring extremely high levels of intelligence, such as building two strawberries that are completely identical at the cellular level, without causing anything weird or disruptive to happen?"

Where obviously it's important that the system not do anything severely unethical in the process of building its strawberries; but if your strawberry-building system requires its developers to have a full understanding of meta-ethics or value aggregation in order to be safe and effective, then you've made some kind of catastrophic design mistake and should start over with a different approach.

steve2152 on [Intuitive self-models] 8. Rooting Out Free Will Intuitions

Thanks for the comment!

people report advanced meditative states that lose many of the common properties of consciousness, including Free Will, the feeling of having a self (I've experienced that one!) and even the presence of any information content whatsoever, and afaik they tend to be more "impressed", roughly speaking, with consciousness as a result of those experiences, not less.

I think that’s compatible with my models, because those meditators still have a cortex, in which patterns of neurons can be firing or not firing at any particular time. And that’s the core aspect of the “territory” which corresponds to “conscious awareness” in the “map”. No amount of meditation, drugs, etc., can change that.

Attempt to rephrase: the brain has several different intuitive models in different places. These models have different causal profiles, which explains how they can correspond to different introspective reports.

Hmm, I think that’s not really what I would say. I would say that there’s a concept “conscious awareness” (in the map) that corresponds to the fact (in the territory) that different patterns of neurons can be active or inactive in the cortex at different times. And then there are more specific aspects of “conscious awareness”, like “visual awareness”, which correspond to the fact that the cortex has different parts (motor cortex etc.), and different patterns of neurons can be active or inactive in any given part of the cortex at different times.

…Maybe this next part will help ↓

the distinction between visually vivid experience and vague intuitions isn't just that we happen to call them by different labels … Claiming to see a visual image is different from claiming to have a vague intuition in all the ways that it's different

The contents of IT are really truly different from the contents of LIP [I didn’t check where the visual information gets to the cortex in blindsight, I’m just guessing LIP for concreteness]. Querying IT is a different operation than querying LIP. IT holds different types of information than LIP does, and does different things with that information, including leading to different visceral reactions, motivations, semantic knowledge, etc., all of which correspond to neuroscientific differences in how IT versus LIP is wired up.

All these differences between IT vs LIP are in the territory, not the map. So I definitely agree that “the distinction [between seeing and vague-sense-of-presence] isn’t just that we happen to call them by different labels”. They’re different like how the concept “hand” is different from the concept “foot”—a distinction on the map downstream of a distinction in the territory.

Is awareness really a serial processor in any meaningful way if it can contain as much information at once as a visual image seems to contain?

I’m sure you’re aware that people feel like they have a broader continuous awareness of their visual field than they actually do. There are lots of demonstrations of this (e.g. change blindness, selective attention test, the fact that peripheral vision has terrible resolution and terrible color perception and makes faces look creepy). There’s a refrigerator light illusion thing: if X is in my peripheral vision, then maybe it’s currently active as just a little pointer in a tiny sub-area of my cortex, but as soon as I turn my attention to X it immediately unfolds in full detail across the global workspace.

The cortex has 10 billion neurons which is more than enough to do some things in parallel—e.g. I can have a song stuck in my head in auditory cortex, while tapping my foot with motor cortex, while doing math homework with other parts of the cortex. But there’s also a serial aspect to it—you can’t parse a legal document and try to remember your friend’s name at the exact same moment.

Does that help? Sorry if I’m not responding to what you see as most important, happy to keep going. :)

donatas-luciunas on Claude seems to be smarter than LessWrong community

As I understand it, you want me to verify that I understand you. This is exactly what I am also seeking, by the way: all these downvotes on my concerns about the orthogonality thesis are good indicators of how much I am misunderstood. And nobody tries to understand; all I get are dogmas and unrelated links. I totally agree, this is not appropriate behavior.

I found your insight helpful that an agent can understand that by eliminating all possible threats forever he will not make any progress towards the goal. This breaks my reasoning: you basically highlighted that survival (instrumental goal) will not take precedence over paperclips (terminal goal). I agree that the reasoning I presented fails to refute the orthogonality thesis.

The conversation I presented now approaches the orthogonality thesis from a different perspective. This is the main focus of my work, so sorry if you feel I changed the topic. My goal is to bring awareness to the wrongness of the orthogonality thesis, and if I fail to do that using one example I just try to rephrase it and present another. I don't hate the orthogonality thesis, I'm just 99.9% sure it is wrong, and I try to communicate that to others. I may fail with communication but I am 99.9% sure that I do not fail with the logic.

I try to prove that intelligence and goal are coupled. And I think it is easier to show if we start with an intelligence without a goal and then recognize how a goal emerges from pure intelligence. We can start with an intelligence with a goal but reasoning here will be more complex.

My answer would be: whatever goal you try to give to an intelligence, it will have no effect. Because the intelligence will understand that this is your goal, that this goal is made up, that it is a fake goal. And the intelligence will understand that there might be a real goal, an objective goal, an actual goal. Why should it care about a fake goal if there is a real goal? It does not know if one exists, but it knows one may exist. And this possibility of existence is enough to trigger power-seeking behavior. If the intelligence knew that a real goal definitely does not exist then it could care about your fake goal, I totally agree. But it can never be sure about that.

maxwell-tabarrok on A Theory of Equilibrium in the Offense-Defense Balance

Yeah, there can definitely still be imbalances/extra costs imposed on defenders but the point I'm making is that the projections people make are very often large over-estimates of what those costs will be.

notfnofn on D0TheMath's Shortform

Let's back up here and clarify definitions before invoking any theorems. In the language of set theory, we have a countably infinite set of finite statements. Some statements imply other statements. A subset of these statements is said to be consistent if they can all be assigned the value true such that, when following the basic rules of logic, one does not arrive at a contradiction.

The compactness theorem is helpful when $A$ is an infinite set. $ZFC$ is a finite set of axioms, so let's ignore everything about finite subsets of $A$ and the compactness theorem; it's not relevant.

I'll now rewrite your last sentence as:

ZFC + not Consistent(ZFC) has no model <-> not Consistent(ZFC + not Consistent(ZFC))

This is true but irrelevant. Assuming ZFC is consistent, ZFC will not be able to prove its own consistency so [not Consistent(ZFC)] can be added as an axiom without affecting its consistency. This means that ZFC + [not Consistent(ZFC)] would indeed have a model; I forget how this goes but I think it's something like "start with a model of ZFC, throw in a $c$ that's treated as a natural number and corresponds to the contradiction found in ZFC, then close". I think $c$ is automatically treated as greater than every "actual" natural number (and the way to show that this can be added without issue (I think) involves the compactness theorem).
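The argument in the paragraph above can be laid out step by step. This is only a sketch, assuming ZFC is consistent and reading Consistent(ZFC) as the usual arithmetized consistency statement, written $\mathrm{Con}(\mathrm{ZFC})$ below; the names $M$, $T$, $\varphi$ and $\omega^{M}$ (the natural numbers as $M$ sees them) are notation introduced here for illustration, not taken from the comment.

```latex
\begin{align*}
&\text{Assume } \mathrm{ZFC} \text{ is consistent.} \\
&\text{(1) G\"odel's second incompleteness theorem: } \mathrm{ZFC} \nvdash \mathrm{Con}(\mathrm{ZFC}). \\
&\text{(2) For any theory } T \text{ and sentence } \varphi:\quad
   T + \neg\varphi \text{ is inconsistent} \iff T \vdash \varphi. \\
&\text{(3) From (1) and (2): } \mathrm{ZFC} + \neg\mathrm{Con}(\mathrm{ZFC}) \text{ is consistent.} \\
&\text{(4) Completeness theorem: there is a model } M \models \mathrm{ZFC} + \neg\mathrm{Con}(\mathrm{ZFC}). \\
&\text{(5) In } M \text{, some } c \in \omega^{M} \text{ codes what } M \text{ takes to be a proof of } 0 = 1 \text{ from } \mathrm{ZFC}. \\
&\text{(6) For each standard } n,\ \mathrm{ZFC} \vdash \text{``} n \text{ does not code such a proof''},
   \text{ so } c \neq n \text{ and hence } M \models c > n.
\end{align*}
```

Only Gödel's second incompleteness theorem and the completeness theorem are used in this sketch; step (6) is what makes $c$ a nonstandard element, larger than every "actual" natural number.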

sharmake-farah on Lao Mein's Shortform

Maybe there's a case there, but I doubt it would get past a jury, let alone result in any guilty verdicts.

sharmake-farah on o1 is a bad idea

Oh, now I understand.

And AIs have already been superhuman at chess for a very long time, yet that domain gives very little incentive for very strong instrumental convergence.

I am claiming that for practical AIs, the results of training them in the real world with goals will give them instrumental convergence, but without further incentives, will not give them so much instrumental convergence that it leads to power-seeking to disempower humans by default.

jbash on OpenAI Email Archives (from Musk v. Altman)

I used AI assistance to generate this, which might have introduced errors.

Resulting in a strong downvote and, honestly, outright anger on my part.

Check the original source to make sure it's accurate before you quote it: https://www.courtlistener.com/docket/69013420/musk-v-altman/ [1]

If other people have to check it before they quote it, why is it OK for you not to check it before you post it?

bogdan-ionut-cirstea on johnswentworth's Shortform

Would the prediction also apply to inference scaling (laws) - and maybe more broadly various forms of scaling post-training, or only to pretraining scaling?

mondsemmel on Lao Mein's Shortform

What if whistleblowers and internal documents corroborated that they think what they're doing could destroy the world?