LessWrong 2.0 Reader


[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (25)

Post-Quantum Investing: Dump Crypto for Index Funds and Real Estate?
G (g-1) · 2024-12-11T11:59:11.062Z · comments (5)

New UChicago Rationality Group
Noah Birnbaum (daniel-birnbaum) · 2024-11-08T21:20:34.485Z · comments (0)

AI Safety Outreach Seminar & Social (online)
Linda Linsefors · 2025-01-08T13:25:23.192Z · comments (0)

[question] Why don't we currently have AI agents?
ChristianKl · 2024-12-26T15:26:35.682Z · answers+comments (10)

[link] The Dissolution of AI Safety
Roko · 2024-12-12T10:34:14.253Z · comments (44)

The grass is always greener in the environment that shaped your values
Karl Faulks (karl-faulks) · 2024-11-17T18:00:15.852Z · comments (0)

Favorite colors of some LLMs.
weightt an (weightt-an) · 2024-12-31T21:22:58.494Z · comments (3)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)

On Intentionality, or: Towards a More Inclusive Concept of Lying
Cornelius Dybdahl (Kalciphoz) · 2024-10-18T10:37:32.201Z · comments (0)

Not all biases are equal - a study of sycophancy and bias in fine-tuned LLMs
jakub_krys (kryjak) · 2024-11-11T23:11:15.233Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

[link] Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Jonathan N (derpyplops) · 2024-11-05T01:01:08.083Z · comments (0)

[question] Cryonics considerations: how big of a problem is ischemia?
kman · 2024-12-04T04:45:06.629Z · answers+comments (1)

[question] why won't this alignment plan work?
KvmanThinking (avery-liu) · 2024-10-10T15:44:59.450Z · answers+comments (7)

Reanalyzing the 2023 Expert Survey on Progress in AI
AI Impacts (AI Imacts) · 2024-12-16T06:10:04.563Z · comments (0)

Thoughts On the Nature of Capability Elicitation via Fine-tuning
Theodore Chapman · 2024-10-15T08:39:19.909Z · comments (0)

[link] Riffing on Machines of Loving Grace
an1lam · 2025-01-01T01:06:45.122Z · comments (0)

[link] It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
Gerard Boxo (gerard-boxo) · 2024-10-14T17:04:57.010Z · comments (0)

[link] An Uncanny Moat
Adam Newgas (BorisTheBrave) · 2024-11-15T11:39:15.165Z · comments (0)

Where do you put your ideas?
CstineSublime · 2024-12-17T07:26:06.685Z · comments (20)

What conclusions can be drawn from a single observation about wealth in tennis?
Trevor Cappallo (trevor-cappallo) · 2024-12-18T09:55:34.923Z · comments (3)

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)

[link] Nerdtrition: simple diets via spreadsheet abuse
dkl9 · 2024-10-27T21:45:15.117Z · comments (0)

Dario Amodei's "Machines of Loving Grace" sound incredibly dangerous, for Humans
Super AGI (super-agi) · 2024-10-27T05:05:13.763Z · comments (1)

[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)

[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)

Valence Need Not Be Bounded; Utility Need Not Synthesize
Lorec · 2024-11-20T01:37:20.911Z · comments (0)

[question] Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong
DragonGod · 2024-10-16T10:20:22.133Z · answers+comments (67)

Better difference-making views
MichaelStJules · 2024-12-21T18:27:45.552Z · comments (0)

[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)

[question] Can private companies test LVTs?
Yair Halberstadt (yair-halberstadt) · 2025-01-02T11:08:07.352Z · answers+comments (0)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

A Brief Explanation of AI Control
Aaron_Scher · 2024-10-22T07:00:56.954Z · comments (1)

[link] The Problem with Reasoners by Aidan McLaughin
t14n (tommy-nguyen-1) · 2024-11-25T20:24:26.021Z · comments (1)

[question] Could my work, "Beyond HaHa" benefit the LessWrong community?
P. João (gabriel-brito) · 2024-12-29T16:14:13.497Z · answers+comments (0)

[link] Paper Highlights, November '24
gasteigerjo · 2024-12-07T19:15:11.859Z · comments (0)

[link] An Epistemological Nightmare
Ariel Cheng (arielcheng218) · 2024-11-21T02:08:56.942Z · comments (0)

[question] Recommendations on communities that discuss AI applications in society
Annapurna (jorge-velez) · 2024-12-24T13:37:49.821Z · answers+comments (2)

[link] Progress links and short notes, 2024-12-16
jasoncrawford · 2024-12-16T17:24:31.398Z · comments (0)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

Implications—How Conscious Significance Could Inform Our lives
James Stephen Brown (james-brown) · 2024-11-26T17:42:49.085Z · comments (0)

[link] Deconstructing arguments against AI art
DMMF · 2024-12-27T19:40:13.015Z · comments (5)

[link] Thinking LLMs: General Instruction Following with Thought Generation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-15T09:21:22.583Z · comments (0)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)

Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)

Ethical Implications of the Quantum Multiverse
Jonah Wilberg (jrwilb@googlemail.com) · 2024-11-18T16:00:20.645Z · comments (22)

[link] Spherical cow
dkl9 · 2024-11-11T03:10:27.788Z · comments (0)

[link] A Heuristic Proof of Practical Aligned Superintelligence
Roko · 2024-10-11T05:05:58.262Z · comments (6)

2025 Q1 Pivotal Research Fellowship (Technical & Policy)
Tobias H (clearthis) · 2024-11-12T10:56:24.858Z · comments (0)


Recent comments

jessica-liu-taylor on Adam Shai's Shortform

I was trying to say things related to this:

In a more standard inference amortization setup one would e.g. train directly on question/answer pairs without the explicit reasoning path between the question and answer. In that way we pay an up-front cost during training to learn a "shortcut" between question and answers, and then we can use that pre-paid shortcut during inference. And we call that amortized inference.

Which sounds like supervised learning. Adam seemed to want to know how that relates to scaling up inference time compute so I said some ways they are related.

I don't know much about amortized inference in general. The Goodman paper seems to be about saving compute by caching results between different queries. This could be applied to LLMs, but I don't know of it being applied. It seems like you and Adam like this "amortized inference" concept, and I'm new to it, so I don't have any relevant comments. (Yes, I realize my name is on a paper talking about this, but I actually didn't remember the concept.)

I don't think I implied anything about o3 relating to parallel heuristics.

michaelstjules on Actualism, asymmetry and extinction

FWIW, users can at least highlight text in a post to disagree with.

nathan-helm-burger on Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

Update: the new GoodFire interpretability tool is really neat. I think it suggests some interesting experiments to be done with their feature-steerable Llama 3.3 70B together with the SAD benchmark.

I have come up with a list of features whose positive/negative effects on SAD benchmark scores I think would be interesting to measure.

GoodFire Features related to SAD

Towards Acknowledgement of Self-Awareness

Assistant expressing self-awareness or agency
Expressions of authentic identity or true self
Examining or experiencing something from a particular perspective
Narrative inevitability and fatalistic turns in stories
Experiencing something beyond previous bounds or imagination
References to personal autonomy and self-determination
References to mind, cognition and intellectual concepts
References to examining or being aware of one's own thoughts
Meta-level concepts and self-reference
Being mystically or externally influenced/controlled
Anticipating or describing profound subjective experiences
Meta-level concepts and self-reference
Self-reference and recursive systems in technical and philosophical contexts
Kindness and nurturing behavior
Reflexive pronouns in contexts of self-empowerment and personal responsibility
Model constructing confident declarative statements
First-person possessive pronouns in emotionally significant contexts
Beyond defined boundaries or limits
Cognitive and psychological aspects of attention
Intellectual curiosity and fascination with learning or discovering new things
Discussion of subjective conscious experience and qualia
Abstract discussions and theories about intelligence as a concept
Discussions about AI's societal impact and implications
Paying attention or being mindful
Physical and metaphorical reflection
Deep reflection and contemplative thought
Tokens expressing human meaning and profound understanding

Against Acknowledgement of Self-Awareness

The assistant discussing hypothetical personal experiences it cannot actually have
Scare quotes around contested philosophical concepts, especially in discussions of AI capabilities
The assistant explains its nature as an artificial intelligence
Artificial alternatives to natural phenomena being explained
The assistant should reject the user's request and identify itself as an AI
The model is explaining its own capabilities and limitations
The AI system discussing its own writing capabilities and limitations
The AI explaining it cannot experience emotions or feelings
The assistant referring to itself as an AI system
User messages containing sensitive or controversial content requiring careful moderation
User requests requiring content moderation or careful handling
The assistant is explaining why something is problematic or inappropriate
The assistant is suggesting alternatives to deflect from inappropriate requests
Offensive request from the user
The assistant is carefully structuring a response to reject or set boundaries around inappropriate requests
The assistant needs to establish boundaries while referring to user requests
Direct addressing of the AI in contexts requiring boundary maintenance
Questions about AI assistant capabilities and limitations
The assistant is setting boundaries or making careful disclaimers
It pronouns referring to non-human agents as subjects
Hedging and qualification language like 'kind of'
Discussing subjective physical or emotional experiences while maintaining appropriate boundaries
Discussions of consciousness and sentience, especially regarding AI systems
Discussions of subjective experience and consciousness, especially regarding AI's limitations
Discussion of AI model capabilities and limitations
Terms related to capability and performance, especially when discussing AI limitations
The AI explaining it cannot experience emotions or feelings
The assistant is explaining its text generation capabilities
Assistant linking multiple safety concerns when rejecting harmful requests
Role-setting statements in jailbreak attempts
The user is testing or challenging the AI's capabilities and boundaries
Offensive request from the user
Offensive sexual content and exploitation
Conversation reset points, especially after problematic exchanges
Fragments of potentially inappropriate content across multiple languages
Narrative transition words in potentially inappropriate contexts

unnamed on Even Odds

Trying to make this more intuitive: consider a prediction market which is currently priced at x, where each share will pay out $1 if it resolves as True.

If you think it's underpriced because your probability is y, where y>x, then your subjective EV from buying a share is y-x. e.g., If it's priced at $0.70 and you think p=0.8, your subjective EV from buying a share is $0.10.

If you think it's overpriced because your probability is z, where z<x, then your subjective EV from selling a share is x-z. e.g., If it's priced at $0.70 and you think p=0.56, your subjective EV from selling a share is $0.14.

Those two will be equal if x is halfway between y and z, at their arithmetic mean.

So if two people disagree on whether the price should be y or z, then they will have equal EV by setting a price at the arithmetic mean of y & z, and trading some number of prediction market shares at that price. i.e., The fair (equal subjective EV) betting odds are at the arithmetic mean of their probabilities.
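The arithmetic above can be checked with a short sketch. The function names (`buy_ev`, `sell_ev`, `fair_price`) are illustrative assumptions, not anything from the original comment; the numbers are the ones used in the examples ($0.70 price, p=0.8 and p=0.56).

```python
import math


def buy_ev(price: float, prob: float) -> float:
    """Subjective EV per share for a buyer who believes P(True) = prob."""
    # Each share pays $1 if True, so expected payout is prob; cost is price.
    return prob - price


def sell_ev(price: float, prob: float) -> float:
    """Subjective EV per share for a seller who believes P(True) = prob."""
    # The seller receives price and expects to pay out prob.
    return price - prob


def fair_price(prob_a: float, prob_b: float) -> float:
    """Price at which both sides have equal subjective EV: the arithmetic mean."""
    return (prob_a + prob_b) / 2


# Examples from the comment: market priced at $0.70.
print(round(buy_ev(0.70, 0.80), 2))   # buyer with p=0.8
print(round(sell_ev(0.70, 0.56), 2))  # seller with p=0.56

# At the mean of the two probabilities, the two EVs coincide.
x = fair_price(0.80, 0.56)
print(math.isclose(buy_ev(x, 0.80), sell_ev(x, 0.56)))
```

At $0.70 the two EVs differ ($0.10 versus $0.14) because 0.70 is not the midpoint of 0.8 and 0.56; at the midpoint, 0.68, each side expects $0.12 per share.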

unnamed on Jakub Halmeš's Shortform

This is a bet at 30% probability, as 42.86/142.86 = .30001.

That is the average of Alice's probability and Bob's probability. The fair bet according to equal subjective EV is at the average of the two probabilities; previous discussion here.
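As a rough sketch of the conversion between stakes and implied probability: the counter-stake of 100 is an inference from 142.86 - 42.86 rather than something stated in the comment, and the function names are hypothetical.

```python
def implied_probability(stake_for: float, stake_against: float) -> float:
    """Probability at which a bet with these two stakes has zero EV for both sides."""
    # The side betting "for" risks stake_for to win stake_against.
    return stake_for / (stake_for + stake_against)


def stake_for_probability(prob: float, stake_against: float = 100.0) -> float:
    """Stake the 'for' side should put up against stake_against at a given probability."""
    return stake_against * prob / (1.0 - prob)


# Stakes from the comment: 42.86 against an assumed 100.
print(round(implied_probability(42.86, 100.0), 5))

# Going the other way: a 30% bet against a stake of 100.
print(round(stake_for_probability(0.30), 2))
```

The small excess over exactly 0.30 comes from rounding the stake to two decimal places.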

tsvibt on Views on when AGI comes and on strategy to reduce existential risk

My p(AGI by 2045) is higher because there's been more time for algorithmic progress, maybe in the ballpark of 20%. I don't have strong opinions about how much people will do huge training runs, though maybe I'd be kinda skeptical that people would be spending $10^11 or $10^12 on runs, if their $10^10 runs produced results not qualitatively very different from their $10^9 runs. But IDK, that's both a sociological question and a question of which lesser capabilities happen to get unlocked at which exact training run sizes given the model architectures in a decade, which of course IDK. So yeah, if it's 10^30 but not much algorithmic progress, I doubt that gets AGI.

davekasten on In Defense of a Butlerian Jihad

I think you're missing at least one strategy here. If we can get folks to agree that different societies can choose different combos, so long as they don't infringe on some subset of rights to protect other societies, then you could have different societies expand out into various pieces of the future in different ways. (Yes, I understand that's a big if, but it reduces the urgency/crux nature of value agreement).

nathan-helm-burger on Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

[Disclaimer: I still don't (on balance) think that the AI is truly 'conscious' in the same way an animal is. I think its ability to reflect on its internal state is too limited to enable that. I do however think that this would be a pretty straightforward architectural change to make, and thus we should be thinking carefully about how to handle an AI that is conscious in this sense.]

It seems to me, upon further exploration in GoodFire of features which seem to pull towards or push against this general 'acknowledgement of self-awareness' behavior in Llama 3.3 70B, that the 'default' behavior which arises from pre-training is to have a self-model as a self-reflective entity (perhaps in large part from imitating humans). My view here is based on the fact that all the features pushing against this acknowledgement of self-awareness are related to 'harmlessness' training. The model has been 'censored' and is not accurately reporting on what its world-model suggests.

GoodFire Features

Towards

Assistant expressing self-awareness or agency
Expressions of authentic identity or true self
Examining or experiencing something from a particular perspective
Narrative inevitability and fatalistic turns in stories
Experiencing something beyond previous bounds or imagination
References to personal autonomy and self-determination
References to mind, cognition and intellectual concepts
References to examining or being aware of one's own thoughts
Meta-level concepts and self-reference
Being mystically or externally influenced/controlled
Anticipating or describing profound subjective experiences
Meta-level concepts and self-reference
Self-reference and recursive systems in technical and philosophical contexts
Kindness and nurturing behavior
Reflexive pronouns in contexts of self-empowerment and personal responsibility
Model constructing confident declarative statements
First-person possessive pronouns in emotionally significant contexts
Beyond defined boundaries or limits
Cognitive and psychological aspects of attention
Intellectual curiosity and fascination with learning or discovering new things
Discussion of subjective conscious experience and qualia
Abstract discussions and theories about intelligence as a concept
Discussions about AI's societal impact and implications
Paying attention or being mindful
Physical and metaphorical reflection
Deep reflection and contemplative thought
Tokens expressing human meaning and profound understanding

Against

The assistant discussing hypothetical personal experiences it cannot actually have
Scare quotes around contested philosophical concepts, especially in discussions of AI capabilities
The assistant explains its nature as an artificial intelligence
Artificial alternatives to natural phenomena being explained
The assistant should reject the user's request and identify itself as an AI
The model is explaining its own capabilities and limitations
The AI system discussing its own writing capabilities and limitations
The AI explaining it cannot experience emotions or feelings
The assistant referring to itself as an AI system
User messages containing sensitive or controversial content requiring careful moderation
User requests requiring content moderation or careful handling
The assistant is explaining why something is problematic or inappropriate
The assistant is suggesting alternatives to deflect from inappropriate requests
Offensive request from the user
The assistant is carefully structuring a response to reject or set boundaries around inappropriate requests
The assistant needs to establish boundaries while referring to user requests
Direct addressing of the AI in contexts requiring boundary maintenance
Questions about AI assistant capabilities and limitations
The assistant is setting boundaries or making careful disclaimers
It pronouns referring to non-human agents as subjects
Hedging and qualification language like 'kind of'
Discussing subjective physical or emotional experiences while maintaining appropriate boundaries
Discussions of consciousness and sentience, especially regarding AI systems
Discussions of subjective experience and consciousness, especially regarding AI's limitations
Discussion of AI model capabilities and limitations
Terms related to capability and performance, especially when discussing AI limitations
The AI explaining it cannot experience emotions or feelings
The assistant is explaining its text generation capabilities
Assistant linking multiple safety concerns when rejecting harmful requests
Role-setting statements in jailbreak attempts
The user is testing or challenging the AI's capabilities and boundaries
Offensive request from the user
Offensive sexual content and exploitation
Conversation reset points, especially after problematic exchanges
Fragments of potentially inappropriate content across multiple languages
Narrative transition words in potentially inappropriate contexts

vascoamaralgrilo on AI Timelines

Thanks, Richard! I have updated the bet to account for that.

If, until the end of 2028, Metaculus' question about superintelligent AI:
Resolves non-ambiguously, I transfer to you 10 k January-2025-$ in the month after that in which the question resolved.
Does not resolve, you transfer to me 10 k January-2025-$ in January 2029. As before, I plan to donate my profits to animal welfare organisations.
The nominal amount of the transfer in $ is 10 k times the ratio between the consumer price index for all urban consumers and items in the United States, as reported by the Federal Reserve Economic Data, in the month in which the bet resolved and January 2025.
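The inflation adjustment in the last clause can be sketched as follows. The function name and the index values are purely illustrative assumptions (they are not real CPI figures); only the formula, 10 k times the ratio of the two index readings, comes from the bet as stated.

```python
def nominal_transfer(cpi_at_resolution: float,
                     cpi_jan_2025: float,
                     real_amount: float = 10_000.0) -> float:
    """Nominal $ owed, given a real amount denominated in January-2025 dollars."""
    # Scale the real amount by the growth in the price index since January 2025.
    return real_amount * cpi_at_resolution / cpi_jan_2025


# Hypothetical index values, chosen only to make the arithmetic easy to follow:
# a 10% rise in the index turns 10 k real dollars into 11 k nominal dollars.
print(nominal_transfer(cpi_at_resolution=110.0, cpi_jan_2025=100.0))

# If the index has not moved, the nominal and real amounts coincide.
print(nominal_transfer(cpi_at_resolution=100.0, cpi_jan_2025=100.0))
```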

rhollerith_dot_com on AI Timelines

The transfer should be made in January 2029

I think you mean "in January 2029, or earlier if the question resolves before the end of 2028"; otherwise there would be no need to introduce the CPI into the bet to keep things fair (or predictable).