LessWrong 2.0 Reader

Pick two: concise, comprehensive, or clear rules
Screwtape · 2025-02-03T06:39:05.815Z · comments (21)
[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (18)
o3-mini Early Days
Zvi · 2025-02-03T14:20:06.443Z · comments (0)
[link] Meta: Frontier AI Framework
Zach Stein-Perlman · 2025-02-03T22:00:17.103Z · comments (0)
[question] Can we infer the search space of a local optimiser?
Lucius Bushnaq (Lblack) · 2025-02-03T10:17:01.661Z · answers+comments (1)
$300 Fermi Model Competition
ozziegooen · 2025-02-03T19:47:09.270Z · comments (4)
[link] Keeping Capital is the Challenge
LTM · 2025-02-03T02:04:27.142Z · comments (1)
Tear Down the Burren
jefftk (jkaufman) · 2025-02-04T03:40:02.767Z · comments (0)
[link] Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog)
Archimedes · 2025-02-04T02:55:44.401Z · comments (0)
Alignment Can Reduce Performance on Simple Ethical Questions
drhens · 2025-02-03T19:35:42.895Z · comments (2)
New Foresight Longevity Bio & Molecular Nano Grants Program
Allison Duettmann (allison-duettmann) · 2025-02-04T00:28:30.147Z · comments (0)
Sleeper agents appear resilient to activation steering
Lucy Wingard (lucy-wingard) · 2025-02-03T19:31:30.702Z · comments (0)
Use computers as powerful as in 1985 or AI controls humans or ?
jrincayc (nerd_gatherer) · 2025-02-03T00:51:05.706Z · comments (0)
[link] Eliezer Yudkowsky on The Trajectory Podcast
Filipe · 2025-02-03T23:44:24.590Z · comments (0)
The Overlap Paradigm: Rethinking Data's Role in Weak-to-Strong Generalization (W2SG)
Serhii Zamrii (aligning_bias) · 2025-02-03T19:31:55.282Z · comments (0)
The Outer Levels
Jerdle (daniel-amdurer) · 2025-02-03T14:30:29.230Z · comments (0)
eliminating bias through language?
KvmanThinking (avery-liu) · 2025-02-04T01:52:01.508Z · comments (0)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space
Roman Malov · 2025-02-03T10:30:48.866Z · comments (0)
Visualizing Interpretability
Darold Davis (darold) · 2025-02-03T19:36:38.938Z · comments (0)
Can someone, anyone, make superintelligence a more concrete concept?
Ori Nagel (ori-nagel) · 2025-02-04T02:18:51.718Z · comments (0)
[link] What are the "no free lunch" theorems?
Vishakha (vishakha-agrawal) · 2025-02-04T02:02:18.423Z · comments (1)
Part 1: Enhancing Inner Alignment in CLIP Vision Transformers: Mitigating Reification Bias with SAEs and Grad ECLIP
Gilber A. Corrales (mysticdeepai) · 2025-02-03T19:30:52.505Z · comments (0)
How AGI Defines Its Self
Davey Morse (davey-morse) · 2025-02-03T18:47:22.287Z · comments (0)
[link] Language Models and World Models, a Philosophy
kyjohnso · 2025-02-03T02:55:36.577Z · comments (0)
Stopping unaligned LLMs is easy!
Yair Halberstadt (yair-halberstadt) · 2025-02-03T15:38:27.083Z · comments (9)
Gettier Cases [repost]
Antigone (luke-st-clair) · 2025-02-03T18:12:22.253Z · comments (1)
A "base process" conceptually "below" any "base" universes
Amy Johnson (Amy Minge) · 2025-02-03T19:11:22.706Z · comments (1)
The Self-Reference Trap in Mathematics
Alister Munday (alister-munday) · 2025-02-03T16:12:21.392Z · comments (21)