LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

Mission Impossible: Dead Reckoning Part 1 AI Takeaways
Zvi · 2023-11-01T12:52:29.341Z · comments (13)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

[link] Immortality or death by AGI
ImmortalityOrDeathByAGI · 2023-09-21T23:59:59.545Z · comments (30)

Fund Transit With Development
jefftk (jkaufman) · 2023-09-22T11:10:05.645Z · comments (22)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation")
Ruby · 2024-07-19T00:31:38.332Z · comments (21)

Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

4. Existing Writing on Corrigibility
Max Harms (max-harms) · 2024-06-10T14:08:35.590Z · comments (13)

shortest goddamn bayes guide ever
lukehmiles (lcmgcd) · 2024-05-10T07:06:23.734Z · comments (8)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (13)

[link] A Good Explanation of Differential Gears
Johannes C. Mayer (johannes-c-mayer) · 2023-10-19T23:07:46.354Z · comments (4)

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley (roger-d-1) · 2024-01-09T20:42:28.349Z · comments (8)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

[link] on neodymium magnets
bhauth · 2024-01-30T15:58:24.088Z · comments (6)

Environmental allergies are curable? (Sublingual immunotherapy)
Chipmonk · 2023-12-26T19:05:08.880Z · comments (10)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (59)

Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (2)

[link] Will releasing the weights of large language models grant widespread access to pandemic agents?
jefftk (jkaufman) · 2023-10-30T18:22:59.677Z · comments (25)

The predictive power of dissipative adaptation
dr_s · 2023-12-17T14:01:31.568Z · comments (14)

Saving the world sucks
Defective Altruism (Elijah Bodden) · 2024-01-10T05:55:46.504Z · comments (29)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

Some costs of superposition
Linda Linsefors · 2024-03-03T16:08:20.674Z · comments (11)

Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
Towards_Keeperhood (Simon Skade) · 2024-05-06T17:09:10.729Z · comments (16)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

Big Picture AI Safety: Introduction
EuanMcLean (euanmclean) · 2024-05-23T11:15:44.037Z · comments (7)

[link] For Civilization and Against Niceness
Gabriel Alfour (gabriel-alfour-1) · 2023-11-20T10:56:20.352Z · comments (14)

AI Safety 101 : Capabilities - Human Level AI, What? How? and When?
markov (markovial) · 2024-03-07T17:29:53.260Z · comments (8)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

[link] If Clarity Seems Like Death to Them
Zack_M_Davis · 2023-12-30T17:40:42.622Z · comments (191)

On OpenAI’s Model Spec
Zvi · 2024-06-21T13:00:03.014Z · comments (3)

AI #68: Remarkably Reasonable Reactions
Zvi · 2024-06-13T16:30:02.969Z · comments (11)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

Thoughts on "The Offense-Defense Balance Rarely Changes"
Cullen (Cullen_OKeefe) · 2024-02-12T03:26:50.662Z · comments (4)

On the Proposed California SB 1047
Zvi · 2024-02-12T16:40:04.854Z · comments (18)

Vaniver's thoughts on Anthropic's RSP
Vaniver · 2023-10-28T21:06:07.323Z · comments (4)

AI #41: Bring in the Other Gemini
Zvi · 2023-12-07T15:10:05.552Z · comments (16)

[link] Metascience of the Vesuvius Challenge
Maxwell Tabarrok (maxwell-tabarrok) · 2024-03-30T12:02:38.978Z · comments (2)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)

Trustworthy and untrustworthy models
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Archive

Recent comments

rohans on o1 is a bad idea

My best guess is that there was process supervision for capabilities but not for safety. i.e. training to make the CoT useful for solving problems, but not for "policy compliance or user preferences." This way they make it useful, and they don't incentivize it to hide dangerous thoughts. I'm not confident about this though.

satron on Sabotage Evaluations for Frontier Models

"One thing I appreciate about Buck/Ryan's comms around AI control is that they explicitly acknowledge that they believe control will fail for sufficiently intelligent systems."

Does that mean that they believe that after a certain point we would lose control over AI? I am new to this field, but doesn't this fact spell doom for humanity?

niplav on shortplav

There are two pre-existing Manifold Markets questions on whether LLM scaling laws will hold until 2027 and 2028, respectively, with currently little trading volume.

akash-wasil on Lao Mein's Shortform

and recently founded another AI company

Potentially a hot take, but I feel like xAI's contributions to race dynamics (at least thus far) have been relatively trivial. I am usually skeptical of the whole "I need to start an AI company to have a seat at the table", but I do imagine that Elon owning an AI company strengthens his voice. And I think his AI-related comms have mostly been used to (a) raise awareness about AI risk, (b) raise concerns about OpenAI/Altman, and (c) endorse SB1047 [which he did even faster and less ambiguously than Anthropic].

The counterargument here is that maybe if xAI was in 1st place, Elon's positions would shift. I find this plausible, but I also find it plausible that Musk (a) actually cares a lot about AI safety, (b) doesn't trust the other players in the race, and (c) is more likely to use his influence to help policymakers understand AI risk than any of the other lab CEOs.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

[this is a draft. I strongly welcome comments]

The Latent Military Realities of the Coming Taiwan Crisis

A blockade of Taiwan seems significantly more likely than a full-scale invasion. The US's non-intervention in Ukraine suggests similar restraint might occur with Taiwan.

Nevertheless, Metaculus predicts a 65% chance of US military response to a Chinese invasion and separately gives 20-50% for some kind of Chinese military intervention by 2035. Let us imagine that the worst comes to pass and China and the United States are engaged in a hot war.

China's national memory of the 'century of humiliation' deeply shapes its modern strategic thinking. How many Westerners could faithfully recount the events of the Opium Wars? How many have even heard of the Boxer Rebellion, the Eight-Nation Alliance, or the Taiping Rebellion? Yet these events are core curriculum in Chinese education.

Chinese revanchism toward the West enjoys broad public support. CCP repression of public opinion likely means the popularity of this view is understated. According to polling, CCP officials actually hold more dovish views than the general public.

As other pieces of evidence: historically, the Boxer Rebellion was a grassroots phenomenon. Movies depicting conflict between China and America consistently draw large audiences and positive reception. China has an absolutely minuscule number of foreigners per capita; this number fell after the pandemic and has never rebounded.

China is the only nuclear power that has explicitly disavowed a nuclear first strike. It currently has a remarkably small nuclear stockpile (~200 warheads). With the increased sensor capabilities in recent years China has become vulnerable to a US nuclear first-strike destroying her launchers before she can react. This is likely part of the reason for a major build-up of her nuclear stockpile in recent years.

It is plausible that there will be a hot war without the use of nuclear weapons. The closest historical case is of course the Korean War, the last indirect conflict between the US and China, which ended in stalemate despite massive US economic superiority. Today, that economic gap has largely closed - China's economy is 1.25x larger in PPP terms, while the US is only 40% bigger in nominal GDP.

What would a conventional US-China war look like? What can be learned from past conflicts?

The 1982 Falklands War between the UK and Argentina is the last air-naval war between near-peer powers. The gap of more than 40 years since then is comparable to the time between the US Civil War and WWI. Naval and air warfare technology advances much faster than land warfare - historically, this was tested through frequent conflicts. Today's unprecedented peace means we're largely guessing which naval technologies and doctrines will actually work. While land warfare in Ukraine looks like 'WWI with drones', naval warfare has likely seen much more dramatic changes.

Naval technology advances create bigger power gaps than land warfare. The Opium Wars showed this dramatically - British steamships simply sailed up Chinese rivers unopposed, forcing humiliating treaties on a land power.

Air warfare technology gaps may be even more extreme than naval ones. Modern F-35s achieve 20:0 kill ratios against previous-generation fighters in exercises.

The Arab-Israeli wars and the Gulf War suggest some lessons about modern air warfare. These conflicts showed that air superiority is typically won or lost very quickly: initial strikes on airbases can be decisive, and most aircraft losses happen on the ground rather than in dogfights. This remains such a concern that it’s US Air Force doctrine to rotate aircraft between airfields. More broadly, these conflicts suggest that air warfare produces more decisive, one-sided outcomes than land battles - when one side gains air superiority, the results can be devastating.

Wild Cards

Drones and the Transparent Battlefield

Drones represent warfare's future, yet both sides underinvest. While the US military has only 10,000 small drones and 400 large ones, Ukraine alone produces 1-4 million drones annually. China leads in mass-producing small drones but lacks integration doctrine. The Ukraine war revealed how modern sensors create a 'transparent battlefield' where hiding large forces is impossible. Drones might make it trivially easy to find (and even destroy) submarines and surface ships.

Submarines

Since WWI, submarines have arguably been the kings of the sea, and it is plausible that they remain dominant. A single torpedo from a submarine will sink an aircraft carrier - in exercises, small diesel-electric submarines regularly 'sink' entire carrier groups. These submarines can hide in sonar dead zones, regions where water temperature and salinity create acoustic blind spots.

Are Aircraft Carriers obsolete?

China now sports hypersonic missiles that at least in theory could disable an aircraft carrier from 1500 miles or beyond. On the flip side, missile defense effectiveness has increased dramatically, and hypersonic missile effectiveness may be overstated. As a point of evidence for the remaining importance of aircraft carriers, China is building her own fleet of them.

Military Competence Wildcard:

Peace means we don't know the true combat effectiveness of either military. Authoritarian militaries often suffer from corruption and incompetence - Chinese troops have been caught loading missile launchers with water instead of fuel during exercises [Comment 5: Need source]. But the US military also shows worrying signs: bureaucratic bloat, lack of recent peer conflict experience, and questions about training quality. Both militaries' actual combat effectiveness remains a major unknown. The US Navy now has more admirals than warships.

Stealth bombers and JASSM-ER

We don’t know what the dominant weapon in a real conventional 21st-century naval war between peers would be, but a plausible guess for a game-changing technology is stealth bombers and stealth missiles.

The obscene cost made the B2 stealth bombers even less popular than the ever-more-costly jet fighters, and the project was prematurely halted at 21 platforms. Despite the obscene cost, it’s plausible that the B2 and its younger cousin the B21 are worth all the money and then some.

Unlike fighters, a stealth bomber has something like ‘true stealth’. While a stealth fighter like an F35 is better thought of as a ‘low-observable’ aircraft that is difficult to target-lock by short-wave radar but easily detectable by long-wave radar, the B2 stealth bomber is opaque to long-wave radar too. Stealth bombers can also carry air-to-air missiles, so they may even be effective against fighters. Manoeuvrability and speed, long the defining hallmarks of fighters, have become less important with the advent of highly accurate homing missiles.

Lockheed Martin has developed the JASSM-ER, a stealth missile with a range up to 900 miles. A B2 bomber has a range of up to something like 4000 miles. For comparison, the range of fighters is something in the range of 400-1200 miles.

A single hit of a JASSM-ER is probably a mission kill on a naval vessel. A B2 can carry up to 16 of these missiles. This means that a single squadron of stealth bombers taking off from a base in Guam could potentially wipe out half a fleet in a single sortie.

***********

And of course last but not least, the greatest wildcard of them all:

AGI.

I will refrain from speculating on the military implications of AGI.

Clear China Disadvantages, US Advantages:

Amphibious assaults are inherently difficult: A full Taiwan invasion faces massive logistical hurdles. Taiwan could perhaps muster 500,000 defenders under full mobilization, requiring 1.5 million Chinese troops for a successful assault under standard military doctrine. For perspective, D-Day - history's largest amphibious invasion - landed only 133,000 troops.

China's energy vulnerability is significant - China imports 70% of its oil and 25% of its gas by sea. While Russia provides 20-40% of these imports and could increase supply, the US could severely disrupt China's energy access.

China's regional diplomacy has backfired - China has alienated virtually all its neighbours. The US has basing options in Japan, Australia, the Philippines, and across Pacific islands.

US carrier advantage: The US operates 11 nuclear supercarriers with extensive blue-water experience. China has two smaller carriers active, one in trials, and one nuclear carrier under construction. The big question mark is whether carriers are obsolete.

US stealth bomber advantage: The US leads with 21 B2s and 100 new B21s ordered, while China's H-20 program still lags behind.

US submarine advantage: US submarines are significantly technologically ahead. Putin selling Russian submarine technology might nullify some of that advantage, as might new cheap sea drones. Geographically, it’s hard for Chinese submarines to escape the China sea unnoticed.

Clear China Advantages, US Disadvantages:

Geography favors China: Taiwan lies just 100 miles from mainland China while US forces must cross the Pacific. The massive Chinese Rocket Force can launch thousands of missiles from secure mainland positions.

Advanced missile capabilities: Massive conventional rocket force plus claimed hypersonic missile capabilities [Comment: find skeptic hypersonic missile video]

China has been preparing for many years: China has established numerous artificial islands with airfields throughout the region. They've successfully stolen F35 plans and are producing their own version at scale. The Chinese government has built up enormous national emergency stockpiles of essential resources in preparation for the (inevitable) conflict. Bringing Taiwan back into the fold has been a primary driver of policy for decades.

US shipbuilding: The US shipbuilding industry has collapsed to just 0.1% of global production, while China, South Korea, and Japan dominate with 35-40%, 25-30%, and 20-25% respectively.

abandon on Habryka's Shortform Feed

TechEmails' substack post with the same emails in a more centralized format includes citations; apparently these are mostly from Elon Musk, et al. v. Samuel Altman, et al. (2024)

abandon on dirk's Shortform

I more-or-less endorse the model described in "larger language models may disappoint you [or, an eternally unfinished draft]", and moreover I think language is an inherently lossy instrument such that the minimally-lossy model won't have perfectly learned the causal processes or whatever behind its production.

akash-wasil on Making a conservative case for alignment

I agree with many points here and have been excited about AE Studio's outreach. Quick thoughts on China/international AI governance:

I think some international AI governance proposals have some sort of "kum ba yah, we'll all just get along" flavor/tone to them, or some sort of "we should do this because it's best for the world as a whole" vibe. This isn't even Dem-coded so much as it is naive-coded, especially in DC circles.
US foreign policy is dominated primarily by concerns about US interests. Other considerations can matter, but they are not the dominant driving force. My impression is that this is true within both parties (with a few exceptions).
I think folks interested in international AI governance should study international security agreements and try to get a better understanding of relevant historical case studies. Lots of stuff to absorb from the Cold War, the Iran Nuclear Deal, US-China relations over the last several decades, etc. (I've been doing this & have found it quite helpful.)
Strong Republican leaders can still engage in bilateral/multilateral agreements that serve US interests. Recall that Reagan negotiated arms control agreements with the Soviet Union, and the (first) Trump Administration facilitated the Abraham Accords. Being "tough on China" doesn't mean "there are literally no circumstances in which I would be willing to sign a deal with China." (But there likely does have to be a clear case that the deal serves US interests, has appropriate verification methods, etc.)

matrice-jacobine on Proposing the Conditional AI Safety Treaty (linkpost TIME)

Fortunately, the existential risks posed by AI are recognized by many close to President-elect Donald Trump. His daughter Ivanka seems to see the urgency of the problem. Elon Musk, a critical Trump backer, has been outspoken about the civilizational risks for many years, and recently supported California’s legislative push to safety-test AI. Even the right-wing Tucker Carlson provided common-sense commentary when he said: “So I don’t know why we’re sitting back and allowing this to happen, if we really believe it will extinguish the human race or enslave the human race. Like, how can that be good?” For his part, Trump has expressed concern about the risks posed by AI, too.

This is a strange contrast from the rest of the article, considering both Donald and Ivanka Trump's positions are largely informed by the "situational awareness" position arguing that the US should develop AGI before China to ensure US victory over China – which is explicitly the position Tegmark and Leahy argue against (and consider existentially harmful) when they call to stop work on AGI and work on international co-operation to restrict it and develop tool AI instead.

I still see this kind of confusion between the two positions a fair bit and it is extremely strange. It's as if, back in the original Cold War, people couldn't tell the difference between anti-communist hawks and the Bulletin of the Atomic Scientists (let alone anti-war hippies) because technically they both considered the nuclear arms race to be very important for the future of humanity.

abramdemski on o1 is a bad idea

Process supervision would be helpful, particularly if the model is punished for sensitive dependence on paraphrasings of the CoT, or rewarded for producing CoT which is useful even after paraphrasing.

One of the concerns that I have is that (I believe) the LLM before the CoT-training will be sensitive to some non-human-readable patterns, similar to adversarial examples for image recognition. The CoT training will encourage any patterns which help reach the desired conclusions, whether those patterns are human-readable or not.

A lot of possible process-supervision procedures will not do anything to help prevent this. For example, rewarding CoT for being human-understandable won't help, because the reward signal there can miss the subtle patterns being used. It only avoids the outcome where the CoT looks non-human-readable; it doesn't avoid the outcome where the CoT actually contains a bunch of human-unreadable patterns.

Using paraphrasing to avoid this is heavily dependent on the strength of the paraphraser. The paraphraser needs to have really good coverage of alternative ways to convey the same semantic information, so that any non-human-readable patterns get randomized. The paraphraser itself can't be vulnerable to steering by adversarial inputs.