LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

China-AI forecasts
[deleted] · 2024-02-25T16:49:33.652Z · comments (29)
[Interim research report] Evaluating the Goal-Directedness of Language Models
Rauno Arike (rauno-arike) · 2024-07-18T18:19:04.260Z · comments (4)
When Are Results from Computational Complexity Not Too Coarse?
Dalcy (Darcy) · 2024-07-03T19:06:44.953Z · comments (7)
[link] [Linkpost] George Mack's Razors
trevor (TrevorWiesinger) · 2023-11-27T17:53:45.065Z · comments (8)
[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)
Whiteboard Pen Magazines are Useful
Johannes C. Mayer (johannes-c-mayer) · 2024-07-12T17:15:33.200Z · comments (6)
D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset]
abstractapplic · 2024-01-22T19:20:05.001Z · comments (7)
The Fundamental Theorem for measurable factor spaces
Matthias G. Mayer (matthias-georg-mayer) · 2023-11-12T19:25:25.583Z · comments (2)
[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)
Monthly Roundup #22: September 2024
Zvi · 2024-09-17T12:20:08.297Z · comments (9)
[link] Jailbreak steering generalization
Sarah Ball · 2024-06-20T17:25:24.110Z · comments (2)
International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)
Text Posts from the Kids Group: 2021
jefftk (jkaufman) · 2023-11-09T17:50:25.782Z · comments (1)
[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)
[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)
How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)
Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)
Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)
Aspiration-based Q-Learning
Clément Dumas (butanium) · 2023-10-27T14:42:03.292Z · comments (5)
[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)
Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)
Requirements for a Basin of Attraction to Alignment
RogerDearnaley (roger-d-1) · 2024-02-14T07:10:20.389Z · comments (9)
Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)
[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)
Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)
AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)
Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)
Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs
Burny · 2023-11-23T03:16:09.358Z · comments (25)
[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)
[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)
[link] The Hippie Rabbit Hole - Nuggets of Gold in Rivers of Bullshit
Jonathan Moregård (JonathanMoregard) · 2024-01-05T18:27:01.769Z · comments (20)
UDT1.01: The Story So Far (1/10)
Diffractor · 2024-03-27T23:22:35.170Z · comments (6)
Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · 2023-12-22T18:48:55.983Z · comments (11)
Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · 2023-12-04T22:58:40.005Z · comments (0)
AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)
[link] Alignment Workshop talks
Richard_Ngo (ricraz) · 2023-09-28T18:26:30.250Z · comments (1)
Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)
Super-Exponential versus Exponential Growth in Compute Price-Performance
moridinamael · 2023-10-06T16:23:56.714Z · comments (25)
[link] Dall-E 3
p.b. · 2023-10-02T20:33:18.294Z · comments (9)
Ambiguity in Prediction Market Resolution is Still Harmful
aphyer · 2024-07-31T20:32:40.217Z · comments (17)
[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)
[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)
Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)
Your LLM Judge may be biased
Henry Papadatos (henry) · 2024-03-29T16:39:22.534Z · comments (9)
Principles For Product Liability (With Application To AI)
johnswentworth · 2023-12-10T21:27:41.403Z · comments (55)
Deconfusing In-Context Learning
Arjun Panickssery (arjun-panickssery) · 2024-02-25T09:48:17.690Z · comments (1)
Striking Implications for Learning Theory, Interpretability — and Safety?
RogerDearnaley (roger-d-1) · 2024-01-05T08:46:58.915Z · comments (4)
Enhancing intelligence by banging your head on the wall
Bezzi · 2023-12-12T21:00:48.584Z · comments (26)
[link] Dark Skies Book Review
PeterMcCluskey · 2023-12-29T18:28:59.352Z · comments (3)
[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (10)