LessWrong 2.0 Reader

AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (3)

[link] Palworld development blog post
bhauth · 2024-01-28T05:56:19.984Z · comments (12)

[link] New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 2024-05-21T11:00:41.794Z · comments (17)

3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)

If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (120)

Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)

Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)

Testing which LLM architectures can do hidden serial reasoning
Filip Sondej · 2024-12-16T13:48:34.204Z · comments (9)

The Gemini Incident
Zvi · 2024-02-22T21:00:04.594Z · comments (19)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)

[link] Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI's Trajectory”
Said Achmiz (SaidAchmiz) · 2024-11-14T23:53:34.922Z · comments (0)

Survey of 2,778 AI authors: six parts in pictures
KatjaGrace · 2024-01-06T04:43:34.590Z · comments (1)

Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (51)

Some arguments against a land value tax
Matthew Barnett (matthew-barnett) · 2024-12-29T15:17:00.740Z · comments (37)

Rationality Quotes - Fall 2024
Screwtape · 2024-10-10T18:37:55.013Z · comments (26)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)

[link] AI takeoff and nuclear war
owencb · 2024-06-11T19:36:24.710Z · comments (6)

Should there be just one western AGI project?
rosehadshar · 2024-12-03T10:11:17.914Z · comments (72)

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets
GeneSmith · 2024-11-01T17:26:06.466Z · comments (16)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

[link] Best-of-N Jailbreaking
John Hughes (john-hughes) · 2024-12-14T04:58:48.974Z · comments (5)

Introducing Squiggle AI
ozziegooen · 2025-01-03T17:53:42.915Z · comments (13)

Effective Evil's AI Misalignment Plan
lsusr · 2024-12-15T07:39:34.046Z · comments (9)

Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)

[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (10)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (13)

Reward hacking behavior can generalize across tasks
Kei · 2024-05-28T16:33:50.674Z · comments (5)

The Obliqueness Thesis
jessicata (jessica.liu.taylor) · 2024-09-19T00:26:30.677Z · comments (17)

Attention SAEs Scale to GPT-2 Small
Connor Kissane (ckkissane) · 2024-02-03T06:50:22.583Z · comments (4)

The Parable Of The Fallen Pendulum - Part 2
johnswentworth · 2024-03-12T21:41:30.180Z · comments (8)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (10)

Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)

LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.
Andrew_Critch · 2024-11-22T03:26:11.681Z · comments (53)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

Bitter lessons about lucid dreaming
avturchin · 2024-10-16T21:27:04.725Z · comments (62)

In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (27)

[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)

Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (13)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (11)

Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (21)

The Packaging and the Payload
Screwtape · 2024-11-12T03:07:37.209Z · comments (1)

[question] What could a policy banning AGI look like?
TsviBT · 2024-03-13T14:19:07.783Z · answers+comments (23)

🇫🇷 Announcing CeSIA: The French Center for AI Safety
Charbel-Raphaël (charbel-raphael-segerie) · 2024-12-20T14:17:13.104Z · comments (0)

The 2023 LessWrong Review: The Basic Ask
Raemon · 2024-12-04T19:52:40.435Z · comments (25)

[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)

2025 Prediction Thread
habryka (habryka4) · 2024-12-30T01:50:14.216Z · comments (17)

On Claude 3.0
Zvi · 2024-03-06T18:50:04.766Z · comments (5)

Recent comments

ryan_b on Childhood and Education #8: Dealing with the Internet

I agree fake pictures are harder to threaten with. But consider that the deepfake method makes everyone a potential target, rather than only targeting the population who would fall for the relationship side of the scam.

There are other reasons I think it would be grimly effective, but I am not about to spell it out for team evil.

aditya_sk on Launching Applications for the Global AI Safety Fellowship 2025!

Thanks for the feedback! Quite helpful to get more context.
Quick responses:
1) Yes, we did intend for the hook to be eye-grabbing/mildly salesy, as this is part of our promotional material shared across different platforms, and we were hoping this would be effective at garnering the interest of talented individuals and encouraging them to work on AIS. We didn't think it was dishonest/false, though; we designed it to be short but effective.
2) It was a sincere mistake that the post was made from an org account.
3) We missed gauging issues with the use of 'leading AI Safety organisations'; however, I think you are right. We could have been more cautious in how we framed it.
4) We have taken a note of stating the scope of our efforts and intend to factor this into consideration when designing our next outreach framing.

Thanks for your inputs!

cstinesublime on Is my distinctiveness evidence for being in a simulation?

Why are you so sure it's a computer simulation? How do you know it's not a drug trip? A fever dream? An unfathomable organism plugging its senses into some kind of (to its particular phenomenology) pseudo-random pattern generator, from which it hallucinates or infers the experience of OP?

How could we falsify the simulation hypothesis?

cstinesublime on Is my distinctiveness evidence for being in a simulation?

I'm afraid I don't understand a lot of your assumptions. For example, why do you think your being an example of any given superlative is somehow a falsifying observation of reality, especially if other people/objects don't exist in uniform distributions? It's not like a video game where every other NPC has exactly 10 HP but, through use of a cheat code, you've got 1000. And even so, that data from within the 'simulation', as you call it, is not proof of something 'without'. I think the only evidence of that would be if you found yourself in a situation like Daffy Duck, the walls of reality closing in on you, meeting your maker directly.

I also wonder, how much Kant or Plato have you read and did you do any research, even on SEP before you asked? I feel like anyone who has questions about the 'simulation' would be best served reading the philosophers who have written eloquently on the matter of how we come to represent the world and really formed the concepts and language we use.

Or you could read (Tractatus) Wittgenstein and dismiss all metaphysics altogether as nonsense - literally: that which cannot be sensed and therefore mustn't be spoken about.

alkjash on Meal Replacements in 2025?

My thought process goes like: on most weekdays I sure wish I could skip breakfast and/or lunch and only have one sit-down meal with my family in the evening. Time savings and convenience are the main concerns I suppose.

The first solution that came to mind was to try Soylent/Mealsquares/Huel for a month and cross my fingers, 50/50 it just goes well and solves the problem. I posted to see if there were any obvious considerations I was missing, or clear standout options to try first.

Pre-made frozen meals and protein bars are also plausibly acceptable meal replacement options.

On a first pass frozen meals register as bulky and hard to store a month of at a time, and not something I'd bring to work. I've also never had an item I can imagine stomaching every day.

Protein bars seem mostly fine, but my vibe check is that meal replacements are basically enlightened protein bars? Like, maybe the nutrition profile is better and they are packaged in sizes more suitable for full meals?

snewman on What Indicators Should We Watch to Disambiguate AGI Timelines?

See my response to Daniel (https://www.lesswrong.com/posts/auGYErf5QqiTihTsJ/what-indicators-should-we-watch-to-disambiguate-agi?commentId=WRJMsp2bZCBp5egvr). In brief: I won't defend my vague characterization of "breakthroughs" nor my handwavy estimates of how many are needed to reach AGI, how often they occur, and how the rate of breakthroughs might evolve. I would love to see someone attempt a more rigorous analysis along these lines (I don't feel particularly qualified to do so). I wouldn't expect that to result in a precise figure for the arrival of AGI, but I would hope for it to add to the conversation.

don-p on The Online Sports Gambling Experiment Has Failed

I posted this as a reply [and then posted it in the wrong subthread...reposting final version here] but I thought about it more, and on the Mets/Brewers odds thing, the "win %" from ESPN is something they've been doing for years. It's all over espn.com. It's not very smart, and doesn't have to be, because nothing is riding on it.

ESPNbet.com is licensed to use the ESPN name but is otherwise separate. Since their posted odds are their business, I expect them to be a lot more...meticulous, at least. The -160/+125 odds posted imply (after removing the house edge, aka de-vigging) a 58/42 probability split. And that represents only a 4.4% hold from those probabilities, so it's not outrageous. The standard -110/-110 on even money is a 4.8% hold.
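
The de-vigging step mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the common proportional (multiplicative) normalization; the comment does not say which de-vig method or which definition of "hold" was used, so the function names here are hypothetical and the sketch reports the overround (sum of implied probabilities minus one) rather than trying to reproduce the quoted hold percentages.

```python
def implied_probability(american_odds: int) -> float:
    """Convert American (moneyline) odds to the bookmaker's implied probability."""
    if american_odds < 0:
        # Favorite: risk |odds| to win 100.
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win odds.
    return 100 / (american_odds + 100)


def devig(odds_a: int, odds_b: int) -> tuple[float, float, float]:
    """Return (fair_prob_a, fair_prob_b, overround) for a two-way market.

    Assumption: the vig is removed by proportional normalization, i.e. each
    implied probability is divided by the sum of both implied probabilities.
    """
    p_a = implied_probability(odds_a)
    p_b = implied_probability(odds_b)
    total = p_a + p_b
    return p_a / total, p_b / total, total - 1.0


fair_fav, fair_dog, overround = devig(-160, 125)
print(f"fair split: {fair_fav:.2f} / {fair_dog:.2f}")  # roughly 0.58 / 0.42
print(f"overround: {overround:.4f}")

_, _, even_overround = devig(-110, -110)
print(f"overround at -110/-110: {even_overround:.4f}")
```

Under this normalization the -160/+125 line does come out at roughly 58/42. How much margin that represents depends on whether "hold" means the overround or the book's expected share of total stakes, so the margin figures are best compared under a single definition.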

snewman on What Indicators Should We Watch to Disambiguate AGI Timelines?

This is my "slow scenario". Not sure whether it's clear that I meant the things I said here to lean pessimistic – I struggled with whether to clutter each scenario with a lot of "might" and "if things go quickly / slowly" and so forth.

In any case, you are absolutely correct that I am handwaving here, independent of whether I am attempting to wave in the general direction of my median prediction or something else. The same is true in other places, for instance when I argue that even in what I am dubbing a "fast scenario" AGI (as defined here) is at least four years away. Perhaps I should have added additional qualifiers in the handful of places where I mention specific calendar timelines.

What I am primarily hoping to contribute is a focus on specific(ish) qualitative changes that (I argue) will need to emerge in AI capabilities along the path to AGI. A lot of the discourse seems to treat capabilities as a scalar, one-dimensional variable, with the implication that we can project timelines by measuring the rate of increase in that variable. At this point I don't think that's the best framing, or at least not the only useful framing.

One hope I have is that others can step in and help construct better-grounded estimates on things I'm gesturing at, such as how many "breakthroughs" (a term I have notably not attempted to define) would be needed to reach AGI and how many we might expect per year. But I'd be satisfied if my only contribution would be that people start talking a bit less about benchmark scores and a bit more about the indicators I list toward the end of the post – or, even better, some improved set of indicators.

jenniferrm on What’s the short timeline plan?

I think you're overindexing on the phrase "status quo", underindexing on "industry standard", and missing a lot of practical microstructure.

Lots of firms or teams across industry have attempted to "EG" implement multi-factor authentication or basic access control mechanisms or secure software development standards or red-team tests. Sony probably had some of that in some of its practices in some of its departments when North Korea 0wned them.

Google does not just "OR them together" and half-ass some of these things. It "ANDs together" reasonably high quality versions of everything. Then every year they anneal the culture a little bit more around small controlled probes of global adequacy.

..

Also, in reading that RAND document, I would like to report another "thonk!" sound!

..

RAND's author(s) seem to have entirely (like at a conceptual level) left out the possibility that AGI (during a training run or during QA with humans or whatever) would itself "become the attacker" and need to be defended against.

It is like they haven't even seen Ex Machina, or read A Fire Upon The Deep or Daemon.

You don't just have to keep bad guys OUT, you have to keep "the possible bad guy that was just created by a poorly understood daemon summoning process" IN, and that perspective doesn't appear anywhere in any of the RAND document that I can see.

No results when I ^f for [demon], [summon], [hypno], [subvert], [pervert], [escape].

(("Subvert" was used once, but it was in a basic bitch paragraph like this (bold in original):

Most access control systems are either software systems or have significant software components. In addition to the more specialized ways of undermining such systems described above, an attacker could undermine them by finding code vulnerabilities and then subverting their behavior (without actually dealing with their cryptographic or core functionality at all). A major category of code vulnerabilities that undermine access control systems on a regular basis are privilege escalation vulnerabilities.

Duh.

))

The best thing in that RAND document is [infected] which occurs twice <3

Page 54 (bold not in original):

PyTorch’s prerelease Linux packages, PyTorch-nightly, were compromised via a malicious binary on PyPi.[122] The torchtriton dependency in the PyTorch-nightly build was replaced with a malicious package, leading to the extraction of sensitive data from infected systems.

Page 61 (bold not in original, footnote not a link in the original):

Agent.btz, the malware that infected the U.S. Department of Defense in 2008 and led to the creation of U.S. Cyber Command, came from a USB device that was found in a parking lot on a base in the Middle East.[164]

"Infection" is a really good metaphor and word for bringing extreme rigor to the tracing of actual causal processes, because it gets at the idea of "programs" (viral or microbial or fungal or whatever genomes (or genome products) spreading via arbitrary physical transmission methods). This is a pretty darn correct frame! :-)

Ever since covid I've been beating the drum for BSL5 to handle scary diseases by assuming that the real danger is NOT that "a known disease brought into a lab to be studied there merely infects a scientist (or everyone inside the facility for that matter)", but that someone inside the BSL5 gets infected with something new that would never naturally evolve and then uses creativity to escape the BSL5 while infected and then that causes a pandemic.

Currently the BSL system only goes up to 4, and all incremental increases in caution and safety are related to the protection of the lab worker from what they are working with more and more and more, while assuming good faith by the site administrator, and the security guards at the exit gate (which may or may not be distinct from the entry gate) who might also get infected, and so on.

I want a Bio-SL5 standard to exist in biology, and I want all GoF research restricted by an international treaty that ensures that the GoF only happens inside BSL5 facilities.

Similarly, I think humanity also needs an analogous "CompSci-SL5 standard" that imagines the entire computing facility (data center plus programmers plus janitors who have physical access and so on plus all the managers of everyone working inside the CSSL5 campus) is subverted by a nascent superintelligence and then used by that nascent superintelligence to try to subvert all of Earth.

There is no hint of anything at all like this as part of the threat modeling in the RAND report.

Also, if Google had such a thing back when I worked there, I didn't hear about it. (Then again, maybe the existence of it would have been kept secret?)

ete on Is my distinctiveness evidence for being in a simulation?

In a large universe, you, and everyone else, exist both in and not in simulations. That is: The pattern you identify with exists in both basement reality (in many places) and also in simulations (in many places).

There is a question of what proportion of the you-patterns exist in basement reality, but it has a slightly different flavour, I think. It seems to trigger some deep evolved patterns (around fakeness?) less than the kind of existential fear that simulations with the naive conception of identity sometimes bring up.

But to answer that question: Maybe simulators tend to prefer "flat" simulations, where the entire system is simulated evenly to avoid divergence from the physical system it's trying to gather information about. Maybe your unique characteristic is the kind of thing that makes you more likely to be simulated in higher fidelity than the average human, and simulators prefer uneven simulations. Or maybe it's unusual but not particularly relevant for tactical simulations of what emerges from the intelligence explosion (which is probably where the majority of the simulation compute goes).

But, either way, that update is probably pretty small compared to the background high rate of simulations of "humans around at the time of the singularity". Bostrom's paper covers the general argument for simulations generally outnumbering basement reality due to ancestor simulations: https://simulation-argument.com/simulation.pdf

However, even granting all of the background assumptions that go into this: Not all observers who are you live in a simulation. You exist in both types of places. Simulations don't reduce your weight in the basement reality; they can only give you more places in which you exist.