LessWrong 2.0 Reader

[question] Most capable publicly available agents?
Gabe · 2024-09-30T00:04:24.480Z · answers+comments (0)

Massive Activations and why <bos> is important in Tokenized SAE Unigrams
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-05T02:19:25.592Z · comments (0)

Symmetry, Relativity, and Superposition: Nature's Blueprint for AI Alignment
Javier Marin Valenzuela (javier-marin-valenzuela) · 2024-09-13T08:27:04.027Z · comments (0)

[link] We Should Try to Directly Measure the Value of Scientific Papers
ohmurphy · 2024-09-05T09:08:18.116Z · comments (0)

Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)

Sampling Effects on Strategic Behavior in Supervised Learning Models
Phil Bland · 2024-09-24T07:44:41.677Z · comments (0)

Emergent Authorship: Creativity à la Communing
gswonk · 2024-09-14T19:02:07.635Z · comments (0)

AIS Hungary Operations Officer role, Deadline: 2024 October 6th
gergogaspar (gergo-gaspar) · 2024-09-25T13:54:25.077Z · comments (0)

[question] What does it mean for an event or observation to have probability 0 or 1 in Bayesian terms?
Noosphere89 (sharmake-farah) · 2024-09-17T17:28:52.731Z · answers+comments (22)

Exploring Decomposability of SAE Features
Vikram_N (viknat) · 2024-09-30T18:28:09.348Z · comments (0)

Not Just For Therapy Chatbots: The Case For Compassion In AI Moral Alignment Research
kenneth_diao · 2024-09-30T18:37:20.409Z · comments (0)

[link] Intelligence explosion: a rational assessment.
p4rziv4l · 2024-09-30T21:17:35.675Z · comments (0)

AGI Farm
Rahul Chand (rahul-chand) · 2024-10-01T04:29:58.606Z · comments (0)

Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility
OwenChen · 2024-09-25T20:38:22.928Z · comments (0)

A Cable Holder for 2 Cent
Johannes C. Mayer (johannes-c-mayer) · 2024-09-06T11:01:43.391Z · comments (1)

Survey - Psychological Impact of Long-Term AI Engagement
Manuela García (manuela-garcia) · 2024-09-17T17:31:38.383Z · comments (1)

Superposition through Active Learning Lens
akankshanc · 2024-09-17T17:32:56.583Z · comments (0)

What Hayek Taught Us About Nature
Ground Truth Data (stewart-mior) · 2024-10-03T18:20:21.705Z · comments (6)

Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga · 2024-09-28T18:29:49.088Z · comments (0)

Evaluating LLaMA 3 for political sycophancy
alma.liezenga · 2024-09-28T19:02:36.342Z · comments (2)

A Rational Company - Seeking Advisors
AlignmentOptimizer · 2024-09-21T19:51:22.476Z · comments (1)

I just can't agree with AI safety. Why am I wrong?
Юрий Бурак (yurii-burak-1) · 2024-09-13T17:48:22.245Z · comments (5)

[link] How to give effectively to US Dems
Hauke Hillebrandt (hauke-hillebrandt) · 2024-09-24T14:38:29.678Z · comments (0)

Amplify is hiring! Work with us to support field-building initiatives through digital marketing
gergogaspar (gergo-gaspar) · 2024-09-10T08:56:42.822Z · comments (1)

[link] A Nonconstructive Existence Proof of Aligned Superintelligence
Roko · 2024-09-12T03:20:09.531Z · comments (74)

LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (4)

[question] Seeking Solutions for Aggregating Classifier Outputs
Saeid Ghafouri (saeid-ghafouri) · 2024-10-04T17:39:46.508Z · answers+comments (0)

Moscow – ACX Meetups Everywhere Fall 2024
red-hara · 2024-09-20T23:03:16.028Z · comments (0)

How to discover the nature of sentience, and ethics
Gustavo Ramires (gustavo-ramires) · 2024-09-11T17:22:46.076Z · comments (4)

[question] Why be moral if we can't measure how moral we are? Is it even possible to measure morality?
OKlogic · 2024-09-20T17:40:26.377Z · answers+comments (0)

An Unmeasured Song of Measurement
jan Sijan (tim-min) · 2024-09-21T15:08:31.048Z · comments (0)

The Forging of the Great Minds: An Unfinished Tale
Aryeh Englander (alenglander) · 2024-09-05T00:58:56.584Z · comments (0)

Knowledge Base 1: Could it increase intelligence and make it safer?
iwis · 2024-09-30T16:00:14.236Z · comments (0)

The Compute Conundrum: AI Governance in a Shifting Geopolitical Era
octavo (azelen) · 2024-09-28T01:05:17.328Z · comments (1)

The Chatbot of Babble
Aryeh Englander (alenglander) · 2024-09-05T00:56:49.011Z · comments (0)

[question] How does someone prove that their general intelligence is above average?
M. Y. Zuo · 2024-09-16T21:01:38.529Z · answers+comments (12)

[link] Knowledge's practicability
Ted Nguyễn (ted-nguyen) · 2024-09-18T02:31:59.018Z · comments (0)

[question] Non-human centric view of existence
ZY (AliceZ) · 2024-09-25T05:47:07.480Z · answers+comments (13)

Happy simulations
FateGrinder (nicolo-moretti) · 2024-10-01T21:05:11.131Z · comments (0)

[link] On the destruction of America’s best high school
Chris_Leong · 2024-09-12T15:30:20.001Z · comments (7)

Collapsing the Belief/Knowledge Distinction
Jeremias (jeremias-sur) · 2024-09-11T21:24:30.447Z · comments (8)

Attachment THEORY AND THE EFFECTS OF SECURE ATTACHMENT ON CHILD DEVELOPMENT
[deleted] · 2024-09-08T16:09:12.196Z · comments (0)

Who Feels More Alone?
marvinscheffold · 2024-09-22T11:54:06.324Z · comments (2)

[link] How to Live Well: My Philosophy of Life
Philosofer123 · 2024-09-25T01:13:37.952Z · comments (0)

My Critique of Effective Altruism
Dylan Price (dylan-price) · 2024-09-20T17:41:21.000Z · comments (7)

Heartless Genius: The Peril of Emotionally Blind AI
Javier Marin Valenzuela (javier-marin-valenzuela) · 2024-09-20T12:22:19.872Z · comments (2)

Head in the Cloud: Why an Upload of Your Mind is Not You
xhq (hallie-xu) · 2024-09-17T00:25:19.166Z · comments (3)

Perhaps Try a Little Therapy, As a Treat?
segfault (caleb-ditchfield) · 2024-09-06T08:51:00.987Z · comments (57)

Recent comments

fabien-roger on Fabien's Shortform

I listened to the book Hacking the Hacker by Roger Grimes. It helped me think of other ways in which regular cybersecurity naturally transfers/can be adapted to help prevent AI takeovers:

Honeypots are a well-studied topic in cybersecurity. If you have infiltrated a network, you must be careful with the data you collect and the vulnerabilities you exploit because some of them may be traps! This kind of exercise feels close to the kind of attitude a misaligned AI has to have when taking subtle bad actions. The book is light on details about how honeypots are built and how they fail, and this is a topic I want to learn more about.
Humans are weak in somewhat LLM-like ways: you can't trust them with passwords, they will break your rules if that feels appropriate to them, ... The kind of things you might use to make trusted monitoring work even in worlds where adversarial robustness isn't solved might look a lot like the cursed stack of incentives and warnings people use in cyber when they want to make sure humans don't mindlessly follow the instructions of an attacker. (Though LLMs are also different/weaker than humans in a bunch of ways.)

(The other classic ways in which cyber transfers to preventing takeovers are mostly about how when you prevent spies from stealing model weights or using models in unauthorized ways, you naturally create infra that also makes it hard for AIs to self-exfiltrate or run unmonitored.)

It was also fun to see arguments and drama that are close in kind to the arguments about the best ways to prevent AI catastrophes: people in cybersecurity argue about whether focusing on vulnerabilities is right or whether it's better to focus on the human element, whether the focus on strong passwords is right, whether some solutions are too bothersome/costly to be used in practice, whether imposing specific cybersecurity standards is a good idea, ... It made me realize how niche most AI safety arguments must look to people outside the AI safety community.

But I don't recommend reading the book if you are not curious about everyday cybersecurity. Most of the book is about more regular everyday cyberattacks (social engineering, common vulns, ...), cybersecurity (patching, passwords, ...), and advice for people who want to do cyber professionally. It has a bunch of resource recommendations, but I don't know yet how good they are.

nathan-helm-burger on AISafety.info: What is the "natural abstractions hypothesis"?

Additional source on the subject I recommend: https://m.youtube.com/watch?embeds_referring_euri=https%3A%2F%2Fwww.lesswrong.com%2F&source_ve_path=MTY0OTksMjg2NjQsMTY0NTAz&v=V7AyriUcXZQ

gunnar_zarncke on Hyperpolation

Hi Newbie, what are your thoughts on it?

ektimo on ektimo's Shortform

We can be virtually certain that 2+2=4 based on priors. This is because it's true in the vast multitude of universes. In fact all the universes except the one universe that contains all the other universes. And I'm pretty sure that one doesn't exist anyway.

nathan-helm-burger on Bogdan Ionut Cirstea's Shortform

Related: https://www.lesswrong.com/posts/fdCaCDfstHxyPmB9h/vladimir_nesov-s-shortform?commentId=2ZRSnZEQDbWzsZA3M

https://www.lesswrong.com/posts/MEBcfgjPN2WZ84rFL/o-o-s-shortform?commentId=QDEvi8vQkbTANCw2k

I've been thinking hard about what my next step should be, after my job applications were turned down again by various safety orgs and Anthropic. Now it seems clear to me. I have a vision of how I expect an RSI process to start, using LLMs to mine testable hypotheses from existing published papers.

I should just put my money where my mouth is, and try to build the scaffolding for this. I can then share my attempts with someone at Anthropic. If I'm wrong, I will be wasting my time and savings. If I'm right, I might be substantially helping the world. Seems like a reasonable bet.

tchauvin on Dan Braun's Shortform

In general, the hacking capabilities of state actors and the likely involvement of national security when we get closer to AGI feel like significant blind spots of LessWrong discourse.

(The Hacker and The State by Ben Buchanan is a great book to learn about the former)

nc-1 on Nathan Helm-Burger's Shortform

I think this effect will be more widespread than targeting only already-vulnerable people, and it is particularly hard to measure because the causes will be decentralised and the effects will be diffuse. I predict it being a larger problem if, in the run-up between narrow AI and ASI, we have a longer period of necessary public discourse and decision-making. If the period is very short then it doesn't matter. It may also not affect many people, depending on how much market penetration AI chatbots have before takeoff.

cole-wyeth on Mark Xu's Shortform

I also don’t expect us to have robustly solved ASI-alignment in that timeframe. I simply fail to see a history in which AI control work now is a decisive factor. If you insist on making a top level claim that I haven’t thought through the branches of how things go, I’d appreciate a more substantive description of the branch I am not considering.

fabien-roger on Is cybercrime really costing trillions per year?

This is great, thank you very much!

alexander-gietelink-oldenziel on Vladimir_Nesov's Shortform

China is producing research in a number of areas right now that is surpassing the West and is arguably more impressive scientifically than producing top LLMs.

A big reason China is lagging a little bit might be political interference at major tech companies. Xi Jinping instigated a major crackdown recently. There is also significantly less Chinese text data. I am not a China or tech expert so these are just guesses.

In any case, I wouldn't assign it too much significance. The AI space is just moving so quickly that even a minor one-year delay can seem like lightyears. But that doesn't mean that Chinese companies can't do it, or that a country-continent with 1.4 billion people and a history of many technological firsts can't scale up a transformer.