LessWrong 2.0 Reader


[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

Budapest Hungary - ACX Meetups Everywhere Fall 2024
Timothy Underwood (timothy-underwood-1) · 2024-08-29T18:37:41.313Z · comments (0)

The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (1)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

[link] The AI regulator’s toolbox: A list of concrete AI governance practices
Adam Jones (domdomegg) · 2024-08-10T21:15:09.265Z · comments (1)

Differential knowledge interconnection
Roman Leventov · 2024-10-12T12:52:36.267Z · comments (0)

[question] A Different Perspective on Rationality - Would This Be Valuable?
Gabriel Brito (gabriel-brito) · 2024-10-26T18:47:46.416Z · answers+comments (2)

[link] Exposure can’t rule out disasters
Chipmonk · 2024-08-15T17:03:37.259Z · comments (19)

Seeking mentorship
Kevin Afachao (kevin-afachao) · 2024-09-21T16:54:58.353Z · comments (0)

[link] Should we abstain from voting? (In nondeterministic elections)
B Jacobs (Bob Jacobs) · 2024-10-02T10:07:43.167Z · comments (6)

[link] An "Observatory" For a Shy Super AI?
Sherrinford · 2024-09-27T21:22:40.296Z · comments (0)

[question] If the DoJ goes through with the Google breakup, where does Deepmind end up?
O O (o-o) · 2024-10-12T05:06:50.996Z · answers+comments (1)

[question] Artificial V/S Organoid Intelligence
10xyz (10xyz-coder) · 2024-10-23T14:31:46.385Z · answers+comments (0)

[link] AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics
Corin Katzke (corin-katzke) · 2024-09-11T19:14:08.274Z · comments (1)

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · comments (1)

[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)

Some reasons to start a project to stop harmful AI
Remmelt (remmelt-ellen) · 2024-08-22T16:23:34.132Z · comments (0)

[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)

Meta: On viewing the latest LW posts
quiet_NaN · 2024-08-25T19:31:39.008Z · comments (2)

New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander (tej-lander) · 2024-09-29T18:58:56.253Z · comments (0)

[link] How long should political (and other) terms be?
ohmurphy · 2024-10-14T21:38:43.050Z · comments (0)

Grass Valley USA - ACX Meetups Everywhere Fall 2024
Raelifin · 2024-08-29T18:39:57.229Z · comments (0)

Biasing VLM Response with Visual Stimuli
Jaehyuk Lim (jason-l) · 2024-10-03T18:04:31.474Z · comments (0)

[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)

Longevity and the Mind
George3d6 · 2024-09-16T09:43:09.700Z · comments (2)

Toy Models of Superposition: what about BitNets?
Alejandro Tlaie (alejandro-tlaie-boria) · 2024-08-08T16:29:02.054Z · comments (1)

[link] Universal basic income isn’t always AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T15:39:18.389Z · comments (3)

Using LLM's for AI Foundation research and the Simple Solution assumption
Donald Hobson (donald-hobson) · 2024-09-24T11:00:53.658Z · comments (0)

Democracy beyond majoritarianism
Arturo Macias (arturo-macias) · 2024-09-03T15:10:56.284Z · comments (2)

The Carnot Engine of Economics
StrivingForLegibility · 2024-08-09T15:59:40.458Z · comments (0)

[link] Join the $10K AutoHack 2024 Tournament
Paul Bricman (paulbricman) · 2024-09-25T11:54:20.112Z · comments (0)

Methodology: Contagious Beliefs
James Stephen Brown (james-brown) · 2024-10-19T03:58:17.966Z · comments (0)

[question] What are the strategic implications if aliens and Earth civilizations produce similar utilities?
Maxime Riché (maxime-riche) · 2024-08-06T21:16:21.719Z · answers+comments (1)

Developmental Stages in Multi-Problem Grokking
James Sullivan · 2024-09-29T18:58:22.954Z · comments (0)

Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation
Antonio Clarke (antonio-clarke) · 2024-09-29T18:48:23.308Z · comments (0)

Tbilisi Georgia - ACX Meetups Everywhere Fall 2024
Dmitrii (dmitrii) · 2024-08-29T18:36:43.223Z · comments (4)

[link] Game Theory and Society
Zero Contradictions · 2024-08-05T04:27:37.275Z · comments (0)

Jailbreaking ChatGPT and Claude using Web API Context Injection
Jaehyuk Lim (jason-l) · 2024-10-21T21:34:37.579Z · comments (0)

Interest poll: A time-waster blocker for desktop Linux programs
nahoj · 2024-08-22T20:44:04.479Z · comments (5)

[question] How do you finish your tasks faster?
Cipolla · 2024-08-21T20:01:41.306Z · answers+comments (2)

[link] The ELYSIUM Proposal - Extrapolated voLitions Yielding Separate Individualized Utopias for Mankind
Roko · 2024-10-16T01:24:51.102Z · comments (18)

[link] Predictions as Public Works Project — What Metaculus Is Building Next
ChristianWilliams · 2024-10-22T16:35:13.999Z · comments (0)

Bellevue-Redmond USA - ACX Meetups Everywhere Fall 2024
Cedar (xida-ren) · 2024-08-29T18:43:57.014Z · comments (8)

The great Enigma in the sky: The universe as an encryption machine
Alex_Shleizer · 2024-08-14T13:21:58.713Z · comments (1)

Tokyo (日本語) Japan - ACX Meetups Everywhere Fall 2024
Emi (emi-2) · 2024-08-29T18:35:28.013Z · comments (0)

Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)


Recent comments

nutrition-capsule on Overcoming Bias Anthology

Hanson seems to treat the global civilization as a cultural melting pot, but he does distinguish insular subcultures from that. I intuit he sees contemporary cultures on a gradient relative to global, hegemonic trends (which correlate with technological progress, increasing wealth and education) and thereby drifting pressures.

thane-ruthenis on The Compendium, A full argument about extinction risk from AGI

Sure. But:

and then they can do a bunch of the work of generalizing

This is the step which is best made unnecessary if you're crafting a message for a broad audience, I feel.

Most people are not going to be motivated to put this work in. Why would they? They get bombarded with a hundred credible-ish messages claiming high-importance content on a weekly basis. They have neither the time nor the stamina to do a deep dive into each of them.

Which means any given subculture would generate its own "inferential bridge" between itself and your message, artefacts that do this work for the median member (consisting of reviews by any prominent subculture members, the takes that go viral, the entire shape of the discourse around the topic, etc.). The more work is needed, the longer these inferential bridges will be. The longer they are, the bigger the opportunity to willfully or accidentally mistranslate your message.

Like I said, it doesn't seem wise, or even fair to your potential audience, to act as if those dynamics don't take place: as if the only people who deserve consideration are those who would put in the work themselves (despite the fact that it may be a locally suboptimal way to distribute resources under their current world-model), and everyone else is a lost cause.

cousin_it on Dentistry, Oral Surgeons, and the Inefficiency of Small Markets

Tragedy of capitalism in a nutshell. The best action is to dismantle the artificial scarcity of doctors. But the most profitable action is to build a company that will profit from that scarcity - and, when it gets big enough, lobby to perpetuate it.

richard_kennaway on JargonBot Beta Test

You and the LW team are indirectly responsible, but only for the general feature. You are not standing behind each individual statement the AI makes. If the author of the post does not vet it, no-one stands behind it. The LW admins can be involved only in hindsight, if the AI does something particularly egregious.

bogdan-ionut-cirstea on johnswentworth's Shortform

how recent reports of OpenAI’s o1 being deceptive have been questioned.

This seems to be confusing a dangerous capability eval (of being able to 'deceive' in a visible scratchpad) with an assessment of alignment, which seems like exactly what the 'questioning' was about.

adam_scholl on JargonBot Beta Test

I also use them rarely, fwiw. Maybe I'm missing some more productive use, but I've experimented a decent amount and have yet to find a way to make regular use even neutral (much less helpful) for my thinking or writing.

rom on MichaelDickens's Shortform

I agree with the claim you're making: that if FHI still existed and they applied for a grant from OP it would be rejected. This seems true to me.

I don't mean to nitpick, but it still feels misleading to claim "FHI could not get OP funding" when they did in fact get lots of funding from OP. It implies that FHI operated without any help from OP, which isn't true.

everydaybought on [deleted]

Preventing the onslaught of spam on the internet using digital IDs:

As LLMs start passing the Turing test and beating CAPTCHAs, spammers will soon be able to pass as humans. Right now, people often draw conclusions and whole worldviews from interactions and consensus they observe online. But when bots are indistinguishable from humans, whoever has the most computing power will have the most representation online, and will be able to skew our perception of the world.

To prevent this, I think it's crucial for our sanity and epistemics that we have strong private digital identities, so you can see next to a profile whether it is a person or a bot. To protect anonymity, the system could use clever cryptography that lets people prove they are a real person without revealing who they are (for whistleblowers, for example). Alternatively, these systems could be limited to knowing only that you haven't spammed a certain number of requests in the past few minutes, while still protecting your anonymity.

The internet needs to be conducive to people forming consensus around facts, since so many people nowadays base their opinions on what they see online. I hope people lobby for digital ID systems to keep the internet from devolving.
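
The rate-limiting alternative mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: each user is assumed to already hold some opaque anonymous token (how such tokens would be issued and tied to real people is the hard cryptographic part and is not shown), and the class name `AnonymousRateLimiter` and its parameters are hypothetical. The limiter stores only a hash of the token and recent request times, so it learns how often a token was used, not who holds it.

```python
import hashlib
from collections import defaultdict, deque


class AnonymousRateLimiter:
    """Sliding-window limiter that stores only a hash of an opaque token.

    Hypothetical sketch: token issuance and proof of personhood are assumed
    to happen elsewhere and are not implemented here.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps token hash -> timestamps of recently accepted requests.
        self._recent = defaultdict(deque)

    def allow(self, token: str, now: float) -> bool:
        """Return True if this token may make a request at time `now`."""
        key = hashlib.sha256(token.encode("utf-8")).hexdigest()
        timestamps = self._recent[key]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True
```

With `AnonymousRateLimiter(max_requests=3, window_seconds=60.0)`, a token's first three requests inside a minute are accepted and a fourth is refused until the oldest one ages out of the window. Other tokens are unaffected.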

tailcalled on What can we learn from insecure domains?

In crypto, a lot of people just HODL instead of using it for stuff in practice. I'd guess the more people use it, the more likely they are to run into one of the 99.9% of projects that are scams. (Though... if we count the people who've been hit by ransomware, it is non-obvious to me that the majority of users are HODLers rather than ransomware victims.) To prevent losing one's crypto, techniques like "cold storage" have also been developed, which are extremely secure.

The HTTP server logs you posted aren't based on the insecurity of most webservers; they are based on the insecurity of particular programs (or versions of programs or setups of programs). Important systems (e.g. online banking) almost always use different systems than the ones that are currently getting attacked. Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Copilot is generally instructed by the user of the program, and the user is relatively trusted. I mean, people are still trying to "align" it to be robust against the user, but 99.9% of the time that doesn't matter, and the remaining time is often stuff like internet harassment, which is definitely not existentially risky, even if it is bad.

Some people are trying to introduce LLM agents into more general places, e.g. shops automatically handling emails from businesses. I'm pretty skeptical about this being secure, but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.

Nuclear weapons were used twice when only the US had them. They only became existentially dangerous as multiple parties built up enormous stockpiles of them, but at the same time people understood that they were existentially dangerous and therefore avoided using them in war. More recently they've agreed that keeping such things around is bad and have been disassembling them under mutual surveillance. And they have systems set up to prevent other, less-stable countries from developing them.

habryka4 on The Compendium, A full argument about extinction risk from AGI

Like, here's a sanity-check: suppose you must convince a specific Creationist that the AGI Risk is real. Do you need to argue them out of Creationism in order to do so?

My guess is no, but also, my guess is we will probably still have better comms if I err on the side of explaining things how they come naturally to me, entangled with the way I came to adopt a position, and then they can do a bunch of the work of generalizing. Of course, if something is deeply triggering or mindkilly to someone, then it's worth routing around, but it's not like any analogy with evolution is invalid from the perspective of someone who believes in Creationism. Yes, some of the force of such an analogy would be lost, but most of it comes from the logical consistency, not the empirical evidence.