LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] Executive Dysfunction 101
DaystarEld · 2024-05-23T12:43:13.785Z · comments (1)

Beta Tester Request: Rallypoint Bounties
lukemarks (marc/er) · 2024-05-25T09:11:11.446Z · comments (4)

Twin Peaks: under the air
KatjaGrace · 2024-05-31T01:20:04.624Z · comments (2)

[link] Sticker Shortcut Fallacy — The Real Worst Argument in the World
ymeskhout · 2024-06-12T14:52:41.988Z · comments (15)

[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)

[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)

Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)

Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)

An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

Scientific Method
Andrij “Androniq” Ghorbunov (andrij-androniq-ghorbunov) · 2024-02-18T21:06:45.228Z · comments (4)

[question] Would you have a baby in 2024?
martinkunev · 2023-12-25T01:52:04.358Z · answers+comments (76)

Defense Against The Dark Arts: An Introduction
Lyrongolem (david-xiao) · 2023-12-25T06:36:06.278Z · comments (36)

Technology path dependence and evaluating expertise
bhauth · 2024-01-05T19:21:23.302Z · comments (2)

[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)

[link] Compensating for Life Biases
Jonathan Moregård (JonathanMoregard) · 2024-01-09T14:39:14.229Z · comments (6)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

How Congressional Offices Process Constituent Communication
Tristan Williams (tristan-williams) · 2024-07-02T12:38:41.472Z · comments (0)

[link] The Living Planet Index: A Case Study in Statistical Pitfalls
Jan_Kulveit · 2024-06-24T10:05:55.101Z · comments (0)

Best-of-n with misaligned reward models for Math reasoning
Fabien Roger (Fabien) · 2024-06-21T22:53:21.243Z · comments (0)

Population ethics and the value of variety
cousin_it · 2024-06-23T10:42:21.402Z · comments (11)

UDT1.01: Local Affineness and Influence Measures (2/10)
Diffractor · 2024-03-31T07:35:52.831Z · comments (0)

[link] Secret US natsec project with intel revealed
Nathan Helm-Burger (nathan-helm-burger) · 2024-05-25T04:22:11.624Z · comments (0)

[link] Robert Caro And Mechanistic Models In Biography
adamShimi · 2024-07-14T10:56:42.763Z · comments (5)

[link] Was Partisanship Good for the Environmental Movement?
Jeffrey Heninger (jeffrey-heninger) · 2024-05-15T17:30:54.796Z · comments (0)

[link] Clickbait Soapboxing
DaystarEld · 2024-03-13T14:09:29.890Z · comments (15)

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

[link] Let's Design A School, Part 2.3 School as Education - The Curriculum (Phase 2, Specific)
Sable · 2024-05-15T20:58:50.981Z · comments (0)

Anomalous Concept Detection for Detecting Hidden Cognition
Paul Colognese (paul-colognese) · 2024-03-04T16:52:52.568Z · comments (3)

[link] Extinction Risks from AI: Invisible to Science?
VojtaKovarik · 2024-02-21T18:07:33.986Z · comments (7)

A brief review of China's AI industry and regulations
Elliot Mckernon (elliot) · 2024-03-14T12:19:00.775Z · comments (0)

I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!
mako yass (MakoYass) · 2024-08-02T22:35:21.136Z · comments (2)

[question] What percent of the sun would a Dyson Sphere cover?
Raemon · 2024-07-03T17:27:50.826Z · answers+comments (26)

[link] Cellular respiration as a steam engine
dkl9 · 2024-02-25T20:17:38.788Z · comments (1)

Paper Summary: The Koha Code - A Biological Theory of Memory
jakej (jake-jenks) · 2023-12-30T22:37:13.865Z · comments (2)

[link] Scenario planning for AI x-risk
Corin Katzke (corin-katzke) · 2024-02-10T00:14:11.934Z · comments (12)

[question] Could there be "natural impact regularization" or "impact regularization by default"?
tailcalled · 2023-12-01T22:01:46.062Z · answers+comments (6)

[link] Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results
Iknownothing · 2024-01-15T19:37:07.984Z · comments (0)

A bet on critical periods in neural networks
kave · 2023-11-06T23:21:17.279Z · comments (1)

Building Trust in Strategic Settings
StrivingForLegibility · 2023-12-28T22:12:24.024Z · comments (0)

A Basic Economics-Style Model of AI Existential Risk
Rubi J. Hudson (Rubi) · 2024-06-24T20:26:09.744Z · comments (3)

[link] Alignment work in anomalous worlds
Tamsin Leake (carado-1) · 2023-12-16T19:34:26.202Z · comments (4)

[link] AI Alignment [Progress] this Week (11/05/2023)
Logan Zoellner (logan-zoellner) · 2023-11-07T13:26:21.995Z · comments (0)

A conceptual precursor to today's language machines [Shannon]
Bill Benzon (bill-benzon) · 2023-11-15T13:50:51.226Z · comments (6)

Evolution did a surprising good job at aligning humans...to social status
Eli Tyre (elityre) · 2024-03-10T19:34:52.544Z · comments (37)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

nutrition-capsule on Overcoming Bias Anthology

Hanson seems to treat the global civilization as a cultural melting pot, but he does distinguish insular subcultures from that. I intuit he sees contemporary cultures on a gradient relative to global, hegemonic trends (which correlate with technological progress, increasing wealth and education) and thereby drifting pressures.

thane-ruthenis on The Compendium, A full argument about extinction risk from AGI

Sure. But:

and then they can do a bunch of the work of generalizing

This is the step which is best made unnecessary if you're crafting a message for a broad audience, I feel.

Most people are not going to be motivated to put this work in. Why would they? They get bombarded with a hundred credible-ish messages claiming high-importance content on a weekly basis. They don't have the time nor stamina to do a deep dive into each of them.

Which means any given subculture would generate its own "inferential bridge" between itself and your message, artefacts that do this work for the median member (consisting of reviews by any prominent subculture members, the takes that go viral, the entire shape of the discourse around the topic, etc.). The more work is needed, the longer these inferential bridges will be. The longer they are, the bigger the opportunity to willfully or accidentally mistranslate your message.

Like I said, it doesn't seem wise or even fair to your potential audience, to act as if those dynamics don't take place. As if the only people that deserve consideration are those that would put in the work themselves (despite the fact it may be a locally suboptimal way to distribute resources under their current world-model), and everyone else are lost causes.

cousin_it on Dentistry, Oral Surgeons, and the Inefficiency of Small Markets

Tragedy of capitalism in a nutshell. The best action is to dismantle the artificial scarcity of doctors. But the most profitable action is to build a company that will profit from that scarcity - and, when it gets big enough, lobby to perpetuate it.

richard_kennaway on JargonBot Beta Test

You and the LW team are indirectly responsible, but only for the general feature. You are not standing behind each individual statement the AI makes. If the author of the post does not vet it, no-one stands behind it. The LW admins can be involved only in hindsight, if the AI does something particularly egregious.

bogdan-ionut-cirstea on johnswentworth's Shortform

how recent reports of OpenAI’s o1 being deceptive have been questioned [LW(p) · GW(p)].

This seems to be confusing a dangerous capability eval (of being able to 'deceive' in a visible scratchpad) with an assessment of alignment, which seems like exactly what the 'questioning' was about.

adam_scholl on JargonBot Beta Test

I also use them rarely, fwiw. Maybe I'm missing some more productive use, but I've experimented a decent amount and have yet to find a way to make regular use even neutral (much less helpful) for my thinking or writing.

rom on MichaelDickens's Shortform

I agree with the claim you're making: that if FHI still existed and they applied for a grant from OP it would be rejected. This seems true to me.

I don't mean to nitpick, but it still feels misleading to claim "FHI could not get OP funding" when they did in fact get lots of funding from OP. It implies that FHI operated without any help from OP, which isn't true.

everydaybought on [deleted]

Preventing the onslaught of spam on the internet using digital ID's:

As LLM's start passing the turing test and beating CAPTCHA's, spammers will soon be able to pass as humans. Right now, people often draw conclusions and whole worldviews from interactions and consensus they observe online. But when bots are indistinguishable from humans, whoever has the most computing power will have the most representation online, and will be able to skew our perception of the world.

To prevent this, I think it's crucial for our sanity and epistemics that we have strong private digital identities so you can see next to a profile whether it's is a person or a bot. In order to protect anonymity, the system could use clever cryptography allowing people to prove that they are a real person but without revealing who they are (for things like whistleblowers etc). Alternatively, these systems could be limited to only knowing that you haven't spammed a certain number of requests in the past few minutes, still while protecting your anonymity.

The internet needs to be conducive to people forming consensus around facts since so many people nowadays base their opinions based on what they see online. I hope people lobby for digital ID systems to keep the internet from devolving.

tailcalled on What can we learn from insecure domains?

In crypto, a lot of people just HODL instead of using it for stuff in practice. I'd guess the more people use it, the more likely they are to run into one of the 99.9% of projects that are scams. (Though... if we count the people who've been hit by ransomware, it is non-obvious to me that the majority of users are HODLers rather than ransomeware victims.) To prevent losing one's crypto, there have also been developed techniques like "cold storage", which are extremely secure.

The HTTP server logs you posted aren't based on insecurity of most webservers, they are based on the insecurity of particular programs (or versions of programs or setups of programs). Important systems (e.g. online banking) almost always use different systems than the ones that are currently getting attacked. Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Copilot is general instructed via the user of the program, and the user and is relatively trusted. I mean, people are still trying to "align" to be robust against the user, but 99.9% of the time that doesn't matter, and the remaining time is often stuff like internet harassment which is definitely not existentially risky, even if it is bad.

Some people are trying to introduce LLM agents into more general places, e.g. shops automatically handling emails from businesses. I'm pretty skeptical about this being secure, but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.

Nuclear weapons were used twice when only the US had them. They only became existentially dangerous as multiple parties built up enormous stockpiles of them, but at the same time people understood that they were existentially dangerous and therefore avoided using them in war. More recently they've agreed that keeping such things around is bad and have been disassembling them under mutual surveillance. And they have systems set up to prevent other, less-stable countries from developing them.

habryka4 on The Compendium, A full argument about extinction risk from AGI

Like, here's a sanity-check: suppose you must convince a specific Creationist that the AGI Risk is real. Do you need to argue them out of Creationism in order to do so?

My guess is no, but also, my guess is we will probably still have better comms if I err on the side of explaining things how they come naturally to me, and entangled with the way I came to adopt a position, and then they can do a bunch of the work of generalizing. Of course, if something is deeply triggering or mindkilly to someone, then it's worth routing, but it's not like any analogy with evolution is invalid from the perspective of someone who believes in Creationism. Yes, some of the force of such an analogy would be lost, but most of it comes from the logical consistency, not the empirical evidence.