LessWrong 2.0 Reader

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

The proper response to mistakes that have harmed others?
Ruby · 2023-12-31T04:06:31.505Z · comments (12)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Balsa Update and General Thank You
Zvi · 2023-12-12T20:30:03.980Z · comments (8)

[question] We might be dropping the ball on Autonomous Replication and Adaptation.
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-31T13:49:11.327Z · answers+comments (30)

Vote on worthwhile OpenAI topics to discuss
Ben Pace (Benito) · 2023-11-21T00:03:03.898Z · comments (55)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

An Actually Intuitive Explanation of the Oberth Effect
Isaac King (KingSupernova) · 2024-01-10T20:23:17.216Z · comments (33)

[question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?
Zvi · 2024-03-11T14:55:05.128Z · answers+comments (2)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (6)

On OpenAI Dev Day
Zvi · 2023-11-09T16:10:06.646Z · comments (0)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (9)

"Epistemic range of motion" and LessWrong moderation
habryka (habryka4) · 2023-11-27T21:58:40.834Z · comments (3)

Raemon's Deliberate (“Purposeful?”) Practice Club
Raemon · 2023-11-14T18:24:19.335Z · comments (11)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

0th Person and 1st Person Logic
Adele Lopez (adele-lopez-1) · 2024-03-10T00:56:14.446Z · comments (28)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (9)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

[link] Results from an Adversarial Collaboration on AI Risk (FRI)
Josh Rosenberg (josh-rosenberg) · 2024-03-11T20:00:24.642Z · comments (3)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Originality vs. Correctness
alkjash · 2023-12-06T18:51:49.531Z · comments (17)

New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (7)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

What's next for the field of Agent Foundations?
Nora_Ammann · 2023-11-30T17:55:13.982Z · comments (23)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

[link] shoes with springs
bhauth · 2023-12-30T21:46:55.319Z · comments (6)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (8)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Does AI risk “other” the AIs?
Joe Carlsmith (joekc) · 2024-01-09T17:51:47.020Z · comments (3)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

Recent comments

adam_scholl on JargonBot Beta Test

I also use them rarely, fwiw. Maybe I'm missing some more productive use, but I've experimented a decent amount and have yet to find a way to make them even neutral (much less helpful) for my thinking or writing.

rom on MichaelDickens's Shortform

I agree with the claim you're making: that if FHI still existed and they applied for a grant from OP it would be rejected. This seems true to me.

I don't mean to nitpick, but it still feels misleading to claim "FHI could not get OP funding" when they did in fact get lots of funding from OP. It implies that FHI operated without any help from OP, which isn't true.

everydaybought on [deleted]

Preventing the onslaught of spam on the internet using digital IDs:

As LLMs start passing the Turing test and beating CAPTCHAs, spammers will soon be able to pass as humans. Right now, people often draw conclusions and whole worldviews from interactions and consensus they observe online. But when bots are indistinguishable from humans, whoever has the most computing power will have the most representation online, and will be able to skew our perception of the world.

To prevent this, I think it's crucial for our sanity and epistemics that we have strong private digital identities so you can see next to a profile whether it is a person or a bot. In order to protect anonymity, the system could use clever cryptography allowing people to prove that they are a real person without revealing who they are (for things like whistleblowers etc). Alternatively, these systems could be limited to only knowing that you haven't spammed a certain number of requests in the past few minutes, while still protecting your anonymity.

The internet needs to be conducive to people forming consensus around facts, since so many people nowadays base their opinions on what they see online. I hope people lobby for digital ID systems to keep the internet from devolving.

tailcalled on What can we learn from insecure domains?

In crypto, a lot of people just HODL instead of using it for stuff in practice. I'd guess the more people use it, the more likely they are to run into one of the 99.9% of projects that are scams. (Though... if we count the people who've been hit by ransomware, it is non-obvious to me that the majority of users are HODLers rather than ransomware victims.) To prevent losing one's crypto, techniques like "cold storage" have also been developed, which are extremely secure.

The HTTP server logs you posted aren't based on insecurity of most webservers, they are based on the insecurity of particular programs (or versions of programs or setups of programs). Important systems (e.g. online banking) almost always use different systems than the ones that are currently getting attacked. Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Copilot is generally instructed by the user of the program, and the user is relatively trusted. I mean, people are still trying to "align" it to be robust against the user, but 99.9% of the time that doesn't matter, and the remaining time is often stuff like internet harassment which is definitely not existentially risky, even if it is bad.

Some people are trying to introduce LLM agents into more general places, e.g. shops automatically handling emails from businesses. I'm pretty skeptical about this being secure, but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.

Nuclear weapons were used twice when only the US had them. They only became existentially dangerous as multiple parties built up enormous stockpiles of them, but at the same time people understood that they were existentially dangerous and therefore avoided using them in war. More recently they've agreed that keeping such things around is bad and have been disassembling them under mutual surveillance. And they have systems set up to prevent other, less-stable countries from developing them.

habryka4 on The Compendium, A full argument about extinction risk from AGI

Like, here's a sanity-check: suppose you must convince a specific Creationist that the AGI Risk is real. Do you need to argue them out of Creationism in order to do so?

My guess is no, but also, my guess is we will probably still have better comms if I err on the side of explaining things how they come naturally to me, and entangled with the way I came to adopt a position, and then they can do a bunch of the work of generalizing. Of course, if something is deeply triggering or mindkilly to someone, then it's worth routing, but it's not like any analogy with evolution is invalid from the perspective of someone who believes in Creationism. Yes, some of the force of such an analogy would be lost, but most of it comes from the logical consistency, not the empirical evidence.

habryka4 on MichaelDickens's Shortform

In 2023/2024 OP drastically changed its funding process and priorities (in part in response to FTX, in part in response to Dustin's preferences). This whole conversation is about the shift in OP's giving in this recent time period.

See also: https://forum.effectivealtruism.org/posts/foQPogaBeNKdocYvF/linkpost-an-update-from-good-ventures

rom on MichaelDickens's Shortform

FHI could not get OP funding

Can you elaborate on what you mean by this?

OP appears to have been one of FHI's biggest funders according to Sandberg:^[1]

Eventually, Open Philanthropy became FHI’s most important funder, making two major grants: £1.6m in 2017, and £13.3m in 2018. Indeed, the donation behind this second grant was at the time the largest in the Faculty of Philosophy’s history (although, owing to limited faculty administrative capacity for hiring and the subsequent hiring freezes it imposed, a large part of this grant would remain unspent). With generous and unrestricted funding from a foundation that was aligned with FHI’s mission, we were free to expand our research in ways we thought would make the most difference.

The hiring (and fundraising) freeze imposed by Oxford began in 2020.

^[1] See page 15

everydaybought on [deleted]

Prediction Market Manipulation Could Prevent Catastrophes:

TLDR: Risk premia incentivize people to manipulate the underlying events in prediction markets, and prevent large scale risks to markets like wars and recessions from happening.

Match-fixing is the illegal phenomenon in sports betting where an athlete bets on a game they are competing in and then changes their actions to win the bet. Something similar will likely happen with prediction markets despite its illegality. But when it does, there may be an incentive for it to happen in a way that benefits markets overall and prevents systemic risks:

According to risk premium theory, risky stocks which are correlated with the overall stock market are priced cheaper than their underlying value, making them good investments.

Prediction markets could be affected by this theory: betting positions for something like "higher corporate profits", something which is positively correlated with the performance of overall stock markets, might be systemically undervalued. This would create an incentive to bet that these outcomes won't happen.

Because of risk premia, events like "recession won't happen" or "war won't break out" would be better bets than their opposites since they are correlated with stock market performance. Therefore, any politician that wants to engage in "match-fixing" would have an incentive to match-fix in the direction that prevents risks.

For example, a politician could be more likely to buy a betting position that "climate catastrophe won't happen," a position which is likely positively correlated with stock market performance. And then they would pass a sweeping climate proposal that prevents climate catastrophe. Similarly, a whistle-blower might bet against a presidential candidate who poses a threat to world stability and markets, and subsequently share unsavory information about them to tank their campaign.

Legalizing manipulating outcomes while betting on those same outcomes might seem wrong, but perhaps, like insider trading, it could have some benefits to society.
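
The risk-premium claim above can be illustrated with a toy two-state pricing sketch. This is not from the comment: the probabilities, the per-state "value of a dollar" weights, and the function name `contract_price` are all hypothetical, chosen only to show how a contract that pays off in good times can trade below its true probability.

```python
# Toy sketch of risk-premium pricing for binary prediction contracts.
# All numbers and names here are illustrative assumptions.

def contract_price(probs, dollar_value, payoff):
    """Price a contract as E[m * payoff] / E[m].

    probs        -- probability of each state
    dollar_value -- assumed marginal value of $1 in each state (m);
                    higher in bad states, where money is scarcer
    payoff       -- contract payout in each state
    Dividing by E[m] normalizes so that a sure $1 costs exactly $1.
    """
    weighted = sum(p * m * x for p, m, x in zip(probs, dollar_value, payoff))
    norm = sum(p * m for p, m in zip(probs, dollar_value))
    return weighted / norm

# Two states: [no recession, recession].
probs = [0.8, 0.2]          # hypothetical true probabilities
dollar_value = [0.9, 1.4]   # hypothetical: a dollar is worth more in a recession

no_recession = contract_price(probs, dollar_value, [1.0, 0.0])
recession = contract_price(probs, dollar_value, [0.0, 1.0])

print(f"'no recession' price: {no_recession:.2f} vs probability {probs[0]:.2f}")
print(f"'recession' price:    {recession:.2f} vs probability {probs[1]:.2f}")
```

Under these assumed numbers, the "no recession" contract trades below its probability and so has a positive expected return, while the "recession" contract trades above its probability. That is the asymmetry the argument relies on; whether real prediction markets price risk this way is an open empirical question.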

romeostevensit on What TMS is like

when you're stuck at the bottom of an attractor a hard kick to somewhere else can be good enough even with unknown side effects.

lc on The Compendium, A full argument about extinction risk from AGI

If it really wanted to, there would be nothing at all stopping the US military from launching a coup on its civilian government.

There are enormous hurdles preventing the U.S. military from overthrowing the civilian government.

The confusion in your statement is caused by blocking up all the members of the armed forces in the term "U.S. military". Principally, a coup is an act of coordination. Any given faction or person in the U.S. military would have an extremely difficult time organizing the forces necessary without being stopped by civilian or military law enforcement first, and then maintaining control of their civilian government afterwards without the legitimacy of democratic governance.

In general, "more powerful entities control weaker entities" is a constant. If you see something else, your eyes are probably betraying you.