LessWrong 2.0 Reader

Singular learning theory and bridging from ML to brain emulations
kave · 2023-11-01T21:31:54.789Z · comments (16)

The Sequences on YouTube
Neil (neil-warren) · 2024-01-07T01:44:39.663Z · comments (9)

Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (49)

A list of all the deadlines in Biden's Executive Order on AI
Valentin Baltadzhiev (valentin-baltadzhiev) · 2023-11-01T17:14:31.074Z · comments (2)

[link] Manifold Markets
PeterMcCluskey · 2024-02-02T17:48:36.630Z · comments (9)

AXRP Episode 30 - AI Security with Jeffrey Ladish
DanielFilan · 2024-05-01T02:50:04.621Z · comments (0)

Virtually Rational - VRChat Meetup
Tomás B. (Bjartur Tómas) · 2024-01-28T05:52:36.934Z · comments (3)

If a little is good, is more better?
DanielFilan · 2023-11-04T07:10:05.943Z · comments (16)

[link] OpenAI Superalignment: Weak-to-strong generalization
Dalmert · 2023-12-14T19:47:24.347Z · comments (3)

Fact Finding: Simplifying the Circuit (Post 2)
Senthooran Rajamanoharan (SenR) · 2023-12-23T02:45:49.675Z · comments (3)

Testing for consequence-blindness in LLMs using the HI-ADS unit test.
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-11-24T23:35:29.560Z · comments (2)

Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)

To Boldly Code
StrivingForLegibility · 2024-01-26T18:25:59.525Z · comments (4)

Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)

[question] Impressions from base-GPT-4?
mishka · 2023-11-08T05:43:23.001Z · answers+comments (25)

Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)

Control Symmetry: why we might want to start investigating asymmetric alignment interventions
domenicrosati · 2023-11-11T17:27:10.636Z · comments (1)

[link] Let's Design A School, Part 2.2 School as Education - The Curriculum (General)
Sable · 2024-05-07T19:22:21.730Z · comments (3)

[question] How to Model the Future of Open-Source LLMs?
Joel Burget (joel-burget) · 2024-04-19T14:28:00.175Z · answers+comments (9)

[question] What ML gears do you like?
Ulisse Mini (ulisse-mini) · 2023-11-11T19:10:11.964Z · answers+comments (4)

$250K in Prizes: SafeBench Competition Announcement
ozhang (oliver-zhang) · 2024-04-03T22:07:41.171Z · comments (0)

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research
alamerton · 2024-04-18T18:29:33.892Z · comments (4)

[link] Report: Evaluating an AI Chip Registration Policy
Deric Cheng (deric-cheng) · 2024-04-12T04:39:45.671Z · comments (0)

Decent plan prize winner & highlights
lukehmiles (lcmgcd) · 2024-01-19T23:30:34.242Z · comments (2)

[link] In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley
mrtreasure · 2023-12-06T02:02:32.004Z · comments (3)

Decent plan prize announcement (1 paragraph, $1k)
lukehmiles (lcmgcd) · 2024-01-12T06:27:44.495Z · comments (19)

Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)

[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)

[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (3)

[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)

[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)

[link] The Best Essay (Paul Graham)
Chris_Leong · 2024-03-11T19:25:42.176Z · comments (2)

[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)

[link] Transformer Debugger
Henk Tillman (henk-tillman) · 2024-03-12T19:08:56.280Z · comments (0)

Economics Roundup #1
Zvi · 2024-03-26T14:00:06.332Z · comments (4)

[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)

[link] Structured Transparency: a framework for addressing use/mis-use trade-offs when sharing information
habryka (habryka4) · 2024-04-11T18:35:44.824Z · comments (0)

[link] Executive Dysfunction 101
DaystarEld · 2024-05-23T12:43:13.785Z · comments (1)

Beta Tester Request: Rallypoint Bounties
lukemarks (marc/er) · 2024-05-25T09:11:11.446Z · comments (4)

Twin Peaks: under the air
KatjaGrace · 2024-05-31T01:20:04.624Z · comments (2)

[link] Sticker Shortcut Fallacy — The Real Worst Argument in the World
ymeskhout · 2024-06-12T14:52:41.988Z · comments (15)

[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)

Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)

The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)

Recent comments

rom on MichaelDickens's Shortform

I agree with the claim you're making: that if FHI still existed and they applied for a grant from OP it would be rejected. This seems true to me.

I don't mean to nitpick, but it still feels misleading to claim "FHI could not get OP funding" when they did in fact get lots of funding from OP. It implies that FHI operated without any help from OP, which isn't true.

everydaybought on [deleted]

Preventing the onslaught of spam on the internet using digital IDs:

As LLMs start passing the Turing test and beating CAPTCHAs, spammers will soon be able to pass as humans. Right now, people often draw conclusions and whole worldviews from interactions and consensus they observe online. But when bots are indistinguishable from humans, whoever has the most computing power will have the most representation online, and will be able to skew our perception of the world.

To prevent this, I think it's crucial for our sanity and epistemics that we have strong private digital identities so you can see next to a profile whether it is a person or a bot. In order to protect anonymity, the system could use clever cryptography allowing people to prove that they are a real person without revealing who they are (for things like whistleblowers etc). Alternatively, these systems could be limited to only knowing that you haven't spammed a certain number of requests in the past few minutes, while still protecting your anonymity.
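As a rough illustration of that second option, here is a minimal sketch of a sliding-window rate limiter. It assumes a hypothetical service that only ever sees an opaque token per user; how tokens are issued, and the cryptography that would tie a token to a real person without revealing who they are, are out of scope. The class name, parameters, and token strings are made up for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Tracks only recent request times per opaque token (hypothetical design)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # token -> timestamps of accepted requests still inside the window
        self._recent = defaultdict(deque)

    def allow(self, token: str, now: float) -> bool:
        """Return True if this token may make a request at time `now`."""
        times = self._recent[token]
        # Forget timestamps that have fallen out of the window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) >= self.max_requests:
            return False
        times.append(now)
        return True


# Example: at most 2 requests per 60 seconds for each token.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60.0)
```

The point of the sketch is that the limiter stores nothing except timestamps keyed by a token, so the service learns how often a token has posted recently and nothing about the person behind it.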

The internet needs to be conducive to people forming consensus around facts, since so many people nowadays base their opinions on what they see online. I hope people lobby for digital ID systems to keep the internet from devolving.

tailcalled on What can we learn from insecure domains?

In crypto, a lot of people just HODL instead of using it for stuff in practice. I'd guess the more people use it, the more likely they are to run into one of the 99.9% of projects that are scams. (Though... if we count the people who've been hit by ransomware, it is non-obvious to me that the majority of users are HODLers rather than ransomware victims.) To prevent losing one's crypto, techniques like "cold storage" have also been developed, which are extremely secure.

The HTTP server logs you posted aren't based on insecurity of most webservers, they are based on the insecurity of particular programs (or versions of programs or setups of programs). Important systems (e.g. online banking) almost always use different systems than the ones that are currently getting attacked. Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Copilot is generally instructed by the user of the program, and the user is relatively trusted. I mean, people are still trying to "align" it to be robust against the user, but 99.9% of the time that doesn't matter, and the remaining time is often stuff like internet harassment which is definitely not existentially risky, even if it is bad.

Some people are trying to introduce LLM agents into more general places, e.g. shops automatically handling emails from businesses. I'm pretty skeptical about this being secure, but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.

Nuclear weapons were used twice when only the US had them. They only became existentially dangerous as multiple parties built up enormous stockpiles of them, but at the same time people understood that they were existentially dangerous and therefore avoided using them in war. More recently they've agreed that keeping such things around is bad and have been disassembling them under mutual surveillance. And they have systems set up to prevent other, less-stable countries from developing them.

habryka4 on The Compendium, A full argument about extinction risk from AGI

Like, here's a sanity-check: suppose you must convince a specific Creationist that the AGI Risk is real. Do you need to argue them out of Creationism in order to do so?

My guess is no, but also, my guess is we will probably still have better comms if I err on the side of explaining things how they come naturally to me, and entangled with the way I came to adopt a position, and then they can do a bunch of the work of generalizing. Of course, if something is deeply triggering or mindkilly to someone, then it's worth routing around, but it's not like any analogy with evolution is invalid from the perspective of someone who believes in Creationism. Yes, some of the force of such an analogy would be lost, but most of it comes from the logical consistency, not the empirical evidence.

habryka4 on MichaelDickens's Shortform

In 2023/2024 OP drastically changed its funding process and priorities (in part in response to FTX, in part in response to Dustin's preferences). This whole conversation is about the shift in OP's giving in this recent time period.

See also: https://forum.effectivealtruism.org/posts/foQPogaBeNKdocYvF/linkpost-an-update-from-good-ventures

rom on MichaelDickens's Shortform

FHI could not get OP funding

Can you elaborate on what you mean by this?

OP appears to have been one of FHI's biggest funders according to Sandberg:^[1]

Eventually, Open Philanthropy became FHI’s most important funder, making two major grants: £1.6m in 2017, and £13.3m in 2018. Indeed, the donation behind this second grant was at the time the largest in the Faculty of Philosophy’s history (although, owing to limited faculty administrative capacity for hiring and the subsequent hiring freezes it imposed, a large part of this grant would remain unspent). With generous and unrestricted funding from a foundation that was aligned with FHI’s mission, we were free to expand our research in ways we thought would make the most difference.

The hiring (and fundraising) freeze imposed by Oxford began in 2020.

^[1] See page 15

everydaybought on [deleted]

Prediction Market Manipulation Could Prevent Catastrophes:

TLDR: Risk premia could incentivize people to manipulate the events underlying prediction markets in ways that prevent large-scale risks to markets, like wars and recessions, from happening.

Match-fixing is the illegal phenomenon in sports betting where an athlete bets on a game they are competing in and then changes their actions to win the bet. Something similar will likely happen with prediction markets despite its illegality. But when it does, there may be an incentive for it to happen in a way that benefits markets overall and prevents systemic risks:

According to risk premium theory, risky stocks which are correlated with the overall stock market are priced below their underlying value, making them good investments.

Prediction markets could be affected by this theory: betting positions for something like "higher corporate profits", which is positively correlated with the performance of overall stock markets, might be systematically undervalued. This would create an incentive to bet that these outcomes will happen.

Because of risk premia, events like "recession won't happen" or "war won't break out" would be better bets than their opposites since they are correlated with stock market performance. Therefore, any politician that wants to engage in "match-fixing" would have an incentive to match-fix in the direction that prevents risks.
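To make the pricing argument concrete, here is a small worked sketch. It is not from the comment above: it assumes a textbook setup in which a risk-averse investor with log utility prices a small bet by weighting each outcome by marginal utility, with a zero interest rate. The function name, probabilities, and wealth figures are made up for illustration.

```python
def marginal_price(probs, wealth, payoff):
    """Price of a small bet for a log-utility investor, zero interest rate.

    Uses price = E[u'(W) * payoff] / E[u'(W)] with u'(W) = 1 / W,
    so dollars received in poor states count for more than dollars in rich ones.
    """
    weighted = sum(p * x / w for p, w, x in zip(probs, wealth, payoff))
    norm = sum(p / w for p, w in zip(probs, wealth))
    return weighted / norm


# Illustrative assumptions: two equally likely states, (no recession, recession).
probs = [0.5, 0.5]
wealth = [150.0, 50.0]  # the investor is richer when there is no recession

no_recession_bet = [1.0, 0.0]  # pays $1 only if there is no recession
recession_bet = [0.0, 1.0]     # pays $1 only if there is a recession

price_no_recession = marginal_price(probs, wealth, no_recession_bet)
price_recession = marginal_price(probs, wealth, recession_bet)
```

Under these made-up numbers the "no recession" bet is priced at about 0.25 even though its probability is 0.5, while the "recession" bet is priced at about 0.75. A bet that pays off when the investor is already doing well is worth less to them than its odds suggest, which is the sense in which positions correlated with the market would look cheap.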

For example, a politician could be more likely to buy a betting position that "climate catastrophe won't happen," a position which is likely positively correlated with stock market performance. And then they would pass a sweeping climate proposal that prevents climate catastrophe. Similarly, a whistle-blower might bet against a presidential candidate who poses a threat to world stability and markets, and subsequently share unsavory information about them to tank their campaign.

Legalizing manipulating outcomes while betting on those same outcomes might seem wrong, but perhaps, like insider trading, it could have some benefits to society.

romeostevensit on What TMS is like

when you're stuck at the bottom of an attractor a hard kick to somewhere else can be good enough even with unknown side effects.

lc on The Compendium, A full argument about extinction risk from AGI

If it really wanted to, there would be nothing at all stopping the US military from launching a coup on its civilian government.

There are enormous hurdles preventing the U.S. military from overthrowing the civilian government.

The confusion in your statement is caused by blocking up all the members of the armed forces in the term "U.S. military". Principally, a coup is an act of coordination. Any given faction or person in the U.S. military would have an extremely difficult time organizing the forces necessary without being stopped by civilian or military law enforcement first, and then maintaining control of their civilian government afterwards without the legitimacy of democratic governance.

In general, "more powerful entities control weaker entities" is a constant. If you see something else, your eyes are probably betraying you.

jiro on Another UFO Bet

No, because I have no way to improve my ability to see loopholes and flaws, so there's always going to be residual uncertainty that can't be reduced. Risk aversion does the rest.