LessWrong 2.0 Reader

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

The slingshot helps with learning
Wilson Wu (wilson-wu) · 2024-10-31T23:18:16.762Z · comments (0)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

Bay Winter Solstice 2024: Speech Auditions
ozymandias · 2024-11-04T22:31:38.680Z · comments (0)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (99)

"Full Automation" is a Slippery Metric
ozziegooen · 2024-06-11T19:56:49.855Z · comments (1)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

[link] AISN #28: Center for AI Safety 2023 Year in Review
aogara (Aidan O'Gara) · 2023-12-23T21:31:40.767Z · comments (1)

Announcing SPAR Summer 2024!
laurenmarie12 · 2024-04-16T08:30:31.339Z · comments (2)

[link] Evaluating Stability of Unreflective Alignment
james.lucassen · 2024-02-01T22:15:40.902Z · comments (3)

The Third Gemini
Zvi · 2024-02-20T19:50:05.195Z · comments (2)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)

Running the Numbers on a Heat Pump
jefftk (jkaufman) · 2024-02-09T03:00:04.920Z · comments (12)

[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)

[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)

Interpreting Quantum Mechanics in Infra-Bayesian Physicalism
Yegreg · 2024-02-12T18:56:03.967Z · comments (6)

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

Two Tales of AI Takeover: My Doubts
Violet Hour · 2024-03-05T15:51:05.558Z · comments (8)

[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)

[question] Weighing reputational and moral consequences of leaving Russia or staying
spza · 2024-02-18T19:36:40.676Z · answers+comments (24)

[link] My MATS Summer 2023 experience
James Chua (james-chua) · 2024-03-20T11:26:14.944Z · comments (0)

Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)

Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)

Weekly newsletter for AI safety events and training programs
Bryce Robertson (bryceerobertson) · 2024-05-03T00:33:29.418Z · comments (0)

AI #61: Meta Trouble
Zvi · 2024-05-02T18:40:03.242Z · comments (0)

I played the AI box game as the Gatekeeper — and lost
datawitch · 2024-02-12T18:39:35.777Z · comments (52)

AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (1)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

[link] Abs-E (or, speak only in the positive)
dkl9 · 2024-02-19T21:14:32.095Z · comments (24)

[link] The Poker Theory of Poker Night
omark · 2024-04-07T09:47:01.658Z · comments (13)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)

Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (7)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (6)

[link] Debate helps supervise human experts [Paper]
habryka (habryka4) · 2023-11-17T05:25:17.030Z · comments (6)

Quick Thoughts on Our First Sampling Run
jefftk (jkaufman) · 2024-05-23T00:20:02.050Z · comments (3)

[link] What fuels your ambition?
Cissy · 2024-01-31T18:30:53.274Z · comments (1)

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith (joekc) · 2023-11-29T16:32:30.068Z · comments (1)

[question] How does it feel to switch from earn-to-give?
Neil (neil-warren) · 2024-03-31T16:27:22.860Z · answers+comments (4)

D&D.Sci Hypersphere Analysis Part 1: Datafields & Preliminary Analysis
aphyer · 2024-01-13T20:16:39.480Z · comments (1)

Investigating Bias Representations in LLMs via Activation Steering
DawnLu · 2024-01-15T19:39:14.077Z · comments (4)

Recent comments

poignardazur on A basic systems architecture for AI agents that do autonomous research

I don't find that satisfying. Anyone can point out that a perimeter is "reasonably hard" to breach by pointing at a high wall topped with barbed wire, and naive observers will absolutely agree that the wall sure is very high and sure is made of reinforced concrete.

The perimeter is still trivially easy to breach if, say, the front desk is susceptible to social engineering tactics.

Claiming that an architecture is even reasonably secure still requires looking at it with an attacker's mindset. If you just look at the parts of the security you like, you can make a very convincing-sounding case that still misses glaring flaws. I'm not saying that's definitely what this article does, but it sure is giving me that vibe.

cubefox on The Case Against Moral Realism

Replace "morality" with "rationality" in the post and you get a reductio ad absurdum.

lukehmiles on The hostile telepaths problem

Regarding this

Such as the moms in the abusive partners example above: each one could acknowledge her self-deception once it was safe for her abusive partner to know too. She got enough power (financial or social) to protect herself and her child, making the telepathic scan no longer a dire threat.

I would add that most abusive people don't really like crushing their loved ones, and it is sometimes easy to get them to stop, e.g. by having a peer of the abuser have a private word with the two parties separately. I think it is common for there to be simple miscommunication or misunderstanding; the abuser does not typically actually benefit from the accusative situation.

Why haven't abuser and abusee already talked and figured this out? Well, there is some force field where you can't have a normal conversation about the hitting with someone who is hitting you (or whom you are hitting), although I don't know how to put that in the terms of this post.

lao-mein on Lao Mein's Shortform

My takeaway from the US elections is that electoral blackmail in response to party in-fighting can work, and work well.

Dearborn and many other heavily Muslim areas of the US had plurality or near-plurality support for Trump, along with double-digit vote shares for Stein. It's notable that Stein supports cutting military support for Israel, which may signal a genuine preference rather than a protest vote. Many previously Democrat-voting Muslims explicitly cited a desire to punish Democrats as a major motivator for voting Trump or Stein.

Trump also has the advantage of not being in office, meaning he can make promises for brokering peace without having to pay the cost of actually doing so.

Thus, the cost of not voting Democrat in terms of your Gaza expectations may be low, or even negative.

Whatever happens, I think Democrats are going to take Muslim concerns about Gaza more seriously in future election cycles. The blackmail worked: Muslim Americans have a credible electoral threat against Democrats in the future.

lukehmiles on The hostile telepaths problem

What gaslighting goes on in math class?

tsvibt on What are the primary drivers that caused selection pressure for intelligence in humans?

IDK, fields don't have to have names, there's just lots of work on these topics. You could start here https://en.wikipedia.org/wiki/Evolutionary_anthropology and google / google-scholar around.

See also https://www.youtube.com/watch?v=tz-L2Ll85rM&list=PL1B24EADC01219B23&index=556 (I'm linking to the whole playlist, linking to a random old one because those are the ones I remember being good, IDK about the new ones).

directedevolution on Why our politicians aren't Median

This model also seems to rely on an assumption that there are more than two viable candidates, or that voters will refuse to vote at all rather than vote for a candidate who supports only half of their policy preferences.

If there were only two candidates and all voters chose whoever was closest to their policy preference, both would occupy the 20% block, since the extremes of the party would vote for them anyway.

But if there were three rigid categories and either three candidates, one per category, or voters refused to vote for a candidate not in their preferred category, then the model predicts more extreme candidates win.

I'm torn between the two for American elections, because:

The "correlated preferences" model here feels more true to life, psychologically.
Yet American politics goes from extremely disengaged primaries to a two-candidate FPTP general election, where the median voter theorem and the "correlated preferences" model seem to predict the same thing.
Voter turnout seems like a critically important part of democratic outcomes, and a model that only takes the order of policy preferences into account, rather than the intensity of those preferences, seems too limited.
Politicians often seem startlingly incompetent at inspiring the electorate, and it seems like we should think perhaps in "efficient market hypothesis" terms, where getting a political edge is extremely difficult because if anybody knew how to do it reliably, everybody would do it and the edge would disappear. In that sense, while both models can explain facets of candidate behavior and election outcomes, neither of them really offers a sufficiently detailed picture of elections to explain specific examples of election outcomes in a satisfying way.

towards_keeperhood on What are the primary drivers that caused selection pressure for intelligence in humans?

Thanks!

What's the "etc."? Or do you have links for where to learn more? (What's the name of the field?)

(I thought Wikipedia would give me a good overview, but your list was already more useful to me.)

lukehmiles on Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

I am impressed with how far you thought this through. Amend the Constitution, including the constitutional amendment section.

lukehmiles on Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

The opposing states in the coalition will simply declare war against the defectors. It's surely worth keeping your own army to keep being a swing bloc.