LessWrong 2.0 Reader

[link] AI governance needs a theory of victory
Corin Katzke (corin-katzke) · 2024-06-21T16:15:46.560Z · comments (6)

[link] Inferring the model dimension of API-protected LLMs
Ege Erdil (ege-erdil) · 2024-03-18T06:19:25.974Z · comments (3)

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour (aaron-kaufman) · 2024-10-05T11:30:11.953Z · comments (2)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (36)

(Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need
Sodium · 2024-10-03T19:11:58.032Z · comments (17)

Augmenting Statistical Models with Natural Language Parameters
jsteinhardt · 2024-09-20T18:30:10.816Z · comments (0)

Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (26)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (44)

One way violinists fail
Solenoid_Entity · 2024-05-29T04:08:17.675Z · comments (5)

[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)

Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols
Arjun Panickssery (arjun-panickssery) · 2024-01-15T21:21:03.962Z · comments (0)

Important open problems in voting
Closed Limelike Curves · 2024-07-01T02:53:44.690Z · comments (1)

How good are LLMs at doing ML on an unknown dataset?
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-07-01T09:04:03.687Z · comments (4)

[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)

The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)

Templates I made to run feedback rounds for Ethan Perez’s research fellows.
Henry Sleight (ResentHighly) · 2024-03-28T19:41:15.506Z · comments (0)

Love, Reverence, and Life
Elizabeth (pktechgirl) · 2023-12-12T21:49:04.061Z · comments (7)

AI #63: Introducing Alpha Fold 3
Zvi · 2024-05-09T14:20:03.176Z · comments (2)

Musings on LLM Scale (Jul 2024)
Vladimir_Nesov · 2024-07-03T18:35:48.373Z · comments (0)

We have promising alignment plans with low taxes
Seth Herd · 2023-11-10T18:51:38.604Z · comments (9)

An illustrative model of backfire risks from pausing AI research
Maxime Riché (maxime-riche) · 2023-11-06T14:30:58.615Z · comments (3)

Rational Animations offers animation production and writing services!
Writer · 2024-03-15T17:26:07.976Z · comments (0)

Helpful examples to get a sense of modern automated manipulation
trevor (TrevorWiesinger) · 2023-11-12T20:49:57.422Z · comments (3)

Disentangling four motivations for acting in accordance with UDT
Julian Stastny · 2023-11-05T21:26:22.514Z · comments (3)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)
Diffractor · 2024-04-18T08:39:13.368Z · comments (2)

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"
NickyP (Nicky) · 2024-07-23T12:43:18.681Z · comments (3)

Monthly Roundup #20: July 2024
Zvi · 2024-07-23T12:50:07.991Z · comments (9)

[link] patent process problems
bhauth · 2024-07-14T21:12:04.953Z · comments (13)

[link] The Cancer Resolution?
PeterMcCluskey · 2024-07-24T00:25:17.322Z · comments (24)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · 2023-11-29T18:11:53.252Z · comments (16)

[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)

[link] FTX expects to return all customer money; clawbacks may go away
Mikhail Samin (mikhail-samin) · 2024-02-14T03:43:13.218Z · comments (1)

Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)

[link] Vacuum: Theory and Technologies
ethanmorse · 2024-01-21T17:23:49.257Z · comments (0)

DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)

One True Love
Zvi · 2024-02-09T15:10:05.298Z · comments (7)

Sparse autoencoders find composed features in small toy models
Evan Anders (evan-anders) · 2024-03-14T18:00:43.339Z · comments (12)

LLMs can strategically deceive while doing gain-of-function research
Igor Ivanov (igor-ivanov) · 2024-01-24T15:45:08.795Z · comments (4)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

Boston Solstice 2023 Retrospective
jefftk (jkaufman) · 2024-01-02T03:10:05.694Z · comments (0)

[link] Twitter thread on open-source AI
Richard_Ngo (ricraz) · 2024-07-31T00:26:11.655Z · comments (6)

Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)

[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)

Monthly Roundup #16: March 2024
Zvi · 2024-03-19T13:10:05.529Z · comments (4)

5. Moral Value for Sentient Animals? Alas, Not Yet
RogerDearnaley (roger-d-1) · 2023-12-27T06:42:09.130Z · comments (41)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them
Roman Leventov · 2023-12-27T14:51:37.713Z · comments (9)

More on the Apple Vision Pro
Zvi · 2024-02-13T17:40:05.388Z · comments (5)

Experimentation (Part 7 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-18T21:25:56.527Z · comments (0)

Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)

Recent comments

ninety-three on Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

This proposal increases the influence of the states, in the sense of "how much does it matter that any given person bothered to vote?", but does it increase their preference satisfaction? If the 4 states each conceive of themselves as red or blue states, then each of them will be thinking "under the current system I estimate an X% chance that we'll elect my party's president while under the new system I estimate a Y% chance we'll elect my party's president". If both sides are perfect predictors then one will conclude that Y<X so they should not do the deal. If both sides are imperfect predictors such that they both think Y>X, then the outside view still tells them it's equally likely that they're the sucker here and shouldn't participate.
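The zero-sum step in the comment above can be checked with a small sketch. It assumes a strictly two-party contest in which exactly one party wins, so the other side's chance is one minus yours; the function names and the example probabilities are hypothetical, chosen only for illustration.

```python
def other_side_chances(x: float, y: float) -> tuple[float, float]:
    """Given my party's win chance under the current system (x) and the
    merged system (y), return the other party's chances under each.
    Assumption: exactly two parties, exactly one winner."""
    return 1.0 - x, 1.0 - y


def both_sides_gain(x: float, y: float) -> bool:
    """True only if the merger raises the win chance for both parties."""
    other_x, other_y = other_side_chances(x, y)
    return y > x and other_y > other_x


# Hypothetical estimates: my side goes from 40% to 55% under the merger,
# so the other side necessarily goes from 60% to 45%.
print(other_side_chances(0.40, 0.55))
print(both_sides_gain(0.40, 0.55))
```

Under that assumption no pair of accurate estimates lets both sides gain, which is why two sides that both believe Y>X cannot both be right.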

aphyer on Gurkenglas's Shortform

Were whichever markets you're looking at open at this time? Most stuff doesn't trade that much out of hours.

aprilsr on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

I mostly think it's too loose a heuristic and that you should dig into more details

thomas-kwa on Thomas Kwa's Shortform

What's the most important technical question in AI safety right now?

npostavs on AI #89: Trump Card

Finding two bugs in a large codebase doesn't seem especially suspicious to me.

sharmake-farah on Anthropic: Three Sketches of ASL-4 Safety Case Components

While I agree that people are in general overconfident, including LessWrongers, I don't particularly think this is because Bayesianism is philosophically incorrect. Rather, I think it comes from practical limits on computation, combined with people sometimes not realizing how data-poor their efforts truly are.

(There are philosophical problems with Bayesianism, but not ones that predict very well the current issues of overconfidence in real human reasoning, so I don't see why Bayesianism is so central here. Separately, while I'm not sure there can ever be a complete theory of epistemology, I do think that Bayesianism is actually quite general, and a lot of the principles of Bayesianism are probably implemented in human brains, allowing for practicality concerns like cost of compute.)

ricraz on Anthropic: Three Sketches of ASL-4 Safety Case Components

We have discussed this dynamic before but just for the record:

I think that if it became industry-standard practice for AGI corporations to write, publish, and regularly update (actual instead of just hypothetical) safety cases at this level of rigor and detail, my p(doom) would cut in half.

This is IMO not the type of change that should be able to cut someone's P(doom) in half. There are so many different factors that are of this size and importance or bigger (including many that people simply have not thought of yet) such that, if this change could halve your P(doom), then your P(doom) should be oscillating wildly all the time.

I flag this as an example of prioritizing inside-view considerations too strongly in forecasts. I think this is the sort of problem that arises when you "take bayesianism too seriously", which is one of the reasons why I wrote my recent post on why I'm not a bayesian (and also my earlier post on Knightian uncertainty).

For context: our previous discussions about this related to Daniel's claim that appointing one specific person to one specific important job could change his P(doom) by double digit percentage points. I similarly think this is not the type of consideration that should be able to swing people's P(doom) that much (except maybe changing the US or Chinese leaders, but we weren't talking about those).

Lastly, since this is a somewhat critical comment, I should flag that I really appreciate and admire Daniel's forecasting, have learned a lot from him, and think he's generally a great guy. The epistemology disagreements just disproportionately bug me.

exceph on Scissors Statements for President?

In my experience, the first step in reconciling conflict is to understand one's own values, before listening to those of others. There are multiple reasons for this step, but the one relevant to your point is that by reflecting on the tradeoffs that I accept or reject and why, I can feel secure in listening to someone else's point of view. If their approach addresses my own concerns, then I can recognize it and that dissolves the disagreement. If it doesn't, then I know enough about what I really want to suggest modifications to their approach that would address my concerns. Either way, it keeps me safe from value-drift, especially on important principles like ethics.

Just because someone else has valid concerns doesn't mean I have to give up any of my own, but it doesn't mean we're at an impasse either. Humans have a habit of turning disagreements into false dichotomies. When they listen to each other, the conversation becomes, "alright, I understand your concerns, but you understand why mine are more important, right?" They are so quick to ask other people to sacrifice their values that they don't think of exploring alternative approaches, ones that can change the situation to fulfill the values of all the stakeholders. That's what I'm working on changing.

Does that all make sense?

cossontvaldes on I turned decision theory problems into memes about trolleys

I also found this hard to parse. I suggest the following edit:

Omega will send you the following message whenever it is true: "Exactly one of the following statements is true: (1) you will not pull the lever; (2) the stranger will not pull the lever." You receive the message. Do you pull the lever?
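A quick way to see what the suggested wording commits to is to enumerate the four possible outcomes. This is only a sketch of the logical form of the message ("exactly one of the two statements is true" is an exclusive-or); it does not model Omega's prediction or any decision theory, and the function names are hypothetical.

```python
from itertools import product


def message_is_true(you_pull: bool, stranger_pulls: bool) -> bool:
    """Exactly one of (1) 'you will not pull' and (2) 'the stranger
    will not pull' holds, i.e. an exclusive-or of the two statements."""
    return (not you_pull) != (not stranger_pulls)


def consistent_outcomes() -> list[tuple[bool, bool]]:
    """All (you_pull, stranger_pulls) pairs in which the message is true."""
    return [
        (you, stranger)
        for you, stranger in product([False, True], repeat=2)
        if message_is_true(you, stranger)
    ]


for you, stranger in consistent_outcomes():
    print(f"you pull: {you}, stranger pulls: {stranger}")
```

The message is true exactly when one of you pulls and the other does not, so receiving it rules out the two outcomes where you both act the same way.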

abandon on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

I’ve reread the comment thread and I think I’ve figured out what went wrong here. Starting from a couple posts ago, it looks like you were assuming that the reason I thought you were wrong was that I disagreed with your reasons for believing that people sometimes feel that way, and were trying to offer arguments for that point. I, on the other hand, found it obvious that the issue was that you were privileging the hypothesis, and was confused about why you were arguing the object-level premises of the post, which I hadn’t mentioned; this led me to assume it was a non sequitur and respond with attempted clarifications of the presumed misunderstanding.

To clarify, I agree that some people view old things negatively. I don’t take issue with the claim that they do; I take issue with the claim that this is the likeliest or only possible explanation. (I do, however, think disagree-voting Anders' comment is a somewhat implausible way for someone to express that feeling, which for me is a reason to downweight the hypothesis.) I think you’re failing to consider sufficient breadth in the hypothesis-space, and in particular the mental move of assuming my disagreement was with the claim that your hypothesis is possible (rather than several steps upstream of that) is one which can make it difficult to model things accurately.