LessWrong 2.0 Reader

[link] The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
jessicata (jessica.liu.taylor) · 2024-03-27T19:59:27.893Z · comments (36)

EU policymakers reach an agreement on the AI Act
tlevin (trevor) · 2023-12-15T06:02:44.668Z · comments (7)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

OpenAI: Leaks Confirm the Story
Zvi · 2023-12-12T14:00:04.812Z · comments (9)

Reward hacking behavior can generalize across tasks
Kei · 2024-05-28T16:33:50.674Z · comments (5)

Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)

Send us example gnarly bugs
Beth Barnes (beth-barnes) · 2023-12-10T05:23:00.773Z · comments (10)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (4)

Attention SAEs Scale to GPT-2 Small
Connor Kissane (ckkissane) · 2024-02-03T06:50:22.583Z · comments (4)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (11)

The Parable Of The Fallen Pendulum - Part 2
johnswentworth · 2024-03-12T21:41:30.180Z · comments (8)

MATS Summer 2023 Retrospective
utilistrutil · 2023-12-01T23:29:47.958Z · comments (34)

Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (13)

[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

Value fragility and AI takeover
Joe Carlsmith (joekc) · 2024-08-05T21:28:07.306Z · comments (5)

Universal Love Integration Test: Hitler
Raemon · 2024-01-10T23:55:35.526Z · comments (65)

Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)

What is malevolence? On the nature, measurement, and distribution of dark traits
David Althaus (wallowinmaya) · 2024-10-23T08:41:33.197Z · comments (13)

Bitter lessons about lucid dreaming
avturchin · 2024-10-16T21:27:04.725Z · comments (61)

[link] AI takeoff and nuclear war
owencb · 2024-06-11T19:36:24.710Z · comments (6)

Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)

[question] What could a policy banning AGI look like?
TsviBT · 2024-03-13T14:19:07.783Z · answers+comments (23)

Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (21)

Lying Alignment Chart
Zack_M_Davis · 2023-11-29T16:15:28.102Z · comments (17)

On Claude 3.0
Zvi · 2024-03-06T18:50:04.766Z · comments (5)

Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (9)

Grief is a fire sale
Nathan Young · 2024-03-04T01:11:06.882Z · comments (1)

[Valence series] 3. Valence & Beliefs
Steven Byrnes (steve2152) · 2023-12-11T20:21:30.570Z · comments (11)

My 10-year retrospective on trying SSRIs
Kaj_Sotala · 2024-09-22T20:30:02.483Z · comments (10)

[link] The Offense-Defense Balance Rarely Changes
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-09T15:21:23.340Z · comments (23)

Analogies between scaling labs and misaligned superintelligent AI
scasper · 2024-02-21T19:29:39.033Z · comments (5)

JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (52)

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth (pktechgirl) · 2024-10-22T18:20:01.194Z · comments (77)

[link] Claude 3.5 Sonnet
Zach Stein-Perlman · 2024-06-20T18:00:35.443Z · comments (41)

Rationality Quotes - Fall 2024
Screwtape · 2024-10-10T18:37:55.013Z · comments (22)

The Obliqueness Thesis
jessicata (jessica.liu.taylor) · 2024-09-19T00:26:30.677Z · comments (16)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (4)

Vote on Anthropic Topics to Discuss
Ben Pace (Benito) · 2024-03-06T19:43:47.194Z · comments (55)

On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)

My guess at Conjecture's vision: triggering a narrative bifurcation
Alexandre Variengien (alexandre-variengien) · 2024-02-06T19:10:42.690Z · comments (12)

[link] Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds"
mattmacdermott · 2024-02-29T13:59:34.959Z · comments (19)

[link] The problems with the concept of an infohazard as used by the LW community [Linkpost]
Noosphere89 (sharmake-farah) · 2023-12-22T16:13:54.822Z · comments (43)

(Not) Derailing the LessOnline Puzzle Hunt
Error · 2024-06-04T01:28:31.688Z · comments (2)

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)
Thane Ruthenis · 2023-12-22T20:19:13.865Z · comments (14)

Neural uncertainty estimation review article (for alignment)
Charlie Steiner · 2023-12-05T08:01:32.723Z · comments (3)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets
GeneSmith · 2024-11-01T17:26:06.466Z · comments (16)

A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (16)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (12)

Recent comments

exceph on Scissors Statements for President?

In my experience, the first step in reconciling conflict is to understand one's own values, before listening to those of others. There are multiple reasons for this step, but the one relevant to your point is that by reflecting on the tradeoffs that I accept or reject and why, I can feel secure in listening to someone else's point of view. If their approach addresses my own concerns, then I can recognize it and that dissolves the disagreement. If it doesn't, then I know enough about what I really want to suggest modifications to their approach that would address my concerns. Either way, it keeps me safe from value-drift, especially on important principles like ethics.

Just because someone else has valid concerns doesn't mean I have to give up any of my own, but it doesn't mean we're at an impasse either. Humans have a habit of turning disagreements into false dichotomies. When they listen to each other, the conversation becomes, "alright, I understand your concerns, but you understand why mine are more important, right?" They are so quick to ask other people to sacrifice their values that they don't think of exploring alternative approaches, ones that can change the situation to fulfill the values of all the stakeholders. That's what I'm working on changing.

Does that all make sense?

cossontvaldes on I turned decision theory problems into memes about trolleys

I also found this hard to parse. I suggest the following edit:

Omega will send you the following message whenever it is true: "Exactly one of the following statements is true: (1) you will not pull the lever; (2) the stranger will not pull the lever." You receive the message. Do you pull the lever?
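
As a rough illustration of the suggested wording, the condition "exactly one of the statements is true" can be read as an exclusive-or over the two statements. The sketch below enumerates the four possible outcomes under that reading. The function name `message_is_sent` and the boolean encoding are assumptions made for illustration, not part of the original problem statement.

```python
from itertools import product


def message_is_sent(you_pull: bool, stranger_pulls: bool) -> bool:
    """Return True when exactly one of the two statements holds.

    Statement (1): you will not pull the lever.
    Statement (2): the stranger will not pull the lever.
    Assumption: Omega sends the message if and only if this is True.
    """
    statement_1 = not you_pull
    statement_2 = not stranger_pulls
    # "Exactly one is true" is an exclusive-or of the two statements.
    return statement_1 != statement_2


# Enumerate all four combinations of choices.
outcomes = {
    (you, stranger): message_is_sent(you, stranger)
    for you, stranger in product([False, True], repeat=2)
}

for (you, stranger), sent in outcomes.items():
    print(f"you pull={you}, stranger pulls={stranger}: message sent={sent}")
```

Under this reading, receiving the message tells you that exactly one of you and the stranger pulls the lever.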

abandon on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

I’ve reread the comment thread and I think I’ve figured out what went wrong here. Starting from a couple posts ago, it looks like you were assuming that the reason I thought you were wrong was that I disagreed with your reasons for believing that people sometimes feel that way, and were trying to offer arguments for that point. I, on the other hand, found it obvious that the issue was that you were privileging the hypothesis, and was confused about why you were arguing the object-level premises of the post, which I hadn’t mentioned; this led me to assume it was a non sequitur and respond with attempted clarifications of the presumed misunderstanding.
To clarify, I agree that some people view old things negatively. I don’t take issue with the claim that they do; I take issue with the claim that this is the likeliest or only possible explanation. (I do, however, think disagree-voting Anders' comment is a somewhat implausible way for someone to express that feeling, which for me is a reason to downweight the hypothesis.) I think you’re failing to consider sufficient breadth in the hypothesis-space, and in particular the mental move of assuming my disagreement was with the claim that your hypothesis is possible (rather than several steps upstream of that) is one which can make it difficult to model things accurately.

matt-putz on 5 homegrown EA projects, seeking small donors

Thanks for the feedback! I’ll forward it to our team.

I think I basically agree with you that from reading the RFP page, this project doesn’t seem like a central example of the projects we’re describing (and indeed, many of the projects we do fund through this RFP are more like the examples given on the RFP page).

Some quick reactions:

FWIW, our team generally makes a lot of grants that are <$100k (much more so than other Open Phil teams).
I agree the application would probably take most people longer than the description that Gavin gave on Manifund. That said, I think it’s still relatively lean considering the distribution of projects we fund, though I agree it’s slightly long for projects as small as this one (but I think Gavin could have filled it out in <<2 days). For reference, this is our form.
Regarding turnaround time, my guess is for this project, we would have taken significantly less than 3 months, especially if they had indicated that receiving a decision was time-sensitive. For reference, the form currently says:

We expect to make most funding decisions in 3 months or less (assuming prompt responses to any follow-up questions we may have), and we may or may not be able to accommodate requests for greater time-sensitivity. Applicants asking for over $500K should expect a decision to take the full 3 months (or more, in particularly complex cases), and apply in advance accordingly. We’ll let you know as soon as we can if we anticipate a longer than 3-month decision timeline. [emphasis in original]

For $500k+ projects, I think a 3-month turnaround time is more defensible, though I do personally wish we generally had faster response times.

charlie-steiner on Understanding incomparability versus incommensurability in relation to RLHF

Is separate evaluation + context-insensitive aggregation actually how helpful + harmless is implemented in a reward model for any major LLM? I think Anthropic uses finetuning on a mixture of specialized training sets (plus other supervision that's more holistic) which is sort of like this but allows the model to generalize in a way that compresses the data, not just a way that keeps the same helpful/harmless tradeoff.

Anyhow, of course we'd like to use the "beneficial for humanity" goal, but sadly we don't have access to it at the moment :D Working on it.

abandon on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

... No, I mean I'm discussing your statement "I'm curious why you were downvoted.... I will just assume that they're rationalists who dislike (and look down on) traditional/old things for moral reasons. This is not very flattering of me but I can't think of better explanations." I think the explanation you thought of is not a very likely one, and that you should not assume that it is true, but rather assume that you don't know and (if you care enough to spend the time) keep trying to think of explanations. I'm not taking any position on Anders' statement, though in the interests of showing the range of possibilities I'll offer some alternative explanations for why someone might have disagree-voted it.
-They might think that stuff that works is mixed with stuff that doesn't
-They might think that trial and error is not very powerful in this context
-They might think that wisdom which works often comes with reasonably-accurate causal explanations
-They might think that ancient wisdom is good and Anders is being unfairly negative about it
-They might think that ancient wisdom doesn't usually apply to real problems
Et cetera. There are a lot of possible explanations, and I think being confident it's the specific one you thought of is unwarranted.

gordon-seidoh-worley on Fundamental Uncertainty: Chapter 9 - How do we live with uncertainty?

Yes, red is perhaps the most useful color to be able to see! That's why I chose to use it in this example.

abandon on Quantum Immortality: A Perspective if AI Doomers are Probably Right

I would argue that personal identity is based on a destructible body.

startattheend on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

You don't think the entire western world is biased in favor of science to a degree which is a little naive? In addition to this, I think that people idolize intelligence and famous scientists, that they largely consider people born before the 1950s to have repulsive moral values, that they dislike tradition, that they consider it very important to be "educated", that they overestimate book smarts and underestimate the common sense of people living simple lives, and that they believe that things generally improve over time (such that older books are rarely worth bothering with), and I believe that social status in general makes people associate with newer ideas over older ones. There are also a lot of people who have grown up around old, strict and religious people and who now dislike these. It doesn't help that more intelligent people are higher in openness in general, and that rationalism correlates with a materialistic and mechanical worldview.

Many topics receive a lot more hostility than they deserve because of these biases, and usually because they're explained in a crazy way (for instance, Carl Jung's ideas are often called pseudoscience, and if you take the Bible literally then it's clearly wrong) or because people associate them with immorality (say, the idea that casual sex was disliked by traditional people because they were mean and narrow-minded, and not because casual sex caused problems for them, or because it might cause problems for us).

A lot of things are disliked or discarded despite being useful, and a lot of wisdom is in this category. All of this was packed in the message that "people dislike old things because it sounds irrational or immoral" (people tend to dislike long comments)

vladimir_nesov on Quantum Immortality: A Perspective if AI Doomers are Probably Right

who seems to think that first-person perspective is illusion and only third-person perspective is real

The taste of cheese is quite real, it's just not a technical consideration relevant for chip design. Concepts worth noticing are usually meaningful in some way, but most of them are unclear and don't offer a technical foothold in any given endeavor.