LessWrong 2.0 Reader

Environmental allergies are curable? (Sublingual immunotherapy)
Chipmonk · 2023-12-26T19:05:08.880Z · comments (10)

Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley (roger-d-1) · 2024-01-09T20:42:28.349Z · comments (8)

2023 Prediction Evaluations
Zvi · 2024-01-08T14:40:07.377Z · comments (0)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

[link] on neodymium magnets
bhauth · 2024-01-30T15:58:24.088Z · comments (6)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

How to safely use an optimizer
Simon Fischer (SimonF) · 2024-03-28T16:11:01.277Z · comments (21)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (23)

shortest goddamn bayes guide ever
lemonhope (lcmgcd) · 2024-05-10T07:06:23.734Z · comments (8)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (59)

4. Existing Writing on Corrigibility
Max Harms (max-harms) · 2024-06-10T14:08:35.590Z · comments (15)

How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation")
Ruby · 2024-07-19T00:31:38.332Z · comments (21)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (33)

Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
Towards_Keeperhood (Simon Skade) · 2024-05-06T17:09:10.729Z · comments (16)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

Big Picture AI Safety: Introduction
EuanMcLean (euanmclean) · 2024-05-23T11:15:44.037Z · comments (7)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

1. The CAST Strategy
Max Harms (max-harms) · 2024-06-07T22:29:13.005Z · comments (19)

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

How to hire somebody better than yourself
lemonhope (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

AI #68: Remarkably Reasonable Reactions
Zvi · 2024-06-13T16:30:02.969Z · comments (11)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

On OpenAI’s Model Spec
Zvi · 2024-06-21T13:00:03.014Z · comments (3)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Trustworthy and untrustworthy models
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

On the Proposed California SB 1047
Zvi · 2024-02-12T16:40:04.854Z · comments (18)

AI Safety 101 : Capabilities - Human Level AI, What? How? and When?
markov (markovial) · 2024-03-07T17:29:53.260Z · comments (8)

I'm open for projects (sort of)
cousin_it · 2024-04-18T18:05:01.395Z · comments (13)

[link] If Clarity Seems Like Death to Them
Zack_M_Davis · 2023-12-30T17:40:42.622Z · comments (191)

Saving the world sucks
Defective Altruism (Elijah Bodden) · 2024-01-10T05:55:46.504Z · comments (29)

AI #41: Bring in the Other Gemini
Zvi · 2023-12-07T15:10:05.552Z · comments (16)

Recent comments

cleo-nardo on Counting AGIs

Thanks for putting this together — very useful!

cesiumquail on Ultralearning in 80 days

Cool! I’m going to add my thoughts here, but I’m no authority so feel free to ignore and do whatever feels best.

Waking up early is fine as long as you’re also going to bed early. Chronic sleep deprivation is bad.

If you’re studying CS, give special attention to machine learning and the current AI landscape. It’s hard to predict what AI will look like in five years, but it’s the most important thing to be tracking.

If learning Quenya is fun and intrinsically rewarding, then that’s great, but if you’re doing it for practical reasons there are probably more efficient options. I actually have a system for writing things I don’t want anyone to read. I write in English, but I replace key words with other words based on associations that only I would find meaningful. This requires no preparatory memorization and is basically impossible to decrypt without my brain, as long as I don’t give away the meaning with context clues.

For writing, the two essential things are to have good ideas and to communicate them clearly. In my opinion Scott Alexander is the best example of this, so here’s his guide to nonfiction writing. I endorse just copying his style unless you find something you like better.

I would add a few things about writing:

Make everything predictable and standard except the most important parts that you want to emphasize.
Be honest and use the tone that feels most natural.
Spend most of your effort searching for the best ideas. Then just write them down clearly.

For general rationality, books aren’t all that helpful in my opinion. There’s a sensitivity to the specifics of each situation that’s hard to transmit except by direct example. I think you would get more out of following people who seem smart. I endorse Eliezer Yudkowsky, Scott Alexander, Wei Dai, Gwern, Connor Leahy, Dwarkesh Patel, and Stefan Schubert.

abandon on If You Demand Magic, Magic Won't Help

How much of this was written by an LLM?

renato on Ultralearning in 80 days

I guess you will have several recurring tasks and some short/medium-term goals, so I'd recommend using something like this to track how calibrated your predictions/estimates are:

https://www.lesswrong.com/posts/8JEHPAcJ6ppywtkqK/calibrated-estimation-of-workload

It helps you not only to organize what you are doing and track how you are progressing, but also to cultivate a better sense of how much you can do, and to get used to making quantified predictions using the short feedback loop of your tasks. It doesn't automatically transfer to other domains, but at least you will have a better framework for making predictions about other things; e.g., you will have a clearer idea of what it means to say that "something should happen with x% chance."

It doesn't take much effort once you get used to it, and if you are going to keep a to-do list anyway, the predictions add almost no extra burden. Checking the results is mostly automatic (you can experiment with other ways to look at the data, e.g. by how long-range the predictions are, or by project or kind of task), and it gives you good feedback on how to adjust the predictions you make next. It also gives you a better view of what you can realistically do each day, which helps you prioritize what is most important. For example, after I automatically predict what I have to do on a given day, I can review those predictions against the load I know I can handle and other past information, to get a better estimate of what I expect to accomplish that day.

Additionally, there is no guilt after failing to do everything, because the idea is to push yourself and correct until you can finish the expected number of tasks.

I noticed I could push myself to do more this way than with a plain to-do list, where I could simply balance how much I needed to work against how much I could procrastinate and still finish what I'd set. I could also set goals or more abstract tasks, e.g. "finish a big project," and then break them into smaller goals/tasks to track my progress and distribute the load until the deadline, instead of working in small bursts and then trying to do too much as the deadline approached.

The only caveat is that you will be tempted to game your predictions, e.g. by focusing on the tasks with higher predicted probabilities, because you are expected to complete them more often and don't want to mess up your calibration curve. But you will soon learn to incorporate this kind of information into your predictions. It is also possible to use this to your advantage later: for example, pick a task that repels you and keeps getting postponed, assign a higher probability that you will do it, and then just do it because you said you were going to.
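A minimal sketch of the calibration check described in this comment, assuming each prediction is logged as a (stated probability, completed?) pair. The function names `calibration_table` and `brier_score` are hypothetical illustrations, not taken from the linked post, and the Brier score is an extra summary metric added here rather than something the comment mentions. Predictions are grouped to the nearest 10%, and each group's stated probability is compared with the fraction of those tasks actually completed.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (probability, done) pairs by stated probability (nearest 10%)
    and return {stated: (observed completion rate, number of tasks)}."""
    buckets = defaultdict(list)
    for prob, done in predictions:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")
        # Round to the nearest 10% so similar predictions share a bucket.
        buckets[round(prob * 10) / 10].append(bool(done))
    return {
        stated: (sum(outcomes) / len(outcomes), len(outcomes))
        for stated, outcomes in sorted(buckets.items())
    }


def brier_score(predictions):
    """Mean squared gap between stated probability and outcome (0 = perfect)."""
    return sum((prob - float(done)) ** 2 for prob, done in predictions) / len(predictions)


if __name__ == "__main__":
    # Hypothetical log of one week's task predictions.
    log = [
        (0.9, True), (0.9, True), (0.9, False), (0.9, True),
        (0.5, True), (0.5, False),
        (0.7, True), (0.7, True), (0.7, False),
    ]
    for stated, (observed, n) in calibration_table(log).items():
        print(f"said {stated:.0%}: done {observed:.0%} of {n} tasks")
    print(f"Brier score: {brier_score(log):.3f}")
```

If the observed rate in a bucket is consistently below the stated probability, you are overconfident at that level and can shade future predictions down; with only a handful of tasks per bucket the rates are noisy, so it helps to look at several weeks together.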

gasstationmanager on GasStationManager's Shortform

Any Lean enthusiasts here? You might be interested in checking out Code with Proofs: the Arena. Test your Lean skills with our coding challenges!

Code for the website is open sourced at https://github.com/GasStationManager/CodeProofTheArena

cronodas on "The Solomonoff Prior is Malign" is a special case of a simpler argument

That might be okay. But I reserve the right to refuse to treat any possible "mind" that does not participate in the arrow of time as though it did not exist.

hleumas on a space habitat design

I don't see what the issue is with a rotating cylinder and stability. As I imagine it, the cylinder has radius << length, so its axis of rotation will be the one with the smallest possible moment of inertia and thus should be stable.

The Dzhanibekov effect applies only to rotation about the second (intermediate) principal axis, so it should be relevant only for cylinders whose radius is similar to their length.
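To make the "smallest moment of inertia" step concrete, here is a short comparison, assuming the habitat is idealized as a uniform thin-walled cylindrical shell of mass m, radius r and length L with open ends (a real habitat has end caps and internal mass, which shift the numbers). The moment about the long axis is compared with the moment about a transverse axis through the center:

```latex
I_{\parallel} = m r^2, \qquad
I_{\perp} = m\left(\frac{r^2}{2} + \frac{L^2}{12}\right)

I_{\parallel} < I_{\perp}
\iff \frac{r^2}{2} < \frac{L^2}{12}
\iff L > \sqrt{6}\, r
```

For a uniform solid cylinder the same comparison, $\tfrac{1}{2} m r^2 < \tfrac{1}{12} m (3r^2 + L^2)$, gives $L > \sqrt{3}\, r$. Either way, a cylinder with radius << length spins about its minimum-moment axis, and the two transverse moments are equal by symmetry.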

noggin-scratcher on Ultralearning in 80 days

While I can appreciate it on the level of nerd aesthetics, I would be dubious of the choice of Quenya. Unless you're already a polyglot (as a demonstration of your aptitude for language-learning), it seems unlikely—without a community of speakers to immerse yourself in—that you'll reach the kind of fluid fluency that would make it natural to think in a conlang.

And if you do in fact have the capacity to acquire a language to that degree of fluency so easily, but don't already have several of the major world languages, it seems to me that the benefits of being able to communicate with an additional fraction of the world's population would outweigh those of knowing a language selected for mostly no-one else knowing it.

james-camacho on [bounty $100] Why are there no interesting (1D, 2-state) quantum cellular automata?

I think you're looking for the irreducible representations of . I'll come back and explain this later, but it's going to take a while to write up.

dana on Crosspost: Developing the middle ground on polarized topics

I interpret the main argument as:
You cannot predict the direction of policy that would result from certain discussions/beliefs
The discussions improve the accuracy of our collective world model, which is very valuable
Therefore, we should have the discussions first and worry about policy later.

I agree that in many cases there will be unforeseen positive consequences as a result of the improved world model, but in my view, it is obviously false that we cannot make good directionally-correct predictions of this sort for many X. And the negative will clearly outweigh the positive for some large group in many cases. In that case, the question is how much you are willing to sacrifice for the collective knowledge.

If you want to highlight people who handle this well, the only interesting case is people from group A in favor of discussing X where X is presumed to lead to Y and Y negatively impacts A. Piper's X has a positive impact on her beliefs (discussing solutions to falling birth-rates as one who believes it is a problem), and Caplan's X has a positive impact on him (he is obviously high IQ), so neither of these are interesting samples. There is no reason for either of these to inherently want to avoid discussing these X. Even worse, Caplan's rejected "Y" is a clear strawman, which begs the question and actually negatively updates me on his beliefs. More realistic Ys are things like IQ-based segregation, resource allocation, reproductive policies, etc.

If I reject these Ys for ideological reasons, and the middle ground looks like what I think it looks like, I do not want to expose the middle ground.