LessWrong 2.0 Reader

Auditing failures vs concentrated failures
ryan_greenblatt · 2023-12-11T02:47:35.703Z · comments (0)

What does davidad want from «boundaries»?
Chipmonk · 2024-02-06T17:45:42.348Z · comments (1)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

Userscript to always show LW comments in context vs at the top
Vlad Sitalo (harcisis) · 2023-11-21T17:53:30.418Z · comments (8)

On Trust
johnswentworth · 2023-12-06T19:19:07.680Z · comments (26)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas (gytis-daujotas) · 2024-08-01T21:08:38.800Z · comments (6)

Commonsense Good, Creative Good
jefftk (jkaufman) · 2023-09-27T19:50:07.486Z · comments (11)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

Job Listing: Managing Editor / Writer
Gretta Duleba (gretta-duleba) · 2024-02-21T23:41:26.818Z · comments (2)

Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (4)

[question] Where is the Town Square?
Gretta Duleba (gretta-duleba) · 2024-02-13T03:53:18.205Z · answers+comments (8)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

[link] Non-alignment project ideas for making transformative AI go well
Lukas Finnveden (Lanrian) · 2024-01-04T07:23:13.658Z · comments (1)

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld
DanielFilan · 2023-10-03T21:50:07.552Z · comments (0)

[link] Jacob on the Precipice
Richard_Ngo (ricraz) · 2023-09-26T21:16:39.590Z · comments (8)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (50)

The Next ChatGPT Moment: AI Avatars
kolmplex (luke-man) · 2024-01-05T20:14:10.074Z · comments (10)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

Why does generalization work?
Martín Soto (martinsq) · 2024-02-20T17:51:10.424Z · comments (16)

My intellectual journey to (dis)solve the hard problem of consciousness
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-06T09:32:41.612Z · comments (41)

Ambiguity in Prediction Market Resolution is Still Harmful
aphyer · 2024-07-31T20:32:40.217Z · comments (17)

Is Chinese total factor productivity lower today than it was in 1956?
Ege Erdil (ege-erdil) · 2023-08-18T22:33:50.560Z · comments (0)

The Sinews of Sudan’s Latest War
Tim Liptrot (rockthecasbah) · 2023-08-04T18:17:27.860Z · comments (12)

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)

Autonomous replication and adaptation: an attempt at a concrete danger threshold
Hjalmar_Wijk · 2023-08-17T01:31:10.554Z · comments (0)

Recreating the caring drive
Catnee (Dmitry Savishchev) · 2023-09-07T10:41:16.453Z · comments (14)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

[link] rapid growth
Chipmonk · 2024-06-05T00:43:51.501Z · comments (0)

Childhood and Education Roundup #4
Zvi · 2024-01-30T13:50:06.033Z · comments (10)

[link] An EPUB of Arbital's AI Alignment section
mesaoptimizer · 2023-10-16T19:36:29.109Z · comments (1)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

Incidental polysemanticity
Victor Lecomte (victor-lecomte) · 2023-11-15T04:00:00.000Z · comments (7)

2023 LessWrong Community Census, Request for Comments
Screwtape · 2023-11-01T16:32:19.102Z · comments (37)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

[link] How bad is chlorinated water?
bhauth · 2023-12-13T18:00:12.640Z · comments (18)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (3)

The Case for Predictive Models
Rubi J. Hudson (Rubi) · 2024-04-03T18:22:20.243Z · comments (7)

NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts
Mikhail Samin (mikhail-samin) · 2023-12-27T18:44:33.976Z · comments (17)

Debate: Get a college degree?
Ben Pace (Benito) · 2024-08-12T22:23:34.744Z · comments (14)

Taking responsibility and partial derivatives
Ruby · 2023-12-31T04:33:51.419Z · comments (1)

[question] What rationality failure modes are there?
Ulisse Mini (ulisse-mini) · 2024-01-19T09:12:57.924Z · answers+comments (11)

How toy models of ontology changes can be misleading
Stuart_Armstrong · 2023-10-21T21:13:56.384Z · comments (0)

Navigating emotions in an uncertain & confusing world
Akash (akash-wasil) · 2023-11-20T18:16:09.492Z · comments (1)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)


Recent comments

lukas-finnveden on Sabotage Evaluations for Frontier Models

There are at least two different senses in which "control" can "fail" for a powerful system:

1. Control evaluations can indicate that there's no way to deploy the system such that you both (i) get a lot of use out of it, and (ii) can get a low probability of catastrophe.
2. Control evaluations are undermined such that humans think that the model can be deployed safely, but actually the humans were misled and there's a high probability of catastrophe.

My impression is that Ryan & Buck typically talk about the first case. (E.g. in the link above.) I.e.: My guess would be that they're not saying that well-designed control evaluations become untrustworthy, just that they'll stop promising you safety.

But to be clear: In this question, you're asking about something more analogous to the second case, right? (Sabotage/sandbagging evaluations being misleading about models' actual capabilities at sabotage & sandbagging?)

My question posed in other words: Would you count "evaluations clearly say that models can sabotage & sandbag" as success or failure?

yoav-ravid on Overcoming Bias Anthology

Typo: It's Prediction Markets "Fail" To *Mooch (not Moloch)

skybluecat on Bitter lessons about lucid dreaming

Don't know if this counts but I sort of can affect and notice dreams without being really lucid in the sense of clearly knowing it's a dream. It feels more like I somehow believe everything is real but I have superpowers (like becoming a superhero), and I would use the powers in ways that make sense in the dream setting, instead of being my waking self and consciously choosing what I want to dream of next. As a kid, I noticed I could often fly when chased by enemies in my dreams, and later I could do more kinds of things in my dreams just by willing it, perhaps as a result of consuming too many scifi or fantasy books and games. And I noticed some recurrent patterns in my dreams, like places that don't exist in real life but that dreaming-me believes to be my school or hometown. Sometimes I get a strange sense of "I dreamed of this before" when I somehow feel like I have had the same or similar dreams as I'm having now, but without really realizing that I'm dreaming or remembering who I am in waking life. Then I subconsciously know I can do these things, or can focus on seeing and memorizing more of the dream world (if it was interesting) so I can write it down after waking up.

david-johnston on A brief theory of why we think things are good or bad

I think precisely defining "good" and "bad" is a bit beside the point - it's a theory about how people come to believe things are good and bad, and we're perfectly capable of having vague beliefs about goodness and badness. That said, the theory is lacking a precise account of what kind of beliefs it is meant to explain.

The LLM section isn't meant as support for the theory, but speculation about what it would say about the status of "experiences" that language models can have. Compared to my pre-existing notions, the theory seems quite willing to accommodate LLMs having good and bad experiences on par with those that people have.

directedevolution on Alexander Gietelink Oldenziel's Shortform

Sunglasses aren’t cool. They just tint the allure the wearer already has.

johnswentworth on Slightly More Than You Wanted To Know: Pregnancy Length Effects

Huh. Sounds like longer gestation is straightforwardly pretty good for the baby, but is then counterbalanced by difficult birth if it gets too big, which would mean the baby-in-a-bag thing could actually yield some substantial benefits over a normal pregnancy. Like, you could maybe leave the baby in there a few extra weeks and it would be great.

seth-herd on A brief theory of why we think things are good or bad

You just have to explain what you mean by "good" or "bad". They're very vague terms. Often they mean "things I like" or "things I and those I like like". "Those I like" could be anywhere between one other person I like a little, and every thing that can think even a little getting equal weight to myself. People mean all of those things by "good". You can guess by context what someone might mean, but if you want to have a clear discussion, it's best to specify.

As for whether those other information-processing systems (like LLMs and bugs) really have opinions about what is good and bad for them in the same rich way humans seem to, that is a separate question.

skybluecat on I'm consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful?

Wow I just saw this on the frontpage and thought I sometimes feel like this too, although about slightly different things and without that much heart-racing. I'm late and there are already many good answers, but here are my extreme and possibly horrible lifehacks for when I'm struggling/feeling lazy during the pandemic:

tldr: Like others said, get away with fewer chores.

I haven't ironed or folded clothes since like forever. (If you really care about that, maybe find clothes that look OK without ironing, idk). I don't go out or exert myself that much, don't change clothes that often if I don't want to, and bought plenty of similar clothes (online, in bulk or used) so I can let laundry pile up more and do more at a time (assuming I can use a washing machine; if you don't have one or can't easily access one - say the laundromat is too far from your home, there are other tricks for hand washing). Maybe unethical tip: most people don't need to shower that often either (just be less self-conscious unless someone important in your life minds it) and can use dry shampoo (or body powder/corn starch), alcohol and wet wiping etc to delay the need for showering and shampooing, in case these tasks are unappealing to you; also silk clothes are known to absorb oil from the skin better and can last longer before having to be cleaned - you can get lots of used silk shirts for cheap, and they are comfortable too if you have sensory issues or hate static.

I rarely need to "do dishes" as I cook for myself and can take shortcuts/lower standards. You could use disposable plastic cutlery and paper dishes, but I don't like them. Instead I use quality nonstick pots and pans (imo important!) that just need a gentle wipe, and 1-2 microwave-safe containers if necessary, and either cook one-pot meals, or batch cook things like stews (or order family-sized delivery for savings, if it's something I like and can't easily cook myself) and portion and freeze them to be reheated on the plate in a microwave or added to the pot. You can cook starch (like rice/pasta) and veggies and add seasoned raw protein or frozen stuff on top, in a pot (or Instant Pot, or a large ceramic bowl in the microwave), so no need to wash separate containers. If rinsing just 1 pot and 1 bowl/plate per meal is too much or if you can't rinse immediately after the meal, you can just wipe the pot with a paper towel (with a bit of water if you must), and store used plates in a separate plastic bin (not in the sink itself) and soak/wash them in a batch, without making the sink unusable for other tasks.

Like others said, most other chores can be simplified or automated (like robot vacuum or rearranging storage so things are dropped at appropriate places more naturally), or at least you can get away with doing less. Dust build-up can be reduced by an air purifier (especially one that's next to the window, and it's good for you too), and most people don't really need to wipe most surfaces that often. The only time I really need to make my bed is after changing sheets. You didn't mention storage/organizing, but I struggled a lot with it. YMMV because storage needs and habits vary wildly. Some are just minimalist and will never want to have all that stuff I keep. Some prefer drawers so they don't have to see the stuff. I prefer (large, metal) shelves/racks and other open storage, and open boxes/bins organized by category (OK, mostly) on top of that, so I can maximize the amount of stuff stored for the amount of visual clutter while keeping things easy to access and put back (important imo, if it's not easy I might as well not have the item/storage).

I can't say much about food shopping as tastes and environments are different, and I personally don't mind shopping that much, but having a list of favorites/repeated purchases (especially for online shopping), ordering larger amounts and batch cooking (and/or freezing) may help. If cooking larger batches seems difficult, maybe choose ingredients that need less preprocessing (like veggies that need less/no peeling), and an Instant Pot and/or hotplate/electric griddle with digital temperature controls can help take out the skill/guess factor. Canned/non-perishable goods are helpful too. Just be careful about nutrients like protein and fiber if you decide to make changes to your diet for convenience; also you can freeze almost anything - frozen fruits/veggies are great and better than sad refrigerated leftovers imo.

As for having to work, I'm sorry but I don't have better ideas other than choosing a job/environment that suits you more, ideally one that is fun and meaningful to you so it doesn't feel like it's taking time away from your life that could be spent doing more meaningful things, or at least one that is not unpleasant and leaves you plenty of time and energy to do other things you like. And be sure to have fun and avoid burnout no matter how meaningful your career/cause is.

gavriel-kleinwaks on If far-UV is so great, why isn't it everywhere?

My description was a pretty quick gloss, but yep, the government is large and I know partners have been inquiring with various offices. Getting money is always going to be a problem. Honestly part of it is, let's say it takes three years to get funding for [something you care about], it's not actually that long in government timelines but it feels like forever when you work at a small organization or company and your work revolves around that particular thing.

gavriel-kleinwaks on If far-UV is so great, why isn't it everywhere?

(Let me know if I misunderstood; I'm reading your second sentence as "why aren't the companies...") On company size: The industry is split between emitter companies and consumer product companies; the emitter companies sell the far-UV emitter (basically the lightbulb) to a different company that builds the housing for consumers. The emitter companies are usually a branch of a larger electronics/lighting company; the consumer product companies are usually very small.

Some companies have run their own studies, but most of their installations are much too small to be studies in themselves. One problem I've heard about in the case of at least one larger installation is that the customer who sought the installation wanted the data to remain confidential. Otherwise, large studies are indeed mostly too costly for these companies to self-fund entirely, but they may offer partial funding or provide their lamps at-cost or as donations to studies.