LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (8)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (1)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (0)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (7)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde (kola-ayonrinde) · 2024-10-30T22:50:45.642Z · comments (0)

[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)

[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)

The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (13)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

[link] Introduction to Super Powers (for kids!)
Shoshannah Tekofsky (DarkSym) · 2024-09-20T17:17:27.070Z · comments (0)

[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)

[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)

A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (3)

AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan · 2024-09-29T05:50:02.531Z · comments (0)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

stephen-fowler on quila's Shortform

I upvoted because I imagine more people reading this would slightly nudge group norms in a direction that is positive.

But being cynical:

I'm sure you believe that this is true, but I doubt that it is literally true.
Signalling this position is very low risk when the community is already on board.
Trying to do good may be insufficient if your work on alignment ends up being dual use.

connor-kissane on SAEs are highly dataset dependent: a case study on the refusal direction

Thanks! I'm not sure. My guess is that if you go super narrow, it may be more likely to result in an inconvenient level of "feature splitting". Since there are only a few total concepts to learn, an SAE of equivalent width might exploit its greater relative capacity to learn niche combinations of features (to reduce sparsity loss).

stephen-fowler on Daniel Kokotajlo's Shortform

My reply definitely missed that you were talking about tunnel densities beyond what has been historically seen.

I'm inclined to agree with your argument that there is a phase shift, but it seems like it is less to do the fact that there are tunnels, and more to do with the geography becoming less tunnel-like and more open.

I have a couple thoughts on your model that aren't direct refutations of anything you've said here:

I think the single term "density" is a too crude of a measure to get a good predictive model of how combat would play out. I'd expect there to be many parameters that describe a tunnel system and have a direct tactical impact. From your discussion of mines, I think "density" is referring to the number of edges in the network? I'd expect tunnel width, geometric layout etc would change how either side behaves.
I'm not sure about your background, but with zero hours of military combat under my belt, I doubt I can predict how modern subterranean combat plays out in tunnel systems with architectures that are beyond anything seen before in history.

l-rudolf-l on Survival without dignity

You have restored my faith in LessWrong! I was getting worried that despite 200+ karma and 20+ comments, no one had actually nitpicked the descriptions of what actually happens.

The zaps of light are diffraction limited.

In practice, if you want the atmospheric nanobots to zap stuff, you'll need to do some complicated mirroring because you need to divert sunlight. And it's not one contiguous mirror but lots of small ones. But I think we can still model this as basic diffraction with some circular mirror / lens.

Intensity , where $E$ is the total power of sunlight falling on the mirror disk, $r$ is the radius of the Airy disk, and $c_{e}$ is an efficiency constant I've thrown in (because of things like atmospheric absorption (Claude says, somewhat surprisingly, this shouldn't be ridiculuously large), and not all the energy in the diffraction pattern being in the Airy disk (about 84% is, says Claude), etc.)

Now, $E = π {(\frac{D}{2})}^{2} L$ , where $D$ is the diameter of the mirror configuration, $L$ is the solar irradiance. And $r = θ l$ , where $l$ is the focal length (distance from mirror to target), and $θ \approx 1.22 λ / D$ the angular size of the central spot.

So we have $I \approx \frac{c_{e} L D^{4}}{{1.22}^{2} \times 4 λ^{2} l^{2}}$ , so the required mirror configuration radius $D = \sqrt[4]{\frac{{1.22}^{2} \times 4 I λ^{2} l^{2}}{c_{e} L}}$ .

Plugging in some reasonable values like $λ \approx 5 \times 10^{- 7}$ m (average incoming sunlight - yes the concentration suffers a bit because it's not all this wavelength), $I = 10^{7}$ W/m^2 (the level of an industrial laser that can cut metal), $l = 10^{4}$ m (lower stratosphere), $L = 1361$ W/m^2 (solar irradiance), and a conservative guess that 99% of power is wasted so $c_{e} = 0.01$ , we get $D \approx 18$ m (and the resulting beam is about 3mm wide).

So a few dozen metres of upper atmosphere nanobots should actually give you a pretty ridiculous concentration of power!

(I did not know this when I wrote the story; I am quite surprised the required radius is this ridiculously tiny. But I had heard of the concept of a "weather machine" like this from the book Where is my flying car?, which I've reviewed here, which suggests that this is possible.)

Partly because it's hard to tell between an actual animal and a bunch of nanobots pretending to be an animal. So you can't zap the nanobots on the ground without making the ground uninhabitable for humans.

I don't really buy this, why is it obvious the nanobots could pretend to be an animal so well that it's indistinguishable? Or why would targeted zaps have bad side-effects?

The "California red tape" thing implies some alignment strategy that stuck the AI to obey the law, and didn't go too insanely wrong despite a superintelligence looking for loopholes

Yeah, successful alignment to legal compliance was established without any real justification halfway through. (How to do this is currently an open technical problem, which, alas, I did not manage to solve for my satirical short story.)

Convince humans that dyson sphere are pretty and don't block the view?

This is a good point, especially since high levels of emotional manipulation was an established in-universe AI capability. (The issue described with the Dyson sphere was less that it itself would block the view, and more that building it would require dismantling the planets in a way that ruins the view - though now I'm realising that "if the sun on Earth is blocked, all Earthly views are gone" is a simpler reason and removes the need for building anything on the other planets at all.)

There is also no clear explanation of why someone somewhere doesn't make a non-red-taped AI.

Yep, this is a plot hole.

cstinesublime on Oxidize's Shortform

I can't speak for the community but after having glanced at your entire post I can't be sure just what it is about. The closest you come to explaining it is near the end you promise to present a "high-level theory on the functional realities" that seem to be related to everything from increased military spending to someone accidentally creating a virus in the lab that wipes out humanity to combating cognitive bias. But what is your theory?

Your post also makes a number of generalize assumptions about the reader and human nature and invokes the pronoun "we" far too many times. I'm a hypocrite for pointing that out, because I tend to do it as well - but the problem is that unless you have a very narrow audience in mind, especially a community that you are a native to and know intimately, often you run the risk of making assumptions or statements they will at best be confused by, and at worst will get defensive for being included with.

Most of your assumptions aren't backed up by specific examples, citations to research. For example, in your first sentence you say that we subconsciously optimize for there being no major societal changes precipitated by technology. You don't back this up. I would assume that part of the reason why there are gold- bugs, just proves there is a huge contingent of people who invest real money based precisely on the fact that they can't anticipate what major economic changes future technologies might bring. There are currently billions of dollars being spent by firms like Apple, Google, even JP Morgan Chase into A.I. assistants, in anticipation of a major change.

I could one by one go through all these general assumptions, but there are too many for it to be worth my while. Not only that, most of the footnotes you use don't make reference to any concepts or observations which are particularly new or alien. The pareto principle, Compound Effect, Rumsfeld's Epistemology... I would expect your average Lesswrong reader is very familiar with these, they present no new insights.

danwil on The Median Researcher Problem

If an outsider's objective is to be taken seriously, they should write papers and submit them to peer review (e.g. conferences and journals).

Yann LeCun has gone so far to say that independent work only counts as "science" if submitted to peer review:

"Without peer review and reproducibility, chances are your methodology was flawed and you fooled yourself into thinking you did something great." - https://x.com/ylecun/status/1795589846771147018?s=19.

From my experience, professors are very open to discuss ideas and their work with anyone who seems serious, interested, and knowledgeable. Even someone inside academia will face skepticism if their work uses completely different methods. They will have to very convincingly prove the methods are valid.

cstinesublime on Johannes C. Mayer's Shortform

I'm missing a key piece of context here - when you say "doing something good" are you referring to educational or research reading; or do you mean any type of personal project which may or may not involve background research?

I may have some practical observations about note-taking which may be relevant, if I understand the context.

startattheend on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

That sounds about right. And "people sometimes feel that way" is a good explanation for the downvote in my opinion. I was arguing the object-level premises of the post because the "disagree" downvote was factually wrong, and this factual wrongness, I argue, is caused by a faulty understanding of how truth works, and this faulty understanding is most common in the western world and in educated people, and in the ideologies which correlate with western thought and academia.

If you disagree with something which is true, I think the only likely explanations are "Does not understand" and "Has a dislike of", and the bias I pointed out covers both of these possibilities (the former is a "map vs territory" issue and the latter is a "morality vs reality" issue).

I think you figured out what went wrong nicely, but in the end the disagreement remains. I still consider my point likely. If somebody comes along and tells me that they disagreed with it for other reasons, I might even argue that they're lying to themselves, as I'm way to disillusioned to think that a "will to truth" exists. I think social status, moral values and other such things are stronger motivators than people will admit even to themselves.

rob-lucas on Bigger Livers?

One reason is just that eating food is enjoyable. I limit the amount of food I eat to stay within a healthy range, but if I could increase that amount while staying healthy, I could enjoy that excess.

I think there are two aspects to the enjoyment of food. One is related to satiety. I enjoy the feeling of sating my appetite, and failing to sate it leaves me with te negative experience of craving food (negative if I don't satisfy those cravings.

But the other aspect is just the enjoyment of eating each individual bite of food. Not the separate enjoyment of sating my appetite, but just the experience of eating.*

When I was younger and much more physically active I ate very large amounts of food. I miss being able to do that. I'm just as sated now with the much smaller portions I eat, but eating a small breakfast instead of a large one is a different experience.

This probably doesn't justify some sort of risky intervention in increasing liver size. Food is enjoyable, but so are a lot of other things in life. But shifting to a higher protien diet seems like the kind of safe intervention, potentially even also healthier in other respects, that, if it has the side effect of being able to eat a little more food, could improve quality of life with minimal other costs. Potential costs I see are related to the price of protein relative to other sources of nutrition, the cost of additional food (if the point is being able to eat more, you've got spend money for that excess), and, depending on one's moral views, something related to the source of the protien being added.

*I think Kahneman's remembering vs. expereincing selves adds some confusion here as well. When we remember a meal we don't necessarily remember the enjoyment we got from every bite, but probably put more weight on the feeling of satiety and the peak experience (how good did it taste at its best?). But the experiencing self experiences every bite. How much you want to weight the remembering vs. experiencing self is a philosophical issue, but I just want to note that it comes up here.

lukehmiles on Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

I wonder if anybody has tried to quantify how much it's worth to be a swing voter. I imagine if you are the government contractor up for renewal then it's worth quite a lot, but I wonder how much of the money/benefits the average Joe sees.

I don't know much about swing state benefits except that Milwaukee, Wisconsin got their lead pipes replaced by the fed and the workers were required to be local and they say they were paid quite well https://youtube.com/watch?v=4VpwgG0P8VU