LessWrong 2.0 Reader

Circular Reasoning
abramdemski · 2024-08-05T18:10:32.736Z · comments (36)

Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Johnny Lin (hijohnnylin) · 2024-03-25T21:17:58.421Z · comments (7)

Meaning & Agency
abramdemski · 2023-12-19T22:27:32.123Z · comments (17)

Just admit that you’ve zoned out
joec · 2024-06-04T02:51:27.594Z · comments (22)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (8)

New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)

[link] Introducing METR's Autonomy Evaluation Resources
Megan Kinniment (megan-kinniment) · 2024-03-15T23:16:59.696Z · comments (0)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)

Prediction Markets aren't Magic
SimonM · 2023-12-21T12:54:07.754Z · comments (29)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (8)

story-based decision-making
bhauth · 2024-02-07T02:35:27.286Z · comments (11)

Review: Conor Moreton's "Civilization & Cooperation"
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-05-26T19:32:43.131Z · comments (8)

AI #73: Openly Evil AI
Zvi · 2024-07-18T14:40:05.770Z · comments (20)

Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)

Stagewise Development in Neural Networks
Jesse Hoogland (jhoogland) · 2024-03-20T19:54:06.181Z · comments (1)

Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (12)

Based Beff Jezos and the Accelerationists
Zvi · 2023-12-06T16:00:08.380Z · comments (29)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (43)

[link] New report: Safety Cases for AI
joshc (joshua-clymer) · 2024-03-20T16:45:27.984Z · comments (14)

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)

A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (25)

[link] Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan (akbir-khan) · 2024-02-07T21:28:10.694Z · comments (14)

Singular learning theory: exercises
Zach Furman (zfurman) · 2024-08-30T20:00:03.785Z · comments (5)

Teaching CS During Take-Off
andrew carle (andrew-carle) · 2024-05-14T22:45:39.447Z · comments (13)

On the abolition of man
Joe Carlsmith (joekc) · 2024-01-18T18:17:06.201Z · comments (18)

Covert Malicious Finetuning
Tony Wang (tw) · 2024-07-02T02:41:51.698Z · comments (4)

[link] More Hyphenation
Arjun Panickssery (arjun-panickssery) · 2024-02-07T19:43:29.086Z · comments (19)

We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (36)

[link] Detecting Genetically Engineered Viruses With Metagenomic Sequencing
jefftk (jkaufman) · 2024-06-27T14:01:34.868Z · comments (10)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)

[link] Re: Anthropic's suggested SB-1047 amendments
RobertM (T3t) · 2024-07-27T22:32:39.447Z · comments (13)

Natural Latents: The Concepts
johnswentworth · 2024-03-20T18:21:19.878Z · comments (18)

I'm a bit skeptical of AlphaFold 3
Oleg Trott (oleg-trott) · 2024-06-25T00:04:41.274Z · comments (14)

How well do truth probes generalise?
mishajw · 2024-02-24T14:12:19.729Z · comments (11)

[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)

Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)

OpenAI: Helen Toner Speaks
Zvi · 2024-05-30T21:10:02.938Z · comments (8)

Addressing Feature Suppression in SAEs
Benjamin Wright (Benw8888) · 2024-02-16T18:32:51.927Z · comments (4)

A Crisper Explanation of Simulacrum Levels
Thane Ruthenis · 2023-12-23T22:13:52.286Z · comments (13)

[Valence series] 2. Valence & Normativity
Steven Byrnes (steve2152) · 2023-12-07T16:43:49.919Z · comments (5)

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (34)

Apply to be a Safety Engineer at Lockheed Martin!
yanni kyriacos (yanni) · 2024-03-31T21:02:08.499Z · comments (3)

There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)

The Aspiring Rationalist Congregation
maia · 2024-01-10T22:52:54.298Z · comments (23)

Reflections on Less Online
Error · 2024-07-07T03:49:44.534Z · comments (15)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (10)

[link] Anxiety vs. Depression
Sable · 2024-03-17T00:15:08.255Z · comments (35)

A simple case for extreme inner misalignment
Richard_Ngo (ricraz) · 2024-07-13T15:40:37.518Z · comments (41)

Scalable oversight as a quantitative rather than qualitative problem
Buck · 2024-07-06T17:42:41.325Z · comments (11)

Recent comments

morphism on Pi Rogers's Shortform

People often say things like "do x. Your future self will thank you." But I've found that I very rarely actually thank my past self, after x has been done, and I've reaped the benefits of x.

This quick take is a preregistration: For the next month I will thank my past self more, when I reap the benefits of a sacrifice of their immediate utility.

e.g. When I'm stuck in bed because the activation energy to leave is too high, and then I overcome that and go for a run and then feel a lot more energized, I'll look back and say "Thanks 7 am Morphism!"

(I already do this sometimes, but I will now make a TAP out of it, which will probably cause me to do it more often.)

Then I will make a full post describing in detail what I did and what (if anything) changed about my ability to sacrifice short-term gains for greater long-term gains, along with plausible theories w/ probabilities on the causal connection (or lack thereof), as well as a list of potential confounders.

Of course, it is possible that I completely fail to even install the TAP. I don't think that's very likely, because I'm #1-prioritizing my own emotional well-being right now (I'll shift focus back onto my world-saving pursuits once I'm more stably not depressed). In that case I will not write a full post, because the experiment would not even have been done. I will instead just make a comment on this shortform to that effect.

davekasten on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

Also, I would generally volunteer to help with selling Lighthaven as an event venue to boring consultant things that will give you piles of money, and IIRC Patrick Ward is interested in this as well, so please let us know how we can help.

davekasten on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

I am excited for this on the grounds of "we deserve to have nice things," though for boring financial planning reasons I am not sure whether I will donate additional funds prior to calendar year end or in calendar year 2025.

(Note that I made a similar statement in the past and then donated $100 to Lighthaven very shortly thereafter, so, like, don't attempt to reverse-engineer my financial status from this or whatever.)

david-matolcsi on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

I'm considering donating. Can you give us a little more information on the breakdown of the costs? What are the typical large expenses that make up the 1.6 million upkeep of Lighthaven? Is this a usual cost for a similar-sized event space, or is there something about the location or the specialness of the place that makes it more expensive?

How much money does running LW cost? The post says it's >1M, which somewhat surprised me, but I have no idea what the usual cost of running such a site is. Is the cost mostly server hosting, or salaries for content moderation, or salaries for software development, or something I haven't thought of?

q-home on Making a conservative case for alignment

Agree that neopronouns are dumb. Wikipedia says they're used by 4% of LGBTQ people and criticized both within and outside the community.

But for people struggling with normal pronouns (he/she/they), I have the following thoughts:

Contorting language to avoid words associated with beliefs... is not easier than using the words. Don't project beliefs onto words too hard.
Contorting language to avoid words associated with beliefs... is still a violation of free speech (if we have such a strong notion of free speech). So what is the motivation to propose that? It's a bit like a dog in the manger. "I'd rather cripple myself than help you, let's suffer together".
Don't maximize free speech (in a negligible way) while ignoring every other human value.
In an imperfect society, truly passive tolerance (tolerance which doesn't require any words/actions) is impossible. For example, in a perfect society, if my school has bigoted teachers, it immediately gets outcompeted by a non-bigoted school. In an imperfect society it might not happen. So we get enforceable norms.

Employees get paid, which kinda automatically reduces their free speech, because saying the wrong words can make them stop getting paid. (...) Employment is really a different situation. You get laws, and recommendations of your legal department; there is not much anyone can do about that.

I'm not familiar with your model of free speech (i.e. how you imagine free speech working if laws and power balances were optimal). People who value free speech usually believe that free speech should have power above money and property, to a reasonable degree. What's "reasonable" is the crux.

I think in situations where people work together on something unrelated to their beliefs, prohibiting the enforcement of a code of conduct is unreasonable, because respect is crucial for the work environment and for protecting marginalized groups. I assume people who propose to "call everyone they" or "call everyone by proper name" realize some of that.

If I let people use my house as a school, but find out that a teacher openly doesn't respect minority students (by refusing to do the smallest thing for them), I'm justified in not letting the teacher into my house.

I do not talk about people's past for no good reason, and definitely not just to annoy someone else. But if I have a good reason to point out that someone did something in the past, and the only way to do that is to reveal their previous name, then I don't care about the taboo.

I just think "disliking deadnaming under most circumstances = magical thinking, like calling Italy Rome" was a very strong, barely argued/explained opinion. In tandem with mentioning delusion (Napoleon) and hysteria. If you want to write something insulting, maybe bother to clarify your opinions a little bit more? Like you did in our conversation.

vladimir_nesov on INTELLECT-1 Release: The First Globally Trained 10B Parameter Model

This is DiLoCo (Nov 2023 paper), a local SGD setup where the outer optimizer updates much more rarely (every 100-500 steps of the inner optimizers), asking for much less bandwidth (it uses Nesterov momentum in its state). The inner optimizers run within individual clusters, and the outer optimizer aggregates updates from individual clusters, using a much slower network that connects the clusters. The experiments were done with models of up to 400M parameters. (See also this paper on asynchronous variants of DiLoCo.)
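
The two-level structure described above can be sketched in a few lines. This is only a toy illustration, not the paper's implementation: the names `inner_sgd` and `diloco` are hypothetical, the "model" is a single scalar with a quadratic loss per cluster, plain SGD stands in for whatever inner optimizer a real run would use, and all hyperparameter values are illustrative assumptions.

```python
import random


def inner_sgd(theta, target, steps, lr, rng):
    """One cluster: noisy SGD on the toy loss 0.5 * (theta - target)**2."""
    for _ in range(steps):
        grad = (theta - target) + rng.gauss(0.0, 0.1)  # noisy local gradient
        theta -= lr * grad
    return theta


def diloco(targets, outer_rounds=50, inner_steps=100, inner_lr=0.05,
           outer_lr=0.7, momentum=0.9, seed=0):
    """Toy local-SGD loop with a Nesterov-momentum outer optimizer.

    `targets` holds one toy data shard (a scalar optimum) per cluster.
    All default values are assumptions chosen for this sketch.
    """
    rng = random.Random(seed)
    theta = 0.0     # global parameters shared by all clusters
    velocity = 0.0  # outer optimizer state (momentum buffer)
    for _ in range(outer_rounds):
        # Each cluster starts from the global parameters and trains
        # for `inner_steps` steps with no cross-cluster communication.
        local = [inner_sgd(theta, t, inner_steps, inner_lr, rng)
                 for t in targets]
        # The only communication: average the local results and treat
        # (global - average) as the gradient for the outer optimizer.
        outer_grad = theta - sum(local) / len(local)
        # Nesterov momentum update on the global parameters.
        velocity = momentum * velocity + outer_grad
        theta -= outer_lr * (outer_grad + momentum * velocity)
    return theta


final_theta = diloco([1.0, 2.0, 3.0, 4.0])
print(final_theta)
```

With four clusters whose local optima are 1, 2, 3 and 4, the global parameter should settle close to their mean, which minimizes the summed toy loss. The clusters exchange one number per outer round instead of one per inner step, which is where the bandwidth saving comes from.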

The original paper lacks good compute efficiency measurements. The distributed training experiments start from a checkpoint trained for 24K steps, continuing for 64K more steps (to a total of 88K) in various distributed configurations. Even for the non-distributed configuration the perplexity keeps increasing to step 29K (Figure 7b, Figure 9). The compute expended in a non-distributed run between steps 24K and 88K gets expended in an 8-cluster run between steps 24K and 32K, when perplexity barely starts going down from the global maximum. So there is no way of comparing how well an 8-cluster run uses its compute, because the non-distributed experiment stops so early (at 88K steps) that the uninformative poorly optimized early state of the model still dominates the distributed configuration that uses the same amount of compute (at 32K steps).
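
The compute comparison in the paragraph above is simple arithmetic, sketched here under the assumption that every cluster does the same work per step as the non-distributed run (the helper name `cluster_steps` is hypothetical):

```python
def cluster_steps(start_step, end_step, clusters):
    """Compute, in units of one cluster running one step."""
    return (end_step - start_step) * clusters


# Non-distributed run: one cluster from step 24K to step 88K.
single = cluster_steps(24_000, 88_000, clusters=1)

# 8-cluster run: the same budget is used up between steps 24K and 32K.
eight = cluster_steps(24_000, 32_000, clusters=8)

print(single, eight)
```

Both come to 64,000 cluster-steps, so an equal-compute comparison has to pit the non-distributed run at step 88K against the 8-cluster run at step 32K.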

Prime Intellect first reproduced DiLoCo in Jul 2024 (blog post, paper) on models of up to 1.1B parameters, taking training efficiency measurements. The largest experiment with a 1.1B model runs across 4 nodes that communicate only every 125 steps, and matches perplexity of a similar training run within a single cluster (with communication every step) using 20% more compute (Figure 7, comparing with 4x batch size baseline).

The new 10B model lacks baselines for comparison, so doesn't help with understanding how training efficiency depends on scale, but the results on benchmarks seem similar to those of other models with similar size and number of training tokens (Table 4 in the blog post).

gwern on China Hawks are Manufacturing an AI Arms Race

Missing the point. This is not about being too stupid to think of >0 strategies, this is about being able & willing to execute strategies.

I too can think of 100 things, and I listed several diverse ways of responding and threw in a historical parallel just in case that wasn't clear after several paragraphs of discussing the problem with not having a viable strategy you can execute. Smartness is not the limit here: we are already smart enough to come up with strategies which could achieve the goal. All of those could potentially work. But none of them seem realistically on the table as something that the USA as it currently exists would be willing to commit to and see through to completion, and you will note that few critics - and no one serious - is responding something like, "oh sure, all part of the plan already, see our white paper laying out the roadmap: after we win, we would then order the AGIs to hack the planet and ensure our perpetual hegemony; that is indeed the exit plan. We botched it last time with nukes and stood by and let everyone else get nukes, but we'll follow through this time."

There is no difference between "won't execute a strategy" and "can't execute a strategy": they are the same thing. The point is that a strategy has to be executable or else it's not an actual strategy. And acting as if you can execute a strategy that you won't can lead you to make terrible decisions. You are like the cat who thinks before climbing a tree: "obviously, I will just climb back down", and who then proceeds to climb up and to not climb back down but mew piteously. Well, maybe you shouldn't've climbed up in the first place then...?

("arms race bros will srsly launch a global arms race by saying they'll use the decisive advantage from winning the arms race to conquer the world, and then will not conquer the world")

dagon on Is the mind a program?

Hmm, still not following, or maybe not agreeing. I think that "if the reasoning used to solve the problem is philosophical" then a "correct solution" is not available. "Useful", "consensus", or "applicable in current societal context" might be better evaluations of philosophical reasoning.

sweenesm on Understanding Emergence in Large Language Models

Thanks for the post. I think it'd be helpful if you could add some links to references for some of the things you say, such as:

For instance, between 10^10 and 10^11 parameters, models showed dramatic improvements in their ability to interpret emoji sequences representing movies.

joseph-miller on Joseph Miller's Shortform

There are two types of people in this world.

There are people who treat the lock on a public bathroom as a tool for communicating occupancy and a safeguard against accidental attempts to enter when the room is unavailable. For these people the standard protocol is to discern the likely state of engagement of the inner room and then tentatively proceed inside if they detect no signs of human activity.

And there are people who view the lock on a public bathroom as a physical barricade with which to temporarily defend possessed territory. They start by giving the door a hearty push to test the tensile strength of the barrier. On meeting resistance they engage with full force, wringing the handle up and down and slamming into the door with their full body weight. Only once their attempts are thwarted do they reluctantly retreat to find another stall.