LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (51)

[link] Explore More: A Bag of Tricks to Keep Your Life on the Rails
Shoshannah Tekofsky (DarkSym) · 2024-09-28T21:38:52.256Z · comments (13)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (93)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (30)

SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Rohin Shah (rohinmshah) · 2024-08-20T16:22:45.888Z · comments (33)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (88)

AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (14)

The ‘strong’ feature hypothesis could be wrong
lewis smith (lsgos) · 2024-08-02T14:33:58.898Z · comments (17)

LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (119)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

Superbabies: Putting The Pieces Together
sarahconstantin · 2024-07-11T20:40:05.036Z · comments (37)

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (69)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (27)

[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (32)

Towards more cooperative AI safety strategies
Richard_Ngo (ricraz) · 2024-07-16T04:36:29.191Z · comments (133)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (41)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (22)

OpenAI: Fallout
Zvi · 2024-05-28T13:20:04.325Z · comments (25)

Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (18)

The Sun is big, but superintelligences will not spare Earth a little sunlight
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · comments (141)

[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (5)

Funny Anecdote of Eliezer From His Sister
Noah Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (6)

Pay Risk Evaluators in Cash, Not Equity
Adam Scholl (adam_scholl) · 2024-09-07T02:37:59.659Z · comments (19)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (65)

Maybe Anthropic's Long-Term Benefit Trust is powerless
Zach Stein-Perlman · 2024-05-27T13:00:47.991Z · comments (21)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (64)

This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (34)

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (62)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (100)

Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)

Optimistic Assumptions, Longterm Planning, and "Cope"
Raemon · 2024-07-17T22:14:24.090Z · comments (46)

Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (43)

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage
orthonormal · 2024-08-06T02:32:41.364Z · comments (26)

What's Going on With OpenAI's Messaging?
ozziegooen · 2024-05-21T02:22:04.171Z · comments (13)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (49)

My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (73)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (42)

Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (35)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (15)

[link] Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen (bayesshammai) · 2024-02-22T23:56:02.318Z · comments (5)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (21)

Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (40)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

davekasten on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

Also, I would generally volunteer to help with selling Lighthaven as an event venue to boring consultant things that will give you piles of money, and IIRC Patrick Ward is interested in this as well, so please let us know how we can help.

davekasten on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

I am excited for this grounds of "we deserve to have nice things," though for boring financial planning reasons I am not sure whether I will donate additional funds prior to calendar year end or in calendar year 2025.

(Note that I made a similar statement in the past and then donated $100 to Lighthaven very shortly thereafter, so, like, don't attempt to reverse-engineer my financial status from this or whatever.)

david-matolcsi on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

I'm considering donating. Can you give us a little more information on breakdown of the costs? What are typical large expenses that the 1.6 million upkeep of Lighthaven consists of? Is this a usual cost for a similar sized event space, or is something about the location or the specialness of the place that makes it more expensive?

How much money does running LW cost? The post says it's >1M, which somewhat surprised me, but I have no idea what's the usual cost of running such a site is. Is the cost mostly server hosting or salaries for content moderation or salaries for software development or something I haven't thought of?

q-home on Making a conservative case for alignment

Agree that neopronouns are dumb. Wikipedia says they're used by 4% LGBTQ people and criticized both within and outside the community.

But for people struggling with normal pronouns (he/she/they), I have the following thoughts:

Contorting language to avoid words associated with beliefs... is not easier than using the words. Don't project beliefs onto words too hard.
Contorting language to avoid words associated with beliefs... is still a violation of free speech (if we have such a strong notion of free speech). So what is the motivation to propose that? It's a bit like a dog in the manger. "I'd rather cripple myself than help you, let's suffer together".
Don't maximize free speech (in a negligible way) while ignoring every other human value.
In an imperfect society, truly passive tolerance (tolerance which doesn't require any words/actions) is impossible. For example, in a perfect society, if my school has bigoted teachers, it immediately gets outcompeted by a non-bigoted school. In an imperfect society it might not happen. So we get enforceable norms.

Employees get paid, which kinda automatically reduces their free speech, because saying the wrong words can make them stop getting paid. (...) Employment is really a different situation. You get laws, and recommendations of your legal department; there is not much anyone can do about that.

I'm not familiar with your model of free speech (i.e. how you imagine free speech working if laws and power balances were optimal). People who value free speech usually believe that free speech should have power above money and property, to a reasonable degree. What's "reasonable" is the crux.

I think in situations where people work together on something unrelated to their beliefs, prohibiting to enforce a code of conduct is unreasonable. Because respect is crucial for the work environment and protecting marginalized groups. I assume people who propose to "call everyone they" or "call everyone by proper name" realize some of that.

If I let people use my house as a school, but find out that a teacher openly doesn't respect minority students (by rejecting to do the smallest thing for them), I'm justified to not let the teacher into my house.

I do not talk about people's past for no good reason, and definitely not just to annoy someone else. But if I have a good reason to point out that someone did something in the past, and the only way to do that is to reveal their previous name, then I don't care about the taboo.

I just think "disliking deadnaming under most circumstances = magical thinking, like calling Italy Rome" was a very strong, barely argued/explained opinion. In tandem with mentioning delusion (Napoleon) and hysteria. If you want to write something insulting, maybe bother to clarify your opinions a little bit more? Like you did in our conversation.

vladimir_nesov on INTELLECT-1 Release: The First Globally Trained 10B Parameter Model

This is DiLoCo (Nov 2023 paper), a local SGD setup where the outer optimizer updates much more rarely (every 100-500 steps of the inner optimizers), asking for much less bandwidth (it uses Nesterov momentum in its state). The inner optimizers run within individual clusters, and the outer optimizer aggregates updates from individual clusters, using a much slower network that connects the clusters. The experiments were done with models of up to 400M parameters. (See also this paper on asynchronous variants of DiLoCo.)

The original paper lacks good compute efficiency measurements. The distributed training experiments start from a checkpoint trained for 24K steps, continuing for 64K more steps (to a total of 88K) in various distributed configurations. Even for the non-distributed configuration the perplexity keeps increasing to step 29K (Figure 7b, Figure 9). The compute expended in a non-distributed run between steps 24K and 88K gets expended in an 8-cluster run between steps 24K and 32K, when perplexity barely starts going down from the global maximum. So there is no way of comparing how well an 8-cluster run uses its compute, because the non-distributed experiment stops so early (at 88K steps) that the uninformative poorly optimized early state of the model still dominates the distributed configuration that uses the same amount of compute (at 32K steps).

Prime Intellect first reproduced DiLoCo in Jul 2024 (blog post, paper) on models of up to 1.1B parameters, taking training efficiency measurements. The largest experiment with a 1.1B model runs across 4 nodes that communicate only every 125 steps, and matches perplexity of a similar training run within a single cluster (with communication every step) using 20% more compute (Figure 7, comparing with 4x batch size baseline).

The new 10B model lacks baselines for comparison, so doesn't help with understanding how training efficiency depends on scale, but the results on benchmarks seem similar to those of other models with similar size and number of training tokens (Table 4 in the blog post).

gwern on China Hawks are Manufacturing an AI Arms Race

Missing the point. This is not about being too stupid to think of >0 strategies, this is about being able & willing to execute strategies.

I too can think of 100 things, and I listed several diverse ways of responding and threw in a historical parallel just in case that wasn't clear after several paragraphs of discussing the problem with not having a viable strategy you can execute. Smartness is not the limit here: we are already smart enough to come up with strategies which could achieve the goal. All of those could potentially work. But none of them seem realistically on the table as something that the USA as it currently exists would be willing to commit to and see through to completion, and you will note that few critics - and no one serious - is responding something like, "oh sure, all part of the plan already, see our white paper laying out the roadmap: after we win, we would then order the AGIs to hack the planet and ensure our perpetual hegemony; that is indeed the exit plan. We botched it last time with nukes and stood by and let everyone else get nukes, but we'll follow through this time."

There is no difference between "won't execute a strategy" and "can't execute a strategy": they are the same thing. The point is that a strategy has to be executable or else it's not an actual strategy. And acting as if you can execute a strategy that you won't can lead you to take terrible decisions. You are like the cat who thinks before climbing a tree: "obviously, I will just climb back down", and who then proceeds climb up and to not climb back down but mew piteously. Well, maybe you shouldn't've climbed up in the first place then...?

("arms race bros will srsly launch a global arms race by saying they'll use the decisive advantage from winning the arms race to conquer the world, and then will not conquer the world")

dagon on Is the mind a program?

Hmm, still not following, or maybe not agreeing. I think that "if the reasoning used to solve the problem is philosophical" then "correct solution" is not available. "useful", "consensus", or "applicable in current societal context" might be better evaluations of a philosophical reasoning.

sweenesm on Understanding Emergence in Large Language Models

Thanks for the post. I think it'd be helpful if you could add some links to references for some of the things you say, such as:

For instance, between 10^10 and 10^11 parameters, models showed dramatic improvements in their ability to interpret emoji sequences representing movies.

joseph-miller on Joseph Miller's Shortform

There are two types of people in this world.

There are people who treat the lock on a public bathroom as a tool for communicating occupancy and a safeguard against accidental attempts to enter when the room is unavailable. For these people the standard protocol is to discern the likely state of engagement of the inner room and then tentatively proceed inside if they detect no signs of human activity.

And there are people who view the lock on a public bathroom as a physical barricade with which to temporarily defend possessed territory. They start by giving the door a hearty push to test the tensile strength of the barrier. On meeting resistance they engage with full force, wringing the handle up and down and slamming into the door with their full body weight. Only once their attempts are thwarted do they reluctantly retreat to find another stall.

cbiddulph on You should consider applying to PhDs (soon!)

Thanks, this post made me seriously consider applying to a PhD, and I strong-upvoted. I had vaguely assumed that PhDs take way too long and don't allow enough access to compute compared to industry AI labs. But considering the long lead time required for the application process and the reminder that you can always take new opportunities as they come up, I now think applying is worth it.

However, looking into it, putting together a high-quality application starting now and finishing by the deadline seems approximately impossible? If the deadline were December 15, that would give you two weeks; other schools like Berkeley have even earlier deadlines. I asked ChatGPT how long it would take to apply to just a single school, and it said it would take 43–59 hours of time spent working, or ~4–6 weeks in real time. Claude said 37-55 hours/4-6 weeks.

Not to discourage anyone from starting their application now if they think they can do it - I guess if you're sufficiently productive and agentic and maybe take some amphetamines, you can do anything. But this seems like a pretty crazy timeline. Just the thought of asking someone to write me a recommendation letter in a two-week timeframe makes me feel bad.

Your post does make me think "if I were going to be applying to a PhD next December, what would I want to do now?" That seems pretty clarifying, and would probably be a helpful frame even if it turns out that a better opportunity comes along and I never apply to a PhD.

I think it'd be a good idea for you to repost this in August or early September of next year!