LessWrong 2.0 Reader




(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (214)
LessWrong's (first) album: I Have Been A Good Bing
habryka (habryka4) · 2024-04-01T07:33:45.242Z · comments (179)
OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)
I would have shit in that alley, too
Declan Molony (declan-molony) · 2024-06-18T04:41:06.545Z · comments (134)
Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)
Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai (adam-shai) · 2024-04-16T21:16:11.377Z · comments (100)
Failures in Kindness
silentbob · 2024-03-26T21:30:11.052Z · comments (60)
Reliable Sources: The Story of David Gerard
TracingWoodgrains (tracingwoodgrains) · 2024-07-10T19:50:21.191Z · comments (53)
The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (84)
The Best Tacit Knowledge Videos on Every Subject
Parker Conley (parker-conley) · 2024-03-31T17:14:31.199Z · comments (143)
How I got 4.2M YouTube views without making a single video
Closed Limelike Curves · 2024-09-03T03:52:33.025Z · comments (36)
There is way too much serendipity
Malmesbury (Elmer of Malmesbury) · 2024-01-19T19:37:57.068Z · comments (56)
[link] My hour of memoryless lucidity
Eric Neyman (UnexpectedValues) · 2024-05-04T01:40:56.717Z · comments (35)
Notifications Received in 30 Minutes of Class
tanagrabeast · 2024-05-26T17:02:20.989Z · comments (16)
[link] Thoughts on seed oil
dynomight · 2024-04-20T12:29:14.212Z · comments (129)
Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)
Andrew_Critch · 2024-06-14T00:16:47.850Z · comments (38)
[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (28)
[link] [April Fools' Day] Introducing Open Asteroid Impact
Linch · 2024-04-01T08:14:15.800Z · comments (29)
MIRI 2024 Communications Strategy
Gretta Duleba (gretta-duleba) · 2024-05-29T19:33:39.169Z · comments (202)
You don't know how bad most things are nor precisely how they're bad.
Solenoid_Entity · 2024-08-04T14:12:54.136Z · comments (48)
[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (4)
[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (31)
[link] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
evhub · 2024-01-12T19:51:01.021Z · comments (95)
Would catching your AIs trying to escape convince AI developers to slow down or undeploy?
Buck · 2024-08-26T16:46:18.872Z · comments (76)
Gentleness and the artificial Other
Joe Carlsmith (joekc) · 2024-01-02T18:21:34.746Z · comments (33)
Universal Basic Income and Poverty
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-07-26T07:23:50.151Z · comments (135)
Non-Disparagement Canaries for OpenAI
aysja · 2024-05-30T19:20:13.022Z · comments (51)
[link] Scale Was All We Needed, At First
Gabe M (gabe-mukobi) · 2024-02-14T01:49:16.184Z · comments (33)
My AI Model Delta Compared To Yudkowsky
johnswentworth · 2024-06-10T16:12:53.179Z · comments (102)
80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)
Raemon · 2024-07-03T20:34:50.741Z · comments (71)
Overview of strong human intelligence amplification methods
TsviBT · 2024-10-08T08:37:18.896Z · comments (141)
Leaving MIRI, Seeking Funding
abramdemski · 2024-08-08T18:32:20.387Z · comments (19)
Express interest in an "FHI of the West"
habryka (habryka4) · 2024-04-18T03:32:58.592Z · comments (41)
[link] "No-one in my org puts money in their pension"
Tobes (tobias-jolly) · 2024-02-16T18:33:28.996Z · comments (16)
Raising children on the eve of AI
juliawise · 2024-02-15T21:28:07.737Z · comments (47)
On green
Joe Carlsmith (joekc) · 2024-03-21T17:38:56.295Z · comments (35)
The case for ensuring that powerful AIs are controlled
ryan_greenblatt · 2024-01-24T16:11:51.354Z · comments (66)
Getting 50% (SoTA) on ARC-AGI with GPT-4o
ryan_greenblatt · 2024-06-17T18:44:01.039Z · comments (50)
The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (66)
The Great Data Integration Schlep
sarahconstantin · 2024-09-13T15:40:02.298Z · comments (16)
[link] My PhD thesis: Algorithmic Bayesian Epistemology
Eric Neyman (UnexpectedValues) · 2024-03-16T22:56:59.283Z · comments (14)
[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget (joel-burget) · 2024-04-16T16:22:06.937Z · comments (58)
The Best Lay Argument is not a Simple English Yud Essay
J Bostock (Jemist) · 2024-09-10T17:34:28.422Z · comments (15)
My Clients, The Liars
ymeskhout · 2024-03-05T21:06:36.669Z · comments (85)
Truthseeking is the ground in which other principles grow
Elizabeth (pktechgirl) · 2024-05-27T01:09:20.796Z · comments (16)
The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (30)
Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (36)
Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Zach Stein-Perlman · 2024-05-15T00:45:02.436Z · comments (95)
Principles for the AGI Race
William_S · 2024-08-30T14:29:41.074Z · comments (13)
the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (40)
next page (older posts) →