LessWrong 2.0 Reader


Trying to understand Hanson's Cultural Drift argument
Kemp (ethan-kemp) · 2024-07-22T20:20:32.734Z · comments (1)
Efficient Dictionary Learning with Switch Sparse Autoencoders
Anish Mudide (anish-mudide) · 2024-07-22T18:45:53.502Z · comments (15)
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)
The Garden of Eden
Alexander Turok · 2024-07-22T16:07:42.509Z · comments (1)
Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)
[link] Tim Dillon's fake business is the most influential video I have watched in the last 24 months
Stuart Johnson (stuart-johnson) · 2024-07-22T12:54:43.749Z · comments (0)
On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)
Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents
Sam F. Brown (sam-4) · 2024-07-22T12:33:57.656Z · comments (0)
Initial Experiments Using SAEs to Help Detect AI Generated Text
Aaron_Scher · 2024-07-22T05:16:20.516Z · comments (0)
Categories of leadership on technical teams
benkuhn · 2024-07-22T04:50:04.071Z · comments (0)
An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)
OpenAI Boycott Revisit
Jake Dennie · 2024-07-22T01:44:55.094Z · comments (2)
Coalitional agency
Richard_Ngo (ricraz) · 2024-07-22T00:09:51.525Z · comments (4)
The AI Driver's Licence - A Policy Proposal
Joshua W (sooney) · 2024-07-21T20:38:07.093Z · comments (0)
[link] Demography and Destiny
Zero Contradictions · 2024-07-21T20:34:07.176Z · comments (11)
[link] The $100B plan with "70% risk of killing us all" w Stephen Fry [video]
Oleg Trott (oleg-trott) · 2024-07-21T20:06:39.615Z · comments (8)
[link] Raising Welfare for Lab Rodents
xanderbalwit · 2024-07-21T19:18:41.131Z · comments (0)
A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (14)
Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (10)
[question] Would a scope-insensitive AGI be less likely to incapacitate humanity?
Jim Buhler (jim-buhler) · 2024-07-21T14:15:27.934Z · answers+comments (3)
[link] Holomorphic surjection theorem (Picard's little theorem)
dkl9 · 2024-07-21T13:24:18.300Z · comments (0)
aimless ace analyzes active amateur: a micro-aaaaalignment proposal
lukehmiles (lcmgcd) · 2024-07-21T12:37:39.925Z · comments (0)
Pivotal Acts are easier than Alignment?
Michael Soareverix (michael-soareverix) · 2024-07-21T12:15:12.818Z · comments (4)
Ball Sq Pathways
jefftk (jkaufman) · 2024-07-21T02:20:06.607Z · comments (1)
Freedom and Privacy of Thought Architectures
JohnBuridan · 2024-07-20T21:43:11.419Z · comments (2)
Introduction to Modern Dating: Strategic Dating Advice for beginners
Jesper Lindholm · 2024-07-20T15:45:25.705Z · comments (5)
[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (47)
[link] Only Fools Avoid Hindsight Bias
Kevin Dorst · 2024-07-20T13:42:35.755Z · comments (4)
A more systematic case for inner misalignment
Richard_Ngo (ricraz) · 2024-07-20T05:03:03.500Z · comments (4)
BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)
Krona Compare
jefftk (jkaufman) · 2024-07-20T01:10:03.994Z · comments (0)
(Approximately) Deterministic Natural Latents
johnswentworth · 2024-07-19T23:02:12.306Z · comments (0)
Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)
[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (8)
[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (1)
Sustainability of Digital Life Form Societies
Hiroshi Yamakawa (hiroshi-yamakawa) · 2024-07-19T13:59:13.973Z · comments (1)
[link] Romae Industriae
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-19T13:03:31.536Z · comments (2)
[question] Have people given up on iterated distillation and amplification?
Chris_Leong · 2024-07-19T12:23:04.625Z · answers+comments (1)
How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation")
Ruby · 2024-07-19T00:31:38.332Z · comments (21)
[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (7)
My experience applying to MATS 6.0
mic (michael-chen) · 2024-07-18T19:02:21.849Z · comments (3)
[question] What are the actual arguments in favor of computationalism as a theory of identity?
sunwillrise (andrei-alexandru-parfeni) · 2024-07-18T18:44:20.751Z · answers+comments (24)
[link] Yet Another Critique of "Luxury Beliefs"
ymeskhout · 2024-07-18T18:37:28.703Z · comments (10)
[Interim research report] Evaluating the Goal-Directedness of Language Models
Rauno Arike (rauno-arike) · 2024-07-18T18:19:04.260Z · comments (0)
[link] Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
Karolis Jucys (karolis-ramanauskas) · 2024-07-18T17:02:06.179Z · comments (0)
Activation Engineering Theories of Impact
kubanetics (jakub-nowak) · 2024-07-18T16:44:33.656Z · comments (1)
[question] Me & My Clone
SimonBaars (simonbaars) · 2024-07-18T16:25:40.770Z · answers+comments (19)
AI #73: Openly Evil AI
Zvi · 2024-07-18T14:40:05.770Z · comments (18)
A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey (Lee_Sharkey) · 2024-07-18T14:15:50.248Z · comments (17)
SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)