LessWrong 2.0 Reader

[link] Scale Was All We Needed, At First
Gabe M (gabe-mukobi) · 2024-02-14T01:49:16.184Z · comments (31)

[link] "No-one in my org puts money in their pension"
Tobes (tobias-jolly) · 2024-02-16T18:33:28.996Z · comments (7)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (49)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

[link] Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen (bayesshammai) · 2024-02-22T23:56:02.318Z · comments (5)

Every "Every Bay Area House Party" Bay Area House Party
Richard_Ngo (ricraz) · 2024-02-16T18:53:28.567Z · comments (6)

Timaeus's First Four Months
Jesse Hoogland (jhoogland) · 2024-02-28T17:01:53.437Z · comments (6)

2023 Survey Results
Screwtape · 2024-02-16T22:24:28.132Z · comments (26)

Raising children on the eve of AI
juliawise · 2024-02-15T21:28:07.737Z · comments (15)

[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (20)

And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)

Updatelessness doesn't solve most problems
Martín Soto (martinsq) · 2024-02-08T17:30:11.266Z · comments (43)

Things I've Grieved
Raemon · 2024-02-18T19:32:47.169Z · comments (6)

Rationality Research Report: Towards 10x OODA Looping?
Raemon · 2024-02-24T21:06:38.703Z · comments (21)

The Pareto Best and the Curse of Doom
Screwtape · 2024-02-21T23:10:01.359Z · comments (22)

Attitudes about Applied Rationality
Camille Berger (Camille Berger) · 2024-02-03T14:42:22.770Z · comments (18)

New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (63)

Skills I'd like my collaborators to have
Raemon · 2024-02-09T08:20:37.686Z · comments (9)

[link] A Chess-GPT Linear Emergent World Representation
karvonenadam · 2024-02-08T04:25:15.222Z · comments (14)

Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (177)

Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)

Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)

[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom (Jbloom) · 2024-02-02T06:54:53.392Z · comments (37)

OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)

How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (7)

[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)

Dreams of AI alignment: The danger of suggestive names
TurnTrout · 2024-02-10T01:22:51.715Z · comments (58)

Everything Wrong with Roko's Claims about an Engineered Pandemic
EZ97 · 2024-02-22T15:59:08.439Z · comments (10)

story-based decision-making
bhauth · 2024-02-07T02:35:27.286Z · comments (11)

How well do truth probes generalise?
mishajw · 2024-02-24T14:12:19.729Z · comments (11)

[link] More Hyphenation
Arjun Panickssery (arjun-panickssery) · 2024-02-07T19:43:29.086Z · comments (19)

[link] Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan (akbir-khan) · 2024-02-07T21:28:10.694Z · comments (14)

Retirement Accounts and Short Timelines
jefftk (jkaufman) · 2024-02-19T18:50:05.231Z · comments (35)

AI #51: Altman’s Ambition
Zvi · 2024-02-20T19:50:07.439Z · comments (5)

[link] Things You’re Allowed to Do: University Edition
Saul Munn (saul-munn) · 2024-02-06T00:36:11.690Z · comments (13)

Addressing Feature Suppression in SAEs
Benjamin Wright (Benw8888) · 2024-02-16T18:32:51.927Z · comments (3)

The Gemini Incident
Zvi · 2024-02-22T21:00:04.594Z · comments (19)

Attention SAEs Scale to GPT-2 Small
Connor Kissane (ckkissane) · 2024-02-03T06:50:22.583Z · comments (4)

My guess at Conjecture's vision: triggering a narrative bifurcation
Alexandre Variengien (alexandre-variengien) · 2024-02-06T19:10:42.690Z · comments (12)

The One and a Half Gemini
Zvi · 2024-02-22T13:10:04.725Z · comments (4)

Analogies between scaling labs and misaligned superintelligent AI
scasper · 2024-02-21T19:29:39.033Z · comments (4)

Survey for alignment researchers!
Cameron Berg (cameron-berg) · 2024-02-02T20:41:44.323Z · comments (11)

Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)

[link] Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis
simeon_c (WayZ) · 2024-02-01T21:30:44.090Z · comments (17)

[link] Most experts believe COVID-19 was probably not a lab leak
DanielFilan · 2024-02-02T19:28:00.319Z · comments (89)

On the Debate Between Jezos and Leahy
Zvi · 2024-02-06T14:40:05.487Z · comments (6)

Preventing model exfiltration with upload limits
ryan_greenblatt · 2024-02-06T16:29:33.999Z · comments (16)

Recent comments

kaj_sotala on We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"

No one (to my knowledge?) highlighted that the future might well go as follows:
“There’ll be gradual progress on increasingly helpful AI tools. Companies will roll these out for profit and connect them to the internet. There’ll be discussions about how these systems will eventually become dangerous, and safety-concerned groups might even set up testing protocols (“safety evals”). Still, it’ll be challenging to build regulatory or political mechanisms around these safety protocols so that, when they sound the alarm at a specific lab that the systems are becoming seriously dangerous, this will successfully trigger a slowdown and change the model release culture from ‘release by default’ to one where new models are air-gapped and where

Hmm, I feel like I always had something like this as one of my default scenarios. Though it would of course have been missing some key details such as the bit about model release culture, since that requires the concept of widely applicable pre-trained models that are released the way they are today.

E.g. Sotala & Yampolskiy 2015 and Sotala 2018 both discussed there being financial incentives to deploy increasingly sophisticated narrow-AI systems until they finally crossed the point of becoming AGI.

S&Y 2015:

Ever since the Industrial Revolution, society has become increasingly automated. Brynjolfsson [60] argue that the current high unemployment rate in the United States is partially due to rapid advances in information technology, which has made it possible to replace human workers with computers faster than human workers can be trained in jobs that computers cannot yet perform. Vending machines are replacing shop attendants, automated discovery programs which locate relevant legal documents are replacing lawyers and legal aides, and automated virtual assistants are replacing customer service representatives.
Labor is becoming automated for reasons of cost, efficiency and quality. Once a machine becomes capable of performing a task as well as (or almost as well as) a human, the cost of purchasing and maintaining it may be less than the cost of having a salaried human perform the same task. In many cases, machines are also capable of doing the same job faster, for longer periods and with fewer errors. In addition to replacing workers entirely, machines may also take over aspects of jobs that were once the sole domain of highly trained professionals, making the job easier to perform by less-skilled employees [298].
If workers can be affordably replaced by developing more sophisticated AI, there is a strong economic incentive to do so. This is already happening with narrow AI, which often requires major modifications or even a complete redesign in order to be adapted for new tasks. ‘A roadmap for US robotics’ [154] calls for major investments into automation, citing the potential for considerable improvements in the fields of manufacturing, logistics, health care and services.
Similarly, the US Air Force Chief Scientistʼs [78] ‘Technology horizons’ report mentions ‘increased use of autonomy and autonomous systems’ as a key area of research to focus on in the next decade, and also notes that reducing the need for manpower provides the greatest potential for cutting costs. In 2000, the US Congress instructed the armed forces to have one third of their deep strike force aircraft be unmanned by 2010, and one third of their ground combat vehicles be unmanned by 2015 [4].
To the extent that an AGI could learn to do many kinds of tasks—or even any kind of task—without needing an extensive re-engineering effort, the AGI could make the replacement of humans by machines much cheaper and more profitable. As more tasks become automated, the bottlenecks for further automation will require adaptability and flexibility that narrow-AI systems are incapable of. These will then make up an increasing portion of the economy, further strengthening the incentive to develop AGI. Increasingly sophisticated AI may eventually lead to AGI, possibly within the next several decades [39, 200].
Eventually it will make economic sense to automate all or nearly all jobs [130, 136, 289].

And with regard to the difficulty of regulating them, S&Y 2015 mentioned that:

... there is no clear way to define what counts as dangerous AGI. Goertzel [115] point out that there is no clear division between narrow AI and AGI and attempts to establish such criteria have failed. They argue that since AGI has a nebulous definition, obvious wide-ranging economic benefits and potentially significant penetration into multiple industry sectors, it is unlikely to be regulated due to speculative long-term risks.

and in the context of discussing AI boxing and oracles, argued that both AI boxing and Oracle AI are likely to be of limited (though possibly still some) value, since there's an incentive to just keep deploying all AI in the real world as soon as it's developed:

Oracles are likely to be released. As with a boxed AGI, there are many factors that would tempt the owners of an Oracle AI to transform it to an autonomously acting agent. Such an AGI would be far more effective in furthering its goals, but also far more dangerous.
Current narrow-AI technology includes HFT algorithms, which make trading decisions within fractions of a second, far too fast to keep humans in the loop. HFT seeks to make a very short-term profit, but even traders looking for a longer-term investment benefit from being faster than their competitors. Market prices are also very effective at incorporating various sources of knowledge [135]. As a consequence, a trading algorithmʼs performance might be improved both by making it faster and by making it more capable of integrating various sources of knowledge. Most advances toward general AGI will likely be quickly taken advantage of in the financial markets, with little opportunity for a human to vet all the decisions. Oracle AIs are unlikely to remain as pure oracles for long.
Similarly, Wallach [283] discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are ‘on the loop’ rather than ‘in the loop’. In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robotʼs actions and interfere if something goes wrong.
Human Rights Watch [90] reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems—designed to detect and shoot down incoming missiles and rockets—already being limited to accepting or overriding the computerʼs plan of action in a matter of seconds. Although these systems are better described as automatic, carrying out pre-programmed sequences of actions in a structured environment, than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.
In general, any broad domain involving high stakes, adversarial decision making and a need to act rapidly is likely to become increasingly dominated by autonomous systems. The extent to which the systems will need general intelligence will depend on the domain, but domains such as corporate management, fraud detection and warfare could plausibly make use of all the intelligence they can get. If oneʼs opponents in the domain are also using increasingly autonomous AI/AGI, there will be an arms race where one might have little choice but to give increasing amounts of control to AI/AGI systems.

8e9 on 8e9's Shortform

OpenAI is thinking about how to safely and responsibly allow its models to produce NSFW content that goes beyond answering sex-ed “birds and the bees” type questions.

I haven’t read the whole thing yet, but I’m glad they released this document (which also deals with many other thorny questions).

https://cdn.openai.com/spec/model-spec-2024-05-08.html#overview

d0themath on D0TheMath's Shortform

A list of some contrarian takes I have:

People are currently predictably too worried about misuse risks
What people really mean by "open source" vs "closed source" labs is actually "responsible" vs "irresponsible" labs, which is not affected by regulations targeting open source model deployment.
Neuroscience as an outer alignment[^rough] strategy is embarrassingly underrated.
Better information security at labs is not clearly a good thing, and if we're worried about great power conflict, probably a bad thing.
Much research on deception (Anthropic's recent work, trojans, jailbreaks, etc) is not targeting "real" instrumentally convergent deception reasoning, but learned heuristics. Not bad in itself, but IMO this places heavy asterisks on the results they can get.
ML robustness research (like FAR Labs' Go stuff) does not help with alignment, and helps moderately for capabilities.
The field of ML is a bad field to take epistemic lessons from. Note I don't talk about the results from ML.
ARC's MAD seems doomed to fail.
People in alignment put too much faith in the general factor g. It exists, and is powerful, but is not all-consuming or all-predicting. People are often very smart, but lack social skills, or agency, or strategic awareness, etc. And vice-versa. They can also be very smart in a particular area, but dumb in other areas. This is relevant for hiring & deference, but less for object-level alignment.
People in general are too swayed by rhetoric, and so are people in alignment, rationality, & EA, though in different ways and admittedly to a lesser extent than the general population. People should fight against this more than they seem to (which is not really at all, except for the most overt of cases). For example, I see nobody saying they don't change their minds on account of Scott Alexander because he's too powerful a rhetorician. Ditto for Eliezer, since he is also a great rhetorician. In contrast, Robin Hanson is a famously terrible rhetorician, so people should listen to him more.
There is a technocratic tendency in strategic thinking around alignment (I think partially inherited from OpenPhil, but also smart people are probably just more likely to think this way) which biases people towards simpler & more brittle top-down models without recognizing how brittle those models are.

[^rough] A non-exact term

ramblindash on Dating Roundup #3: Third Time’s the Charm

So I guess I'm not sure what you mean by that. I think it might be easier to support what I'm saying in the negative. Some examples of inauthenticity or un-openness might be:

Consciously faking your personality (in a way that you wouldn't want to maintain as an essentially permanent change)
Lying about what you want out of the relationship
Pretending to like/dislike hobbies or interests that you actually strongly dislike/like

The problem with doing these things is that, to the extent that doing them was necessary to gain the relationship, you are now stuck with a relationship that is built on a papered-over incompatibility. If your plan is to fake a completely different personality/goals/interests, then you will be in a relationship where you have to keep faking that stuff permanently while constantly being wary that your new partner might find out you were faking. On top of that, you have to spend a lot of time and energy doing stuff and/or interacting with someone you don't actually like, or else end the relationship and be back at square 1, except that you've invested time/energy that you won't get back. There can be toned-down good versions of this bad strategy tho, I think, which are more like "putting your best foot forward" than like "being inauthentic."

Truth: Looking for a life partner, getting desperate
Good strategy [probably depends on age, for this one]: Open to various possibilities, see how it goes.
Bad strategy: Your date says they are really only looking for short term fun, and you agree that's all you are looking for too.

Truth: A talkative person who loves debating ideas
Good strategy: Tone it down a little, try to listen as much as you talk and try to "yes, and" or "that's interesting, tell me more about what led you to that" your date's points rather than "no but" (you can often make similar points either way)
Bad strategy: Just agree with everything your date says, even if you actually have a strong opposing view

Truth: Don't really care for hiking much
Good strategy [when trying out someone who loves hiking]: "I haven't been too into that before, tell me what you love about it? I'd be open to giving it another shot"
Bad strategy: "OMG I love hiking too!"

The problem that all these bad strategies have in common is that if they are successful, you end up with something you don't want.

erioire on ErioirE's shortform

How much of the developed world's economy is devoted to aesthetic personalization of products rather than accomplishing the essential functions of [product here]?
I am not saying aesthetics or personalization are 'bad'; however, I suspect that if the cost were quantified and demonstrated to people along with examples of more productive things that could be done with that money, many people might prefer forgoing some of our more wasteful things.

Example:
The cost of having thousands of different styles of sink faucet, instead of a small number of highly efficient and optimized faucet designs for distinct use cases [small household kitchen, large household kitchen, small form factor, high throughput restaurant]. These costs come from the redundant overhead of engineering, design, manufacturing, and logistics.
These same factors apply more or less to every product where variations are sold primarily for aesthetic rather than functional purposes, particularly when they replace existing functional versions.

I believe the root cause of this inefficiency is our psychological tendency to overvalue ephemeral utility, such as using possessions as social status tools, rather than trying to optimize how we collectively use our limited economic output. For example, a sizeable portion of the money in the market for functionally useless decorations could instead go towards medical research.

I do not know how a more efficient allocation of resources could be practically enacted. According to my understanding, most attempts at centrally planned economies have had even less success than the free market, as inefficient as it is.

If a large portion of people decided to prioritize their purchases better that would work, but that's obviously a very challenging coordination problem.

programcrafter on Thoughts on the relative economic benefits of polyamorous relationships?

I would guess this is somewhat similar to having a network of friends, and a polycule is bound to be even smaller. And I can totally imagine being emotionally, romantically, sexually attached to one set of partners and opinion-sharing attached to a slightly different set.

t3t on RobertM's Shortform

Ah, does look like Zach beat me to the punch :)

I'm also still moderately confused, though I'm not that confused about labs not speaking up - if you're playing politics, then not throwing the PM under the bus seems like a reasonable thing to do. Maybe there's a way to thread the needle of truthfully rebutting the accusations without calling the PM out, but idk. Seems like it'd be difficult if you weren't either writing your own press release or working with a very friendly journalist.

nevin-wetherill on Open Thread Spring 2024

Hey, I'm new to LessWrong and working on a post - however at some point the guidelines which pop up at the top of a fresh account's "new post" screen went away, and I cannot find the same language in the New Users Guide or elsewhere on the site.

Does anyone have a link to this? I recall a list of suggestions like "make the post object-level," "treat it as a submission for a university," "do not write a poetic/literary post until you've already gotten a couple object-level posts on your record."

It seems like a minor oversight if it's impossible to find certain moderation guidelines/tips and tricks if you've already saved a draft/posted a comment.

I am not terribly worried about running headfirst into a moderation filter, as I can barely manage to write a comment which isn't as high-effort an explanation as I can come up with - but I do want that specific piece of text for reference, and now it appears to have evaporated into the shadow realm.

Am I just missing a link that would appear if I searched something else?

(Edit: also, sorry if this is the wrong place for this, I would've tried the "intercom" feature, but I am currently on the mobile version of the site, and that feature appears to be entirely missing there - and yes, I checked my settings to make sure it wasn't "hidden")

fowlertm on fowlertm's Shortform

We recently released an interview with independent scholar John Wentworth:

It mostly centers around two themes: "abstraction" (forming concepts) and "agency" (dealing with goal-directed systems).

Check it out!

habryka4 on Bogdan Ionut Cirstea's Shortform

At least Eliezer has been extremely clear that he is in favor of a stop not a pause (indeed, that was like the headline of his article "Pausing AI Developments Isn't Enough. We Need to Shut it All Down"), so I am confused why you list him with anything related to "pause".

My guess is me and Eliezer are both in favor of a pause, but mostly because a pause seems like it would slow down AGI progress, not because the next 6 months in particular will be the most risky period.