LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

The Case For Free Will or Why LessWrong must commit to self determination
Troshen · 2014-04-07T12:07:18.537Z · comments (15)

← previous page (newer posts) · next page (older posts) →

Recent comments

tamsin-leake on Tamsin Leake's Shortform

I believe that ChatGPT was not released with the expectation that it would become as popular as it did.

Well, even if that's true, causing such an outcome by accident should still count as evidence of vast irresponsibility imo.

akash-wasil on Akash's Shortform

Agreed— my main point here is that the marketplace of ideas undervalues criticism.

I think one perspective could be “we should all just aim to do objective truth-seeking”, and as stated I agree with it.

The main issue with that frame, imo, is that it’s very easy to forget that the epistemic environment can be tilted in favor of certain perspectives.

EG I think it can be useful for “objective truth-seeking efforts” to be aware of some of the culture/status games that underincentivize criticism of labs & amplify lab-friendly perspectives.

benito on Stephen Fowler's Shortform

Not OP, but I take the claim to be "endorsing getting into bed with companies on-track to make billions of dollars profiting from risking the extinction of humanity in order to nudge them a bit, is in retrospect an obviously doomed strategy, and yet many self-identified effective altruists trusted their leadership to have secret good reasons for doing so and followed them in supporting the companies (e.g. working there for years including in capabilities roles and also helping advertise the company jobs). now that a new consensus is forming that it indeed was obviously a bad strategy, it is also time to have evaluated the leadership's decision as bad at the time of making the decision and impose costs on them accordingly, including loss of respect and power".

So no, not disincentivizing making positive EV bets, but updating about the quality of decision-making that has happened in the past.

habryka4 on simeon_c's Shortform

Sure, I'll try to post here if I know of a clear opportunity to donate to either comes up.

zach-stein-perlman on DeepMind's "Frontier Safety Framework" is weak and unambitious

Sorry for brevity.

We just disagree. E.g. you "walked away with a much better understanding of how OpenAI plans to evaluate & handle risks than how Anthropic plans to handle & evaluate risks"; I felt like Anthropic was thinking about everything well.

I think Anthropic's ASL-3 is reasonable and OpenAI's thresholds and corresponding commitments are unreasonable. If the ASL-4 threshold was high or commitments are poor such that ASL-4 was meaningless, I agree Anthropic's RSP would be at least as bad as OpenAI's.

One thing I think is a big deal: Anthropic's RSP treats internal deployment like external deployment; OpenAI's has almost no protections for internal deployment.

I agree "an initial RSP that mostly spells out high-level reasoning, makes few hard commitments, and focuses on misuse while missing the all-important evals and safety practices for ASL-4" is also a fine characterization of Anthropic's current RSP.

edit: or, like, pf thresholds too high, so pf seems doomed / not on track, but rsp:v1 is consistent with rspv:1.1 being great. At least Anthropic knows and says there’s a big hole. That’s not relevant to evaluating labs’ current commitments but is very relevant to predicting.

vladimir_nesov on What Are Non-Zero-Sum Games?—A Primer

See "Zero Sum" is a misnomer [LW · GW], shifting and rescaling of utility functions breaks formulations that simply ask to take a sum of payoffs, but we can rescue the concept to mean that all outcomes/strategies of the game are Pareto efficient.

"Positive sum" seems to be about Kaldor-Hicks efficiency, strategies where in principle there is a post-game redistribution of resources that would turn the strategies Pareto efficient, but there is no commitment or possibly even practical feasibility to actually perform the redistribution. This hypothetical redistribution step takes care of comparing utilities of different players. A whole game/interaction/project would then be "positive-sum" if each outcome/strategy is equivalent to some Pareto efficient strategy via a redistribution.

yonatan-cale-1 on simeon_c's Shortform

@habryka [LW · GW] , Would you reply to this comment if there's an opportunity to donate to either? Me and another person are interested, and others could follow this comment too if they wanted to

(only if it's easy for you, I don't want to add an annoying task to your plate)

zach-stein-perlman on Akash's Shortform

Sorry for brevity, I'm busy right now.

Noticing good stuff labs do, not just criticizing them, is often helpful. I wish you thought of this work more as "evaluation" than "criticism."
It's often important for evaluation to be quite truth-tracking. Criticism isn't obviously good by default.

akash-wasil on Akash's Shortform

My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.

Some quick thoughts:

Soft power– I think people underestimate the how strong the "soft power" of labs is, particularly in the Bay Area.
Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc.
- People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.
- So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there.
Money– The labs and Open Phil (which has been perceived, IMO correctly, as investing primarily into metastrategies that are aligned with lab interests) have an incredibly large share of the $$$ in the space. When funding became more limited, this became even more true, and I noticed a very tangible shift in the culture & discourse around labs + Open Phil
Status games//reputation– Groups who were more inclined to criticize labs and advocate for public or policymaker outreach were branded as “unilateralist”, “not serious”, and “untrustworthy” in core EA circles. In many cases, there were genuine doubts about these groups, but my impression is that these doubts got amplified/weaponized in cases where the groups were more openly critical of the labs.
Subjectivity of "good judgment"– There is a strong culture of people getting jobs/status for having “good judgment”. This is sensible insofar as we want people with good judgment (who wouldn’t?) but this often ends up being so subjective that it ends up leading to people being quite afraid to voice opinions that go against mainstream views and metastrategies (particularly those endorsed by labs + Open Phil).
Anecdote– Personally, I found my ability to evaluate and critique labs + mainstream metastrategies substantially improved when I spent more time around folks in London and DC (who were less closely tied to the labs). In fairness, I suspect that if I had lived in London or DC *first* and then moved to the Bay Area, it’s plausible I would’ve had a similar feeling but in the “reverse direction”.

With all this in mind, I find myself more deeply appreciating folks who have publicly and openly critiqued labs, even in situations where the cultural and economic incentives to do so were quite weak (relative to staying silent or saying generic positive things about labs).

Examples: Habryka, Rob Bensinger, CAIS, MIRI, Conjecture, and FLI. More recently, @Zach Stein-Perlman [LW · GW], and of course Jan Leike and Daniel K.

akash-wasil on DeepMind's "Frontier Safety Framework" is weak and unambitious

I personally have a large amount of uncertainty around how useful prosaic techniques & control techniques will be. Here are a few statements I'm more confident in:

Ideally, AGI development would have much more oversight than we see in the status quo. Whether or not development or deployment activities keep national security risks below acceptable levels should be a question that governments are involved in answering. A sensible oversight regime would require evidence of positive safety or "affirmative safety".
My biggest concern with the prosaic/control metastrategy is that I think race dynamics substantially decrease its usefulness. Even if ASL-4 systems are deployed internally in a safe way, we're still not out of the acute risk period. And even if the leading lab (Lab A) is trustworthy/cautious, it will be worried that incautious Lab B is about to get to ASL-4 in 1-3 months. This will cause the leading lab to underinvest into control, feel like it doesn't have much time to figure out how to use its ASL-4 system (assuming it can be controlled), and feel like it needs to get to ASL-5+ rather quickly.

It's still plausible to me that perhaps this period of a few months is enough to pull off actions that get us out of the acute risk period (e.g., use the ASL-4 system to generate evidence that controlling more powerful systems would require years of dedicated effort and have Lab A devote all of their energy toward getting governments to intervene).

Given my understanding of the current leading labs, it's more likely to me that they'll underestimate the difficulties of bootstrapped alignment [LW · GW] and assume that things are OK as long as empirical tests don't show imminent evidence of danger. I don't think this prior is reasonable in the context of developing existentially dangerous technologies, particularly technologies that are intended to be smarter than you. I think sensible risk management [LW · GW] in such contexts should require a stronger theoretical/conceptual understanding of the systems one is designing.

(My guess is that you agree with some of these points and I agree with some points along the lines of "maybe prosaic/control techniques will just work, we aren't 100% sure they're not going to work", but we're mostly operating in different frames.)

(I also do like/respect a lot of the work you and Buck have done on control. I'm a bit worried that the control meme is overhyped, partially because it fits into the current interests of labs. Like, control seems like a great idea and a useful conceptual frame, but I haven't yet seen a solid case for why we should expect specific control techniques to work once we get to ASL-4 or ASL-4.5 systems, as well as what we plan to do with those systems to get us out of the acute risk period. Like, the early work on using GPT-3 to evaluate GPT-4 was interesting, but it feels like the assumption about the human red-teamers being better at attacking than GPT-4 will go away– or at least be much less robust– once we get to ASL-4. But I'm also sympathetic to the idea that we're at the early stages of control work, and I am genuinely interested in seeing what you, Buck, and others come up with as the control agenda progresses.)

LessWrong 2.0 Reader

Archive

Recent comments