Posts

AI Clarity: An Initial Research Agenda 2024-05-03T13:54:22.894Z
Announcing Convergence Analysis: An Institute for AI Scenario & Governance Research 2024-03-07T21:37:00.526Z
Information hazards: Why you should care and what you can do 2020-02-23T20:47:39.742Z
Mapping downside risks and information hazards 2020-02-20T14:46:30.259Z
State Space of X-Risk Trajectories 2020-02-09T13:56:41.402Z
AIXSU - AI and X-risk Strategy Unconference 2019-09-03T11:35:39.283Z
AI Safety Research Camp - Project Proposal 2018-02-02T04:25:46.005Z
Book Review: Naive Set Theory (MIRI research guide) 2015-08-14T22:08:37.028Z

Comments

Comment by David_Kristoffersson on Timelines to Transformative AI: an investigation · 2024-03-27T18:52:49.474Z · LW · GW

I agree with the general shape of your argument, including that Cotra and Carlsmith are likely to overestimate the compute of the human brain, and that frontier algorithms are not as efficient as algorithms could be.

My best guess is that a frontier model with roughly the expected capability of GPT-5 or GPT-6 (equivalently, Claude 4 or 5, or similar advances in Gemini) will be sufficient to automate algorithmic exploration to the extent that the necessary algorithmic breakthroughs get made.

But I disagree that it will happen this quickly. :)

Comment by David_Kristoffersson on Staring into the abyss as a core life skill · 2023-07-15T13:37:04.645Z · LW · GW

Thanks for this post Ben. I think a lot of what you're saying here could alternatively be filed under "Taking ideas seriously": the dedication to follow through with the consequences of ideas, even if their conclusions are unorthodox or uncomfortable.

Comment by David_Kristoffersson on Aligning AI by optimizing for "wisdom" · 2023-07-06T12:20:33.041Z · LW · GW

I would reckon: no single AI safety method "will work" because no single method is enough by itself. The idea expressed in the post would not "solve" AI alignment, but I think it's a thought-provoking angle on part of the problem.

Comment by David_Kristoffersson on Controlling Intelligent Agents The Only Way We Know How: Ideal Bureaucratic Structure (IBS) · 2023-07-02T20:06:51.771Z · LW · GW

Weber again: "And, so in light of this historical view, we need to remember that bureaucracy, taken as it is, is just an instrument of precision that can be put to service by purely political, economic, or any other dominating or controlling interest. Therefore the simultaneous development of democratization and bureaucratization should not be exaggerated, no matter how typical the phenomena may be." Yikes, okay, it seems like Weber understood the notion the orthogonality thesis."

Isn't this interesting: Weber's point is similar to the orthogonality thesis. This makes me realize a wider implication: the orthogonality thesis is actually very similar to the general debate between "technological progress is good" and "no, it isn't necessarily".

Weber: democratization doesn't follow from bureaucratization.
Orthogonality thesis: intelligence and morality are orthogonal.
Technological caution argument: more powerful technology isn't by default a good thing for us.

I'm especially interested in contrasting orthogonality with technological caution. I'd like to express them in a common form. Intelligence is capability. Technology generally is capability. Morality = what is good. More capability dropped into parts of a society isn't necessarily a good thing, whether that part of society is an AI, a human, a social system, or a socio-technical system.

This is a generalization of the orthogonality thesis and the technological caution argument, assuming that AI gets embedded in society (which should be assumed).

Comment by David_Kristoffersson on Controlling Intelligent Agents The Only Way We Know How: Ideal Bureaucratic Structure (IBS) · 2023-07-02T20:04:27.184Z · LW · GW

Thanks Justin! This is an interesting perspective. I'd enjoy seeing a compilation of different perspectives on ensuring AI alignment. (Another recurrent example would be the cybersecurity perspective on AI safety.)

Bureaucratization is the ultimate specific means to turn a mutually agreed upon community action rooted in subjective feeling into action rooted in a rational agreement by mutual consent.

This sounds a lot like the general situation of creating moral or judicial systems for a society. (When it works well.)

The principle of fixed competencies
The principle of hierarchically organized positions

Interestingly, these may run counter to Agile-associated practices and to some practices I would consider generally good. It seems good to cultivate specialties, but also to cultivate some breadth in competencies. And to nurture bottom-up flows! Hierarchy has its limitations.

Comment by David_Kristoffersson on Aligning AI by optimizing for "wisdom" · 2023-07-01T15:43:27.175Z · LW · GW

I quite like the concept of alignment through coherence between the "coherence factors"!

"Wisdom" has many meanings. I would use the word differently to how the article is using it.

Comment by David_Kristoffersson on Abuse in LessWrong and rationalist communities in Bloomberg News · 2023-03-08T21:31:29.810Z · LW · GW

I think the healthy and compassionate response to this article would be to focus on addressing the harms victims have experienced. So I find myself disappointed by much of the voting and comment responses here.

I agree that the Bloomberg article doesn't acknowledge that most of the harms it lists were perpetrated by people who have mostly already been kicked out of the community, and that it uses some unfair framings. But I think the bigger issue is that of harms experienced by women that may not have been addressed: unreported cases, and insufficient measures taken against reported ones. I don't know whether enough has been done, so it seems unwise to minimize the article and the people who are upset about the sexual misconduct. And even if enough has been done in terms of responses and policy, I would prefer to see more compassion.

Comment by David_Kristoffersson on Transformative VR Is Likely Coming Soon · 2022-10-13T11:19:36.564Z · LW · GW

I think I agree with your technological argument, but I'd take your 6 months and 2.5 years and multiply them by a factor of 2-4x.

Part of it is likely that we are conceiving of the scenarios a bit differently. I might be including some additional practical considerations.

Comment by David_Kristoffersson on Why I think there's a one-in-six chance of an imminent global nuclear war · 2022-10-08T13:59:27.110Z · LW · GW

Yes, that's most of the 2-5%.

Comment by David_Kristoffersson on Why I think there's a one-in-six chance of an imminent global nuclear war · 2022-10-08T08:22:16.722Z · LW · GW

Thank you for this post, Max.

My background here:

  • I've watched the Ukraine war very closely since it started.
  • I'm not at all familiar with nuclear risk estimations.

Summary: I wouldn't give 70% for WW3/KABOOM from conventional NATO retaliation. I would put it at 2-5% at this moment (I've spent little time thinking about the precise number).

Motivation: I think conventional responses from NATO will cause Russia to generally back down. I think Putin wants to use the threat of nukes, not actually use them.

Even if cornered yet further, I expect Putin to assess that firing off nukes would make his situation even worse. Nuclear conflict would be an immense direct threat to himself and Russia, and the threat of nuclear conflict also increases the risk of people on the inside targeting him (because they don't want to die). Authoritarians respect force, and a NATO response would be a show of force.

Putin has told the Russian public in the past that Russia couldn't win against NATO directly. Losing against NATO actually gives him a more palatable excuse: NATO is too powerful. Losing against Ukraine, their little sibling, on the other hand, would be very humiliating. Losing in a contest of strength against someone supposedly weaker is almost unacceptable to authoritarians.

I think the most likely outcome is that Putin is deterred from firing a tactical nuke. And if he does fire one, NATO will respond conventionally (such as by taking out the Black Sea fleet), and this will cause Russia to back down in some manner.

Comment by David_Kristoffersson on The case for aligning narrowly superhuman models · 2021-03-18T14:08:03.969Z · LW · GW

The amount of effort going into AI as a whole ($10s of billions per year) is currently ~2 orders of magnitude larger than the amount of effort going into the kind of empirical alignment I’m proposing here, and at least in the short-term (given excitement about scaling), I expect it to grow faster than investment into the alignment work.

There's a reasonable argument (shoutout to Justin Shovelain) that the risk is that work like this done by AI alignment people will be closer to AGI than standard commercial or academic research, and will therefore accelerate AGI more than average AI research would. Thus, $10s of billions per year into general AI is not quite the right comparison, because little of that money goes to work "close to AGI".

That said, on balance, I'm personally in favor of the work this post outlines.

Comment by David_Kristoffersson on Anti-Aging: State of the Art · 2021-01-04T15:32:20.036Z · LW · GW

Unfortunately, there is no good 'where to start' guide for anti-aging. This is insane, given this is the field looking for solutions to the biggest killer on Earth today.

Low-hanging-fruit intervention: create such a public guide on a website.

Comment by David_Kristoffersson on Is this viable physics? · 2020-06-25T16:00:09.588Z · LW · GW

That being said, I would bet that one would be able to find other formalisms that are equivalent after kicking down the door...

At least, we've now hit one limit in the shape of universal computation: No new formalism will be able to do something that couldn't be done with computers. (Unless we're gravely missing something about what's going on in the universe...)

Comment by David_Kristoffersson on Good and bad ways to think about downside risks · 2020-06-12T14:07:00.762Z · LW · GW

When it comes to downside risk, it's often the case that there are more unknown unknowns that produce harm than positive unknown unknowns. People are usually biased to overestimate the positive effects and underestimate the negative effects for the known unknowns.

This seems plausible to me. Would you like to expand on why you think this is the case?

The asymmetry between creation and destruction? (I.e., it's harder to build than it is to destroy.)

Comment by David_Kristoffersson on Good and bad ways to think about downside risks · 2020-06-11T14:18:40.334Z · LW · GW

Very good point! The effect of not taking an action depends on what the counterfactual is: what would happen otherwise/anyway. Maybe the article should note this.

Comment by David_Kristoffersson on mind viruses about body viruses · 2020-04-03T17:28:54.075Z · LW · GW

Excellent comment, thank you! Don't let the perfect be the enemy of the good if you're running from an exponential growth curve.

Comment by David_Kristoffersson on The recent NeurIPS call for papers requires authors to include a statement about the potential broader impact of their work · 2020-02-24T13:16:53.628Z · LW · GW

Looks promising to me. Technological development isn't by default good.

Though I agree with the other commenters that this could fail in various ways. For one thing, if a policy like this is introduced without guidance on how to analyze the societal implications, people will think of wildly different things. ML researchers aren't by default going to have the training to analyze societal consequences. (Well, who does? We should develop better tools here.)

Comment by David_Kristoffersson on Jan Bloch's Impossible War · 2020-02-21T12:13:01.828Z · LW · GW

Or, at least, include a paragraph or a few to summarize it!

Comment by David_Kristoffersson on A point of clarification on infohazard terminology · 2020-02-03T11:10:05.019Z · LW · GW

Some quick musings on alternatives for the "self-affecting" info hazard type:

  • Personal hazard
  • Self info hazard
  • Self hazard
  • Self-harming hazard

Comment by David_Kristoffersson on AI alignment concepts: philosophical breakers, stoppers, and distorters · 2020-01-30T14:44:35.752Z · LW · GW

I wrote this comment to an earlier version of Justin's article:

It seems to me that most of the 'philosophical' problems are going to get solved as a matter of solving practical problems in building useful AI. You could call the ML systems and AI being developed now 'empirical'. From the perspective of the people building current systems, they likely don't consider what they're doing to be solving philosophical problems. Symbol grounding problem? Well, an image classifier built on a convolutional neural network gets quite proficient at grounding classes like 'cars' and 'dogs' (symbols) in real physical scenes.

So, the observation I want to make is that the philosophical problems we can think of that might trip up a system are likely to turn out to look like technical/research/practical problems that need to be solved by default, for practical reasons, in order to make useful systems.

The image classification problem wasn't solved in one day, but it was solved using technical skills, engineering skills, more powerful hardware, and more data. People didn't spend decades discussing philosophy: the problem was solved through advances in the design of neural networks and through more powerful computers.
Of course, image classification doesn't solve the symbol grounding problem in full. But other aspects of symbol grounding that people might find mystifying are getting solved piecewise, as researchers and engineers solve the practical problems of AI.
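
As a concrete illustration of that piecewise, engineering-driven grounding, here is a minimal sketch of classifying an image with a pretrained CNN. It assumes a recent torchvision (the ResNet18_Weights API) plus Pillow, and the file name "dog.jpg" is hypothetical, purely for illustration; the point is that the scene-to-symbol mapping falls out of ordinary training and tooling.

    import torch
    from torchvision import models
    from PIL import Image

    # A pretrained CNN maps raw pixels to symbolic class labels ("Labrador retriever",
    # "sports car", ...) without anyone explicitly "solving" symbol grounding.
    weights = models.ResNet18_Weights.DEFAULT
    model = models.resnet18(weights=weights).eval()
    preprocess = weights.transforms()

    def classify(image_path: str) -> str:
        image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            logits = model(image)
        return weights.meta["categories"][int(logits.argmax())]

    # e.g. classify("dog.jpg") might return "golden retriever"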

Let's look at a classic problem formulation from MIRI, 'Ontology Identification':

Technical problem (Ontology Identification). Given goals specified in some ontology and a world model, how can the ontology of the goals be identified in the world model? What types of world models are amenable to ontology identification? For a discussion, see Soares (2015).

When you create a system that performs any function in the real world, you are in some sense giving it goals. Reinforcement learning-trained systems pursue 'goals'. An autonomous car takes you from chosen points A to chosen points B; it has the overall goal of transporting people. The ontology identification problem is getting solved piecewise as a practical matter. Perhaps MIRI-style theory could give us a deeper understanding that helps us avoid some pitfalls, but it's not clear why these wouldn't be caught as practical problems.

What would a real philosophical landmine look like? The real philosophical landmines would be the class of philosophical problems that won't get solved as a practical matter and that pose a risk of harm to humanity.

Comment by David_Kristoffersson on AIXSU - AI and X-risk Strategy Unconference · 2019-09-06T00:55:46.831Z · LW · GW

I expect the event to have no particular downside risks, and to give interesting input and spark ideas in experts and novices alike. Mileage will vary, of course. Unconferences foster dynamic discussion and a living agenda. If it's risky to host this event, then I'd expect AI strategy and forecasting meetups and discussions at EAG to be risky too, and that they should also not be hosted.

I and other attendees of AIXSU pay careful attention to potential downside risks. I also think it's important we don't strangle open intellectual advancement. We need to figure out what we should talk about, not conclude that we shouldn't talk.

AISC: To clarify: AI Safety Camp is different and places greater trust in the judgement of novices, since teams are generally run entirely by novices. The person who proposed running a strategy AISC found the reactions from experts to be mixed. He also reckoned the event would overlap with the existing AI safety camps, since they already include strategy teams.

Potential negative side effects of strategy work are a very important topic. I hope to discuss them with attendees at the unconference!

Comment by David_Kristoffersson on Three Stories for How AGI Comes Before FAI · 2019-08-17T14:48:16.645Z · LW · GW
We can subdivide the security story based on the ease of fixing a flaw if we're able to detect it in advance. For example, vulnerability #1 on the OWASP Top 10 is injection, which is typically easy to patch once it's discovered. Insecure systems are often right next to secure systems in program space.

Insecure systems are right next to secure systems, and many flaws are found. Yet the larger systems (the company running the software, the economy, etc.) manage to correct somehow. That's because there are mechanisms in the larger systems poised to patch the software when flaws are discovered. Perhaps we could adapt and optimize this flaw-exploit-patch loop from security as a technique for AI alignment.
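
To make the "easy to patch once discovered" point concrete, here is a minimal hypothetical sketch in Python of an injection flaw and its patch (the table and function names are mine, purely for illustration):

    import sqlite3

    def find_user_unsafe(conn: sqlite3.Connection, username: str):
        # Vulnerable: user input is concatenated straight into the SQL text, so an
        # input like "x' OR '1'='1" rewrites the query's meaning (classic injection).
        query = "SELECT id, name FROM users WHERE name = '" + username + "'"
        return conn.execute(query).fetchall()

    def find_user_safe(conn: sqlite3.Connection, username: str):
        # Patched: a parameterized query treats the input purely as data. The fix is
        # local and mechanical once the flaw is found, which is what keeps the
        # flaw-exploit-patch loop viable for the larger system around the software.
        return conn.execute(
            "SELECT id, name FROM users WHERE name = ?", (username,)
        ).fetchall()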

If the security story is what we are worried about, it could be wise to try & develop the AI equivalent of OWASP's Cheat Sheet Series, to make it easier for people to find security problems with AI systems. Of course, many items on the cheat sheet would be speculative, since AGI doesn't actually exist yet. But it could still serve as a useful starting point for brainstorming.

This sounds like a great idea to me. Software security has a very well developed knowledge base at this point and since AI is software, there should be many good insights to port.

What possibilities aren't covered by the taxonomy provided?

Here's one that occurred to me quickly: Drastic technological progress (presumably involving AI) destabilizes society and causes strife. In this environment with more enmity, safety procedures are neglected and UFAI is produced.

Comment by David_Kristoffersson on Project Proposal: Considerations for trading off capabilities and safety impacts of AI research · 2019-08-17T13:38:34.713Z · LW · GW

This seems like a valuable research question to me. I have a project proposal in a drawer of mine that is strongly related: "Entanglement of AI capability with AI safety".

Comment by David_Kristoffersson on A case for strategy research: what it is and why we need more of it · 2019-07-12T07:08:57.094Z · LW · GW

My guess is that the ideal is to have semi-independent teams doing research. Independence in order to better explore the space of questions, and some degree of plugging in to each other in order to learn from each other and to coordinate.

Are there serious info hazards, and if so can we avoid them while still having a public discussion about the non-hazardous parts of strategy?

There are info hazards. But I think if we can discuss Superintelligence publicly, then yes; we can have a public discussion about the non-hazardous parts of strategy.

Are there enough people and funding to sustain a parallel public strategy research effort and discussion?

I think you could get a pretty lively discussion even with just 10 people, if they were active enough. I think you'd need a core of active posters and commenters, and there needs to be enough reason for them to assemble.

Comment by David_Kristoffersson on A case for strategy research: what it is and why we need more of it · 2019-06-21T18:02:59.030Z · LW · GW

Nice work, Wei Dai! I hope to read more of your posts soon.

However I haven't gotten much engagement from people who work on strategy professionally. I'm not sure if they just aren't following LW/AF, or don't feel comfortable discussing strategically relevant issues in public.

A bit of both, presumably. I would guess a lot of it comes down to incentives, perceived gain, and habits. There's no particular pressure to discuss things on LessWrong or the EA Forum. LessWrong isn't perceived as your main peer group. And if you're at FHI or OpenAI, you'll already have plenty of contact with people who can provide quick feedback.

Comment by David_Kristoffersson on A case for strategy research: what it is and why we need more of it · 2019-06-21T17:09:58.336Z · LW · GW
I'm very confused why you think that such research should be done publicly, and why you seem to think it's not being done privately.

I don't think the article implies this:

Research should be done publicly

The article states: "We especially encourage researchers to share their strategic insights and considerations in write ups and blog posts, unless they pose information hazards."
Which means: share more, but don't share if you think there are possible negative consequences.
Though I guess you could mean that it's very hard to tell what might lead to negative outcomes. That's a good point, and it's why we (Convergence) are prioritizing research on information hazard handling and research-shaping considerations.

it's not being done privately

The article isn't saying strategy research isn't being done privately. What it is saying is that we need more strategy research and should increase investment in it.

Given the first sentence, I'm confused as to why you think that "strategy research" (writ large) is going to be valuable, given our fundamental lack of predictive ability in most of the domains where existential risk is a concern.

We'd argue that to get better predictive ability, we need to do strategy research. Maybe you're saying the article makes it look like we are recommending any research that looks like strategy research? That isn't our intention.

Comment by David_Kristoffersson on AI Safety Research Camp - Project Proposal · 2019-01-24T11:15:00.685Z · LW · GW

Yes -- the plan is to have these on an ongoing basis. I'm writing this just after the deadline passed for the one planned for April.

Here's the website: https://aisafetycamp.com/

The Facebook group is also a good place to keep tabs on it: https://www.facebook.com/groups/348759885529601/

Comment by David_Kristoffersson on Beware Social Coping Strategies · 2018-02-05T09:42:40.043Z · LW · GW
Your relationship with other people is a macrocosm of your relationship with yourself.

I think there's something to that, but it's not that general. For example, some people can be very kind to others but harsh with themselves. Some people can be cruel to others but lenient to themselves.

If you can't get something nice, you can at least get something predictable

The desire for the predictable is what Autism Spectrum Disorder is all about, I hear.

Comment by David_Kristoffersson on "Taking AI Risk Seriously" (thoughts by Critch) · 2018-02-02T04:32:45.847Z · LW · GW

Here's the Less Wrong post for the AI Safety Camp!

Comment by David_Kristoffersson on A Fable of Science and Politics · 2016-10-26T08:57:49.936Z · LW · GW

It's bleen, without a moment's doubt.

Comment by David_Kristoffersson on LessWrong 2.0 · 2016-05-08T10:06:56.257Z · LW · GW

Counterpoint: Sometimes, not moving means moving, because everyone else is moving away from you. Movement -- change -- is relative. And on the Internet, change is rapid.

Comment by David_Kristoffersson on Meetup : First meetup in Stockholm · 2015-10-09T19:11:29.667Z · LW · GW

Interesting. I might show up.

Comment by David_Kristoffersson on Book Review: Naive Set Theory (MIRI research guide) · 2015-08-16T10:26:47.403Z · LW · GW

Thanks for the tip. Two other books on the subject that seem to be appreciated are Introduction to Set Theory by Karel Hrbacek and Classic Set Theory: For Guided Independent Study by Derek Goldrei.

Edit: math.se weighs in: http://math.stackexchange.com/a/264277/255573

Comment by David_Kristoffersson on Book Review: Naive Set Theory (MIRI research guide) · 2015-08-16T10:23:09.130Z · LW · GW

The author of the Teach Yourself Logic study guide agrees with you about reading multiple sources:

I very strongly recommend tackling an area of logic (or indeed any new area of mathematics) by reading a series of books which overlap in level (with the next one covering some of the same ground and then pushing on from the previous one), rather than trying to proceed by big leaps.

In fact, I probably can’t stress this advice too much, which is why I am highlighting it here. For this approach will really help to reinforce and deepen understanding as you re-encounter the same material from different angles, with different emphases.

Comment by David_Kristoffersson on Book Review: Naive Set Theory (MIRI research guide) · 2015-08-16T09:59:22.493Z · LW · GW

My two main sources of confusion in that sentence are:

  1. He says "distinct elements onto distinct elements", which suggests both injection and surjection.
  2. He says "is called one-to-one (usually a one-to-one correspondence)", which might suggest that "one-to-one" and "one-to-one correspondence" are synonyms -- since that is what he usually uses the parantheses for when naming concepts.

I find Halmos somewhat contradictory here.
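
For reference, the standard definitions (my wording, not Halmos's) that untangle the two terms:

  • f : A → B is one-to-one (injective) if distinct elements of A map to distinct elements of B, i.e. f(x) = f(y) implies x = y.
  • f is onto B (surjective) if every element of B equals f(a) for some a in A.
  • f is a one-to-one correspondence (a bijection) only if it is both one-to-one and onto.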

But I'm convinced you're right. I've edited the post. Thanks.

Comment by David_Kristoffersson on Book Review: Naive Set Theory (MIRI research guide) · 2015-08-16T09:53:22.549Z · LW · GW

You guys must be right. And wikipedia corroborates. I'll edit the post. Thanks.

Comment by David_Kristoffersson on Welcome to Less Wrong! (7th thread, December 2014) · 2015-07-16T22:14:54.816Z · LW · GW

Hello.

I'm currently attempting to read through the MIRI research guide in order to contribute to one of the open problems. Starting from Basics. I'm emulating many of Nate's techniques. I'll post reviews of material in the research guide on LessWrong as I work through it.

I'm mostly posting here now just to note this. I can be terse at times.

See you there.

Comment by David_Kristoffersson on Dark Arts of Rationality · 2015-07-11T19:26:08.177Z · LW · GW

First, appreciation: I love that calculated modification of self. These, and similar techniques, can be very useful if put to use in the right way. I recognize myself here and there. You did well to abstract it all out this clearly.

Second, a note: You've described your techniques from the perspective of how they deviate from epistemic rationality - "Changing your Terminal Goals", "Intentional Compartmentalization", "Willful Inconsistency". I would've been more inclined to describe them from the perspective of their central effect, e.g. something in the style of: "Subgoal ascension", "Channeling", "Embodying". Perhaps not as marketable to the LessWrong crowd. Multiple perspectives could be used as well.

Third, a question: How did you create that gut feeling of urgency?

Comment by David_Kristoffersson on MIRI's technical research agenda · 2015-01-27T19:42:01.922Z · LW · GW

And boxing, by the way, means giving the AI zero power.

No, hairyfigment's answer was entirely appropriate. Zero power would mean zero effect. Any kind of interaction with the universe means some level of power. Perhaps in the future you should say nearly zero power instead, so as to avoid misunderstanding on the part of others, since taking you literally on the "zero" is apparently "legalistic".

As to the issues with nearly zero power:

  • A superintelligence with nearly zero power could turn out to have a heck of a lot more power than you expect.
  • The incentives to tap more perceived utility by unboxing the AI or building other unboxed AIs will be huge.

Mind, I'm not arguing that there is anything wrong with boxing. What I'm arguing is that it's wrong to rely only on boxing. I recommend you read some more material on AI boxing and Oracle AI. Don't miss out on the references.

Comment by David_Kristoffersson on MIRI's technical research agenda · 2015-01-27T18:49:43.523Z · LW · GW

So you disagree with the premise of the orthogonality thesis. Then you know a central concept to probe in order to understand the arguments put forth here. For example, check out Stuart Armstrong's paper: General purpose intelligence: arguing the Orthogonality thesis

Comment by David_Kristoffersson on MIRI's technical research agenda · 2015-01-23T19:12:35.342Z · LW · GW

There's no guarantee that boxing will ensure the safety of a soft takeoff. When your boxed AI starts to become drastically smarter than a human (10 times, 1000 times, 1,000,000 times), the sheer enormity of the mind may slip beyond human ability to understand. All the while, a seemingly small dissonance between the AI's goals and human values, or a small misunderstanding on our part of what goals we've imbued, could magnify into catastrophe as the power differential between humanity and the AI explodes post-transition.

If an AI goes through the intelligence explosion, its goals will be what orchestrates all resources (as Omohundro's point 6 implies). If the goals of this AI do not align with human values, all we value will be lost.

Comment by David_Kristoffersson on MIRI's technical research agenda · 2015-01-23T17:38:29.786Z · LW · GW

Mark: So you think human-level intelligence in principle does not combine with goal stability. Aren't you simply disagreeing with the orthogonality thesis, "that an artificial intelligence can have any combination of intelligence level and goal"?

Comment by David_Kristoffersson on Facing the Intelligence Explosion discussion page · 2014-08-10T20:31:14.099Z · LW · GW

http://intelligenceexplosion.com/en/2012/ai-the-problem-with-solutions/ links to http://lukeprog.com/SaveTheWorld.html - which redirects to http://lukemuehlhauser.comsavetheworld.html/ - which isn't there anymore.