Posts

Inviting discussion of "Beat AI: A contest using philosophical concepts" 2024-05-29T11:55:35.603Z

Comments

Comment by David James (david-james) on Understanding Shapley Values with Venn Diagrams · 2024-12-16T13:54:55.633Z · LW · GW

To clarify: the claim is that the Shapley value is the only allocation rule that satisfies all four properties simultaneously: {Efficiency, Symmetry, Linearity, Null player}. Other allocation rules can satisfy proper subsets of these properties.
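
For concreteness, here is a minimal sketch (my own illustration, with a made-up characteristic function) that computes Shapley values for a three-player game by averaging marginal contributions over join orders, then checks the Efficiency and Null-player properties numerically:

from itertools import permutations

players = ["A", "B", "C"]

def v(coalition):
    # Hypothetical characteristic function: the value each coalition creates.
    table = {
        frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 0,
        frozenset("AB"): 60, frozenset("AC"): 10, frozenset("BC"): 20,
        frozenset("ABC"): 60,
    }
    return table[frozenset(coalition)]

def shapley(player):
    # Average the player's marginal contribution over every join order.
    total = 0.0
    orders = list(permutations(players))
    for order in orders:
        before = set()
        for p in order:
            if p == player:
                total += v(before | {p}) - v(before)
                break
            before.add(p)
    return total / len(orders)

values = {p: shapley(p) for p in players}
print(values)                                         # {'A': 25.0, 'B': 35.0, 'C': 0.0}
assert abs(sum(values.values()) - v(players)) < 1e-9  # Efficiency: values sum to v(grand coalition)
assert abs(values["C"]) < 1e-9                        # Null player: C never adds value, so C gets 0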

Comment by David James (david-james) on Understanding Shapley Values with Venn Diagrams · 2024-12-16T13:47:33.502Z · LW · GW

Hopefully, you have gained some intuition for why Shapley values are “fair” and why they account for interactions among players.

The article fails to make a key point: in political economy and game theory, there are many definitions of "fairness" that seem plausible at face value, especially when considered one at a time. Even if one puts normative questions to the side, there are mathematical limits and constraints as one tries to satisfy various combinations simultaneously. With these limits in mind, if you treat this as a design problem, it takes some care to choose metrics that reinforce a desired set of norms.

Comment by David James (david-james) on Compute and size limits on AI are the actual danger · 2024-11-25T04:56:27.717Z · LW · GW

Should the bill have been signed, it would have created severe enough pressures to do more with less to focus on building better and better abstractions once the limits are hit.

Ok, I see the argument. But even without such legislation, the costs of large training runs create major incentives to build better abstractions.

Comment by David James (david-james) on Compute and size limits on AI are the actual danger · 2024-11-25T04:50:51.974Z · LW · GW

Does this summary capture the core argument? Physical constraints on the human brain contributed to its success relative to other animals, because it had to "do more with less" by using abstraction. Analogously, constraints on AI compute or size will encourage more abstraction, increasing the likelihood of "foom" danger.

Comment by David James (david-james) on "Open Source AI" isn't Open Source · 2024-11-22T15:51:47.486Z · LW · GW

Though I'm reasonably sure Llama license (sic) isn't preventing viewing the source

This is technically correct but irrelevant. Meta doesn't provide any source code, by which I mean the full set of precursor steps (including the data and how to process it).

Generally speaking, a license defines usage rights; it has nothing to do with whether or how the thing (e.g. source code) is made available.

As a weird example, one could publish a repository with a license but no source code. This would be odd. The license would have no power to mandate the code be released; that is a separate concern.

To put it another way, a license does not obligate the owner to release or share anything, whether it be compiled software, source code, weights, etc. A license simply outlines the conditions under which the thing (e.g. source code), once released, can be used or modified.

Comment by David James (david-james) on What are some positive developments in AI safety in 2024? · 2024-11-22T15:13:36.454Z · LW · GW

The paper AI Control: Improving Safety Despite Intentional Subversion is a practical, important step in the right direction. It demonstrates various protocols that aim to maintain safety even when using malicious models that know they are suspected of being dangerous.

Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, and Fabien Roger. Proceedings of the 41st International Conference on Machine Learning, PMLR 235:16295–16336, 2024.

As large language models (LLMs) become more powerful and are deployed more autonomously, it will be increasingly important to prevent them from causing harmful outcomes. To do so, safety measures either aim at making LLMs try to avoid harmful outcomes or aim at preventing LLMs from causing harmful outcomes, even if they try to cause them. In this paper, we focus on this second layer of defense. We develop and evaluate pipelines of safety techniques (protocols) that try to ensure safety despite intentional subversion - an approach we call AI control. We investigate a setting in which we want to solve a sequence of programming problems without ever submitting subtly wrong code, using access to a powerful but untrusted model (in our case, GPT-4), access to a less powerful trusted model (in our case, GPT-3.5), and limited access to high-quality trusted labor. We investigate a range of protocols and red-team them by exploring strategies that the untrusted model could use to subvert them. We find that using the trusted model to edit untrusted-model code or using the untrusted model as a monitor substantially improves on simple baselines.

Related Video by Robert Miles: I highly recommend Using Dangerous AI, But Safely? released on Nov. 15, 2024.

Comment by David James (david-james) on What are some positive developments in AI safety in 2024? · 2024-11-22T15:05:54.336Z · LW · GW

NIST's AI Safety Institute (AISI) hired Paul Christiano as its Head of AI Safety.

Comment by David James (david-james) on What are some positive developments in AI safety in 2024? · 2024-11-22T15:02:37.224Z · LW · GW

From Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims:

But what we seem to be seeing is a bit different from deep learning broadly hitting a wall. More specifically it appears to be: returns to scaling up model pretraining are plateauing.

Comment by David James (david-james) on Newcomb's Problem and Regret of Rationality · 2024-11-22T01:32:14.646Z · LW · GW

I agree, but I’m not sure how durable this agreement will be. (I reversed my position while drafting this comment.)

Here is my one sentence summary of the argument above: If Omega can make a fully accurate prediction in a universe without backwards causality, this implies a deterministic universe.

Comment by David James (david-james) on China Hawks are Manufacturing an AI Arms Race · 2024-11-21T00:04:51.673Z · LW · GW

The Commission recommends: [...] 1. Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability.

As mentioned above, the choice of Manhattan Project instead of Apollo Project is glaring.

Worse, there is zero mention of AI safety, AI alignment, or AI evaluation in the Recommendations document.

Lest you think I'm expecting too much, the report does talk about safety, alignment, and evaluation ... for non-AI topic areas! (see bolded words below: "safety", "aligning", "evaluate")

  • "Congress direct the U.S. Government Accountability Office to investigate the reliability of safety testing certifications for consumer products and medical devices imported from China." (page 736)
  • "Congress direct the Administration to create an Outbound Investment Office within the executive branch to oversee investments into countries of concern, including China. The office should have a dedicated staff and appropriated resources and be tasked with: [...] Expanding the list of covered sectors with the goal of aligning outbound investment restrictions with export controls." (page 737)
  • "Congress direct the U.S. Department of the Treasury, in coordination with the U.S. Departments of State and Commerce, to provide the relevant congressional committees a report assessing the ability of U.S. and foreign financial institutions operating in Hong Kong to identify and prevent transactions that facilitate the transfer of products, technology, and money to Russia, Iran, and other sanctioned countries and entities in violation of U.S. export controls, financial sanctions, and related rules. The report should [...] Evaluate the extent of Hong Kong’s role in facilitating the transfer of products and technologies to Russia, Iran, other adversary countries, and the Mainland, which are prohibited by export controls from being transferred to such countries;" (page 741)
Comment by David James (david-james) on Neutrality · 2024-11-19T14:06:26.285Z · LW · GW

I am not following the context of the comment above. Help me understand the connection? The main purpose of my comment above was to disagree with this sentence two levels up:

The frenzy to couple everything into a single tangle of complexity is driven by the misunderstanding that complacency is the only reason why your ideology is not the winning one

… in particular, I don’t think it captures the dominant driver of “coupling” or “bundling”.

Does the comment one level up disagree with my claims? I’m not following the connection.

Comment by David James (david-james) on Neutrality · 2024-11-18T14:52:52.343Z · LW · GW

The frenzy to couple everything into a single tangle of complexity is driven by…

In some cases, yes, but this is only one factor of many. Others include:

  • Our brains are often drawn to narratives, which are complex and interwoven. Hence the tendency to bundle up complex logical interdependencies into a narrative.

  • Our social structures are guided/constrained by our physical nature and technology. For in-person gatherings, bundling of ideas is often a dominant strategy.

For example, imagine a highly unusual congregation: a large unified gathering of monotheistic worshippers with considerable internal diversity. Rather than “one track” consisting of shared ideology, they subdivide their readings and rituals into many subgroups. Why don’t we see much of this (if any) in the real world? Because ideological bundling often pairs well with particular ways of gathering.

P.S. I personally welcome gathering styles that promote both community and rationality (spanning a diversity of experiences and values).

Comment by David James (david-james) on Neutrality · 2024-11-18T14:41:37.930Z · LW · GW

Right. Some such agreements are often called social contracts. One catch is that a person born into them may not understand their historical origin or practical utility, much less agree with them.

Comment by David James (david-james) on Neutrality · 2024-11-18T14:29:23.219Z · LW · GW

Durable institutions find ways to survive. I don’t mean survival merely in terms of legal continuity; I mean fidelity to their founding charter. Institutions not only have to survive past their first leader; they have to survive their first leader themself! The institution’s structure and policies must protect against the leader’s meandering attention, whims, and potential corruptions. In Musk’s case, given his mercurial history, I would not bet that he would agree to the requisite policies.

Comment by David James (david-james) on Neutrality · 2024-11-18T12:20:55.643Z · LW · GW

they weren’t designed to be ultra-robust to exploitation, or to make serious attempts to assess properties like truth, accuracy, coherence, usefulness, justice

There are notable differences between these properties. Usefulness and justice are quite different from the others (truth, accuracy, coherence). Usefulness (defined as suitability for a purpose, which is non-prescriptive as to the underlying norms) is different from justice (defined by some normative ideal). Coherence requires fewer commitments than truth and accuracy.

Ergo, I could see various instantiations of a library designed to satisfy various levels: Level 1 would value coherence; Level 2 would add truth and accuracy; Level 3 would add usefulness; Level 4 would add justice.

Comment by David James (david-james) on Laziness death spirals · 2024-10-19T16:05:05.568Z · LW · GW

I like having a list of small, useful things to do that tend to pay off in the long run, like:

  • go to the grocery store to make sure you have fresh fruits and vegetables
  • meditate for 10 minutes
  • do push-ups and sit-ups
  • journal for 10 minutes

When my brain feels cluttered, it is nice to have a list of time-boxed simple tasks that don’t require planning or assessment.

Comment by David James (david-james) on Provably Safe AI: Worldview and Projects · 2024-08-10T02:04:34.382Z · LW · GW

Verify human designs and automatically create AI-generated designs which provably cannot be opened by mechanical picking.

Such a proof would be subject to its definition of "mechanical picking" and a sufficiently accurate physics model. (For example, would an electronically-controllable key-looking object with adjustable key-cut depths and pressure sensors qualify as a "pick"?)

I don't dispute the value of formal proofs for safety. If accomplished, they move the conversation to "is the proof correct?" and "are we proving the right thing?". Both are steps in the right direction, I think.

Comment by David James (david-james) on LLM Generality is a Timeline Crux · 2024-07-18T16:40:24.718Z · LW · GW

Thanks for the references; I'll need some time to review them. In the meanwhile, I'll make some quick responses.

As a side note, I'm not sure how tree search comes into play; in what way does tree search require unbounded steps that doesn't apply equally to linear search?

I intended tree search as just one example, since minimax tree search is a common example for game-based RL research.

No finite agent, recursive or otherwise, can plan over an unbounded number of steps in finite time...

In general, I agree. Though there are notable exceptions for cases such as (not mutually exclusive):

  • a closed form solution is found (for example, where a time-based simulation can calculate some quantity at any arbitrary time step using the same amount of computation; see the sketch after this list)

  • approximate solutions using a fixed number of computation steps are viable

  • a greedy algorithm can select the immediate next action that is equivalent to following a longer-term planning algorithm
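
As a toy illustration of the first bullet (my own example, not from the original thread): the same quantity can be computed by stepping a simulation forward or by a closed-form expression whose cost does not grow with how far ahead we look.

def position_by_simulation(x0, v0, a, t, dt=1e-3):
    # Iterative: the cost grows with t / dt.
    x, v = x0, v0
    for _ in range(round(t / dt)):
        v += a * dt
        x += v * dt
    return x

def position_closed_form(x0, v0, a, t):
    # Closed form: the same amount of computation for any t.
    return x0 + v0 * t + 0.5 * a * t * t

print(position_by_simulation(0.0, 1.0, 2.0, 10.0))  # close to 110 (discretization error)
print(position_closed_form(0.0, 1.0, 2.0, 10.0))    # exactly 110.0, for any t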

... so it's not immediately clear to me how iteration/recursion is fundamentally different in practice.

Yes, like I said above, I agree in general and see your point.

As I'm confident we both know, some algorithms can be written more compactly when recursion/iteration are available. I don't know how much computation theory touches on this; i.e. what classes of problems this applies to and why. I would make an intuitive guess that it is conceptually related to my point earlier about closed-form solutions.

Comment by David James (david-james) on LLM Generality is a Timeline Crux · 2024-07-18T13:04:42.867Z · LW · GW

Note that this is different from the (also very interesting) question of what LLMs, or the transformer architecture, are capable of accomplishing in a single forward pass. Here we're talking about what they can do under typical auto-regressive conditions like chat.


I would appreciate it if the community here could point me to research that agrees or disagrees with my claims and conclusions below.

Claim: one pass through a transformer (of a given size) can only do a finite number of reasoning steps.

Therefore: If we want an agent that can plan over an unbounded number of steps (e.g. one that does tree-search), it will need some component that can do an arbitrary number of iterative or recursive steps.

Sub-claim: The above claim does not conflict with the Universal Approximation Theorem.
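
To make the "therefore" concrete, here is a minimal sketch (my own framing, not a claim about any particular architecture): `bounded_step` stands in for a single forward pass, which does a fixed amount of work per call; the unbounded part of the computation has to come from the surrounding loop.

def bounded_step(state):
    # Stand-in for one forward pass: a fixed amount of computation that
    # produces the next partial result (here, trivially counting down).
    return state - 1

def plan(initial_state):
    state = initial_state
    steps = 0
    while state > 0:          # the loop, not the single pass, supplies unboundedness
        state = bounded_step(state)
        steps += 1
    return steps

print(plan(10))       # 10 iterations
print(plan(10_000))   # 10,000 iterations -- same bounded_step, more loop turns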

Comment by David James (david-james) on On predictability, chaos and AIs that don't game our goals · 2024-07-16T15:53:07.622Z · LW · GW

Claim: the degree to which the future is hard to predict has no bearing on the outer alignment problem.

  • If one is a consequentialist (of some flavor), one can still construct a "desirability tree" over various possible future states (a minimal sketch follows after this list). Sure, the uncertainty makes the problem more complex in practice, but the algorithm is still very simple. So I don't think that a more complex universe intrinsically has anything to do with alignment per se.
    • Arguably, machines will have better computational ability to reason over a vast number of future states. In this sense, they will be more ethical according to consequentialism, provided their valuation of terminal states is aligned.
    • To be clear, of course, alignment w.r.t. the valuation of terminal states is important. But I don't think this has anything to do with a harder to predict universe. All we do with consequentialism is evaluate a particular terminal state. The complexity of how we got there doesn't matter.
    • (If you are detecting that I have doubts about the goodness and practicality of consequentialism, you would be right, but I don't think this is central to the argument here.)
    • If humans don't really carry out consequentialism like we hope they would (and surely humans are not rational enough to adhere to consequentialist ethics -- perhaps not even in principle!), we can't blame this on outer alignment, can we? This would be better described as goal misspecification.
  • If one subscribes to deontological ethics, then the problem becomes even easier. Why? One wouldn't have to reason probabilistically over various future states at all. The goodness of an action only has to do with the nature of the action itself.
  • Do you want to discuss some other kind of ethics? Is there some other flavor that would operate differentially w.r.t. outer alignment in a more versus less predictable universe?
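
Here is the minimal sketch of the "desirability tree" idea mentioned in the first bullet (my own toy example with made-up numbers): the desirability of an action is the probability-weighted value of the future states it can lead to, applied recursively.

def expected_utility(node):
    if "utility" in node:                       # terminal state: valued directly
        return node["utility"]
    return sum(p * expected_utility(child)      # chance node: weight by probability
               for p, child in node["outcomes"])

# Two hypothetical actions under uncertainty.
action_a = {"outcomes": [(0.9, {"utility": 10}), (0.1, {"utility": -100})]}
action_b = {"outcomes": [(1.0, {"utility": 5})]}

print(expected_utility(action_a))  # -1.0: a 10% chance of disaster dominates
print(expected_utility(action_b))  #  5.0: the safer action wins
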
Comment by David James (david-james) on On predictability, chaos and AIs that don't game our goals · 2024-07-16T15:47:04.831Z · LW · GW

Want to try out a thought experiment? Put that same particular human (who wanted to specify goals for an agent) in the financial scenario you mention. Then ask: how well would they do? Compare the quality of how the person would act versus how well the agent might act.

This raises related questions:

  • If the human doesn't know what they would want, it doesn't seem fair to blame the problem on alignment failure. In such a case, the problem would be a person's lack of clarity.
  • Humans are notoriously good rationalizers and may downplay their own bad decisions. Making a fair comparison between "what the human would have done" versus "what the AI agent would have done" may be quite tricky. (See the Fundamental Attribution Error, a.k.a. correspondence bias.)
Comment by David James (david-james) on On predictability, chaos and AIs that don't game our goals · 2024-07-16T11:11:46.324Z · LW · GW

As I understand it, the argument above doesn't account for the agent using the best information available at the time (in the future, relative to its goal specification).

I think there is some confusion around a key point. For alignment, do we need to define what an agent will do in all future scenarios? It depends what you mean.

  • In some sense, no, because in the future, the agent will have information we don't have now.
  • In some sense, yes, because we want to know (to some degree) how the agent will act with future (unknown) information. Put another way, we want to guarantee that certain properties hold about its actions.

Let's say we define an aligned agent as one that does what we would want, provided that we were in its shoes (i.e. knowing what it knew). Under this definition, it is indeed possible to specify an agent's decision rule in a way that doesn't rely on long-range predictions (where predictive power gets fuzzy, like Alejandro says, due to measurement error and complexity). See also the adjacent comment about a thermostat by eggsyntax.

Note: I'm saying "decision rule" intentionally, because even an individual human does not have a well-defined utility function. (edited)

Comment by David James (david-james) on Bottle Caps Aren't Optimisers · 2024-07-08T12:23:50.184Z · LW · GW

Nevertheless, it seems wrong to say that my liver is optimising my bank balance, and more right to say that it "detoxifies various metabolites, synthesizes proteins, and produces biochemicals necessary for digestion"---even though that gives a less precise account of the liver's behaviour.

I'm not following why this is a less precise account of the liver's behavior.

Comment by David James (david-james) on 80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly) · 2024-07-08T01:20:26.493Z · LW · GW

Here is an example of a systems dynamics diagram showing some of the key feedback loops I see. We could discuss various narratives around it and what to change (add, subtract, modify).

┌───── to the degree it is perceived as unsafe ◀──────────┐                   
│          ┌──── economic factors ◀─────────┐             │                   
│        + ▼                                │             │                   
│      ┌───────┐     ┌───────────┐          │             │         ┌────────┐
│      │people │     │ effort to │      ┌───────┐    ┌─────────┐    │   AI   │
▼   -  │working│   + │make AI as │    + │  AI   │  + │potential│  + │becomes │
├─────▶│  in   │────▶│powerful as│─────▶│ power │───▶│   for   │───▶│  too   │
│      │general│     │ possible  │      └───────┘    │unsafe AI│    │powerful│
│      │  AI   │     └───────────┘          │        └─────────┘    └────────┘
│      └───────┘                            │                                 
│          │ net movement                   │ e.g. use AI to reason
│        + ▼                                │      about AI safety
│     ┌────────┐                          + ▼                                 
│     │ people │     ┌────────┐      ┌─────────────┐              ┌──────────┐
│   + │working │   + │ effort │    + │understanding│            + │alignment │
└────▶│ in AI  │────▶│for safe│─────▶│of AI safety │─────────────▶│  solved  │
      │ safety │     │   AI   │      └─────────────┘              └──────────┘
      └────────┘     └────────┘             │                                 
         + ▲                                │                                 
           └─── success begets interest ◀───┘

I find this style of thinking particularly constructive.

  • For any two nodes, you can see a visual relationship (or lack thereof) and ask "what influence do these have on each other and why?".
  • The act of summarization cuts out chaff.
  • It is harder to fool yourself about the completeness of your analysis.
  • It is easier to get to core areas of confusion or disagreement with others.

Personally, I find verbal reasoning workable for "local" (pairwise) reasoning but quite constraining for systemic thinking.

If nothing else, I hope this example shows how easily key feedback loops get overlooked. How many of us claim to have... (a) some technical expertise in positive and negative feedback? (b) interest in Bayes nets? So why don't we take the time to write out our diagrams? How can we do better?

P.S. There are major oversights in the diagram above, such as economic factors. This is not a limitation of the technique itself -- it is a limitation of the space and effort I've put into it. I have many other such diagrams in the works.

Comment by David James (david-james) on 80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly) · 2024-07-06T18:14:38.020Z · LW · GW

I’m curious if your argument, distilled, is: fewer people skilled in technical AI work is better? Such a claim must be examined closely! Think of it from a systems dynamics point of view. We must look at more than just one relationship. (I personally try to press people to share some kind of model that isn’t presented only in words.)

Comment by David James (david-james) on Chapter 52: The Stanford Prison Experiment, Pt 2 · 2024-06-24T14:35:50.521Z · LW · GW

One important role of a criminal justice system is rehabilitation. Another, according to some, is retribution. Those in Azkaban suffer perhaps the most awful form of retribution. Dementation renders a person incapable of rehabilitation.

Consider this if-then argument:

If:

  • Justice is served without error (which is not true)
  • The only purpose for criminal justice is retribution

Then: Azkabanian punishment is rational.

Otherwise, assuming there are other ways to protect society from the person, it is irrational to dement people.

Speaking broadly, putting aside the fictional world of Azkaban, there is an argument that suggests retribution for its own sake is wrong. It is simple: inflicting suffering is wrong, all other things equal. Retribution makes sense only to the extent it serves as a deterrent.

Comment by David James (david-james) on Why Yudkowsky Is Wrong And What He Does Can Be More Dangerous · 2024-06-15T02:05:59.180Z · LW · GW

First, I encourage you to put credence in the current score of -40 and a moderator saying the post doesn't meet LessWrong's quality bar.

By LD you mean Lincoln-Douglas debate, right? If so, please continue reading.

Second, I'd like to put some additional ideas up for discussion and consideration -- not debate -- I don't want to debate you, certainly not in LD style. If you care about truth-seeking, I suggest taking a hard and critical look at LD. To what degree is Lincoln-Douglas debate organized around truth-seeking? How often does a participant in an LD debate change their position based on new evidence? In my understanding, in practice, LD is quite uninterested in the notion of being "less wrong". It seems to be about a particular kind of "rhetorical art" of fortifying one's position as much as possible while attacking another's. One might hope that somehow the LD debate process surfaces the truth. Maybe, in some cases. But generally speaking, I find it to be a woeful distortion of curious discussion and truth-seeking.

Comment by David James (david-james) on Learn Bayes Nets! · 2024-06-14T02:30:56.921Z · LW · GW

Surprisingly, perhaps, https://dl.acm.org/doi/book/10.5555/534975 has a free link to the full-text PDF.

Comment by David James (david-james) on The Pavlov Strategy · 2024-06-05T20:25:59.399Z · LW · GW

Reinforcement learning is not required for the analysis above. Only evolutionary game theory is needed.

  • In evolutionary game theory, the population's mix of strategies changes via replicator dynamics (a minimal sketch follows below).
  • In RL, each individual agent modifies its policy as it interacts with its environment using a learning algorithm.
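
Here is a minimal sketch of discrete-time replicator dynamics (illustrative payoff numbers of my own, not from the post): the population share of each strategy grows in proportion to how its payoff compares to the population average, with no individual agent doing any learning.

import numpy as np

# Hypothetical payoff matrix: entry [i, j] is strategy i's payoff against strategy j.
payoffs = np.array([
    [3.0, 0.0],   # row 0: a cooperative strategy
    [5.0, 1.0],   # row 1: a defecting strategy
])

x = np.array([0.9, 0.1])      # initial population mix
for _ in range(50):
    fitness = payoffs @ x     # expected payoff of each strategy against the mix
    avg = x @ fitness         # population-average payoff
    x = x * fitness / avg     # replicator update; shares remain normalized
print(x)                      # the higher-payoff strategy takes over the population
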
Comment by David James (david-james) on AGI safety from first principles: Conclusion · 2024-06-04T16:32:18.900Z · LW · GW

Personally, I am most confident in 1, then 4, then 3, then 2 (in each case conditional on all the previous claims)

Oops. A previous version of this comment was wrong, so I edited it. The author’s confidence can be written as:

Also, independent of the author’s confidence:

Comment by David James (david-james) on My hour of memoryless lucidity · 2024-05-30T03:10:18.018Z · LW · GW

thereby writing directly into your brain’s long-term storage and bypassing the cache that would otherwise get erased

What do we know about "writing directly" into long-term storage versus a short-term cache? What studies? Any theories about the mechanism(s)?

Comment by David James (david-james) on Spaghetti Towers · 2024-05-27T15:06:01.366Z · LW · GW

First, thank you for writing this. I would ask that you continue to think & refine and share back what you discover, prove, or disprove.

To me, it seems quite likely that B will have a lot of regularity to it. It will not be good code from the human perspective, but there will be a lot of structure I think, simply because that structure is in T and the environment.

I'm interested to see if we can (i) do more than claim this is likely and (ii) unpack reasons that might require that it be the case.

One argument for (ii) would go like this. Assume the generating process for A has a preference for shorter programs. So we can think of A as tending to find shorter description lengths that match task T.

Claim: shorter (and correct) descriptions reflect some combination of environmental structure and compression.

  • by 'environmental structure' I mean the laws underlying the task.
  • by 'compression' I mean using information theory embodied in algorithms to make the program smaller

I think this claim is true, but let's not answer that too quickly. I'd like to probe this question more deeply.

  1. Are there more than two factors (environmental structure & compression)?
  2. Is it possible that the description gets the structure wrong but makes up for it with great compression? I think so. One can imagine a clever trick by which a small program expands itself into something like a big ball of mud that solves the task well.
  3. Any expansion process takes time and space. This makes me wonder if we should care not only about description length but also run time and space. If we pay attention to both, it might be possible to penalize programs that expand into a big ball of mud.
  4. However, penalizing run time and space might be unwise, depending on what we care about. One could imagine a program that starts with first principles and derives higher-level approximations that are good enough to model the domain. It might be worth paying the cost of setting up the approximations because they are used frequently. (In other words, the amortized cost of the expansion is low; a toy sketch follows after this list.)
  5. Broadly, what mathematical tools can we use on this problem?
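
Here is the toy sketch mentioned in point 4 (my own example): the same function can be "described" either by a long literal table or by a short generator that expands into that table at startup; the expansion costs time and space once, after which lookups are equally cheap.

# Long description, no setup cost: the table is written out explicitly.
TABLE_LITERAL = {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

def build_table(n=6):
    # Short description that expands into the same table; the expansion cost
    # is paid once and then amortized over all later lookups.
    return {i: i * i for i in range(n)}

TABLE_EXPANDED = build_table()
assert TABLE_LITERAL == TABLE_EXPANDED
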
Comment by David James (david-james) on Spaghetti Towers · 2024-05-27T14:51:15.583Z · LW · GW

See also Nomic, a game by Peter Suber where a move in the game is a proposal to change the rules of the game.

Comment by David James (david-james) on Spaghetti Towers · 2024-05-27T14:46:24.293Z · LW · GW

I grant that legalese increases the total page count, but I don't think it necessarily changes the depth of the tree very much (by depth I mean how many documents refer back to other documents).

I've seen spaghetti towers written in very concise computer languages (such as Ruby) that nevertheless involve perhaps 50+ levels (in this context, a level is a function call).

Comment by David James (david-james) on Spaghetti Towers · 2024-05-27T14:42:31.032Z · LW · GW

In my experience, programming languages with {static or strong} typing are considerably easier to refactor in comparison to languages with {weak or dynamic} typing.*

* The {static vs dynamic} and {strong vs weak} dimensions are sometimes blurred together, but this Stack Overflow Q&A unpacks the differences pretty well.

Comment by David James (david-james) on Spaghetti Towers · 2024-05-27T14:25:58.304Z · LW · GW

No source code

I get the intended meaning, but I would like to make the words a little more precise. While we can find the executable source code (DNA) for an organism, that DNA is far from a high-level language.

Comment by David James (david-james) on Smart People are Probably Dangerous · 2024-05-26T12:52:08.356Z · LW · GW

I got minimal value from the article as written, but I'm hoping that a steel-man version might be useful. In that spirit, I can grant a narrower claim: Smart people have more capability to fool us, all other things equal. Why? Because increased intelligence brings increased capability for deception.

  • This is as close to a tautology as I've seen in a long time. What predictive benefit comes from tautologies? I can't think of any.

  • But why focus on capability? Probability of harm is a better metric.

  • Now, with that in mind, one should not assume a straight line between capability and probability of harm. One should look at all potential causal factors.

  • More broadly, the "all other things equal" part is problematic here. I will try to write more on this topic when I have time. My thoughts are not fleshed out yet, but I think my unease has to do with how ceteris paribus imposes constraints on a system. The claim I want to examine would go something like this: those constraints "bind" the system in ways that prevent proper observation and analysis.

Comment by David James (david-james) on The Bottom Line · 2024-05-23T04:54:25.237Z · LW · GW

If instead you keep deliberating until the balance of arguments supports your preferred conclusion, you're almost guaranteed to be satisfied eventually!

Inspired by the above, I offer the pseudo code version...

loop {
    if assess(args, weights) > 1 { // assess active arguments
        break; // preferred conclusion is "proved"
    } else {
        arg = biased_sample(remaining_args); // without replacement
        args.insert(arg);
        optimize(args, weights); // mutates weights to maximize `assess(args, weights)`
    }
}

... the code above implements "the balance of arguments" as a function parameterized with weights. This allows for using an optimization process to reach one's desired conclusion more quickly :)

Comment by David James (david-james) on You Get About Five Words · 2024-05-23T01:03:28.880Z · LW · GW

Thanks for your quick answer -- you answered before I was even done revising my question. :) I can personally relate to Dan Luu's examples.

This immediately makes me want to find potential solutions, but I won't jump to any right now.

For now, I'll just mention the ways in which Jacob Collier can explain music harmony at many levels.

Comment by David James (david-james) on You Get About Five Words · 2024-05-23T00:24:46.322Z · LW · GW

Preface: I feel like I'm wearing the clown suit to a black tie event here. I'm new to LW and respect the high standards for discussion. So, I'll treat this an experiment. I'd rather be wrong, downvoted, and (hopefully) enlightened & persuaded than have this lingering suspicion that the emperor has no clothes.

I should also say that I personally care a lot about the topic of communication and brevity, because I have a tendency to say too much at one time and/or use the wrong medium in doing so. If anyone needs to learn how to be brief, it is me, and I'll write a few hundred words if necessary to persuade you of it.

Ok, that said, here are my top two concerns with the article: (1) This article strikes me as muddled and unclear. (i) I don't understand what "get" five words even means. (ii) I don't understand how coordination relates to the core claims or insight. My confusion leads to my second concern: (2) what can I take from this article?

Let's start with the second part. Is the author saying if I'm a CEO of a company of thousands I only "get" five words?

A quick aside: to me, "get" is an example of muddled language. What does the author mean w.r.t. (a) time period; (b) ... struggling for the right words here ... meaning? As to (a), do I "get" five words per message? Or five words per some (unspecified) time frame? As to (b), is "get" a proxy for how many words the recipient/audience will read? But reading isn't enough for coordination, so I expect the author means something more. Does the author mean "read and understand" or "read and internalize" or "read and act on"?

Anyhow, due to the paragraph above, I don't know how to convert "You only get five words" into a prediction. In this sense, to me, the claim isn't even wrong, because I don't know how to put it into practice.

Normally I would stop here, put the article aside, and move on. However, this article is featured here on LW and has many up-votes which suggests that others get a lot of value out of it. So I'm curious: what am I missing? Is there some connection to EA that makes this particularly salient, perhaps?

I have a guess that fans of the article have some translation layer that I'm missing. Perhaps if I could translate what the author means by get and coordination I would have the ah-ha moment.

To that end, would someone be so kind as to (a) summarize the key point(s) as simply as possible; with (b) clear intended meanings for "coordinate" and "get" (as in you only "get" X words) -- including what timeframe we're talking about -- and (c) the logic and evidence for the claims.

It is also possible that I'm not "calibrated" with the stated Epistemic Status:

all numbers are made up and/or sketchily sourced. Post errs on the side of simplistic poetry – take seriously but not literally.

Ok, but what does this mean for the reader? The standards of rationality still apply, right? There should still be some meaningful, clear, testable takeaway, right?

Comment by David James (david-james) on Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects? · 2024-05-08T10:57:37.413Z · LW · GW

Would you please expand on how ai-plans.com addresses the question from the post above ... ?

Maybe let's try to make a smart counter-move and accelerate the development of for-profit AI Safety projects [...] ? With the obvious idea to pull some VC money, which is a different pool than AI safety philanthropic funds.

I took a look at ai-plans, but I have yet to find information about:

  1. How does it work?
  2. Who created it?
  3. What is the motivation for building it?
  4. What problem(s) will ai-plans help solve?
  5. Who controls / curates / moderates it?
  6. What is the process/algorithm for: curation? moderation? ranking?

I would suggest (i) answering these questions on the ai-plans website itself then (ii) adding links here.

Comment by David James (david-james) on If You Demand Magic, Magic Won't Help · 2024-05-07T10:42:30.228Z · LW · GW

Let's step back. This thread of the conversation is rooted in this claim: "Let's be honest: all fiction is a form of escapism." Are we snared in the Disputing Definitions trap? To quote from that LW article:

if the issue arises, both sides should switch to describing the event in unambiguous lower-level constituents, like acoustic vibrations or auditory experiences. Or each side could designate a new word, like 'alberzle' and 'bargulum', to use for what they respectively used to call 'sound'; and then both sides could use the new words consistently. That way neither side has to back down or lose face, but they can still communicate. And of course you should try to keep track, at all times, of some testable proposition that the argument is actually about.

I propose that we recognize several lower-level testable claims, framed as questions. How many people read fiction to ...

  1. entertain?
  2. distract from an unpleasant reality?
  3. understand the human condition (including society)?
  4. think through alternative scenarios?

Now I will connect the conversation to these four points:

  • Luke_A_Somers wrote "Why would I ever want to escape from my wonderful life to go THERE?" which relates to #2.

  • thomblake mentions the The Philosophy of Horror. Consider this quote from the publisher's summary: "... horror not only arouses the senses but also raises profound questions about fear, safety, justice, and suffering. ... horror's ability to thrill has made it an integral part of modern entertainment." which suggests #1 and #3.

  • JonInstall pulls out the dictionary in the hopes of "settling" the debate. He's talking about #1.

  • Speaking for myself, when reading e.g. the embedded story The Tale of the Omegas in Life 3.0, my biggest takeaway was #4.

Does this sound about right?

Comment by David James (david-james) on MIRI announces new "Death With Dignity" strategy · 2024-05-06T03:55:52.849Z · LW · GW

If we know a meteor is about to hit earth, having only D days to prepare, what is rational for person P? Depending on P and D, any of the following might be rational: throw an end of the world party, prep to live underground, shoot ICBMs at the meteor, etc.

Comment by David James (david-james) on Announcing the LessWrong Curated Podcast · 2024-05-06T03:06:08.282Z · LW · GW

I listened to part of “Processor clock speeds are not how fast AIs think”, but I was disappointed by the lack of a human narrator. I am not interested in machine readings; I would prefer to go read the article.

Comment by David James (david-james) on How An Algorithm Feels From Inside · 2024-05-01T09:52:51.698Z · LW · GW

For Hopfield networks in general, convergence is not guaranteed. See [1] for convergence properties.

[1] J. Bruck, “On the convergence properties of the Hopfield model,” Proc. IEEE, vol. 78, no. 10, pp. 1579–1585, Oct. 1990, doi: 10.1109/5.58341.
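
As a minimal sketch of one failure mode (my own example, not taken from [1]): a two-unit network with symmetric weights, updated synchronously, oscillates between two states instead of settling. (Asynchronous updates of such a network would converge.)

import numpy as np

W = np.array([[0.0, 1.0],
              [1.0, 0.0]])       # symmetric weights, zero diagonal
state = np.array([1.0, -1.0])

for step in range(6):
    state = np.sign(W @ state)   # synchronous update of both units at once
    print(step, state)           # alternates between [-1, 1] and [1, -1]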

Comment by David James (david-james) on How An Algorithm Feels From Inside · 2024-05-01T09:32:49.243Z · LW · GW

The audio reading of this post [1] mistakenly uses the word hexagon instead of pentagon; e.g. "Network 1 is a hexagon. Enclosed in the hexagon is a five-pointed star".

[1] [RSS feed](https://intelligence.org/podcasts/raz); various podcast sources and audiobooks can be found [here](https://intelligence.org/rationality-ai-zombies/)

Comment by David James (david-james) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-27T23:56:54.614Z · LW · GW

I'm not so sure.

I would expect that a qualified, well-regarded leader is necessary, but I'm not confident it is sufficient. Other factors might dominate, such as: budget, sustained attention from higher-level political leaders, quality and quantity of supporting staff, project scoping, and exogenous factors (e.g. AI progress moving in a way that shifts how NIST wants to address the issue).

What are the most reliable signals for NIST producing useful work, particularly in a relatively new field? What does history show us? What kind of patterns do we find when NIST engages with: (a) academia; (b) industry; (c) the executive branch?


Comment by David James (david-james) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-27T23:39:13.465Z · LW · GW

Another failure mode -- perhaps the elephant in the room from a governance perspective -- is national interests conflicting with humanity's interests. For example, actions done in the national interest of the US may ratchet up international competition (instead of collaboration).

Even if one puts aside short-term political disagreements, what passes for serious analysis around US national security seems rather limited in terms of (a) time horizon and (b) risk mitigation. Examples abound: e.g. support of one dictator until he becomes problematic, then switching support and/or spending massively to deal with the aftermath. 

Even with sincere actors pursuing smart goals (such as long-term global stability), how can a nation with significant leadership shifts every 4 to 8 years hope to ensure a consistent long-term strategy? This question suggests that an instrumental goal for AI safety involves building institutions and mechanisms that promote long-term governance.

Comment by David James (david-james) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-27T23:22:31.634Z · LW · GW

One failure mode could be a perception that the USG's support of evals is "enough" for now. Under such a perception, some leaders might relax their efforts in promoting all approaches towards AI safety.

Comment by David James (david-james) on The Crackpot Offer · 2024-04-22T13:53:57.263Z · LW · GW

perhaps I should apply Cantor’s Diagonal Argument to my clever construction, and of course it found a counterexample—the binary number (. . . 1111), which does not correspond to any finite whole number.

I’m not following despite having recently reviewed Cantor’s Diagonal Argument. I can imagine constructing a matrix such that the diagonal is all ones… but I don’t see how this connects up to the counterexample claim above.

Also, why worry that an infinite binary representation (of any kind) doesn’t correspond to a finite whole number? I suspect I’m missing something here. A little help, please, to close this inferential distance?