It is more probable that A, than that A and B.
I can see the appeal here -- litanies tend to have a particular style after all -- but I wonder if we can improve it.
I see two problems:
- This doesn't convey that Occam's razor is about explanations of observations.
- In general, one explanation is not a logical "subset" of the other. So the comparison is not between "A" and "A and B"; it is between "A" and "B".
Perhaps one way forward would involve a mention of (or reference to) Minimum Description Length (MDL) or Kolmogorov complexity.
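For example, the standard two-part MDL criterion frames the comparison as choosing whichever explanation minimizes the total description length of the hypothesis plus the observations encoded with its help (sketched here for illustration):

$$ \hat{H} = \arg\min_{H \in \{A,\, B\}} \left[ L(H) + L(D \mid H) \right] $$

where $L(H)$ is the code length of hypothesis $H$ and $L(D \mid H)$ is the code length of the observed data $D$ given $H$. The comparison is between the two full explanations, not between one and its conjunction with the other.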
I'm putting many of these in a playlist along with The Geeks Were Right by The Faint: https://www.youtube.com/watch?v=TF297rN_8OY
When I saw the future - the geeks were right
Egghead boys with thin white legs
They've got modified features and software brains
But that's what the girls like - the geeks were right
Predator skills, chemical wars, plastic islands at sea
Watch what the humans ruin with machines
“If you see fraud and do not say fraud, you are a fraud.” --- Nassim Taleb
No. Taleb’s quote is too simplistic. There is a difference between (1) committing fraud; (2) denying fraud where it exists; and (3) saying nothing.
Worse, it skips over a key component of fraud: intent!
I prefer the following framing: If a person sees evidence of fraud, they should reflect on (a) the probability of fraud (which involves assessing the intention to deceive!); (b) their range of responses; (c) the effects of each response; and (d) what this means for their overall moral assessment.
I realize my framing draws upon consequentialist reasoning, but I think many other ethical framings would still criticize Taleb’s claim for being overly simplistic.
The comment above may open a Flood of Jesus-Backed Securities and Jesus-Leveraged Loans. Heavens!
The recent rise of reinforcement learning (RL) for language models introduces an interesting dynamic to this problem.
Saying “recent rise” feels wrong to me. In any case, it is vague. Better to state the details. What do you consider to be the first LLM? The first use of RLHF with an LLM? My answers would probably be 2018 (BERT) and 2019 (OpenAI), respectively.
HLE and benchmarks like it are cool, but they fail to test the major deficits of language models, like how they can only remember things by writing them down onto a scratchpad like the memento guy.
A scratch pad for thinking, in my view, is hardly a deficit at all! Quite the opposite. In the case of people, some level of conscious reflection is important and probably necessary for higher-level thought. To clarify, I am not saying consciousness itself is in play here. I’m saying some feedback loop is probably necessary — where the artifacts of thinking, reasoning, or dialogue can themselves become objects of analysis.
My claim might be better stated this way: if we want an agent to do sufficiently well on higher-level reasoning tasks, it is probably necessary for it to operate at various levels of abstraction, and we shouldn’t be surprised if this is accomplished by way of observable artifacts used to bridge different layers. Whether the mechanism is something akin to chain of thought or something else seems incidental to the question of intelligence (by which I mean assessing an agent's competence at a task, which follows Stuart Russell's definition).
I don’t think the author would disagree, but this leaves me wondering why they wrote the last part of the sentence above. What am I missing?
A just world is a world where no child is born predetermined to endure avoidable illness simply because of ancestral bad luck.
In clear-cut cases, this principle seems sound; if a certain gene only has deleterious effects, and it can be removed, this is clearly better (for the individual and almost certainly for everyone else too).
In practice, this becomes more complicated if one gene has multiple effects. (This may occur on its own or because the gene interacts with other genes.) What if the gene in question is a mixed bag? For example, consider a gene giving a 1% increased risk of diabetes while always improving visual acuity. To be clear, I'm saying complicated, not unresolvable. Such tradeoffs can indeed be resolved with a suitable moral philosophy combined with sufficient data. However, the problem is especially salient because the person deciding isn't the person who has to live with said genes. The two people may have different philosophies, risk preferences, or lifestyles.
Not necessarily an optimizer, though: satisficers may do it too. A core piece often involves tradeoffs, such as material efficiency versus time efficiency.
A concrete idea: what if every LessWrong article prominently linked to a summary? Or a small number of highly-ranked summaries? This could reduce the burden on the original author, at the risk of having the second author’s POV differ somewhat.
What if LW went so far as to make summaries the preferred entry ways? Instead of a reader seeing a wall of text, they see a digestible chunk first?
I have been wanting this for a very long time. It isn’t easy, obvious, or free of hard trade-offs. In any case, I don’t know of many online forums or information sources that really explore the potential here.
Related: why not also include metadata for retractions, corrections, and the like? TurnTrout’s new web site, for example, sometimes uses “info boxes” to say things like “I no longer stand by this line of research”.
At least when I'm reading I like to have some filler between the ideas to give me time to digest a thought and get to the next one.
This is both fascinating and strange to me.
If you mean examples, elaboration, and explanation, then, yes, I get what you mean.
OTOH, if you mean “give the reader a mental break”, that invites other alternatives. For example, if you want to encourage people to pause after some text, it might be worthwhile to make it harder to mindlessly jump ahead. Break the flow. This can be done in many ways: vertical space, interactive elements, splitting across pages, and more.
This is a fun design space. So much about reading has evolved over time, with the medium imposing constraints on the process. We have more feasible options now!
and I don't really see how to do that without directly engaging with the knowledge of the failure modes there.
I agree. To put it another way, even if all training data was scrubbed of all flavors of deception, how could ignorance of it be durable?
If you have a clear policy objective, you can probably find someone, somewhere to give you a fair hearing.
To clarify, are you suggesting now is a better time than, say, one year ago? If so, here are some factors working against such a claim: (a) There are fewer people around, so reaching someone is going to be harder. (b) The people who remain are trying to survive, which involves keeping a low profile. (c) People who will hear you out feel immense pressure to toe the line, which is usually considered the opposite of entertaining new ideas. (d) If an idea gets some traction, any sensible staffer will wonder what chaos will emerge next to render the idea untenable.
Now, if you happen to get an audience for a policy idea, it is also important to ask yourself (i) What is the experience level of the staffer in front of you? (ii) Do they understand how the system works? (iii) Will they be effective stewards for your policy goal?
In this climate especially, one cannot ignore concerns about stability and corruption. The leaders of the current administration seek to expand the power of the executive branch significantly. They are willing to stretch -- and break -- the rule of law, as various court orders have demonstrated. My point? An unstable political and legal environment is not conducive to serious policy aims. Policy, no matter what the aim, is predicated on a legal foundation that operates over time in some kind of known environment.
For example, if one's actual policy objective is to, say, modernize the IRS (which I would support, if done properly), there are steps to do this. Given the Republican Party's control of all three branches of government, they could do this legally. Many (perhaps most?) rational thinkers would support simplifying the tax code, increasing compliance, and increasing operational efficiency, even though we have different ideas about the aims and scope of government policy.
Academia is less selective than it used to be
To what degree is this true regarding elite-level Ph.D. programs that are likely to lead to publication in (i) mathematics and/or (ii) computer science?
Separately, we should remember that academic selection is a relative metric, i.e. graded on a curve. So, when it comes to Ph.D. programs, is the median 2024 Ph.D. graduate more capable (however you want to define it) than the corresponding graduate from 1985? This is a complex question, involving their intellectual foundations, the depth of their specialized knowledge, various forms of raw intelligence, attention span, collaborative skills, communication ability (including writing skills), and computational tools.
I realize what I'm about to say next may not be representative of the median Ph.D. student, but it feels to me the 2024 graduates of, say, Berkeley or MIT (not to mention, say, Thomas Jefferson High School) are significantly more capable than the corresponding 1985 graduates. Does my sentiment resonate with others and/or correspond to some objective metrics?
For me, I'd say a lot of my gains come from asking AI questions rather than generating code directly.
This is often the case for me as well. I often work on solo side projects and use Claude to think out loud. This lets me put on different hats, just like when pair programming, including: design mode, implementation mode, testing mode, and documentation mode.
I rarely use generated code as-is, but I do find it interesting to look at. As a concrete example, I recently implemented a game engine for the board game Azul (and a multithreaded solver) in Rust and found Claude very helpful as an extra set of eyes. I used it as a sort of running issue tracker, design partner, and critic.
Now that I think about it, maybe the best metaphor I can use is that Claude helps me project myself onto myself. For many of my projects, I lean towards "write good, understandable code" instead of "move fast and break things". This level of self-criticism and curiosity has served me well with Claude. Without this mentality, I can see why people dismiss LLM-assisted coding; it certainly is far from a magic genie.
I've long had a bias toward design-driven work (write the README first, think on a whiteboard, etc), whether it be coding or almost anything, so having an infinitely patient conversational partner can be really amazing at times. At other times, the failure modes are frustrating, to say the least.
I agree that having a binder of policy proposals ready is effective. There is a dark side to this too. If you are a policy maker, expect plenty of pre-prepared binders awaiting the situation you find yourself in. Different groups vary widely in their predictive abilities and intellectual honesty.
The history of think tanks is fascinating and complicated. On one hand, they provide a bridge from academia to policy that can retain some of the intellectual rigor of the former. On the other hand, they can be thinly veiled, ideologically motivated places awaiting a favorable political environment.
Right. To expand on this: there are also situations where an interest group pushes hard on a broader coalition to move faster, sometimes even accusing their partners or allies of “not caring enough” or “dragging their feet”. Assuming bad faith or impugning the motives of one’s allies can sour working relationships. Understanding the constraints in play goes a long way towards fostering compromise.
The idea of “focusing events” is well known in public policy.
For example, see Thomas Birkland’s book “After Disaster: Agenda Setting, Public Policy, and Focusing Events” or any of his many articles such as “Focusing Events, Mobilization, and Agenda Setting” (Journal of Public Policy; Vol. 18, No. 1; 1998)
According to Birkland in “During Disaster: Refining the Concept of Focusing Events to Better Explain Long-Duration Crises”, John Kingdon first used the term “focusing events” in his book “Agendas, Alternatives, and Public Policy”.
There is a considerable literature on these topics that does not rely on Milton Friedman or his political philosophy. Invoking Friedman in policy circles can make it harder to have neutral conversation about topics unrelated to markets, such as the Overton window and “theories of change” which thankfully seem to have survived as both neutral and intellectually honest ways of talking about the policy process.
With this in mind, I suggest listing these other authors alongside Milton Friedman to give broader context. This will help us flawed humans focus on the core ideas rather than wondering in the back of our heads whether the ideas are part of a particular political philosophy. It will probably also help get these concepts into wider circulation.
To the students of history out there, let me know to what degree Friedman played a key role in developing and/or socializing the ideas around crises and focusing events. If so, credit where credit is due.
For what it is worth, Friedman, Arthur Okun (“Equality and Efficiency”), and Birkland were assigned reading in my public policy studies. We were expected to be able to articulate all of their points of view clearly and honestly, even if we disagreed.
I find this article confusing. So I find myself returning to fundamentals of computer science algorithms: to greedy algorithms and under what conditions they are optimal. Would anyone care to build a bridge from this terminology to what the author is trying to convey?
I wonder if you underestimate the complexity of brokering, much less maintaining, a lasting peace, whether it be via superior persuasive abilities or vast economic resource advantages. If you are thinking more along the lines of domination so complete that any violent resistance seems minuscule and pointless, that’s a different category for me. When I think of “long term peace”, I usually don’t think of simmering grudges that remain dormant because of a massive power imbalance. I will grant that perhaps the ultimate form of “persuasion” would involve removing even the mental possibility of resistance.
As I understand it, the phrase “passing the buck” often involves a sense of abdicating responsibility. I don’t think this is what this author means. I would suggest finding alternative phrasings that convey the notion of delegating implementation according to some core principles, combined with the idea of passing the torch to more capable actors.
Note: this comment should not be taken to suggest that I necessarily agree or disagree with the article itself.
To clarify: the claim is that the Shapley value is the only allocation that satisfies all four properties at once: {Efficiency, Symmetry, Linearity, Null player}. Other metrics can satisfy proper subsets of these properties.
Hopefully, you have gained some intuition for why Shapley values are “fair” and why they account for interactions among players.
The article fails to make a key point: in political economy and game theory, there are many definitions of "fairness" that seem plausible at face value, especially when considered one at a time. Even if one puts normative questions to the side, there are mathematical limits and constraints as one tries to satisfy various combinations simultaneously. Keeping these in mind, you can think of this as a design problem; it takes some care to choose metrics that reinforce some set of desired norms.
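To make this concrete, here is a minimal brute-force sketch of Shapley values for a hypothetical 3-player game (the characteristic function is invented for illustration; real implementations avoid enumerating all orderings):

```python
from itertools import permutations

def shapley_values(players, v):
    """Brute-force Shapley values: average each player's marginal
    contribution over all orderings of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = v(coalition)
            coalition.add(p)
            totals[p] += v(coalition) - before
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical game: A and B together create most of the value.
def v(coalition):
    if {"A", "B"} <= coalition:
        return 100.0
    return 10.0 if coalition else 0.0

print(shapley_values(["A", "B", "C"], v))
# A and B each get ~48.33, C gets ~3.33; the three sum to v({A,B,C}) = 100 (Efficiency).
```

Note how the interaction between A and B shows up in their shares, while C (nearly a null player) gets almost nothing.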
Should the bill had been signed, it would have created severe enough pressures to do more with less to focus on building better and better abstractions once the limits are hit.
Ok, I see the argument. But even without such legislation, the costs of large training runs create major incentives to build better abstractions.
Does this summary capture the core argument? Physical constraints on the human brain contributed to its success relative to other animals, because it had to "do more with less" by using abstraction. Analogously, constraints on AI compute or size will encourage more abstraction, increasing the likelihood of "foom" danger.
Though I'm reasonably sure Llama license (sic) isn't preventing viewing the source
This is technically correct but irrelevant. Meta doesn't provide any source code, by which I mean the full set of precursor steps (including the data and how to process it).
Generally speaking, a license defines usage rights; it has nothing to do with whether or how the thing (e.g. source code) is made available.
As a weird example, one could publish a repository with a license but no source code. This would be odd. The license would have no power to mandate the code be released; that is a separate concern.
To put it another way, a license does not obligate the owner to release or share anything, whether it be compiled software, source code, weights, etc. A license simply outlines the conditions under which the thing (e.g. source code), once released, can be used or modified.
The paper AI Control: Improving Safety Despite Intentional Subversion is a practical, important step in the right direction. It demonstrates various protocols aimed at maintaining safety even with malicious models that know they are suspected of being dangerous.
Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, Fabien Roger Proceedings of the 41st International Conference on Machine Learning, PMLR 235:16295-16336, 2024.
As large language models (LLMs) become more powerful and are deployed more autonomously, it will be increasingly important to prevent them from causing harmful outcomes. To do so, safety measures either aim at making LLMs try to avoid harmful outcomes or aim at preventing LLMs from causing harmful outcomes, even if they try to cause them. In this paper, we focus on this second layer of defense. We develop and evaluate pipelines of safety techniques (protocols) that try to ensure safety despite intentional subversion - an approach we call AI control. We investigate a setting in which we want to solve a sequence of programming problems without ever submitting subtly wrong code, using access to a powerful but untrusted model (in our case, GPT-4), access to a less powerful trusted model (in our case, GPT-3.5), and limited access to high-quality trusted labor. We investigate a range of protocols and red-team them by exploring strategies that the untrusted model could use to subvert them. We find that using the trusted model to edit untrusted-model code or using the untrusted model as a monitor substantially improves on simple baselines.
Related Video by Robert Miles: I highly recommend Using Dangerous AI, But Safely? released on Nov. 15, 2024.
NIST's AI Safety Institute (AISI) hired Paul Christiano as its Head of AI Safety.
From Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims:
But what we seem to be seeing is a bit different from deep learning broadly hitting a wall. More specifically it appears to be: returns to scaling up model pretraining are plateauing.
I agree, but I’m not sure how durable this agreement will be. (I reversed my position while drafting this comment.)
Here is my one sentence summary of the argument above: If Omega can make a fully accurate prediction in a universe without backwards causality, this implies a deterministic universe.
The Commission recommends: [...] 1. Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability.
As mentioned above, the choice of Manhattan Project instead of Apollo Project is glaring.
Worse, there is zero mention of AI safety, AI alignment, or AI evaluation in the Recommendations document.
Lest you think I'm expecting too much, the report does talk about safety, alignment, and evaluation ... for non-AI topic areas! (see bolded words below: "safety", "aligning", "evaluate")
- "Congress direct the U.S. Government Accountability Office to investigate the reliability of safety testing certifications for consumer products and medical devices imported from China." (page 736)
- "Congress direct the Administration to create an Outbound Investment Office within the executive branch to oversee investments into countries of concern, including China. The office should have a dedicated staff and appropriated resources and be tasked with: [...] Expanding the list of covered sectors with the goal of aligning outbound investment restrictions with export controls." (page 737)
- "Congress direct the U.S. Department of the Treasury, in coordination with the U.S. Departments of State and Commerce, to provide the relevant congressional committees a report assessing the ability of U.S. and foreign financial institutions operating in Hong Kong to identify and prevent transactions that facilitate the transfer of products, technology, and money to Russia, Iran, and other sanctioned countries and entities in violation of U.S. export controls, financial sanctions, and related rules. The report should [...] Evaluate the extent of Hong Kong’s role in facilitating the transfer of products and technologies to Russia, Iran, other adversary countries, and the Mainland, which are prohibited by export controls from being transferred to such countries;" (page 741)
I am not following the context of the comment above. Help me understand the connection? The main purpose of my comment above was to disagree with this sentence two levels up:
The frenzy to couple everything into a single tangle of complexity is driven by the misunderstanding that complacency is the only reason why your ideology is not the winning one
… in particular, I don’t think it captures the dominant driver of “coupling” or “bundling”.
Does the comment one level up above disagree with my claims? I’m not following the connection.
The frenzy to couple everything into a single tangle of complexity is driven by…
In some cases, yes, but this is only one factor of many. Others include:
- Our brains are often drawn to narratives, which are complex and interwoven. Hence the tendency to bundle up complex logical interdependencies into a narrative.
- Our social structures are guided/constrained by our physical nature and technology. For in-person gatherings, bundling of ideas is often a dominant strategy.
For example, imagine a highly unusual congregation: a large unified gathering of monotheistic worshippers with considerable internal diversity. Rather than “one track” consisting of shared ideology, they subdivide their readings and rituals into many subgroups. Why don’t we see much of this (if any) in the real world? Because ideological bundling often pairs well with particular ways of gathering.
P.S. I personally welcome gathering styles that promote both community and rationality (spanning a diversity of experiences and values).
Right. Some such agreements are often called social contracts. One catch is that a person born into them may not understand their historical origin or practical utility, much less agree with them.
Durable institutions find ways to survive. I don’t mean survival merely in terms of legal continuity; I mean fidelity to their founding charter. Institutions not only have to survive past their first leader; they have to survive their first leader themself! The institution’s structure and policies must protect against the leader’s meandering attention, whims, and potential corruptions. Given Musk's mercurial history, I would not bet that he would agree to the requisite policies.
they weren’t designed to be ultra-robust to exploitation, or to make serious attempts to assess properties like truth, accuracy, coherence, usefulness, justice
There are notable differences between these properties. Usefulness and justice are quite different from the others (truth, accuracy, coherence). Usefulness (defined as suitability for a purpose, which is non-prescriptive as to the underlying norms) is different from justice (defined by some normative ideal). Coherence requires fewer commitments than truth and accuracy.
Ergo, I could see various instantiations of a library designed to satisfy various levels: Level 1 would value coherence; Level 2 would add truth and accuracy; Level 3, usefulness; Level 4, justice.
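A minimal sketch of what such leveled instantiations might look like (the names and the cumulative-level scheme are my own invention, offered only to make the idea concrete):

```python
from enum import IntEnum

class EpistemicLevel(IntEnum):
    """Hypothetical cumulative levels: each level adds properties
    the library commits to checking."""
    COHERENCE = 1           # Level 1: internal consistency only
    TRUTH_AND_ACCURACY = 2  # Level 2: + truth and accuracy
    USEFULNESS = 3          # Level 3: + usefulness (fit for purpose)
    JUSTICE = 4             # Level 4: + justice (a normative ideal)

def required_properties(level: EpistemicLevel) -> list[str]:
    """Properties an entry must satisfy at a given level (cumulative)."""
    props = ["coherence"]
    if level >= EpistemicLevel.TRUTH_AND_ACCURACY:
        props += ["truth", "accuracy"]
    if level >= EpistemicLevel.USEFULNESS:
        props.append("usefulness")
    if level >= EpistemicLevel.JUSTICE:
        props.append("justice")
    return props

print(required_properties(EpistemicLevel.USEFULNESS))
# ['coherence', 'truth', 'accuracy', 'usefulness']
```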
I like having a list of small, useful things to do that tend to pay off in the long run, like:
- go to the grocery store to make sure you have fresh fruits and vegetables
- meditate for 10 minutes
- do push-ups and sit-ups
- journal for 10 minutes
When my brain feels cluttered, it is nice to have a list of time-boxed simple tasks that don’t require planning or assessment.
Verify human designs and automatically create AI-generated designs which provably cannot be opened by mechanical picking.
Such a proof would be subject to its definition of "mechanical picking" and a sufficiently accurate physics model. (For example, would an electronically-controllable key-looking object with adjustable key-cut depths with pressure sensors qualify as a "pick"?)
I don't dispute the value of formal proofs for safety. If accomplished, they move the conversation to "is the proof correct?" and "are we proving the right thing?". Both are steps in the right direction, I think.
Thanks for the references; I'll need some time to review them. In the meantime, I'll make some quick responses.
As a side note, I'm not sure how tree search comes into play; in what way does tree search require unbounded steps that doesn't apply equally to linear search?
I intended tree search as just one example, since minimax tree search is a common example for game-based RL research.
No finite agent, recursive or otherwise, can plan over an unbounded number of steps in finite time...
In general, I agree. Though there are notable exceptions for cases such as (not mutually exclusive):
- a closed-form solution is found (for example, where a time-based simulation can calculate some quantity at any arbitrary time step using the same amount of computation; see the sketch after this list)
- approximate solutions using a fixed number of computation steps are viable
- a greedy algorithm can select the immediate next action that is equivalent to following a longer-term planning algorithm
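As a toy illustration of the closed-form case in the first bullet (my own example; the compound-interest setting is invented purely for illustration):

```python
def balance_iterative(principal, rate, n):
    """Simulate n compounding steps one at a time: O(n) work."""
    b = principal
    for _ in range(n):
        b *= 1 + rate
    return b

def balance_closed_form(principal, rate, n):
    """Jump directly to step n with the closed form: O(1) work."""
    return principal * (1 + rate) ** n

# Both agree (up to floating-point error) for any horizon n.
print(balance_iterative(100.0, 0.05, 1000))
print(balance_closed_form(100.0, 0.05, 1000))
```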
... so it's not immediately clear to me how iteration/recursion is fundamentally different in practice.
Yes, like I said above, I agree in general and see your point.
As I'm confident we both know, some algorithms can be written more compactly when recursion/iteration are available. I don't know how much computation theory touches on this; i.e. what classes of problems this applies to and why. I would make an intuitive guess that it is conceptually related to my point earlier about closed-form solutions.
Note that this is different from the (also very interesting) question of what LLMs, or the transformer architecture, are capable of accomplishing in a single forward pass. Here we're talking about what they can do under typical auto-regressive conditions like chat.
I would appreciate it if the community here could point me to research that agrees or disagrees with my claim and conclusions, below.
Claim: one pass through a transformer (of a given size) can only do a finite number of reasoning steps.
Therefore: If we want an agent that can plan over an unbounded number of steps (e.g. one that does tree-search), it will need some component that can do an arbitrary number of iterative or recursive steps.
Sub-claim: The above claim does not conflict with the Universal Approximation Theorem.
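To make the claim concrete, here is a minimal, hypothetical sketch (the function names and the toy "model" are made up; no real transformer library is assumed): each forward pass does a bounded amount of computation, and unbounded-depth reasoning has to come from an outer loop that feeds outputs back in.

```python
def forward_pass(context):
    """Stand-in for one pass through a fixed-depth transformer:
    a bounded amount of computation that emits one more token.
    (Toy function; a real model would be far more complex.)"""
    return len(context) + 1  # the next "token" is just a counter here

def reason(context, is_done, max_steps=10_000):
    """Outer autoregressive loop. Each forward pass is bounded by the
    architecture; arbitrarily deep reasoning comes from this loop,
    which keeps feeding the model's outputs back to itself."""
    for _ in range(max_steps):
        if is_done(context):
            break
        context = context + [forward_pass(context)]
    return context

# Toy usage: "reason" until the scratchpad holds 5 steps.
print(reason([], lambda ctx: len(ctx) >= 5))
```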
Claim: the degree to which the future is hard to predict has no bearing on the outer alignment problem.
- If one is a consequentialist (of some flavor), one can still construct a "desirability tree" over various possible future states (a minimal sketch appears after this list). Sure, the uncertainty makes the problem more complex in practice, but the algorithm is still very simple. So I don't think that a more complex universe intrinsically has anything to do with alignment per se.
- Arguably, machines will have better computational ability to reason over a vast number of future states. In this sense, they will be more ethical according to consequentialism, provided their valuation of terminal states is aligned.
- To be clear, of course, alignment w.r.t. the valuation of terminal states is important. But I don't think this has anything to do with a harder-to-predict universe. All we do with consequentialism is evaluate a particular terminal state. The complexity of how we got there doesn't matter.
- (If you are detecting that I have doubts about the goodness and practicality of consequentialism, you would be right, but I don't think this is central to the argument here.)
- If humans don't really carry out consequentialism like we hope they would (and surely humans are not rational enough to adhere to consequentialist ethics -- perhaps not even in principle!), we can't blame this on outer alignment, can we? This would be better described as goal misspecification.
- If one subscribes to deontological ethics, then the problem becomes even easier. Why? One wouldn't have to reason probabilistically over various future states at all. The goodness of an action only has to do with the nature of the action itself.
- Do you want to discuss some other kind of ethics? Is there some other flavor that would operate differentially w.r.t. outer alignment in a more versus less predictable universe?
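As promised in the first bullet, here is a toy sketch of how simple the bare "desirability tree" algorithm is (the tree structure and probabilities are invented for illustration):

```python
def desirability(node):
    """Expected value of a 'desirability tree': leaves carry the
    valuation of a terminal state; internal nodes average over
    possible futures, weighted by probability."""
    if "value" in node:  # terminal state
        return node["value"]
    return sum(p * desirability(child) for p, child in node["children"])

# Toy tree: one action leads to a 70/30 split over two possible futures.
tree = {"children": [(0.7, {"value": 10.0}),
                     (0.3, {"children": [(0.5, {"value": -5.0}),
                                         (0.5, {"value": 2.0})]})]}
print(desirability(tree))  # 0.7*10 + 0.3*(0.5*-5 + 0.5*2) = 6.55
```

The hard part is not this recursion; it is specifying the values at the leaves and estimating the probabilities, which is where alignment concerns actually live.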
Want to try out a thought experiment? Put that same particular human (who wanted to specify goals for an agent) in the financial scenario you mention. Then ask: how well would they do? Compare the quality of how the person would act versus how well the agent might act.
This raises related questions:
- If the human doesn't know what they would want, it doesn't seem fair to blame the problem on alignment failure. In such a case, the problem would be a person's lack of clarity.
- Humans are notoriously good rationalizers and may downplay their own bad decisions. Making a fair comparison between "what the human would have done" versus "what the AI agent would have done" may be quite tricky. (See the Fundamental Attribution Error, a.k.a. correspondence bias.)
As I understand it, the argument above doesn't account for the agent using the best information available at the time (in the future, relative to its goal specification).
I think there is some confusion around a key point. For alignment, do we need to define what an agent will do in all future scenarios? It depends what you mean.
- In some sense, no, because in the future, the agent will have information we don't have now.
- In some sense, yes, because we want to know (to some degree) how the agent will act with future (unknown) information. Put another way, we want to guarantee that certain properties hold about its actions.
Let's say we define an aligned agent as one doing what we would want, provided that we were in its shoes (i.e. knowing what it knew). Under this definition, it is indeed possible to specify an agent's decision rule in a way that doesn't rely on long-range predictions (where predictive power gets fuzzy, like Alejandro says, due to measurement error and complexity). See also the adjacent comment about a thermostat by eggsyntax.
Note: I'm saying "decision rule" intentionally, because even an individual human does not have a well-defined utility function.
Nevertheless, it seems wrong to say that my liver is optimising my bank balance, and more right to say that it "detoxifies various metabolites, synthesizes proteins, and produces biochemicals necessary for digestion"---even though that gives a less precise account of the liver's behaviour.
I'm not following why this is a less precise account of the liver's behavior.
Here is an example of a systems dynamics diagram showing some of the key feedback loops I see. We could discuss various narratives around it and what to change (add, subtract, modify).
┌───── to the degree it is perceived as unsafe ◀──────────┐
│ ┌──── economic factors ◀─────────┐ │
│ + ▼ │ │
│ ┌───────┐ ┌───────────┐ │ │ ┌────────┐
│ │people │ │ effort to │ ┌───────┐ ┌─────────┐ │ AI │
▼ - │working│ + │make AI as │ + │ AI │ + │potential│ + │becomes │
├─────▶│ in │────▶│powerful as│─────▶│ power │───▶│ for │───▶│ too │
│ │general│ │ possible │ └───────┘ │unsafe AI│ │powerful│
│ │ AI │ └───────────┘ │ └─────────┘ └────────┘
│ └───────┘ │
│ │ net movement │ e.g. use AI to reason
│ + ▼ │ about AI safety
│ ┌────────┐ + ▼
│ │ people │ ┌────────┐ ┌─────────────┐ ┌──────────┐
│ + │working │ + │ effort │ + │understanding│ + │alignment │
└────▶│ in AI │────▶│for safe│─────▶│of AI safety │─────────────▶│ solved │
│ safety │ │ AI │ └─────────────┘ └──────────┘
└────────┘ └────────┘ │
+ ▲ │
└─── success begets interest ◀───┘
I find this style of thinking particularly constructive.
- For any two nodes, you can see a visual relationship (or lack thereof) and ask "what influence do these have on each other and why?".
- The act of summarization cuts out chaff.
- It is harder to fool yourself about the completeness of your analysis.
- It is easier to get to core areas of confusion or disagreement with others.
Personally, I find verbal reasoning workable for "local" (pairwise) reasoning but quite constraining for systemic thinking.
If nothing else, I hope this example shows how easily key feedback loops get overlooked. How many of us claim to have... (a) some technical expertise in positive and negative feedback? (b) interest in Bayes nets? So why don't we take the time to write out our diagrams? How can we do better?
P.S. There are major oversights in the diagram above, such as economic factors. This is not a limitation of the technique itself -- it is a limitation of the space and effort I've put into it. I have many other such diagrams in the works.
I’m curious if your argument, distilled, is: fewer people skilled in technical AI work is better? Such a claim must be examined closely! Think of it from a systems dynamics point of view. We must look at more than just one relationship. (I personally try to press people to share some kind of model that isn’t presented only in words.)
One important role of a criminal justice system is rehabilitation. Another, according to some, is retribution. Those in Azkaban suffer perhaps the most awful form of retribution. Dementation renders a person incapable of rehabilitation.
Consider this if-then argument:
If:
- Justice is served without error (which is not true)
- The only purpose for criminal justice is retribution
Then: Azkabanian punishment is rational.
Otherwise, assuming there are other ways to protect society from the person, it is irrational to dement people.
Speaking broadly, putting aside the fictional world of Azkaban, there is an argument that suggests retribution for its own sake is wrong. It is simple: inflicting suffering is wrong, all other things equal. Retribution makes sense only to the extent it serves as a deterrent.
First, I encourage you to put credence in the current score of -40 and a moderator saying the post doesn't meet LessWrong's quality bar.
By LD you mean Lincoln-Douglas debate, right? If so, please continue reading.
Second, I'd like to put some additional ideas up for discussion and consideration -- not debate. I don't want to debate you, certainly not in LD style. If you care about truth-seeking, I suggest taking a hard and critical look at LD. To what degree is Lincoln-Douglas debate organized around truth-seeking? How often does a participant in an LD debate change their position based on new evidence? In my understanding, in practice, LD is quite uninterested in the notion of being "less wrong". It seems to be about a particular kind of "rhetorical art": fortifying one's position as much as possible while attacking another's. One might hope that somehow the LD debate process surfaces the truth. Maybe, in some cases. But generally speaking, I find it to be a woeful distortion of curious discussion and truth-seeking.
Surprisingly, perhaps, https://dl.acm.org/doi/book/10.5555/534975 has a free link to the full-text PDF.
Reinforcement learning is not required for the analysis above. Only evolutionary game theory is needed.
- In evolutionary game theory, the population's mix of strategies changes via replicator dynamics.
- In RL, each individual agent modifies its policy as it interacts with its environment using a learning algorithm.
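For concreteness, here is a minimal sketch of replicator dynamics (the Hawk-Dove payoff matrix, step size, and iteration count are illustrative assumptions, not taken from the discussion above):

```python
import numpy as np

# Illustrative Hawk-Dove payoff matrix (V=2, C=4): entry [i, j] is the
# payoff to strategy i (row) against strategy j (column); strategies = [Hawk, Dove].
payoff = np.array([[-1.0, 2.0],
                   [ 0.0, 1.0]])

def replicator_step(x, dt=0.01):
    """One Euler step of the replicator dynamics:
    dx_i/dt = x_i * (f_i(x) - f_avg(x)), where f = payoff @ x."""
    fitness = payoff @ x
    avg = x @ fitness
    return x + dt * x * (fitness - avg)

x = np.array([0.2, 0.8])  # initial population mix (20% Hawk)
for _ in range(5000):
    x = replicator_step(x)
print(x)  # converges toward the mixed equilibrium [0.5, 0.5] for this payoff matrix
```

Note that nothing here involves an individual agent updating a policy from its own experience; the population mix shifts purely according to relative fitness, which is the distinction drawn above.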