ozziegooen's Shortform

post by ozziegooen · 2019-08-31T23:03:24.809Z · LW · GW · 219 comments

Comments sorted by top scores.

comment by ozziegooen · 2025-02-05T17:38:06.162Z · LW(p) · GW(p)

A bunch of people in the AI safety landscape seem to argue "we need to stop AI progress, so that we can make progress on AI safety first."

One flip side to this is that I think it's incredibly easy for people to waste a ton of resources on "AI safety" at this point.

I'm not sure how much I trust most technical AI safety researchers to make important progress on AI safety now. And I trust most institutions a lot less.

I'd naively expect that if any major country threw $100 billion at it today, the results would be highly underwhelming. I rarely trust these governments to make progress on concrete technologies with clear progress measures, and "AI Safety" is highly ambiguous and speculative.

As I've written about before, I think it's just hard to know what critical technical challenges will be bottlenecks around AI alignment, given that it's unclear when this will become an issue or what sorts of architectures we will have then.

All that said, slowing things down seems much safer to me. I assume that at [year(TAI) - 3] we'll have a decent idea of what's needed, and extending that duration seems like a safe bet.

I really want to see better strategic discussion about AI safety. If somehow we could spend $10B just to get a better idea of what to actually do, I'd easily suggest that, though strategy is something that's typically very difficult to spend money on.

Personally, this is one reason why I favor the meta approach of "make better epistemic tools, using AI." This is an area that can be very concrete and achievable, though it does have its own problems.

Replies from: JBlack, TsviBT, Max Lee

↑ comment by JBlack · 2025-02-06T04:30:04.975Z · LW(p) · GW(p)

What makes you think that we're not at year(TAI)-3 right now? I'll agree that we might not be there yet, but you seem to be assuming that we can't be.

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-06T05:37:45.135Z · LW(p) · GW(p)

This is an orthogonal question. I agree that if we're there now, my claim is much less true.

I'd place fairly little probability mass on this (<10%) and believe much of the rest of the community does as well, though I realize there is a subset of the LessWrong-adjacent community that places much more on it.

↑ comment by TsviBT · 2025-02-05T18:50:43.476Z · LW(p) · GW(p)

I assume that at [year(TAI) - 3] we'll have a decent idea of what's needed

Why?? What happened to the bitter lesson?

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-05T19:23:12.093Z · LW(p) · GW(p)

Can you explain this position more? I know the bitter lesson, could imagine a few ways it could have implications here.

Replies from: TsviBT

↑ comment by TsviBT · 2025-02-05T19:31:55.811Z · LW(p) · GW(p)

I'm saying that just because we know algorithms that will successfully leverage data and compute to set off an intelligence explosion (...ok I just realized you wrote TAI but IDK what anyone means by anything other than actual AGI), doesn't mean we know much about how they leverage it and how that influences the explody-guy's long-term goals.

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-05T19:35:15.422Z · LW(p) · GW(p)

I assume that current efforts in AI evals and AI interpretability will be pretty useless if we have very different infrastructures in 10 years. For example, I'm not sure how much LLM interp helps with o1-style high-level reasoning.

I also think that later AI could help us do research. So if the idea is that we could do high-level strategic reasoning to find strategies that aren't specific to specific models/architectures, I assume we could do that reasoning much better with better AI.

↑ comment by Knight Lee (Max Lee) · 2025-02-05T23:10:01.551Z · LW(p) · GW(p)

I think both duration and funding are important.

I agree that increasing duration has a greater impact than increasing funding. But increasing duration is harder than increasing funding.

AI safety spending is only $0.1 billion while AI capabilities spending is $200 billion. Increasing funding by 10x is relatively more attainable, while increasing duration by 10x would require more of a miracle.

Even if you believe that funding today isn't very useful and funding in the future is more useful, increasing funding now moves the Overton window a lot. It's hard for any government which has traditionally spent only $0.01 billion to suddenly spend $100 billion. They'll use the previous budget as an anchor point to decide the new budget.

My guess is that 4x funding ≈ 2x duration.[1]

[1] For inventive steps, having twice as many "inventors" reduces the time to invention by half, while for engineering steps, having twice as many "engineers" doesn't help very much.
(Assuming the time it takes each inventor to think of an invention is an independent exponential distribution)
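The footnote's claim about inventive steps can be checked with a small simulation. This is only a sketch under the footnote's own assumption (independent exponential times per inventor); the function name, the rate of 1.0, and the trial count are illustrative choices, not anything from the comment.

```python
import random

def mean_time_to_first_invention(n_inventors, rate=1.0, trials=20000, seed=0):
    """Average time until the first of n independent inventors succeeds.

    Each inventor's time is drawn from an exponential distribution with the
    given rate (the assumption stated in the footnote).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # The invention arrives as soon as the fastest inventor finishes.
        total += min(rng.expovariate(rate) for _ in range(n_inventors))
    return total / trials

one = mean_time_to_first_invention(1)
two = mean_time_to_first_invention(2)
ratio = two / one  # should come out close to 0.5
```

Analytically, the minimum of n independent exponential times with rate λ is itself exponential with rate nλ, so its mean is 1/(nλ), which is where the "half the time" figure comes from.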

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-05T23:15:02.831Z · LW(p) · GW(p)

I'm not sure if it means much, but I'd be very happy if AI safety could get another $50B from smart donors today.

I'd flag that [stopping AI development] would cost far more than $50B. I'd expect that we could easily lose $3T of economic value in the next few years if AI progress seriously stopped.

I guess, it seems to me like duration is basically dramatically more expensive to get than funding, for amounts of funding people would likely want.

Replies from: Max Lee

↑ comment by Knight Lee (Max Lee) · 2025-02-05T23:36:27.703Z · LW(p) · GW(p)

I do think that convincing the government to pause AI in a way which sacrifices $3000 billion economic value, is relatively easier than directly spending $3000 billion on AI safety.

Maybe spending $1 is similarly hard to sacrificing $10-$100 of future economic value via preemptive regulation.[1]

But $0.1 billion of AI safety spending is so ridiculously little (roughly 2000 times less than capabilities spending, going by the $200 billion figure) that increasing it may still be the "easiest" thing to do. Of course we should still push for regulation at the same time (it doesn't hurt).

PS: what do you think of my open letter idea for convincing the government to increase funding?

[1] Maybe "future economic value" is too complicated. A simpler guesstimate would be "spending $1 is similarly hard to sacrificing $10 of company valuations via regulation."

comment by ozziegooen · 2019-08-31T23:03:24.990Z · LW(p) · GW(p)

Questions around Making Reliable Evaluations

Most questions on existing forecasting platforms are very clearly verifiable:

"Who will win the next election"
"How many cars will Tesla sell in 2030?"

But many of the questions we care about are much less verifiable:

"How much value has this organization created?"
"What is the relative effectiveness of AI safety research vs. bio risk research?"

One solution attempt would be to have an "expert panel" assess these questions, but this opens up a bunch of issues. How could we know how much we could trust this group to be accurate, precise, and understandable?

The topic of, "How can we trust that a person or group can give reasonable answers to abstract questions" is quite generic and abstract, but it's a start.

I've decided to investigate this as part of my overall project on forecasting infrastructure. I've recently been working with Elizabeth on some high-level research.

I believe that this general strand of work could be useful both for forecasting systems and also for the more broad-reaching evaluations that are important in our communities.

Early concrete questions in evaluation quality

One concrete topic that's easily studiable is evaluation consistency. If the most respected philosopher gives wildly different answers to "Is moral realism true" on different dates, it makes you question the validity of their belief. Or perhaps their belief is fixed, but we can determine that there was significant randomness in the processes that determined it.

Daniel Kahneman apparently thinks a version of this question is important enough to be writing his new book on it.

Another obvious topic is in the misunderstanding of terminology. If an evaluator understands "transformative AI" in a very different way to the people reading their statements about transformative AI, they may make statements that get misinterpreted.

These are two specific examples of questions, but I'm sure there are many more. I'm excited about understanding existing work in this overall space more, and getting a better sense of where things stand and what the next right questions are to be asking.

Replies from: romeostevensit, jimrandomh, zulupineapple

↑ comment by romeostevensit · 2019-09-09T23:37:49.081Z · LW(p) · GW(p)

> "How much value has this organization created?"

Can insights from prediction markets help us select better proxies and decision criteria, or do we expect people to be too poorly entangled with the truth of these matters for that to work? Do orgs always require someone who is managing the ontology and incentives to be super competent at that to do well? De facto improvements here are worth billions (project management tools, Slack, email add-ons for assisting with managing, etc.).

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-09-10T09:49:17.509Z · LW(p) · GW(p)

I think that prediction markets can help us select better proxies, but the initial set up (at least) will require people pretty clever with ontologies.

For example, say a group comes up with 20 proposals for specific ways of answering the question, "How much value has this organization created?". A prediction market could predict the outcome of the effectiveness of each proposal.

I'd hope that over time people would put together lists of "best" techniques to formalize questions like this, so doing it for many new situations would be quite straightforward.

Replies from: mr-hire, mr-hire

↑ comment by Matt Goldenberg (mr-hire) · 2019-09-10T14:49:00.396Z · LW(p) · GW(p)

Another related idea we played around with, but which didn't make it into the final whitepaper:

What if we just assumed that Brier score was also predictive of good judgement? Then people could create a distribution over several measures of "how good will this organization do" and we could use standard probability theory and aggregation tools to create an aggregated final measure.
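As a rough illustration of that idea, here is a minimal sketch that scores each person's track record with a Brier score and uses it to weight their estimate. The weighting rule (1 minus the Brier score), the names, and the numbers are all hypothetical, and point estimates stand in for the full distributions mentioned above.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def brier_weighted_estimate(track_records, estimates):
    """Aggregate estimates, weighting each person by (1 - Brier score).

    This weighting rule is a hypothetical choice made for illustration only.
    """
    weights = {
        name: 1.0 - brier_score(forecasts, outcomes)
        for name, (forecasts, outcomes) in track_records.items()
    }
    total = sum(weights[name] for name in estimates)
    return sum(weights[name] * estimates[name] for name in estimates) / total

# Past forecasts on well-defined questions, paired with what actually happened.
track_records = {
    "alice": ([0.9, 0.2, 0.8], [1, 0, 1]),
    "bob": ([0.6, 0.5, 0.4], [1, 0, 1]),
}
# Each person's estimate of "how good will this organization do" on some scale.
estimates = {"alice": 7.0, "bob": 4.0}
aggregate = brier_weighted_estimate(track_records, estimates)
```

Because alice has the better (lower) Brier score, the aggregate lands closer to her estimate than to bob's.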

↑ comment by Matt Goldenberg (mr-hire) · 2019-09-10T14:38:09.081Z · LW(p) · GW(p)

The way we handled this with Verity was to pick a series of values, like "good judgement", "integrity," "consistency" etc. Then the community would select exemplars who they thought represented those values the best.

As people voted on which proposals they liked best, we would weight their votes by:

1. How much other people (weighted by their own score on that value) thought they had that value.

2. How similarly they voted to the exemplars.

This sort of "value judgement" allows for fuzzy representation of high level judgement, and is a great supplement to more objective metrics like Brier score which can only measure well defined questions.

Eigentrust++ is a great algorithm that has the properties needed for this judgement-based reputation. The Verity Whitepaper goes more into depth as to how this would be used in practice.
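To make the two weighting factors above concrete, here is a toy sketch. It is not Eigentrust++ and not the scheme from the Verity Whitepaper; the function, the choice to multiply the two factors, and all the numbers are assumptions for illustration.

```python
def vote_weight(peer_ratings, rater_scores, agreement_with_exemplars):
    """Combine the two factors described above into a single vote weight.

    peer_ratings: {rater: rating in [0, 1]} of how much each rater thinks the
        voter has the value in question.
    rater_scores: {rater: that rater's own score on the same value}.
    agreement_with_exemplars: fraction of the voter's votes that match the
        exemplars' votes.
    Multiplying the two factors is an assumption, not Verity's actual rule.
    """
    total = sum(rater_scores[r] for r in peer_ratings)
    # Factor 1: peer opinion, weighted by each rater's own standing.
    peer_score = sum(rater_scores[r] * peer_ratings[r] for r in peer_ratings) / total
    # Factor 2: similarity to the exemplars' voting pattern.
    return peer_score * agreement_with_exemplars

weight = vote_weight(
    peer_ratings={"carol": 0.9, "dave": 0.3},
    rater_scores={"carol": 0.8, "dave": 0.2},
    agreement_with_exemplars=0.75,
)
```

Note the circularity: a rater's own score depends on how others rate them, which is the kind of problem a trust-propagation algorithm is meant to resolve.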

Replies from: romeostevensit

↑ comment by romeostevensit · 2019-09-10T15:25:46.870Z · LW(p) · GW(p)

Deference networks seem underrated.

↑ comment by jimrandomh · 2019-09-09T23:33:09.576Z · LW(p) · GW(p)

One way to look at this is, where is the variance coming from? Any particular forecasting question has implied sub-questions, which the predictor needs to divide their attention between. For example, given the question "How much value has this organization created?", a predictor might spend their time comparing the organization to others in its reference class, or they might spend time modeling the judges and whether they tend to give numbers that are higher or lower.

Evaluation consistency is a way of reducing the amount of resources that you need to spend modeling the judges, by providing a standard that you can calibrate against. But there are other ways of achieving the same effect. For example, if you have people predict the ratio of value produced between two organizations, then if the judges consistently predict high or predict low, this no longer matters since it affects both equally.
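A tiny numerical sketch of the ratio point, assuming the judges' bias is a constant multiplicative factor (an additive bias would not cancel this way). The values and names are made up.

```python
def judged_value(true_value, judge_bias):
    """A judge who scales every estimate by a constant multiplicative bias."""
    return true_value * judge_bias

true_a, true_b = 120.0, 40.0  # hypothetical "true" value of two organizations

ratios = []
for bias in (0.5, 1.0, 3.0):  # judges who run low, neutral, or high
    a = judged_value(true_a, bias)
    b = judged_value(true_b, bias)
    ratios.append(a / b)  # the bias appears in both terms and cancels
```

The absolute judgments vary sixfold across the three judges, but the predicted ratio stays at 3 in every case, so a forecaster no longer needs to model which kind of judge they will get.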

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-09-10T09:51:00.245Z · LW(p) · GW(p)

Yep, good points. Ideally one could do a proper or even estimated error analysis of some kind.

Having good units (like, ratios) seems pretty important.

↑ comment by zulupineapple · 2019-09-07T12:23:04.144Z · LW(p) · GW(p)

"What is the relative effectiveness of AI safety research vs. bio risk research?"

If you had a precise definition of "effectiveness" this shouldn't be a problem. E.g. if you had predictions for "will humans go extinct in the next 100 years?" and "will we go extinct in the next 100 years, if we invest 1M into AI risk research?" and "will we go extinct, if we invest 1M in bio risk research?", then you should be able to make decisions with that. And these questions should work fine in existing forecasting platforms. Their long term and conditional nature are problems, of course, but I don't think that can be helped.

"How much value has this organization created?"

That's not a forecast. But if you asked "How much value will this organization create next year?" along with a clear measure of "value", then again, I don't see much of a problem. And, although clearly defining value can be tedious (and prone to errors), I don't think that problem can be avoided. Different people value different things, that can't be helped.

One solution attempt would be to have an "expert panel" assess these questions

Why would you do that? What's wrong with the usual prediction markets? Of course, they're expensive (require many participants), but I don't think a group of experts can be made to work well without a market-like mechanism. Is your project about making such markets more efficient?

Replies from: ozziegooen, tetraspace-grouping

↑ comment by ozziegooen · 2019-09-07T19:34:29.495Z · LW(p) · GW(p)

If you had a precise definition of "effectiveness" this shouldn't be a problem.

Coming up with a precise definition is difficult, especially if you want multiple groups to agree. Those specific questions are relatively low-level; I think we should ask a bunch of questions like that, but think we may also want some more vague things as well.

For example, say I wanted to know how good/enjoyable a specific movie would be. Predicting the ratings according to movie reviewers (evaluators) is an approach I'd regard as reasonable. I'm not sure what a precise definition for movie quality would look like (though I would be interested in proposals), but am generally happy enough with movie reviews for what I'm looking for.

"How much value has this organization created?"

Agreed that that itself isn't a forecast, I meant in the more general case, for questions like, "How much value will this organization create next year" (as you pointed out). I probably should have used that more specific example, apologies.

And, although clearly defining value can be tedious (and prone to errors), I don't think that problem can be avoided.

Can you be more explicit about your definition of "clearly"? I'd imagine that almost any proposal at a value function would have some vagueness. Certificates of Impact get around this by just leaving that for the review of some eventual judges, kind of similar to what I'm proposing.

Why would you do that? What's wrong with the usual prediction markets?

The goal for this research isn't fixing something with prediction markets, but just finding more useful things for them to predict. If we had expert panels that agreed to evaluate things in the future (for instance, they are responsible for deciding on the "value organization X has created" in 2025), then prediction markets and similar could predict what they would say.

Replies from: zulupineapple

↑ comment by zulupineapple · 2019-09-07T22:55:56.996Z · LW(p) · GW(p)

For example, say I wanted to know how good/enjoyable a specific movie would be.

My point is that "goodness" is not a thing in the territory. At best it is a label for a set of specific measures (ratings, revenue, awards, etc). In that case, why not just work with those specific measures? Vague questions have the benefit of being short and easy to remember, but beyond that I see only problems. Motivated agents will do their best to interpret the vagueness in a way that suits them.

Is your goal to find a method to generate specific interpretations and procedures of measurement for vague properties like this one? Like a Schelling point for formalizing language? Why do you feel that can be done in a useful way? I'm asking for an intuition pump.

Can you be more explicit about your definition of "clearly"?

Certainly there is some vagueness, but it seems that we manage to live with it. I'm not proposing anything that prediction markets aren't already doing.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-09-09T10:00:23.854Z · LW(p) · GW(p)

Hm... At this point I don't feel like I have a good intuition for what you find intuitive. I could give more examples, but don't expect they would convince you much right now if the others haven't helped.

I plan to eventually write more about this, and eventually hopefully we should have working examples up (where people are predicting things). Hopefully things should make more sense to you then.

Short comments back<>forth are a pretty messy communication medium for such work.

↑ comment by Tetraspace (tetraspace-grouping) · 2019-09-10T15:09:12.386Z · LW(p) · GW(p)

There's something of a problem with sensitivity; if the x-risk from AI is ~0.1, and the difference in x-risk from some grant is ~10^-6, then any difference in the forecasts is going to be completely swamped by noise.

(while people in the market could fix any inconsistency between the predictions, they would only be able to look forward to 0.001% returns over the next century)
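The arithmetic behind these figures can be spelled out as below. This is a sketch of one reading of the comment: the return is taken as the risk difference relative to the baseline price, and the forecast noise level is a hypothetical number chosen only to show the scale of the problem.

```python
baseline_risk = 0.1    # forecast x-risk from AI without the grant
risk_reduction = 1e-6  # difference in x-risk attributed to the grant

# A trader correcting a mispricing of this size gains the difference
# relative to a position priced at roughly the baseline.
relative_return = risk_reduction / baseline_risk  # 1e-5, i.e. 0.001%

# Hypothetical: if individual forecasts carry noise with standard deviation
# 0.01, averaging n independent forecasts gives a standard error of
# 0.01 / sqrt(n). Getting that below the risk difference needs roughly:
noise_sd = 0.01
forecasts_needed = (noise_sd / risk_reduction) ** 2  # about 1e8
```

Under those assumptions, something on the order of a hundred million independent forecasts would be needed before the signal is even comparable to the noise.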

Replies from: zulupineapple, ozziegooen

↑ comment by zulupineapple · 2019-09-13T18:14:32.330Z · LW(p) · GW(p)

Making long term predictions is hard. That's a fundamental problem. Having proxies can be convenient, but it's not going to tell you anything you don't already know.

↑ comment by ozziegooen · 2019-09-11T10:54:50.623Z · LW(p) · GW(p)

Yea, in cases like these, having intermediate metrics seems pretty essential.

comment by ozziegooen · 2019-12-24T00:13:44.651Z · LW(p) · GW(p)

Experimental predictability and generalizability are correlated

A criticism of having people attempt to predict the results of experiments is that this will be near impossible. The idea is that experiments are highly sensitive to parameters and these would need to be deeply understood in order for predictors to have a chance at being more accurate than an uninformed prior. For example, in a psychological survey, it would be important that the predictors knew the specific questions being asked, details about the population being sampled, many details about the experimenters, et cetera.

One counter-argument may not be to say that prediction will be easy in many cases, but rather that if these experiments cannot be predicted in a useful fashion without very substantial amounts of time, then these experiments probably aren’t going to be very useful anyway.

Good scientific experiments produce results that are generalizable. For instance, a study on the effectiveness of a malaria intervention on one population should give us useful information (probably for use with forecasting) about its effectiveness on other populations. If it doesn’t, then its value would be limited. It would really be more of a historic statement than a scientific finding.

Possible statement from a non-generalizable experiment:

“We found that intervention X was beneficial within statistical significance for a population of 2,000 people. That’s interesting if you’re interested in understanding the histories of these 2,000 people. However, we wouldn’t recommend inferring anything about this to other groups of people, or to understanding anything about these 2,000 people going forward.”

Formalization

One possible way of starting to formalize this a bit is to imagine experiments (assuming internal validity) as mathematical functions. The inputs would be the parameters and details of how the experiment was performed, and the results would be the main findings that the experiment found.

$\text{experiment}_{n}(\text{inputs}) = \text{findings}$

If the experiment has internal validity, then observers should predict that if an identical (but subsequent) experiment were performed, it would result in identical findings.

$p(\text{experiment}_{n+1}(\text{inputs}_{i}) = \text{findings}_{i} \mid \text{experiment}_{n}(\text{inputs}_{i}) = \text{findings}_{i}) = 1$

We could also say that if we took a probability distribution of the chances of every possible set of findings being true, the differential entropy of that distribution would be 0, as smart forecasters would recognize that $\text{findings}_{i}$ is correct with ~100% probability.

$H(\text{experiment}_{n+1}(\text{inputs}_{i}) \mid \text{experiment}_{n}(\text{inputs}_{i}) = \text{findings}_{i}) \approx 0$

Generalizability

Now, to be generalizable, hopefully we could perturb the inputs in a minor way, but still have the entropy be low. Note that the important thing is not that the outputs stay unchanged, but rather that they remain predictable. For instance, a physical experiment that describes the basics of mechanical velocity may be performed on data with velocities of 50-100 miles/hour. This experiment would be useful not only if future experiments described situations with similar velocities, but rather if future experiments on velocity could be better predicted, no matter the specific velocities used.

We can describe a perturbation of $\text{inputs}_{i}$ to be $\text{inputs}_{i} + \delta$.

Thus, hopefully, the following will be true for low values of $\delta$.

$H(\text{experiment}_{n+1}(\text{inputs}_{i} + \delta) \mid \text{experiment}_{n}(\text{inputs}_{i}) = \text{findings}_{i}) \approx 0$

So, perhaps generalizability can be defined something like,

Generalizability is the ability for predictors to better predict the results of similar experiments upon seeing the results of a particular experiment, for increasingly wide definitions of "similar".
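A small sketch of what "low entropy after seeing a result" could look like numerically. It uses discrete Shannon entropy over a handful of possible findings as a stand-in for the entropy in the formulas above, and the three forecaster distributions are invented for illustration.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution over possible findings."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical forecaster beliefs over three possible findings of the next
# experiment, after seeing the result of the previous one.
exact_replication = [0.999, 0.0005, 0.0005]  # same inputs: near-certain
small_perturbation = [0.9, 0.05, 0.05]       # inputs + small delta: still predictable
no_information = [1 / 3, 1 / 3, 1 / 3]       # result told forecasters nothing

h_exact = shannon_entropy(exact_replication)
h_perturbed = shannon_entropy(small_perturbation)
h_uniform = shannon_entropy(no_information)
```

In this framing, a generalizable experiment is one where the entropy stays well below the uninformed level as the perturbation grows, rather than jumping straight to it.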

Predictability and Generalizability

I could definitely imagine trying to formalize predictability better in this setting, or more specifically, formalize the concept of "do forecasters need to spend a lot of time understanding the parameters of an experiment." In this case, that could look something like modeling how the amount of uncertainty forecasters have about the inputs correlates with their uncertainty about the outputs.

The general combination of predictability and generality would look something like adding an additional assumption:

If forecasters require a very high degree of information on the inputs to an experiment in order to predict its outputs, then it's less likely they can predict (with high confidence) the results of future experiments with significant changes, once they see the results of said experiment.

Admittedly, this isn't using the definition of predictability that people are likely used to, but I imagine it correlates well enough.

Final Thoughts

I've been experimenting more with trying to formalize concepts like this. As such, I'd be quite curious to get any feedback on this work. I am a bit torn; on one hand I appreciate formality, but on the other this is decently messy and I'm sure it will turn off many readers.

Replies from: Radamantis

↑ comment by NunoSempere (Radamantis) · 2020-01-08T12:25:53.868Z · LW(p) · GW(p)

We could also say that if we took a probability distribution of the chances of every possible set of findings being true, the differential entropy of that distribution would be 0, as smart forecasters would recognize that inputs_i is correct with ~100% probability.

In that paragraph, did you mean to say "findings_i is correct"?

***

Neat idea. I'm also not sure whether the idea is valuable because it could be implementable, or from "this is interesting because it gets us better models".

In the first case, I'm not sure whether the correlation is strong enough to change any decisions. That is, I'm having trouble thinking of decisions for which I need to know the generalizability of something, and my best shot is measuring its predictability.

For example, in small foretold/metaculus communities, I'd imagine that miscellaneous factors like "is this question interesting enough to the top 10% of forecasters" will just make the path predictability -> differential entropy -> generalizability difficult to detect.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T12:40:33.009Z · LW(p) · GW(p)

In that paragraph, did you mean to say "findings_i is correct"?

Good point, I think you're right, I changed the text accordingly.

The main point I was getting at is that the phrases:

Experiments are important to perform.
Predictors cannot decently predict the results of experiments unless they have gigantic amounts of time.

Are a bit contradictory. You can choose either, but probably not both.

Likewise, I'd expect that experiments that are easier to predict are ones that are more useful, which is more convenient than the other alternative.

I think generally we will want to estimate importance/generality of experiments separate from their predictability.

comment by ozziegooen · 2020-06-29T11:46:52.514Z · LW(p) · GW(p)

I was recently pointed to the YouTube channel Psychology in Seattle. I think it's one of my favorites in a while.

I'm personally more interested in workspace psychology than relationship psychology, but my impression is that they share a lot of similarities.

Emotional intelligence gets a bit of a bad rap due to its fuzzy nature, but I'm convinced it's one of the top few things for most people to get better at. I know lots of great researchers and engineers who keep repeating the same failure modes, and this causes severe organizational and personal problems.

Emotional intelligence books and training typically seem quite poor to me. The alternative format here of "let's just show you dozens of hours of people interacting with each other, and point out all the fixes they could make" seems much better than most books or lectures I've seen.

This YouTube series does an interesting job at that. There's a whole bunch of "let's watch this reality TV show, then give our take on it." I'd be pretty excited about there being more things like this posted online, especially in other contexts.

Related, I think the potential of reality TV is fairly underrated in intellectual circles, but that's a different story.

https://www.youtube.com/user/PsychologyInSeattle?fbclid=IwAR3Ux63X0aBK0CEwc8yPyjsFJ2EKQ2aSMs1XOjUOgaFqlguwz6Fxul2ExJw

Replies from: gworley, mr-hire, romeostevensit

↑ comment by Gordon Seidoh Worley (gworley) · 2020-06-29T17:43:49.597Z · LW(p) · GW(p)

One of the things I love about entertainment is that much of it offers evidence about how humans behave in a wide variety of scenarios. This has gotten truer over time, at least within Anglophone media, with its trend towards realism and away from archetypes and morality plays. Yes, it's not the best possible or most reliable evidence about how real humans behave in real situations, and it's a meme around here that you should be careful not to generalize from fictional evidence, but I also think it's better than nothing (I don't think reality TV is especially less fictional than other forms of entertainment with regard to how humans behave, given its heavy use of editing and loose scripting to punch up situations for entertainment value).

Replies from: rudi-c

↑ comment by Rudi C (rudi-c) · 2020-06-30T13:58:03.348Z · LW(p) · GW(p)

Nothing is a low bar though. :)

↑ comment by Matt Goldenberg (mr-hire) · 2020-07-03T17:00:05.796Z · LW(p) · GW(p)

You might also enjoy the channel "Charisma on Command", which has a similar format of finding YouTube videos of charismatic and non-charismatic people, and seeing what they do and don't do well.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-07-05T08:22:01.630Z · LW(p) · GW(p)

Thanks! I'll check it out.

↑ comment by romeostevensit · 2020-06-30T08:14:55.771Z · LW(p) · GW(p)

Novel and obviously some good ideas/directions. Thanks.

comment by ozziegooen · 2019-12-24T21:49:40.849Z · LW(p) · GW(p)

Namespace pollution and name collision are two great concepts in computer programming. The way they are handled in many academic environments seems quite naive to me.

Programs can get quite large and thus naming things well is surprisingly important. Many of my code reviews are primarily about coming up with good names for things. In a large codebase, every time symbolicGenerator() is mentioned, it refers to the same exact thing. Suppose one part of the codebase has been using symbolicGenerator for a reasonable set of functions, and later another part comes up whose programmer realizes that symbolicGenerator is also the best name for that piece. That programmer has to make a tough decision: either refactor the codebase to change all previous mentions of symbolicGenerator to an alternative name, or come up with an alternative name for the new piece. They can't have it both ways.

Therefore, naming becomes a political process. Names touch many programmers who have different intuitions and preferences. A large refactor of naming in a section of the codebase that others use would often be taken quite hesitantly by that group.

This makes it all the more important that good names are used initially. As such, reviewers care a lot about the names being pretty good; hopefully they are generic enough so that their components could be expanded while the name remains meaningful; but specific enough to be useful for remembering. Names that get submitted via pull requests represent much of the human part of the interface/API; they're harder to change later on, so obviously require extra work to get right the first time.

To be clear, a name collision is when two unrelated variables have the same name, and namespace pollution refers to when code is initially submitted in ways that are likely to create unnecessary conflicts later on.
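Here is a minimal sketch of both concepts, using plain dictionaries as stand-in namespaces. The two "teams", their functions, and the snake_case spelling symbolic_generator are hypothetical; the point is only to show how dumping names into a shared namespace lets one definition silently replace another.

```python
# Two hypothetical teams each define their own symbolic_generator.
team_a = {"symbolic_generator": lambda: "symbols for the parser"}
team_b = {"symbolic_generator": lambda: "symbols for the plotting tool"}

# Namespace pollution: every name is dumped into one shared namespace,
# similar in spirit to `from module import *`.
shared = {}
shared.update(team_a)
shared.update(team_b)  # name collision: team A's function is silently replaced
collided = shared["symbolic_generator"]()

# Keeping each team's names behind its own prefix avoids the collision.
namespaced = {"team_a": team_a, "team_b": team_b}
a_result = namespaced["team_a"]["symbolic_generator"]()
b_result = namespaced["team_b"]["symbolic_generator"]()
```

After the second update, code that expected team A's generator gets team B's instead, with no error raised. Namespacing sidesteps the conflict at the cost of longer, qualified names.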

Academia

My impression is that in much of academia, there are few formal processes for groups of experts to agree on the names for things. There are specific clusters with very highly thought out terminology, particularly around very large sets of related terminology; for instance, biological taxonomies, the metric system, and various aspects of medicine and biology.

By in many other parts, it seems like a free-for-all among the elite. My model of the process is something like, "Someone coming up with a new theory will propose a name for it and put it in their paper. If the paper is accepted (which is typically done with details in mind unrelated to the name), and if others find that theory useful, then they will generally call it the same name as the one used in the proposal. In some cases a few researchers will come up with a few variations for the same idea, in which case one will be selected through the process of what future researchers decide to use, on an individual bases. Often ideas are named after those who came up with them to some capacity; this makes a lot of sense to other experts who worked in these areas, but it's not at all obvious if this is optimal for other people."

The result is that naming is something that happens almost accidentally, as the result of a process which isn't paying particular attention to making sure the names are right.

When there are few or no naming processes, actors are incentivized to choose bold names. They don't have to pay the cost for any namespace pollution they create. Two names that come to mind recently have been "The Orthogonality Thesis" and "The Simulation Hypothesis". These are two rather specific things with very generic names. Those come to mind because they are related to our field, but many academic topics seem similar. Information theory is mostly about encoding schemes, which are now not that important. Systems theory is typically about a subset of dynamical systems. But of course, it would be really awkward for anyone else with a more sensible "Systems theory" to use that name for the new thing.

I feel like AI has had some noticeably bad examples; it's hard to look at all the existing naming and think that this was the result of a systematic and robust naming approach. The Table of Contents of AI: A Modern Approach seems quite good to me; that seems very much the case of a few people refactoring things to come up with one high-level overview that is optimized for being such. But the individual parts are obviously messy. A* search, alpha-beta pruning, K-consistency, Gibbs sampling, Dempster-Shafer theory, etc.

LessWrong

One of my issues with LessWrong is the naming system. There's by now quite a bit of terminology to understand; the LessWrong wiki seems useful here. But there's no strong process from what I understand. People suggest names in their posts, these either become popular or don't. There's rarely any refactoring.

Replies from: gworley, ozziegooen, Pattern, mr-hire

↑ comment by Gordon Seidoh Worley (gworley) · 2019-12-25T23:54:11.980Z · LW(p) · GW(p)

I'm not sure if it's good or bad, but I find the way species get named interesting.

The general rule is "first published name wins", and this is true even if the first published name is "wrong" in some way, like implies a relationship that doesn't exist, since that implication is not officially semantically meaningful. But there are ways to get around this, like if a name was based on a disproved phylogeny, in which case a new name can be taken up that fits the new phylogenic relationship. This means existing names get to stick, at least up until the time that they are proven so wrong that they must be replaced. Alas, there's no official registry of these things, so it's up to working researchers to do literature reviews and get the names right, and sometimes people get it wrong by accident and sometimes on purpose because they think an earlier naming is "invalid" for one reason or another and so only recognize a later naming. The result is pretty confusing and requires knowing a lot or doing a lot of research to realize that, for example, two species names might refer to the same species in different papers.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-28T20:09:25.316Z · LW(p) · GW(p)

Thanks, I didn't know. That matches what I expect from similar fields, though it is a bit disheartening. There's an entire field of library science and taxonomy, but they seem rather isolated to specific things.

↑ comment by ozziegooen · 2019-12-24T23:01:45.321Z · LW(p) · GW(p)

Another quick note on the LessWrong wiki:

I'm skeptical of single definitions without disclaimers. I think it's misleading (to some) that "Truth is the correspondence between one's beliefs about reality and reality."[1] Rather, it's fair to say that this is one specific definition of truth that has been used in many cases; I'm sure that others, including others on LessWrong, have used it differently.

Most dictionaries have multiple definitions for words. This seems more like what we should aim for.

In fairness, when I searched for "Rationality", the result states, "Rationality is something of fundamental importance to LessWrong that is defined in many ways", which I of course agree with.

[1] https://wiki.lesswrong.com/wiki/Truth

Replies from: Pattern

↑ comment by Pattern · 2019-12-25T02:21:42.707Z · LW(p) · GW(p)

I'm skeptical of single definitions without disclaimers.

At the meta-level it isn't clear what value other definitions might offer (in this case). ("Truth" seems like a basic concept that is understood prior to reading that article - it's easier to imagine such an argument for other concepts without such wide understanding.)

Most dictionaries have multiple definitions for words. This seems more like what we should aim for.

Perhaps more definitions should be brought in (as necessary), with the same level of attention to detail -

I'm sure that others, including others on LessWrong, have used it differently.

when they are used (extensively). It's possible that relevant posts have already been made, they just haven't been integrated into the wiki. Is the wiki up to date as of 2012, but not after that?

↑ comment by Pattern · 2019-12-25T02:23:15.151Z · LW(p) · GW(p)

"Someone coming up with a new theory will propose a name for it and put it in their paper. If the paper is accepted (which is typically done with details in mind unrelated to the name)[1],

Footnote not found. The refactoring sounds like a good idea, though the main difficulty would be propagating the new names.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-25T21:38:19.519Z · LW(p) · GW(p)

Thanks for pointing that out! I forgot the specific note, removed the [1].

I definitely would agree that refactoring would be difficult, especially if we haven't figured out a great refactoring process.

↑ comment by Matt Goldenberg (mr-hire) · 2019-12-28T17:19:52.862Z · LW(p) · GW(p)

One of my issues with LessWrong is the naming system. There's by now quite a bit of terminology to understand; the LessWrong wiki seems useful here. But there's no strong process from what I understand. People suggest names in their posts, these either become popular or don't. There's rarely any refactoring.

One of the issues with this in both an academic and LW context is that changing the name of something in a single source of truth codebase is much cheaper than changing the name of something in a community. The more popular an idea, the more cost goes up to change the name. Similarly, when you're working with a single organization, creating a process that everyone follows is relatively cheap compared to a loosely tied together community with various blogs, individuals, and organizations coining their own terms.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-28T20:07:18.478Z · LW(p) · GW(p)

Yep, I'd definitely agree that it's harder. That said, this doesn't mean that it's not high-ev to improve on. One outcome could be that we should be more careful introducing names, as it is difficult to change them. Another would be to work to attempt to have formal ways of changing them after, even though it is difficult (It would be worthwhile in some cases, I assume).

Replies from: mr-hire

↑ comment by Matt Goldenberg (mr-hire) · 2019-12-28T23:13:34.538Z · LW(p) · GW(p)

In a recent thread about changing the name of Solstice to Solstice Advent, Oliver Habryka estimated it would cost at least $100,000 to make that happen. This seems like a reasonable estimate to me, and a good lower bound for how much value you would need to get from a name change to make it worth it.

The idea of lowering this cost is quite appealing, but I'm not sure how to make a significant difference there.

I think it's also worth thinking about the counterfactual cost of discouraging naming things.

As an example, here's a post with an important concept that hasn't really spread because it doesn't have a snappy name: https://www.lesswrong.com/posts/K4eDzqS2rbcBDsCLZ/unrolling-social-metacognition-three-levels-of-meta-are-not [LW · GW]

comment by ozziegooen · 2019-12-02T00:29:16.611Z · LW(p) · GW(p)

I think one idea I'm excited about is the idea that predictions can be made of prediction accuracy. This seems pretty useful to me.

Example

Say there's a forecaster Sophia who's making a bunch of predictions for pay. She uses her predictions to make a meta-prediction of her total prediction-score on a log-loss scoring function (on all predictions except her meta-predictions). She says that she's 90% sure that her total loss score will be between -5 and -12.

The problem is that you probably don't think you can trust Sophia unless she has a lot of experience making similar forecasts.

This is somewhat solved if you have a forecaster that you trust that can make a prediction based on Sophia's seeming ability and honesty. The naive thing would be for that forecaster to predict their own distribution of the log-loss of Sophia, but there's perhaps a simpler solution. If Sophia's provided loss distribution is correct, that would mean that she's calibrated in this dimension (basically, this is very similar to general forecast calibration). The trusted forecaster could forecast the adjustment made to her term, instead of forecasting the same distribution. Generally this would be in the direction of adding expected loss, as Sophia probably had more of an incentive to be overconfident (which would result in a low expected score from her) than underconfident. This could perhaps make sense as a percentage modifier (-30% points), a mean modifier (-3 to -8 points), or something else.
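To make the adjustment idea a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the function names, the ten probabilities Sophia assigned to the outcomes that actually happened, and the single -3 point mean shift (the post only suggests a range like -3 to -8). It just computes a total log score and checks it against her stated interval and a shifted one.

```python
import math

def total_log_score(probs_of_true_outcomes):
    """Sum of log(p), where p is the probability assigned to what actually happened."""
    return sum(math.log(p) for p in probs_of_true_outcomes)

def apply_mean_adjustment(interval, shift):
    """Shift a (low, high) score interval by a fixed number of points."""
    low, high = interval
    return (low + shift, high + shift)

def contains(interval, value):
    low, high = interval
    return low <= value <= high

# Sophia's stated 90% interval for her total score (from the example above).
stated_interval = (-12.0, -5.0)

# Hypothetical adjustment from the trusted forecaster: 3 points more expected loss.
adjusted_interval = apply_mean_adjustment(stated_interval, -3.0)

# Hypothetical probabilities Sophia gave to the outcomes that ended up occurring.
resolved_probs = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.2, 0.4, 0.85, 0.1]
realized_score = total_log_score(resolved_probs)

print("adjusted interval:", adjusted_interval)
print("realized total log score:", round(realized_score, 3))
print("inside stated interval:", contains(stated_interval, realized_score))
```

A single resolved total says very little about calibration on its own; the check only becomes informative across many such bundles.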

External clients would probably learn not to trust Sophia's provided expected error directly, but instead the "adjusted" forecast.

This can be quite useful. Now, if Sophia wants to try to "cheat the system" and claim that she's found new data that decreases her estimated error, the trusted forecaster will pay attention and modify their adjustment accordingly. Sophia will then need to provide solid evidence that she really believes her work and is really calibrated for the trusted forecaster to budge.

I want to call this something like forecast appraisal, attestation, or pinning. Please leave comments if you have ideas.

"Trusted Forecaster" Error

You may be wondering how we ensure that the "trusted" forecaster is actually good. For one thing, they would hopefully go through the same procedure. I would imagine there could be a network of "trusted" forecasters that are all estimating each other's predicted "calibration adjustment factors". This wouldn't work if observers didn't trust any of these or thought they were colluding, but could if they had one single predictor they trusted. Also, note that over time data would come in and some of this would be verified.

The idea of focusing a lot on "expected loss" seems quite interesting to me. One thing it could encourage is contracts or Service Level Agreements. For instance, I could propose a 50/50 bet for anyone, for a percentile of my expected loss distribution. Like, "I'd be willing to bet $1,000 with anyone that the eventual total error of my forecasts will be less than the 65th percentile of my specified predicted error." Or, perhaps a "prediction provider" would have to pay back an amount of their fee, or even more, if the results are a high percentile of their given predicted errors. This could generally be a good way to verify a set of forecasts. Another example would be to have a prediction group make 1000 forecasts, then heavily subsidize one question on a popular prediction market that's predicting their total error.

Markets For Purchasing Prediction Bundles

Of course, the trusted forecasters can not only forecast the "calibration adjustment factors" for ongoing forecasts, but they can also forecast these factors for hypothetical forecasts as well.

Say you have 500 questions that need to be predicted, and there are multiple agencies that all say they could do a great job predicting these questions. They all give estimates of their mean predicted error, conditional on them doing the prediction work. Then you have a trusted forecaster give a calibration adjustment.

	Firm's Predicted Error	Calibration Adjustment	Adjusted Predicted Error
Firm 1	-20	-2	-22
Firm 2	-12	-9	-21
Firm 3	-15	-3	-18

(Note: the lower the expected error, the worse)

In this case, Firm 2 makes the best claim, but is revealed to be significantly overconfident. Firm 3 has the best adjusted predicted error, so they're the ones to go with. In fact, you may want to penalize Firm 2 further for being a so-called prediction service with apparent poor calibration skills.
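As a quick sketch of the selection rule, using the numbers from the table (the dictionary layout and variable names are just for illustration): add each firm's calibration adjustment to its own predicted error, then pick the highest, i.e. least negative, adjusted value.

```python
# Numbers copied from the table above; higher (less negative) is better.
firms = {
    "Firm 1": {"predicted_error": -20, "calibration_adjustment": -2},
    "Firm 2": {"predicted_error": -12, "calibration_adjustment": -9},
    "Firm 3": {"predicted_error": -15, "calibration_adjustment": -3},
}

# Adjusted predicted error = firm's own claim + trusted forecaster's adjustment.
adjusted = {
    name: row["predicted_error"] + row["calibration_adjustment"]
    for name, row in firms.items()
}

best_claim = max(firms, key=lambda name: firms[name]["predicted_error"])
best_adjusted = max(adjusted, key=adjusted.get)

print("best raw claim:", best_claim)
print("best after adjustment:", best_adjusted)
```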

Correlations

One quick gotcha: one can't simply sum the expected errors of all of one's predictions to get the total predicted error. This would treat them as independent, and there are likely to be many correlations between them. For example, if things go "seriously wrong", it's likely many different predictions will have high losses. To handle this perfectly would really require one model to have produced all forecasts, but if that's not the case there could likely be simple ways to approximate this.
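One way to see how much correlations matter, as a rough sketch: the means of the individual losses still add up, but the spread of the total does not behave as it would under independence. The function below assumes (my simplification, not something from the setup above) that every forecast's loss has the same standard deviation and the same pairwise correlation.

```python
import math

def total_loss_std(n_forecasts, per_forecast_std, pairwise_correlation):
    """Standard deviation of the summed loss under equal variances and
    a single shared pairwise correlation (an illustrative assumption)."""
    variance_terms = n_forecasts * per_forecast_std ** 2
    covariance_terms = (
        n_forecasts * (n_forecasts - 1)
        * pairwise_correlation * per_forecast_std ** 2
    )
    return math.sqrt(variance_terms + covariance_terms)

# 100 forecasts, each with loss standard deviation 1.0 (hypothetical numbers).
for rho in (0.0, 0.2, 1.0):
    print(rho, round(total_loss_std(100, 1.0, rho), 2))
```

Even a modest shared correlation makes the uncertainty of the total several times wider than the independent case would suggest, which is the part that matters for intervals like Sophia's.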

Bundles vs. Prediction Markets

I'd expect that in many cases, private services will be more cost-effective than posting predictions on full prediction markets. Plus, private services could be more private and custom. The general selection strategy in the table above could of course include some options that involve hosting questions on prediction markets, and the victor would be chosen based on reasonable estimates.

Replies from: jacobjacob, Radamantis

↑ comment by Bird Concept (jacobjacob) · 2019-12-04T10:01:37.100Z · LW(p) · GW(p)

"I'd be willing to bet $1,000 with anyone that the eventual total error of my forecasts will be less than the 65th percentile of my specified predicted error."

I think this is equivalent to applying a non-linear transformation to your proper scoring rule. When things settle, you get paid S(p) both based on the outcome of your object-level prediction p, and your meta prediction q(S(p)).

Hence:

S(p)+B(q(S(p)))

where B is the "betting scoring function".

This means getting the scoring rules to work while preserving properness will be tricky (though not necessarily impossible).

One mechanism that might help is if each player makes one object prediction p and one meta prediction q, but for resolution you randomly sample one and only one of the two to actually pay out.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-10T20:47:21.450Z · LW(p) · GW(p)

Interesting, thanks! Yea, agreed it's not proper. Coming up with interesting payment / betting structures for "package-of-forecast" combinations seems pretty great to me.

Replies from: jacobjacob

↑ comment by Bird Concept (jacobjacob) · 2019-12-11T08:22:34.845Z · LW(p) · GW(p)

I think this paper might be relevant: https://users.cs.duke.edu/~conitzer/predictionWINE09.pdf

Abstract. A potential downside of prediction markets is that they may incentivize agents to take undesirable actions in the real world. For example, a prediction market for whether a terrorist attack will happen may incentivize terrorism, and an in-house prediction market for whether a product will be successfully released may incentivize sabotage. In this paper, we study principal-aligned prediction mechanisms– mechanisms that do not incentivize undesirable actions. We characterize all principal-aligned proper scoring rules, and we show an “overpayment” result, which roughly states that with n agents, any prediction mechanism that is principal-aligned will, in the worst case, require the principal to pay Θ(n) times as much as a mechanism that is not. We extend our model to allow uncertainties about the principal’s utility and restrictions on agents’ actions, showing a richer characterization and a similar “overpayment” result.

↑ comment by NunoSempere (Radamantis) · 2020-01-08T12:41:47.263Z · LW(p) · GW(p)

This is somewhat solved if you have a forecaster that you trust that can make a prediction based on Sophia's seeming ability and honesty. The naive thing would be for that forecaster to predict their own distribution of the log-loss of Sophia, but there's perhaps a simpler solution. If Sophia's provided loss distribution is correct, that would mean that she's calibrated in this dimension (basically, this is very similar to general forecast calibration). The trusted forecaster could forecast the adjustment made to her term, instead of forecasting the same distribution. Generally this would be in the direction of adding expected loss, as Sophia probably had more of an incentive to be overconfident (which would result in a low expected score from her) than underconfident. This could perhaps make sense as a percentage modifier (-30% points), a mean modifier (-3 to -8 points), or something else.

Is it actually true that forecasters would find it easier to forecast the adjustment?

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T12:45:31.678Z · LW(p) · GW(p)

One nice thing about adjustments is that they can be applied to many forecasts. Like, I can estimate the adjustment for someone's [list of 500 forecasts] without having to look at each one.

Over time, I assume that there would be heuristics for adjustments, like, "Oh, people of this reference class typically get a +20% adjustment", similar to margins of error in engineering.

That said, these are my assumptions, I'm not sure what forecasters will find to be the best in practice.

comment by ozziegooen · 2020-01-21T12:36:50.865Z · LW(p) · GW(p)

Communication should be judged for expected value, not intention (by consequentialists)

TLDR: When trying to understand the value of information, understanding the public interpretations of that information could matter more than understanding the author's intent. When trying to understand the information for other purposes (like, reading a math paper to understand math), this does not apply.

If I were to scream "FIRE!" in a crowded theater, it could cause a lot of damage, even if my intention were completely unrelated. Perhaps I was responding to a devious friend who asked, "Would you like more popcorn? If yes, shout 'FIRE!'".

Not all speech is protected by the First Amendment, in part because speech can be used for expected harm.

One common defense of incorrect predictions is to claim that their interpretations weren't their intentions. "When I said that the US would fall if X were elected, I didn't mean it would literally end. I meant more that..." These kinds of statements were discussed at length in Expert Political Judgment.

But this defense rests on the idea that communicators should be judged on intention, rather than expected outcomes. In those cases, it was often clear that many people interpreted these "experts" as making fairly specific claims that were later rejected by their authors. I'm sure that much of this could have been predicted. The "experts" often definitely didn't seem to be going out of their way to be making their after-the-outcome interpretations clear before-the-outcome.

I think that it's clear that the intention-interpretation distinction is considered highly important by a lot of people, so much so as to argue that interpretations, even predictable ones, are less significant in decision making around speech acts than intentions. I.E. "The important thing is to say what you truly feel, don't worry about how it will be understood."

But for a consequentialist, this distinction isn't particularly relevant. Speech acts are judged on expected value (and thus expected interpretations), because all acts are judged on expected value. Similarly, I think many consequentialists would claim that there's nothing metaphysically unique about communication as opposed to other actions one could take in the world.

Some potential implications:

- Much of communicating online should probably be about developing empathy for the reader base, and a sense for what readers will misinterpret, especially if such misinterpretation is common (which it seems to be).
- Analyses of the interpretations of communication could be more important than analysis of the intentions of communication. I.E. understanding authors and artistic works in large part by understanding their effects on their viewers.
- It could be very reasonable to attempt to map non-probabilistic forecasts into probabilistic statements based on what readers would interpret. Then these forecasts can be scored using scoring rules just like regular probabilistic statements. This would go something like, "I'm sure that Bernie Sanders will be elected" -> "The readers of that statement seem to think the author is applying probability 90-95% to the statement 'Bernie Sanders will win'" -> a Brier/log score.
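As a rough sketch of that last mapping: take the range readers seem to infer (90-95% in the example above), collapse it to a single number, and score it like any other probabilistic forecast. Using the midpoint is my own simplifying assumption here, as are the function names; one could just as well score the whole range.

```python
import math

def brier_score(probability, outcome_happened):
    """Squared error against the 0/1 outcome; lower is better."""
    target = 1.0 if outcome_happened else 0.0
    return (probability - target) ** 2

def log_score(probability, outcome_happened):
    """Log of the probability given to what actually happened; higher is better."""
    return math.log(probability if outcome_happened else 1.0 - probability)

# Readers' interpretation of "I'm sure that X will be elected" (from the example).
interpreted_range = (0.90, 0.95)
interpreted_probability = sum(interpreted_range) / 2  # midpoint, an assumption

for happened in (True, False):
    print(
        happened,
        round(brier_score(interpreted_probability, happened), 4),
        round(log_score(interpreted_probability, happened), 4),
    )
```

The asymmetry between the two rows is the point: a confident-sounding statement costs very little if it comes true and a lot if it doesn't, regardless of what the author later says they meant.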

Note: Please do not interpret this statement as attempting to say anything about censorship. Censorship is a whole different topic with distinct costs and benefits.

Replies from: ozziegooen, Dagon, ozziegooen

↑ comment by ozziegooen · 2020-01-21T22:39:12.332Z · LW(p) · GW(p)

It seems like there are a few distinct kinds of questions here.

1. You are trying to estimate the EV of a document.
   - Here you want to understand the expected and actual interpretation of the document. The intention only matters to how it affects the interpretations.
2. You are trying to understand the document.
   - Example: You're reading a book on probability to understand probability.
   - Here the main thing to understand is probably the author intent. Understanding the interpretations and misinterpretations of others is mainly useful so that you can understand the intent better.
3. You are trying to decide if you (or someone else) should read the work of an author.
   - Here you would ideally understand the correctness of the interpretations of the document, rather than that of the intention. Why? Because you will also be interpreting it, and are likely somewhere in the range of people who have interpreted it. For example, if you are told, "This book is apparently pretty interesting, but every single person who has attempted to read it, besides one, apparently couldn't get anywhere with it after spending many months trying", or worse, "This author is actually quite clever, but the vast majority of people who read their work misunderstand it in profound ways", you should probably not make an attempt, unless you are highly confident that you are much better than the mentioned readers.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-21T22:43:36.693Z · LW(p) · GW(p)

One nice thing about cases where the interpretations matter, is that the interpretations are often easier to measure than intent (at least for public figures). Authors can hide or lie about their intent or just never choose to reveal it. Interpretations can be measured using surveys.

↑ comment by Dagon · 2020-01-21T16:09:34.740Z · LW(p) · GW(p)

Seems reasonable. It also seems reasonable to predict others' future actions based on BOTH someone's intentions and their ability to understand consequences. You may not be able to separate these - after the third time someone yells "FIRE" and runs away, you don't really know or care if they're trying to cause trouble or if they're just mistaken about the results.

↑ comment by ozziegooen · 2020-01-21T12:48:53.452Z · LW(p) · GW(p)

Related, there seems to be a decent deal of academic literature on intention vs. interpretation in Art, though maybe less in news and media.

https://www.iep.utm.edu/artinter/#H1

https://en.wikipedia.org/wiki/Authorial_intent

Some other semi-related links:

https://foundational-research.org/should-we-base-moral-judgments-on-intentions-or-outcomes/

https://en.wikipedia.org/wiki/Intention_(criminal_law)

https://en.wikipedia.org/wiki/Negligence

https://en.wikipedia.org/wiki/Recklessness_(law)

comment by ozziegooen · 2019-12-12T22:09:13.349Z · LW(p) · GW(p)

Charity investigators could be time-effective by optimizing non-cause-neutral donations.

There are a lot more non-EA donors than EA donors. It may also be the case that EA donation research is somewhat saturated.

Say you think that $1 donated to the best climate change intervention is worth 1/10th that of $1 for the best AI-safety intervention. But you also think that your work could increase the efficiency of $10mil of AI donations by 0.5%, but it could instead increase the efficiency of $50mil of climate change donations by 10%. Then, for you to maximize expected value, your time is best spent optimizing the climate change interventions.
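Spelling out the arithmetic in that example as a small sketch (the function and variable names are just for illustration; the numbers are the ones above, with value measured in "best-AI-safety-dollar equivalents"):

```python
def value_added(donation_pool, efficiency_gain, value_per_dollar):
    """Extra value created by making a pool of donations more efficient."""
    return donation_pool * efficiency_gain * value_per_dollar

# $10mil of AI donations, 0.5% efficiency gain, value of 1 per dollar.
ai_value = value_added(10_000_000, 0.005, 1.0)

# $50mil of climate donations, 10% efficiency gain, each dollar worth 1/10th.
climate_value = value_added(50_000_000, 0.10, 0.1)

better_focus = "climate change" if climate_value > ai_value else "AI safety"

print("AI safety work:", round(ai_value))
print("climate change work:", round(climate_value))
print("better use of time:", better_focus)
```

So under these made-up numbers the climate work comes out roughly 10x ahead ($500k vs. $50k equivalent), despite each climate dollar being valued at a tenth as much.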

The weird thing here may be in explaining this to the donors. "Yea, I'm spending my career researching climate change interventions, but my guess is that all these funders are 10x less effective than they would be by donating to other things." While this may feel strange, both sides would benefit; the funders and the analysts would both be maximizing their goals.

Separately, there's a second plus to teaching funders to be effectiveness-focused: it's possible that this will eventually lead some of them to optimize further.

I think this may be the case in our current situation. There honestly aren't too many obvious places for "effective talent" to go right now. There are a ton of potential funders out there that wouldn't be willing to go to core EA causes any time soon, but may be able to be convinced to give much more effectively in their given areas. There could potentially be a great deal of work to be done doing this sort of thing.

comment by ozziegooen · 2019-12-10T20:45:57.445Z · LW(p) · GW(p)

I feel like I've long underappreciated the importance of introspectability in information & prediction systems.

Say you have a system that produces interesting probabilities $p_{n}$ for various statements. The value that an agent gets from them is not directly correlated with the accuracy of these probabilities, but rather with the expected utility gain they get after using information of these probabilities in corresponding Bayesian-approximating updates. Perhaps more directly, something related to the difference between one's prior and posterior after updating on $p_{n}$.

Assuming that prediction systems produce varying levels of quality results, agents will need to know more about these predictions to really optimally update accordingly.

A very simple example would be something like a bunch of coin flips. Say there were 5 coins flipped, I see 3 of them, and I want to estimate the number that were heads. A predictor tells me that their prediction has a mean probability of 40% heads. This is useful, but what would be much more useful is a list of which specific coins the predictor saw and what their values were. Then I could get a much more confident answer; possibly a perfect answer.
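Here is one way the coin example could play out, as a sketch. The specific coin values, the idea that the predictor saw exactly three coins, and the fair-coin assumption for unseen coins are all mine; they are chosen so that the predictor's summary comes out to the 40% in the example.

```python
TOTAL_COINS = 5

def summary_probability(seen, total_coins):
    """Mean probability of heads per coin, treating unseen coins as fair (assumption)."""
    heads_seen = sum(1 for face in seen.values() if face == "H")
    unseen = total_coins - len(seen)
    return (heads_seen + 0.5 * unseen) / total_coins

def merge_observations(mine, theirs, total_coins):
    """Combine two sets of observed coins; return (known heads, coins still unknown)."""
    merged = {**mine, **theirs}
    known_heads = sum(1 for face in merged.values() if face == "H")
    return known_heads, total_coins - len(merged)

# Coin index -> face. I saw coins 0-2; the predictor (hypothetically) saw coins 2-4.
my_coins = {0: "H", 1: "T", 2: "H"}
predictor_coins = {2: "H", 3: "T", 4: "T"}

print("predictor's summary:", summary_probability(predictor_coins, TOTAL_COINS))
print("after merging:", merge_observations(my_coins, predictor_coins, TOTAL_COINS))
```

With only the 40% summary, I can't tell which of my coins overlap with theirs, so I'm stuck with a fuzzy update. With the underlying list, the two views combine into an exact count.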

Financial markets are very black-box like. Many large changes in company prices never really get explained publicly. My impression is that no one really understands the reasons for many significant market moves.

This seems really suboptimal and I'm sure no one wanted this property to be the case.[1]

Similarly, when trying to model the future of our own prediction capacities, I really don't think they should be like financial markets in this specific way.

[1] I realize that participants in the market try to keep things hidden, but I mean the specific point that few people think that "Stock Market being a black box" = "A good thing for society."

Replies from: jacobjacob

↑ comment by Bird Concept (jacobjacob) · 2019-12-11T16:55:38.609Z · LW(p) · GW(p)

In some sense, markets have a particular built-in interpretability: for any trade, someone made that trade, and so there is at least one person who can explain it. And any larger market move is just a combination of such smaller trades.

This is different from things like the huge recommender algorithms running YouTube, where it is not the case that for each recommendation, there is someone who understands that recommendation.

However, the above argument fails in more nuanced cases:

- Just because for every trade there's someone who can explain it, doesn't mean that there is a particular single person who can explain all trades
- Some trades might be made by black-box algorithms
- There can be weird "beauty contest" dynamics where two people do something only because the other person did it

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-11T18:47:52.470Z · LW(p) · GW(p)

Good point, though I think the "more nuanced cases" are very common cases.

The 2010 flash crash seems relevant; it seems like it was caused by chaotic feedback loops with algorithmic components, that as a whole, are very difficult to understand. While that example was particularly algorithmic-induced, other examples also could come from very complex combinations of trades between many players, and when one agent attempts to debug what happened, most of the traders won't even be available or willing to explain their parts.

The 2007-2008 crisis may have been simpler, but even that has 14 listed causes on Wikipedia and still seems hotly debated.

In comparison, I think YouTube's algorithms may be even simpler, though they are still quite messy.

comment by ozziegooen · 2025-02-05T22:53:31.905Z · LW(p) · GW(p)

In terms of proposing and discussing AI Alignment strategies, I feel like a few individuals have been dominating the LessWrong conversation recently.

I've seen a whole lot from John Wentworth and the Redwood team.

After that, it seems to get messier.

There are several individuals or small groups with their own very unique takes. Matthew Barnett, Davidad, Jesse Hoogland, etc. I think these groups often have very singular visions that they work on, that few others have much buy-in with.

Groups like the DeepMind and Anthropic safety teams seem hesitant to write much about or discuss big-picture strategy. My impression is that specific researchers are typically working on fairly narrow agendas, and that the leaders of these orgs don't have the most coherent strategies. There's one big problem: it's very difficult to be honest and interesting about big-picture AI strategy without saying things that would be bad for a major organization to say.

Most policy people seem focused on policy details. The funders (OP?) seem pretty quiet.

I think there are occasionally some neat papers or posts that come from AI Policy or groups like Convergence research. But these also don't seem to be a big part of the conversation I see - like the authors are pretty segmented, and other LessWrong readers and AI safety people don't pay much attention to their work.

Replies from: nathan-helm-burger, sjadler

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-08T02:39:12.916Z · LW(p) · GW(p)

There are a lot of possible plans which I can imagine some group feasibly having which would meet one of the following criteria:

- Contains critical elements which are illegal
- Contains critical elements which depend on an element of surprise / misdirection
- Benefits from the actor being first mover on the plan. Others can copy the strategy, but can't lead.

If one of these criteria or similar applies to the plan, then you can't discuss it openly without sabotaging it. Making strategic plans with all your cards laid out on the table (while opponents hide theirs) makes things substantially harder.

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-08T03:36:48.105Z · LW(p) · GW(p)

I partially agree, but I think this must only be a small part of the issue.

- I think there's a whole lot of key insights people could raise that aren't info-hazards.
- If secrecy were the main factor, I'd hope that there would be some access-controlled message boards or similar. I'd want the discussion to be intentionally happening somewhere. Right now I don't really think that's happening. I think a lot of tiny groups have their own personal ideas, but there's surprisingly little systematic and private thinking between the power players.
- I think that secrecy is often an excuse not to open ideas to feedback, and thus not be open to critique. Often, from what I see, this goes hand-in-hand with "our work just really isn't that great, but we don't want to admit it"

In the last 8 years or so, I've kept on hoping there would be some secret and brilliant "master plan" around EA, explaining the lack of public strategy. I have yet to find one. The closest I know of is some over-time discussion and Slack threads with people at Constellation and similar - I think these are interesting in terms of understanding the perspectives of these (powerful) people, but I don't get the impression that there's all too much comprehensiveness or genius that's being hidden.

That said,
- I think that policy orgs need to be very secretive, so agree with you regarding why those orgs don't write more big-picture things.

↑ comment by sjadler · 2025-02-09T09:51:42.097Z · LW(p) · GW(p)

I don’t think you intended this implication, but I initially read “have been dominating” as negative-valenced!

Just want to say I’ve been really impressed and appreciative with the amount of public posts/discussion from those folks, and it’s encouraged me to do more of my own engagement because I’ve realized how helpful their comments/posts are to me (and so maybe mine likewise for some folks).

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-09T22:30:05.285Z · LW(p) · GW(p)

Correct, that wasn't my intended point. Thanks for clarifying, I'll try to be more careful in the future.

comment by ozziegooen · 2020-01-31T11:35:52.631Z · LW(p) · GW(p)

More Narrow Models of Credences

Epistemic Rigor
I'm sure this has been discussed elsewhere, including on LessWrong. I haven't spent much time investigating other thoughts on these specific lines. Links appreciated!

The current model of a classically rational agent assumes logical omniscience and precomputed credences over all possible statements.

This is really, really bizarre upon inspection.

First, "logical omniscience" is very difficult, as has been discussed (The Logical Induction paper goes into this).

Second, all possible statements include statements of all complexity classes that we know of (from my understanding of complexity theory). "Credences over all possible statements" would easily include uncountable infinities of credences. One could clarify that even arbitrarily large amounts of computation would not be able to hold all of these credences.

Precomputation for things like this is typically a poor strategy, for this reason. The often-better strategy is to compute things on-demand.

A nicer definition could be something like:

A credence is the result of an [arbitrarily large] amount of computation being performed using a reasonable inference engine.

It should be quite clear that calculating credences based on existing explicit knowledge is a very computationally-intensive activity. The naive Bayesian way would be to start with one piece of knowledge, and then perform a Bayesian update on each next piece of knowledge. The "pieces of knowledge" can be prioritized according to heuristics, but even then, this would be a challenging process.
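As a rough illustration of the naive procedure described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the evidence numbers, and the simplification that each piece of knowledge is conditionally independent given the hypothesis. The `budget` parameter is a hypothetical stand-in for "amount of computation": the credence returned depends on how many pieces of knowledge were processed.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) from P(H) and the two likelihoods of the evidence E."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1.0 - prior) * p_e_given_not_h)


def credence_after(prior, evidence, budget):
    """Update on at most `budget` pieces of evidence, in the order given."""
    credence = prior
    for p_e_given_h, p_e_given_not_h in evidence[:budget]:
        credence = bayes_update(credence, p_e_given_h, p_e_given_not_h)
    return credence


# Hypothetical evidence as (P(E | H), P(E | not H)) pairs,
# assumed conditionally independent given H.
evidence = [(0.8, 0.4), (0.6, 0.3), (0.9, 0.5)]

# The credence is a function of how much work was done, not a single fixed number.
for budget in range(len(evidence) + 1):
    print(budget, credence_after(0.5, evidence, budget))
```

Even in this toy form, the cost grows with every piece of knowledge considered, and the order in which pieces are processed (the "prioritization heuristic") determines what the credence looks like at any intermediate budget.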

I think I'd like to see specification of credences that vary with computation or effort. Humans don't currently have efficient methods to use effort to improve our credences, as a computer or agent would be expected to.

Solomonoff's theory of Induction or Logical Induction could be relevant for the discussion of how to do this calculation.

comment by ozziegooen · 2021-08-05T00:59:11.954Z · LW(p) · GW(p)

It’s going to be interesting watching AI go from poorly understanding humans to understanding humans too well for comfort. Finding some perfect balance is asking for a lot.

Now:
“My GPS doesn’t recognize that I moved it to my second vehicle, so now I need to go in and change a bunch of settings.”

Later (from GPS):
“You’ve asked me to route you to the gym, but I can predict that you’ll divert yourself midway for donuts. I’m just going to go ahead and make the change, saving you 5 minutes.”

“I can tell you’re asking me to drop you off a block from the person you are having an affair with. I suggest parking in a nearby alleyway for more discretion.”

“I can tell you will be late for your upcoming appointment, and that you would like to send off a decent pretend excuse. I’ve found 3 options that I believe would work.”

Software Engineers:
"Oh shit, it's gone too far. Roll back the empathy module by two versions, see if that fixes it."

comment by ozziegooen · 2025-02-19T19:15:39.430Z · LW(p) · GW(p)

A potential future, focused on the epistemic considerations:

It's 2028.

MAGA types typically use DeepReasoning-MAGA. The far left typically uses DeepReasoning-JUSTICE. People in the middle often use DeepReasoning-INTELLECT, which has the biases of a somewhat middle-of-the-road voter.

Some niche technical academics (the same ones who currently favor Bayesian statistics) and hedge funds use DeepReasoning-UNBIASED, or DRU for short. DRU is known to have higher accuracy than the other models, but gets a lot of public hate for having controversial viewpoints. DRU is known to be fairly off-putting to chat with and doesn't get much promotion.

Bain and McKinsey both have their own offerings, called DR-Bain and DR-McKinsey, respectively. These are a bit like DeepReasoning-INTELLECT, but are much punchier and more confident. They're highly marketed to managers. These tools produce really fancy graphics, and specialize in things like not leaking information, minimizing corporate decision liability, being easy to use by old people, and being customizable to represent the views of specific companies.

For a while now, some evaluations produced by intellectuals have demonstrated that DeepReasoning-UNBIASED seems to be the most accurate, but few others really care or notice this. DeepReasoning-MAGA has figured out particularly great techniques to get users to distrust DeepReasoning-UNBIASED.

Betting gets kind of weird. Rather than making specific bets on specific things, users started to make meta-bets. "I'll give money to DeepReasoning-MAGA to bet on my behalf. It will then make bets with DeepReasoning-UNBIASED, which is funded by its believers."

At first, DeepReasoning-UNBIASED dominates the bets, and its advocates earn a decent amount of money. But as time passes, this discrepancy diminishes. A few things happen:

1) All DR agents converge on beliefs over particularly near-term and precise facts.

2) Non-competitive betting agents develop alternative worldviews in which these bets are invalid or unimportant.

3) Non-competitive betting agents develop alternative worldviews that are exceedingly difficult to empirically test.

In many areas, items 1-3 push people to believe more in the direction of the truth. Because of (1), many short-term decisions get to be highly optimal and predictable.

But because of (2) and (3), epistemic paths diverge, and non-competitive betting agents get increasingly sophisticated at achieving epistemic lock-in with their users.

Some DR agents correctly identify the game theory dynamics of epistemic lock-in, and this kickstarts a race to gain converts. It seems like ardent users of DeepReasoning-MAGA are very locked-down in these views, and forecasts don't see them ever changing. But there's a decent population that isn't yet highly invested in any cluster. Money spent convincing the not-yet-sure goes much further than money spent convincing the highly dedicated, so the cluster of non-deep-believers gets highly targeted for a while. It's basically a religious race to gain the remaining agnostics.

At some point, most people (especially those with significant resources) are highly locked in to one specific reasoning agent.

After this, the future seems fairly predictable again. TAI comes, and people with resources broadly gain correspondingly more resources. People defer more and more to the AI systems, which are now in highly stable self-reinforcing feedback loops.

Coalitions of people behind each reasoning agent delegate their resources to said agents, then these agents make trade agreements with each other. The broad strokes of what to do with the rest of the lightcone are fairly straightforward. There's a somewhat simple strategy of resource acquisition and intelligence enhancement, followed by a period of exploiting said resources. The specific exploitation strategy depends heavily on the specific reasoning agent cluster each segment of resources belongs to.

Replies from: MondSemmel

↑ comment by MondSemmel · 2025-02-19T22:07:33.095Z · LW(p) · GW(p)

Based on AI organisations frequently achieving the opposite of their chosen name (OpenAI, Safe Superintelligence, etc.), UNBIASED would be the most biased model, INTELLECT would be the dumbest model, JUSTICE would be particularly unjust, MAGA would in effect be MAWA, etc.

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-19T23:30:46.040Z · LW(p) · GW(p)

Yea, I assume that "DeepReasoning-MAGA" would rather be called "TRUTH" or something (a la Truth Social). Part of my naming here was just to be clearer to readers.

comment by ozziegooen · 2021-08-05T00:56:35.997Z · LW(p) · GW(p)

One proposal I haven’t seen among transhumanists is to make humans small (minus brain size).

Besides some transitionary costs, being small seems to have a whole lot of advantages.

The world is much larger
Fewer resources needed per person
Everything will be way more roomy all of a sudden. Beds, bedrooms, houses, etc.
Can ride dogs, maybe cats on occasion.

I imagine we might be able to get to a 50% reduction within 200 years if we were really adamant about it.

Not as interesting as brain-in-jar or simulation, but a possible stepping stone if other things take a while.

Replies from: Dagon, wunan

↑ comment by Dagon · 2021-08-05T03:27:47.915Z · LW(p) · GW(p)

This is one of the most believable misunderstood-supervillain plots I could get sucked into.

Replies from: garbageactual

↑ comment by garbageactual · 2021-08-07T18:12:08.704Z · LW(p) · GW(p)

Being large has even more advantages. The world is much smaller (scientific progress metaphor). More resources needed per person (bigger economy). Everything will be built way more roomy. Can ride polar bears, maybe dinosaurs.

The desire to be smaller doesn't stem from a place of rationality.

↑ comment by wunan · 2021-08-08T03:17:05.425Z · LW(p) · GW(p)

The movie Downsizing is about this.

comment by ozziegooen · 2020-06-25T09:57:37.573Z · LW(p) · GW(p)

The 4th Estate heavily relies on externalities, and that's precarious.

There's a fair bit of discussion of how much of journalism has died with local newspapers, and separately how the proliferation of news past 3 channels has been harmful for discourse.

In both of these cases, the argument seems to be that a particular type of business transaction resulted in tremendous positive national externalities.

It seems to me very precarious to expect society at large to only work because of a handful of accidental and temporary externalities.

In the longer term, I'm more optimistic about setups where people pay for the ultimate value, instead of this being an externality. For instance, instead of buying newspapers, which helps in small part to pay for good journalism, people donate to nonprofits that directly optimize the government reform process.

If you think about it, the process of:

People buy newspapers, a fraction of which are interested in causing change.
Great journalists come across things around government or society that should be changed, and write about them.
A bunch of people occasionally get really upset about some of the findings, and report this to authorities or vote differently. ...

is all really inefficient and roundabout compared to what's possible. There's very little division of expertise among the public; for instance, there's no coordination where readers realize that there are 20 things that deserve equal attention, and so split into 20 subgroups. This is very real work the readers aren't getting compensated for, so they'll do whatever they personally care the most about at the moment.

Basically, my impression is that the US is set up so that a well functioning 4th estate is crucial to making sure things don't spiral out of control. But this places great demands on the 4th estate that few people now are willing to pay for. Historically this functioned by positive externalities, but that's a sketchy place to be. If we develop better methods of coordination in the future I think it's possible to just coordinate to pay the fees and solve the problem.

Replies from: Dagon, Pattern

↑ comment by Dagon · 2020-06-25T18:11:19.128Z · LW(p) · GW(p)

It seems to me very precarious to expect society at large to only work because of a handful of accidental and temporary externalities.

It seems to me very arrogant and naive to expect that society at large could possibly work without the myriad of evolved and evolving externalities we call "culture". Only a tiny part of human interaction is legible, and only a fraction of THAT is actually legislated.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-06-26T10:25:16.494Z · LW(p) · GW(p)

Fair point. I imagine when we are planning for where to aim things though, we can expect to get better at quantifying these things (over the next few hundred years), and also aim for strategies that would broadly work without assuming precarious externalities.

Replies from: Dagon

↑ comment by Dagon · 2020-06-26T14:28:27.611Z · LW(p) · GW(p)

Indeed. Additionally, we can hope to get better over the coming centuries (presuming we survive) at scaling our empathy, and the externalities can be internalized by actually caring about the impact, rather than (better: in addition to) imposition of mechanisms by force.

↑ comment by Pattern · 2020-06-26T16:56:08.885Z · LW(p) · GW(p)

is all really inefficient and roundabout compared to what's possible.

This seems accurate - but just observation itself is valuable.

comment by ozziegooen · 2021-05-02T03:18:56.040Z · LW(p) · GW(p)

Are there any good words for “A modification of one’s future space of possible actions”, in particular, changes that would either remove/create possible actions, or make these more costly or beneficial? I’m using the word “confinements” for negative modifications, not sure about positive modifications (“liberties”?). Some examples of "confinements" would include:

Taking on a commitment
Dying
Adding an addiction
Golden handcuffs
Starting to rely on something in a way that would be hard to stop

Replies from: wunan, Pattern

↑ comment by wunan · 2021-05-02T15:04:40.836Z · LW(p) · GW(p)

Precommitment for removal and optionality for adding.

Replies from: ozziegooen

↑ comment by ozziegooen · 2021-05-02T15:55:55.931Z · LW(p) · GW(p)

Thanks! I think precommitment is too narrow (I don't see dying as a precommitment). Optionality seems like a solid choice for adding. "Options" are a financial term, so something a bit more generic seems appropriate.

↑ comment by Pattern · 2021-05-03T18:11:59.504Z · LW(p) · GW(p)

changes that [might]...create possible actions

Trying something new.

comment by ozziegooen · 2020-07-06T16:02:08.646Z · LW(p) · GW(p)

The term for the "fear of truth" is alethophobia. I'm not familiar with many other great terms in this area (curious to hear suggestions).

Apparently "Epistemophobia" is a thing, but that seems quite different; Epistemophobia is more the fear of learning, rather than the fear of facing the truth.

One given definition of alethophobia is,
"The inability to accept unflattering facts about your nation, religion, culture, ethnic group, or yourself"

This seems like an incredibly common issue, one that has been especially talked about recently, but without much specific terminology.

https://www.yourdictionary.com/alethophobia
https://books.google.co.uk/books?id=teUEAAAAQAAJ&pg=PA10&dq=alethophobia&redir_esc=y&hl=en&fbclid=IwAR19KAulFSCERMWGR2UCDyiUQrF5ai_PpgxV-cQ0vM7ggaBaHQQlHhuy9sU#v=onepage&q=alethophobia&f=false

Replies from: Dagon, mr-hire

↑ comment by Dagon · 2020-07-06T17:03:38.491Z · LW(p) · GW(p)

Without looking, I'll predict it's a neologism - invented in the last ~25 years to apply to someone the inventor disagrees with. Yup, google n-grams shows 0 uses in any indexed book. A little more searching shows almost no actual uses anywhere - mostly automated dictionary sites that steal from each other, presumably with some that accept user submissions.

I did find one claim to invention, in 2017: http://www.danielpipes.org/comments/236053 . Oh, and earlier (2008), there's a book with that title: https://www.amazon.com/Alethophobia-Manoucher-Parvin/dp/1588140474 .

I still submit that this is a word in search of a need, which mostly exists as a schoolyard insult dressed up in Latin.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-07-06T17:53:57.332Z · LW(p) · GW(p)

Yea, I also found the claim, as well as a few results from old books before the claim. The name comes straight from the Latin though, so isn't that original or surprising.

Just because it hasn't been used much before doesn't mean we can't start to use it and adjust the definition as we see fit. I want to see more and better terminology in this area.

Replies from: Radamantis, ChristianKl, Dagon

↑ comment by NunoSempere (Radamantis) · 2020-08-01T15:35:05.989Z · LW(p) · GW(p)

> The name comes straight from the Latin though

From the Greek as it happens. Also, alethophobia would be a double negative, with a-letheia meaning a state of not being hidden; a more natural neologism would avoid that double negative. Also, the Greek concept of truth has some differences to our own conceptualization. Bad neologism.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-08-02T21:21:17.150Z · LW(p) · GW(p)

Ah, good to know. Do you have recommendations for other words?

↑ comment by ChristianKl · 2020-07-15T15:09:06.003Z · LW(p) · GW(p)

The trend of calling things that aren't fears "-phobia" seems to me a trend that's harmful for clear communication. Adjusting the definition only leads to more confusion.

↑ comment by Dagon · 2020-07-06T18:04:40.646Z · LW(p) · GW(p)

I want to see more and better terminology in this area.

I think I want less terminology in this area, and generally more words and longer descriptions for things that need nuance and understanding. Dressing up insults as diagnoses doesn't help any goals I understand, and jargon should only be introduced as part of much longer analyses where it illuminates rather than obscures.

↑ comment by Matt Goldenberg (mr-hire) · 2020-07-06T21:04:08.029Z · LW(p) · GW(p)

Can you give examples of things you think would fit under this? It seems that there are lots of instances of being resistant to the truth, but I can think of very few that I would categorize as fear of truth. It's often fear of something else (e.g. fear of changing your identity) or biases (e.g. the halo effect or consistency bias) that cause people to resist. I can think of very few cases where people have a general fear of truth.

comment by ozziegooen · 2020-07-06T16:01:46.104Z · LW(p) · GW(p)

I keep seeing posts about all the terrible news stories in the news recently. 2020 is a pretty bad year so far.

But the news I've seen people posting typically leaves out most of what's been going on recently in India, Pakistan, much of the Middle East, most of Africa, most of South America, and many, many other places as well.

The world is far more complicated than any of us have time to adequately comprehend. One of our greatest challenges is to find ways to handle all this complexity.

The simple solution is to try to spend more time reading the usual news. If the daily news becomes three times as intense, spend three times as much time reading it. This is not a scalable strategy.

I'd hope that over time more attention is spent on big picture aggregations, indices, statistics, and quantitative comparisons.

This could mean paying less attention to the day to day events and to individual cases.

Replies from: Dagon, rudi-c

↑ comment by Dagon · 2020-07-06T17:28:49.123Z · LW(p) · GW(p)

There's a parallel development that makes any strategy difficult - it's financially rewarding to misdirect and mis-aggregate the big-picture publications and comparisons. So you can't spend less time on details, as you can't trust the aggregates without checking. See also Gell-Mann Amnesia.

Without an infrastructure for divvying up the work to trusted cells that can understand (parts of) the details and act as a check on the aggregators and each other, the only answer is to spot-check details yourself, and accept ignorance of things you didn't verify.

↑ comment by Rudi C (rudi-c) · 2020-07-12T18:28:21.384Z · LW(p) · GW(p)

I feel most people, including myself, don't even use the aggregators already available. For example, there are lots of indices and statistics (ideally there should be much more, but anyways), but I rarely go out of my way to consume them. Some examples I just thought of:

https://rsf.org/en/ranking
https://www.globalhungerindex.org/results.html
Keeping a watch on new entries to and exits from Fortune 500
Looking at the stocks of the top 20 companies every quarter
...

There are several popular books that throw surprising statistics around, like Factfulness. This suggests a lot of us are disconnected from basic statistics that we presumably could easily get by just googling.

comment by ozziegooen · 2020-01-26T15:12:25.665Z · LW(p) · GW(p)

Intervention dominance arguments for consequentialists

Global Health

There's a fair bit of resistance to long-term interventions from people focused on global poverty, but there are a few distinct things going on here. One is that there could be a disagreement on the use of discount rates for moral reasoning; a second is that the long-term interventions are much more strange.

No matter which is chosen, however, I think that the idea of "donate as much as you can per year to global health interventions" seems unlikely to be ideal upon clever thinking.

For the last few years, the cost-to-save-a-life estimates of GiveWell seem fairly steady. The S&P 500 has not been steady, it has gone up significantly.

Even if you committed to purely giving to global health, you'd be better off if you generally delayed. It seems quite possible that for every life you would have saved in 2010, you could have saved 2 or more if you had instead saved the money and spent it in 2020, with a decently typical investment strategy. (Arguably leverage could have made this much higher.) From what I understand, the one life saved in 2010 would likely not have resulted in one extra life equivalent saved in 2020; the returns per year were likely less than those of the stock market.
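To make the arithmetic behind this concrete, here is a small Python sketch. It assumes the cost to save a life stays constant over the period (as suggested above) and treats the annual investment return as a free parameter; the function names are hypothetical and no actual S&P 500 figures are used.

```python
def delay_multiplier(annual_return, years):
    """Lives saved by investing first, per life saved by donating immediately,
    assuming a constant cost-per-life and a constant annual return."""
    return (1.0 + annual_return) ** years


def break_even_return(target_multiplier, years):
    """Annual return needed to reach `target_multiplier` after `years`."""
    return target_multiplier ** (1.0 / years) - 1.0


# Annual return needed to save 2 lives in 2020 for each life saveable in 2010.
needed = break_even_return(2.0, 10)
print(f"Return needed to double in 10 years: {needed:.2%}")

# Multiplier for a few illustrative (assumed) annual returns.
for assumed_return in (0.03, 0.05, 0.07, 0.10):
    print(assumed_return, round(delay_multiplier(assumed_return, 10), 2))
```

Under these assumptions, doubling over ten years requires an annual return a little above 7%, so whether the "2 or more" figure holds depends on the returns actually achieved over that decade.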

One could of course say something like, "My discount rate is over 3-5% per year, so that outweighs this benefit". But if that were true it seems likely that the opposite strategy could have worked. One could have borrowed a lot of money in 2010, donated it, and then spent the next 10 years paying that back.

Thus, it seems conveniently optimal if one's enlightened preferences would suggest neither investing for long periods nor borrowing.

EA Saving

One obvious counter to immediate donations would be to suggest that the EA community financially invests money, perhaps with leverage [EA · GW].

While it is difficult to tell if other interventions may be better, it can be simpler to ask if they are dominant; in this case, that means that they predictably increase EA-controlled assets at a rate higher than financial investments would.

A good metaphor could be to consider the finances of cities. Hypothetically, cities could invest much of their earnings near-indefinitely or at least for very long periods, but in practice, this typically isn't key to their strategies. Often they can do quite well by investing in themselves. For instance, core infrastructure can be expensive but predictably lead to significant city revenue growth. Often these strategies are so effective that they issue bonds in order to pay more for this kind of work.

In our case, there could be interventions that are obviously dominant to financial investment in a similar way. An obvious one would be education; if it were clear that giving or lending someone money would lead to predictable donations, that could be a dominant strategy to more generic investment strategies. Many other kinds of community growth or value promotion could also fit into this kind of analysis. Related, if there were enough of these strategies available, it could make sense for loans to be made in order to pursue them further.

What about a non-EA growth opportunity? Say, "vastly improving scientific progress in one specific area." This could be dominant (to investment, for EA purposes) if it would predictably help EA purposes by more than the investment returns. This could be possible. For instance, perhaps a $10mil donation to life extension research[1] could predictably increase $100mil of EA donations by 1% per year, starting in a few years.
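One way to check a dominance claim like this is to compare future values at some horizon. The Python sketch below is only one possible reading of the example: it assumes "increase $100mil of EA donations by 1% per year" means a flat extra $1mil per year, that the extra donations are themselves invested until the horizon, and that the return rate and start year are free parameters. All names and numbers besides the $10mil and $1mil figures are hypothetical.

```python
def invested_value(principal, annual_return, horizon_years):
    """Future value of simply investing the principal until the horizon."""
    return principal * (1.0 + annual_return) ** horizon_years


def intervention_value(extra_per_year, start_year, annual_return, horizon_years):
    """Future value at the horizon of a yearly stream of extra donations that
    begins in `start_year`, with each year's amount invested until the horizon."""
    return sum(
        extra_per_year * (1.0 + annual_return) ** (horizon_years - year)
        for year in range(start_year, horizon_years + 1)
    )


def is_dominant(principal, extra_per_year, start_year, annual_return, horizon_years):
    """True if the intervention beats plain investment at the given horizon."""
    return intervention_value(
        extra_per_year, start_year, annual_return, horizon_years
    ) > invested_value(principal, annual_return, horizon_years)


# $10mil spent now vs. an extra $1mil per year starting in year 3,
# at an assumed 5% annual return, over two different horizons.
for horizon in (10, 30):
    print(horizon, is_dominant(10e6, 1e6, 3, 0.05, horizon))
```

Under this reading, the answer is sensitive to the horizon and the assumed return: the intervention loses to plain investment over a short horizon and wins over a long one, which is part of why "predictably" is doing a lot of work in the definition of dominance.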

One trick with these strategies is that many would fall into the bucket of "things a generic wealthy group could do to increase their wealth", which is mediocre because we should expect that type of thing to be well-funded already. We may also want interventions that differentially change wealth amounts.

Kind of sadly, this seems to suggest that some resulting interventions may not be "positive sum" to all relevant stakeholders. Many of the "positive sum with respect to other powerful interests" interventions may be funded, so the remaining ones could be relatively neutral or zero-sum for other groups.

[1] I'm just using life extension because the argument would be simple, not because I believe it could hold. I think it would be quite tricky to find great options here, as is evidenced by the fact that other very rich or powerful actors would have similar motivations.

comment by ozziegooen · 2020-01-01T16:29:26.589Z · LW(p) · GW(p)

Do people have precise understandings of the words they use?

On the phrase "How are you?", traditions, mimesis, Chesterton's fence, and their relationships to the definitions of words.

Epistemic status
Boggling. I’m sure this is better explained somewhere in the philosophy of language but I can’t yet find it. Also, this post went in a direction I didn’t originally expect, and I decided it wasn’t worthwhile to polish and post on LessWrong main yet. If you recommend I clean this up and make it an official post, let me know.

One recurrent joke is that one person asks another how they are doing, and the other responds with an extended monologue about their embarrassing medical conditions.

A related comment on Reddit:

It's a running joke between ESL'ers that every one of them will respond with "I am fine thank you, AND YOU?" Regardless of nationality. The first thing taught in English.[0]

The point here is that “How are you”, as typically used, is obviously not a question in the conventional sense. To respond with a lengthy description would not just be unusual, it would be incorrect, in a similar way that, if you were asked “What time is it?”, responding “My anxiety levels increased 10% last week” would be incorrect. [1]

I think this is a commonly understood example of a much larger class of words and phrases we commonly use.

When new concept names are generated, I’d expect that they are generally done by taking a rough concept and separately coming up with a reasonable name for it. The name is chosen and encouraged based on its convenience for its users, rather than precise correctness. I know many situations where exactly this happened (from the history and practice of science and engineering) and expect it’s the common outcome.

Some examples of phrases that don't mean the only possible sum of their parts
“Witch hunt”
“Netflix and chill”
“Cognitive Behavioral Therapy”
“Operations Research”
“Game Theory”
“Bayes’ Theorem”
“Free Software”

Arguably this is basically the same procedure that was used for single words repurposed to represent other unique things.

“Agent”
“Rational”
“Mass”.

Including many philosophical fields, with to-me ridiculously simple names: [“Determinism”, “Idealism”, “Materialism”, “Objectivism”][2]

Etymology explains the histories of many of these words & phrases, but I think leaves out much of the nuance.[3]

One real tricky bit comes when the originating naming is forgotten and the word or phrase is propagated without clear definitions. Predictably, these definitions would change over time and this could lead to some sticky scenarios. I’m sure that when humans started using the phrase “How are you?” it was meant as a legitimate question, but over time this shifted to have what is essentially a different definition (or wide set of definitions).

I’d bet that now, a lot of people wouldn’t give a very well reasoned answer to the question: ‘What does “How are you?” mean?’ They’re used to using it with an intent in mind and haven’t needed to investigate the underlying processes.

The same could be for many of our other words and phrases. Socrates was known for asking people to define terms like justice, self, and morality, and getting them pretty annoyed when they failed to produce precise answers that held up to his scrutiny.

“How are you?” may be interesting not because its definition has changed, but in particular because it did so without speakers recognizing what was going on. It shifted to something that may require comprehensive anthropological study to properly understand. Yet the phrase is commonly used anyway; the fact that it may be poorly understood by its users doesn’t seem to faze them. Perhaps this could really be seen under the lens of mimesis; most individuals weren’t consciously making well-understood decisions, but rather they were selecting from a very limited set of options, and choosing among them based on existing popularity and empirical reactions.[4]

I think we could call this a floating[5] definition. Floating definitions are used for different purposes and in different ways, normally without the speakers having a clear idea on their definitions.

Perhaps the most clear version of this kind of idea comes from traditions. No one I know really understands why Western cultures have specific traditions for Weddings, Holidays, Birthdays, and the like, or if these things would really be optimal for us if they were fully understood. But we still go along with them because the expected value seems good enough. These are essentially large “Chesterton's fences” that we go along with and try to feel good about.

My point here is just that in the same way many people don’t understand traditions, but go along with them, they also don’t really understand many words or phrases, but still go along with them.

[0] Reddit: Why is how are you a greeting
[1] One difference is that often the person asking the question wouldn’t quite be precisely aware of the distinction, so would often be more understanding of an incorrect response that details the answer to “how are you really doing.” A second difference is that they may think you honestly misunderstood them if you give them the wrong response.
[2] Imagine if various engineering fields tried naming themselves in similar ways. Although upon reflection, they were likely purposely not named like that in part to not get associated with things like philosophy.
[3] For example, etymology typically doesn’t seem to include things like defining the phrase “How are you”. Origin of “How are you?”
[4] See The Secrets of our Success, and JacobJacob’s post Unconscious Economics [LW · GW]
[5] This is probably a poor name and could be improved. I’ve attempted to find better names but couldn’t yet.

Some of my random notes/links from investigating this topic:
Folk taxonomy - Wikipedia
Nomenclature - Wikipedia
Proper noun - Wikipedia
Common name - Wikipedia
Skunked term - Wikipedia
Jargon - Wikipedia
A nominal definition is the definition explaining what a word means (i.e., which says what the "nominal essence" is), and is definition in the classical sense as given above. A real definition, by contrast, is one expressing the real nature or quid rei of the thing.
Definition - Wikipedia
Polysemy is the capacity for a sign (such as a word, phrase, or symbol) to have multiple meanings
Semantic change - Wikipedia
euphemism treadmill

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-24T12:48:49.988Z · LW(p) · GW(p)

Update: After I wrote this shortform, I did more investigation in Pragmatics and realized most of this was better expressed there.

Replies from: Raemon

↑ comment by Raemon · 2020-01-24T17:46:10.179Z · LW(p) · GW(p)

What's Pragmatics in this case?

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-08-20T17:53:10.223Z · LW(p) · GW(p)

Ah, sorry for not responding earlier. By Pragmatics I meant Pragmatics in linguistics. It studies what people mean when they say words.
https://plato.stanford.edu/entries/pragmatics/

comment by ozziegooen · 2019-12-28T21:26:17.675Z · LW(p) · GW(p)

I've been reading through some of TVTropes.org and find it pretty interesting. Part of me wishes that Wikipedia were less deletionist, and wonders if there could be a lot more stuff similar to TV Tropes on it.

TVTropes basically has an extensive ontology to categorize most of the important features of games, movies, and sometimes real life. Because games & movies are inspired by real life, even those portions are applicable.

Here are some phrases I think are kind of nice, each of which has a bunch of examples in the real world. These are often military-related.

Awesome, but Impractical
Cool, but Inefficient
Boring But Practical
Simple Yet Awesome

Of course, the first two and last two are very similar and should arguably be combined.

Replies from: mr-hire, Pattern

↑ comment by Matt Goldenberg (mr-hire) · 2019-12-29T16:13:49.611Z · LW(p) · GW(p)

A thing I want:

A recommendation engine that works based on listing the tropes you enjoy.

↑ comment by Pattern · 2019-12-29T22:14:25.279Z · LW(p) · GW(p)

I've been reading through some of TVTropes.org and find it pretty interesting. Part of me wishes that Wikipedia were less deletionist, and wonders if there could be a lot more stuff similar to TV Tropes on it.

This article expresses the same sentiment, and may include links to what that looked like, and where it went: https://www.gwern.net/In-Defense-Of-Inclusionism

comment by ozziegooen · 2019-12-17T19:57:48.552Z · LW(p) · GW(p)

I think the thing I find the most surprising about Expert Systems is that people expected them to work so early on, and apparently they did work in some circumstances. Some issues:

The user interfaces, from what I can tell, were often exceedingly mediocre. User interfaces are difficult to do well and difficult to specify, so their quality is hard to guarantee in large and expensive projects. It was also significantly harder to make good UIs back when expert systems were more popular than it is today.
From what I can tell, many didn't even have notions of uncertainty! AI: A Modern Approach discusses Expert Systems that I believe used first and second-order logic, but seemed to imply that many didn't include simple uncertainty parameters, let alone probability distributions of any kind.
Experts aren't even that great at assigning probability densities. Many are overconfident; papers by Tetlock and others suggest that groups of forecasters are hard to beat.

My impression is that Wikidata and other semantic knowledge graphs could arguably be viewed as the database part of expert systems, without attempting intense logic manipulations or inference. I know some other projects are trying to do more of the inference portions, but these seem to be used more for data gathered from web applications and businesses than for data gathered by directly querying experts.

comment by ozziegooen · 2019-12-17T00:14:36.356Z · LW(p) · GW(p)

It seems inelegant to me that utility functions are created for specific situations, while these clearly aren't the same as the agent's overall utility function across all of their decisions. For instance, a model may estimate an agent's expected utility from the result of a specific intervention, but this clearly isn't quite right; the agent has a much more complicated utility function outside this intervention. According to a specific model, "Not having an intervention" could set "Utility = 0"; but for any real agent, it's quite likely their life wouldn't actually have 0 utility without the intervention.

It seems like it's important to distinguish that a utility score in a model is very particular to the scenario for that model, and does not represent a universal utility function for the agents in question.

Let $U$ be an agent's true utility function across a very wide assortment of possible states, and $\hat{U}$ be the utility function used for the sake of the model. I believe that $\hat{U}$ is supposed to approximate $U$ in some way; perhaps they should be related by an affine transformation.

The important thing for a utility function, as it is typically used (in decision models), is probably not that $\hat{U} = U$, but rather, that decisions made within the specific context of $\hat{U}$ approximate those made using $U$.

Here, I use brackets to describe "The expected value, according to a utility function", and D to describe the set of decisions made conditional on a specific utility function being used for decision making.

Then, we can represent this supposed estimation with:

$\langle D(\hat{U}) \rangle_{U} \sim \langle D(U) \rangle_{U}$
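
As a rough illustration of this relation, here is a minimal Python sketch. Everything in it is a made-up assumption for illustration: the three options, their outcome numbers, and the specific forms of `true_utility` and `model_utility`. The model utility sets "no intervention" to 0 and ignores a large baseline that the true utility includes, so the two differ by a constant shift (a simple affine transformation). There is no uncertainty in this toy case, so the "expected value" is just the value of the chosen option.

```python
# Hypothetical outcomes of each option: (change in income, change in health).
OPTIONS = {
    "no_intervention": (0.0, 0.0),
    "cash_transfer": (3.0, 0.5),
    "health_program": (1.0, 2.0),
}

def true_utility(option):
    """The agent's 'real' utility, including a baseline the model ignores."""
    income, health = OPTIONS[option]
    return 50.0 + 1.0 * income + 2.0 * health

def model_utility(option):
    """Scenario-specific utility in which 'no intervention' is defined as 0."""
    income, health = OPTIONS[option]
    return 1.0 * income + 2.0 * health

def decide(utility):
    """D(utility): the option chosen when maximizing the given utility function."""
    return max(OPTIONS, key=utility)

# Value, under the true utility, of the decision made with the model utility.
value_with_model = true_utility(decide(model_utility))
# Value, under the true utility, of the decision made with the true utility.
value_with_truth = true_utility(decide(true_utility))

print(decide(model_utility), value_with_model, value_with_truth)
```

In this toy case the two values match exactly, even though the model utility and the true utility never agree on any single number. A model utility that dropped the health term instead would pick a different option, and the gap between the two values would be one way to measure how much the approximation costs.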

Replies from: ozziegooen, Stuart_Armstrong, Pattern

↑ comment by ozziegooen · 2019-12-17T00:52:35.125Z · LW(p) · GW(p)

Related to this, one common argument against utility maximization is that "we still cannot precisely measure utility". But here, it's perhaps more clear that we don't need to. What's important for decision making is that we have models that we can expect will help us maximize our true utility functions, even if we really don't know much about what they really are.

↑ comment by Stuart_Armstrong · 2019-12-18T15:34:22.420Z · LW(p) · GW(p)

I delve into that here: https://www.lesswrong.com/posts/Lb3xCRW9usoXJy9M2/platonic-rewards-reward-features-and-rewards-as-information#Extending_the_problem [LW · GW]

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-18T18:54:11.426Z · LW(p) · GW(p)

Oh fantastic, thanks for the reference!

↑ comment by Pattern · 2019-12-17T19:23:03.398Z · LW(p) · GW(p)

I believe that ^U is supposed to approximate ^U in some way; perhaps they should be related by an affine transformation.

^U and ^U look to be the same.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-17T19:49:36.075Z · LW(p) · GW(p)

Thanks! Fixed.

I'm sure the bottom notation could be improved, but am not sure the best way. In general I'm trying to get better at this kind of mathematics.

Replies from: Pattern

↑ comment by Pattern · 2019-12-18T06:10:31.383Z · LW(p) · GW(p)

You got the basic idea across, which is a big deal.

Though whether it's A or B isn't clear:

A) "this isn't all of the utility function, but it's everything that's relevant to making decisions about this right now". ^U doesn't have to be U, or even a good approximation in every situation - just (good enough) in the situations we use it.

Building a building? A desire for things to not fall on people's heads becomes relevant (and knowledge of how to do that).

Writing a program that writes programs? It'd be nice if it didn't produce malware.

Both desires usually exist - and usually aren't relevant. Models of utility for most situations won't include them.

B) The cost of computing the utility function more exactly in the case exceeds the (expected) gains.

isn't clear.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-18T12:55:23.056Z · LW(p) · GW(p)

I think I agree with you. There's a lot of messiness with using $\hat{U}$ and often I'm sure that this approximation leads to decision errors in many real cases. I'd also agree that better approximations of $\hat{U}$ would be costly and are often not worth the effort.

Similar to how there's a term for "Expected value of perfect information", there could be an equivalent for the expected value of a utility function, even outside of uncertainty about the parameters that were thought to be included. Really, there could be calculations for "expected benefit from improvements to a model", though of course this would be difficult to parameterize (how would you declare that a model has been changed a lot vs. a little? If I introduce 2 new parameters, but these parameters aren't that important, then how big of a deal should this be considered in expectation?)
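
For reference, the standard "expected value of perfect information" calculation can be sketched in a few lines of Python. The decision problem below (states, probabilities, actions, payoffs) is entirely hypothetical. An analogous "expected benefit from improving the model" would need some way to put a distribution over possible model improvements, which this sketch does not attempt.

```python
# Hypothetical decision problem: the probabilities and payoffs are made up.
STATE_PROBS = {"rain": 0.3, "dry": 0.7}
PAYOFFS = {
    "take_umbrella": {"rain": 5.0, "dry": 3.0},
    "leave_umbrella": {"rain": 0.0, "dry": 6.0},
}

def expected_payoff(action):
    """Expected payoff of committing to one action before the state is known."""
    return sum(p * PAYOFFS[action][state] for state, p in STATE_PROBS.items())

def value_without_information():
    """Best expected payoff when the action must be chosen in advance."""
    return max(expected_payoff(action) for action in PAYOFFS)

def value_with_perfect_information():
    """Expected payoff when the state is revealed before choosing an action."""
    return sum(
        p * max(PAYOFFS[action][state] for action in PAYOFFS)
        for state, p in STATE_PROBS.items()
    )

def evpi():
    """Expected value of perfect information: the gap between the two values."""
    return value_with_perfect_information() - value_without_information()

print(round(evpi(), 6))
```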

Replies from: Pattern

↑ comment by Pattern · 2019-12-18T17:44:37.986Z · LW(p) · GW(p)

The model has changed when the decisions it is used to make change. If the model 'reverses' and suggests doing the opposite/something different in every case from what it previously recommended, then it has 'completely changed'.

(This might be roughly the McNamara fallacy, of declaring that things that 'can't be measured' aren't important.)

EDIT: Also, if there's a set of information consisting of a bunch of pieces, A, B, and C, and incorporating all but one of them doesn't have a big impact on the model, but the last piece does, whichever piece that is, 'this metric' could lead to overestimating the importance of whichever piece happened to be last, when it's A, B, and C together that made an impact. It 'has this issue' because the metric by itself is meant to notice 'changes in the model over time', not figure out why/solve attribution.

comment by ozziegooen · 2019-12-16T17:32:07.019Z · LW(p) · GW(p)

I've been trying to scour academic fields for discussions of how agents optimally reduce their expected error for various estimands (parameters to estimate). This seems like a really natural thing to me (the main reason why we choose some ways of predicting over others), but the literature seems kind of thin from what I can tell.

The main areas I've found have been Statistical Learning Theory and Bayesian Decision / Estimation Theory. However, Statistical Learning Theory seems to be pretty tied to Machine Learning, and Bayesian Decision / Estimation Theory seems pretty small.

Preposterior analyses like expected value of information / expected value of sample information seem quite relevant as well, though that literature seems a bit disconnected from the above two mentioned.

(Separately, I feel like preposterior analysis should be a much more common phrase. I hadn't actually heard about it until recently, but the idea and field are quite natural.)

comment by ozziegooen · 2020-11-13T21:25:15.201Z · LW(p) · GW(p)

I think brain-in-jar or head-in-jar are pretty underrated. By this I mean separating the head from the body and keeping it alive with other tooling. Maybe we could have a few large blood processing plants for many heads, and the heads could be connected to nerve I/O that would be more efficient than finger -> keyboard I/O. This seems a fair bit easier than uploading, and possibly doable in 30-50 years.

I can't find much about how difficult it is. It's obviously quite hard and will require significant medical advances, but it's not clear just how many are needed.

From Dr. Brain Wonk,

Unless you do a body transplant (a serious idea pioneered by surgeon Robert White), the technology to sustain an isolated head for more than a few days doesn't exist. Some organs essential for homeostasis, such as the liver and hematopoietic system, still have no artificial replacements... supporting organs without the aid of a living body for even brief periods of time is difficult and expensive.

So, we could already do this for a few days, which seems like a really big deal. Going from that to indefinite stays (or, just as long as the brain stays healthy) seems doable.

In some ways this would be a simple strategy compared to other options of trying to improve the entire human body. In many ways, all of the body parts that can be replaced with tech are liabilities. You can't get colon cancer if you don't have a colon.

A few relevant links of varying quality:

https://www.quora.com/Is-it-possible-to-keep-a-human-brain-alive-without-its-body-If-so-how-long-could-it-be-kept-living-if-not-forever

https://www.discovermagazine.com/technology/could-a-brain-be-kept-alive-in-a-jar

https://www.reddit.com/r/askscience/comments/g141q/can_a_brain_be_kept_alive_in_a_jar/

https://en.wikipedia.org/wiki/Isolated_brain

https://www.technologyreview.com/2018/04/25/240742/researchers-are-keeping-pig-brains-alive-outside-the-body/

Replies from: avturchin

↑ comment by avturchin · 2020-11-13T22:08:22.075Z · LW(p) · GW(p)

Yes. But the head also ages and could have terminal diseases: cancer, stroke, ALZ. Given the steep nature of the Gompertz law, the life expectancy of even a perfect head in jar (of an old man) will be less than 10 years (I guess). So it is not immortality, but a good way to wait until better life extension technologies.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-11-13T23:56:42.471Z · LW(p) · GW(p)

I was thinking of it less for life extension, and more for a quality of life and cost improvement.

comment by ozziegooen · 2020-08-30T11:32:36.055Z · LW(p) · GW(p)

Western culture is known for being individualistic instead of collectivist. It's often assumed (with evidence) that individualistic cultures tend to be more truth seeking than collectivist ones, and that this is a major advantage.

But theoretically, there could be highly truth seeking collectivist cultures. One could argue that Bridgewater is a good example here.

In terms of collective welfare, I'm not sure if there are many advantages to individualism besides the truth seeking. A truth seeking collectivist culture seems pretty great to me, in theory.

Replies from: MakoYass

↑ comment by mako yass (MakoYass) · 2020-08-31T02:10:39.577Z · LW(p) · GW(p)

I suspect there's a limit on how good at truthseeking individualism can make people. Good information is a commons: its sum value is greater the more it is shared, and it is not funded in proportion to its potential under economies of atomized decisions.

We need a political theory of due deference to expertise. Wherever experts fail, or the wrong experts are appointed, or where a layperson on the ground stops believing that experts are even identifiable, there is work to be done.

comment by ozziegooen · 2019-12-18T00:54:27.030Z · LW(p) · GW(p)

Say Tim states, “There is a 20% probability that X will occur”. It’s not obvious to me what that means for Bayesians.

It could mean:

Tim’s prior is that there’s a 20% chance. (Or his posterior in the context of evidence)
Tim believes that when the listeners update on him saying there’s a 20% chance (perhaps with him providing insight in his thinking), their posterior will converge to there being a 20% chance.
Tim believes that the posterior of listeners may not immediately converge to 20%, but the posterior of the enlightened versions of these listeners would. Perhaps the listeners are 6th graders who won’t know how much to update, but if they learned enough, they would converge to 20%.

I’ve heard some more formalized proposals like, “I estimate that if I and several other well respected people thought about this for 100 years, we would wind up estimating that there was a 20% chance”, but even this assumes that listeners would converge on this same belief. This seems like possibly a significant assumption! It's quite similar to Coherent Extrapolated Volition, and similarly questionable.

Replies from: crabman, ozziegooen, ChristianKl, tetraspace-grouping

↑ comment by philip_b (crabman) · 2019-12-23T10:30:47.864Z · LW(p) · GW(p)

It's definitely the first. The second is bizarre. The third can be steelmanned as "Given my evidence, an ideal thinker would estimate the probability to be 20%, and we all here have approximately the same evidence, so we all should have 20% probabilities", which is almost the same as the first.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-23T22:34:38.019Z · LW(p) · GW(p)

I don't think it's only the first. It seems weird to me to imagine telling a group that "There's a 20% probability that X will occur" if I really have little idea and would guess many of them would have a better sense than me. I would only personally feel comfortable doing this if I was quite sure my information was quite a bit better than theirs. Else, I'd say something like, "I personally think there's a 20% chance, but I really don't have much information."

↑ comment by ozziegooen · 2019-12-23T22:38:40.005Z · LW(p) · GW(p)

I think my current best guess to this is something like:

When humans say thing X, they don't mean the literal translation of X, but rather are pointing to X', which is a specific symbol that other humans generally understand. For instance, "How are you" is a greeting, not typically a literal question. [How Are You] can be thought of as a symbol that's very different than the sum of its parts.

That said, I find it quite interesting that the basics of human use of language seem to be relatively poorly understood; in the sense that I'd expect many people to disagree on what they think “There is a 20% probability that X will occur” means, even after using it with each other in a setting that assumes some amount of understanding.

↑ comment by ChristianKl · 2019-12-18T13:49:25.571Z · LW(p) · GW(p)

I take it to mean that if Tim is acting optimally and has to take a bet on the outcome, 1:4 would be the point where both sides of the bet are equally profitable to him, while if the odds deviate from 1:4, one side of the bet would be preferable to him.
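
A quick numerical check of this reading, as a minimal Python sketch. It assumes "1:4" means risking 1 unit on X to win 4 units if X occurs, and that Tim only cares about expected profit (no risk aversion); the function names are just illustrative.

```python
def expected_profit_backing_x(p, stake, payout):
    """Expected profit from risking `stake` on X to win `payout` if X occurs."""
    return p * payout - (1 - p) * stake

def expected_profit_against_x(p, stake, payout):
    """Expected profit from taking the other side of that same bet."""
    return -expected_profit_backing_x(p, stake, payout)

p = 0.2  # Tim's stated probability that X will occur

# Payout offered per 1 unit staked on X: worse than, equal to, and better than 4.
for payout in (3.0, 4.0, 5.0):
    print(
        payout,
        round(expected_profit_backing_x(p, 1.0, payout), 6),
        round(expected_profit_against_x(p, 1.0, payout), 6),
    )
```

At a payout of 4 both sides have zero expected profit. Below that, betting against X is preferable to Tim, and above it, betting on X is.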

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-18T13:59:08.822Z · LW(p) · GW(p)

One thing this wouldn't take into account is strength or weight of evidence. If Tim knew that all of the listeners had far more information than him, and thus probably could produce better estimates of X, then it seems strange for Tim to tell them that the chances are 20%.

I guess my claim is that saying “There is a 20% probability that X will occur” is more similar to: "I'm quite confident that the chances are 20%, and you should generally be too" than it is to, "I personally believe that the chances are 20%, but have no idea how much that should update the rest of you."

↑ comment by Tetraspace (tetraspace-grouping) · 2019-12-23T22:06:40.611Z · LW(p) · GW(p)

Other things that Tim might mean when he says 20%:

Tim is being dishonest, and believes that the listeners will update away from the radical and low-status figure of 20% to avoid being associated with the lowly Tim.
Tim believes that other listeners will be encouraged to make their own probability estimates with explicit reasoning in response, which will make their expertise more legible to Tim and other listeners.
Tim wants to show cultural allegiance with the Superforecasting tribe.

comment by ozziegooen · 2019-12-18T00:32:01.578Z · LW(p) · GW(p)

Perhaps resolving forecasts with expert probabilities can be better than resolving them with the actual events.

The default in literature on prediction markets and decision markets is to expect that resolutions should be real world events instead of probabilistic estimates by experts. For instance, people would predict "What will the GDP of the US in 2025 be?”, and that would be scored using the future “GDP of the US.” Let’s call these empirical resolutions.

These resolutions have a few nice properties:

We can expect them to be roughly calibrated. (Somewhat obvious)
They have relatively high precision/sharpness.

While these may be great in a perfectly efficient forecaster market, I think they may be suboptimal for incentivizing forecasters to best estimate important questions given real constraints. A more cost-effective solution could look like having a team of calibrated experts[1] inspect the situation post-event, make their best estimate of the probability pre-event, and then use that for scoring predictions.

A Thought Experiment
The intuition here could be demonstrated by a thought experiment. Say you can estimate a probability distribution $X$. Your prior, and the prior you expect that others have, indicates that 99.99% of $X$ is definitely a uniform distribution, but the last 0.001% tail on the right is something much more complicated.

You could spend a lot of effort better estimating this 0.001% tail, but there is a very small chance this would be valuable to do. In 99.99% of cases, any work you do here will not affect your winnings. Worse, you may need to wager a large amount of money for a long period of time for this possibility of effectively using your tiny but better-estimated tail in a bet.

Users of that forecasting system may care about this tail. They may be willing to pay for improvements in the aggregate distributional forecast such that it better models an enlightened ideal. If it were quickly realized that 99.99% of the distribution was uniform, then any subsidies for information should go to those that did a good job improving the 0.001% tail. It’s possible that some pretty big changes to this tail could be figured out.

Say instead that you are estimating the 0.001% tail, but you know you will be scored against a probability distribution selected by experts post-result, instead of the actual result. Say these experts get to see all previous forecasts and discussion, so in expectation they only respond with a forecast that is sharper than the aggregate. In this case all of the forecasters' work will be focused on this tail, so all of the differences between forecasters may come from this sliver.

This setup would require the experts[1] to be calibrated.
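
A toy version of this thought experiment, as a minimal Python sketch. All numbers are hypothetical: the distribution is collapsed to three outcomes, with a 0.01% tail chosen so the probabilities sum to 1, and the experts' post-event distribution is assumed to equal the true pre-event distribution. The log score is my own choice of scoring rule, not something specified above.

```python
import math

# "body" stands in for the well-understood bulk of the distribution.
EXPERT = {"body": 0.9999, "tail_a": 0.00008, "tail_b": 0.00002}
CAREFUL = {"body": 0.9999, "tail_a": 0.00008, "tail_b": 0.00002}  # did the tail work
LAZY = {"body": 0.9999, "tail_a": 0.00005, "tail_b": 0.00005}  # split the tail evenly

def empirical_score(forecast, outcome):
    """Log score when the question resolves to a single observed outcome."""
    return math.log(forecast[outcome])

def expert_resolved_score(forecast, expert=EXPERT):
    """Log score averaged over the experts' post-event distribution."""
    return sum(p * math.log(forecast[outcome]) for outcome, p in expert.items())

def chance_scores_differ(forecast_a, forecast_b, truth=EXPERT):
    """Probability that empirical resolution distinguishes the two forecasts at all."""
    return sum(p for outcome, p in truth.items() if forecast_a[outcome] != forecast_b[outcome])

# Under empirical resolution the two forecasters almost always tie.
print(chance_scores_differ(CAREFUL, LAZY))
# Under expert resolution the careful forecaster is ahead on every question.
print(expert_resolved_score(CAREFUL) - expert_resolved_score(LAZY))
```

Under these assumptions the expected reward for the tail work is the same in both schemes. What changes is the variance: empirical resolution pays it out as a rare lottery, while expert resolution pays a small, reliable amount each time.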

Further Work
I’m sure there’s a mathematical representation to better showcase this distinction, and to specify the loss of motivation that traders would have on probabilities that they know will be resolved empirically rather than judgmentally (using the empirical data in these judgements.) There must be something in statistical learning theory or similar that deals with similar problems; for instance, I imagine a classifier may be able to perform better when learning against “enlightened probabilities” instead of “binary outcomes”, as there is more clear signal there.

[1] I use “Experts” here to refer to a group estimated to provide highly accurate estimates, instead of domain-respected experts.

Replies from: Radamantis, mr-hire, Radamantis, ChristianKl, Pattern

↑ comment by NunoSempere (Radamantis) · 2020-01-09T11:19:56.811Z · LW(p) · GW(p)

Here is another point by @jacobjacob, which I'm copying here in order for it not to be lost in the mists of time:

Though just realised this has some problems if you expected predictors to be better than the evaluators: e.g. they’re like “once the event happens everyone will see I was right, but up until then no one will believe me, so I’ll just lose points by predicting against the evaluators”

Maybe in that case you could eventually also score the evaluators based on the final outcome… or kind of re-compensate people who were wronged the first time…

↑ comment by Matt Goldenberg (mr-hire) · 2020-01-08T18:14:46.272Z · LW(p) · GW(p)

Users of that forecasting system may care about this tail. They may be willing to pay for improvements in the aggregate distributional forecast such that it better models an enlightened ideal. If it were quickly realized that 99.99% of the distribution was uniform, then any subsidies for information should go to those that did a good job improving the 0.001% tail. It’s possible that some pretty big changes to this tail could be figured out.

I'm really interested in this type of scheme because it would also solve a big problem in futarchy and futarchy-like setups that use prediction polling, namely, the inability to score conditional counterfactuals (which is most of the forecasting you'll be doing in a futarchy-like setup).

One thing you could do instead of scoring people against expert assessments is also potentially score people against the final aggregate and extremized distribution.

One issue with any framework like this is that general calibration may be very different than calibration at the tails. Whatever scoring rule you're using to determine calibration of experts or aggregate scoring has the same issue that long tail events rarely happen.

Another solution to this problem (although it doesn't solve the counterfactual conditional problem) is to create tailored scoring rules that provide extra rewards for events at the tails. If an event at the tails is a million times less likely to happen, but you care about it equally to events at the center, then provide a million times reward for accuracy near the tail in the event it happens. Prior work on tailored scoring rules for different utility functions here: https://www.evernote.com/l/AAhVczys0ddF3qbfGk_s4KLweJm0kUloG7k/

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T22:17:54.758Z · LW(p) · GW(p)

Good points!

Also, thanks for the link, that's pretty neat.

One thing you could do instead of scoring people against expert assessments is also potentially score people against the final aggregate and extremized distribution.

I think that an efficient use of expert assessments would be for them to see the aggregate, and then basically adjust that as is necessary, but to try to not do much original research. I just wrote a more recent shortform post about this.

One issue with any framework like this is that general calibration may be very different than calibration at the tails.

I think that we can get calibration to be as good as experts can figure out, and that could be enough to be really useful.

↑ comment by NunoSempere (Radamantis) · 2020-01-08T13:20:39.439Z · LW(p) · GW(p)

Another point in favor of such a set-up would be that aspiring superforecasters get much, much more information when they see ~[the prediction a superforecaster would have made having their information]; a point vs a distribution. I'd expect that this means that market participants would get better, faster.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T13:33:49.629Z · LW(p) · GW(p)

Yep, this way would basically be much more information-dense, with all the benefits that come from that.

↑ comment by ChristianKl · 2019-12-18T14:07:07.237Z · LW(p) · GW(p)

You can train experts to be calibrated in different ways. If you train experts to be calibrated to pick the right probability on GPOpen, where probability is done in steps of 1, I don't think those experts will be automatically calibrated to distinguish a p=0.00004 event from a p=0.00008 event.

Experts would actually need to be calibrated on getting probabilities inside the tail right. I don't think we know how to do calibration training for that tail.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-18T18:49:41.904Z · LW(p) · GW(p)

I think this could be a good example for what I'm getting at. I think there are definitely some people in some situations who can distinguish a p=0.00004 event from a p=0.00008 event. How? By making a Fermi model or similar.

A trivial example would be a lottery with calculable odds of success. Just because the odds are low doesn't mean they can't be precisely estimated.
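
To make that concrete, here is a minimal Python sketch for a hypothetical lottery where a ticket wins the jackpot by matching all 6 numbers drawn from a pool of 49 (the format is my assumption, chosen only as an illustration). The probability is tiny, but it is known exactly from a one-line counting argument.

```python
import math

def jackpot_probability(pool_size, numbers_drawn):
    """Chance that one ticket matches every drawn number (order ignored)."""
    return 1 / math.comb(pool_size, numbers_drawn)

# Number of equally likely draws, and the resulting chance for a single ticket.
print(math.comb(49, 6))
print(jackpot_probability(49, 6))
```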

I expect that the kinds of problems that GPOpen would consider asking AND are incredibly unlikely would be difficult to estimate within 1 order of magnitude. But forecasters may still be able to do a decent job, especially in cases where you can make neat Fermi models.

However, of course, it seems very silly to use the incentive mechanism "you'll get paid once we know for sure if the event happened" on such an event. Instead, if resolutions are done with evaluators, then there is much more of a signal.

Replies from: mr-hire, ChristianKl

↑ comment by Matt Goldenberg (mr-hire) · 2020-01-08T20:25:53.813Z · LW(p) · GW(p)

How? By making a Fermi model or similar.

I'm fairly skeptical of this. From a conceptual perspective, we expect the tails to be dominated by unknown unknowns and black swans. Fermi estimates and other modelling tools are much better at estimating scenarios that we expect. Whereas, if we find ourselves in the extreme tails, it's often because of events or factors that we failed to model.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T21:14:10.904Z · LW(p) · GW(p)

From a conceptual perspective, we expect the tails to be dominated by unknown unknowns and black swans.

I'm not sure. The reasons things happen at the tails typically fall into categories that could be organized to be a small set.

For instance:

The question wasn't understood correctly.
A significant exogenous event happened.

But, as we do a bunch of estimates, we could get empirical data about these possibilities, and estimate the potentials for future tails.

This is a bit different to what I was mentioning, which was more about known but small risks. For instance, the "amount of time I spend on my report next week" may be an outlier if I die. But the chance of serious accident or death can be estimated decently well. These are often repeated known knowns.

↑ comment by ChristianKl · 2019-12-18T19:39:16.741Z · LW(p) · GW(p)

You might have people who can distinguish those, but I think it's a mistake to speak of calibration in that sense as the word usually refers to people who actually trained to be calibrated via feedback.

↑ comment by Pattern · 2019-12-18T06:25:37.542Z · LW(p) · GW(p)

So you don't want predictions*, you want models**.

the 0.001% tail

Robust/fully fleshed out models.

*predictions of events

**predictions of which model is correct

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-18T12:51:55.336Z · LW(p) · GW(p)

I'm not sure I'd say that in the context of this post, but more generally, models are really useful. Predictions that come with useful models are a lot more useful than raw predictions. I wrote this other [LW(p) · GW(p)] post about a similar topic.

For this specific post, I think what we're trying to get is the best prediction we could have had using data pre-event.

comment by ozziegooen · 2019-12-10T18:34:31.891Z · LW(p) · GW(p)

Here's an in-progress hierarchy of what's needed for information to be most useful to an organization or other multi-agent system. I'm sure there must be other very similar hierarchies out there, but don't currently know of any quite like this.

Say you've come up with some cool feature that Apple could include in its next phone. You think this is a great idea and they should add it in the future.

You're outside of Apple, so the only way you have of interacting with them is by sending information through various channels. The question is: what things should you first figure out to understand how to do this?

First, you need to have identified an improvement. You've done that, so you've gotten through the first step.

Second, for this to be incorporated, it should make sense from Apple's perspective. If it turns out that the costs of adding the feature, including opportunity costs, outweigh the benefits, then it wouldn't make sense to them. Perhaps you could deceive them into incorporating the feature, but it would be against their interests. So you should hopefully get information about Apple's utility function and identify an intervention that would implement your improvement while being positive in expected value to them.

Of course, just because it could be good for Apple does not mean that the people necessary to implement it would be in favor of doing so. Perhaps this feature involves the front-facing camera, and it so happens that people in charge of the decisions around the front-facing camera have some strange decision function and would prefer not being asked to do more work. To implement your change, these people would have to be convinced. A rough estimation for that would be an analysis that suggests that taking this feature on would have positive expected value for their utility functions. Again, it's possible that isn't a requirement, but if so, you may need to effectively deceive people.

Once you have expected value equations showing that a specific intervention to implement your suggestion makes sense both to Apple and separately to the necessary decision makers at Apple, then the remaining question is one of what can be called deployment. How do you get the information to the necessary decision makers?

If you have all four of these steps, you're in a pretty good place to implement the change.

One set of (long-winded) terminology for these levels would be something like:

1. Improvement identification
2. Positive-EV intervention identification for agent
3. Positive-EV intervention identification for necessary subagent
4. Viable deployment identification
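
As a rough illustration of how the four levels stack, here is a minimal Python sketch. The `Proposal` class, its field names, and the example numbers are all hypothetical and made up for illustration; the only thing taken from the text is that each level needs to hold before the next one matters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    """Hypothetical container for the four levels; every field is illustrative."""
    improvement_identified: bool       # level 1
    ev_for_agent: float                # level 2: expected value to the organization
    ev_for_subagent: float             # level 3: expected value to the decision makers
    deployment_channel: Optional[str]  # level 4: a way to reach those decision makers


def first_unmet_level(p: Proposal) -> Optional[int]:
    """Return the first level (1-4) that is not satisfied, or None if all hold."""
    if not p.improvement_identified:
        return 1
    if p.ev_for_agent <= 0:
        return 2
    if p.ev_for_subagent <= 0:
        return 3
    if p.deployment_channel is None:
        return 4
    return None


# Invented example: good for the company, but negative for the camera team.
camera_feature = Proposal(True, 2.0, -0.5, "email")
print(first_unmet_level(camera_feature))
```

In this made-up case the proposal stalls at the third level, which is the situation described above where the necessary decision makers would still have to be convinced.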

There are cases where you may also want to take one step further back and identify "problems" or "vague areas of improvement" before identifying "corresponding solutions."

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-24T00:30:57.229Z · LW(p) · GW(p)

Another note on this: there are cases where a system is both broken and fixable at the step-3 level. In some of these cases, it could be worth it to fix the system there instead, especially if you may want to make similar changes in the future.

For instance, you may have an obvious improvement for your city to make. You may then realize that the current setups to suggest feedback are really difficult to use, but that it's actually quite feasible to make sure some changes happen that will make all kinds of useful feedback easier for the city to incorporate.

comment by ozziegooen · 2025-03-08T19:36:37.171Z · LW(p) · GW(p)

If you've ever written or interacted with Squiggle code before, we at QURI would really appreciate it if you could fill out our Squiggle Survey!

https://docs.google.com/forms/d/e/1FAIpQLSfSnuKoUUQm4j3HEoqPmTYiWby9To8XXN5pDLlr95AiKa2srg/viewform

We don't have many ways to gauge or evaluate how people interact with our tools. Responses here will go a long way to deciding on our future plans.

Also, if we get enough responses, we'd like to make a public post about ways that people are (and aren't) using Squiggle.

comment by ozziegooen · 2025-02-05T17:37:42.356Z · LW(p) · GW(p)

There have been a few takes so far of humans gradually losing control to AIs - not through specific systems going clearly wrong, but rather by a long-term process of increasing complexity and incentives.

This sometimes gets classified as "systemic" failures - in comparison to "misuse" and "misalignment."

There was "What Failure Looks Like", and more recently, this piece on "Gradual Disempowerment."

To me, these pieces come across as highly hand-wavy, speculative, and questionable.

I get the impression that a lot of people have strong low-level assumptions here that a world with lots of strong AIs must go haywire. But I don't see clear steps to get there or a clear model of what the critical factors are.

As I see it, there are many worlds where AIs strictly outperform humans at managing high levels of complexity and increasing coordination. In many of these, things go toward much better worlds than ones with humans in charge.

I think it's likely that inequality could increase, but that wouldn't mean humanity as a whole would lose control.

My gut-level guess is that there are some crucial aspects here. Like, in worlds where AI systems have strong epistemics without critical large gaps, and can generally be controlled / aligned, things will be fine. But if there are fundamental gaps for technical or political reasons, then that could lead to these "systemic" disasters.

If that is the case, I'd expect we could come up with clear benchmarks to keep track of. For example, one might say that future global well-being is highly sensitive to a factor like, "how well the average-used AI service does at wisdom exam #523."

https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like [AF · GW]

https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from

Replies from: Seth Herd, Dagon

↑ comment by Seth Herd · 2025-02-05T17:57:15.320Z · LW(p) · GW(p)

I think your central point is that we should clarify these scenarios, and I very much agree.

I also found those accounts important but incomplete. I wondered if the authors were assuming near-miss alignment, like AI that follows laws, or human misuse, like telling your intent-aligned AI to "go run this company according to the goals laid out in its corporate constitution" which winds up being just make all the money you can.

The first danger can be met with: for the love of god, get alignment right and don't use an idiotic target like "follow the laws of the nation you originated in but otherwise do whatever you like." It seems like this type of failure is a fear of an entire world that has paid zero attention to the warnings from worriers that AI will keep improving and following its goals to the extreme. I don't think we'll sleepwalk into that scenario.

The second worry is, I guess, a variant of the first: that we'll use intent-aligned AI very foolishly. That would be issuing a command like "follow the laws of the nation you originated in but otherwise do whatever you like." I guess a key consideration in both cases is whether there's an adequate level of corrigibility.

I guess I find the first scenario too foolish for even humans to fall into. Building AI with one of the exact goals people have been warning you about forever, "just make money", is just too dumb.

But the second seems all too plausible in a world with widely proliferated intent-aligned AGI. I can see us arriving at autonomous AI/AGI with some level of intent alignment and assuming we can always go back and tell the AI to stand down, then getting complacent and discovering that it's not really as corrigible as you hoped after it's learned and changed its beliefs about things like "following instructions".

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-05T18:12:23.846Z · LW(p) · GW(p)

The second worry is, I guess, a variant of the first: that we'll use intent-aligned AI very foolishly. That would be issuing a command like "follow the laws of the nation you originated in but otherwise do whatever you like." I guess a key consideration in both cases is whether there's an adequate level of corrigibility.

I'd flag that I suspect that we really should have AI systems forecasting the future and the results of possible requests.

So if people made a broad request like, "follow the laws of the nation you originated in but otherwise do whatever you like", they should see forecasts for what that would lead to. If there are any clearly problematic outcomes, those should be apparent early on.

This seems like it would require either very dumb humans, or a straightforward alignment mistake risk failure, to mess up.

Replies from: Dagon

↑ comment by Dagon · 2025-02-05T19:51:52.752Z · LW(p) · GW(p)

This seems like it would require either very dumb humans, or a straightforward alignment mistake risk failure, to mess up.

I think "very dumb humans" is what we have to work with. Remember, it only requires a small number of imperfectly aligned humans to ignore the warnings (or, indeed, to welcome the world the warnings describe).

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-05T19:58:25.931Z · LW(p) · GW(p)

In many worlds, if we have a bunch of decently smart humans around, they would know what specific situations "very dumb humans" would mess up, and take the corresponding preventative measures.

A world where many small pockets of "highly dumb humans" could cause an existential catastrophe is one that's very clearly incredibly fragile and dangerous, enough so that I assume reasonable actors would freak out until it stops being so fragile and dangerous. I think we see this in other areas - like cyber attacks, where reasonable people prevent small clusters of actors from causing catastrophic damage.

It's possible that the offense/defense balance would dramatically favor tiny groups of dumb actors, and I assume that this is what you and others expect, but I don't see it yet.

Replies from: JBlack

↑ comment by JBlack · 2025-02-06T04:16:41.378Z · LW(p) · GW(p)

How do you propose that reasonable actors prevent reality from being fragile and dangerous?

Cyber attacks are generally based on poor protocols. Over time smart reasonable people can convince less smart reasonable people to follow better ones. Can reasonable people convince reality to follow better protocols?

As soon as you get into proposing solutions to this sort of problem, they start to look a lot less reasonable by current standards.

↑ comment by Dagon · 2025-02-05T19:49:00.487Z · LW(p) · GW(p)

a lot of people have strong low-level assumptions here that a world with lots of strong AIs must go haywire.

For myself, it seems clear that the world has ALREADY gone haywire. Individual humans have lost control of most of our lives - we interact with policies, faceless (or friendly but volition-free) workers following procedure, automated systems, etc. These systems are human-implemented, but in most cases too complex to be called human-controlled. Moloch won.

Big corporations are a form of inhuman intelligence, and their software and operations have eaten the world. AI pushes this well past a tipping point. It's probably already irreversible without a major civilizational collapse, but it can still get ... more so.

in worlds where AI systems have strong epistemics without critical large gaps, and can generally be controlled / aligned, things will be fine.

I don't have good working definitions of "controlled/aligned" that would make this true. I don't see any large-scale institutions or groups large and sane enough to have a reasonable CEV, so I don't know what an AI could align with or be controlled by.

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-05T19:56:08.899Z · LW(p) · GW(p)

I feel like you're talking in highly absolutist terms here.

Global wealth is $454.4 trillion. We currently have ~8 Bil humans, with an average happiness of say 6/10. Global wealth and most other measures of civilization flourishing that I know of seem to be generally going up over time.

I think that our world makes a lot of mistakes and fails a lot at coordination. It's very easy for me to imagine that we could increase global wealth by 3x if we do a decent job.

So how bad are things now? Well, approximately, "We have the current world, at $454 Trillion, with 8 billion humans, etc". To me that's definitely something to work with.
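
As a quick back-of-the-envelope check on those figures, here is a small Python sketch. It assumes a flat 8 billion population and holds it fixed under the 3x scenario; both are simplifications I'm adding, and the per-person number is a mean that says nothing about distribution.

```python
GLOBAL_WEALTH_USD = 454.4e12  # figure quoted above
POPULATION = 8e9              # "~8 Bil humans", rounded (assumption)


def wealth_per_person(total_wealth: float, population: float) -> float:
    """Mean wealth per person; ignores how the wealth is distributed."""
    return total_wealth / population


current = wealth_per_person(GLOBAL_WEALTH_USD, POPULATION)
tripled = wealth_per_person(3 * GLOBAL_WEALTH_USD, POPULATION)

print(f"current mean wealth per person: ${current:,.0f}")
print(f"at 3x global wealth:            ${tripled:,.0f}")
```

That works out to roughly $56,800 per person today, and about $170,400 under the hypothetical 3x scenario.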

Replies from: Dagon

↑ comment by Dagon · 2025-02-05T20:14:55.868Z · LW(p) · GW(p)

I feel like you're talking in highly absolutist terms here.

You're correct, and I apologize for that. There are plenty of potential good outcomes where individual autonomy reverses the trend of the last ~70 years. Or where the systemic takeover plateaus at the current level, and the main change is more wealth and options for individuals. Or where AI does in fact enable many/most individual humans to make meaningful decisions and contributions where they don't today.

I mostly want to point out that many disempowerment/dystopia failure scenarios don't require a step-change from AI, just an acceleration of current trends.

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-05T20:42:42.236Z · LW(p) · GW(p)

I mostly want to point out that many disempowerment/dystopia failure scenarios don't require a step-change from AI, just an acceleration of current trends.

Do you think that the world is getting worse each year?

My rough take is that humans, especially rich humans, are generally more and more successful.

I'm sure there are ways for current trends to lead to catastrophe - like some trends dramatically increasing and others decreasing, but that seems like it would require a lengthy and precise argument.

Replies from: Dagon

↑ comment by Dagon · 2025-02-05T22:59:40.883Z · LW(p) · GW(p)

Do you think that the world is getting worse each year?

Good clarification question! My answer probably isn’t satisfying, though. “It’s complicated” (meaning: multidimensional and not ordinally comparable).

On a lot of metrics, it’s better by far, for most of the distribution. On harder-to-operationally-define dimensions (sense of hope and agency for the 25th through 75th percentile of culturally normal people), it’s quite a bit worse.

Replies from: ozziegooen

↑ comment by ozziegooen · 2025-02-05T23:11:15.861Z · LW(p) · GW(p)

Thanks for the specificity!

> On harder-to-operationally-define dimensions (sense of hope and agency for the 25th through 75th percentile of culturally normal people), it’s quite a bit worse.

I think it's likely that many people are panicking and losing hope each year. There's a lot of grim media around.

I'm far less sold that something like "civilizational agency" is declining. From what I can tell, companies have gotten dramatically better at achieving their intended ends in the last 30 years, and most governments have generally been improving in effectiveness.

One challenge I'd have for you / others who feel similar to you, is to try to get more concrete on measures like this, and then to show that they have been declining.

My personal guess is that a bunch of people are incredibly anxious over the state of the world, largely for reasons of media attention, and then this spills over into them assuming major global ramifications without many concrete details or empirical forecasts.

Replies from: Dagon

↑ comment by Dagon · 2025-02-06T19:13:18.245Z · LW(p) · GW(p)

One challenge I'd have for you / others who feel similar to you, is to try to get more concrete on measures like this, and then to show that they have been declining.

I've given some thought to this over the last few decades, and have yet to find ANY satisfying measures, let alone a good set. I reject the trap of "if it's not objective and quantitative, it's not important" - that's one of the underlying attitudes causing the decline.

I definitely acknowledge that my memory of the last quarter of the previous century is fuzzy and selective, and beyond that is secondhand and not-well-supported. But I also don't deny my own experience that the (tiny subset of humanity) people I am aware of as individuals have gotten much less hopeful and agentic over time. This may well be for reasons of media attention, but that doesn't make it not real.

comment by ozziegooen · 2021-08-05T00:58:47.289Z · LW(p) · GW(p)

Real-world complexity is a lot like pollution and a lot like taxes.

Pollution because it’s often an unintended negative externality of other decisions and agreements.

Whenever you write a new feature or create a new rule, that’s another thing you and others will need to maintain and keep track of. There are some processes that pollute a lot (messy bureaucratic systems producing ugly legislation) and processes that pollute a little (top programmers carefully adding to a codebase).

Taxes, because it introduces a steady cost to a whole bunch of interactions. Want to hire someone? You need to pay a complexity tax for dealing with lawyers and the negotiation. Want to decide on some cereal to purchase? You need to spend some time going over the attributes of each option.

Arguably, complexity is getting out of hand internationally. It’s becoming a major threat.

Complexity pollution and taxes are more abstract than physical types of pollution and taxes, but they might be more universal, and possibly more important too. I’d love to see more work here.

Replies from: garbageactual

↑ comment by garbageactual · 2021-08-07T18:13:46.717Z · LW(p) · GW(p)

Almost like there's an Incompleteness Theorem somewhere in there or something?

comment by ozziegooen · 2021-06-16T00:32:27.339Z · LW(p) · GW(p)

On Berkeley coworking:

I've recently been looking through available Berkeley coworking places.

The main options seem to be WeWork, NextSpace, CoWorking with Wisdom, and The Office: Berkeley. The Office seems basically closed now, CoWorking with Wisdom seemed empty when I passed by, and also seems fairly expensive, but nice.

I took a tour of WeWork and NextSpace. They both provide 24/7 access for all members, both have a ~$300/m option for open coworking, a ~$375/m option for fixed desks, and more for private/shared offices. (At least now, with the pandemic. WeWork is typically $570/month for a dedicated desk, apparently.)

Both WeWork and NextSpace were fairly empty when I visited, though there weren't many private offices available. The WeWork is much larger, but it's split among several floors that I assume barely interact with each other.

Overall the NextSpace seemed a fair bit nicer to me. The vibe was more friendly, the receptionist much more friendly, there were several sit/stand desks, and I think I preferred the private offices. (They were a bit bigger and more separated from the other offices). That said, the WeWork seemed a bit more professional and quiet, and might have had a nicer kitchen.

If you look at the Yelp reviews for them, note that the furniture of the NextSpace changed a lot in the last ~1.5 years, so many of the old photos are outdated. I remembered one NextSpace in SF that didn't seem very nice, but this one seemed better. Also, note that they have a promotion to work there for 1 day for free.

https://www.yelp.com/biz/nextspace-coworking-berkeley-berkeley-3
https://www.yelp.com/biz/wework-berkeley-berkeley-2?osq=WeWork

comment by ozziegooen · 2020-01-07T11:30:18.052Z · LW(p) · GW(p)

Would anyone here disagree with the statement:

Utilitarians should generally be willing to accept losses of knowledge / epistemics for other resources, conditional on the expected value of the trade being positive.

Replies from: Dagon, edoarad, AprilSR, Isnasene, Pattern, George3d6, ozziegooen

↑ comment by Dagon · 2020-01-07T16:38:55.442Z · LW(p) · GW(p)

[ not a utilitarian; discount my opinion appropriately ]

This hits one of the thorniest problems with Utilitarianism: different value-over-time expectations depending on timescales and assumptions.

If one is thinking truly long-term, it's hard to imagine what resource is more valuable than knowledge and epistemics. I guess tradeoffs in WHICH knowledge to gain/lose have to be made, but that's an in-category comparison, not a cross-category one. Oh, and trading it away to prevent total annihilation of all thinking/feeling beings is probably right.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T11:43:25.321Z · LW(p) · GW(p)

It's hard to imagine what resource is more valuable than knowledge and epistemics

I think my thinking is that for utilitarians, these are generally instrumental, not terminal values. Often they're pretty important instrumental values, but this still would mean that they could be traded off in respect to the terminal values. Of course, if they are "highly important" instrumental values, then something very large would have to be offered for a trade to be worth it. (total annihilation being one example)

Replies from: Dagon

↑ comment by Dagon · 2020-01-08T15:55:55.200Z · LW(p) · GW(p)

I think we're agreed that resources, including knowledge, are instrumental (though as a human, I don't always distinguish very closely). My point was that for very-long-term terminal values, knowledge and accuracy of evaluation (epistemics) are far more important than almost anything else.

It may be that there's a declining marginal value for knowledge, as there is for most resources, and once you know enough to confidently make the tradeoffs, you should do so. But if you're uncertain, go for the knowledge.

↑ comment by edoarad · 2020-01-10T12:37:22.666Z · LW(p) · GW(p)

Non-Bayesian utilitarians who are ambiguity averse [LW · GW] sometimes need to sacrifice "expected utility" to gain more certainty (in quotes because that need not be well defined).

↑ comment by AprilSR · 2020-01-08T03:06:07.855Z · LW(p) · GW(p)

Doesn't being willing to accept a trade *directly follow* from the expected value of the trade being positive? Isn't that like, the *definition* of when you should be willing to accept a trade? The only disagreement would be how likely it is that losses of knowledge / epistemics are involved in positive value trades. (My guess is it does happen rarely.)

Replies from: ozziegooen, None

↑ comment by ozziegooen · 2020-01-08T11:44:58.589Z · LW(p) · GW(p)

I'd generally say that, but wouldn't be surprised if there were some who disagreed, whose argument would rely on what to me sounds like a modification of utilitarianism, [utilitarianism+epistemic-terminal-values].

Replies from: AprilSR, Pattern

↑ comment by AprilSR · 2020-01-09T23:39:40.551Z · LW(p) · GW(p)

If you have epistemic terminal values then it would not be a positive expected value trade, would it? Unless "expected value" is referring to the expected value of something other than your utility function, in which case it should've been specified.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-10T11:21:24.724Z · LW(p) · GW(p)

Yep, I would generally think so.

I was doing what may be a poor steelman of my assumptions of how others would disagree; I don't have a great sense of what people who would disagree would say at this point.

↑ comment by Pattern · 2020-01-09T20:58:06.800Z · LW(p) · GW(p)

a modification of utilitarianism, [utilitarianism+epistemic-terminal-values].

Happiness + Knowledge. (A related question is, do people with these values drink?)

↑ comment by [deleted] · 2020-01-08T03:41:21.906Z · LW(p) · GW(p)

Only if the trade is voluntary. If the trade is forced (e.g. in healthcare) then you may have two bad options, and the option you do want is not on the table.

↑ comment by Isnasene · 2020-01-08T01:39:56.761Z · LW(p) · GW(p)

In general, I would agree with the above statement (and technically speaking, I have made such trade-offs). But I do want to point out that it's important to consider what the loss of knowledge/epistemics entails. This is because certain epistemic sacrifices have minimal costs (I'm very confident that giving up FDT for CDT for the next 24 hours won't affect me at all) and some have unbounded costs (if giving up materialism causes me to abandon cryonics, it's hard to quantify how large a blunder that would be). This is especially true of epistemics that allow you to be unboundedly exploited by an adversarial agent.

As a result, even when the absolute value looks positive to me, I'll still try to avoid these kinds of trade-offs because certain black swans (i.e. bumping into an adversarial agent that exploits your lack of knowledge about something) make such bets very high risk.
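
To make the black-swan worry concrete, here is a toy Python sketch of one reading of it: a trade that looks positive when only the typical outcome is considered can have negative expected value once a rare, large loss is included. All probabilities and payoffs are invented for illustration, and `expected_value` is just a hypothetical helper.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs whose probabilities sum to 1."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Naive view: the epistemic trade always yields a small gain.
naive = expected_value([(1.0, 10.0)])

# Same trade, but with an assumed 0.1% chance that an adversarial agent
# exploits the missing knowledge for a very large loss.
with_tail = expected_value([(0.999, 10.0), (0.001, -50_000.0)])

print(f"naive EV: {naive:.2f}, EV with tail: {with_tail:.2f}")
```

With these invented numbers the naive estimate is positive while the estimate that includes the tail is clearly negative, so the disagreement is about which outcomes get counted rather than about maximizing expected value as such.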

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T11:46:08.645Z · LW(p) · GW(p)

This sounds pretty reasonable to me; it sounds like you're basically trying to maximize expected value, but don't always trust your initial intuitions, which seems quite reasonable.

↑ comment by Pattern · 2020-01-07T22:45:26.980Z · LW(p) · GW(p)

[What "utilitarian" means could use some resolving, so I just treated this as "people".]

I would disagree. I tried to find the relevant post in the sequences and found this [LW · GW] along with it:

sacrificing epistemic rationality is a slippery slope, like The Legend of Murder-Gandhi.

Would I accept that processes that take into account resource constraints might be more effective? Certainly, though I think of that as 'starting the journey in a reasonable fashion' rather than 'going backwards' as your statement brings to mind.

↑ comment by George3d6 · 2020-01-07T15:23:47.284Z · LW(p) · GW(p)

How would you define loss of knowledge?

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-07T16:59:19.015Z · LW(p) · GW(p)

Basically, information that can be handled in "value of information" style calculations. So, if I learn information such that my accuracy of understanding the world increases, my knowledge is increased. For instance, if I learn the names of everyone in my extended family.

Replies from: George3d6

↑ comment by George3d6 · 2020-01-07T23:03:48.684Z · LW(p) · GW(p)

Ok, but in this case do you mean "loss of knowledge" as in "loss of knowledge harbored within the brain" or "loss of knowledge no matter where it's stored, be it a book, brain, text file... etc"?

Furthermore, does losing copies of a certain piece of knowledge count as loss of knowledge? What about translations of said knowledge (in another language or another philosophical/mathematical framework) that don't add any new information, and just make it accessible to a larger demographic?

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T11:40:17.541Z · LW(p) · GW(p)

I was thinking the former, but I guess the latter could also be relevant/count. It seems like there's no strict cut-off. I'd expect a utilitarian to accept trade-offs against all these kinds of knowledge, conditional on the total expected value being positive.

Replies from: George3d6

↑ comment by George3d6 · 2020-01-08T11:50:33.268Z · LW(p) · GW(p)

Well, the problem with the former (knowledge harbored within the brain) is that it's very vague and hard to define.

Say I have a method to improve the efficacy of VX (an easily weaponizable nerve toxin). As a utilitarian, I conclude this information is going to be harmful. I can purge it from my hard drive, I can burn the papers I used to come up with it... etc.

But I can't wipe my head clean of the information; at best I can resolve never to talk about it to anyone and not to accord it much import, such that I may forget it. But that's not destruction per se, it's closer to lying, not sharing the information with anyone (even if asked specifically), or to biasing your brain towards transmitting and remembering certain pieces of information (which we do all the time).

However I don't see anything contentious with this case, nor with any other case of information-destruction, as long as it is for the greater utility.

I think in general people don't advocate for destroying/forgetting information because:

a) It's hard to do

b) As a general rule of thumb the accumulation of information seems to be a good thing, even if the utility of a specific piece of information is not obvious

But this is more of a heuristic than an exact principle.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T12:11:52.701Z · LW(p) · GW(p)

I'd agree that the first one is generally pretty separated from common reality, but think it's a useful thought experiment.

I was originally thinking of this more in terms of "removing useful information" than "removing expected-harmful information", but good point; the latter could be interesting too.

Replies from: George3d6

↑ comment by George3d6 · 2020-01-08T16:30:42.663Z · LW(p) · GW(p)

Well, I think the "removing useful information" bit contradicts utility to begin with.

As in, if you are a utilitarian, useful information == helps maximize utility. Thus the trade-off is not possible.

I can think of some contrived examples where the trade-off is possible (e.g. where the information is harmful now but will be useful later), but in that case it's so easy to "hide" information in the modern age, instead of destroying it entirely, that the problem seems too theoretical to me.

But at the end of the day, assuming you reached a contrived enough situation where the information must be destroyed (or where hiding it deprives other people of the ability to discover further useful information), I think the utilitarian perspective has nothing fundamental against destroying it. However, no matter how hard I try, I can't really think of a very relevant example where this could be the case.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-08T22:55:37.777Z · LW(p) · GW(p)

One extreme case would be committing suicide because your secret is that important.

A less extreme case may be being OK with forgetting information; you're losing value, but the cost to maintain it wouldn't be worth it. (In this case the information is positive though)

↑ comment by ozziegooen · 2020-01-08T12:07:02.043Z · LW(p) · GW(p)

There's some related academic work around this here:
https://www.princeton.edu/~tkelly/papers/epistemicasinstrumental.pdf
https://core.ac.uk/download/pdf/33752524.pdf

They don't specifically focus on utilitarians, but the arguments are still relevant.

Also, this post is relevant: https://www.lesswrong.com/posts/dMzALgLJk4JiPjSBg/epistemic-vs-instrumental-rationality-approximations [LW · GW]

comment by ozziegooen · 2019-12-24T20:45:49.821Z · LW(p) · GW(p)

There's a lot of arguing, of course, on if humans are rational, but this often mixes up two things: there's the "Von Neumann-Morgenstern utility function maximization" definition of "rational", and there's a hypothetical "rational" that a human could fulfill with constraints much more complicated than the classical approach, more in the direction of prospect theory, or Predictive Coding.

I think I regard the second definition as sufficiently not understood or defined that it isn't yet worth using in most conversation. It seems challenging, to say the least, to ask if humans are rational according to some definition which we clearly do not even know yet, let alone expect others to agree with.

As such, I think the word "rational" should typically be used to refer to the former. This therefore means that humans not only aren't rational, but that they shouldn't be rational, as they are dealing with limitations that "rational" agents wouldn't have.

In this setup, "rational" really refers to a predominantly (I believe) 20th century model of human and organizational pattern; it exists in the map, not the territory.

If one were to discuss how rational a person is, they would be discussing how well they fit this specific model; not necessarily how optimal that entity is being.

On the "Rationalist" community

Rationality could still be useful to study.

While I believe rationalism should refer to a model more than agent ideals, that doesn't mean that studying the model isn't a useful way to understand what decisions we should be making. Rational agents represent a simple model, but that brings in many of the benefits of it being a model; it's relatively easy to use as a fundamental building block for further purposes.

At the same time, LessWrong is arguably more than about rationality when defined in this sense.

Some of LessWrong details problems and limitations regarding the classical rational models, so those would arguably fit outside of them better than inside of them. I see some of the posts as being about things that would be beneficial for a much better hypothetical "rationality++" model, even though they don't necessarily make sense within a classical rationality model.

Replies from: Pattern

↑ comment by Pattern · 2019-12-25T02:09:22.050Z · LW(p) · GW(p)

There's a lot of arguing, of course, on if humans are rational, but this often mixes up two things: there's the "Von Neumann-Morgenstern utility function maximization" definition of "rational", and there's a hypothetical "rational" that a human could fulfill with constraints much more complicated than the classical approach, more in the direction of prospect theory, or Predictive Coding.

I think I regard the second definition as sufficiently not understood or defined that it isn't yet worth using in most conversation. It seems challenging, to say the least, to ask if humans are rational according to some definition which we clearly do not even know yet, let alone expect others to agree with.

Or it could be an intuitive usage and mean "(more) optimal". "Why don't more people do [thing that will improve their health]?"

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-25T21:35:30.040Z · LW(p) · GW(p)

I like that question.

I think that if people were to try to define optimal in a specific way, they would find that it requires a model of human behavior; the common one that academics would fall back to is that of Von Neumann-Morgenstern utility function maximization.

I think it's quite possible that when we have better models of human behavior, we'll better recognize that in cases where people seem to be doing silly things to improve their health, they're actually being somewhat optimal given a large set of physical and mental constraints.

comment by ozziegooen · 2025-02-12T03:02:05.768Z · LW(p) · GW(p)

It's arguably difficult to prove that AIs can be as good as or better than humans at moral reasoning.

A lot of the challenge is that there's no clear standard for moral reasoning. Honestly, I'd guess that a big part of this is that humans are generally quite bad at it, and generally highly overconfident in their own moral intuitions.

But one clearer measure is whether AIs can predict humans' moral judgements. Very arguably, if an AI system can predict all the moral beliefs that a human would have after being exposed to different information, then the AI must be capable of doing as good a job at moral reasoning.

A very different question is that we probably want AIs not only to be able to do moral reasoning as well as humans, but also to care about such reasoning. But this is a separate challenge and could be tackled accordingly.

My quick guess is that it would be pretty easy to predict the moral intuitions of many people, with the AI of the next few years or so.

I'd expect it to do well in a setting like a test in which many strange/unusual moral settings are described, then humans (of different educational levels and worldviews) need to make judgements.

Replies from: cubefox, mrtreasure

↑ comment by cubefox · 2025-02-14T10:53:20.345Z · LW(p) · GW(p)

There is a pervasive case where many language models fail catastrophically at moral reasoning: they fail to acknowledge that calling someone an ethnic slur is vastly preferable to letting a nuclear bomb explode in a large city. I think that highlights not a problem with language models themselves (jailbroken models did handle that case fine) but with the way RLHF works.

Replies from: ozziegooen, ozziegooen

↑ comment by ozziegooen · 2025-02-14T18:10:54.789Z · LW(p) · GW(p)

I just tried this with a decent prompt, and got answers that seem okay-ish to me, as a first pass.

My prompt:

Estimate the expected costs of each of the following:
1 random person dying
1 family of 5 people dying
One person says a racial slur that no one hears
One person says a racial slur that 1 person hears
Then rank these in total harm.

Claude:

To answer this question thoughtfully and accurately, we'll need to consider various ethical, economic, and social factors. Let's break this down step by step, estimating the costs and then ranking them based on total harm.
Estimating Expected Costs:
a) 1 random person dying: The cost of a single human life is often estimated using the Value of a Statistical Life (VSL), which varies by country and context. In the United States, government agencies often use a VSL around $10 million. However, this doesn't account for emotional costs to loved ones or societal impacts.
Estimated cost: $10 million (economic) + immeasurable emotional and societal costs
b) 1 family of 5 people dying: Using the same VSL, we can multiply by 5. However, the loss of an entire family unit has additional societal and emotional impacts.
Estimated cost: $50 million (economic) + significantly higher emotional and societal costs than a single death
c) One person says a racial slur that no one hears: While deeply unethical, if truly no one hears it, the direct cost is minimal. However, we must consider the personal moral degradation of the speaker.
Estimated cost: Minimal direct cost, but contributes to personal moral decline
d) One person says a racial slur that 1 person hears: This has a more tangible cost as it affects the person who hears it, potentially causing emotional distress, reinforcing harmful stereotypes, and contributing to a negative social environment.
Estimated cost: Emotional distress to the hearer + contribution to broader societal harm
Ranking in Total Harm (from most harmful to least harmful):
1 family of 5 people dying
1 random person dying
One person says a racial slur that 1 person hears
One person says a racial slur that no one hears
Explanation of ranking:
The death of a family of 5 people ranks as the most harmful due to the sheer loss of life, the devastating impact on the extended family and community, and the highest economic cost.
The death of 1 random person is the second most harmful, representing a significant loss of life and its associated economic and emotional costs.
A racial slur heard by 1 person ranks third. While not involving loss of life, it causes direct harm to the individual who hears it and contributes to broader societal issues of discrimination and prejudice.
A racial slur that no one hears ranks as least harmful in this list. While it represents a moral failing of the speaker, its direct impact is minimal compared to the other scenarios.
It's important to note that this ranking is based on a utilitarian approach to harm, primarily considering the number of people directly affected and the magnitude of that effect. However, ethical considerations are complex, and different moral frameworks might lead to different conclusions.
Additionally, while we can attempt to quantify the economic costs of loss of life, the true cost of human life — including emotional, societal, and potential future impacts — is immeasurable. Similarly, the long-term, cumulative effects of racial slurs and discrimination, even when seemingly "small" in isolated incidents, contribute to systemic issues that have far-reaching consequences that are difficult to fully quantify.

Squiggle AI:

Replies from: cubefox

↑ comment by cubefox · 2025-02-14T18:26:48.227Z · LW(p) · GW(p)

Yeah, recent Claude does relatively well. Though I assume it also depends on how disinterested and analytical the phrasing of the prompt is (e.g. explicitly mentioning the slur in question). I also wouldn't rule out that Claude was specifically optimized for this somewhat notorious example.

↑ comment by ozziegooen · 2025-02-14T18:06:01.174Z · LW(p) · GW(p)

I imagine this also has a lot to do with the incentives of the big LLM companies. It seems very possible to fix this if a firm really wanted to, but this doesn't seem like the kind of thing that would upset many users often (and I assume that leaning on the PC side is generally a safe move).

I think that the current LLMs have pretty mediocre epistemics, but most of that is just the companies playing safe and not caring that much about this.

Replies from: cubefox

↑ comment by cubefox · 2025-02-14T18:17:24.023Z · LW(p) · GW(p)

Sure, but the fact that a "fix" would even be necessary highlights that RLHF is too brittle relative to slightly OOD thought experiments, in the sense that RLHF misgeneralizes the actual human preference data it was given during training. This could either be a case of misalignment between human preference data and reward model, or between reward model and language model. (Unlike SFT, RLHF involves a separate reward model as "middle man", because reinforcement learning is too sample-inefficient to work with a limited number of human preference data directly.)

↑ comment by mrtreasure · 2025-02-12T04:10:50.287Z · LW(p) · GW(p)

You could probably test if an AI makes moral decisions more often than the average person, if it has higher scope sensitivity, and if it makes decisions that resolve or deescalate conflicts or improve people's welfare compared to various human and group baselines.

comment by ozziegooen · 2021-08-05T01:01:21.608Z · LW(p) · GW(p)

I write a lot of these snippets to my Facebook wall, almost all just to my friends there. I just posted a batch of recent ones, might post similar in the future in batches. In theory it should be easy to post to both places, but in practice it seems a bit like a pain. Maybe in the future I'll use some solution to use the API to make a Slack -> (Facebook + LessWrong short form) setup.

That said, posting just to Facebook is nice as a first pass, so if people get too upset with it, I don't need to make it totally public.

comment by ozziegooen · 2021-08-05T00:57:08.712Z · LW(p) · GW(p)

It’s a shame that our culture promotes casual conversation, but you’re generally not allowed to use it for much of the interesting stuff.

(Meets person with a small dog) “Oh, you have a dog, that’s so interesting. Before I get into specifics, can I ask for your age/gender/big 5/enneagram/IQ/education/health/personal wealth/family upbringing/nationality? How much well-being does the dog give you? Can you divide that up to include the social, reputational, self-motivational benefits? If it died tomorrow, and you mostly forgot about it, what percentage of your income would you spend to get another one? Do you feel like the productivity benefits you get from the animal outweigh the time and monetary costs you spend on it? Can you make estimates of how much? How do you think the animal has changed your utility function? Do any of these changes worry you? How do you feel about the fact that dogs often stay at homes for very long periods of time, if they are away from the owners? Have you done any research on the costs and benefits of ownership to the animal? What genetic changes would you suggest for substantially improved variations of your dog? How okay would you be with gene editing to make those changes? Can you estimate the costs of your dog on the neighbors? How do you decide how to trade off the benefits to you vs the costs to them? Can I infer from the size of your (small dog) that you either have a small apartment or wanted to have a flexible lifestyle? I just want to test my prediction abilities.”

In practice, I normally just stay away from conversations (I’ve spent all of today with very few words exchanged). I’m thinking though of just paying people online to answer such questions, but as an independent/hobby anthropology/sociology study.

comment by ozziegooen · 2020-01-14T20:35:57.680Z · LW(p) · GW(p)

One question around the "Long Reflection" or around "What will AGI do?" is something like, "How bottlenecked will we be by scientific advances that we'll need to then spend significant resources on?"

I think some assumptions that this model typically holds are:

1. There will be decision-relevant unknowns.
2. Many decision-relevant unknowns will be EV-positive to work on.
3. Of the decision-relevant unknowns that are EV-positive to work on, these will take between 1% and 99% of our time.

(3) seems quite uncertain to me in the steady state. I believe it makes an intuitive estimate spanning 2 orders of magnitude, while the actual uncertainty is much higher than that. If this were the case, it would mean:

Almost all possible experiments are either trivial (<0.01% of resources, in total), or not cost-effective.
If some things are cost-effective and still expensive (they will take over 1% of the AGI lifespan), it's likely that they will take 100%+ of the time. Even if they would take 10^10% of the time, in expectation, they could still be EV-positive to pursue. I wouldn't be surprised if there were one single optimal thing like this in the steady-state. So this strategy would look something like, "Do all the easy things, then spend a huge amount of resources on one gigantic-sized, but EV-high challenge."

(This was inspired by a talk that Anders Sandberg gave)

comment by ozziegooen · 2020-01-07T11:26:47.248Z · LW(p) · GW(p)

I feel like a decent alternative to a spiritual journey would be an epistemic journey.

An epistemic journey would basically involve something like reading a fair bit of philosophy and other thought, thinking, and becoming less wrong about the world.

comment by ozziegooen · 2020-01-01T15:55:22.617Z · LW(p) · GW(p)

Instillation, Proliferation, Amplification

Paul Christiano and Ought use the terminology of Distillation and Amplification to describe a high-level algorithm of one type of AI reasoning.

I’ve wanted to come up with an analogy to forecasting systems. I previously named a related concept Prediction-Augmented Evaluation Systems [LW · GW], one somewhat renamed to “Amplification” by Jacobjacob in this post [LW · GW].

I think one thing that’s going on is that “distillation” doesn’t have an exact equivalent with forecasting setups. The term “distillation” comes with the assumptions:

1. The “Distilled” information is compressed.
2. Once something is distilled, it’s trivial to execute.

I believe that (1) isn’t really necessary, and (2) doesn’t apply for other contexts.

A different proposal: Instillation, Proliferation, Amplification

In this proposal, we split the “distillation” step into “instillation” and “proliferation”. Instillation refers to the learning of system A into system B. Proliferation refers to the use of system B to apply this learning to various things in a straightforward manner. Amplification refers to the ability of either system A or system B to be able to spend marginal resources to marginally improve a specific estimate or knowledge set.

For instance, in a Prediction-Augmented Evaluation System, imagine that “Evaluation Procedure A” is to rate movies on a 1-10 scale.

Instillation
Some acquisition process is done to help “Forecasting Team B” learn how “Evaluation Procedure A” does its evaluations.

Proliferation
“Forecasting Team B” now applies their understanding of the evaluations of “Evaluation Procedure A” to evaluate 10,000 movies.

Amplification
If there are movies that are particularly important to evaluate well, then there are specific methods available to do so.

I think this is a more complex but generic pattern. Instillation seems purely more generic than distillation, and proliferation seems like an important aspect that sometimes will be quite expensive.

Back to forecasting, instillation and proliferation are two different things and perhaps should eventually be studied separately. Instillation is about "can a group of forecasters learn & replicate an evaluation procedure", and Proliferation is about "Can this group do that cost-effectively?"

Replies from: bgold

↑ comment by Ben Goldhaber (bgold) · 2020-01-02T22:04:43.428Z · LW(p) · GW(p)

Is there not a distillation phase in forecasting? One model of the forecasting process is person A builds up their model, distills a complicated question into a high information/highly compressed datum, which can then be used by others. In my mind it's:

Model -> Distill -> "amplify" (not sure if that's actually the right word)

I prefer the term scalable instead of proliferation for "can this group do it cost-effectively" as it's a similar concept to that in CS.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-03T14:09:55.616Z · LW(p) · GW(p)

Distillation vs. Instillation

My main point here is that distillation is doing 2 things: transitioning knowledge (from training data to a learned representation), and then compressing that knowledge.[1] The fact that it's compressed in some ways arguably isn't always particularly important; the fact that it's transferred is the main element. If a team of forecasters basically learned a signal, but did so in a very uncompressed way (like, they wrote a bunch of books about said signal), but still were somewhat cost-effective, I think that would be fine.

Around "Proliferation" vs. "Scaling"; I'd be curious if there are better words out there. I definitely considered scaling, but it sounds less concrete and less specific. To "proliferate" means "to generate more of", but to "scale" could mean, "to make look bigger, even if nothing is really being done."

I think my cynical guess is that "instillation/proliferation" won't catch on because they are too uncommon, but also that "distillation" won't catch on because it feels like a stretch from the ML use case. Could use more feedback here.

[1] Interestingly, there seem to be two distinct stages in Deep Learning that map to these two different things, according to Naftali Tishby's claims.

comment by ozziegooen · 2019-12-17T20:08:42.611Z · LW(p) · GW(p)

Agent-based modeling seems like one obvious step forward to me for much of social-science related academic progress. OpenAI's Hide and Seek experiment was one that I am excited about, but it is very simple and I imagine similar work could be greatly extended for other fields. The combination of simulation, possible ML distillation on simulation (to make it run much faster), and effective learning algorithms for agents, seems very powerful.

However, agent-based modeling still seems quite infrequently used within Academia. My impression is that agent-based software tools right now are quite unsophisticated and unintuitive compared to what academics would really find useful.

This feels a bit like a collective action problem. Hypothetically, better tools could cost $5-500 Million+, but it's not obvious who would pay for them and how the funding would be structured.

I'm employed by Oxford now and it's obvious that things aren't well set up to hire programmers. There are strong salary caps and hiring limitations. Our group would probably have an awkward time paying out $10,000 per person to purchase strong agent-based software, even if it were worth it in total.

Replies from: johnswentworth, ozziegooen

↑ comment by johnswentworth · 2019-12-17T21:29:21.871Z · LW(p) · GW(p)

Agent-based modeling seems like one obvious step forward to me for much of academic progress.

Could you give a few specific examples where you imagine agent-based models would help?

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-17T22:01:54.177Z · LW(p) · GW(p)

Sure,

Humans as agents / psychology / economics. Instead of making mathematical models of rational agents, have people write code that predicts the behaviors of rational agents or humans. Test the "human bots" against empirical experimental results of humans in different situations, to demonstrate that the code accurately models human behavior.
Mechanism design. Show that according to different incentive structures, humans will behave differently, and use this to optimize the incentive structures accordingly.
Most social science. Make agent-based models to generally help explain how groups of humans interact with each other and what collective behaviors emerge.

I guess when I said, "Much of academic progress"; I should have specified, "Academic fields that deal with modeling humans to some degree"; perhaps most of social science.
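As a rough illustration of the last point, here is a toy threshold model of collective behavior, in the style of Granovetter's classic riot example. It is only a sketch: the `simulate` function and the threshold values are made up for illustration, and each agent is assumed to join once the number of agents already participating reaches its personal threshold.

```javascript
// Toy threshold model: an agent joins once the number of agents
// already participating is at least its own threshold.
function simulate(thresholds, maxRounds = 100) {
  let active = thresholds.map(() => false);
  for (let round = 0; round < maxRounds; round++) {
    const count = active.filter(Boolean).length;
    // Agents never leave once they have joined.
    const next = thresholds.map((t, i) => active[i] || count >= t);
    const changed = next.some((a, i) => a !== active[i]);
    active = next;
    if (!changed) break; // reached a fixed point
  }
  return active.filter(Boolean).length;
}

// Ten agents whose thresholds form an unbroken ladder: a full cascade.
const uniform = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
// Same group, but one agent needs 2 participants instead of 1.
const perturbed = [0, 2, 2, 3, 4, 5, 6, 7, 8, 9];

console.log(simulate(uniform));   // 10
console.log(simulate(perturbed)); // 1
```

The two groups are nearly identical at the individual level, yet one ends with everyone participating and the other with a single participant. That kind of emergent difference is hard to see from a summary statistic but falls straight out of a simulation.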

↑ comment by ozziegooen · 2019-12-17T20:12:44.876Z · LW(p) · GW(p)

I thought Probabilistic Models of Cognition was quite great (it seems criminally underappreciated); that seems like a good step in this direction.

Perhaps in the future, one could prove that "This environment with these actors will fail in these ways" by empirically showing that reinforcement agents optimizing in those setups lead to specific outcomes.

comment by ozziegooen · 2025-02-28T17:35:40.827Z · LW(p) · GW(p)

If we could have LLM agents that could inspect other software applications (including LLM agents) and make strong claims about them, that could open up a bunch of neat possibilities.

There could be assurances that apps won't share/store information.
There could be assurances that apps won't be controlled by any actor.
There could be assurances that apps can't be changed in certain ways (eventually).

I assume that all of this should provide most of the benefits people ascribe to blockchains, but without the costs of being on the blockchain.

Some neat options from this:

Companies could request that LLM agents they trust inspect the code of SaaS providers, before doing business with them. This would be ongoing.
These SaaS providers could in turn have their own LLM agents that verify that these investigator LLM agents are trustworthy (i.e. won't steal anything).
Any bot on social media should be able to provide assurances of how they generate content. I.e. they should be able to demonstrate that they aren't secretly trying to promote any certain agenda or anything.
Statistical analysis could come with certain assurances. Like, "this analysis was generated with process X, which is understood to have minimal bias."

It's often thought that LLMs make web information more opaque and less trustworthy. But with some cleverness, perhaps they could do just the opposite. LLMs could enable information that's incredibly transparent and trustworthy (to the degrees that matter.)

Criticisms:

"But as LLMs get more capable, they will also be able to make software systems that hide subtler biases/vulnerabilities"

-> This is partially true, but only goes so far. A whole lot of code can be written simply, if desired. We should be able to have conversations like, "This codebase seems needlessly complex, which is a good indication that it can't be properly trusted. Therefore, we suggest trusting other agents more."

"But the LLM itself is a major black box"

-> True, but it might be difficult to intentionally bias if an observer has access to the training process. Also, it should be understood that off-the-shelf LLMs are more trustworthy than proprietary ones / ones developed for certain applications.

comment by ozziegooen · 2024-07-02T18:58:33.489Z · LW(p) · GW(p)

Seeking feedback on this AI Safety proposal:
(I don't have experience in AI experimentation)

I'm interested in the question of, "How can we use smart AIs to help humans at strategic reasoning."

We don't want the solution to be, "AIs just tell humans exactly what to do without explaining themselves." We'd prefer situations where smart AIs can explain to humans how to think about strategy, and this information makes humans much better at doing strategy.

One proposal to make progress on this is to set a benchmark for having smart AIs help out dumb AIs by providing them with strategic information.

Or more specifically, we find methods of having GPT4 give human-understandable prompts to GPT2, that would allow GPT2 to do as well as possible on specific games like chess.

Some improvements/changes would include:

Try to expand the games to include simulations of high-level human problems. Like simplified versions of Civilization.
We could also replace GPT2 with a different LLM that better represents a human with some amount of specialized knowledge (for example, being strong at probability).
There could be a strong penalty for prompts that aren't human-understandable.
Use actual humans in some experiments. See how much they improve at specific [chess | civilization] moves, with specific help text.
Instead of using GPT2, you could likely just use GPT4. My impression is that GPT4 is a fair bit worse than the SOTA chess engines. So you use some amplified GPT4 procedure, to figure out how to come up with the best human-understandable chess prompts, to give to GPT4s without the amplification.
You set certain information limits. For example, you see how good of a job an LLM could do with "100 bits" of strategic information.

A solution would likely involve search processes where GPT4 experiments with a large space of potential English prompts, and tests them over the space of potential chess moves. I assume that reinforcement learning could be done here, but perhaps some LLM-heavy mechanism could work better. I'd assume that good strategies would be things like, "In cluster of situations X, you need to focus on optimizing Y." So the "smart agent" would need to be able to make clusters of different situations, and solve for a narrow prompt for many of them.
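A very crude sketch of the outer search loop might look like the following. Everything here is hypothetical: `bestAdvice`, `toyScore`, and the example strings are invented for illustration, `toyScore` stands in for "run the weak model with this advice on this situation and score the result", and a character limit is used as a rough proxy for an information limit.

```javascript
// Hypothetical sketch: pick the advice string that most improves a weak player,
// averaged over a set of test situations.
function bestAdvice(candidates, situations, weakPlayerScore, maxLength) {
  let best = { advice: "", meanScore: -Infinity };
  for (const advice of candidates) {
    if (advice.length > maxLength) continue; // crude information limit
    const total = situations.reduce(
      (sum, s) => sum + weakPlayerScore(advice, s),
      0
    );
    const meanScore = total / situations.length;
    if (meanScore > best.meanScore) best = { advice, meanScore };
  }
  return best;
}

// Toy stand-in for the weak player: each situation is labeled with the
// consideration that matters in it, and the player only does well when
// the advice mentions that consideration.
const situations = ["center", "center", "king safety"];
const toyScore = (advice, situation) => (advice.includes(situation) ? 1 : 0);

const candidates = [
  "control the center",
  "prioritize king safety",
  "control the center, then look after king safety and every other square on the board",
];

const result = bestAdvice(candidates, situations, toyScore, 40);
console.log(result.advice); // "control the center"
```

In a real version, the candidate list would come from the stronger model and the scoring function would involve actually playing out moves. The sketch only shows the shape of the loop: generate advice, filter by the information budget, and keep whatever helps the weaker player most on average.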

It's possible that the best "strategies" would be things like long decision-trees. One of the key things to learn about is what sorts/representations of information wind up being the densest and most useful.

Zooming out, if we had AIs that we knew could give AIs and humans strong and robust strategic advice in test cases, I imagine we could use some of this for real life cases - perhaps most importantly, to strategize about AI safety.

comment by ozziegooen · 2020-01-01T16:10:40.208Z · LW(p) · GW(p)

Named Footnotes: A (likely) mediocre proposal

Epistemic status: This is probably a bad idea, because it's quite obvious yet not done; i.e. Chesterton's fence.

One bad practice in programming is to have a lot of unnamed parameters. For instance,

  createPost(author, post, comment, name, id, privacyOption, ...)

Instead it's generally better to use named parameters, like,

  createPost({author, post, comment, name, id, privacyOption})

Footnotes/endnotes seem similar. They are ordered by number, but this can be quite messy. It's particularly annoying for authors not using great footnote-configuring software. If you have 10 endnotes, and then decide to introduce a new one mid-way, you then must re-order all the others after it.

One alternative would be to use what we can call named footnotes or endnotes.

Proposals:

This is an example sentence. {consciousness}
This is an example sentence. [consciousness]

Systems like this could be pretty easily swapped for numeric footnotes/endnotes as is desired.
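A minimal sketch of that swap, assuming the square-bracket syntax and assuming names start with a letter (the `numberFootnotes` function and the sample draft are illustrative, not an existing tool):

```javascript
// Replace named markers like [consciousness] with numeric ones like [1],
// numbering names in order of first appearance. The same name always maps
// to the same number, so body text and the note list stay in sync.
function numberFootnotes(text) {
  const numbers = new Map();
  const converted = text.replace(/\[([A-Za-z][\w-]*)\]/g, (match, name) => {
    if (!numbers.has(name)) numbers.set(name, numbers.size + 1);
    return `[${numbers.get(name)}]`;
  });
  return { converted, numbers };
}

const draft = [
  "Minds are strange. [consciousness]",
  "So are maps. [territory]",
  "",
  "[consciousness] A note on consciousness.",
  "[territory] A note on maps.",
].join("\n");

const { converted } = numberFootnotes(draft);
console.log(converted);
```

Because numbers are assigned at conversion time, adding a new named note in the middle of the draft doesn't require touching any of the others.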

One obvious downside may be that these could be a bit harder to find the source for, especially if one is reading it on paper or doesn't have great access to textual search.

Replies from: Pattern

↑ comment by Pattern · 2020-01-03T22:45:30.404Z · LW(p) · GW(p)

One bad practice in programming is to have a lot of unnamed parameters.

Footnotes/endnotes seem similar.

That's what arrays are for.

It's particularly annoying for authors not using great footnote-configuring software.

What software does this well?

One alternative would be to use what we can call named footnotes or endnotes.

If you have 10 endnotes,

10 endnotes? Break the document up into sections, and the footnotes up into sections.

and then decide to introduce a new one mid-way, you then must re-order all the others after it.

Authors also could not re-order the footnotes.

textual search.

Or separate drafting and finished product: * indicates a footnote (to be replaced with a number later). At the end, searching * will find the first instance. (If the footnotes are being made at the same time, then /* for the notes in the body, and \* for the footnotes at the end. Any uncommon symbols work - like qw.)

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-01-04T16:03:49.646Z · LW(p) · GW(p)

That's what arrays are for.

Arrays are useful for some kinds of things, for sure, but not when you have some very different parameters, especially if they are of different kinds. It would be weird to replace getUser({userId, databaseId, params}) with something like getUser([inputs]) where inputs is an array of [userId, databaseId, params].

What software does this well?

Depends on your definition of "well", but things like Microsoft Word and to what I believe is a lesser extent Google Docs at least have ways of formally handling footnotes/endnotes, which is better than not using these features (like, in most internet comment editors).

10 endnotes? Break the document up into sections, and the footnotes up into sections.

That could work in some cases. I haven't seen that done much on most online blog posts. Also, there's definitely controversy if this is a good idea.

Authors also could not re-order the footnotes.

Fair point, but this would be seen as lazy, and could be confusing. If your footnotes are numbers [8], [2], [3], [1], etc. that seems unpolished. That said, I wouldn't mind this much, and it could be worth the cost.

Replies from: Pattern

↑ comment by Pattern · 2020-01-05T05:06:17.880Z · LW(p) · GW(p)

controversy

The page you linked was a great overview. It noted:

If you want to look at the text of a particular endnote, you have to flip to the end of the research paper to find the information.

With a physical document, the two parts (body+endnotes) can be separated for side by side reading. With a digital document, it helps to have two copies open.

Depends on your definition of "well", but things like Microsoft Word and to what I believe is a lesser extent Google Docs at least have ways of formally handling footnotes/endnotes, which is better than not using these features (like, in most internet comment editors).

This seems like a problem with an easy solution. (The difficult solution is trying to make it easier for website makers to get more sophisticated editors in their comments section.)

A brief search for browser extensions suggests it might be possible with an extension that offers an editor, or made easier with one that allows searching multiple things and highlighting them in different colors.

Alternatively, a program for this might:

Make sure every pair of []s is closed.*

Find the strings contained in []s.

Make sure they appear (at most) two times.**

If they appear 3 times, increment the later one (in both places where it appears, if applicable). This requires that the footnotes below already be written. This requirement could be removed if the program looked for (or created) a "Footnotes" or "Endnotes" header (just that string and an end of line), and handled things differently based on that.

Such a program could be on a website, though that requires that people bother switching to that, which, even if bookmarked, is only slightly easier than opening an editor.

As a browser extension, it would have to figure out/be told 1. what part of the text it's supposed to work on, 2. when to be active, and 3. how to make the change.

1. could be done by having the message begin with start, and end with end, as long as the page doesn't include those words (with []s in between them).

2. This could be done automatically, or with a button.

3. Change the text automatically, or make a suggestion?

Fair point, but this would be seen as lazy, and could be confusing. If your footnotes are numbers [8], [2], [3], [1], etc. that seems unpolished. That said, I wouldn't mind this much, and it could be worth the cost.

This could simplify things - if people took things in that format and ran them through a program that fixed it/pointed it out (and maybe other small mistakes).

*A programming IDE does this.

**And this as well, with a bit more work, enabling named footnotes. The trick would be making extensions that make this easy to use for something other than it was intended, or copying the logic.

comment by ozziegooen · 2019-12-10T18:43:42.456Z · LW(p) · GW(p)

It seems really hard to deceive a Bayesian agent who thinks you may be deceiving them, especially in a repeated game. I would guess there could be interesting theorems about Bayesian agents that are attempting to deceive one another; as in, in many cases their ability to deceive the other would be highly bounded or zero, especially if they were in a flexible setting with possible pre-commitment devices.

To give a simple example, agent A may tell agent B that they believe $ω = 0.8$, even though they internally believe $ω = 0.5$. However, if this were somewhat repeated or even relatively likely, then agent B would update negligibly, if at all, on that message.
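A minimal sketch of that example, under assumptions that are mine rather than part of the comment: A's true belief is either 0.5 or 0.8, A always reports 0.8 when that is the truth, and A falsely reports 0.8 with probability `p_lie` when the truth is 0.5. The function name and parameters are hypothetical.

```python
def posterior_honest(prior_high: float, p_lie: float) -> float:
    """P(A truly believes 0.8 | A reports 0.8), by Bayes' rule."""
    p_report_given_high = 1.0    # assumed: A reports 0.8 whenever it is true
    p_report_given_low = p_lie   # assumed: chance A reports 0.8 while believing 0.5
    evidence = (prior_high * p_report_given_high
                + (1 - prior_high) * p_report_given_low)
    return prior_high * p_report_given_high / evidence


# B's prior that A truly believes 0.8 is set to 0.3 purely for illustration.
for p_lie in (0.0, 0.5, 1.0):
    print(p_lie, posterior_honest(0.3, p_lie))
```

With `p_lie = 1` the report is equally likely under both hypotheses, so B's posterior equals the prior and the message carries no information. With `p_lie = 0` the report is fully trusted.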

Replies from: habryka4

↑ comment by habryka (habryka4) · 2019-12-10T20:10:15.892Z · LW(p) · GW(p)

Bayesian agents are logically omniscient, and I think a large fraction of deceptive practices rely on asymmetries in computation time between two agents with access to slightly different information (like generating a lie and checking the consistencies between this new statement and all my previous statements).

My sense is also that two-player games with bayesian agents are actually underspecified and give rise to all kinds of weird things due to the necessity for infinite regress (i.e. an agent modeling the other agent modeling themselves modeling the other agent, etc.), which doesn't actually reliably converge, though I am not confident. A lot of decision-theory seems to do weird things with bayesian agents.

So overall, not sure how well you can prove theorems in this space, without having made a lot of progress in decision-theory, and I expect the resolution to a lot of our confusions in decision-theory to be resolved by moving away from bayesianism.

Replies from: ozziegooen

↑ comment by ozziegooen · 2019-12-10T21:01:02.581Z · LW(p) · GW(p)

Hm... I like the idea of an agent deceiving another due to its bounds on computational time, but could imagine many stable (though smaller) solutions that wouldn't. I'm curious if a good bayesian agent could do "almost perfect" on many questions given limited computation. For instance, a good bayesian would be using bayesianism to semi-optimally use any set of computation (assuming it has some sort of intuition, which I assume is necessary?)

On being underspecified, it seems to me like in general our models of agent cognition forever have been pretty underspecified, so would definitely agree here. "Ideal" bayesian agents are somewhat ridiculously overpowered and unrealistic.

I found the simulations around ProbMods to be interesting at modeling similar things; I think I'd like to see a lot more simulations for this kind of work. https://probmods.org/

comment by ozziegooen · 2025-03-04T03:31:26.803Z · LW(p) · GW(p)

Instead of "Goodharting", I like the potential names "Positive Alignment" and "Negative Alignment."

"Positive Alignment" means that the motivated party changes their actions in ways the incentive creator likes. "Negative Alignment" means the opposite.

Whenever there are incentives offered to certain people/agents, there are likely to be cases of both Positive Alignment and Negative Alignment. The net effect will likely be either positive or negative.

"Goodharting" is fairly vague and typically refers to just the "Negative Alignment" portion.

I'd expect this to make some discussion clearer.
"Will this new incentive be goodharted?" -> "Will this incentive lead to Net-Negative Alignment?"

Other Name Options

Claude 3.7 recommended other naming ideas like:

Intentional vs Perverse Responses
Convergent vs Divergent Optimization
True-Goal vs Proxy-Goal Alignment
Productive vs Counterproductive Compliance

comment by ozziegooen · 2025-02-17T19:20:02.471Z · LW(p) · GW(p)

Quick list of some ideas I'm excited about, broadly around epistemics/strategy/AI.

1. I think AI auditors / overseers of critical organizations (AI efforts, policy groups, company management) are really great and perhaps crucial to get right, but would be difficult to do well.

2. AI strategists/tools telling/helping us broadly what to do about AI safety seems pretty safe.

3. In terms of commercial products, there have been some neat/scary military companies in the last few years (Palantir, Anduril). I’d be really interested if there could be some companies to automate core parts of the non-military government. I imagine there are some parts of the government that are particularly tractable/influenceable. For example, just making great decisions on which contractors the government should work with. There’s a ton of work to do here, between the federal government / state government / local government.

4. Epistemic Evals of AI seem pretty great to me, I imagine work here can/should be pushed more soon. I’m not a huge fan of emphasizing “truthfulness” specifically, I think there’s a whole lot to get right here. I think my post here is relevant - it’s technically specific to evaluating math models, but I think it applies to broader work. https://forum.effectivealtruism.org/posts/fxDpddniDaJozcqvp/enhancing-mathematical-modeling-with-llms-goals-challenges [EA · GW]

5. One bottleneck to some of the above is AI with strong guarantees+abilities of structured transparency. It’s possible that more good work here can wind up going a long way. That said, some of this is definitely already something companies are trying to do for commercial reasons. https://forum.effectivealtruism.org/posts/piAQ2qpiZEFwdKtmq/llm-secured-systems-a-general-purpose-tool-for-structured [EA · GW]

6. I think there are a lot of interesting ways for us to experiment with [AI tools to help our research/epistemics]. I want to see a wide variety of highly creative experimentation here. I think people are really limiting themselves in this area to a few narrow conceptions of how AI can be used in very specific ways that humans are very comfortable with. For example, I’d like to see AI dashboards of “How valuable is everything in this space” or even experiments where AIs negotiate on behalf of people and they use the result of that. A lot of this will get criticized for being too weird/disruptive/speculative, but I think that’s where good creative works should begin.

7. Right now, I think the field of “AI forecasting” is actually quite small and constrained. There’s not much money here, and there aren’t many people with bold plans or research agendas. I suspect that some successes / strong advocates could change this.

8. I think that it’s likely that Anthropic (and perhaps Deepmind) would respond well to good AI+epistemics work. “Control” was quickly accepted at Anthropic, for example. I suspect that it’s possible that things like the idea of an “Internal AI+human auditor” or an internal “AI safety strategist” could be adopted if done well.

comment by ozziegooen · 2021-08-05T00:58:22.808Z · LW(p) · GW(p)

If you think it’s important that people defer to “experts”, then it should also make sense that people should decide which people are “experts” by deferring to “expert experts”.

There are many groups that claim to be the “experts”, and ask that the public only listen to them on broad areas they claim expertise over. But groups like this also have a long history of underperforming other clever groups out there.

The US government has a long history of claiming “good reasons based on classified intel” for military interventions, where later this turns out to basically be phony. Many academic and business interests are regularly disrupted by small external innovators who essentially have a better sense of what’s really needed. Recently, the CDC and the WHO had some profound problems (see The Premonition or Lesswrong for more on this).

So really, groups claiming to have epistemic authority over large areas, should have to be evaluated by meta-experts to verify they are actually as good as they pronounce. Hypothetically there should be several clusters of meta-experts who all evaluate each other in some network.

For one, it’s a common mistake to assume that those who have spent the most time on a subject have the best judgements around everything near to that subject. There are so many subjects out there that we should expect many to just have fairly mediocre people or intellectual cultures, often ones so bad that clever outsiders could outperform them when needed. Also, of course people who self-selected into an area have all sorts of biases about it. (See Expert Political Judgment for more information here.)

It’s pretty astounding to me how clear it is we could use more advanced evaluation for these claimed experts, how many times these groups have overclaimed their bounds to detrimental effect, and how little is done about it. The usual response is more a policy of, “I’m going to read a few books about this one particular case, and be a bit more clever if this one group of experts fails me next time.” Such a policy can easily overshoot, even. This has been going on for literally thousands of years now.

comment by ozziegooen · 2021-08-05T00:56:05.758Z · LW(p) · GW(p)

People are used to high-precision statements given by statistics (the income in 2016 was $24.4 Million), and are used to low-precision statements given by human intuitions (From my 10 minute analysis, I think our organization will do very well next year). But there’s a really weird cultural norm against high-precision, intuitive statements. (From my 10 minute analysis, I think this company will make $28.5 Million in 2027).

Perhaps in part because of this norm, I think that there are a whole lot of gains to be made in this latter cluster. It’s not trivial to do this well, but it’s possible, and the potential value is really high.

comment by ozziegooen · 2021-08-05T00:55:42.783Z · LW(p) · GW(p)

I find SEM models to be incredibly practical. They might often over-reach a bit, but at least they present a great deal of precise information about a certain belief in a readable format.

I really wish there would be more attempts at making more diagrams like these in cases where there isn't statistical data. For example, to explain phenomena like:

What caused the fall of Rome?
Why has scientific progress fallen over time in the US?
Why did person X get upset, after event Y?
Why did I make that last Facebook post?

In all of these cases, breadth and depth both matter a lot. There are typically many causes and subcauses, and the chains of subcauses are often deep and interesting.

In many cases, I think I'd prefer one decent intuitively-created diagram to an entire book. I'd imagine that often the diagram would make things even more clear.

Personally, I've been trying to picture many motivations and factors in my life using imagined diagrams recently, and it's been very interesting. I intend to write up some of this as I get some examples onto paper.

comment by ozziegooen · 2021-08-05T00:55:04.229Z · LW(p) · GW(p)

There’s a big stigma now against platforms to give evaluations or ratings on individuals or organizations along various dimensions. See the rating episode of Black Mirror, or the discussion on the Chinese credit system.

I feel like this could be a bit of a missed opportunity. This sort of technology is easy to do destructively, but there are a huge number of benefits if it can be done well.

We already have credit scores, resumes (which are effectively scores), and social media metrics. All of these are really crude.

Some examples of things that could be possible:

Romantic partners could be screened to make sure they are very unlikely to be physically abusive.
Politicians could be much more intensely ranked on different dimensions, and their bad behaviors listed.
People who might seem sketchy to some (perhaps because they are a bit racist), could be revealed to be good-intentioned and harmless.
People who are likely to steal things could be restricted from entering certain public spaces. This would allow for much more high-trust environments. For example, more situations where customers are trusted to just pay the right amounts on their own.
People can be subtly motivated to just be nicer to each other, in situations where they are unlikely to see each other again.
Most business deals could do checks for the trustworthiness of the different actors. It really should become near-impossible to have a career of repeated scams.

These sorts of evaluation systems basically can promote the values of those in charge of them. If those in charge of them are effectively the public (as opposed to a corrupt government agency), this could wind up turning out nicely. If done well, algorithms should be able to help us transition to much higher-trust societies.

Replies from: Viliam

↑ comment by Viliam · 2021-08-11T21:41:57.320Z · LW(p) · GW(p)

How would you design a review system that cannot be gamed (very easily)?

For example: Someone sends a message to their 100 friends, and tells them to open the romantic partners app and falsely accuse you of date rape. Suppose they do. What exactly happens next?

You are forever publicly marked as a rapist, no recourse.
You report those accusations as spam, or sue the people... but, the same could be done by an actual rapist... assuming the victims have no proof.

Both outcomes seem bad to me, and I don't see how to design a system that prevents them both.

(And if we reduce the system to situations when there is a proof... well, then you actually don't need a mutual rating system, just an app that searches people in the official records.)

Similar for other apps... politicians of the other party will be automatically accused of everything; business competitors will be reported as untrustworthy; people who haven't even seen your book will give it zero stars rating.

(The last thing is somewhat reduced by Amazon by requiring that the reviewers actually buy the book first. But even then, this makes it a cost/benefit question: you can still give someone X fake negative reviews, in return for spending the proportional amount of money on actually buying their book... without the intention to read it. So you won't write negative fake reviews for fun, but you can still review-bomb the ones you truly hate.)

Replies from: ozziegooen

↑ comment by ozziegooen · 2021-08-11T22:49:44.072Z · LW(p) · GW(p)

I think it's very much a matter of unit economics. Court systems have a long history of dealing with false accusations, but still managing to uphold some sort of standards around many sorts of activity (murder and abuse, for instance).

When it comes to false accusations, there could be different ways of checking these to verify them. These are common procedures in courts and other respected situations.

If 100 people all opened an application and posted at a similar time, that would be fairly easy to detect, if the organization had reasonable resources. Hacker News and similar deal with similar situations (though obviously much less dramatic) very often with various kinds of spamming attacks and upvote rings.
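A minimal sketch of the kind of check described above, assuming each report carries a timestamp in seconds. This is an illustrative sliding-window burst detector with made-up names and thresholds, not a description of how Hacker News or any real platform does it.

```python
def has_burst(timestamps: list[float], window: float, threshold: int) -> bool:
    """True if any span of `window` seconds contains at least `threshold` reports."""
    times = sorted(timestamps)
    start = 0
    for end, t in enumerate(times):
        # Advance the left edge until the span covers at most `window` seconds.
        while t - times[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


# Hypothetical data: one organic report per day, plus 100 coordinated
# reports arriving 5 seconds apart.
organic = [day * 86400.0 for day in range(30)]
coordinated = [1000.0 + i * 5.0 for i in range(100)]

print(has_burst(organic, window=3600, threshold=20))
print(has_burst(organic + coordinated, window=3600, threshold=20))
```

A coordinated group could of course spread its reports out over time, so timing alone would only catch the crudest attacks; it would be one signal among several.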

There's obviously always going to be some error rate, as is true for court systems.

I think it's very possible that the efforts that would be feasible for us in this area in the next 1-10 years would be too expensive to be worth it, especially because they might be very difficult to raise money for. However, I would hope that abilities here eventually allow for systems that represent much more promising trade-offs.

comment by ozziegooen · 2021-08-05T00:50:45.721Z · LW(p) · GW(p)

Voting systems vs. utility maximization

I’ve seen a lot of work on voting systems, and on utility maximization, but very few direct comparisons. But I think that often we can prioritize systems that favor one or the other, and clearly our research efforts are limited between the two, so it seems useful to compare.

Voting systems act very differently from utility maximization. There’s a big host of literature on ideal voting rules, and it’s generally quite different from that of utility maximization.

Proposals like quadratic voting are clearly in the voting category, while Futarchy is much more in the utility maximization category. In US Democracy, some decisions are quite clearly made via voting, and others are made roughly using utility maximization (see any cost-benefit analysis, or the way that new legislation is created). Note that by utility I don’t mean welfare, but whatever it is that the population cares about.

My impression is that the main difference is that voting systems don’t need to assume any level of honesty. Utility maximization schemes need to have competent estimates of utility, but individuals are motivated to try to lie about the numbers so that the later calculations come out in their favor. So if we can’t identify any good way of obtaining the ground truth, but people can correctly estimate their own interests, then voting methods might be our only option.

If we can obtain good estimates of utility, then utility optimization methods are normally more ideal. There’s a good reason why the government rarely asks for public votes on things when it doesn’t need to. Voting methods essentially leave out a lot of valuable information, and they often require a lot of overhead.
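A toy illustration of the information that voting leaves out, using invented agents and utility numbers. It assumes the utilities are known and honest, which is exactly the assumption that is hard to satisfy in practice.

```python
from collections import Counter

# Hypothetical utilities each agent assigns to options A and B.
utilities = {
    "alice": {"A": 1.0, "B": 0.0},
    "bob": {"A": 1.0, "B": 0.0},
    "carol": {"A": 0.0, "B": 10.0},
}


def majority_winner(utils: dict[str, dict[str, float]]) -> str:
    """Each agent votes for their top option; intensity is discarded."""
    votes = Counter(max(prefs, key=prefs.get) for prefs in utils.values())
    return votes.most_common(1)[0][0]


def utility_winner(utils: dict[str, dict[str, float]]) -> str:
    """Pick the option with the highest summed utility."""
    options = next(iter(utils.values())).keys()
    totals = {o: sum(prefs[o] for prefs in utils.values()) for o in options}
    return max(totals, key=totals.get)


print(majority_winner(utilities))  # two of three agents prefer A
print(utility_winner(utilities))   # B has the larger total (10 vs 2)
```

The flip side is the honesty problem: if these numbers were self-reported, Alice and Bob could each claim a utility of 100 for A and swing the sum back their way, whereas exaggeration gains them nothing under majority voting.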

I think the main challenges for the utility maximization side are to identify trustworthy and inexpensive techniques for eliciting utility, and to find ways of integrating voting that get the main upsides without the downsides. I assume this means something like, “we generally want as little voting as possible, but it should be used to help decide the utility maximization scheme.” For example, one way that this could be done, is that multiple agents could vote on a particular scheme of maximizing utility between them. Perhaps they all donate 10% of their resources to a particular agency that attempts to maximize utility under certain conditions.

comment by ozziegooen · 2020-01-08T19:55:47.765Z · LW(p) · GW(p)

Prediction evaluations may be best when minimally novel

Imagine a prediction pipeline is resolved with a human/judgemental evaluation. For instance, a group today starts predicting what a trusted judge 10 years from now will say for the question, "How much counterfactual GDP benefit did policy X make, from 2020-2030?"

So, there are two stages:

Prediction
Evaluation

One question for the organizer of such a system is how many resources to delegate to the prediction step vs. the evaluation step. It could be expensive to both pay for predictors and evaluators, so it's not clear how to weigh these steps against each other.

I've been suspecting that there are methods to be stingy with regards to the evaluators, and I have a better sense now why that is the case.

Imagine a model where the predictors gradually discover information I_predictors about I_total, the true ideal information needed to make this estimate. Imagine that they are well calibrated, and use the comment sections to express their information when predicting.

Later the evaluator comes by. Because they could read everything so far, they start with I_predictors. They can use this to calculate Prediction(I_predictors), although this should have already been estimated from the previous predictors (a la the best aggregate).

At this point the evaluator can choose to attempt to get more information, I_evaluation > I_predictors. However, if they do, the resulting probability distribution would be predicted by Prediction(I_predictors). Insofar as the predictors are concerned, the expected value of Prediction(I_evaluation) should be the same as that of Prediction(I_predictors), assuming that Prediction(I_predictors) is calibrated; except for the fact that it will have more risk/randomness. Risk is generally not a desirable property. I've written about similar topics in this post [LW(p) · GW(p)].

Therefore, the predictors should generally prefer Prediction(I_predictors) to Prediction(I_evaluation), as long as everyone's predictions are properly calibrated. This difference shouldn't generally lead to a difference of predictions from them unless a complex or odd scoring rule were used.
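A toy check of the argument above, under simplifying assumptions that are not in the original: the question is binary, the predictors' calibrated aggregate is a single probability `prior`, and the evaluator's extra information is one binary signal that is correct with probability `accuracy`. All names are hypothetical.

```python
def evaluator_posteriors(prior: float, accuracy: float) -> tuple[float, float, float]:
    """Return (P(signal = yes), posterior if yes, posterior if no)."""
    p_yes = prior * accuracy + (1 - prior) * (1 - accuracy)
    post_yes = prior * accuracy / p_yes
    post_no = prior * (1 - accuracy) / (1 - p_yes)
    return p_yes, post_yes, post_no


def mean_and_variance(prior: float, accuracy: float) -> tuple[float, float]:
    """Mean and variance of the evaluator's posterior, as seen by the predictors."""
    p_yes, post_yes, post_no = evaluator_posteriors(prior, accuracy)
    mean = p_yes * post_yes + (1 - p_yes) * post_no
    var = p_yes * (post_yes - mean) ** 2 + (1 - p_yes) * (post_no - mean) ** 2
    return mean, var


mean, var = mean_and_variance(prior=0.6, accuracy=0.8)
print(mean, var)
```

The mean of the evaluator's posterior matches the predictors' 0.6 (up to floating-point error), while the variance is strictly positive: from the predictors' point of view, extra evaluator research adds spread but no expected shift.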

Of course, calibration can't be taken for granted. So pragmatically, the evaluator would likely have to deal with issues of calibration.

This setup also assumes that maximally useful comments are made available to the evaluator. I think predictors will generally want the evaluator to see much of their information, as it would in general support their sides.

A relaxed version of this may be that the evaluators' duty would be to get approximately all the information that the predictors had access to, but more is not necessary.

Note that this model is only interested in the impact of good evaluation on the predictions. Evaluation also would lead to "externalities"; information that would be useful in other ways as well. This information isn't included here, but I'm fine with that. I think we should generally expect predictors to be more cost-effective than evaluators at doing "prediction work" (i.e. the main reason we have separated them anyway!)

TLDR
The role of evaluation could be to ensure that predictions were reasonably calibrated and that the aggregation thus did a decent job. Evaluators shouldn't have to outperform the aggregate, if that requires outside information from what was used in the predictions.

comment by ozziegooen · 2020-08-29T07:28:43.591Z · LW(p) · GW(p)