Blessed information, garbage information, cursed information

post by tailcalled · 2024-04-18T16:56:17.370Z · LW · GW · 8 comments


This post is also available on my substack. Thanks to Justis Mills for editing and feedback.

Imagine that you're a devops engineer who has been tasked with resolving an incident in which a customer reports bad performance. You can look through the logs of their server, but this raises a problem: there are millions of lines of logs, and likely only a few of them are relevant to the issue. Thus, the logs are basically "garbage information".


Rather than looking at a giant pool of unfiltered information, what you really need is highly distilled information that's specifically optimized for solving this performance issue. For instance, you could ask the user for more information about precisely what they were doing, use filters to get the logs for exactly the parts of the application they were dealing with, or look through the places where the server spent a very large amount of time. The more a piece of information has been made to help you, the more "blessed" it is, with the extreme end of blessedness being information that keeps surprising you in its usefulness.
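
To make the filtering idea concrete, here is a minimal sketch of such a distillation pass. The log format, file name, endpoint, and threshold are all invented for illustration:

```python
# A toy distillation pass over a (hypothetical) request log: keep only the
# lines for the affected endpoint whose response time crosses a threshold.
import re

SLOW_MS = 2_000           # invented threshold: anything slower is interesting
ENDPOINT = "/checkout"    # invented: the part of the app the customer used

pattern = re.compile(r"(?P<path>/\S+) .* took (?P<ms>\d+)ms")

with open("server.log") as f:          # hypothetical log file
    for line in f:
        m = pattern.search(line)
        if m and m["path"] == ENDPOINT and int(m["ms"]) >= SLOW_MS:
            print(line, end="")        # a few blessed lines out of millions
```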

It might be tempting to think you could use multivariate statistics like factor analysis to distill garbage information by identifying axes which give you an unusually large amount of information about the system. In my experience, that doesn't work well, and if you think about it for a bit, it becomes clear why: if the garbage information has a 50 000 : 1 ratio of garbage : blessed, then finding an axis which explains 10 variables' worth of information still leaves you with a 5 000 : 1 ratio of garbage : blessed. The distillation you get with such techniques is simply not strong enough.[1][2]
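
As a toy illustration of this (dimensions scaled down from 50 000 : 1, and all numbers invented): when the one relevant column doesn't stand out in variance, a variance-seeking axis spreads across the garbage rather than pointing at it.

```python
# One "blessed" column hidden among thousands of statistically similar
# garbage columns: the top principal axis spreads over all of them.
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 500, 5_001            # pretend column 0 is the relevant one
X = rng.normal(size=(n_rows, n_cols))

Xc = X - X.mean(axis=0)                # PCA via SVD on centered data
_, _, vt = np.linalg.svd(Xc, full_matrices=False)

# The top axis's loading on the relevant column is tiny (~1/sqrt(n_cols)):
# the axis compresses variance, but it doesn't point at the blessed column.
print(abs(vt[0, 0]))
```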

A 50 000 : 1 ratio might sound insurmountable by any technique, but because strong evidence is common [LW · GW], it's actually pretty feasible; e.g. knowing which minute in a week an incident occurred already gets you about this strong of a filter (a week has 7 × 24 × 60 = 10 080 minutes, so a one-minute timestamp narrows things down by a factor of about ten thousand).


While blessed information is actively helpful, and garbage information is essentially useless, there's also a third case: information that leads you down the wrong road. If an incident is labelled as "everything is slow", then that may very well get it more highly prioritized through customer service, but if most things aren't slow and the engineer investigates as if they were, that ends up burning more engineer time than if it had been labelled accurately. Actively misleading information could be called "cursed information".

Information doesn't have to be literally false in order for it to mislead. Often, people use information to infer the presence of adjacent latent variables outside of the literal meaning of that information. For instance, "the website is slow to load" might be taken to mean "the server is slow", which could be misleading if the real answer is "because I'm on a very slow network connection".

Cursed information doesn't just have the first-order harm caused by people believing it. It also has a second-order harm, as people develop filters so they don't end up believing cursed information. One such filter is verifying all the information you are given, which is costly. Another is simply ignoring most of what you are told, which forfeits one of the most effective ways of learning.


Blessed information can be expensive to produce, and cursed information can be hard to destroy and disincentivize. So one cannot expect all information to be blessed, nor expect no information to be cursed. But if you are dealing with information, especially if you are spreading information, it may still be good to ask yourself: is this blessed, garbage or cursed? If the first, great! If the last, maybe reconsider what you are doing.

The distinction between blessed, garbage and cursed information is value-laden, because it depends on what you are trying to do. However, I find that there is relatively little ambiguity in practice, in the moment, as one is trying to solve some specific task.

The distinction between whether something is blessed or cursed becomes unambiguous because there is a relatively small set of people involved who have any influence on the task, and these people tend to have relatively clearly defined roles. Even when we have conflicting interests, we are part of a shared project, and the organization(s) that own this project have an interest in aligning our interests with each other.

This is obvious in the corporate setting that the engineer works in. Each of the people involved has a relatively small set of tasks that are efficient to work on, and each task has a relatively small set of solutions that are cheap to achieve. Because these sets are small, there's also commonly a small set of variables that contain essentially all the information relevant for solving those tasks [LW · GW], and due to noise, almost all other variables are irrelevant, i.e. garbage [LW · GW]. Of course, the logs exist for a reason; we expect some of them to be non-garbage with respect to some future tasks.

But it is also true (or can be made true) in many other scenarios. For instance, in personal relationships, the relationship partners are the main people who get impacted and have influence, so there arises a notion of whether information is blessed and cursed with respect to said relationship. If there is a conflict, then either person can take initiative to resolve said conflict.

  1. ^

    With one important caveat: in such methods, it is common to induce scale invariance, for instance by dividing by the standard deviation before doing PCA, or using probability-based methods to fit the factor model. If you don't introduce scale invariance, then the long-tailedness of the data will basically force the biggest things to dominate in the results. But for getting blessed information, that is Actually Good: it is equivalent to looking through the places where the server spent a lot of time. This kind of stops being multivariate, though, as then there's essentially only one variable that ends up driving the results.
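A toy experiment illustrating this footnote (all data invented): PCA on raw long-tailed data gives the top axis to the biggest column, while standardizing first spreads it out.

```python
# Footnote 1 as a toy experiment: one long-tailed "time spent" column among
# nine small-scale columns. Without scale invariance, it owns the top axis.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
time_spent = rng.lognormal(mean=5, sigma=2, size=n)  # long-tailed, huge scale
others = rng.normal(size=(n, 9))                     # nine ordinary columns
X = np.column_stack([time_spent, others])

def top_axis(data):
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

print(np.round(np.abs(top_axis(X)), 2))                  # raw: ~[1, 0, ..., 0]
print(np.round(np.abs(top_axis(X / X.std(axis=0))), 2))  # standardized: diffuse
```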

  2. ^

    Once you do have a ton of blessed information, it can be helpful to apply multivariate methods to it to find components of it that are even more blessed. It just doesn't work on pure garbage. And if one does apply it in this way, one has to remember that the residuals are blessed too.

8 comments


comment by Dagon · 2024-04-18T20:16:24.096Z · LW(p) · GW(p)

These are probably useful categories in many cases, but I really don't like the labels.  Garbage is mildly annoying, as it implies that there's no useful signal, not just difficult-to-identify signal.  It's also putting the attribute on the wrong thing - it's not garbage data, it's data that's useful for other purposes than the one at hand.  "verbose" or "unfiltered" data, or just "irrelevant" data might be better.  

Blessed and cursed are much worse as descriptors.  In most cases there's nobody doing the blessing or cursing, and it focuses the mind on the perception/sanctity of the data, not the use of it.  "How do I bless this data" is a question that shows a misunderstanding of what is needed.  I'd call this "useful" or "relevant" data, and "misleading" or "wrongly-applied" data.

To repeat, though, the categories are useful - actively thinking about what you know, and what you could know, about data in a dataset, and how you could extract value for understanding the system, is a VERY important skill and habit.

Replies from: tailcalled
comment by tailcalled · 2024-04-19T09:18:53.431Z · LW(p) · GW(p)

It's also putting the attribute on the wrong thing - it's not garbage data, it's data that's useful for other purposes than the one at hand.

Mostly it's not useful for anything. Like, the logs contain lots of different types of information, and all the different types of information are almost always useless for all purposes, but each type of information has a small number of purposes for which a very small fraction of that information is useful.

Blessed and cursed are much worse as descriptors. In most cases there's nobody doing the blessing or cursing, and it focuses the mind on the perception/sanctity of the data, not the use of it.

This is somewhat intentional. One thing one can do with information is give it to others who would not have seen it. Here one sometimes needs to be careful to preserve and highlight the blessed information and eliminate the cursed information.

comment by Gunnar_Zarncke · 2024-04-18T18:37:43.415Z · LW(p) · GW(p)

Examples of blessed information that I have seen in the context of logging:

  • Stacktraces logged by a library that elides all the superfluous parts of the stacktraces.
  • A log message that says exactly what the problem is, why it is caused (e.g., which parameters lead to it), and where to find more information about it (ticket number, documentation page).
  • The presence of a correlation ID (also called Transaction ID, Request ID, Session ID, Trace ID). A minimal sketch of the idea follows this list.
    • What is a correlation ID? It is an ID that is created at the start of a request/session and available in all logs related to that request/session. See here or here, implementations here or here. There are even hierarchical correlation IDs.
    • Esp. useful: A correlation ID that is accessible from the client.
    • Even more useful: If there is a single place to search all the logs of a system for the ID.
  • Aggregation of logs, such that only the first, tenth, hundredth... occurrence of a log message is escalated.
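
A minimal sketch of the correlation-ID idea in Python's standard logging (names and log format invented; real implementations usually live in web-framework middleware):

```python
# A hypothetical request handler that stamps every log line with a
# correlation ID, using contextvars so the ID follows the request.
import contextvars
import logging
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationIdFilter(logging.Filter):
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

logging.basicConfig(format="%(asctime)s [%(correlation_id)s] %(message)s")
log = logging.getLogger("app")
log.addFilter(CorrelationIdFilter())
log.setLevel(logging.INFO)

def handle_request():
    correlation_id.set(uuid.uuid4().hex[:8])  # created at the start of a request
    log.info("request started")               # every later log line shares the ID
    log.info("request finished")

handle_request()
```
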
comment by gwern · 2024-04-19T17:06:04.854Z · LW(p) · GW(p)

It might be tempting to think you could use multivariate statistics like factor analysis to distill garbage information by identifying axes which give you an unusually large amount of information about the system. In my experience, that doesn't work well, and if you think about it for a bit, it becomes clear why: if the garbage information has a 50 000 : 1 ratio of garbage : blessed, then finding an axis which explains 10 variables' worth of information still leaves you with a 5 000 : 1 ratio of garbage : blessed. The distillation you get with such techniques is simply not strong enough.[1][2]

That doesn't seem like it. In many contexts, a 10x saving is awesome and definitely a 'blessed' improvement if you can kill 90% of the noise in anything you have to work with. But you don't want to do that with logs. You can't distill information in advance of a bug (or anomaly, or attack) because a bug by definition is going to be breaking all of the past behavior & invariants governing normal behavior that any distillation was based on. If it didn't, it would usually be fixed already. ("We don't need to record variable X in the log, which would be wasteful accurst clutter, because X cannot change." NARRATOR: "X changed.") The logs are for the exceptions - which are precisely what any non-end-to-end lossy compression (factor analysis or otherwise) will correctly throw out information about to compress as residuals to ignore in favor of the 'signal'. Which is why the best debugging systems like time-travel debugging or the shiny new Antithesis work hard to de facto save everything.

Replies from: tailcalled
comment by tailcalled · 2024-04-19T18:25:26.211Z · LW(p) · GW(p)

I'd say "in many contexts" in practice refers to when you are already working with relatively blessed information. It's just that while most domains are overwhelmingly filled with garbage information (e.g. if you put up a camera at a random position on the earth, what it records will be ~useless), the fact that they are so filled with garbage means that we don't naturally think of them as being "real domains".

Basically, I don't mean that blessed information is some obscure thing that you wouldn't expect to encounter, I mean that people try to work with as much blessed information as possible. Logs were sort of a special case of being unusually-garbage.

You can't distill information in advance of a bug (or anomaly, or attack) because a bug by definition is going to be breaking all of the past behavior & invariants governing normal behavior that any distillation was based on.

Depends. If the system is very buggy, there's gonna be lots of bugs to distill from. Which brings us to the second part...

The logs are for the exceptions - which are precisely what any non-end-to-end lossy compression (factor analysis or otherwise) will correctly throw out information about to compress as residuals to ignore in favor of the 'signal'.

Even if lossy compression threw out the exceptions we were interested in as being noise, that would actually still be useful as a form of outlier detection. One could just zoom in on the biggest residuals and check what was going on there.
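
Concretely, the residual-zooming idea looks something like this toy numpy sketch (factor structure and numbers invented): fit a low-rank model, then rank rows by how badly it reconstructs them.

```python
# Toy residual-based outlier detection: normal rows follow a 5-factor model,
# one planted row doesn't, and it surfaces as the largest residual.
import numpy as np

rng = np.random.default_rng(2)
latent = rng.normal(size=(1_000, 5))           # "normal" behaviour: 5 factors
loadings = rng.normal(size=(5, 50))
X = latent @ loadings + 0.1 * rng.normal(size=(1_000, 50))
X[123] += 3 * rng.normal(size=50)              # one row off the factor structure

Xc = X - X.mean(axis=0)
u, s, vt = np.linalg.svd(Xc, full_matrices=False)
k = 5                                          # keep the "signal" subspace
recon = (u[:, :k] * s[:k]) @ vt[:k]
residuals = np.linalg.norm(Xc - recon, axis=1)

print(residuals.argmax())                      # -> 123, the exceptional row
```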

Issue is, the logs end up containing ~all the exceptions, including exceptional user behavior, exceptional user setups, and exceptionally error-throwing non-buggy code, but the logs are only useful for the bugs/attacks/etc., because the former behaviors are fine and should be supported.

comment by cheer Poasting (cheer-poasting) · 2024-04-22T17:48:50.491Z · LW(p) · GW(p)

I'm inclined to agree with other commenters: while the concepts presented in the article are very useful, the name "garbage" information is itself cursed information, because if we tried to talk to someone about garbage information, it would call up a very strong preconception that doesn't align with what you're trying to communicate.

You're using "garbage" to mean that the noise to signal ratio is fundamentally unusable. However when others think of "garbage" information, they think of something like a malfunctioning sensor, where all of the data collected is useless and should be thrown away. Instead, you mean that there is good data there, but it gets lost under a pile of irrelevant information. 

I would say the distinction is important to avoid incorrect intuitions. 

Replies from: tailcalled, tailcalled
comment by tailcalled · 2024-04-22T18:18:10.275Z · LW(p) · GW(p)

Maybe "barren information"?

comment by tailcalled · 2024-04-22T18:07:57.643Z · LW(p) · GW(p)

On the one hand, I do see your point that in some cases it's important not to make people think I'm referring to malfunctioning sensors. On the other hand, malfunctioning sensors would be an instance of the kind of thing I'm talking about, in the sense that information from a malfunctioning sensor is ~useless for real-world tasks (unless you don't realize it's malfunctioning, in which case it might be cursed).

I'll think about alternative terms that clarify this.