Exercises in Comprehensive Information Gathering

johnswentworth

post by johnswentworth · 2020-02-15T17:27:19.753Z · LW · GW · 18 comments

Looking back, several of the most durably-valuable exercises I’ve done over the years have a general theme of comprehensive information gathering.

The most recent example involves capital investments. Economists talk about “capital goods” as physical stuff - machines, buildings, etc. But in practice, savings and investments are passed through banks and ETFs, bundled and securitized, involve debts and shares of companies which own debts and shares of other companies, and so forth… where does all that capital end up? To get an intuitive sense, I pulled up fundamental data on about 7000 US publicly-traded companies in Quantopian, sorted them by amount of non-financial assets, and found that the top 100 accounted for about 50% of the non-financial assets of the whole set. Then, I looked at annual reports for each of those 100 companies, to see what capital assets they had. I googled around for pictures and maps of where those assets were located, and read up on anything I hadn’t heard of before. What’s a “central office”, where are they, what do they look like, and why does AT&T have $90B worth of them? What are the major US oil basins, where are the wells, and what all goes into drilling them? What are the technical differences between traditional phone, cable, satellite, and cell networks, and how do those technical differences impact the capital requirements of each? Who runs power plants and the power grid in various parts of the country? What are the major US railroads, and where are they? Why did GE own so many airplanes? These are the kinds of questions which come up when you want to know what “capital goods” actually consist of, in the real world.
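The ranking step in that exercise (sort by non-financial assets, then ask what share the top N hold) is simple enough to sketch. This is only an illustration, not the original analysis: the function name `top_n_share` and the toy figures are made up, and the real data source is not reproduced here.

```python
from typing import Dict, List, Tuple


def top_n_share(assets: Dict[str, float], n: int) -> Tuple[List[str], float]:
    """Return the n largest holders and their combined share of the total."""
    # Rank companies from largest to smallest asset value.
    ranked = sorted(assets.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(assets.values())
    top = ranked[:n]
    # Guard against an empty or all-zero input.
    share = sum(value for _, value in top) / total if total else 0.0
    return [name for name, _ in top], share


# Hypothetical toy data (company -> non-financial assets in $B); not real figures.
toy_assets = {"A": 50.0, "B": 30.0, "C": 10.0, "D": 6.0, "E": 4.0}
names, share = top_n_share(toy_assets, 2)
print(names, f"{share:.0%}")
```

With real fundamentals data, the same function would take a mapping of roughly 7000 tickers and `n=100`; the interesting work starts afterwards, with reading the annual reports of whatever companies come out on top.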

Another interesting exercise: I read through five years of Nature archives, reading all the titles and any abstracts which sounded novel/interesting. I didn’t google everything I hadn’t heard of; instead, I’d wait until the same acronym popped up a few times before looking it up. This took maybe a week of evenings after work. By the end, I could at least place the large majority of articles in context. Now, when I see a title full of jargon in a field I haven’t studied, like “Novel tau filament fold in corticobasal degeneration”, I usually at least understand enough to guess at what it’s relevant to (in this case: neurodegenerative disease involving protein aggregates, probably Alzheimer’s?). I can generally follow conversations in a bunch of different fields - not necessarily between specialists in the same sub-sub-field, but at least the level of a typical conference talk, and when I meet new people I can ask not-too-embarrassing questions about what they’re researching.
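The "wait until the same acronym pops up a few times" rule can be mimicked mechanically if you have the titles as text. The sketch below is an assumption-laden illustration: the regex for what counts as an acronym, the threshold of 3, the function name `acronyms_to_look_up`, and the sample titles are all invented for the example.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumption: an "acronym" is a run of 2+ capital letters/digits starting with a letter.
ACRONYM = re.compile(r"\b[A-Z][A-Z0-9]+\b")


def acronyms_to_look_up(titles: Iterable[str], threshold: int = 3) -> List[str]:
    """Return acronyms that appear in at least `threshold` distinct titles."""
    counts: Counter = Counter()
    for title in titles:
        # Count each acronym at most once per title.
        counts.update(set(ACRONYM.findall(title)))
    return sorted(a for a, c in counts.items() if c >= threshold)


# Made-up titles for illustration only.
sample_titles = [
    "A CRISPR screen in T cells",
    "CRISPR editing of plant genomes",
    "Off-target effects of CRISPR nucleases",
    "Imaging with STED microscopy",
]
print(acronyms_to_look_up(sample_titles))
```

Counting per title rather than per occurrence keeps one acronym-heavy article from triggering a lookup on its own.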

Going back further, if you’re in college, I strongly recommend reading your entire course catalogue, googling anything you’ve never heard of at all, and marking anything that sounds potentially interesting. This seems really obvious; it only takes a few hours, and something something a pile of value sitting on a silver platter right in front of you. (Note: I went to a small STEM school; if you’re at a big school with a bajillion courses or a school with poor STEM coverage or not at college at all, consider reading an MIT/Caltech course catalogue instead, to get a feel for what all is out there.) You never know what surprising and interesting topics might be hiding in there - microfluidics, underactuated robotics, recursive macroeconomics, systems biology, synthetic biology, origami algorithms, computational photography, evo-devo, procedural graphics, and on and on.

These sorts of exercises provide value in a few ways:

- They reveal unknown unknowns - things you didn’t even realize were missing from your picture of the world.
- You can’t make a map of a city by sitting in your room with the shades drawn; exercises like these force you to look at large slices of the world.
- Knowledge within fields tends to have decreasing marginal returns - your first physics or CS class will teach you much more than your eighth. These exercises give a broad, brief glance at many areas where you probably haven’t reached decreasing marginal returns yet.
- You can get a very rough big-picture sense of how much effort other people are investing in various areas - e.g. where most capital investments go or where most research effort goes - which is useful for understanding the world in general.
- While these exercises don’t avoid biased selection of information altogether, they’re probably different biases from what you run into naturally, and they’re systematic enough that we can guess at what biases are likely to be present.
- They’re a lot of fun, if you have a curious streak.

Most importantly: I’ve found each of these exercises to have lasting, long-term value in exchange for a one-time investment of effort.

Other exercises which are on my to-do list, but which I haven’t done yet:

- Read the entire CIA World Factbook; you can get a paper copy for $11 on Amazon.
- Go through all of the (known) functions of genes in a minimal organism.

I’m curious to hear other suggestions for exercises along these lines.

18 comments

Comments sorted by top scores.

comment by lionhearted (Sebastian Marshall) (lionhearted) · 2020-02-16T14:22:57.319Z · LW(p) · GW(p)

Phenomenal post.

I've done similarly. It's actually remarkable how little time it takes to overview the history of breakthroughs in a sub-field, or all the political and military leaders of an obscure country during a particular era, or the history of laws and regulations of a particular field.

Question to muse over —

Given how inexpensive and useful it is to do this, why do so few people do it?

Replies from: johnswentworth, Viliam, jmh

↑ comment by johnswentworth · 2020-02-16T18:04:34.071Z · LW(p) · GW(p)

Given how inexpensive and useful it is to do this, why do so few people do it?

I actually considered putting a paragraph on this in the OP. I think we're currently in a transitional state - prior to the internet, it would have been far more expensive to conduct this sort of exercise. People haven't had much time to figure out how to get lots of value out of the internet, and this is one example which I expect will become more popular over time.

Replies from: lionhearted

↑ comment by lionhearted (Sebastian Marshall) (lionhearted) · 2020-02-19T12:54:17.883Z · LW(p) · GW(p)

Makes sense. This is probably worth a top level post? —

People haven't had much time to figure out how to get lots of value out of the internet, and this is one example which I expect will become more popular over time.

Sounds obvious when put like that, but I think — as you implied — a lot of people haven't thought about it yet.

↑ comment by Viliam · 2020-02-17T01:07:21.827Z · LW(p) · GW(p)

Given how inexpensive and useful it is to do this, why do so few people do it?

Because there are so many possible topics, that even if each of them takes relatively little time, together they would take a lot?

For example, in your example, you mentioned "an obscure country" and "a particular era", and also a focus on politics and military (as opposed to science, or art, or sport). Okay, maybe you can do it in a week, or in an afternoon. But why that country, and why that era? How much would it cost to get a comparable knowledge of all countries and, uhm, let's say the entire 20th century?

Replies from: lionhearted

↑ comment by lionhearted (Sebastian Marshall) (lionhearted) · 2020-02-19T12:53:09.217Z · LW(p) · GW(p)

Ahh, great question.

I think eventually patterns start to emerge — so eventually, you start reading about federalization of Chinese Law and you're "ah, this is like German Unification with a few key differences."

While you do find rare outliers — the Ottoman legal system continues to fascinate me ( https://en.wikipedia.org/wiki/Millet_(Ottoman_Empire) ) — you eventually find that there's only a few major ways that legal systems have been formulated at larger modern country scales than earlier local scales.

Science, art, and sport are also ones I've delved into incidentally. And there's also some patterns there.

↑ comment by jmh · 2020-02-16T16:04:26.718Z · LW(p) · GW(p)

What population did you have in your mind when you said "so few"?

Depending on what your answer is there, I think a large amount might be explained by - many are not really "intellectuals" or perhaps most are more specialist than "infovores".

comment by romeostevensit · 2020-02-16T00:03:11.857Z · LW(p) · GW(p)

Went through all the archives of publications of high IQ societies

Went through a significant fraction of the publications of the forecasting community and interviewed 4 superforecasters

Skimmed and spot checked knowledge in multiple knowledge representation textbooks which gave a much better overview of historical practical AI efforts

Went through most of the supporting research for micronutrient intake guidelines (mealsquares)

Reviewed the (fairly sparse) megascale project literature

Reading the Bitcoin whitepaper turned out to be more comprehensive than what 99+% of people have done (it's 9 pages)

Recursively applying search expertise to itself was fun and worthwhile

Replies from: parker-conley

↑ comment by Parker Conley (parker-conley) · 2023-09-23T21:35:20.750Z · LW(p) · GW(p)

Went through all the archives of publications of high IQ societies

Was this project worthwhile? Would you recommend it to others? If yes, what were the key takeaways? Considering doing this myself.

Replies from: romeostevensit

↑ comment by romeostevensit · 2023-09-24T16:58:06.120Z · LW(p) · GW(p)

I didn't find much of interest. Adverse selection on members of these orgs is my guess.

comment by Raemon · 2020-02-19T21:20:32.231Z · LW(p) · GW(p)

Obligatory link to Fact Posts: How and Why

comment by Raemon · 2020-02-20T22:32:11.418Z · LW(p) · GW(p)

Curated.

I like the idea of posts that suggest concrete exercises, and I think the sort of project John is pointing at here is something I hope LessWrong folk do more often.

I also think it lends itself well as a self-reinforcing concept on LessWrong in particular (i.e. lots of rationality exercises you might just do quietly by yourself, but the sort of review John suggests here seems like it'd often lead to good new blogposts that'd be useful for others to learn from, as well as reminding people about the possibility of doing this exercise for themselves. Although obviously if you just end up doing it for yourself that's quite valuable as well).

comment by MondSemmel · 2022-01-15T20:49:49.616Z · LW(p) · GW(p)

This post exemplifies the rationalist virtues of curiosity and scholarship. This year's review is not meant to judge whether posts should be published in a book, but I do wonder what a LW project to create a workbook or rationality curriculum (including problem sets) would look like. I imagine posts like this one would feature prominently in either case.

So I do think such posts deserve recognition, though in what form I am less sure.

On an entirely unrelated note, it makes me sad that the Internet is afflicted with link rot and impermanence, and that LW isn't immune to it. The author used the service Quantopian in February 2020, and by November it had shut down. Another link didn't work ("where are the wells"), but that was due to some domain weirdness. And the CIA had entirely changed its link to the World Factbook, without setting up link forwarding from the old URL.

comment by adamShimi · 2021-06-12T12:19:13.495Z · LW(p) · GW(p)

I must have read this post when you first published it, but only now does it strike me as answering perfectly one of my needs for deconfusion: building a reasonable map of vast territories of knowledge, to have more tools in mind when deconfusing. Especially with maths, I've been having the problem of always changing my focus, and never finishing textbooks.

But this is simply a Comprehensive Information Gathering exercise! The right way to go about it is to go through the Wikipedia page on areas of mathematics; look at each sub area in turn; and get a grip on the history, the objects studied, and the fundamental theorems.

Honestly, this plan is the first one I imagined for this issue that sounds both fun and likely to work as I intended. Thanks so much!

comment by kerspoon · 2020-02-22T16:19:44.806Z · LW(p) · GW(p)

I'm also a big fan of this; I have got huge mileage out of creating a single-page timeline of 1600-1800. I've got a few books lined up to create 1800-2000 and 1400-1800 but they are unfortunately low on my priority list at the moment. I would highly recommend it - what was happening in the world when the first academic journals were published? And 1600-1800 is such a fascinating time: the scientific and industrial revolutions, the Age of Enlightenment, the colonial empires and world trade.

The other one I have found a lot of value in is reading through Cochrane/Campbell reviews (high-quality meta-studies with readable summaries). There is a summary list of some useful ones here (I can't remember who I got it from though, but thanks whoever you are!) https://docs.google.com/spreadsheets/d/19D8JUgf95t-f-oUAHqh8Nn2G90KO3gUiua9yAjBSSqI/edit?usp=sharing

comment by leggi · 2020-02-20T12:21:19.942Z · LW(p) · GW(p)

Something that might interest you is embryology. Following the division and migration of cells to start forming a body is a fascinatingly complex insight into life although it's a long time since I studied embryology (a nightmare subject to learn [and then mostly forget] in 2-D) so I don't know what resources are out there these days.

If anyone was up for the exercise of creating a globe-based graphic showing the when and where of human history I would love to see it. From the known locations of predecessors to modern humans, the rise and fall of societies, civilisations, conflicts, border changes etc etc. A world-wide view of our collective history - quite a big project for comprehensive information gathering!

comment by corruptedCatapillar · 2021-12-15T23:00:19.995Z · LW(p) · GW(p)

I think I've done similar explorations as you've mentioned and have been curious to develop a framework of how to go about this more generally; specifically I get lost in your first example's preservation of structure (I think this is shallow vs. deep dives, could be wrong about the terminology.)

To my understanding, the overall objective is "to get a feel for what is out there." After running through it a couple times, I think I see a general pattern of...

Getting in front of you "what is out there"
- 100 companies who had 50% of the non-financial assets (NFA)
- pulling up the Nature articles
- locating the college catalogue
"to get a feel," looking at the things that you aren't familiar with, interested in
- asking questions about the nature of the NFA that don't initially make sense
- reading all the titles and any abstracts which sounded novel/interesting

The only difference with the 1st example (capital assets) is that this iterated a bit; top 100 companies had 50% of the NFA --> annual reports of companies --> showing their capital assets (this is the completion of "getting it in front of you", now you can start "getting a feel.")

Is this correct or am I way off? This CIG seems like it'd be helpful for what I imagine your sequence "Gears which turn the world" would've used as far as research methods go. Either way, this seems super exciting, thanks for the post!

Replies from: johnswentworth

↑ comment by johnswentworth · 2021-12-15T23:42:43.347Z · LW(p) · GW(p)

This sounds right.

Replies from: corruptedCatapillar

↑ comment by corruptedCatapillar · 2021-12-16T00:26:32.759Z · LW(p) · GW(p)

Appreciate the reply, thank you!
