This is The Best Textbooks on Every Subject, but for visualizations. I greatly adore good visualizations, chiefly because there are so many visualizations that are so terrible. I have seen many such tools mentioned here, but always in passing.
The actual motivator is re-reading the posts Exercises in Comprehensive Information Gathering and Fact Posts: How and Why. While there is no substitute for the wrench-time they recommend, I think these kinds of tools make the process more efficient and lend themselves to insights which are difficult to acquire through reading alone; in my experience scale and distance are both easier to grasp in a visual medium, for example.
Also there is a non-trivial sense in which they are beautiful in their own right. If we are able to compare many examples, people in the community might even be able to help advance the art.
One nomination per comment; please include an explanation of why you nominated it. Contra the best textbooks list, we won't require comparison with other visualizations, because there are so few authoritative ones.
You should create a GitHub repo à la the Awesome lists (e.g., https://github.com/hellerve/programming-talks). LessWrong does not lend itself well to these collaborative community resources, as evidenced by the death of The Best Textbooks on Every Subject.
This strikes me as likely getting it backwards. The Best Textbooks on Every Subject thread was active for vastly longer than almost any GitHub repo of this style that I've ever seen. It petered out after a few years, but that's a really long time, and it petered out mostly because Luke was no longer actively maintaining it.
I expect doing this on GitHub would basically cause it to fail immediately. This isn't because GitHub repos like this can't work, but because GitHub repos like this don't end up being added to via GitHub; they are added to via separate channels, and are usually curated by one really engaged person who invests a lot of time into it. They are not a good template for crowdsourcing submissions.
You can easily edit any document on GitHub itself; it’ll automatically create a pull request for you. GitHub also has good features for having multiple maintainers, handing off ownership, and even forking in the worst case. You can even find active forks of any repo. (I have an extension that displays this information on the main repo page, called Lovely Forks.)
GitHub also allows better organization of the information (e.g., in the textbook example, we could have had a separate directory for each subject, and within each directory a separate file for each contributor’s opinions).
Now, I have no doubt that starting any such community effort is hard, and maybe using LessWrong has a lower barrier to entry, but in the long run, a GitHub repo is more viable. The best-case scenario is for the author to collate the comments on LessWrong into the git repo, and provide links to tutorials on using GitHub’s web interface for new contributors.
Some meta thoughts, since I've thought a lot about what made the best-textbooks thread work:
I think the inclusion criteria for the best textbook thread were really important. Having relatively objective inclusion criteria made people more comfortable posting, and made the thread a good bit better to use as a trusted source.
I think having the top-level post update with new additions was really valuable, and caused the post to be a good long-term reference. A GitHub repo might work, but I expect the trivial inconvenience of clicking through the link would make it a lot less useful (you want people to come back to the place where they can submit new visualizations if you want the thing to keep getting new submissions).
Limiting it to one submission per comment was really useful because it allowed for fine-grained upvoting and downvoting of various suggestions, and made navigating the thread a lot easier. I would also recommend doing that.
The inclusion criteria were easy to set for textbooks; textbooks are over a hundred years old as a format, and there are always multiple options for any not-brand-new subject. By contrast, most of the really impressive visualizations of huge datasets are younger than 10 years, and it is mostly a question of whether a visualization exists at all, rather than of selecting among several. That being said, I did consider whether to strive for authoritative sources, but I reasoned that I can do the legwork up front to separate out hobbyist efforts. However, if everyone is more likely to use the list if it only contains authoritative sources (or ones endorsed by authoritative sources), then I would reconsider and set that as a criterion.
I agree with this directly, particularly the trivial inconvenience point. But I am comfortable doing the legwork on both for now, and we can see which proves the more popular/useful.
This is an excellent point; I will modify the post to include it.
Here's a visualisation that goes along with Euclid's Elements.
This was one of many from an article on "The Empirical MetaMathematics of Euclid and Beyond". It is a long essay on the overarching structure of Euclid's Elements that verifies some claims made about it, e.g. that the proofs were ordered in nearly the most parsimonious way possible. It also finds the most difficult theorems in each book and the greatest possible reductions in proof length, and hints that the network of theorem dependency has a local 2-d structure. I highly recommend the article.
This isn’t a textbook, but Dataclysm by Christian Rudder was a major inspiration to me when I was new to data analysis. The book is like a long data analysis project around dating on OkCupid (Rudder co-founded the site), and has a lot of good graphs made just for the book. Unlike some of the popular examples made famous by e.g. Tufte, the graphs in Dataclysm are of the type an analyst in 2020 might typically make in their day-to-day work. Lots of scatter plots and bar plots, but created thoughtfully enough to really be something. Rarely in this book did I think “ah, beautiful” - much more often, I thought “ah, yup, I see the relationship he’s saying exists.”
The version of Dataclysm that I read had red font in addition to black font, and I found it was quite striking as a way to do emphasis. When I started writing my personal journal, I adopted red and black felt-tipped pens because I enjoyed the font colours from the book.
Definitely not a subject, but I'd say that the visualisation of Wolfram's theory of everything is excellent. Of course there are problems with his theory of everything, like the fact that he hasn't actually proved his claims that it generates the GR field equations or replicates QM, or shown that his theory evades the critical objection Scott Aaronson raised. But as a visualisation:
It is aesthetically pleasing
It compactly contains the basic ideas of his T.o.E.
It ties the basic concepts together to show how they could generate a theory of physics
Question: should we include data visualization tools here, or make a separate location for them? By tools I mean the methods for making visualizations and/or maybe guidance on how to do so.
This is clearly related, but it seems like a separate problem. It might be much more useful to the people in the community who maintain their own blog or have related professional obligations, which is a more distinct subset.