There's a minor error in the formula giving the cross entropy: you need a minus sign on the RHS so that it reads E[-log P[X|M_2] | M_1]
The preceding text is "Of course, we could be wrong about the distribution - we could use a code optimized for a model M2 which is different from the “true” model M1. In this case, the average number of bits used will be"
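A minimal sketch of the corrected quantity, assuming two hypothetical three-symbol distributions: m1 plays the "true" model M1 and m2 the model the code is optimized for. With the minus sign the cross entropy comes out positive, and it is never smaller than the entropy of M1.

```python
import math

def entropy(p):
    # Average bits per symbol using a code optimized for the true model p.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # Average bits when X is drawn from the true model p (M1) but the code
    # is optimized for q (M2): E[-log2 q(X)] with X ~ p.
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

m1 = [0.5, 0.25, 0.25]  # hypothetical true model M1
m2 = [1/3, 1/3, 1/3]    # hypothetical mismatched model M2

print(entropy(m1))            # 1.5
print(cross_entropy(m1, m2))  # log2(3), about 1.585
```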
Certainly, you have pictures! Pictures are great!
I had no clue what SUVAT is and I know a relatively large amount of physics (advanced undergrad to grad level knowledge, currently an undergrad in college but with knowledge well above the curriculum). I feel a bit disgusted at the idea of someone memorizing those equations.
The first few letters are often used as parameters (e.g. p(x) = ax^2 + bx + c).
f is sometimes used for force density, e.g., in fluid mechanics (annoyingly, the wiki page on the Cauchy momentum equation uses f for the acceleration density caused by an external force).
Electrical engineers use j for the imaginary unit, because they will use i for current. I abhor this - why don't they just capitalize and use I for current?
A fancy (script) L is often used for the Lagrangian in analytical mechanics. The universe's path is the one that is a stationary point (derivative equals 0, so a minimum, maximum, or saddle point) of the integral of the Lagrangian over time (the action, denoted S), and analytical mechanics only gets more beautiful from there (it's part of what got me into physics). I'm only mentioning this because you mentioned the Laplace transform.
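To make "stationary point of the action" concrete, here is a small numerical sketch under assumed values: a harmonic oscillator with m = k = 1 (so the Lagrangian is v^2/2 - x^2/2), the true path x(t) = cos t on 0 <= t <= 1, and a hypothetical perturbation eps * sin(pi t) that vanishes at both endpoints. The first-order change in S vanishes at eps = 0, and over this short time interval the true path is actually a minimum.

```python
import math

def action(eps, n=2000):
    # Trapezoid-rule approximation of S = integral over 0 <= t <= 1 of
    # (v**2 / 2 - x**2 / 2) on the path x(t) = cos(t) + eps * sin(pi * t).
    h = 1.0 / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        x = math.cos(t) + eps * math.sin(math.pi * t)
        v = -math.sin(t) + eps * math.pi * math.cos(math.pi * t)
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end weights
        total += weight * (0.5 * v * v - 0.5 * x * x)
    return total * h

# Central difference in eps: the first-order change of S at the true path.
first_order = (action(1e-3) - action(-1e-3)) / 2e-3
print(first_order)  # close to 0
print(action(0.1) - action(0.0), action(-0.1) - action(0.0))  # both positive
```

The perturbation is chosen to vanish at the endpoints because the principle only compares paths with the same start and end points.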
m and l are often used for whole numbers when n is taken. So is k.
n with a hat is often used for the unit normal vector to some surface. Likewise, a vector A (bold or with an arrow, A = A n̂) is used if you include the magnitude of the area.
P, Q, R, S, T are often used for points (as in geometry). O is used for the origin or the center, though sometimes I see O used as just another point.
u is often used for a velocity when v is taken.
Please don't use s for speed.
s is often an arc length for a path.
R is often another logical proposition.
z is often used for the z-score in statistics, that is, the number of standard deviations from the mean (typically interpreted assuming a normal distribution). Likewise t for the t-test, which accounts for the extra uncertainty from estimating the standard deviation from the sample itself (and only differs noticeably from a z-test for small samples).
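A minimal sketch of the two statistics, using only Python's standard library and hypothetical example values. The z-score assumes the standard deviation is known; the one-sample t statistic has the same shape but plugs in the sample standard deviation. The t value would then be compared against Student's t distribution with n - 1 degrees of freedom, which the standard library does not provide.

```python
import math
import statistics

def z_score(x, mu, sigma):
    # Number of standard deviations that x lies from the mean (sigma known).
    return (x - mu) / sigma

def one_sample_t(sample, mu0):
    # Same shape as a z statistic, but the standard deviation is estimated from
    # the sample itself (statistics.stdev uses the n - 1 denominator).
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

def two_sided_p_from_z(z):
    # Two-tailed probability under a standard normal distribution.
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

print(z_score(130, 100, 15))               # 2.0
print(one_sample_t([1, 2, 3, 4, 5], 2))    # about 1.414
print(two_sided_p_from_z(2.0))             # about 0.0455
```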
k is often used as a multiplicative coefficient, e.g. Hooke's law (F = -kx).
Mathematics often uses X for a space of some kind. There's also the convention that an uppercase letter is a set from which the corresponding lowercase letter is drawn (e.g. take an example x in the space X of possible examples). When a bunch of sets are considered, one often uses a fancy (calligraphic) version of the letter (suppose we have an example-set A in the family of example-sets 𝒜 such that every example a in A is funny).
I could make a long list of physical quantities with letter names. I could double the list by allowing Greek letters (including ones used in mathematics). But that wouldn't really be about the connotations of variable names; it would just be a list of things with their variable names.
For a fun puzzle, look into the Monty Hall problem. The usual explanation is bad; use Bayes' law to figure out a good one. For the answer, along with some extra problems (e.g. the Monty Fall problem, where Monty slips on a banana peel and accidentally flips one of the levers, and the Monty Crawl problem, where our poor host now has to crawl and thus will prefer to open the lowest-numbered door as long as it doesn't contain the car), see https://probability.ca/jeff/writing/montyfall.pdf
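A minimal Bayes'-law sketch of the three variants, assuming the contestant picks door 1 and the car is equally likely to be behind any door. Each host function is my own encoding of the description above (not code from the linked paper): it returns the probability that Monty opens the given door and shows a goat, given where the car is.

```python
from fractions import Fraction

DOORS = (1, 2, 3)  # assumption: the contestant picked door 1

def standard(car, opened):
    # Usual rules: Monty knows where the car is and opens a goat door
    # other than door 1, choosing at random if he has two options.
    options = [d for d in (2, 3) if d != car]
    return Fraction(1, len(options)) if opened in options else Fraction(0)

def fall(car, opened):
    # Monty Fall: he opens door 2 or 3 at random by accident;
    # we condition on the opened door happening to show a goat.
    return Fraction(1, 2) if opened != car else Fraction(0)

def crawl(car, opened):
    # Monty Crawl: he opens the lowest-numbered goat door among doors 2 and 3.
    lowest = min(d for d in (2, 3) if d != car)
    return Fraction(1) if opened == lowest else Fraction(0)

def p_switch_wins(likelihood, opened):
    # Bayes: posterior is proportional to prior * P(observation | car location).
    other = 5 - opened  # the remaining closed door (2 <-> 3)
    joint = {car: Fraction(1, 3) * likelihood(car, opened) for car in DOORS}
    return joint[other] / sum(joint.values())

print(p_switch_wins(standard, 3))  # 2/3
print(p_switch_wins(fall, 3))      # 1/2
print(p_switch_wins(crawl, 3))     # 1
print(p_switch_wins(crawl, 2))     # 1/2
```

The only thing that changes between variants is the likelihood of what Monty did, which is exactly the part the usual explanation glosses over.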
I think you could've done better with integration by parts.
In physics, integration by parts is usually applied to a definite integral in which you can neglect the uv boundary term. Integration by parts then reads: "The integral of u dv equals the integral of -v du; that is, you can trade which factor of a product you differentiate, as long as the product uv is negligible on the boundary".
Common examples are when you integrate over some big volume, as most physical quantities are very small far away from the stuff.
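A quick numerical sketch of the "big volume" case, assuming the hypothetical choices u(x) = x and v(x) = exp(-x^2) on [-10, 10]. The boundary term uv is astronomically small there, so the integral of u dv and the integral of -v du agree (both equal -sqrt(pi)).

```python
import math

def trapezoid(f, a, b, n=4000):
    # Composite trapezoid rule on [a, b] with n equal steps.
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def u(x):
    return x

def du(x):
    return 1.0

def v(x):
    return math.exp(-x * x)

def dv(x):
    return -2.0 * x * math.exp(-x * x)

L = 10.0  # "big volume": far enough out that u * v is negligible

lhs = trapezoid(lambda x: u(x) * dv(x), -L, L)    # integral of u dv
rhs = -trapezoid(lambda x: v(x) * du(x), -L, L)   # integral of -v du
boundary = u(L) * v(L) - u(-L) * v(-L)            # the neglected uv term

print(lhs, rhs, boundary)
```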
I also think you could've done better with the intuition behind Bayes' rule as it's usually interpreted here on LW: it provides the updating rule posterior odds = prior odds * likelihood ratio, and thereby also formalizes how good a piece of evidence is. As for the derivation from P(A|B) defined as P(A and B)/P(B), I think this is best described by saying that P(A|B) is the probability of A once you know B. So you take the mass of the worlds where A is true among those where B is true, and compare it to your total mass, which is now the mass of the worlds where B is true. The former is really just "mass of A and B", so you are done.
Now, P(A and B) = P(B)P(A|B), which I think of as "First, take probability B is true, then given that we are in this set of worlds, take the probability that A is true". Essentially translating from locating sets to probabilities.
From here, Bayes' theorem is the simple fact that "A and B" is the same as "B and A". So P(B)P(A|B) = P(A and B) = P(A)P(B|A). If you draw a square split into 4 rectangles, where the first row is P(A), the second row is P(-A), the first column is P(B), and the second column is P(-B), with each rectangle representing a possibility like P(A and -B), then this equation just splits the rectangle P(A and B) into (rectangle compared to row) * row = (rectangle compared to column) * column. Divide by P(B) (that is, the column) to get Bayes' law.
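The square picture can be checked with exact fractions. This is a sketch with hypothetical joint probabilities for the four rectangles (rows A / -A, columns B / -B); it verifies both factorizations of P(A and B), Bayes' theorem, and the odds form.

```python
from fractions import Fraction

# Hypothetical joint probabilities: rows are A / -A, columns are B / -B.
JOINT = {
    ("A", "B"): Fraction(1, 10), ("A", "-B"): Fraction(1, 10),
    ("-A", "B"): Fraction(2, 10), ("-A", "-B"): Fraction(6, 10),
}

def row(a):
    # P(a): total mass of a row.
    return JOINT[(a, "B")] + JOINT[(a, "-B")]

def column(b):
    # P(b): total mass of a column.
    return JOINT[("A", b)] + JOINT[("-A", b)]

# Conditional probability = rectangle compared to its column or its row.
p_a_given_b = JOINT[("A", "B")] / column("B")
p_b_given_a = JOINT[("A", "B")] / row("A")
p_b_given_not_a = JOINT[("-A", "B")] / row("-A")

# Both factorizations recover the same rectangle P(A and B).
assert column("B") * p_a_given_b == row("A") * p_b_given_a == JOINT[("A", "B")]

# Bayes' theorem: divide by P(B), the column.
bayes = p_b_given_a * row("A") / column("B")

# Odds form: posterior odds = prior odds * likelihood ratio.
prior_odds = row("A") / row("-A")
likelihood_ratio = p_b_given_a / p_b_given_not_a
posterior_odds = prior_odds * likelihood_ratio

print(bayes, posterior_odds)  # 1/3 1/2
```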
For the sine rule, I think it also helps to show that the ratio a/sin(A) (a side over the sine of its opposite angle) is the diameter of the circumcircle. Wikipedia has good pictures.
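A small numerical sketch of the circumcircle fact, using a hypothetical triangle. Each side divided by the sine of its opposite angle is compared against the circumcircle's diameter, which is computed independently from the circumcenter.

```python
import math

def circumcenter(p, q, r):
    # Closed-form circumcenter of a non-degenerate triangle in the plane.
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ux, uy

def angle_at(vertex, p, q):
    # Interior angle at `vertex` between the rays toward p and q.
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    return math.atan2(abs(v1[0] * v2[1] - v1[1] * v2[0]), v1[0] * v2[0] + v1[1] * v2[1])

P, Q, R = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)  # hypothetical triangle
ratios = [
    math.dist(Q, R) / math.sin(angle_at(P, Q, R)),  # side opposite P over sin(angle at P)
    math.dist(P, R) / math.sin(angle_at(Q, P, R)),  # side opposite Q over sin(angle at Q)
    math.dist(P, Q) / math.sin(angle_at(R, P, Q)),  # side opposite R over sin(angle at R)
]
diameter = 2 * math.dist(circumcenter(P, Q, R), P)
print(ratios, diameter)  # all about 4.472 (2 * sqrt(5))
```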
For an extra math fact that totally doesn't need to be in the post: for spherical triangles, the law of sines just needs to be modified so that you take the sine of the side lengths as well. In fact, you can do something similar in hyperbolic space (using sinh), and there's a Taylor-series version of sine involving the curvature that makes the law of sines hold in any space of constant curvature (you can find this on the same wiki page).
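A sketch checking the spherical version on a unit sphere, where side lengths are central angles in radians. The vertex angles are measured geometrically (as the angle between the planes of the two great circles meeting at the vertex), so the check doesn't just reuse a spherical trig identity. The vertex vectors below are hypothetical; only their directions matter.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def angle(u, v):
    # Angle between two 3D vectors, clamped against rounding outside [-1, 1].
    c = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.acos(max(-1.0, min(1.0, c)))

def sine_ratios(A, B, C):
    # Side lengths on the unit sphere are the central angles between vertices.
    a, b, c = angle(B, C), angle(A, C), angle(A, B)
    # Vertex angle = angle between the planes of the two great circles there.
    alpha = angle(cross(A, B), cross(A, C))
    beta = angle(cross(B, A), cross(B, C))
    gamma = angle(cross(C, A), cross(C, B))
    return (math.sin(a) / math.sin(alpha),
            math.sin(b) / math.sin(beta),
            math.sin(c) / math.sin(gamma))

print(sine_ratios((1, 0, 0), (0, 1, 0), (0, 0, 1)))          # the octant triangle
print(sine_ratios((1, 0.2, 0.1), (0.3, 1, 0), (0.1, 0.4, 1)))  # three equal ratios
```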
Great explanation! I was linked here by someone after wondering why linear regression is asymmetric. While a quick Google search and ChatGPT could tell me that the two regressions minimize different things, the advantages of your post are:
- Pictures
- Explanation of why minimizing different things will get you slopes differing in this specific way (that is, far outliers are punished heavily)
- A connection to PCA that is nice and simply explained.
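A minimal sketch of the three lines, assuming a small hypothetical dataset with nonzero covariance. Regressing y on x, regressing x on y (re-expressed as a dy/dx slope), and taking the first principal component give three different slopes, with the PCA slope landing between the two regression slopes.

```python
import math

def slopes(xs, ys):
    # dy/dx slopes of three best-fit lines; assumes sxy != 0.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    y_on_x = sxy / sxx  # minimizes squared vertical distances
    x_on_y = syy / sxy  # minimizes squared horizontal distances, written as dy/dx
    # Largest eigenvalue of the scatter matrix [[sxx, sxy], [sxy, syy]];
    # its eigenvector is the first principal axis (squared perpendicular distances).
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    pca = (lam - sxx) / sxy
    return y_on_x, pca, x_on_y

xs = [1, 2, 3, 4, 5, 6]  # hypothetical data
ys = [2, 1, 4, 3, 6, 5]
print(slopes(xs, ys))
```

Vertical errors pull the y-on-x line toward flat, horizontal errors pull the x-on-y line toward steep, and perpendicular distances treat both axes symmetrically.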
Thanks!
For a treatment besides Tamiflu: https://en.wikipedia.org/wiki/2009_swine_flu_pandemic cites the WHO and CDC as stating that H1N1 developed resistance to Tamiflu but not to Relenza:
In December 2012, the World Health Organization (WHO) reported 314 samples of the 2009 pandemic H1N1 flu tested worldwide have shown resistance to oseltamivir (Tamiflu).[172] It is not totally unexpected as 99.6% of the seasonal H1N1 flu strains tested have developed resistance to oseltamivir.[173] No circulating flu has yet shown any resistance to zanamivir (Relenza), the other available anti-viral.[174]
The treatment plan at the time included Tamiflu, Relenza, and an experimental third drug (peramivir, FDA-approved for flu treatment in adults since 2014):
If oseltamivir (Tamiflu) is unavailable or cannot be used, zanamivir (Relenza) is recommended as a substitute.[50][168] Peramivir is an experimental antiviral drug approved for hospitalised patients in cases where the other available methods of treatment are ineffective or unavailable.[169]
I think 2009 H1N1 is a good example of how things could go, as it happened in the modern day.
I often find illustrative explanations like these either obvious or useless. But this was amazing! Those Venn diagrams really are an extremely simple, intuitive, and beautiful way to see Shapley values!
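A minimal sketch of the Venn-diagram picture, assuming a hypothetical three-player game in which a coalition's value is the total area covered by its members' sets. Computing Shapley values the standard way (averaging marginal contributions over all join orders) gives the same answer as splitting each Venn region equally among the players whose sets cover it.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical Venn diagram: each region is keyed by the players whose sets cover it.
REGIONS = {
    frozenset("a"): 4, frozenset("b"): 2, frozenset("c"): 1,
    frozenset("ab"): 3, frozenset("bc"): 6, frozenset("abc"): 3,
}
PLAYERS = "abc"

def value(coalition):
    # A coalition's value is the area of the union of its members' sets.
    return sum(area for owners, area in REGIONS.items() if owners & coalition)

def shapley_by_permutations(player):
    # Average marginal contribution over all orders in which players can join.
    orders = list(permutations(PLAYERS))
    total = Fraction(0)
    for order in orders:
        before = frozenset(order[:order.index(player)])
        total += value(before | {player}) - value(before)
    return total / len(orders)

def shapley_by_venn(player):
    # Venn picture: split every region equally among the players covering it.
    return sum(Fraction(area, len(owners))
               for owners, area in REGIONS.items() if player in owners)

for p in PLAYERS:
    print(p, shapley_by_permutations(p), shapley_by_venn(p))
```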
I think it makes sense to include the podcasts that aren't currently updating - for example, Rationally Speaking's old episodes. Affix needs a new link or an archived version, as the episodes are not listed at the current link, and I'm too lazy to track down the episodes.
I basically agree. The following is speculation/playing with an idea, not something I think is likely true.
Imagine it's the future. It becomes clear that a lab could easily create mirror bacteria if they wanted to, or even deliberately create mirror pathogens. It may even get to the point where countries explicitly threaten to do this.
At that point, it might be a good idea to develop mirror life for the purposes of developing countermeasures.
I'm not that familiar with how modern vaccines and drugs are made. Can a vaccine be made without involving a living cell? What about an antibiotic?
There's The Bayesian Conspiracy's discord server. No need to listen to the podcast or to related podcasts to participate in discussion.
They don't need to solve the whole halting problem, for the same reason you wouldn't contradict Rice's theorem if you had some proof (which I take as an axiom for the sake of the hypothetical) that the predictor is in fact perfect and utility-maximizing. Also, we can just say that there is a high probability that they will do this. Furthermore, you can imagine a restricted subset of Turing machines for which the halting problem is decidable. And the only computers that exist in reality are really finite-state machines.
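For the finite-state-machine point, a minimal sketch: a deterministic machine with finitely many states either halts or eventually revisits a state, after which it loops forever, so halting is decidable by running it and remembering the states seen. The step-function representation and the toy machine are hypothetical.

```python
def halts(step, start):
    # Decide halting for a deterministic machine with finitely many states.
    # step(state) returns the next state, or None when the machine halts.
    seen = set()
    state = start
    while state is not None:
        if state in seen:
            return False  # a repeated state means the run loops forever
        seen.add(state)
        state = step(state)
    return True

def toy_step(state):
    # Hypothetical machine on states 0..7: halts at 0, otherwise adds 2 mod 8.
    return None if state == 0 else (state + 2) % 8

print(halts(toy_step, 2))  # True: 2 -> 4 -> 6 -> 0 -> halt
print(halts(toy_step, 1))  # False: 1 -> 3 -> 5 -> 7 -> 1 -> ...
```

The loop runs at most one more time than there are states, which is why this trick fails for Turing machines with unbounded tape.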
Well, the perplexing situation doesn't actually happen if the predictors are good enough, because they'll predict you both won't update and won't take the bet. Thus you'll never have been approached in the first place.
There are 148.94 million km^2 of land area on Earth, not ~500 million as you claim (which is about the entire surface area of the Earth).
This is important because in the kinetic destruction section, you found that your lower bound on human habitation area is 5x larger than the total possible kinetic destruction area. However, your area number is 3.3x too big, since only about 30% of the Earth's surface is land. Thus your lower bound is only 1.5x larger than the possible destruction area, which makes the bound weaker: it's quite plausible that nukes might get 1.5x more destructive, or that the destruction would take out enough of humanity to be irrecoverable. In any case, the difference matters if you care about the scale of non-extinction risks.
(I caught this error by knowing that the circumference of the Earth is about 40k km, from which you can quickly estimate the surface area).
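The back-of-the-envelope check can be written out as follows. The 40,000 km circumference, the 148.94 million km^2 of land, the post's ~500 million km^2, and the 5x margin are the figures from this comment; everything else is arithmetic.

```python
import math

CIRCUMFERENCE_KM = 40_000  # rough circumference of the Earth
LAND_AREA_KM2 = 148.94e6   # Earth's land area
POST_AREA_KM2 = 500e6      # the post's figure (roughly the whole surface)
POST_MARGIN = 5            # habitation bound / kinetic destruction area, per the post

radius = CIRCUMFERENCE_KM / (2 * math.pi)
surface = 4 * math.pi * radius ** 2      # equals C**2 / pi
land_fraction = LAND_AREA_KM2 / surface
overestimate = POST_AREA_KM2 / LAND_AREA_KM2
corrected_margin = POST_MARGIN / overestimate

print(f"surface ~ {surface:.3g} km^2, land fraction ~ {land_fraction:.2f}")
print(f"overestimate ~ {overestimate:.2f}x, corrected margin ~ {corrected_margin:.2f}x")
```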
I assume your proposal requires trades be public, so that someone exploiting a proof to get free money ends up revealing the proof to others.
Until computerized theorem proving vastly improves, this system will only prove statements after the first proof is accepted.
This is a very good collection and distillation of rational college advice. However, there is very little advice from you about your own year, which is what the title made me expect.
I mention this because sometimes in rationalist contexts, I've felt a pressure to not talk about models that are missing Gears. I don't like that. I think that Gears-ness is a really super important thing to track, and I think there's something epistemically dangerous about failing to notice a lack of Gears. Clearly noting, at least in your own mind, where there are and aren't Gears seems really good to me. But I think there are other capacities that are also important when we're trying to get epistemology right
A good way to notice the lack of gears is to explicitly label the non-gearsy steps.
- My high school calculus teacher would draw a big cloud with the word "POOF!" while saying "Woogie Woogie Boogie!" whenever there was an unproven but vital statement (since high school calculus doesn't rigorously prove many of the calculus notions). Ever since, whenever I explain math to someone, I make very clear which statements I won't go through the proofs of or will prove later ("Magic"), and which ones I don't know the proofs of ("Dark Magic"), as opposed to those that I'll happily explain.
- Similarly, emergent phenomena should be called "Magic" (Though, this only works after internalizing that mysterious answers aren't answers. It's just "Gears work in mysterious ways", but in an absurd enough way to make it clear that the problem is with your understanding).
They aren't. brook is saying that picking locks might damage them, and damaging locks not in use at worst means you have to throw away a padlock, whereas damaging locks in use might mean you can't open your front door.
Woah, just on a watch-like device! How far along is this technology?
If this has been a thing for 30 years, why is the hardware best-in-class? Also, is there a presentation that is more impressive/innovative but perhaps less theatrical?
I think it is best to try to edit it anyway. I think if you have already seen the post, it does not take that long to see that there isn't a line added that is trolly. Also, you should do it for the sake of mathematical accuracy.
Hey! There are at least 3 channels where TBC-and-related-podcast-content is discussed!
(though, if you are only talking about the TBC podcast and not other podcasts hosted by the same people and that are plugged in the same places, then yes, there is only one channel).
I cannot speak for Scott, but I can speculate. I am quite sure a rock doesn't have qualia, because it doesn't have any processing center, gives no sign of having any utility to maximize, and has no reaction to stimuli. It most probably doesn't have a mind.