samshap's Shortform 2021-03-12T07:53:08.778Z


Comment by samshap on How should dance venues best protect the drinks of attendees? · 2021-09-22T14:56:11.432Z · LW · GW

Depends on the environment. My assumption is that the venue is sufficiently crowded that the tamperer would never be alone with the drink, and the main protection is their risk of being spotted.

A tamper proof solution would likely be far more costly to implement.

Comment by samshap on How should dance venues best protect the drinks of attendees? · 2021-09-20T20:07:10.789Z · LW · GW

Lids and straws. Presumably this would make slipping a drug in way more obvious.

Comment by samshap on Bayeswatch 7: Wildfire · 2021-09-08T13:06:27.729Z · LW · GW

"Miriam placed poker her hand against" should be "Miriam placed her hand" or "poked her hand"

Comment by samshap on What made the UK COVID-19 case count drop? · 2021-08-03T18:06:41.006Z · LW · GW

I think I agree. I hadn't realized the UK vaccination rates were so high. In that case I'll lean towards the pockets of unvaccinated reaching herd immunity + shorter incubation period hypothesis.

Comment by samshap on What made the UK COVID-19 case count drop? · 2021-08-02T16:45:20.588Z · LW · GW

I agree that this seems to explain it, but it raises a new question: how did the antibody rate get so high? Is it possible that part of Delta's contagiousness is that it has a lot more carriers who don't get sick?

Comment by samshap on Delta variant: we should probably be re-masking · 2021-07-26T18:02:41.390Z · LW · GW

Good point! I'll edit my fermi analysis to reflect that.

Comment by samshap on Delta variant: we should probably be re-masking · 2021-07-25T02:57:10.075Z · LW · GW

Even in a scenario where all unvaccinated people were infected with covid, I would expect none of the Georgetown undergraduates to die from covid or get covid longer than 12 weeks.

Here's my fermi analysis:

  • in your 20s, covid CFR is .0001, compared to .01 for population as a whole.
  • covid longer than 12 weeks is .03 for covid population as a whole.
  • assume really long covid scales similarly to death and hospitalization
  • mRNA reduces these both by .9.

That gives us .03 x .01 x .1 = .00003 as the per-case rate of really long covid, and .0001 x .1 = .00001 as the per-case death rate. Across 6532 students:

  • .00003 x 6532 = .2 cases of really long covid
  • .00001 x 6532 = .07 deaths
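The arithmetic above can be reproduced with a short script. This is only a sketch of the same Fermi estimate: the variable names are mine, the inputs are the rough figures from the list, and it assumes the worst case in which every one of the 6532 students is infected.

```python
# Fermi estimate, using only the rough figures quoted in the comment.
COHORT_SIZE = 6532             # cohort size used in the estimate
CFR_POPULATION = 0.01          # covid case fatality rate, whole population
CFR_TWENTIES = 0.0001          # covid case fatality rate, people in their 20s
LONG_COVID_POPULATION = 0.03   # share of cases with symptoms beyond 12 weeks
VACCINE_MULTIPLIER = 0.1       # "reduces these both by .9" leaves a factor of .1

# Assumption from the list: really long covid scales with age like death does.
age_scaling = CFR_TWENTIES / CFR_POPULATION

long_covid_rate = LONG_COVID_POPULATION * age_scaling * VACCINE_MULTIPLIER
death_rate = CFR_TWENTIES * VACCINE_MULTIPLIER

# Worst case: every member of the cohort is infected.
expected_long_covid = long_covid_rate * COHORT_SIZE
expected_deaths = death_rate * COHORT_SIZE

print(f"really long covid rate per case: {long_covid_rate:.5f}")
print(f"expected really long covid cases: {expected_long_covid:.2f}")
print(f"expected deaths: {expected_deaths:.2f}")
```

Changing any single input scales the result linearly, which makes it easy to see which assumption the conclusion is most sensitive to.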

And given that you are primarily interacting with other unvaccinated, young individuals, you are less likely to be infected than the average vaccinated person. So the real number is probably less than .1 person getting covid beyond 12 weeks.

Let me know if you see errors in my reasoning.

Comment by samshap on Delta variant: we should probably be re-masking · 2021-07-25T01:57:16.623Z · LW · GW

He recommends that for communities, which presumably include significant numbers of unvaccinated folks. Which, if targeted to N95 or better masks, and actually enforced, could have substantial effect!

But having members of the least infectious subpopulation voluntarily mask is pretty much useless.

As to your second point, there is strong evidence that is not the case: Vaccinated individuals who get infected have substantially lower viral loads, and thus are substantially less contagious.

Comment by samshap on Delta variant: we should probably be re-masking · 2021-07-24T05:12:41.481Z · LW · GW

You reach the opposite conclusion from Tomas Pueyo (who seems to be your primary reference):

"If you’re vaccinated, you’re mostly safe, especially with mRNA vaccines. Keep your guard up for now, avoid events that might become super-spreaders, but you don’t need to worry much more than that."

Checking your math, I think your biggest error is equating long covid (at least one symptom still present after 28 days) with lifelong CFS. The vast majority seem to clear up in the next 8 weeks:

I believe the 64% reduction in symptomatic infections is an outlier (compare with the UK data, e.g.), and if you've had an mRNA vaccine the number is much higher.

Finally, not accounting for age in your long covid statistics is a mistake. Young people are making up a large percentage of the infected because they are disproportionally unvaccinated. Those young and vaccinated are quite well protected from severe infection. And while some long covid comes from mild cases, it's highly correlated with severe cases.

Comment by samshap on Agency and the unreliable autonomous car · 2021-07-09T13:54:21.546Z · LW · GW

Second, the way that "IF .. THEN" is defined in propositional or first order seems not to capture quite what we mean by those words in ordinary language. I think this is part of what you are pointing out.


I feel like the confusion between propositional logic and ordinary language is the only reason Löb's theorem is even being discussed in the first place. The car's programmers used IF X THEN Y to represent the statement "If X, then Y happens", which means something quite different. Other than the incidental similarity of these statements in the English language, why is this more relevant than any other programming error?
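To make the gap concrete, here is a minimal sketch of what IF X THEN Y means in propositional logic (material implication). The function name is hypothetical; the point is only that the logical statement is true whenever X is false, which is not what "if X, then Y happens" suggests in ordinary language.

```python
from itertools import product


def material_implication(x: bool, y: bool) -> bool:
    """Truth value of IF X THEN Y in propositional logic: (not X) or Y."""
    return (not x) or y


# Enumerate all four combinations of X and Y.
truth_table = {
    (x, y): material_implication(x, y)
    for x, y in product([False, True], repeat=2)
}

for (x, y), value in truth_table.items():
    print(f"X={x!s:5} Y={y!s:5}  IF X THEN Y = {value}")
```

The only row that comes out false is X true and Y false, so the statement says nothing causal about what happens when X is made true.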

Comment by samshap on Am I anti-social if I get vaccinated now? · 2021-06-12T20:33:17.810Z · LW · GW

Fair. Since it's been better answered elsewhere, I withdrew the comment.

Comment by samshap on Am I anti-social if I get vaccinated now? · 2021-06-11T16:48:03.382Z · LW · GW

No. Getting vaccinated is prosocial. Do it ASAP.

In addition to what Willa said, even if the doses you don't take were magically redistributed to a poor country, it might not prevent any more infections than you getting a dose. Many poor countries have been able to control the infection well. And just because Switzerland has things under control now, doesn't mean that will be the case forever (see e.g. the Delta variant).

Comment by samshap on Covid 5/6: Vaccine Patent Suspension · 2021-05-06T23:56:05.581Z · LW · GW

You're reading the allahpundit/nytimes chart incorrectly (which is really the Times' fault - its presentation is terribly misleading, deceptive even).

It may look like it's saying what percentage of unvaccinated people went to a restaurant, but it's actually showing what percentage of people who went to a restaurant are unvaccinated. You can tell because the percentages across each column add to 100%.
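The "adds to 100%" check can be sketched as follows. The counts and group sizes below are invented purely for illustration and are not the chart's data; the function names are also mine. Shares of attendees sum to 100% within each activity, while participation rates within each vaccination group have no reason to.

```python
# Hypothetical survey counts, invented for illustration only.
# activity -> {vaccination status: number of respondents who did it}
counts = {
    "indoor restaurant": {"unvaccinated": 30, "partially": 20, "fully": 50},
    "outdoor gathering": {"unvaccinated": 40, "partially": 20, "fully": 40},
}
# Hypothetical total number of respondents in each vaccination group.
group_sizes = {"unvaccinated": 200, "partially": 100, "fully": 300}


def share_of_attendees(counts):
    """Percent of each activity's attendees that fall in each group."""
    shares = {}
    for activity, by_status in counts.items():
        total = sum(by_status.values())
        shares[activity] = {s: 100 * n / total for s, n in by_status.items()}
    return shares


def rate_within_group(counts, group_sizes):
    """Percent of each group that took part in each activity."""
    return {
        activity: {s: 100 * n / group_sizes[s] for s, n in by_status.items()}
        for activity, by_status in counts.items()
    }


shares = share_of_attendees(counts)
rates = rate_within_group(counts, group_sizes)

for activity in counts:
    print(activity,
          "| shares sum to", round(sum(shares[activity].values()), 1),
          "| rates sum to", round(sum(rates[activity].values()), 1))
```

If a chart's numbers sum to 100% along the vaccination-status axis, it is showing the first quantity, not the second.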

Reading it that way means it's somewhat good news. Relative to the unvaccinated, fully vaccinated people are more likely to engage in indoor meetings than outdoor meetings.

Comment by samshap on samshap's Shortform · 2021-04-19T20:10:44.917Z · LW · GW

Is this a failure of inner or outer alignment?

Comment by samshap on Predictive Coding has been Unified with Backpropagation · 2021-04-05T14:53:03.160Z · LW · GW

Incorrect. Perceptrons are a low fidelity (but still incredibly useful!) rate-encoded model of individual neurons.

Comment by samshap on Predictive Coding has been Unified with Backpropagation · 2021-04-05T14:40:07.675Z · LW · GW

Kind of. Neuromorphics don't buy you too much benefit for generic feedforward networks, but they dramatically reduce the expenses of convergence. Since the 100x in this paper derives from iterating until the network converges, a neuromorphics implementation (say on Loihi) would directly eliminate that cost.

Comment by samshap on Predictive Coding has been Unified with Backpropagation · 2021-04-05T14:33:52.767Z · LW · GW

TLDR for this paper: There is a separate set of 'error' neurons that communicate backwards. Their values converge on the appropriate back propagation terms.

A large error at the top levels corresponds to 'surprise', while a large error at the lower levels corresponds more to the 'override'.
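A toy version of the convergence claim, as a sketch only: this is a scalar two-layer chain with made-up weights, not the paper's algorithm, and the sign convention (error = value minus prediction, output node clamped to the target) is my own assumption. With the predictions held fixed, the hidden error unit relaxes to the backprop gradient up to sign.

```python
import math


def f(x):
    return math.tanh(x)


def f_prime(x):
    return 1.0 - math.tanh(x) ** 2


# Hypothetical scalar chain v0 -> v1 -> v2 with squared-error loss.
v0, w1, w2, target = 0.5, 0.8, -1.3, 0.2

# Forward pass: predictions (vhat) are computed once and then held fixed.
vhat1 = f(w1 * v0)
vhat2 = f(w2 * vhat1)

# Backprop reference: gradients of L = 0.5 * (vhat2 - target) ** 2.
grad_vhat2 = vhat2 - target
grad_vhat1 = grad_vhat2 * f_prime(w2 * vhat1) * w2

# Predictive coding: clamp the output node to the target, let v1 relax.
eps2 = target - vhat2                    # top-level error, fixed by the clamp
dvhat2_dv1 = f_prime(w2 * vhat1) * w2    # evaluated at the fixed prediction
v1 = vhat1                               # start from the forward-pass value
step = 0.1
for _ in range(500):
    eps1 = v1 - vhat1                    # hidden error unit
    v1 += step * (-eps1 + eps2 * dvhat2_dv1)
eps1 = v1 - vhat1

print("converged hidden error:", eps1)
print("negative backprop gradient:", -grad_vhat1)
```

The loop is the digital, iterative cost discussed in this thread: each backward "message" needs many relaxation steps before the error unit settles.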

Comment by samshap on Predictive Coding has been Unified with Backpropagation · 2021-04-05T14:07:54.639Z · LW · GW

I think that's premature. This is just one (digital, synchronous) implementation of one model of BNN that can be shown to converge on the same result as backprop. In a neuromorphic implementation of this circuit, the convergence would occur on the same time scale as the forward propagation.

Comment by samshap on Predictive Coding has been Unified with Backpropagation · 2021-04-03T13:01:28.170Z · LW · GW

Right side of equation 2. Also the v update step in algorithm 1 should have a negative sign (the text version earlier on the same page has it right).

Comment by samshap on Predictive Coding has been Unified with Backpropagation · 2021-04-03T03:28:51.439Z · LW · GW

Thanks for sharing!

Two comments:

  • There seem to be a couple of sign errors in the manuscript. (Probably worth reaching out to the authors directly)
  • Their predictive coding algorithm holds the vhat values fixed during convergence, which actually implies a somewhat different network topology than the more traditional one shown in your figure.

Comment by samshap on samshap's Shortform · 2021-03-15T05:44:08.676Z · LW · GW

Do you have some source for saying the log scoring rule should only be used when no anthropics are involved? Without that, what does it even mean to have a well-calibrated belief?

(BTW, there are other nice features of using the log-scoring rule, such as rewarding models that minimize their cross-entropy with the territory).

Comment by samshap on samshap's Shortform · 2021-03-12T14:47:38.606Z · LW · GW

My argument is that the log scoring rule is not just a "given way of measuring outcomes". A belief that maximizes E(log(p)) is the definition of a proper Bayesian belief. There's no appeal to consequence other than "SB's beliefs are well calibrated".

Comment by samshap on samshap's Shortform · 2021-03-12T07:53:08.995Z · LW · GW

Redissolving sleeping beauty (and maybe solving it entirely)

[epistemic status - I'm new to thinking about anthropics, but I don't see any obvious flaws]

"If a tree falls on Sleeping Beauty" famously claims to have dissolved the Sleeping Beauty problem - that SB's correct answer just depended on what the reward structure for her answers was, and that her actual credence didn't matter.

Several lesswrongers seem unsatisfied with that answer - understandably, given a longstanding commitment to epistemics and Bayesianism!

I would argue that ata did some key work in answering the problem from a purely epistemic perspective.

Recall the question SB is to be asked upon waking:

Each interview consists of one question, “What is your credence now for the proposition that our coin landed heads?”

And one of the bets ata formulated:

Each interview consists of one question, “What is your credence now for the proposition that our coin landed heads?”, and the answer given will be scored according to a logarithmic scoring rule, with the aggregate result corresponding to the number of utilons (converted to dollars, let’s say) she will be penalized after the experiment.

These questions are actually equivalent! A properly calibrated belief is one that is optimal w.r.t. the logarithmic scoring rule.

ata goes on to show that the answer to that question is 1/3. This result, I think, is actually contingent on the meaning of 'aggregate'. If 'aggregate' just means 'sum over all predictions ever', then ata's math checks out, the thirders are right, and the problem is solved.

However, given the premise of SB - in case of tails, she forgets everything that happened on Monday - you could argue for 'aggregate' meaning 'sum over all predictions she remembers making', in which case the correct answer is one half. Or if we include the log score for predictions that she was told she made (say, because the interviewers wrote them down and told her afterwards), then the answer becomes 1/3 again!

So the SB paradox boils down to what you, as an epistemic rationalist, consider the correct way to aggregate the entropy of predictions!

The 'sum over all predictions' seems best to me (and thus I suppose I lean to the 1/3 answer), but I don't have a definitive reason as to why.
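The dependence on 'aggregate' can be checked numerically. This is a minimal sketch with hypothetical function names: it assumes a fair coin, one scored interview on heads, and either two (sum over all predictions) or one (sum over remembered predictions) scored interviews on tails, with SB reporting the same credence every time.

```python
import math


def expected_aggregate_log_score(p_heads, n_heads_scored, n_tails_scored):
    """Expected total log score when SB always reports credence p_heads."""
    heads_branch = n_heads_scored * math.log(p_heads)
    tails_branch = n_tails_scored * math.log(1.0 - p_heads)
    return 0.5 * heads_branch + 0.5 * tails_branch


def best_credence(n_heads_scored, n_tails_scored, grid_size=3000):
    """Grid search for the credence that maximizes the expected score."""
    candidates = [i / grid_size for i in range(1, grid_size)]
    return max(
        candidates,
        key=lambda p: expected_aggregate_log_score(
            p, n_heads_scored, n_tails_scored
        ),
    )


# Sum over all predictions ever made: both tails interviews are scored.
all_predictions = best_credence(n_heads_scored=1, n_tails_scored=2)
# Sum over predictions she remembers making: only one tails interview counts.
remembered_only = best_credence(n_heads_scored=1, n_tails_scored=1)

print("all predictions scored:", round(all_predictions, 3))
print("remembered predictions only:", round(remembered_only, 3))
```

In closed form, maximizing n_heads log(p) + n_tails log(1 - p) gives p = n_heads / (n_heads + n_tails), so the answer is set entirely by how many interviews in each branch get scored.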

Comment by samshap on Defending the non-central fallacy · 2021-03-12T05:31:08.441Z · LW · GW

Suppose that you want to move to Hawaii because it's so beautiful, but you know (because you saw something on the internet) that upon arrival, someone will rob you. If knowing this information, you still move to Hawaii, does this mean that you are consenting to being robbed? Even if when you actually get to Hawaii, you make sure to explain to every potential robber that you really really don't want to be robbed?

Your argument here is both circular and an instance of the noncentral fallacy!

To recap:

In a debate with rohinmshah over whether taxation can be consensual (and therefore not theft), your argument reads:

  • Taxation is analogous to robbery
  • Robbery (even robbery that predictably occurs when I consume a good or service) is not consensual
  • Therefore, taxation (even taxation that predictably occurs when I consume a good or service) is not consensual
  • Therefore taxation is theft

I won't ding your OP for assuming that taxation is nonconsensual, since you were merely responding to Scott's arguments that had already conceded that point.

However, to argue that all taxes are always nonconsensual is clearly absurd.

Many taxes (especially local ones) are nearly identical to fees that private actors charge under similar terms (e.g. property taxes are equivalent to HOA fees and rents). Not to mention plenty of times when people explicitly consent to taxation!

If you want to strengthen your argument, limit it to: 'nonconsensual taxation is theft'.

Comment by samshap on We should not raise awareness · 2021-03-02T15:40:06.983Z · LW · GW

Level 10 is just a mix of 2 and 3.

Comment by samshap on We should not raise awareness · 2021-03-02T15:36:19.153Z · LW · GW

Most of these extra simulacra levels are redundant or orthogonal to the originals. I don't think they carve reality well.

  • L5 overlaps heavily with L1. Interestingness is a quality of most L1 statements that are worth communicating!
  • L7 is L2. "There's a lion across the river" = I want you to buy X. It's direct manipulation of reality.
  • L8 is L3 or L4 (in your example), although propaganda can also be at L2.
  • L10 overlaps heavily with L3. "We should raise awareness of X" = I'm part of the group that believes "X".

The only one that is salvageable is L6 (which is similar to L9), which I might call a True Level 5:

"There's a lion across the river." = Listen to me! I say things worth hearing! (from the speaker's perspective)

There's actually a lot of communication that falls within this bucket, characterized by the content of the statement having no instrumental value for the speaker. The speaker just wants your attention.

Comment by samshap on Book Summary: Consciousness and the Brain · 2021-01-15T04:33:46.891Z · LW · GW

Kaj_sotala's book summary provided me with something I hadn't seen before - a non-mysterious answer to the question of consciousness. And I say this as someone who took graduate level courses in neuroscience (albeit a few years before the book was published). Briefly, the book defines consciousness as the ability to access and communicate sensory signals, and shows that this correlates highly with those signals being shared over a cortical Global Neuronal Workspace (GNW). It further correlates with access to working memory. The review also gives a great account of the epistemic status of the major claims in the book. It reviews the evidence from several experiments discussed in the book itself. The review also goes beyond this, discussing the epistemic status of those experiments (e.g. in light of the replication crisis in psychology).

So kudos to both the book author and the review author. A decent follow-up would be to link these findings to the larger lesswrong agenda (although I note this review is part of a larger sequence that includes additional nominations).

Comment by samshap on Radical Probabilism · 2020-09-06T06:35:51.319Z · LW · GW

Hmmmm. Unfortunately I'm not sure what to say to this one except that in logical induction, there's not generally a pre-existing z we can update on like that.

So that's my real crux, and any examples with telephone calls and earthquakes etc are merely illustrative for me. (Like I said, I don't know how to actually motivate any of this stuff except with actual logical uncertainty, and I'm surprised that any philosophers would have become convinced just from other sorts of examples.)

I agree that the logical induction case is different, since it's hard to conceive of likelihoods to begin with. Basically, logical induction doesn't even include what I would call virtual evidence. But many of the examples you gave do have such a z. I think I agree with your crux, and my main critique here is just of the example of an overly dogmatic Bayesian who refuses to acknowledge the difference between a and z. I won't belabor the point further.

I've thought of another motivating example, BTW. In wartime, your enemy deliberately sends you some verifiably true information about their force dispositions. How should you update on that? You can't use a Bayesian update, since you don't actually have a likelihood model available. You can't even attempt to learn a model from the information, since you can't be sure it's representative.

I don't get this at all! What do you mean?

By model M, I mean an algorithm that generates likelihood functions, so M(H,Z) = P(Z|H).

So any time we talk about a likelihood P(Z|H), it should really read P(Z|H,M). We'll posit that P(H,M) = P(H)P(M) (i.e. that the model says nothing about our priors), but this isn't strictly necessary.

E(P(Z|H,M)) will be higher for a well calibrated model than a poorly calibrated model, which means that we expect P(H,M|Z) to also be higher. When we then marginalize over the models to get a final posterior on the hypothesis P(H|Z), it will be dominated by the well-calibrated models: P(H|Z) = SUM_i P(H|M_i,Z)P(M_i|Z).
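A small sketch of this joint estimation, with made-up numbers: two hypothetical models, a binary hypothesis, and P(H,M) = P(H)P(M) as posited above. It computes the posterior both from the joint P(H,M|Z) and from the mixture form SUM_i P(H|M_i,Z)P(M_i|Z) to show they agree.

```python
def joint_posterior(prior_h, prior_m, likelihood):
    """P(H, M | Z), with likelihood[m][h] = P(Z | H=h, M=m) for the observed Z."""
    unnormalized = {
        (h, m): likelihood[m][h] * prior_h[h] * prior_m[m]
        for m in prior_m
        for h in prior_h
    }
    evidence = sum(unnormalized.values())
    return {key: value / evidence for key, value in unnormalized.items()}


def mixture_form(prior_h, prior_m, likelihood):
    """P(H | Z) as SUM_i P(H | M_i, Z) P(M_i | Z), plus P(M_i | Z) itself."""
    # P(Z | M_i) = SUM_H P(Z | H, M_i) P(H)
    model_evidence = {
        m: sum(likelihood[m][h] * prior_h[h] for h in prior_h) for m in prior_m
    }
    total = sum(model_evidence[m] * prior_m[m] for m in prior_m)
    post_m = {m: model_evidence[m] * prior_m[m] / total for m in prior_m}
    post_h = {
        h: sum(
            likelihood[m][h] * prior_h[h] / model_evidence[m] * post_m[m]
            for m in prior_m
        )
        for h in prior_h
    }
    return post_h, post_m


# Hypothetical inputs, chosen only to make the arithmetic visible.
prior_h = {"H": 0.8, "not H": 0.2}
prior_m = {"model_a": 0.5, "model_b": 0.5}
likelihood = {
    "model_a": {"H": 0.9, "not H": 0.1},   # assigns high likelihood under H
    "model_b": {"H": 0.5, "not H": 0.5},   # uninformative about Z
}

joint = joint_posterior(prior_h, prior_m, likelihood)
marginal_h = {h: sum(joint[(h, m)] for m in prior_m) for h in prior_h}
post_h, post_m = mixture_form(prior_h, prior_m, likelihood)

print("P(H | Z) from the joint:", marginal_h)
print("P(H | Z) from the mixture:", post_h)
print("P(M | Z):", post_m)
```

The model that assigned more probability to what was actually observed ends up with the larger posterior weight, and so contributes more to the final P(H|Z).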

BTW, I had a chance to read part of the ILA paper. It barely broke my brain at all! I wonder if the trick of enumerating traders and incorporating them over time could be repurposed to a more Bayesianish context, by instead enumerating models M. Like the trading firm in ILA, a meta-Bayesian algorithm could keep introducing new models M_k over time, with some intuition that the calibration of the best model in the set would improve over time, perhaps giving it all those nice anti-dutch book properties. Basically this is a computable Solomonoff induction, that slowly approaches completeness in the limit. (I'm pretty sure this is not an original idea. I wouldn't be surprised if something like this contributed to the ILA itself).

Of course, it's pretty unclear how this would work in the logical induction case. This might all be better explained in its own post.

Comment by samshap on Radical Probabilism · 2020-08-25T03:51:32.599Z · LW · GW

You're right, you could have an event in the event space which is just "the virtual-evidence update [such-and-such]". I'm actually going to pull out this trick in a future follow-up post.
I note that that's not how Pearl or Jeffrey understand these updates. And it's a peculiar thing to do -- something happens to make you update a particular amount, but you're just representing the event by the amount you update. Virtual evidence as-usually-understood at least coins a new symbol to represent the hard-to-articulate thing you're updating on.

That's not quite what I had in mind, but I can see how my 'continuously valued' comment might have thrown you off. A more concrete example might help: consider Example 2 in this paper. It posits three events:

b - my house was burgled

a - my alarm went off

z - my neighbor calls to tell me the alarm went off

Pearl's method is to take what would be uncertain information about a (via my model of my neighbor and the fact she called me) and transform it into virtual evidence (which includes the likelihood ratio). What I'm saying is that you can just treat z as being an event itself, and do a Bayesian update from the likelihood P(z|b)=P(z|a)P(a|b)+P(z|~a)P(~a|b), etc. This will give you the exact same posterior as Pearl. Really, the only difference in these formulations is that Pearl only needs to know the ratio P(z|a):P(z|~a), whereas traditional Bayesian update requires actual values. Of course, any set of values consistent with the ratio will produce the right answer.
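Here is a minimal sketch of that claim, treating z as an ordinary event. The function name and all the numbers are illustrative assumptions, and the code assumes z depends on b only through a. Two different pairs of values for P(z|a) and P(z|~a) with the same 4:1 ratio give the same posterior.

```python
def posterior_burglary(prior_b, p_a_given_b, p_a_given_not_b,
                       p_z_given_a, p_z_given_not_a):
    """P(b | z), assuming z depends on b only through a."""
    # P(z|b) = P(z|a)P(a|b) + P(z|~a)P(~a|b), and likewise for ~b.
    p_z_given_b = (p_z_given_a * p_a_given_b
                   + p_z_given_not_a * (1 - p_a_given_b))
    p_z_given_not_b = (p_z_given_a * p_a_given_not_b
                       + p_z_given_not_a * (1 - p_a_given_not_b))
    numerator = p_z_given_b * prior_b
    return numerator / (numerator + p_z_given_not_b * (1 - prior_b))


# Illustrative numbers only.
shared = dict(prior_b=0.0001, p_a_given_b=0.95, p_a_given_not_b=0.01)

# Two sets of actual values with the same ratio P(z|a) : P(z|~a) = 4 : 1.
posterior_first = posterior_burglary(
    **shared, p_z_given_a=0.8, p_z_given_not_a=0.2
)
posterior_second = posterior_burglary(
    **shared, p_z_given_a=0.4, p_z_given_not_a=0.1
)

print("posterior with (0.8, 0.2):", posterior_first)
print("posterior with (0.4, 0.1):", posterior_second)
```

Scaling both likelihoods by the same constant scales P(z|b) and P(z|~b) equally, so the constant cancels in the normalization.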

The slightly more complex case (and why I mentioned continuous values) is in section 5, where the message includes probability data, such as a likelihood ratio. Note that the continuous value is not the amount you update (at least not generally), because it's not generated from your own models, but rather by the messenger. Consider event z99, where my neighbor calls to say she's 99% sure the alarm went off. This doesn't mean I have to treat P(z99|b):P(z99|~b) as 99:1 - I might model my neighbor as being poorly calibrated (or as not being independent of other information I already have), and use some other ratio.

In what sense? What technical claim about Bayesian updates are you trying to refer to?

Definitely the second one, as optimal update policy. Responding to your specific objections:

This is only true if the only information we have coming in is a sequence of propositions which we are updating 100% on.

As you'll hopefully agree with at this point, we can always manufacture the 100% condition by turning it into virtual evidence.

This optimality property only makes sense if we believe something like grain-of-truth.

I believe I previously conceded this point - the true hypothesis (or at least a 'good enough' one) must have a nonzero probability, which we can't guarantee.

But properties such as calibration and convergence also have intuitive appeal

Re: calibration - I still believe that this can be included if you are jointly estimating your model and your hypothesis.

Re: convergence - how real of a problem is this? In your example you had two hypotheses that were precisely equally wrong. Does convergence still fail if the true probability is 0.500001?

(By the way, I really appreciate your in-depth engagement with my position.)

Likewise! This has certainly been educational, especially in light of this:

Sadly, the actual machinery of logical induction was beyond the scope of this post, but there are answers. I just don't yet know a good way to present it all as a nice, practical, intuitively appealing package.

The solution is too large to fit in the margins, eh? j/k, I know there's a real paper. Should I go break my brain trying to read it, or wait for your explanation?

Comment by samshap on Radical Probabilism · 2020-08-25T01:50:30.038Z · LW · GW

Phew! Thanks for de-gaslighting me.

Comment by samshap on Radical Probabilism · 2020-08-22T05:58:26.619Z · LW · GW

I definitely missed a few things on the first read through - thanks for repeating the ratio argument in your response.

I'm still confused about this statement:

Virtual evidence requires probability functions to take arguments which aren't part of the event space.

Why can't virtual evidence messages be part of the event space? Is it because they are continuously valued?

As to why one would want to have Bayesian updates be normative: one answer is that they maximize our predictive power, given sufficient compute. Given the name of this website, that seems a sufficient reason.

A second answer you hint at here:

The second seems more practical for the working Bayesian.

As a working Bayesian myself, having a practical update rule is quite useful! As far as I can tell, I don't see a good alternative in what you have provided.

Then we have to ask why not (steelmanned) classical Bayesianism? I think you have two arguments, one of which I buy, the other I don't.

The practical problem with this, in contrast to a more radical-probabilism approach, is that the probability distribution then has to explicitly model all of that stuff.

This is the weak argument. Computing P(A*|X) "the likelihood I recall seeing A given X" is not a fundamentally different thing than modeling P(A|X) "the likelihood signal A happened given X". You have to model an extra channel effect or two, but that's just a difference of degree.

Immediately after, though, you have the better argument:

As Scott and I discussed in Embedded World-Models, classical Bayesian models require the world to be in the hypothesis space (AKA realizability AKA grain of truth) in order to have good learning guarantees; so, in a sense, they require that the world is smaller than the probability distribution. Radical probabilism does not rest on this assumption for good learning properties.

If I were to paraphrase: classical Bayesianism can fail entirely when the world state does not fit into one of its nonzero-probability hypotheses, which must of necessity be limited in any realizable implementation.

I find this pretty convincing. In my experience this is a problem that crops up quite frequently, and requires meta-Bayesian methods you mentioned like calibration (to notice you are confused) and generation of novel hypotheses.

(Although Bayesianism is not completely dead here. If you reformulate your estimation problem to be over the hypothesis space and model space jointly, then Bayesian updates can get you the sort of probability shifts discussed in Pascal's Muggle. Of course, you still run into the 'limited compute' problem, but in many cases it might be easier than attempting to cover the entire hypothesis space. Probably worth a whole other post by itself.)

Comment by samshap on Radical Probabilism · 2020-08-21T04:18:31.796Z · LW · GW

Why is a dogmatic Bayesian not allowed to update on virtual evidence? It seems like you (and Jeffrey?) have overly constrained the types of observations that a classical Bayesian is allowed to use, to essentially sensory stimuli. It seems like you are attacking a strawman, given that by your definition, Pearl isn't a classical Bayesian.

I also want to push back on this particular bit:

Richard Jeffrey (RJ): Tell me one piece of information you're absolutely certain of in such a situation.
DP: I'm certain I had that experience, of looking at the cloth.
RJ: Surely you aren't 100% sure you were looking at cloth. It's merely very probable.
DP: Fine then. The experience of looking at ... what I was looking at.

I'm pretty sure we can do better. How about:

DP: Fine then. I'm certain I remember believing that I had seen that cloth.

For an artificial dogmatic probabilist, the equivalent might be:

ADP: Fine then. I'm certain of evidence A* : that my probability inference algorithm received a message with information about an observation A.

Essentially, we update on A* instead of A. When we compute the likelihood P(A*|X), we can attempt to account for all the problems with our senses, neurons, memory, etc. that result in P(A*|~A) > 0.

RJ still has a counterpoint here:

RJ: Again I doubt it. You're engaging in inner-outer hocus pocus.* There is no clean dividing line before which a signal is external, and after which that signal has been "observed". The optic nerve is a noisy channel, warping the signal. And the output of the optic nerve itself gets processed at V1, so the rest of your visual processing doesn't get direct access to it, but rather a processed version of the information. And all this processing is noisy. Nowhere is anything certain. Everything is a guess. If, anywhere in the brain, there were a sharp 100% observation, then the nerves carrying that signal to other parts of the brain would rapidly turn it into a 99% observation, or a 90% observation...

But I don't find this compelling. At some point there is a boundary to the machinery that's performing the Bayesian update itself. If the message is being degraded after this point, then that means we're no longer talking about a Bayesian updater.

Comment by samshap on [deleted post] 2020-07-14T01:28:45.785Z

Thanks for presenting your thesis. However, one of your figures doesn't support your argument on closer inspection. The figure that you point to as being the 'unfiltered' data is measuring cross-correlation between the Hanford and Livingston datasets, so we should expect it to look completely different than the datasets themselves.

I also want to push back on a particular point - there's nothing wrong in principle with using a black-hole shaped filter to find black holes. You just have to adjust the prior based on the complexity of your filter.

Comment by samshap on The Hammer and the Dance · 2020-03-21T05:51:24.044Z · LW · GW

I've been lurking lesswrong for years, and this is the article that actually got me to create an account. I am promoting this to everyone I can that has a scrap of political influence - my bosses (I work at a major university), my local newspaper, my rabbis, my local politicians. Every state in the country should be enacting the same measures as New York and Texas.

I would urge the lesswrong community to

a: constructively critique the article as Chris recommends (use argument to make it stronger)

b: shut up and do the impossible - if your state governor hasn't already shut down restaurants and public gatherings and restricted all non-essential travel, get them to do it ASAP. If we figured out how to get a handler to unbox a superhuman intelligence, and how to defeat Voldemort, we at least owe this an attempt.