Comments

Comment by Daniel V on The lying p value · 2024-11-12T22:28:36.573Z

I'm here to say this is not a property specific to p-values; it's about the credibility of the communicator.

If  make a bunch of errors all the time, especially errors that change their conclusions, then indeed you can't trust them. It turns out (BW11) that  are more credible than , and the errors they make tend not to change the conclusions of the test (i.e., the chance of drawing a wrong conclusion from their data, a "gross error" in BW11, was much lower than the headline error rate). And (admittedly I'm going out on a limb here) it is very possible that the errors that change the conclusion of a particular test do not change the overall conclusion about the general theory. For example, if theory says X, Y, and Z should happen, and you find support for X and Y and marginal support for Z that is now no longer significant, the theory is still pretty intact unless you really care about using p-values in a binary fashion. If theory says X, Y, and Z should happen, and you find support for X and Y and support for Z that is now not significant anymore, that's more of an issue. But given how many tests are in a paper, it's also possible that theory says X, Y, and Z should happen, you find support for X, Y, and Z, and it turns out your conclusion about W reverses, which may or may not really say something about your theory.
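To make the BW11 distinction between an ordinary reporting error and a "gross error" concrete, here's a minimal sketch, assuming two-sided z tests only (real consistency checks also handle t, F, and chi-square statistics, which need those distributions). The function names and the 0.005 rounding tolerance are hypothetical choices, not BW11's exact procedure.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def classify_reported_result(z: float, reported_p: float,
                             alpha: float = 0.05, tol: float = 0.005) -> str:
    """Compare a reported p-value with the one implied by the reported z.

    "consistent":  they agree within the (assumed) rounding tolerance.
    "gross error": they disagree AND fall on opposite sides of alpha,
                   so the significance decision flips.
    "error":       they disagree, but the decision is unchanged.
    """
    recomputed = two_sided_p_from_z(z)
    if abs(recomputed - reported_p) <= tol:
        return "consistent"
    if (recomputed < alpha) != (reported_p < alpha):
        return "gross error"
    return "error"

print(classify_reported_result(2.5, 0.012))  # consistent (recomputed p is about 0.0124)
print(classify_reported_result(3.0, 0.02))   # error (wrong p, same conclusion)
print(classify_reported_result(1.8, 0.04))   # gross error (recomputed p is about 0.072)
```

Only the third case changes what a reader concludes from the test, which is the kind of error whose rate BW11 found to be much lower than the overall error rate.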

I don't think it is wise to throw the baby out with the bathwater.

Comment by Daniel V on Monthly Roundup #23: October 2024 · 2024-10-18T00:43:01.463Z

Supply side: Price approaches the minimum average total cost, not marginal cost. Maybe if people accounted for home cooking more finely (e.g., charging themselves "wages" and "rent"), cooking at home would be in the same ballpark (assuming equal quality of inputs and outputs across venues), but that just illustrates how real costs can explain a lot of the differential without having to jump to regulation and barriers to entry (yes, those are nonzero too!).
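A minimal sketch of the supply-side point, assuming a hypothetical restaurant cost function C(q) = F + c*q + d*q^2, where F stands in for fixed costs like rent and c for ingredients per plate (all numbers are made up). In a competitive long run, price gets pushed toward the bottom of the average total cost (ATC) curve, which sits well above the ingredient cost a home cook usually counts.

```python
import math

def average_total_cost(q: float, fixed: float, c: float, d: float) -> float:
    """ATC for the hypothetical cost function C(q) = fixed + c*q + d*q**2."""
    return fixed / q + c + d * q

def marginal_cost(q: float, c: float, d: float) -> float:
    """MC = dC/dq = c + 2*d*q (fixed costs drop out)."""
    return c + 2 * d * q

def min_atc(fixed: float, c: float, d: float) -> tuple[float, float]:
    """Output and cost at the bottom of the ATC curve.

    d(ATC)/dq = -fixed/q**2 + d = 0 gives q* = sqrt(fixed/d),
    where ATC(q*) = c + 2*sqrt(fixed*d), which also equals MC(q*).
    """
    q_star = math.sqrt(fixed / d)
    return q_star, average_total_cost(q_star, fixed, c, d)

# Made-up numbers: $2,000/day fixed costs, $4/plate ingredients, rising congestion costs.
q_star, price_floor = min_atc(fixed=2000.0, c=4.0, d=0.05)
print(q_star, price_floor, marginal_cost(q_star, c=4.0, d=0.05))
```

With these numbers the bottom of the ATC curve is $24 per plate against $4 of ingredients. Marginal cost also equals $24 at that point (it always crosses ATC at its minimum), so the gap to the $4 ingredient cost is real fixed and scale-related cost, not regulation.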

Demand side: Complaints in the OP about the uninformativeness of ratings also highlight how far we are from perfect competition (as do, e.g., heterogeneous products), so you can expect nonzero markups. We aren't in equilibrium, and in the long run we're all dead, etc.

I'm a big proponent of starting with the textbook economic analysis, but I was surprised by the surprise. Let's even assume perfect accounting and competition:

Draw a restaurant supply curve in the middle of the graph. In the upper right corner, draw a restaurant demand curve (high demand given all the benefits I listed). Equilibrium price is P_r*. Now draw a home supply curve to the far left, indicating an inefficient supply relative to restaurants (for the same quantity, restaurants do it "cheaper"). In the bottom left corner, draw a home demand curve (again, the point is that I demand eating out more than eating at home). Equilibrium price for those is P_h*. It's very easy to draw a case where P_h* < P_r*.
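The drawing exercise can be sketched numerically with straight-line curves. All intercepts and slopes here are hypothetical, picked only to match the description: restaurant supply in the middle with high demand, home supply shifted far left (costlier per plate at any quantity) with low demand.

```python
def equilibrium(supply_intercept: float, supply_slope: float,
                demand_intercept: float, demand_slope: float) -> tuple[float, float]:
    """Intersect inverse supply P = a_s + b_s*Q with inverse demand P = a_d - b_d*Q."""
    q = (demand_intercept - supply_intercept) / (supply_slope + demand_slope)
    p = supply_intercept + supply_slope * q
    return q, p

# Restaurants: supply in the middle of the graph, high demand (upper right).
q_r, p_r = equilibrium(5.0, 0.5, 45.0, 0.5)
# Home: supply far to the left (costlier per plate at any quantity), low demand.
q_h, p_h = equilibrium(8.0, 2.0, 20.0, 2.0)
print(f"P_r* = {p_r}, P_h* = {p_h}")  # P_r* = 25.0, P_h* = 14.0
```

Home cooking ends up cheaper per plate even though it is the less efficient supplier here; the low home demand does the work.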

Comment by Daniel V on Monthly Roundup #23: October 2024 · 2024-10-16T14:43:51.733Z

Cooking at Home Being Cheaper is Weird

 

I like the argument that scale should make the average marginal cost per plate lower in restaurants than at home, but I don't find cooking at home being cheaper weird at all. First, there are real fixed costs to account for, not just regulatory costs.

More importantly, the average price per plate is not just a function of costs; it's also a function of the value that people receive. Cooking at home does give some nice benefits, but eating out gives some huge ones: essentially leisure, time savings (a lot of things get prepped before service), no dishes, and possibly lower search costs ("what's for dinner tonight?").

Comment by Daniel V on Prices are Bounties · 2024-10-13T13:50:40.450Z

A classic that seemingly will have to be reargued until the end of time. Other allocation methods are not clearly more egalitarian and are less efficient (it depends on the correlation matrix of WTP, need, time budget, etc., plus one's own judgment of fairness, but money prices come out looking great a lot of the time). In some cases, even prices don't perform great (addressed in some comments on this post), but they're better than the alternatives.

For more reading: https://www.lesswrong.com/posts/gNodQGNoPDjztasbh/lies-damn-lies-and-fabricated-options?commentId=nG2X7x3n55cb3p7yB

Comment by Daniel V on Robin Hanson AI X-Risk Debate — Highlights and Analysis · 2024-07-13T12:34:02.673Z

To get Robin worried about AI doom, I'd need to convince him that there's a different metric he needs to be tracking

That, or explain the factors behind why Robin should update his timeline for AI/computer automation taking "most" of the jobs.

AI Doom Scenario

Robin's take here strikes me both as that of an uncooperative thought-experiment participant and as a decently considered position. It's as if he hasn't even skimmed the top doom scenarios discussed in this space (and that's coming from me, someone who has probably thought less about this space than Robin). See also his equating of corporations with superintelligence: he's not keyed into the doomer use of the term and isn't paying attention to the range of values it could take.

On the other hand, I find some affinity between his take and my own skepticism of AI doom; my vibe is that the common ground lies in the notion that authorization lines will be important.

On the other other hand, once the authorization bailey is under siege by the superhuman intelligence aspect of the scenario, Robin retreats to the motte that there will be billions of AIs and (I guess unlike humans?) they can't coordinate. Sure, corporations haven't taken over the government and there isn't one world government, but in many cases, tens of millions of people coordinate to form a polity, so why would we assume all AI agents will counteract each other?

It was definitely a fun section and I appreciate Robin making these points, but I'm finding myself about as unassuaged by Robin's thoughts here as I am by my own.

Robin: We have this abstract conception of what it might eventually become, but we can't use that abstract conception to do very much now about the problems that might arise. We'll need to wait until they are realized more.

When talking about doom, I think a pretty natural comparison is nuclear weapons development. And I believe that analogy highlights how much more right Robin is here than doomers might give him credit for. Obviously a lot of abstract thinking and scenario consideration went into developing the atomic bomb, but a lot of safeguards were also developed as they built prototypes and encountered snags. If Robin is correct that no prototype or abstraction will allow us to address safety concerns, so that we need to be dealing with the real thing to understand it, then I think a biosafety analogy still helps his point. If you're dealing with GPT-10 before public release, train it, give it no authorization lines, and train the people (plural) studying it not to follow its directions. In line with Robin's competition views, use GPT-9 agents to help out on assessments if need be. But again, Robin's perspective here falls flat and offers little assurance if it just devolves into "let it into the wild, then deal with it."

A great debate and post, thanks!

Comment by Daniel V on Monthly Roundup #19: June 2024 · 2024-06-25T17:52:54.973Z

Paper from the Federal Reserve Bank of Dallas estimates 150%-300% returns to government nondefense R&D over the postwar period on business sector productivity growth. They say this implies underfunding of nondefense R&D, but that is not right. One should assume decreasing marginal returns, so this is entirely compatible with the level of spending being too high. I also would not assume conditions are unchanged and spending remains similarly effective.

 

At low returns, you might question whether investing more is good enough compared to other options (e.g., at 5%, simply not incurring the added deficit, financed at 5%, is arguably preferable; at 7%, your value function might still be such that not incurring the added deficit financed at 5% is preferable). But at such high returns, unless you think the private sector is achieving marginal returns in the same ballpark, invest, baby, invest! For it not to make sense to invest more, marginal returns would have to diminish insanely fast, which would imply we're already investing at just about the optimal level (if the marginal return of the next $1 were 0%, we shouldn't invest more, but we shouldn't invest less either, because our current marginal return is 150%). Holding skepticism about the estimated return itself would be a different story.
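To put numbers on "insanely diminishing," here's a minimal sketch assuming a hypothetical marginal-return schedule that decays exponentially with extra spending, m(x) = m0 * exp(-k*x), where x is additional spending in arbitrary units. It takes the 150% figure at face value as the current marginal return, as the comment does, and finds how much more you'd spend before the next dollar earns only the 5% cost of funds.

```python
import math

def marginal_return(extra: float, m0: float, k: float) -> float:
    """Hypothetical return on the next dollar after `extra` additional spending."""
    return m0 * math.exp(-k * extra)

def extra_spending_to_optimum(m0: float, cost_of_funds: float, k: float) -> float:
    """Solve m0 * exp(-k * x) = cost_of_funds for x (0 if already at or below it)."""
    if m0 <= cost_of_funds:
        return 0.0
    return math.log(m0 / cost_of_funds) / k

for k in (0.5, 1.0, 10.0):  # larger k = more sharply diminishing returns
    x = extra_spending_to_optimum(m0=1.5, cost_of_funds=0.05, k=k)
    print(f"decay {k}: spend {x:.2f} more units before the marginal return hits 5%")
```

Even with very fast decay the optimum is more spending, not less; the answer only collapses to zero if the current marginal return is already at the cost of funds, which is where skepticism about the estimate itself would come in.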

Comment by Daniel V on Childhood and Education Roundup #5 · 2024-04-17T13:50:39.103Z

That is an additional 15% of kids not sleeping seven hours


I was not aware of the concomitant huge drop in sleep (though it's obvious in retrospect). Maybe it's more important to limit screen time at night, when you're alone in your room not sleeping. Being constantly lethargic as a result may also contribute to depressive symptoms (and be one itself). It will be very important to figure out the mechanism(s) by which smartphone use hurts kids.

Comment by Daniel V on The Poker Theory of Poker Night · 2024-04-08T12:37:19.193Z

I agree. I was thinking more generally that this isn't a "poker" theory specifically, just one about rules and buy-in. But it's about poker night, so I'll let it slide. The main game rules, though, remain extraneous. Loved the post still!

Comment by Daniel V on Economics Roundup #1 · 2024-03-26T14:36:40.850Z

Mira: You should be able to buy anything with a limit order.

“I don’t feel like paying $250 for an anime figurine, but I left an order up for $50”

If they saw 10,000 orders at a lower price rung ...

As usual the answer is transaction costs

Agreed, and also perceptions. The idea here is to facilitate price discovery and price discrimination. If only we knew people's WTP and could serve them lower prices that are acceptable to us when volume isn't moving at the current price! We can adjust prices ad hoc, but maybe a little upfront market research would be better, and an exchange might be smoother (subject to TCs). The flip side of this has the problem that consumers hate it [Reuters]. Also, hedging (see: futures markets) does happen in B2B, but with more sophisticated owners and larger businesses. Supply chains are constantly trying to optimize inventory management (again, not the mom-and-pops you see on save-my-business shows).
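A minimal sketch of the price-discovery idea: treat each resting buy limit order as a revealed willingness to pay and pick the single posted price that maximizes profit. The order book and unit cost are hypothetical, and this ignores the transaction costs and consumer perceptions that make real exchanges for retail goods hard.

```python
def best_uniform_price(limit_orders: list[float], unit_cost: float) -> tuple[float, int, float]:
    """Single posted price that maximizes profit against resting buy limit orders.

    Each entry is one buyer's maximum price for one unit; a buyer purchases if the
    posted price is at or below their limit. Only the limit prices themselves need
    checking: between two adjacent limits, raising the price never loses a buyer.
    Returns (price, units sold, profit), or (0.0, 0, 0.0) if no price beats unit cost.
    """
    best = (0.0, 0, 0.0)
    for price in sorted(set(limit_orders)):
        units = sum(1 for limit in limit_orders if limit >= price)
        profit = (price - unit_cost) * units
        if profit > best[2]:
            best = (price, units, profit)
    return best

# Hypothetical book: 100 buyers at $250, 10,000 buyers at the lower $50 rung.
orders = [250.0] * 100 + [50.0] * 10_000
print(best_uniform_price(orders, unit_cost=20.0))  # (50.0, 10100, 303000.0)
```

Filling each order at its own limit price instead of one posted price would be first-degree price discrimination.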

Comment by Daniel V on Monthly Roundup #9: August 2023 · 2023-08-07T15:11:42.060Z

Why is turbulence worse on planes? The headlines blame it on ‘climate change.’ The actual answer is the FAA told airlines to prioritize saving fuel over passenger comfort, despite passengers having a strong revealed preference for spending the extra cost of fuel to have a more pleasant flight. This then became ‘because climate change.’ This kind of thing damages public trust in all such claims, making solving climate change (and everything else) that much harder.

There are benefits to optimized profile descents (fuel, time, fewer air traffic controller instructions, reduced noise over populated areas), and studies were done to confirm them, since in high-traffic airspace the stepwise approach can be easier for ATC. This change could conceivably increase turbulence on approach, but it would not explain the increase that "the narrative" attributes to increased wind shear at higher altitudes.

Comment by Daniel V on You don't get to have cool flaws · 2023-07-28T18:24:02.087Z

I agree with Neil here: if you identify with your flaws, that is bad. By definition. If you are highly analytical and you identify with it, great, regardless of whether other people see it as a flaw. As you said (and as Neil's reply in the footnote says), if it's a goal, then it is not a flaw. But if you say it is a personal flaw, then either you shouldn't be adopting it into your identity (you don't even have to try to fix it, noble as that would be, but you don't get to say "I'm the bad-at-math person, it's so funny and quirky, and I just led my small business and partners into financial ruin with an arithmetic mistake"; life is not a sitcom) or maybe you don't really see it as a flaw after all. Either way, something is wrong, either in your priorities or in the reliability of your self-reports. And, yeah, this topic involves value judgments. If nothing had valence, then the notion of a flaw would not exist.

Comment by Daniel V on Contra Contra the Social Model of Disability · 2023-07-21T18:03:06.885Z

I quite appreciate the post's laying things out, but it's not convincing regarding Scott's post (it's not bad either, just not convincing!) because it doesn't offer much more than "no, you're wrong." The crux of the argument presented here is taking the word disability, which to most speakers means X and implies Y, and breaking it into an impairment, which means X, and a disability, which is Y. Scott says this is wrong and explains why he thinks so. DirectedEvolution says Scott is wrong "because the definitions say..." but that's exactly what Scott is complaining about.

For example, if you're short-sighted, normally we'd say "you have a disability (or impairment or handicap, etc., they're interchangeable) of your vision so that means you will struggle with reading road signs." Instead, the social model entails saying "you have an impairment of your vision so that means, because of society, you will be disabled when it comes to reading road signs."

We can debate which view is more useful (and for what purposes). Scott thinks the social model is useful for promoting accommodations since it separates the physical condition from the consequences (whether it produces negative consequences depends on society). He thinks the Szasz-Caplan model is useful for denying accommodations since it separates the mental condition (i.e., preferences, in that model) from the consequences (whether it produces negative consequences depends on will). More importantly, he thinks the social model is "slightly wrong about some empirical facts" (what empirical facts? DirectedEvolution is correct that Scott's argumentation is a bit soft...he benefits greatly from arguing the layperson's side) in that in some cases it feels absurd to pin the blame for the consequences of some impairments on society (e.g., Mt. Everest). And on that, the layperson (and I) would agree with him. DirectedEvolution offers no counterpoint on that (which is the primary argument), but the post DOES provide a key benefit:

Adopting separate definitions for impairment and disability IS NOT strictly equivalent to adopting the social model. One could restate short-sightedness: "you have an impairment of your vision so that means you will be disabled when it comes to reading road signs." This drops the blame game and allows for impairments to disable people outside of societies. In fact, Scott accidentally endorsed it [added by me]: "the blind person’s inability to drive [disability] remains due to their blindness [impairment], not society." So perhaps the crux of Scott's argument is not about using two definitions but about whether disability ought to be defined as stemming from society! And in fact that's evident in Scott's post. However, Scott's post DID also, at times, imply that one definition would suffice.

This post made me update toward two definitions potentially being useful, but it did not make me update away from endorsing Scott's main point, that disability ought not be defined as stemming from society.

As an aside: the two definitions are still debatable, though. Suppose someone has an impairment that has not generated, and never will generate, a disability. How is this not the same as "there exists variability"? If someone has perfect vision and I am short-sighted, but we live in a dome with a 5-foot diameter such that I can see just fine, and no one tells me my lived experience could be better, how could you even call that an impairment? Is it an impairment if I realize that my vision could be better? Is that other person impaired if they realize their vision could be improved above "normal"? "Impairment" could just refer to being low on the spectrum of natural human variability in some capability, but how low is low enough? "So low that it starts to interfere..." is bringing disability into the mix. What capabilities count? Certainly not "reading road signs," as that would be in the realm of disability, but what level of specificity is appropriate? Short-sightedness is not an impairment of seeing near objects; it's an impairment of seeing far objects, which is to say, not of vision generally. But once you get specific enough, it's back to sounding like a disability: "your far-object vision is impaired, so you are disabled at seeing far objects."

Comment by Daniel V on Which personality traits are real? Stress-testing the lexical hypothesis · 2023-06-22T18:51:22.872Z

It's very interesting to see the intuitive approach here, and there is a lot to like about how you identified something you didn't like in some personality tests (though there are some concrete ones out there), probed content domains for item generation, and settled upon correlations to assess hanging-togetherness.

But you need to incorporate your knowledge from reading about scale development and factor analysis. Obviously you've read in that space. You know you want to test item-total correlations (trait impact), multi-dimensionality (factor model loss), and criterion validity (correlation with lexical notion). Are you trying to ease us in with a primer (with different vocabulary!) or reinvent the wheel?

Let's start with the easy-goingness scale:

  • (+) In the evening I tend to relax and watch some videos/TV
  • (+) I don’t feel the need to arrange any elaborate events to go to in my free time
  • (+) I think it is best to take it easy about exams and interviews, rather than worrying a bunch about doing it right
  • (+) I think you’ve got to have low expectations of others, as otherwise they will let you down
  • (-) I get angry about politics
  • (-) I have a stressful job
  • (-) I don’t feel like I should have breaks at work unless I’ve “earned” them by finishing something productive
  • (-) I spent a lot of effort on parenting
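Scoring a list like this means reverse-keying the (-) items before averaging. A minimal sketch, assuming a 1-5 Likert response format (hypothetical; the post's actual format may differ) and letting a respondent skip an item as not applicable, which is the situation item 8 creates for non-parents.

```python
from typing import Optional

# Same order as the eight items above: four (+) items, then four (-) items.
EASY_GOING_KEYS = ["+", "+", "+", "+", "-", "-", "-", "-"]

def score_scale(answers: list[Optional[int]], keys: list[str],
                scale_min: int = 1, scale_max: int = 5) -> float:
    """Mean score across answered items, reverse-keying items marked '-'.

    None means "not applicable" (e.g., item 8 for a non-parent) and is skipped.
    """
    if len(answers) != len(keys):
        raise ValueError("need exactly one key per answer")
    scored = []
    for answer, key in zip(answers, keys):
        if answer is None:
            continue
        if key == "-":
            answer = scale_min + scale_max - answer  # on a 1-5 scale: 5 -> 1, 4 -> 2, ...
        scored.append(answer)
    if not scored:
        raise ValueError("no answered items to score")
    return sum(scored) / len(scored)

print(score_scale([4, 3, 5, 2, 2, 4, 1, None], EASY_GOING_KEYS))
```

Skipping item 8 means a non-parent's score is built from a different set of items than a parent's, one more way that item hurts comparability.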

The breadth of it is either a strength or a weakness. It'd be nice to have a construct definition, or at least some gesturing at what easy-goingness actually is, to gauge the face validity of these items. Concrete items necessarily will have some domain-dependence, resulting in deficiency (e.g., someone who likes to relax and read a book will score low on item 1) or contamination (e.g., having low expectations of others might also be trait pessimism), but item 8 is really specific. It hampers the ability of this scale to capture easy-goingness among non-parents. The breadth would be good if it captured variations on easy-goingness, but it'd be bad if it just captures different things that don't really relate to each other. That's especially problematic because then the inference from low inter-correlations might not be that the construct is bad, but that the items just don't tap into it. You can see where I'm going with this because...

This suggests to me that Easy-Goingness is not very “real”. While it might make sense to describe a person as doing something Easy-Going, for instance when they are watching TV, it is kind of arbitrary to talk about people as being more or less Easy-Going, because it depends a lot on context/what you mean.

...indeed, the items are mainly just capturing different things, not reflecting on easy-goingness in any way. From a scale-assessment standpoint, it's great to see the results confirm my unease about the items based on simply reading them.

The fact that this is weak means that even the most Easy-Going people cannot necessarily be expected to be particularly Easy-Going in all contexts.

This statement presumes your measure reflects a higher-order easy-goingness and that context-specific easy-goingnesses are also being adequately measured.

With conservatism, on the other hand, you can see there is some context-specificity (e.g., dress vs. general social views vs. issue-based ideology), but the measure is facially better. And it hangs together better. Alternatively, you might explore those contours and say you've come up with a multi-dimensional conservatism scale, just like you have a multi-dimensional creativity scale.

the “Correlation with lexical notion” was consistently close to 1, showing that the concrete and the abstract descriptors were getting at the same thing.

There's an implicit "when the concrete descriptors actually had face validity" hidden here; low correlation with the lexical notion could indicate a problem with the lexical scale or a problem with the concrete scale, or both. 
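One standard way to see why a low correlation can't be pinned on either scale by itself is Spearman's correction for attenuation from classical test theory: unreliability in either measure caps the observed correlation. A minimal sketch; the reliabilities and the observed correlation in the example are hypothetical.

```python
import math

def max_observable_correlation(rel_x: float, rel_y: float) -> float:
    """Ceiling on the observed correlation even if the true scores correlate perfectly."""
    return math.sqrt(rel_x * rel_y)

def disattenuated_correlation(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: estimated true-score correlation given each measure's reliability."""
    return r_observed / max_observable_correlation(rel_x, rel_y)

# Hypothetical: a shaky concrete-item scale (reliability 0.5) vs. a solid lexical one (0.9).
ceiling = max_observable_correlation(0.5, 0.9)          # about 0.67
corrected = disattenuated_correlation(0.45, 0.5, 0.9)   # also about 0.67
print(round(ceiling, 2), round(corrected, 2))
```

So an observed 0.45 could mean the constructs really differ, or that the concrete scale is noisy; the observed number alone doesn't say which.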

Overall, I am very impressed that you presented a scary chart to start, promised you'd explain it, and successfully did so. The general takeaway from it is that the lexical hypothesis could be pretty sound and a few of these might be multidimensional in nature (or it could be that some items are good and some are bad). For the low trait impact scales, it's a question of whether the items are good and the construct isn't "real," or whether the items are just a bad measurement approach.

Comment by Daniel V on Papers, Please #1: Various Papers on Employment, Wages and Productivity · 2023-05-22T15:26:11.530Z

Who has an alternative hypothesis that explains this data? Anyone? Ooh ooh, pick me, pick me. Perhaps being depressed has something to do with your life being depressing, due to things like lack of human capital or job opportunities, life and career setbacks or alienation from one’s work. Income increases life satisfaction, as I assume does the prospect of future income.

It is amazing to see the ‘depression is purely a chemical imbalance unrelated to one’s physical circumstances’ attitude in this brazen a form. Mistaking correlation for causation here seems like a difficult mistake for a reasonable and reflecting person to make.

 

They measured depression at ages 27-35 in 1992 and outcomes at age 50. They control for "age, gender, race, for level of education by age 26, parental education, r marital status in 1992 survey, years of work experience accumulated by 1992 survey, the average percentage of weeks the person’s work history data is unaccounted for by 1992 survey, health status during childhood, a dummy for number of cigarettes consumed by 1992 survey, year indicators, local unemployment rate in 1992, 1998, 2004, and the year the person’s outcome variable is collected."

So it's not like they just correlated depression and wages from a cross-sectional survey and claimed causation. They did some work here.
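To illustrate why the controls matter, here's a minimal simulation under a hypothetical data-generating process in which adverse life circumstances raise depression and lower wages while depression itself has no effect on wages. Regressing wages on depression alone picks up a spurious negative slope; partialling out the confounder (the Frisch-Waugh-Lovell trick, equivalent to adding it as a control) removes it. Controls only handle confounders that are measured, and the paper's actual model is much richer, so this illustrates the mechanism rather than anything about the paper's estimates.

```python
import random

def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of a least-squares regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v: list[float], c: list[float]) -> list[float]:
    """Residuals of v after regressing it on c (with intercept)."""
    b = ols_slope(c, v)
    mv, mc = sum(v) / len(v), sum(c) / len(c)
    return [vi - mv - b * (ci - mc) for vi, ci in zip(v, c)]

rng = random.Random(0)
n = 20_000
circumstances = [rng.gauss(0, 1) for _ in range(n)]             # hypothetical confounder
depression = [c + rng.gauss(0, 1) for c in circumstances]       # driven by circumstances
log_wage = [-0.5 * c + rng.gauss(0, 1) for c in circumstances]  # NOT driven by depression

naive = ols_slope(depression, log_wage)  # roughly -0.25, entirely from confounding
controlled = ols_slope(residualize(depression, circumstances),
                       residualize(log_wage, circumstances))  # roughly 0
print(f"naive slope {naive:.3f}, controlling for circumstances {controlled:.3f}")
```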

Comment by Daniel V on Double-negation as framing · 2023-04-17T01:50:53.910Z

It was a good post! To the extent that whatever I said was value-added or convincing to you, it was only because your quality post prompted me to lay it out.

And like you said, perhaps there is more here. Does a negative (vs. positive) frame make it harder to notice (or easier to forget) that there is a null hypothesis? Preliminary evidence in favor is that people who "own" the null will cede it in a negative frame, whereas they tend to retain it in a positive frame. More thinking/research may be needed, though, to feel confident about that (I say that as a scientist starting from the null effect of no difference, not as someone proposing the hypothesis of no difference).

"It's not sufficient to be right in many contexts, you must also be rhetorically persuasive." Spittin' facts.

Comment by Daniel V on Double-negation as framing · 2023-04-16T13:50:28.496Z

Going off localdeity's comment, I think "arrogating the right to choose the null hypothesis" or as you said, "assuming the burden of proof" are more critical than whether the frame involves negations. If you want to win an argument, don't argue, make the other person do the arguing by asking lots of questions, even questions phrased as statements, and then just say whatever claim they make isn't convincing enough. Why should purple be better than green? An eminently reasonable question! But one whose answer will never have satisfactory support, unless you want it to. "I'm just asking questions."

It's good for you to point out that the true statement localdeity offered and your conclusion seem to be in tension. It is a weaker statement, so if you are being asked for your opinion, you may want to hedge with that negation. If you are actually trying to convince someone of something, though (and this is why I think you rightly believe these are about subtly different things), that is not the way to do it. You could make the stronger claim, or alternatively, you could phrase it as a question - "why shouldn't we do anti-X?" (but notice it would also work without the negation: "why should we do X?") - and get them to do the arguing for you.

Comment by Daniel V on Double-negation as framing · 2023-04-16T13:22:56.415Z

You're not wrong, and I don't disagree!

Comment by Daniel V on The benevolence of the butcher · 2023-04-09T13:28:05.933Z

In the long run it seems pretty clear labor won't have any real economic value

 

I'd love to see a full post on this. It's one of those statements that rings true since it taps into the underlying trend (at least in the US) where the labor share of GDP has been declining. But *check notes* that was from 65% to 60%, with some upswings in there. So it's also one of those statements that, upon cognitive reflection, has a lot of ways to end up false: in an economy with labor crowded out by capital, what does the poor class have to offer the capitalists that would provide the basis for a positive return on their investment (or are they...benevolent butchers in the scenario)? Also, this dystopia just comes about without any attempts to regulate the business environment in a way that makes the use of labor more attractive? Like I said, I'd love to see the case for this spelled out in a way that allows for a meaningful debate.

As you can tell from my internal debate above, I agree with the other points - we humans have a long history of voluntarily crippling our technology, or at least of adapting to/with it.

Comment by Daniel V on Consider The Hand Axe · 2023-04-08T13:23:20.641Z

Thanks for writing this. I suppose the same could be said about any tool that you suspect might be inferior to another on the horizon in your lifetime. As quanticle said, some romance around self-crafting could support the psychological value of the labor. More importantly, I think there are in fact qualia pertinent to our quality evaluations that leave AI productions inferior to human work in important ways...currently. That gap will attenuate, and we'll hone our models to be better at producing in a wider spectrum of areas, too.

However, I don't think it's a foregone conclusion that no gap will remain. When the world of bits can't quite recreate the world of atoms (efficiently), there will be a place for human labors (okay, even the boundaries for this are subject to change too but bear with me) - think of handwriting. What a pain! The tool has been replaced with word processing and printing for many written documents. But when I want to send a thank-you to a big client, printing just can't recreate my ink-on-paper signature. An autopen could, but again it's not at the level of efficiency where it is worth the widespread adoption that would snuff out human labor in that space.

By the way, I wonder what chatGPT would produce if you took your inspiration and general plan for this essay, turned it into a prompt, and gave it to chatGPT (maybe there could be some honing of that by a prompt engineer, but whatever). To be fair, you could let chatGPT rewrite it a few times with edits like you would have done for yourself. I suspect it would not write as good a post - that's a good enough reason to bother doing it yourself.

(Also because the prompt to write with the style of a specific person only works when you have enough online content in the training data. So if you want a unique style, you need to write a lot before you can outsource. LOL)

Comment by Daniel V on Don't take bad options away from people · 2023-03-27T00:13:20.146Z

Upvote for paragraph one, agree for paragraph two.

It's a very narrow (but admittedly compelling) perspective to realize that in particularly bad situations, regulations can compound the badness. But there is plenty of room to debate regulations when it comes to typical cases, and it's probably a better basis on which to evaluate them.

Comment by Daniel V on Why consumerism is good actually · 2023-03-25T13:39:33.441Z

I agree with your comment, but I think the definitional problem is core to the debate rather than something that can simply be discarded. Consumerism is not consumption, but it used to mean consumer protection and empowerment (obviously there is a spectrum there about what constitutes adequate information and the appropriate regulations/interventions to ensure that)...in support of their consumption, which was assumed to be valuable for them. Consumerism has taken on a second, more prominent meaning that itself is a spectrum: sometimes demanding the pricing/regulation of externality-generating production (not all that different in nature from economics, but unique in the externalities that are identified, oftentimes private costs that consumers simply don't attend to), sometimes all the way to value judgments about certain kinds of consumption.

It's such a loaded term I find it best instead to talk about what I actually mean rather than use the term consumerism. Do I want to talk about negative aspects of consumption? Do I want to talk about the consumer information movement? Which one am I about to get into when I say "I'd like to talk about consumerism"?

I also want to add to your bolded comment on substitution, which seems like a really good rule of thumb. But a lot of things cannot be substituted easily because they are timing- or situation-dependent. If I have 15 minutes to kill, it's not obvious that just sitting there with my thoughts is particularly desirable (for some people, sure!), so I'll seek to consume something (not non-consumption) - if the park is 2.5 minutes away, I can consume a 10 minute walk at the park, which might dominate my crappy phone game. If the park is 7.5 minutes away, I can consume a walk to the park, but given that menu of options, maybe my phone game is fine. It also provides optionality for when I'm looking for a low-transportation mode of entertainment in a waiting room. But it can shift from working in these initial use cases to being a prioritized activity in itself - maybe when I have 30 minutes, I'll "default" to that instead of actually evaluating my options. In that case, regret would be a sign that something has gone wrong in my decision-making. It just reinforces the need to use that rule of thumb - be conscious about what you're consuming and the options that are before you!

Comment by Daniel V on On the Crisis at Silicon Valley Bank · 2023-03-16T21:05:12.411Z

I strongly agree and wanted to share a similar sentiment.

It is not as simple as "the market says the asset or liability is worth X, so you should too." Businesses are usually going concerns, and it is not really that useful for the company to report itself merely as how things would go down if it were to liquidate today (though obviously considering that possibility is useful, especially if your business could be "runny," and recording the fair value of HTM securities in a note to the financial statements allows readers, like Raging Capital Ventures, to contemplate that). Those liquidation values still require subjectivity (e.g., they depend on the spreads for the assets, and what if the blowup situation we're talking about would spark fear and government intervention that would actually support the assets' values?! [which is exactly what happened with SVB's assets, actually]), and of course they are not even perfectly reflected by MTM values, so their utility is not as straightforward as it may seem at first blush.
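A minimal sketch of why the held-to-maturity (HTM) carrying value and the fair value disclosed in the notes can diverge so much when rates rise. The position is hypothetical (a 10-year, 1.5% annual-coupon bond bought at par, with market yields rising to 4% right after purchase), and nothing here models SVB's actual portfolio. A bond bought at par stays at par on an amortized-cost basis, while its fair value is the present value of the same cash flows at the new yield.

```python
def bond_price(face: float, coupon_rate: float, market_yield: float, years: int) -> float:
    """Present value of an annual-coupon bond's remaining cash flows at a given yield."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + market_yield) ** t for t in range(1, years + 1))
    pv_face = face / (1 + market_yield) ** years
    return pv_coupons + pv_face

face = 100.0
# Bought at par: coupon rate equals the yield at purchase, so price == face value.
purchase_price = bond_price(face, coupon_rate=0.015, market_yield=0.015, years=10)
# Same cash flows discounted at a 4% yield after rates rise.
fair_value = bond_price(face, coupon_rate=0.015, market_yield=0.04, years=10)
# For HTM, this gap is disclosed in the notes rather than recognized.
unrealized_loss = purchase_price - fair_value
print(f"carrying ~{purchase_price:.2f}, fair value ~{fair_value:.2f}, gap ~{unrealized_loss:.2f}")
```

Held to maturity, the bond still pays back all 100 of face value; the roughly 20-point gap only becomes a realized loss if the bank has to sell, which is the kind of "disaster scenario" the FASB language sets aside.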

In fact, the FASB (1993) explicitly stated in explaining its rule-making...

that extremely remote "disaster scenarios" (such as a run on a bank or an insurance company) would not be anticipated by an enterprise in deciding whether it had the positive intent and ability to hold a debt security to maturity.

The managers (evidenced by pursuing more capital) and the market (in reaction to that) obviously started to consider that possibility as much less remote, which became a self-fulfilling prophecy. But "disaster valuation" might not be a great default way to account when your business is generally conducted under non-disaster conditions.

Comment by Daniel V on What are some ideas that LessWrong has reinvented? · 2023-03-15T21:30:01.015Z · LW · GW

That's wonderful for him. I wish he had translated that knowledge into the post then! The reader shouldn't have to come away from a post titled "the point of trade" with simply a list of reasons why trade might be nice when those reasons can actually be brought together in a unifying explanation, one that is already well-explained in Econ 101, no less.

Here he talks about his understanding of the textbook explanation, and you can judge for yourself whether it conveys comparative advantage or not:
"[sometimes people get different amounts of value from things, so they can get more value by trading them] is the horrible explanation that you sometimes see in economics textbooks because nobody knows how to explain anything ... All right, suppose that all of us liked exactly the same objects exactly the same amount.  This obliterates the poorly-written-textbook's reason for "trade"."

He also explains his thesis: 
"I claim that the reason we have more stuff has something to do with trade. I claim that in an alternate society where everybody likes every object the same amount, they still do lots and lots of trade for this same reason, to increase how much stuff they have."

Of course, that is comparative-advantage adjacent, so we'll talk about it, right? Wrong. The point of trade is to leverage an assortment of the sources of comparative advantage (but we won't even attempt to link these together under their unifying concept):
"So now let us suppose identical fruit tastes, perfect task-switching, Star Trek transporters, identically cloned genetics, and people can share expertise via Matrix-style downloads which are free.  Have we now gotten rid of the point of trade?"

The organizing/umbrella concept (comparative advantage) is still absent at the end of this. Maybe concrete examples like these, delineating specific sources by which comparative advantage can arise, are a useful didactic tool. But I don't think the point was to illuminate a key concept (indeed, it was never named or really all that gestured at), the point apparently was to generate an exhaustive list of things that enable trade to increase production:
"Note:  While contemplating this afterwards, I realized that we hadn't quite gotten rid of all the points of trade, and there should have been two more rounds of dialogue; there are two more magical powers a society needs, in order to produce a high-tech quantity of stuff with zero trade.  The missing sections are left as an exercise for the reader."

I wish by the end of it that he had fully reinvented comparative advantage. Great that he knew about it all along though...

Comment by Daniel V on What are some ideas that LessWrong has reinvented? · 2023-03-15T00:03:41.661Z · LW · GW

Confounders - This post took some vivid examples and turned them into solid recommendations, even referring to the concept that already exists outside the post. But it mints new laws where none are needed, not really addressing other things that contribute to the internal validity of experiments or the inferences from full programs of research that might counteract the call to measure every single thing you possibly can; in my estimation, it led to a minor weakness in the post. It's not an egregious reinvention because it has the intellectual humility to interact with previous scholarship, one cannot expect any individual post to cover all the pieces of what can be a broad domain, and the point seemed to be more of presenting preferred operating procedures rather than (re)introducing a concept.

Comment by Daniel V on What are some ideas that LessWrong has reinvented? · 2023-03-14T23:43:48.962Z · LW · GW

Goodharting - on the other side of things, LessWrong also has posts like this that are designed to review rather than reinvent ideas. There is value in explaining old ideas in new ways or finding previously-unconsidered applications for old ideas.

Comment by Daniel V on What are some ideas that LessWrong has reinvented? · 2023-03-14T23:32:59.122Z · LW · GW

Comparative advantage - and even worse, EY didn't even fully reinvent it. He just lined up a bundle of things that fall under the umbrella and called it a job well done. This particular instance also checks the boxes for arrogance and lack of rigor. That post was a fun read, but the embedded disdain for economics textbooks was particularly galling since economics textbooks handle the concept just fine.

Comment by Daniel V on "Liquidity" vs "solvency" in bank runs (and some notes on Silicon Valley Bank) · 2023-03-13T18:28:55.311Z · LW · GW

As you said, it doesn't really change the point, but I'm here to say it's not an alternative bond structure, just that the bond happens to be trading at a discount already at the initial conditions. It will trade at a steeper discount as interest rates rise. It would be even less intuitive, but you could also do this analysis with bonds that are trading at a premium (trading at a smaller premium, or even hitting par or switching to a discount, as interest rates rise).
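To make the discount/premium point concrete, here is a minimal sketch of ordinary fixed-coupon bond pricing (present value of the coupons plus the face value), assuming annual payments. The function name `bond_price` and the coupon rates and yields below are hypothetical illustrations, not SVB's actual holdings.

```python
def bond_price(face, coupon_rate, market_yield, years):
    """Price a plain annual-pay fixed-coupon bond as the PV of its cash flows."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + market_yield) ** t for t in range(1, years + 1))
    pv_face = face / (1 + market_yield) ** years
    return pv_coupons + pv_face


# Hypothetical discount bond: 1.5% coupon, already below par at a 2% yield,
# and further below par as yields rise.
for y in (0.02, 0.03, 0.04):
    print(f"1.5% coupon, yield {y:.0%}: {bond_price(100, 0.015, y, 10):.2f}")

# Hypothetical premium bond: 5% coupon. Rising yields shrink the premium,
# hit par when the yield equals the coupon, then flip it to a discount.
for y in (0.04, 0.05, 0.06):
    print(f"5% coupon, yield {y:.0%}: {bond_price(100, 0.05, y, 10):.2f}")
```

Same bond structure throughout; only the starting yield relative to the coupon decides whether it begins at a discount or a premium.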

Comment by Daniel V on "Liquidity" vs "solvency" in bank runs (and some notes on Silicon Valley Bank) · 2023-03-12T14:10:22.876Z · LW · GW

Matt Levine at Bloomberg also has good comments on this - basically it was a boring bank run/collapse. With it being primarily a duration issue (rather than an impaired assets issue) and a large amount of deposits, I also suspect we'll see an acquisition.
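A "duration issue" means the assets lost market value because rates rose, not because borrowers stopped paying. Here is a minimal sketch, assuming plain annual-pay bonds and hypothetical parameters, of modified duration and the first-order estimate ΔP ≈ -D_mod · Δy · P. The helper names are made up for illustration.

```python
def cash_flows(face, coupon_rate, years):
    """Annual coupons, with the face value repaid alongside the final coupon."""
    flows = [face * coupon_rate] * years
    flows[-1] += face
    return flows


def price(flows, y):
    """Present value of the cash flows at annual yield y."""
    return sum(cf / (1 + y) ** t for t, cf in enumerate(flows, start=1))


def modified_duration(flows, y):
    """Macaulay duration (PV-weighted average time) divided by (1 + y)."""
    p = price(flows, y)
    macaulay = sum(t * cf / (1 + y) ** t for t, cf in enumerate(flows, start=1)) / p
    return macaulay / (1 + y)


# Hypothetical 10-year bond bought at par with a 1.5% coupon; rates then rise 2 points.
flows = cash_flows(100, 0.015, 10)
y0, dy = 0.015, 0.02
approx_change = -modified_duration(flows, y0) * dy * price(flows, y0)
actual_change = price(flows, y0 + dy) - price(flows, y0)
print(f"duration estimate: {approx_change:.2f}, exact repricing: {actual_change:.2f}")
```

Longer maturities carry larger durations, which is why a bank stuffed with long-dated securities takes the bigger mark-to-market hit even though nothing is impaired. For large rate moves the linear estimate overstates the loss because of convexity.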

Check the date on this too.

Comment by Daniel V on Noting an error in Inadequate Equilibria · 2023-02-08T19:36:38.446Z · LW · GW

Further illustrating Eliezer's misplaced confidence, Sumner's view is about NGDP targeting, so the success of the BOJ's policy should be based on delivering NGDP growth, not real economic variables like RGDP growth or employment rate as Eliezer implies. They were in fact successful at this (RGDP growth + Inflation = NGDP growth; with RGDP growth continuing on trend and Inflation bucking the downtrend, that's a new NGDP trajectory, baby!). Here, with 100=March 2013 as Kuroda ascended, you can see the shift in CPI trend even before the VAT impact in April 2014. Sumner was bullish on the new BOJ policy by September 2013.
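The identity "RGDP growth + Inflation = NGDP growth" is the first-order version of a multiplicative relationship. A minimal sketch with hypothetical growth and inflation numbers (not the BOJ's actual data) shows both forms and how the gap between them is just the product of the two rates:

```python
def ngdp_growth_exact(real_growth, inflation):
    """Nominal growth implied by real growth and inflation (multiplicative)."""
    return (1 + real_growth) * (1 + inflation) - 1


def ngdp_growth_approx(real_growth, inflation):
    """First-order approximation: real growth plus inflation."""
    return real_growth + inflation


# Hypothetical: real growth holds at 1% while inflation moves from slightly
# negative to positive, which shifts the nominal growth trajectory upward.
for pi in (-0.005, 0.0, 0.015):
    print(f"inflation {pi:+.1%}: NGDP growth ~ {ngdp_growth_approx(0.01, pi):.2%} "
          f"(exact {ngdp_growth_exact(0.01, pi):.3%})")
```

Holding real growth on trend, any rise in inflation alone raises NGDP growth, which is the yardstick an NGDP-targeting view would use.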

So, Eliezer, you think you have identified which econbloggers, like Scott Sumner, know better than the Bank of Japan, do you? Eliezer did identify Sumner successfully, but he got lucky. His belief in Sumner was based on a misread of Sumner's position, one that led him to wrongly believe real economic variables would supply evidence for the veracity of the theory. Further compounding the issue, while employment rate might have been readable as supportive, as Matthew Barnett points out, RGDP was not. He is overconfident and should be more humble about his approach.

Ironically, Eliezer's mistake actually more strongly makes his key point. The demand for humility Eliezer was writing about stemmed from the belief that even a very good reasoner oughtn't be able to outperform "the experts." And yet, here we have a mistaken reasoner outperforming "the experts" (at least, outperforming the hawkish experts, before they were replaced by the dovish experts who implemented the new monetary policy at the BOJ). Perhaps the case for humility is not so strong after all: "it is perfectly plausible for an econblogger to write up a good analysis of what the Bank of Japan is doing wrong, and for a sophisticated reader to reasonably agree that the analysis seems decisive, without a deep agonizing episode of Dunning-Kruger-inspired self-doubt playing any important role in the analysis." I suppose one might need to decide how interchangeable "humility" and "agonizing self-doubt" are...

Eliezer is driving an intellectual racecar when many are driving intellectual horse-and-buggies. Still needs to be vacuumed out from time to time though.

Comment by Daniel V on Exercise is Good, Actually · 2023-02-03T17:39:51.566Z · LW · GW

Chapin is describing a range of gains - "Until I’d gained some muscle, I didn’t know that getting out of bed shouldn’t actually feel like much, physically, or that walking up a bunch of stairs shouldn’t tire you out, or that carrying groceries around shouldn’t be onerous. I felt cursed." If you can remove a general feeling of being cursed and get a license to live in the material world, wow! If you can solve chronic pain with strength training, great! If you can climb stairs without getting tired, a lot of people already can, but good! If you can carry groceries, like most people can, okay!

Since most normal people's gains will fall in the last two types (for real, what percent of the people feel they "need some special justification for existing" because they're physically not-that-bad-kinda-on-the-weak-side?), you have a point that for a lot of people who already can do things without bother, this won't move the needle. Yet, for many who can do these things, doing them without bother may be nice (and prospectively under-appreciated) - feeling less exhaustion in your life generally and having more energy to do things you really want to do are quite good benefits.

But even if the benefits are more trivial than Chapin characterizes, I think your characterizing the costs as "feel[ing] miserable" is a bit much (though obviously everything is subjective here). Again, for some, sure, it's misery. For most, it's challenging and uplifting and potentially even energizing (especially after the first couple workouts).

So, we have Chapin claiming , and I suggest it's probably more like  or at worst , either of which should be more motivating than your . But I agree the benefits seem trumped up by Chapin.

Comment by Daniel V on Alexander and Yudkowsky on AGI goals · 2023-01-28T01:27:17.748Z · LW · GW

You (correctly, I believe) distinguish between controlling the reward function and controlling the rewards. This is very important as reflected in your noting the disanalogy to AGI. So I'm a little puzzled by your association of the second bullet point (controlling the reward function, which parents have quite low but non-zero control over) with behaviorism (controlling the rewards, which parents have a lot of control over).

Comment by Daniel V on COVID contagiousness after negative tests? · 2023-01-20T21:50:20.972Z · LW · GW

From 2021, modeling estimated 6% if you follow the rule of 2 negative tests after day 6 or a 10-day isolation; estimated 4% if you follow the rule of 2 negative tests after day 6 or a 14-day isolation.

From 2022, modeling estimated 2-3% if you have 2 negative tests testing daily.

Comment by Daniel V on What is the best way to approach Expected Value calculations when payoffs are highly skewed? · 2022-12-28T15:13:32.503Z · LW · GW

(I'm going to nix the cost of the ticket as it's just a constant)

Depends. Do you want to sum the probability weighted payoffs? EV is fine for that. The probability weighting deals with the striking "really, really low" odds (unless you want to further reweight the probabilities themselves by running them through a subjective probability function), and the payoffs are just the payoffs (unless you want to further reweight the payoffs themselves by running them through a subjective utility function). Either or both of these changes may be appropriate to deal with your own subjective views of objective reality, but that's what they are - personal transformations. However, enough people subscribe to such transformations that EU (expected utility, or see cumulative prospect theory) makes sense more widely than just for you. We indeed perceive probabilities differently from their objective meanings and we indeed value payoffs differently from their mere dollar value.
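To see how the two "personal transformations" change the picture for a lottery-style ticket, here is a minimal sketch. It assumes Tversky and Kahneman's (1992) cumulative prospect theory weighting and power value functions with their commonly cited gain-domain estimates (α = 0.88, γ = 0.61) as defaults; the odds and prize are hypothetical. With only one nonzero outcome, the subjective value reduces to w(p) · v(jackpot).

```python
def ev(p, jackpot):
    """Expected value of a ticket paying `jackpot` with probability p, else 0."""
    return p * jackpot


def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function for gains."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)


def prospect_value(p, jackpot, alpha=0.88, gamma=0.61):
    """Weighted probability times the power value of the payoff."""
    return tk_weight(p, gamma) * jackpot ** alpha


def certainty_equivalent(p, jackpot, alpha=0.88, gamma=0.61):
    """Sure amount with the same subjective value as the ticket."""
    return prospect_value(p, jackpot, alpha, gamma) ** (1 / alpha)


p, jackpot = 1e-8, 100_000_000  # hypothetical odds and prize
print("EV:", ev(p, jackpot))
print("weighted probability:", tk_weight(p), "vs objective", p)
print("certainty equivalent (both transforms):", certainty_equivalent(p, jackpot))
print("certainty equivalent (utility only):", certainty_equivalent(p, jackpot, gamma=1.0))
```

Reweighting the payoffs alone (γ = 1) pulls the ticket's value below its EV, while overweighting the tiny probability can push it far above. Which transformation you apply is the subjective choice described above.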

Now, if you just want a number that best represents the payoff structure, we have candidate central tendencies - mean is a good one (that's just EV). But since the payoff distribution is highly skewed, maybe you'd prefer the median. Or the mode. It's a classic problem, but it's finding what represents the objective distribution rather than what summarizes your possible subjective returns.
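For a highly skewed payoff distribution the three candidate central tendencies can disagree sharply. A minimal sketch, using a hypothetical discrete payoff distribution stored as a payoff-to-probability mapping:

```python
def summarize(dist):
    """Mean, median, and mode of a discrete distribution {payoff: probability}."""
    mean = sum(x * p for x, p in dist.items())
    mode = max(dist, key=dist.get)
    median = None
    cumulative = 0.0
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative >= 0.5:  # first payoff where cumulative probability reaches 50%
            median = x
            break
    return mean, median, mode


# Hypothetical ticket: small prize with probability 1/1,000, jackpot with 1/1,000,000.
ticket = {0: 1 - 1e-3 - 1e-6, 10: 1e-3, 1_000_000: 1e-6}
mean, median, mode = summarize(ticket)
print(f"mean (EV): {mean:.2f}, median: {median}, mode: {mode}")
```

The mean is pulled up entirely by the rare jackpot, while the median and mode both sit at the overwhelmingly likely payoff of zero.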

Comment by Daniel V on Lead in Chocolate? · 2022-12-23T14:49:46.517Z · LW · GW

Thanks for the analysis, and I mostly agree with your interpretation (having done no further research into this myself), but I'm confused how dividing by 1000 is the problem here. The levels are "basically fine" because they are well below the FDA/EPA limits, but the CA levels are only about 1 order of magnitude lower, not 3. If they had divided by 100, would we be interrogating their divisor choice? (The current implication is that that arbitrary approach would have been fine since it would correspond to FDA/EPA levels.) It makes me think that maybe the impetus to question the arbitrary approach mainly stems from the conclusions not fitting our palate.

Comment by Daniel V on Where's the economic incentive for wokism coming from? · 2022-12-10T23:19:36.284Z · LW · GW

Karma downvote for lack of introspection into failures of rationality in a rationality forum.
Agreement upvote for "don't do this" because "that is telling of your own biases" without naming any is just not engaging. It was sadly a throwaway starting line to an otherwise excellent comment.

Comment by Daniel V on The Categorical Imperative Obscures · 2022-12-07T17:14:58.549Z · LW · GW

1. My comment was contra "the case that the categorical imperative tries to pull a fast one." Evaluating this (the point of the OP) very much requires an understanding of intent.

2. Well sure, if you don't care about the point of the OP, you can care about other things. Whether it is a useful framework for you to judge and cajole others in their actions is a very important question! You might still care about intent though. Is a handsaw the right tool for hammering a nail? I would recommend you look at a handsaw's intended use case and rule it out. If you really want to see how it would do, okay then, I can't really stop you. You can conclude a handsaw is bad at hammering nails, but don't go to the hardware store and complain. Likewise, is the categorical imperative a good way to aggregate preferences? I say it's not for that, so you don't really have to try it. If you really want to see how it would do and find out that it's bad for the job though, great. But don't say you were duped!

Comment by Daniel V on The Categorical Imperative Obscures · 2022-12-07T17:05:59.109Z · LW · GW

Ah, makes more sense now. I'm generally not a fan of that approach though, and here's why.

My comment was that your conclusion about the categorical imperative vis-a-vis its aims was off because the characterization of its aims wasn't quite right. But you're saying it's okay, we're still learning something here because you meant to do the not-quite-right characterization because most people will do that, and this is the conclusion they will reach from that. But you never tell me up front that's what you're doing, nor do you caution that the characterization is not-quite-right (you said your purpose was to "make the case that the categorical imperative tries to pull a fast one" not that "laypeople will end up making the case..."). So I'm left thinking you genuinely believe the case you laid out, and my efforts go into addressing the flaws in that. Little did I know you were trying to describe a manner of not-quite-right thinking people will do, a description which we should be interested in not because of its truth value but because of its inaccuracy (which you never pointed out).

I've seen this before in another domain: someone wanted to argue that doing X in modeling would be a bad call. So they did Y, then did X, got bad results and said voila, X is bad. But they did Y too (which in this particular case was a priori known to be not the right thing to do and did most of the damage - if they did Y and not X, they'd get good results, and if they had just not done Y, they'd get good results without X and adequate results with X)! When I pointed out that X really wasn't that bad by itself, they said, well, Y is pretty standard practice so we'd expect people to probably improperly do that in this particular case anyway. You gotta tell me that beforehand! Otherwise it looks like a flaw in your commenting on the true state of the world as opposed to a feature of your analysis of the typical approach. But it also changes the message: the problem isn't with X but Y, the problem isn't with the categorical imperative but analyzing it superficially. Of course, that message is also the main point of my comment.

Comment by Daniel V on The Categorical Imperative Obscures · 2022-12-06T19:08:56.025Z · LW · GW

The categorical imperative aims to solve the problem of what norms to pick, but goes on to try to claim universality.
...
The trouble is that the categorical imperative tries to smuggle in every moral agents' values without actually doing the hard work of aggregating and deciding between them.

That would be a problem, if that is what it were doing. The categorical imperative aims to define the obligation an individual has in conducting themselves consistent with the autonomy of the will. Each individual may have a distinct moral code consistent with that obligation, and that is indeed a problem for ethics, but the categorical imperative does not attempt to help people pick specific norms to apply across multiple agents.

Kant lays out a ton of definitions and then leans on them heavily; it's classic old philosophy. Understanding what Kant said means you need to read Kant, not just his conclusions. That's a big strike against the clarity of his writing (it is a real slog to get through), but whether he achieves his intent should be judged vs. his self-professed intent, not against a misunderstanding of his intent.

Comment by Daniel V on How should I judge the impact of giving $5k to a family of three kids and two mentally ill parents? · 2022-12-05T20:07:22.241Z · LW · GW

With a goal to "alleviate immediate and preventable suffering," QALY seems to be a pretty terrible metric. You need to measure immediacy, preventability, and suffering, or at least the suffering due to just the immediate and preventable causes. I would suggest suffering needs construct definition before you consider an operationalization. 

It would be smart to measure pre- and post-intervention. The good news is that if suffering is a subjective psychological state, you could do post-only and measure perceived change. If you're worried about self-report, you could do observer-report (case worker), but since they'll be doling out the funds that could be biased as well (and presumably they'll be basing these decisions in part on self-reports of what is a "fire").

The type of counterfactual analysis tailcalled suggests is likely what the family or case worker would be mentally approximating when responding to how big of a difference paying off the utility bill or a pizza night made to their suffering. Plus, n=1, so a qualitative assessment may be all you can really do - if they rate something a 7/7 on reducing their suffering and everything else came in at 1-2, sure, that hits you between the eyes, though you'd probably get that anyway from them saying "the most important thing, by a mile, was X, and this is why..."

Comment by Daniel V on Sadly, FTX · 2022-11-20T14:23:42.576Z · LW · GW

If you take the business part of the business and separate it from the Ponzi part of the business, it's not a Ponzi scheme. But apparently it was a package deal - at least until Alameda were to weather the crypto crash? Like, we just need to siphon FTX customer funds over to Alameda until they recoup $8-10 billion over time in profits, which have not been forthcoming lately. But oops, we "forgot" that when we transferred that money over, it affected our financials - there it is, fake books. And that's not even getting into the laughable valuations of their other "shitcoin" assets, or even of FTT itself, just the flow of funds!

This also speaks to further impairment of FTX's value by management - if you separate the business part of the business from the management part of the business...and you can to some extent, but the damage has been done. Who is going to trust FTX as an exchange going forward even with a new structure and management team?

Finally, it's not like Madoff vaporized the money and SBF/FTX/Alameda didn't. If anything, it's the opposite; Madoff was a far better steward, making the assets recoverable. SBF/FTX/Alameda simply gambled them away. Put differently, the non-Ponzi part of the business was a bigger share of Madoff's fund than of SBF's bundle. Obviously the valuation in either case was fake, based on a multiple of both real assets and Ponzi accounting over time, but Madoff skimmed customer funds while SBF/FTX straight-up embezzled them to prop up the husk of Alameda. In its death throes, FTX was in straight-up Ponzi mode, making withdrawing customers whole to maintain the facade at the expense of those last in line.

I think the comparison is quite apt, and the points of contrast are more interesting than absolving. I was initially hesitant to say there was some fraud/Ponzi beyond just accidentally falling into borderline insolvency, but by this point, especially with how enormous the hole, it looks much more intentional, and Matt Levine has even dropped the P-word because of it.

Comment by Daniel V on The Futility of Status and Signalling · 2022-11-14T00:48:43.104Z · LW · GW

Disagree that "[a]t best, these theories just do not bring much new to the table" but agree that the over-emphasis on these theories is "extremely unhelpful." That is, they provide good insights and explanatory power, and even substantial explanatory power on the margin sometimes, but having non-zero  is not the same as being the explanation with the highest . Assuming the latter is often not accurate. Ironically, I find myself mostly agreeing with a post that Razied disagrees with while also mostly agreeing with Razied - the human value function is not solely, or even primarily, status seeking.

Likewise, I also disagree with "[on social media], it is well known people are playing their meaningful status games and doing false signalling on a high simulacrum level" because Ape in the coat is doing exactly what they are complaining about. Certainly this happens, and in some cases dominates any other reasons for people's behavior on social media, but again, most of why people engage on social media is likely not to be about obtaining status, unless status is to be so broadly defined as to include things like "personal satisfaction."

Getting meta, here I find myself engaging in a social medium that even has a very clear social approval mechanism. Yes, I want lots of yummy upvotes (is it because they accord me status somehow, or because the social approval provides peer review for the validity of my hopefully-rational take?), but mainly I'm posting because I think my opinion is right, worthwhile to share, and hopefully conveyed in a way that can, if not persuade, update.

Comment by Daniel V on How Risky Is Trick-or-Treating? · 2022-10-27T14:55:26.793Z · LW · GW

Nicely done!

The question is what should be the denominator of the risk?
For societal aggregate worries: Day? Sure.
For personal risk: Day? Nope. Person-day? Yes!
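The two denominators can point in opposite directions. A minimal sketch with made-up numbers, purely to show the arithmetic (these are not real injury statistics):

```python
def risk_per_person_day(events, people_exposed, days=1):
    """Personal framing: incidents per person per day of exposure."""
    return events / (people_exposed * days)


# Made-up numbers for illustration only.
ordinary = {"events": 10, "people_exposed": 1_000_000}
busy = {"events": 15, "people_exposed": 3_000_000}

print("per day:", ordinary["events"], "vs", busy["events"])
print("per person-day:", risk_per_person_day(**ordinary), "vs", risk_per_person_day(**busy))
```

In this toy case the busy day has more incidents in aggregate (worse for a societal worry) but a lower risk per person-day (better for any individual who goes out).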
 

Comment by Daniel V on Luck based medicine: my resentful story of becoming a medical miracle · 2022-10-19T19:44:32.004Z · LW · GW

He guessed I had an allergic reaction and threw 5 different antihistamines

Not so much dumb luck after all! Allergic reactions often cause inflammation, and it's the inflammation that is uncomfortable. Sure, it's not very controlled (could have suggested one, then another, then another, until you got to Boswellia, preferably in order of prior belief for each more-specific hypothesis), and other things could cause inflammation, but it's not completely luck either. (Though this did not detract from my enjoyment of the post!)

 

Some quick Googling says,

Current research showed that 3-O-Acetyl-11-keto-beta-boswellic acid (AKBA) is the one boswellic acid with strong pharmacological activity; for example, AKBA has a powerful inhibitory effect on 5-lipoxygenase (5-LOX)

The tissue, animal model, and animal and human genetic studies cited above implicate ALOX5 in a wide range of diseases...chronic inflammatory conditions such as rheumatoid arthritis, atherosclerosis, inflammatory bowel disease, autoimmune diseases

Helpfully, this study compared AKBA vs. NDGA, so maybe that could be a useful way to test the mechanism of action (they also differ in their specifics within that too, so that's yet another question mark). Obviously not medical advice and just a curious wondering.

Comment by Daniel V on You Are Not Measuring What You Think You Are Measuring · 2022-09-22T16:44:07.348Z · LW · GW

To the contrary, johnswentworth's point is not that the experiments have low external validity but that they have low internal validity. It's that there are confounds.

Ironically, one of my quibbles with the post is that the verbiage implies measurement error is the problem. Not measuring what you think you're measuring is about content validity, but the post is actually about how omitted variables (i.e., confounders) are a problem for inferences. "You are not Complaining About What You Think You Are Complaining About."

Comment by Daniel V on Twitter Polls: Evidence is Evidence · 2022-09-20T17:38:58.953Z · LW · GW

Adam’s whole position here, to me, is rather silly, even if we limit ourselves to use cases where the Twitter poll is being used only to try and extrapolate towards national sentiment. 

 

I agree except with the last part (it's not silly when thinking about extrapolating to national sentiment). The key is to what extent is it evidence of [insert thing], and of course if you're interested in learning more, what are the factors that affect the extent to which it is evidence of [insert thing]? In other words, what are you trying to generalize to, and what interesting things are limiting your ability to generalize to [insert other thing]?

Often we are comfy with generalizing from sample to appropriately defined population (sample of Zvi Twitter noticers to Zvi Twitter noticers), but when we don't define the scope of our generalization properly, we get uncomfy again (sample of Zvi Twitter followers to US general population). Often we are interested in the limits of generalizability (e.g., this treatment works for men but not women, isn't that interesting and useful!), unless those boundaries are trivial (e.g., vasectomies work for men but not women, gosh!) or we already don't see them as boundaries (e.g., "what if you had changed the wording from 'YOU in particular' to 'YOU specifically'?").

Interestingness is in the eye of the beholder. Concede to Adam for the moment that the boundaries are not interesting because they are well-known limits to generalizability (selection, wording). Then, is it "bad evidence"? It depends on what you're trying to generalize to (what it is purported to be evidence of)! Adam wavers between Twitter polls being "meaningless" and "does not generalize at all," as in worthless for anything at all, which is obviously mostly false (it should at least generalize to Zvi Twitter noticers, though even then it could suffer from self-selection bias like many other polls), vs. "not representative of general views," which is not silly and is far more debatable (it's likely "weak" evidence in that Twitter polls can yield biased estimates on some questions [this is the most charitable interpretation of the position]; it's possibly "bad" evidence if the bias is so severe that the qualitative conclusions will differ egregiously [this is the most accurate interpretation of the position, seeing as he literally wanted to differentiate it from weak evidence] - e.g., if I polled lesbian women on how sexually attractive the opposite sex was to infer how sexually attractive the opposite sex is to people generally). So overall, the position is rather silly (low generalizability is not NO generalizability, and selection and wording ARE interesting factors relevant for understanding people), except on the very specific last part, where it's not silly (possibly bad evidence) but it is also still probably not correct (probably not bad evidence).

Comment by Daniel V on Covid 9/15/22: Permanent Normal · 2022-09-15T18:17:42.771Z · LW · GW

Re: Physical World Modeling

I'm not surprised by those South Africa vaccine efficacy numbers, since they are broadly in line with releases we've been getting over the last 9 months. We already knew VE vs. infection for the monovalent vaccine was very low and that VE vs. severe disease would be higher but still lower than vs. Delta. We already knew they provide about 3-4 months of "good" protection. We already knew VE vs. BA.4/5 was lower than vs. BA.1/2.
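Vaccine efficacy figures like these are, in their simplest cohort-style form, one minus the relative risk. A minimal sketch with hypothetical counts (not the South African data, and real studies often estimate VE from odds or hazard ratios instead) shows how the same vaccine can post low VE against infection and much higher VE against severe disease:

```python
def vaccine_efficacy(cases_vax, n_vax, cases_unvax, n_unvax):
    """VE = 1 - (attack rate among vaccinated / attack rate among unvaccinated)."""
    attack_vax = cases_vax / n_vax
    attack_unvax = cases_unvax / n_unvax
    return 1 - attack_vax / attack_unvax


# Hypothetical counts per 1,000 people in each group.
ve_infection = vaccine_efficacy(40, 1000, 50, 1000)
ve_severe = vaccine_efficacy(2, 1000, 10, 1000)
print(f"VE vs infection: {ve_infection:.0%}, VE vs severe disease: {ve_severe:.0%}")
```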

But yeah, it's pretty easy to square, seeing as Dr. Rivers tweeted the paragraph whose last sentence is "need for vaccines to incorporate variants of concern." Table 1 is stuff we know about old vaccines, and future public health decisions will not be using those. An annual booster of bivalent vaccine timed like the flu shot is a different story! It's the one being told (e.g., Zvi's points 2 and 5), and it's the one not shown in Table 1. Seeing as we are getting a bivalent booster in about 10 months, in part thanks to uncertainty in how the FDA was going to treat approvals, getting it down to the 6 month lead time of the flu vaccine seems in the realm of possibility for future years (potentially contra Zvi's point 3). We will "need for vaccines to incorporate variants of concern" for annual shots to make sense, and that's exactly what we'll be doing.

Comment by Daniel V on Argument against 20% GDP growth from AI within 10 years [Linkpost] · 2022-09-13T14:02:18.208Z · LW · GW

Right, when you go to argue the merits, you ask "well, if there were to be a phase change, what would the phase change look like?" And the original estimate was derived from not much effort in calibrating the numbers, and the reply was that even if we saw an utterly shocking phase change, we'd get nowhere close to 20%. You can do varying degrees of in-depth analyses to get to that point (good on you), or you can do like I did and rely on a semi-informed prior.

Here's US growth from 1947. Imagine all the things that happened since then that could have induced mild phase changes to the growth trajectory. ASSUMING that there will be a substantial phase change (again, see Cameron Fen's thread), 20% is still ludicrous.
https://fred.stlouisfed.org/graph/?g=TE7B
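The compounding arithmetic shows why 20% is so far outside the historical range. A minimal sketch; the 3% figure is an assumed round number for ordinary US real growth, not a value read off the linked chart:

```python
import math


def cumulative_growth(rate, years):
    """Total growth factor from compounding `rate` annually for `years`."""
    return (1 + rate) ** years


def doubling_time(rate):
    """Years for output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


print(f"20%/yr for 10 years: x{cumulative_growth(0.20, 10):.2f}")
print(f"3%/yr for 10 years:  x{cumulative_growth(0.03, 10):.2f}")
print(f"doubling time at 20%: {doubling_time(0.20):.1f} years")
```

Sustained 20% growth would make the economy roughly six times larger within a decade, doubling about every four years, versus roughly a one-third increase at a conventional rate.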

Comment by Daniel V on Argument against 20% GDP growth from AI within 10 years [Linkpost] · 2022-09-12T15:43:06.226Z · LW · GW

Not a comment on the argumentation or anything, I know we want to be rationalists and worry about the arguments (so thank you for posting about the disagreement that actually offers some analysis), but just registering my initial reaction to the 20%/year in 10 years claim:

Anyone with a cursory understanding of the history of economic growth (I don't even mean professors who have spent their careers studying growth economics) will know that number is facially ridiculous. My first thought was, great, now I know this person has no idea what they are talking about and can be safely ignored. As a communication device, that prediction failed miserably for me because it did not make me want to assess the underpinnings or energize me to research further about economic growth, which is the least such a tweet should have done if not prompt an update. Though good luck with the latter because, again, if you have an iota of knowledge about the area, your prior distribution for the realm of US economic growth possibilities is correctly more narrow than "let me throw out a shocking round number to myself, see if I can live with it, and then see if others take it seriously." Am I being too harsh? Well, no, he literally "didn't put much effort into calibrating the numbers."

Now back to the regularly scheduled programming of arguing about the merits.

Comment by Daniel V on Stop Discouraging Microwave Formula Preparation · 2022-09-03T13:50:38.969Z · LW · GW

That's a charitable interpretation. But the steps for microwaving and not microwaving formula are the same, just in a different order. If you forget to check the temperature of overheated formula, you may burn your baby, regardless of the heating method.

It'd be like recommending against making instant oatmeal in the microwave...but you just need to heat the water separately!

Comment by Daniel V on What Makes A Good Measurement Device? · 2022-08-25T13:04:09.486Z · LW · GW

I'm with Davidmanheim here: it seems this idea could benefit from some reading in measurement theory, or at least from recognizing a discrepancy that undermines the analogy. I'll get into that a bit, but to start, the post was definitely positive food for thought.

If you're measuring actual temperature, you have some choice of measures there too, but fundamentally it's a property of the material under study. If you're measuring "the" perceived temperature, it's an interaction between "the average person" and the material, and sticking fingers in is probably a good measure. Yes, temperature and perceived temperature will correlate, but if the thing you're measuring exists only in someone's head, you're going to have to go to their head to measure it (see also psychophysics).

"Train[ing] a net to replicate human reports" is not obviously less useful than "actual" scales. Human reports may in fact be the most construct-valid measure. (Though I do agree that leaving these reports in the form of natural language rather than attempted quantifications would indeed be ambiguous, and if we lack face-valid quantitative measures, we will have to develop them from somewhere, probably with those open-ended responses as a foundation.)

Although human reports may be noisy, so are all measures. The thermometer has an implicit +/- margin of error. It seems very precise to us, but human judgments of attributes can also be reliable (in that lots of people agree) and precise (in that the error bars are narrow). For example, if I asked a lot of people to rate the perceived precision of various measures on a scale of 1=extremely noisy to 100=extremely precise, I expect there to be a decent amount of consistency in the rank ordering of those ratings, for thermometers to score highly, and for at least some of the average perceived precisions to flash pleasantly narrow error bars.
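To illustrate the point about error bars: the uncertainty in an average rating shrinks with the number of raters even when individual raters disagree. The sketch below uses made-up ratings (200 hypothetical raters, an individual spread of about 12 points on the 1-100 scale, both assumptions) and a normal-approximation 95% interval; it says nothing about what real raters would actually report.

```python
import math
import random
import statistics


def mean_with_ci(ratings, z=1.96):
    """Mean rating with a normal-approximation 95% confidence interval."""
    m = statistics.mean(ratings)
    se = statistics.stdev(ratings) / math.sqrt(len(ratings))  # standard error of the mean
    return m, m - z * se, m + z * se


# Hypothetical data: 200 raters judge a thermometer's precision on the 1-100 scale.
# The centre (85) and individual spread (SD 12) are assumptions; ratings are clipped to 1-100.
rng = random.Random(0)
ratings = [min(100, max(1, round(rng.gauss(85, 12)))) for _ in range(200)]

m, lo, hi = mean_with_ci(ratings)
print(f"individual SD: {statistics.stdev(ratings):.1f}")
print(f"mean: {m:.1f}, 95% CI: [{lo:.1f}, {hi:.1f}]")
```

The interval around the mean comes out several times narrower than the disagreement between individual raters, because the standard error falls with the square root of the number of raters.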

But because even the lowest-variance perceptions vary a lot between people (compared with the variability in temperature readings from a thermometer), I do suspect you're not going to get readings that are "approximately-deterministically" useful indicators for lots of perceptual domains, such as alignment. But you'll get indicators that "far-from-deterministically-but-reliably" predict variance in criterion variables. In the end, we're pessimistic and optimistic about the same things; I just don't think the reason is that human reports are inherently the wrong tool. It's that the attribute of interest is a psychological construct rather than a conveniently-precisely-measurable-physical property. Again, the post was good food for thought: just as measurement of temperature has improved and gotten more precise (touch it -> use mercury -> use radiation), maybe the methods we use for psychological measurement will develop and improve, with hope for alignment.
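A toy simulation can make the "far-from-deterministically-but-reliably" idea concrete. It assumes a simple model the comment does not specify: a latent construct, a human report equal to the construct plus large rater noise, and a criterion equal to the construct plus its own noise, all Gaussian with made-up scales. Under those assumptions the population correlation between report and criterion is 1/sqrt(10), roughly 0.32: far from deterministic, but stably above zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n = 5000
# Assumed toy model: all noise scales below are illustrative, not estimates.
latent = [rng.gauss(0, 1) for _ in range(n)]          # the psychological construct
reports = [t + rng.gauss(0, 2) for t in latent]       # noisy human report (noise SD 2)
criterion = [t + rng.gauss(0, 1) for t in latent]     # outcome the construct should predict

r = pearson(reports, criterion)
# Population value under this model: 1 / sqrt(5 * 2), about 0.32.
print(f"correlation between report and criterion: {r:.2f}")
```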