Thanks, you had mentioned the short- vs. long-run before, but after this discussion it is more foregrounded and the "racing" explanation makes sense. :) Though I appreciated the references to marginal value and marginal cost.
You’re assuming that the economy will produce new jobs faster than the factories will produce new chips and robots to fill those jobs.
Well, the assumptions are primarily that the supply and demand for AI labor will vary across markets and secondarily that labor can flow across markets. This is an important layer separate from just seeing who (S or D) wins the race. If there is only one homogeneous market, then the price trajectory for AI labor (produced through the racing dynamics) tells you all you'll need to know about the price trajectory for its human substitute. So the question is just which is faster.
But if there are heterogeneous markets, "which is faster" is informative only for that market and the price of human labor as a substitute in that market. The price trajectory for AI labor in other markets might be subject to different "which is faster" racing dynamics. Then, because of composition effects, the trajectory for the average price of AI labor that is performed may diverge from the trajectory for the average price of human labor that is performed.
This is true even if you assume the economy has no vacancies and will not produce new jobs (i.e., labor cannot flow across markets). For example, average hourly earnings spiked during COVID because the work that was being performed was high-cost/value labor, an increase seemingly entirely due to composition [BLS]. Although I am alleging that predicting the price trajectory remains difficult even if you take a stance on the racing dynamics because you need to know what the alternative human jobs are, in that world where jobs are simply destroyed, the total value accruing to human laborers certainly goes down. This is why I think the labor flows could be considered a secondary assumption for the left-side depending on how much you think that side would be arguing - they are not dispositive of what the price changes will be (the focus of the post was on price), but they definitely will affect whether human labor commands the same total value.
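To make the composition point concrete, here is a minimal sketch with made-up numbers (the wages and hours are purely hypothetical, not BLS figures): nobody's wage changes, some low-wage hours simply disappear, and yet the average price of labor performed goes up while the total value going to workers goes down.

```python
def average_and_total(groups):
    """groups: list of (hourly_wage, hours_worked) pairs."""
    total_hours = sum(hours for _, hours in groups)
    total_pay = sum(wage * hours for wage, hours in groups)
    return total_pay / total_hours, total_pay

# Hypothetical economy: (hourly wage, hours worked) for two kinds of work.
before = [(15, 60), (40, 40)]
# Same wages, but two thirds of the low-wage hours are destroyed.
after = [(15, 20), (40, 40)]

before_avg, before_total = average_and_total(before)
after_avg, after_total = average_and_total(after)

print(f"average wage: {before_avg:.2f} -> {after_avg:.2f}")
print(f"total wage bill: {before_total} -> {after_total}")
```

The average moves purely because the mix of work performed changed, which is why the average price alone says little about whether human labor commands the same total value.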
I like that this post lays out the dilemma in principles A (marginal value dominates) and B (marginal cost dominates). One quibble is that the effects are on the supply and demand curves, not on the quantities supplied and demanded, i.e., it's not about the slopes of the curves but the location of the new equilibrium as the curves shift left or right. It's not about which part "equilibrates" faster (with what?) but about the relative strength of the shifts.
If AGI shifts the demand for AI labor to the right, under constant supply, we'd expect a price increase and more AI labor created and consumed. If AGI shifts the supply of AI labor to the right, under constant demand, we'd expect a price decrease and more AI labor created and consumed. Both of these things would happen, so there is a wide range of possible price changes (even no change in price) consistent with more AI labor created and consumed, but what happens to the price depends on which shift is "stronger."
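A rough sketch of that logic, assuming linear curves (demand Q = a - b*P, supply Q = c + d*P) with parameter values I made up for illustration: quantity rises in every scenario, while the sign of the price change depends only on which rightward shift is larger.

```python
def equilibrium(a, b, c, d):
    """Linear market: demand Q = a - b*P, supply Q = c + d*P."""
    price = (a - c) / (b + d)
    quantity = a - b * price
    return price, quantity

def after_shifts(a, b, c, d, demand_shift, supply_shift):
    """Shift both curves to the right by the given amounts and re-solve."""
    return equilibrium(a + demand_shift, b, c + supply_shift, d)

# Hypothetical baseline market for AI labor.
A, B, C, D = 100, 1, 20, 1
base_price, base_quantity = equilibrium(A, B, C, D)

scenarios = {
    "demand shift stronger": (30, 10),
    "supply shift stronger": (10, 30),
    "shifts equal": (20, 20),
}
results = {}
for name, (demand_shift, supply_shift) in scenarios.items():
    results[name] = after_shifts(A, B, C, D, demand_shift, supply_shift)
    price, quantity = results[name]
    print(f"{name}: price {base_price} -> {price}, quantity {base_quantity} -> {quantity}")
```

In this linear setup the price change is just (demand shift - supply shift) / (b + d), so equal shifts leave the price untouched even though much more AI labor is created and consumed.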
Still, with the quantity of AGI labor created and consumed increasing, you might wonder about how the experience curve impacts it - that's just more right-shift in the supply curve, so maybe we don't have to wonder after all. What about the effect on substitutes like human labor? Well, if the economy has a set number of jobs, you'd expect a lot of human labor displaced, but if the economy can find other useful work for those people, they will do those other jobs, which might be lower-paying (no more coding tasks for you - enjoy 7/11), reducing the average price of human labor, or might be higher-paying (no more coding tasks for you - enjoy this support role for AGI that, because of its importance, requires ...), increasing the average price of human labor.
Can those niches exist? Yes, the supply and demand curves are curves of heterogeneous values and production functions. And markets are imperfect. Won't those niches eventually disappear? Well, rinse and repeat. See ATMs and bank tellers, also see building luxury housing supply and the effects on rents throughout the housing supply.
I don't think it's only talking past each other - it's a genuine ton of uncertainty.
I'm here to say, this is not some property specific to p-values, just about the credibility of the communicator.
If communicators make a bunch of errors all the time, especially those that change their conclusions, indeed you can't trust them. Turns out (BW11) that they are more credible than that: the errors they make tend not to change the conclusions of the test (i.e., the chance of drawing a wrong conclusion from their data ("gross error" in BW11) was much lower than the headline rate), and (admittedly I'm going out on a limb here) it is very possible the errors that change the conclusion of a particular test do not change the overall conclusion about the general theory (e.g., if theory says X, Y, and Z should happen, and you find support for X and Y and marginal-support-now-not-significant-support-anymore for Z, the theory is still pretty intact unless you really care about using p-values in a binary fashion. If theory says X, Y, and Z should happen, and you find support for X and Y and now-not-significant-support-anymore for Z, that's more of an issue. But given how many tests are in a paper, it's also possible theory says X, Y, and Z should happen, and you find support for X and Y and Z, but turns out your conclusion about W reverses, which may or may not really have something to say about your theory).
I don't think it is wise to throw the baby out with the bathwater.
Supply side: It approaches the minimum average total, not marginal, cost. Maybe if people accounted for it finer (e.g., charging self "wages" and "rent"), cooking at home would be in the ballpark (assuming equal quality of inputs and outputs across venues..), but that just illustrates how real costs can explain a lot of the differential without having to jump to regulation and barriers to entry (yes, those are nonzero too!).
Demand side: Complaints in the OP about the uninformativeness of ratings also highlight how far we are from perfect competition (also, e.g., heterogeneous products), so you can expect nonzero markups. We aren't in equilibrium and in the long run we're all dead, etc.
I'm a big proponent of starting with the textbook economic analysis, but I was surprised by the surprise. Let's even assume perfect accounting and competition:
Draw a restaurant supply curve in the middle of the graph. In the upper right corner, draw a restaurant demand curve (high demand given all the benefits I listed). Equilibrium price is P_r*. Now draw a home supply curve to the far left, indicating an inefficient supply relative to restaurants (for the same quantity, restaurants do it "cheaper"). In the bottom left corner, draw a home demand curve (again the point is I demand eating out more than eating at home). Equilibrium price for those is P_h*. It's very easy to draw where P_h* < P_r*.
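If you'd rather compute it than draw it, here is a minimal sketch with linear curves. All the numbers are hypothetical and chosen only to match the picture described above: restaurants supply any given quantity more cheaply than home kitchens, demand for eating out is much higher, and the home equilibrium price still comes out lower.

```python
def equilibrium(a, b, c, d):
    """Linear market: demand Q = a - b*P, supply Q = c + d*P."""
    price = (a - c) / (b + d)
    quantity = a - b * price
    return price, quantity

def supply_price(c, d, quantity):
    """Price at which suppliers are willing to provide the given quantity."""
    return (quantity - c) / d

# Hypothetical parameters: (a, b) describe demand, (c, d) describe supply.
restaurant = {"a": 200, "b": 2, "c": -20, "d": 4}  # high demand, efficient supply
home = {"a": 60, "b": 2, "c": -40, "d": 2}         # low demand, inefficient supply

p_rest, q_rest = equilibrium(**restaurant)
p_home, q_home = equilibrium(**home)

print(f"restaurant: P_r* = {p_rest:.2f}, Q = {q_rest:.2f}")
print(f"home:       P_h* = {p_home:.2f}, Q = {q_home:.2f}")
# Restaurants are still the cheaper producer at the home quantity.
print(supply_price(restaurant["c"], restaurant["d"], q_home), "<", p_home)
```

The point is only that an efficiency advantage on the supply side does not pin down which equilibrium price is higher once demand differs across the two venues.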
Cooking at Home Being Cheaper is Weird
I like the argument that the scaling should make the average marginal cost per plate lower in restaurants than at home, but I find cooking at home being cheaper not weird at all. First, there are also real fixed costs to account for, not just regulatory costs.
More importantly, the average price per plate is not just a function of costs, it's a function of the value that people receive. Cooking at home does give some nice benefits, but eating out gives some huge ones: essentially leisure, time savings (a lot of things get prepped before service), no dishes, and possibly lower search costs ("what's for dinner tonight?").
A classic that seemingly will have to be reargued til the end of time. Other allocation methods are not clearly more egalitarian and are less efficient (depends on the correlation matrix of WTP, need, time budget, etc., plus one's own judgment of fairness, but money prices come out looking great a lot of the time). In some cases, even prices don't perform great (addressed in some comments on this post), but they're better than the alternatives.
For more reading: https://www.lesswrong.com/posts/gNodQGNoPDjztasbh/lies-damn-lies-and-fabricated-options?commentId=nG2X7x3n55cb3p7yB
To get Robin worried about AI doom, I'd need to convince him that there's a different metric he needs to be tracking
That, or explain the factors behind why Robin should update his timeline for AI/computer automation taking "most" of the jobs.
AI Doom Scenario
Robin's take here strikes me both as an uncooperative thought-experiment participant and as a decently considered position. It's like he hasn't actually skimmed the top doom scenarios discussed in this space (and that's coming from me...someone who has probably thought less about this space than Robin) (also see his equating corporations with superintelligence - he's not keyed into the doomer use of the term and not paying attention to the range of values it could take).
On the other hand, I find there is some affinity with my own skepticism of AI doom, which rests on the notion that authorization lines will be important.
On the other other hand, once the authorization bailey is under siege by the superhuman intelligence aspect of the scenario, Robin retreats to the motte that there will be billions of AIs and (I guess unlike humans?) they can't coordinate. Sure, corporations haven't taken over the government and there isn't one world government, but in many cases, tens of millions of people coordinate to form a polity, so why would we assume all AI agents will counteract each other?
It was definitely a fun section and I appreciate Robin making these points, but I'm finding myself about as unassuaged by Robin's thoughts here as I am by my own.
Robin: We have this abstract conception of what it might eventually become, but we can't use that abstract conception to do very much now about the problems that might arise. We'll need to wait until they are realized more.
When talking about doom, I think a pretty natural comparison is nuclear weapon development. And I believe that analogy highlights how much more right Robin is here than doomers might give him credit for. Obviously a lot of abstract thinking and scenario consideration went into developing the atomic bomb, but also a lot of safeguards were developed as they built prototypes and encountered snags. If Robin is so correct that no prototype or abstraction will allow us to address safety concerns, so we need to be dealing with the real thing to understand it, then I think a biosafety analogy still helps his point. If you're dealing with GPT-10 before public release, train it, give it no authorization lines, and train people (plural) studying it to not follow its directions. In line with Robin's competition views, use GPT-9 agents to help out on assessments if need be. But again, Robin's perspective here falls flat and is of little assurance if it just devolves into "let it into the wild, then deal with it."
A great debate and post, thanks!
Paper from the Federal Reserve Bank of Dallas estimates 150%-300% returns to government nondefense R&D over the postwar period on business sector productivity growth. They say this implies underfunding of nondefense R&D, but that is not right. One should assume decreasing marginal returns, so this is entirely compatible with the level of spending being too high. I also would not assume conditions are unchanged and spending remains similarly effective.
At low returns, you might question whether it's good enough to invest more compared to other options (e.g., at 5%, maybe simply not incurring the added deficit to be financed at 5% is arguably preferable; at 7%, maybe your value function is such that simply not incurring the added deficit to be financed at 5% is arguably preferable), but at such high returns, unless you think the private sector is achieving a ballpark level of marginal returns, invest, baby, invest! The marginal returns would have to be insanely diminishing for it not to make sense to invest more, which implies we're investing at just about the optimal level (if the marginal return of the next $1 were 0%, we shouldn't invest more, but we shouldn't invest less either because our current marginal return is 150%). Holding skepticism about the estimated return itself would be a different story.
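A back-of-the-envelope sketch of how fast returns would have to diminish: the 150% starting return and the 5% hurdle come from the discussion above, while the geometric decay of marginal returns per extra unit of funding is purely my assumption for illustration.

```python
def extra_units_worth_funding(current_marginal_return, hurdle_rate, decay_per_unit):
    """Count additional funding units whose marginal return still beats the hurdle.

    Assumes (hypothetically) that each extra unit of funding multiplies the
    marginal return of the next unit by decay_per_unit.
    """
    units = 0
    marginal = current_marginal_return
    while marginal > hurdle_rate:
        units += 1
        marginal *= decay_per_unit
    return units

# Starting from a 150% marginal return with a 5% financing cost as the hurdle.
gentle = extra_units_worth_funding(1.5, 0.05, 0.9)  # returns fall 10% per unit
brutal = extra_units_worth_funding(1.5, 0.05, 0.5)  # returns halve with every unit

print(f"gently diminishing returns: fund {gentle} more units")
print(f"halving returns: fund {brutal} more units")
```

Even when returns halve with every additional unit, several more units clear the hurdle, so "do not invest more" requires a far steeper drop-off than that.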
That is an additional 15% of kids not sleeping seven hours
I was not aware of the concomitant huge drop in sleep (though it's obvious in retrospect). Maybe it's more important to limit screen time at night, when you're alone in your room not sleeping. Being constantly lethargic as a result may also contribute to (and itself be one of the) depressive symptoms. It will be very important to figure out the mechanism(s) by which smartphone use hurts kids.
I agree, I was thinking more generally this isn't a "poker" theory specifically, just one about rules and buy-in. But it's about poker night, so I'll let it slide. The main game rules, though, remain extraneous. Loved the post still!
Mira: You should be able to buy anything with a limit order.
“I don’t feel like paying $250 for an anime figurine, but I left an order up for $50”
If they saw 10,000 orders at a lower price rung ...
As usual the answer is transaction costs
Agree and also perceptions. The idea here is to facilitate price discovery and price discrimination. If only we knew people's WTP and could serve them lower prices acceptable to us when volume isn't moving at the current price! We can adjust prices ad hoc, but maybe a little upfront market research would be better and an exchange might be smoother (subject to TCs). The flipside of this has the problem that consumers hate it [Reuters]. Also, hedging (see: futures markets) does happen in B2B, but with more sophisticated owners and larger businesses. The supply chain is constantly trying to optimize inventory management (again, not the mom-and-pops you see on save-my-business shows).
Why is turbulence worse on planes? The headlines blame it on ‘climate change.’ The actual answer is the FAA told airlines to prioritize saving fuel over passenger comfort, despite passengers having a strong revealed preference for spending the extra cost of fuel to have a more pleasant flight. This then became ‘because climate change.’ This kind of thing damages public trust in all such claims, making solving climate change (and everything else) that much harder.
There are benefits to optimized profile descents (fuel, time, reduced air traffic controller instructions, reduced noise over populated areas), which they did studies on to confirm since in high traffic airspace the stepwise approach can be easier for ATC. This change could conceivably increase turbulence on approach but would not explain the increase that "the narrative" is attributing to increased wind shear at higher altitudes.
I agree with Neil here: if you identify with your flaws, that is bad. By definition. If you are highly analytical and you identify with it, great, regardless of if other people see it as a flaw. Like you said and Neil's reply in the footnote, if it's a goal, then it is not a flaw. But if you say it is a personal flaw, then either you shouldn't be adopting it into your identity (you don't even have to try to fix it as noble as that would be, but you don't get to say "I'm the bad-at-math-person, it's so funny and quirky, and I just led my small business and partners into financial ruin with an arithmetic mistake," life is not a sit-com) or maybe you don't really see it as a flaw after all. Either way, something is wrong, either in your priorities or the reliability of your self-reports. And, yeah, this topic involves value judgments. If nothing has valence, then the notion of a flaw would not exist.
I quite appreciate the post's laying things out, but it's not convincing regarding Scott's post (it's not bad either, just not convincing!) because it doesn't offer much more than "no, you're wrong." The crux of the argument presented here is taking the word disability, which to most speakers means X and implies Y, and breaking it into an impairment, which means X, and a disability, which is Y. Scott says this is wrong and explains why he thinks so. DirectedEvolution says Scott is wrong "because the definitions say..." but that's exactly what Scott is complaining about.
For example, if you're short-sighted, normally we'd say "you have a disability (or impairment or handicap, etc., they're interchangeable) of your vision so that means you will struggle with reading road signs." Instead, the social model entails saying "you have an impairment of your vision so that means, because of society, you will be disabled when it comes to reading road signs."
We can debate which view is more useful (and for what purposes). Scott thinks the social model is useful to promote accommodations since it separates the physical condition from the consequences (whether it produces negative consequences depends on society). He thinks the Szasz-Caplan model is useful to deny accommodations since it separates the mental condition (i.e., preferences, in that model) from the consequences (whether it produces negative consequences depends on will). More importantly, he thinks the social model is "slightly wrong about some empirical facts" (what empirical facts? DirectedEvolution is correct that Scott's argumentation is a bit soft...he benefits greatly from arguing the layperson side) in that in some cases it feels absurd to pin blame on society for the consequences of some impairments (e.g., Mt. Everest). And on that your layperson (and I) would agree with him. DirectedEvolution offers no counterpoint on that (which is the primary argument), but the post DOES provide a key benefit:
Adopting separate definitions for impairment and disability IS NOT strictly equivalent to adopting the social model. One could restate short-sightedness: "you have an impairment of your vision so that means you will be disabled when it comes to reading road signs." This drops the blame game and allows for impairments to disable people outside of societies. In fact, Scott accidentally endorsed it [added by me]: "the blind person’s inability to drive [disability] remains due to their blindness [impairment], not society." So perhaps the crux of Scott's argument is not about using two definitions but about whether disability ought to be defined as stemming from society! And in fact that's evident in Scott's post. However, Scott's post DID also, at times, imply that one definition would suffice.
This post made me update toward two definitions potentially being useful, but it did not make me update away from endorsing Scott's main point, that disability ought not be defined as stemming from society.
As an aside: the two definitions are still debatable though. Suppose someone has an impairment that has not generated, and never will generate, a disability. How is this not the same as "there exists variability"? If someone has perfect vision and I am short-sighted but we live in a dome with a 5 foot diameter such that I can see just fine, and no one tells me my lived experience could be better, how could you even call that an impairment? Is it an impairment if I realize that my vision could be better? Is that other person impaired if they realize their vision could be improved above "normal"? "Impairment" could just refer to being low on the spectrum of natural human variability in some capability, but how low is low enough? "So low that it starts to interfere..." is bringing disability into the mix. What capabilities count? Certainly not "reading road signs" as that would be in the realm of disability, but what level of specificity is appropriate? Short-sightedness is not an impairment of seeing near objects, it's an impairment of seeing far objects, so that is to say, not vision generally. But once you get specific enough, it's back to sounding like a disability - "your far object vision is impaired so you are disabled at seeing far objects."
It's very interesting to see the intuitive approach here and there is a lot to like about how you identified something you didn't like in some personality tests (though there are some concrete ones out there), probed content domains for item generation, and settled upon correlations to assess hanging-togetherness.
But you need to incorporate your knowledge from reading about scale development and factor analysis. Obviously you've read in that space. You know you want to test item-total correlations (trait impact), multi-dimensionality (factor model loss), and criterion validity (correlation with lexical notion). Are you trying to ease us in with a primer (with different vocabulary!) or reinvent the wheel?
Let's start with the easy-goingness scale:
- (+) In the evening I tend to relax and watch some videos/TV
- (+) I don’t feel the need to arrange any elaborate events to go to in my free time
- (+) I think it is best to take it easy about exams and interviews, rather than worrying a bunch about doing it right
- (+) I think you’ve got to have low expectations of others, as otherwise they will let you down
- (-) I get angry about politics
- (-) I have a stressful job
- (-) I don’t feel like I should have breaks at work unless I’ve “earned” them by finishing something productive
- (-) I spent a lot of effort on parenting
The breadth of it is either a strength or a weakness. It'd be nice to have a construct definition or at least some gesturing at what easy-goingness actually is to gauge the face-validity of these items. Concrete items necessarily will have some domain-dependence, resulting in deficiency (e.g., someone who likes to relax and read a book will score low on item 1) or contamination (e.g., having low expectations of others might also be trait pessimism), but item 8 is really specific. It hampers the ability of this scale to capture easy-goingness among non-parents. The breadth would be good if it captured variations on easy-goingness, but instead it'd be bad if it just captures different things that don't really relate to each other. That's especially problematic because then the inference from low inter-correlations might not be that the construct is bad, but that the items just don't tap into it. You can see where I'm going with this because...
This suggests to me that Easy-Goingness is not very “real”. While it might make sense to describe a person as doing something Easy-Going, for instance when they are watching TV, it is kind of arbitrary to talk about people as being more or less Easy-Going, because it depends a lot on context/what you mean.
...indeed, the items are mainly just capturing different things, not reflecting on easy-goingness in any way. From a scale-assessment standpoint, it's great to see the results confirm my unease about the items based on simply reading them.
The fact that this is weak means that even the most Easy-Going people cannot necessarily be expected to be particularly Easy-Going in all contexts.
This statement presumes your measure reflects a higher-order easy-goingness and that context-specific easy-goingnesses are also being adequately measured.
With conservatism, on the other hand, you can see there is some context-specificity (e.g., dress vs. general social views vs. issue-based ideology), but the measure is facially better. And it hangs together better. Alternately, you might explore those contours and say you've come up with a multi-dimensional conservatism scale, just like you have a multi-dimensional creativity scale.
the “Correlation with lexical notion” was consistently close to 1, showing that the concrete and the abstract descriptors were getting at the same thing.
There's an implicit "when the concrete descriptors actually had face validity" hidden here; low correlation with the lexical notion could indicate a problem with the lexical scale or a problem with the concrete scale, or both.
Overall, I am very impressed that you presented a scary chart to start, promised you'd explain it, and successfully did so. The general takeaway from it is that the lexical hypothesis could be pretty sound and a few of these might be multidimensional in nature (or it could be that some items are good and some are bad). For the low trait impact scales, it's a question of whether the items are good and the construct isn't "real," or whether the items are just a bad measurement approach.
Who has an alternative hypothesis that explains this data? Anyone? Ooh ooh, pick me, pick me. Perhaps being depressed has something to do with your life being depressing, due to things like lack of human capital or job opportunities, life and career setbacks or alienation from one’s work. Income increases life satisfaction, as I assume does the prospect of future income.
It is amazing to see the ‘depression is purely a chemical imbalance unrelated to one’s physical circumstances’ attitude in this brazen a form. Mistaking correlation for causation here seems like a difficult mistake for a reasonable and reflecting person to make.
They measured depression at ages 27-35 in 1992 and outcomes at age 50. They control for "age, gender, race, for level of education by age 26, parental education, r marital status in 1992 survey, years of work experience accumulated by 1992 survey, the average percentage of weeks the person’s work history data is unaccounted for by 1992 survey, health status during childhood, a dummy for number of cigarettes consumed by 1992 survey, year indicators, local unemployment rate in 1992, 1998, 2004, and the year the person’s outcome variable is collected."
So it's not like they just correlated depression and wages from a cross-sectional survey and claimed causation. They did some work here.
It was a good post! To the extent that whatever I said was value-added or convincing to you, it was only because your quality post prompted me to lay it out.
And like you said, perhaps there is more here. Does a negative (vs. positive) frame make it harder to notice (or easier to forget) that there is a null hypothesis? Preliminary evidence in favor is that people who "own" the null will cede it in a negative frame, whereas they tend to retain it in a positive frame. More thinking/research may be needed though to feel confident about that (I say that as a scientist starting with the null effect of no difference, not as someone proposing the hypothesis of no difference).
"It's not sufficient to be right in many contexts, you must also be rhetorically persuasive." Spittin' facts.
Going off localdeity's comment, I think "arrogating the right to choose the null hypothesis" or as you said, "assuming the burden of proof" are more critical than whether the frame involves negations. If you want to win an argument, don't argue, make the other person do the arguing by asking lots of questions, even questions phrased as statements, and then just say whatever claim they make isn't convincing enough. Why should purple be better than green? An eminently reasonable question! But one whose answer will never have satisfactory support, unless you want it to. "I'm just asking questions."
It's good for you to point out that the true statement localdeity offered and your conclusion seem in contention. It is a weaker statement, so if you are being asked for your opinion, you may want to hedge with that negation. If you are actually trying to convince someone of something though (and this is why I think you rightly believe these are about subtly different things), that is not the way to do it. You could make the stronger claim, or alternately, you could phrase it as a question - "why shouldn't we do anti-X?" (but notice it would also work without the negation: "why should we do X?") and get them to do the arguing for you.
You're not wrong, and I don't disagree!
In the long run it seems pretty clear labor won't have any real economic value
I'd love to see a full post on this. It's one of those statements that rings true since it taps into the underlying trend (at least in the US) where the labor share of GDP has been declining. But *checks notes* that was from 65% to 60% and had some upstreaks in there. So it's also one of those statements that, upon cognitive reflection, has a lot of ways to end up false: in an economy with labor crowded out by capital, what does the poor class have to offer the capitalists that would provide the basis for a positive return on their investment (or are they...benevolent butchers in the scenario)? Also, this dystopia just comes about without any attempts to regulate the business environment in a way that makes the use of labor more attractive? Like I said, I'd love to see the case for this spelled out in a way that allows for a meaningful debate.
As you can tell from my internal debate above, I agree with the other points - humans have a long history of voluntarily crippling our technology or at least adapting to/with it.
Thanks for writing this. I suppose the same could be said about any tool that you suspect might be inferior to another on the horizon in your lifetime. As quanticle said, some romance around self-crafting could support the psychological value of the labor. More importantly, I think there are in fact qualia pertinent to our quality evaluations that leave AI productions inferior in important ways to human work...currently. That gap will attenuate and we'll hone our models to be better at producing in a wider spectrum of areas, too.
However, I don't think it's a foregone conclusion that no gap will remain. When the world of bits can't quite recreate the world of atoms (efficiently), there will be a place for human labors (okay, even the boundaries for this are subject to change too but bear with me) - think of handwriting. What a pain! The tool has been replaced with word processing and printing for many written documents. But when I want to send a thank-you to a big client, printing just can't recreate my ink-on-paper signature. An autopen could, but again it's not at the level of efficiency where it is worth the widespread adoption that would snuff out human labor in that space.
By the way, I wonder if you took your inspiration and general plan for this essay, turned it into a prompt, and gave it to chatGPT, what it would produce (maybe there could be some honing of that by a prompt engineer, but whatever). To be fair, you could let chatGPT rewrite it a few times with edits like you would have done for yourself. I suspect it would not write as good of a post - that's a good enough reason to bother doing it yourself.
(Also because the prompt to write with the style of a specific person only works when you have enough online content in the training data. So if you want a unique style, you need to write a lot before you can outsource. LOL)
Upvote for paragraph one, agree for paragraph two.
It's a very narrow (but admittedly compelling) perspective to realize that in particularly bad situations, regulations can compound the badness. But there is plenty of room to debate regulations when it comes to typical cases, and it's probably a better basis on which to evaluate them.
I agree with your comment, but I think the definitional problem is core to the debate rather than something that can simply be discarded. Consumerism is not consumption, but it used to mean consumer protection and empowerment (obviously there is a spectrum there about what constitutes adequate information and the appropriate regulations/interventions to ensure that)...in support of their consumption, which was assumed to be valuable for them. Consumerism has taken on a second, more prominent meaning that itself is a spectrum: sometimes demanding the pricing/regulation of externality-generating production (not all that different in nature from economics, but unique in the externalities that are identified, oftentimes private costs that consumers simply don't attend to), sometimes all the way to value judgments about certain kinds of consumption.
It's such a loaded term I find it best instead to talk about what I actually mean rather than use the term consumerism. Do I want to talk about negative aspects of consumption? Do I want to talk about the consumer information movement? Which one am I about to get into when I say "I'd like to talk about consumerism"?
I also want to add to your bolded comment on substitution, which seems like a really good rule of thumb. But a lot of things cannot be substituted easily because they are timing- or situation-dependent. If I have 15 minutes to kill, it's not obvious that just sitting there with my thoughts is particularly desirable (for some people, sure!), so I'll seek to consume something (not non-consumption) - if the park is 2.5 minutes away, I can consume a 10 minute walk at the park, which might dominate my crappy phone game. If the park is 7.5 minutes away, I can consume a walk to the park, but given that menu of options, maybe my phone game is fine. It also provides optionality for when I'm looking for a low-transportation mode of entertainment in a waiting room. But it can shift from working in these initial use cases to being a prioritized activity in itself - maybe when I have 30 minutes, I'll "default" to that instead of actually evaluating my options. In that case, regret would be a sign that something has gone wrong in my decision-making. It just reinforces the need to use that rule of thumb - be conscious about what you're consuming and the options that are before you!
I strongly agree and wanted to share a similar sentiment.
It is not as simple as "the market says the asset or liability is worth X, so you should too." Businesses are usually going-concerns and it is not really that useful for the company to report itself as merely how things would go down if they were to liquidate today (though obviously considering that possibility is useful, especially if your business could be "runny," and recording the fair value of HTM securities in a note to the financial statements allows readers, like Raging Capital Ventures, to contemplate that). Those liquidation values continue to require subjectivity (e.g., depends on the spreads for the assets and what if the blowup situation we're talking about would spark fear and government intervention that would actually support the assets' values?! [which is exactly what happened with SVB's assets actually]), and of course are not even perfectly reflected by MTM values, so their utility is not as straightforward as it may seem at first blush.
In fact, the FASB (1993) explicitly stated in explaining its rule-making...
that extremely remote "disaster scenarios" (such as a run on a bank or an insurance company) would not be anticipated by an enterprise in deciding whether it had the positive intent and ability to hold a debt security to maturity.
The managers (evidenced by pursuing more capital) and the market (in reaction to that) obviously started to consider that possibility as much less remote, which became a self-fulfilling prophecy. But "disaster valuation" might not be a great default way to account when your business is generally conducted under non-disaster conditions.
That's wonderful for him. I wish he had translated that knowledge into the post then! The reader shouldn't have to come away from a post titled "the point of trade" with simply a list of reasons why trade might be nice when those reasons can actually be brought together in a unifying explanation, one that is already well-explained in Econ 101, no less.
Here he talks about his understanding of the textbook explanation, and you can judge for yourself whether it conveys comparative advantage or not:
"[sometimes people get different amounts of value from things, so they can get more value by trading them] is the horrible explanation that you sometimes see in economics textbooks because nobody knows how to explain anything ... All right, suppose that all of us liked exactly the same objects exactly the same amount. This obliterates the poorly-written-textbook's reason for "trade"."
He also explains his thesis:
"I claim that the reason we have more stuff has something to do with trade. I claim that in an alternate society where everybody likes every object the same amount, they still do lots and lots of trade for this same reason, to increase how much stuff they have."
Of course, that is comparative advantage adjacent, so we'll talk about it right? Wrong, the point of trade is to leverage an assortment of the sources of comparative advantage (but we won't even attempt to link these together in their unifying concept):
"So now let us suppose identical fruit tastes, perfect task-switching, Star Trek transporters, identically cloned genetics, and people can share expertise via Matrix-style downloads which are free. Have we now gotten rid of the point of trade?"
The organizing/umbrella concept (comparative advantage) is still absent at the end of this. Maybe concrete examples like these, delineating specific sources by which comparative advantage can arise, are a useful didactic tool. But I don't think the point was to illuminate a key concept (indeed, it was never named or really all that gestured at), the point apparently was to generate an exhaustive list of things that enable trade to increase production:
"Note: While contemplating this afterwards, I realized that we hadn't quite gotten rid of all the points of trade, and there should have been two more rounds of dialogue; there are two more magical powers a society needs, in order to produce a high-tech quantity of stuff with zero trade. The missing sections are left as an exercise for the reader."
I wish by the end of it that he had fully reinvented comparative advantage. Great that he knew about it all along though...
Confounders - This post took some vivid examples and turned them into solid recommendations, even referring to the concept that already exists outside the post. But it mints new laws where none are needed, not really addressing other things that contribute to the internal validity of experiments or the inferences from full programs of research that might counteract the call to measure every single thing you possibly can; in my estimation, it led to a minor weakness in the post. It's not an egregious reinvention because it has the intellectual humility to interact with previous scholarship, one cannot expect any individual post to cover all the pieces of what can be a broad domain, and the point seemed to be more of presenting preferred operating procedures rather than (re)introducing a concept.
Goodharting - on the other side of things, LessWrong also has posts like this that are designed to review rather than reinvent ideas. There is value in explaining old ideas in new ways or finding previously-unconsidered applications for old ideas.
Comparative advantage - and even worse, EY didn't even fully reinvent it. He just lined up a bundle of things that fall under the umbrella and called it a job well done. This particular instance also checks the boxes for arrogance and lack of rigor. That post was a fun read, but the embedded disdain for economics textbooks was particularly galling since economics textbooks handle the concept just fine.
As you said, it doesn't really change the point, but I'm here to say it's not an alternative bond structure, just that the bond happens to be trading at a discount already at the initial conditions. It will trade at a steeper discount as interest rates rise. It would be even less intuitive, but you could also do this analysis with bonds that are trading at a premium (trading at a smaller premium, or even hitting par or switching to a discount, as interest rates rise).
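To make the discount/premium point concrete, here is a minimal sketch that prices a plain fixed-coupon bond at several market rates. The bond terms (face value 100, 2% annual coupon, 10 years) and the function name `bond_price` are hypothetical, chosen only for illustration; they are not the numbers from the original post.

```python
def bond_price(face, coupon_rate, market_rate, years):
    """Present value of annual coupons plus the face value repaid at maturity."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + market_rate) ** t for t in range(1, years + 1))
    pv_face = face / (1 + market_rate) ** years
    return pv_coupons + pv_face


# Hypothetical bond: face 100, 2% annual coupon, 10 years to maturity.
for rate in (0.01, 0.02, 0.03, 0.05):
    price = bond_price(100, 0.02, rate, 10)
    if price > 100 + 1e-9:
        label = "premium"
    elif price < 100 - 1e-9:
        label = "discount"
    else:
        label = "par"
    print(f"market rate {rate:.0%}: price {price:.2f} ({label})")
```

It is the same bond structure throughout; only the starting market rate decides whether it begins at a premium, at par, or at a discount, and the price falls as the rate rises in every case.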
Matt Levine at Bloomberg also has good comments on this - basically it was a boring bank run/collapse. With it being primarily a duration issue (rather than an impaired assets issue) and a large amount of deposits, I also suspect we'll see an acquisition.
Check the date on this too.
Further illustrating Eliezer's misplaced confidence, Sumner's view is about NGDP targeting, so the success of the BOJ's policy should be based on delivering NGDP growth, not real economic variables like RGDP growth or employment rate as Eliezer implies. They were in fact successful at this (RGDP growth + Inflation = NGDP growth; with RGDP growth continuing on trend and Inflation bucking the downtrend, that's a new NGDP trajectory, baby!). Here, with 100=March 2013 as Kuroda ascended, you can see the shift in CPI trend even before the VAT impact in April 2014. Sumner was bullish on the new BOJ policy by September 2013.
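As a rough sketch of the identity in the parenthetical: nominal growth is exactly the compounded product of real growth and inflation, and the simple sum is a close approximation at low rates. The rates below are hypothetical placeholders, not Japanese data, and `ngdp_growth` is a name made up for this example.

```python
def ngdp_growth(real_growth, inflation):
    """Return (exact, approximate) nominal GDP growth for one period."""
    exact = (1 + real_growth) * (1 + inflation) - 1
    approx = real_growth + inflation
    return exact, approx


# Hypothetical: real growth stays on trend at 1% while inflation
# moves from -0.5% to +1.5%.
before = ngdp_growth(0.01, -0.005)
after = ngdp_growth(0.01, 0.015)
print(f"before: exact {before[0]:.4%}, approx {before[1]:.4%}")
print(f"after:  exact {after[0]:.4%}, approx {after[1]:.4%}")
```

With real growth unchanged, the whole shift in the nominal path comes from the inflation term, which is why NGDP rather than RGDP is the variable to check.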
So, Eliezer, you think you have identified which econbloggers, like Scott Sumner, know better than the Bank of Japan, do you? Eliezer did identify Sumner successfully, but he got lucky. His belief in Sumner was based on a misread of Sumner's position, one that led him to wrongly believe real economic variables would supply evidence for the veracity of the theory. Further compounding the issue, while employment rate might have been readable as supportive, as Matthew Barnett points out, RGDP was not. He is overconfident and should be more humble about his approach.
Ironically, Eliezer's mistake actually more strongly makes his key point. The demand for humility Eliezer was writing about stemmed from the belief that even a very good reasoner oughtn't be able to outperform "the experts." And yet, here we have a mistaken reasoner outperforming "the experts" (at least, outperforming the hawkish experts, before they were replaced by the dovish experts who implemented the new monetary policy at the BOJ). Perhaps the case for humility is not so strong after all: "it is perfectly plausible for an econblogger to write up a good analysis of what the Bank of Japan is doing wrong, and for a sophisticated reader to reasonably agree that the analysis seems decisive, without a deep agonizing episode of Dunning-Kruger-inspired self-doubt playing any important role in the analysis." I suppose one might need to decide how interchangeable "humility" and "agonizing self-doubt" are...
Eliezer is driving an intellectual racecar when many are driving intellectual horse-and-buggies. Still needs to be vacuumed out from time to time though.
Chapin is describing a range of gains - "Until I’d gained some muscle, I didn’t know that getting out of bed shouldn’t actually feel like much, physically, or that walking up a bunch of stairs shouldn’t tire you out, or that carrying groceries around shouldn’t be onerous. I felt cursed." If you can remove a general feeling of being cursed and get a license to live in the material world, wow! If you can solve chronic pain with strength training, great! If you can climb stairs without getting tired, a lot of people already can, but good! If you can carry groceries, like most people can, okay!
Since most normal people's gains will fall in the last two types (for real, what percent of the people feel they "need some special justification for existing" because they're physically not-that-bad-kinda-on-the-weak-side?), you have a point that for a lot of people who already can do things without bother, this won't move the needle. Yet, for many who can do these things, doing them without bother may be nice (and prospectively under-appreciated) - feeling less exhaustion in your life generally and having more energy to do things you really want to do are quite good benefits.
But even if the benefits are more trivial than Chapin characterizes, I think your characterizing the costs as "feel[ing] miserable" is a bit much (though obviously everything is subjective here). Again, for some, sure, it's misery. For most, it's challenging and uplifting and potentially even energizing (especially after the first couple workouts).
So, we have Chapin claiming , and I suggest it's probably more like or at worst , either of which should be more motivating than your . But I agree the benefits seem trumped up by Chapin.
You (correctly, I believe) distinguish between controlling the reward function and controlling the rewards. This is very important as reflected in your noting the disanalogy to AGI. So I'm a little puzzled by your association of the second bullet point (controlling the reward function, which parents have quite low but non-zero control over) with behaviorism (controlling the rewards, which parents have a lot of control over).
From 2021, modeling estimated 6% if you follow the 2 negative tests after day 6 or 10 day isolation rule; estimated 4% if you follow the 2 negative tests after day 6 or 14 day isolation rule.
From 2022, modeling estimated 2-3% if you have 2 negative tests testing daily.
(I'm going to nix the cost of the ticket as it's just a constant)
Depends. Do you want to sum the probability weighted payoffs? EV is fine for that. The probability weighting deals with the striking "really, really low" odds (unless you want to further reweight the probabilities themselves by running them through a subjective probability function), and the payoffs are just the payoffs (unless you want to further reweight the payoffs themselves by running them through a subjective utility function). Either or both of these changes may be appropriate to deal with your own subjective views of objective reality, but that's what they are - personal transformations. However, enough people subscribe to such transformations that EU (expected utility, or see cumulative prospect theory) makes sense more widely than just for you. We indeed perceive probabilities differently from their objective meanings and we indeed value payoffs differently from their mere dollar value.
Now, if you just want a number that best represents the payoff structure, we have candidate central tendencies - mean is a good one (that's just EV). But since the payoff distribution is highly skewed, maybe you'd prefer the median. Or the mode. It's a classic problem, but it's finding what represents the objective distribution rather than what summarizes your possible subjective returns.
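A small sketch of the candidate summaries for a skewed lottery. The payoffs and probabilities are invented for illustration (ticket cost ignored, as above), and the square-root utility is just one arbitrary example of a subjective transformation, not a recommendation.

```python
import math

# Hypothetical lottery: (payoff, probability). Probabilities sum to 1.
lottery = [(0, 0.95), (10, 0.0499), (1_000_000, 0.0001)]


def mean(outcomes):
    """Expected value: probability-weighted sum of payoffs."""
    return sum(x * p for x, p in outcomes)


def median(outcomes):
    """Smallest payoff whose cumulative probability reaches 0.5."""
    cumulative = 0.0
    for x, p in sorted(outcomes):
        cumulative += p
        if cumulative >= 0.5:
            return x
    return sorted(outcomes)[-1][0]


def mode(outcomes):
    """Most likely payoff."""
    return max(outcomes, key=lambda xp: xp[1])[0]


def expected_utility(outcomes, utility=math.sqrt):
    """Probability-weighted sum of transformed payoffs (assumed utility)."""
    return sum(utility(x) * p for x, p in outcomes)


print("mean (EV):", mean(lottery))
print("median:", median(lottery))
print("mode:", mode(lottery))
print("expected sqrt-utility:", expected_utility(lottery))
```

The mean is pulled far above the median and mode by the one rare, huge payoff, which is the skew problem; the utility line shows where a personal reweighting of payoffs would enter.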
Thanks for the analysis, and I mostly agree with your interpretation (having done no further research into this myself), but I'm confused how dividing by 1000 is the problem here. The levels are "basically fine" because they are well below the FDA/EPA limits, but the CA levels are only about 1 order of magnitude lower, not 3. If they had divided by 100, would we be interrogating their divisor choice? (The current implication is that that arbitrary approach would have been fine since it would correspond to FDA/EPA levels). It makes me think that maybe the impetus to question the arbitrary approach mainly stems from the conclusions not fitting our palate.
Karma downvote for lack of introspection into failures of rationality in a rationality forum.
Agreement upvote for "don't do this" because "that is telling of your own biases" without naming any is just not engaging. It was sadly a throwaway starting line to an otherwise excellent comment.
1. My comment was contra "the case that the categorical imperative tries to pull a fast one." Evaluating this (the point of the OP) very much requires an understanding of intent.
2. Well sure, if you don't care about the point of the OP, you can care about other things. Whether it is a useful framework for you to judge and cajole others in their actions is a very important question! You might still care about intent though. Is a handsaw the right tool for hammering a nail? I would recommend you look at a handsaw's intended use case and rule it out. If you really want to see how it would do, okay then, I can't really stop you. You can conclude a handsaw is bad at hammering nails, but don't go to the hardware store and complain. Likewise, is the categorical imperative a good way to aggregate preferences? I say it's not for that, so you don't really have to try it. If you really want to see how it would do and find out that it's bad for the job though, great. But don't say you were duped!
Ah, makes more sense now. I'm generally not a fan of that approach though, and here's why.
My comment was that your conclusion about the categorical imperative vis-a-vis its aims was off because the characterization of its aims wasn't quite right. But you're saying it's okay, we're still learning something here because you meant to do the not-quite-right characterization because most people will do that, and this is the conclusion they will reach from that. But you never tell me up front that's what you're doing, nor do you caution that the characterization is not-quite-right (you said your purpose was to "make the case that the categorical imperative tries to pull a fast one" not that "laypeople will end up making the case..."). So I'm left thinking you genuinely believe the case you laid out, and my efforts go into addressing the flaws in that. Little did I know you were trying to describe a manner of not-quite-right thinking people will do, a description which we should be interested in not because of its truth value but because of its inaccuracy (which you never pointed out).
I've seen this before in another domain: someone wanted to argue that doing X in modeling would be a bad call. So they did Y, then did X, got bad results and said voila, X is bad. But they did Y too (which in this particular case was a priori known to be not the right thing to do and did most of the damage - if they did Y and not X, they'd get good results, and if they had just not done Y, they'd get good results without X and adequate results with X)! When I pointed out that X really wasn't that bad by itself, they said, well, Y is pretty standard practice so we'd expect people to probably improperly do that in this particular case anyway. You gotta tell me that beforehand! Otherwise it looks like a flaw in your commenting on the true state of the world as opposed to a feature of your analysis of the typical approach. But it also changes the message: the problem isn't with X but Y, the problem isn't with the categorical imperative but analyzing it superficially. Of course, that message is also the main point of my comment.
The categorical imperative aims to solve the problem of what norms to pick, but goes on to try to claim universality.
...
The trouble is that the categorical imperative tries to smuggle in every moral agents' values without actually doing the hard work of aggregating and deciding between them.
That would be a problem, if that is what it were doing. The categorical imperative aims to define the obligation an individual has in conducting themselves consistent with the autonomy of the will. Each individual may have a distinct moral code consistent with that obligation, and that is indeed a problem for ethics, but the categorical imperative does not attempt to help people pick specific norms to apply across multiple agents.
Kant lays out a ton of definitions and then leans on them heavily; it's classic old philosophy. Understanding what Kant said means you need to read Kant, not just his conclusions. That's a big strike against the clarity of his writing (it is a real slog to get through), but whether he achieves his intent should be judged vs. his self-professed intent, not against a misunderstanding of his intent.
With a goal to "alleviate immediate and preventable suffering," QALY seems to be a pretty terrible metric. You need to measure immediacy, preventability, and suffering, or at least the suffering due to just the immediate and preventable causes. I would suggest suffering needs construct definition before you consider an operationalization.
It would be smart to measure pre- and post-intervention. The good news is that if suffering is a subjective psychological state, you could do post-only and measure perceived change. If you're worried about self-report, you could do observer-report (case worker), but since they'll be doling out the funds that could be biased as well (and presumably they'll be basing these decisions in part on self-reports of what is a "fire").
The type of counterfactual analysis tailcalled suggests is likely what the family or case worker would be mentally approximating when responding to how big of a difference paying off the utility bill or a pizza night made to their suffering. Plus, n=1, so a qualitative assessment may be all you can really do - if they rate something a 7/7 on reducing their suffering and everything else came in at 1-2, sure, that hits you between the eyes, though you'd probably get that anyway from them saying "the most important thing, by a mile, was X, and this is why..."
If you take the business part of the business and separate it from the Ponzi part of the business, it's not a Ponzi scheme. But apparently it was a package deal - at least until Alameda were to weather the crypto crash? Like, we just need to siphon FTX customer funds over to Alameda until they recoup $8-10 billion over time in profits, which have not been forthcoming lately. But oops, we "forgot" that when we transferred that money over, it affected our financials - there it is, fake books. And that's not even getting into the laughable valuations of their other "shitcoin" assets, or even of FTT itself, just the flow of funds!
This also speaks to further impairment of FTX's value by management - if you separate the business part of the business from the management part of the business...and you can to some extent, but the damage has been done. Who is going to trust FTX as an exchange going forward even with a new structure and management team?
Finally, it's not like Madoff vaporized the money and SBF/FTX/Alameda didn't. If anything, it's the opposite; Madoff was a far better steward, making the assets recoverable. SBF/FTX/Alameda simply gambled them away. Put differently, the non-Ponzi part of the business was a bigger share of Madoff's fund than of SBF's bundle. Obviously the valuation in either case was fake, based on a multiple of both real assets and Ponzi accounting over time, but Madoff skimmed customer funds while SBF/FTX straight-up embezzled them to prop up the husk of Alameda. In its death throes, FTX was in straight-up Ponzi mode, making withdrawing customers whole to maintain the facade at the expense of those last in line.
I think the comparison is quite apt, and the points of contrast are more interesting than absolving. I was initially hesitant to say there was some fraud/Ponzi beyond just accidentally falling into borderline insolvency, but by this point, especially with how enormous the hole, it looks much more intentional, and Matt Levine has even dropped the P-word because of it.
Disagree that "[a]t best, these theories just do not bring much new to the table" but agree that the over-emphasis on these theories is "extremely unhelpful." That is, they provide good insights and explanatory power, and even substantial explanatory power on the margin sometimes, but having non-zero explanatory power is not the same as being the explanation with the highest explanatory power. Assuming the latter is often not accurate. Ironically, I find myself mostly agreeing with a post that Razied disagrees with while also mostly agreeing with Razied - the human value function is not solely, or even primarily, status seeking.
Likewise, I also disagree with "[on social media], it is well known people are playing their meaningful status games and doing false signalling on a high simulacrum level" because Ape in the coat is doing exactly what they are complaining about. Certainly this happens, and in some cases dominates any other reasons for people's behavior on social media, but again, most of why people engage on social media is likely not to be about obtaining status, unless status is to be so broadly defined as to include things like "personal satisfaction."
Getting meta, here I find myself engaging in a social medium that even has a very clear social approval mechanism. Yes, I want lots of yummy upvotes (is it because they accord me status somehow, or because the social approval provides peer review for the validity of my hopefully-rational take?), but mainly I'm posting because I think my opinion is right, worthwhile to share, and hopefully conveyed in a way that can, if not persuade, update.
Nicely done!
The question is what should be the denominator of the risk?
For societal aggregate worries: Day? Sure.
For personal risk: Day? Nope. Person-day? Yes!
He guessed I had an allergic reaction and threw 5 different antihistamines
Not so much dumb luck after all! Allergic reactions often cause inflammation, and it's the inflammation that is uncomfortable. Sure, it's not very controlled (could have suggested one, then another, then another, until you got to Boswellia, preferably in order of prior belief for each more-specific hypothesis), and other things could cause inflammation, but it's not completely luck either. (Though this did not detract from my enjoyment of the post!)
Some quick Googling says,
Helpfully, this study compared AKBA vs. NDGA, so maybe that could be a useful way to test the mechanism of action (they also differ in their specifics within that too, so that's yet another question mark). Obviously not medical advice and just a curious wondering.
To the contrary, johnswentworth's point is not that the experiments have low external validity but that they have low internal validity. It's that there are confounds.
Ironically, one of my quibbles with the post is that the verbiage implies measurement error is the problem. Not measuring what you think you're measuring is about content validity, but the post is actually about how omitted variables (i.e., confounders) are a problem for inferences. "You are not Complaining About What You Think You Are Complaining About."
Adam’s whole position here, to me, is rather silly, even if we limit ourselves to use cases where the Twitter poll is being used only to try and extrapolate towards national sentiment.
I agree except with the last part (it's not silly when thinking about extrapolating to national sentiment). The key is to what extent is it evidence of [insert thing], and of course if you're interested in learning more, what are the factors that affect the extent to which it is evidence of [insert thing]? In other words, what are you trying to generalize to, and what interesting things are limiting your ability to generalize to [insert other thing]?
Often we are comfy with generalizing from sample to appropriately-defined population (sample of Zvi Twitter noticers to Zvi Twitter noticers), but when we don't define the scope of our generalization properly, we get uncomfy again (sample of Zvi Twitter followers to US general population). Often we are interested in the limits of generalizability (e.g., this treatment works for men but not women, isn't that interesting and useful!), unless those boundaries are trivial (e.g., vasectomies work for men but not women, gosh!) or we already don't see them as boundaries (e.g., "what if you had changed the wording from 'YOU in particular' to 'YOU specifically'?").
Interestingness is in the eye of the beholder. Concede to Adam for the moment that the boundaries are not interesting because they are well-known limits to generalizability (selection, wording). Then, is it "bad evidence?" Depends on what you're trying to generalize to (what it is purported to be evidence of)! Adam wavers between Twitter polls being "meaningless" and "does not generalize at all" as in worthless for anything at all, which is obviously mostly false (it should at least generalize to Zvi Twitter noticers, though even then it could suffer from self-selection bias like many other polls), vs. "not representative of general views," which is not silly and is far more debatable (it's likely "weak" evidence in that Twitter polls can yield biased estimates on some questions [this is the most charitable interpretation of the position]; it's possibly "bad" evidence if the bias is so severe that the qualitative conclusions will differ egregiously [this is the most accurate interpretation of the position seeing as he literally wanted to differentiate it from weak evidence] - e.g., if I polled lesbian women on how sexually attractive the opposite sex was to infer how sexually attractive the opposite sex is to people generally). So overall, the position is rather silly (low generalizability is not NO generalizability, and selection and wording ARE interesting factors relevant for understanding people), except on the very specific last part, where it's not silly (possibly bad evidence) but it is also still probably not correct (probably not bad evidence).
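The weak-vs-bad distinction can be sketched as a composition problem: the same group-level answers give different poll results depending on who shows up in the sample. All group names, rates, and shares below are hypothetical and only illustrate the mechanism; they are not estimates for any real Twitter poll.

```python
def poll_estimate(group_rates, group_shares):
    """Share answering 'yes' given each group's yes-rate and its weight."""
    return sum(group_rates[g] * group_shares[g] for g in group_rates)


# Hypothetical yes-rates for two groups of respondents.
rates = {"followers": 0.8, "everyone_else": 0.3}

# Hypothetical composition of the target population vs. the poll sample.
population_shares = {"followers": 0.2, "everyone_else": 0.8}
sample_shares = {"followers": 0.7, "everyone_else": 0.3}

population_value = poll_estimate(rates, population_shares)
sample_value = poll_estimate(rates, sample_shares)
bias = sample_value - population_value

print(f"population: {population_value:.2f}")
print(f"poll sample: {sample_value:.2f}")
print(f"bias from composition alone: {bias:+.2f}")
```

If the groups answer similarly, the bias shrinks toward zero (weak evidence at worst); if they answer very differently and the sample is lopsided, the qualitative conclusion can flip (bad evidence). Generalizing only to the population the sample was actually drawn from removes this particular term.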
Re: Physical World Modeling
I'm not surprised by those South Africa vaccine efficacy numbers, since they are broadly in line with releases we've been getting over the last 9 months. We already knew VE vs. infection for the monovalent vaccine was very low and that VE vs. severe disease would be higher but still lower than vs. Delta. We already knew they provide about 3-4 months of "good" protection. We already knew VE vs. BA.4/5 was lower than vs. BA.1/2.
But yeah, it's pretty easy to square, seeing as Dr. Rivers tweeted the paragraph whose last sentence is "need for vaccines to incorporate variants of concern." Table 1 is stuff we know about old vaccines, and future public health decisions will not be using those. An annual booster of bivalent vaccine timed like the flu shot is a different story! It's the one being told (e.g., Zvi's points 2 and 5), and it's the one not shown in Table 1. Seeing as we are getting a bivalent booster in about 10 months, in part thanks to uncertainty in how the FDA was going to treat approvals, getting it down to the 6 month lead time of the flu vaccine seems in the realm of possibility for future years (potentially contra Zvi's point 3). We will "need for vaccines to incorporate variants of concern" for annual shots to make sense, and that's exactly what we'll be doing.
Right, when you go to argue the merits, you ask "well, if there were to be a phase change, what would the phase change look like?" And the original estimate was derived from not much effort in calibrating the numbers, and the reply was that even if we saw an utterly shocking phase change, we'd get nowhere close to 20%. You can do varying degrees of in-depth analyses to get to that point (good on you), or you can do like I did and rely on a semi-informed prior.
Here's US growth from 1947. Imagine all the things that happened since then that could have induced mild phase changes to the growth trajectory. ASSUMING that there will be a substantial phase change (again, see Cameron Fen's thread), 20% is still ludicrous.
https://fred.stlouisfed.org/graph/?g=TE7B
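For a sense of scale, a quick compounding sketch: how long the economy takes to double at a given constant annual growth rate. The 3% comparison rate is a round illustrative assumption, not a figure read off the linked chart, and the function names are made up for this example.

```python
import math


def doubling_time(annual_growth):
    """Years for output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth, years):
    """Cumulative multiple of output after compounding for `years`."""
    return (1 + annual_growth) ** years


# 3% is an assumed round number for comparison; 20% is the claim at issue.
for g in (0.03, 0.20):
    print(
        f"{g:.0%}/year: doubles in {doubling_time(g):.1f} years, "
        f"x{growth_factor(g, 10):.2f} after 10 years"
    )
```

At 20%/year output doubles in under four years, versus a couple of decades at a low-single-digit rate, which is the gap a "mild phase change" would have to close.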
Not a comment on the argumentation or anything, I know we want to be rationalists and worry about the arguments (so thank you for posting about the disagreement that actually offers some analysis), but just registering my initial reaction to the 20%/year in 10 years claim:
Anyone with a cursory understanding of the history of economic growth (I don't even mean professors who have spent their careers studying growth economics) will know that number is facially ridiculous. My first thought was, great, now I know this person has no idea what they are talking about and can be safely ignored. As a communication device, that prediction failed miserably for me because it did not make me want to assess the underpinnings or energize me to research further about economic growth, which is the least such a tweet should have done if not prompt an update. Though good luck with the latter because, again, if you have an iota of knowledge about the area, your prior distribution for the realm of US economic growth possibilities is correctly more narrow than "let me throw out a shocking round number to myself, see if I can live with it, and then see if others take it seriously." Am I being too harsh? Well, no, he literally "didn't put much effort into calibrating the numbers."
Now back to the regularly scheduled programming of arguing about the merits.