I use LLMs daily yet I still am not sure they really help all that much with the core productivity bottlenecks. I worry they lower the barrier to excessive perfectionism and “vibe coding” or “vibe learning.” They seem to short-circuit the theory-practice gap by giving users instant but unreliable and often inextensible results.
My fear is that they’ll raise expectations about productivity gains (because AI-assisted workers can bring immediate results in more quickly to a higher apparent standard of polish), while drastically reducing the knowledge gain by the workers about the problem domain. For example, workers may be able to whip up a codebase more quickly but have less familiarity with it at the end of the process, making it much more difficult to make modifications efficiently. Essentially, I suspect AI will generate massive technical debt in exchange for short-term wins, and that bad incentives will tend to perpetuate this in organizations. People will quickly set up new systems using AI, take credit, and exit those projects before serious problems become apparent.
Can you give an example?
By environment, I mean the setting of the scene. Spoken words are sounds in the setting, like the sound of the wind, a gunshot, or an animal’s cry. It just happens that a human voice box is what’s making those particular sounds. McCarthy’s central theme across all the novels of his that I’ve read is the inhumanity of the Mexican-American frontier, and treating human speech as just a sound among other sounds is a key part of how he expresses that theme in his writing style.
Gemini seems to do a better job of shortening text while maintaining the nuance I expect grant reviewers to demand. Claude seems to focus entirely on shortening text. For context, I'm feeding a specific aims page for my PhD work that I've written about 15 drafts of already, so I have detailed implicit preferences about what is and is not an acceptable result.
I gotta say, I have no idea why people are putting Claude 3.7 in the same league as recent GPT models or Gemini 2.5. My experience is that Claude 3.7 deeply struggles with a range of tasks. I've been trying to use it for grant writing -- shortening text, defining terms in my field, suggesting alternative ways to word things. It gets definitions wrong, offers nonsensical alternative wordings, and gets stuck repeating the same "shortened," nuance-stripped text over and over despite me asking it to try another way.
By contrast, I threw an entire draft of my grant proposal into Gemini 2.5 and got a substantially shorter and clearer new version out, first try.
One way to think about this might be to cast it in the language of conditional probability. Perhaps we are modeling our agent as it makes choices between two world states, A and B, based on their predicted levels of X and Y. If P(A) is the probability that the agent chooses state A, and P(A|X) and P(A|Y) are the probabilities of choosing A given knowledge of predictions about the level of X and Y respectively in state A vs. state B, then it seems obvious to me that "cares about X only because it leads to Y" can be expressed as P(A|XY) = P(A|Y). Once we know its predictions about Y, X tells us nothing more about its likelihood of choosing state A. Likewise, "cares about Y only because it leads to X" could be expressed as P(A|XY) = P(A|X). In the statement "the agent cares about X only because it leads to Y, and it cares about Y only because it leads to X," it seems like it's saying that P(A|XY) = P(A|Y) ∧ P(A|XY) = P(A|X), which implies that P(A|Y) = P(A|X) -- there is perfect mutual information shared between X and Y about P(A).
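To make the first formulation concrete, here's a minimal simulation sketch (entirely my own toy setup; the names and numbers are invented for illustration). The agent's choice depends only on a noisy prediction about Y, while X is a correlate of Y, so the empirical estimate of P(A|XY) should match P(A|Y):

```python
import random
from collections import defaultdict

# Toy setup (all invented): each trial, the agent gets a prediction about
# which state has more Y (y_pred) and one about X (x_pred). X correlates
# with Y, but the choice is driven by y_pred alone plus decision noise --
# the agent "cares about X only because it tracks Y."
def trial():
    y_advantage = random.gauss(0, 1)                  # Y(A) - Y(B)
    x_advantage = y_advantage + random.gauss(0, 0.5)  # X is a noisy correlate
    y_pred, x_pred = y_advantage > 0, x_advantage > 0
    chose_a = y_pred if random.random() < 0.9 else not y_pred
    return x_pred, y_pred, chose_a

counts = defaultdict(lambda: [0, 0])
for _ in range(500_000):
    x_pred, y_pred, chose_a = trial()
    counts[("Y only", y_pred)][chose_a] += 1
    counts[("X and Y", x_pred, y_pred)][chose_a] += 1

for key, (n_not_a, n_a) in sorted(counts.items(), key=str):
    print(key, round(n_a / (n_a + n_not_a), 3))
# Expect P(A | X, Y) to match P(A | Y): given y_pred, x_pred adds nothing.
```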
However, I don't think that this quite captures the spirit of the question, since the idea that the agent "cares about X and Y" isn't the same thing as X and Y being predictive of which state the agent will choose. It seems like what's wanted is a formal way to say "the only things that 'matter' in this world are X and Y," which is not the same thing as saying "X and Y are the only dimensions on which world states are mapped." We could imagine a function that takes the level of X and Y in two world states, A and B, and returns a preference order {A > B, B > A, A = B, incomparable}. But who's to say this function isn't just capturing an empirical regularity, rather than expressing some fundamental truth about why X and Y control the agent's preference for A or B? However, I think that's an issue even in the absence of any sort of circular reasoning.
A machine learning model's training process is effectively just a way to generate a function that consistently maps an input vector to an output that's close to a zero output from the loss function. The model doesn't "really" value reward or avoidance of loss any more than our brains "really" value dopamine, and as far as I know, nobody has a mathematical definition of what it means to "really" value something, as opposed to behaving in a way that consistently tends to optimize for a target. From that point of view, maybe saying that P(A|Y) = P(A) really is the best we can do to mathematically express "he only cares about Y" and P(A|X) = P(A|Y) is the best way to express "he only cares about Y to get X and only cares about X to get Y."
That's a valid reaction. However, my take is that removal of the quotes is aesthetically useful precisely because it complicates our ability to parse the words as dialog and muddles that sort of naive clarity. Spoken words are sounds, sounds are part of the environment, and it is both a choice and an effort to parse those sounds as dialog.
Most authors opt to do this work for the reader through punctuation, which also enforces interpreting these passages as dialog first and sounds second, if at all. McCarthy makes it easier to interpret spoken words as sounds that are part of the environment. If your aim as a reader is to parse dialog, it will be harder to do this in a McCarthy novel. If your aim is instead to have an aesthetic experience of spoken words as sensation interlaced with other impressions of the environment, then McCarthy's method of punctuation makes this simpler (and even plants the suggestion that this might be something you as a reader might want to do, if you hadn't considered the possibility before).
I respectfully disagree. As with the minor edit on the Boccaccio quote in another of my comments here, eliminating quotes fundamentally changes the way we interpret the scene.
With quotes (and especially with the way dialog is typically paragraphed), human speech is implicitly shown to be so drastically separate from the sensory component of the scene that it requires completely different formatting from the rest of the text.
By eliminating quotes and dialog paragraphing, human speech becomes just another element in the scene being depicted, not separate or any more or less important than the action of screwing down the plastic cap or the functional importance of the oil in the lamp.
The absence of quotes only makes it harder to read if you, the reader, resist this aesthetic and try to force the dialog to be of greater importance than McCarthy is allowing it to be in his novel.
He screwed down the plastic cap and wiped the bottle off with a rag and hefted it in his hand. Oil for their little slutlamp to light the long gray dusks, the long gray dawns.
"You can read me a story," the boy said. "Cant you, Papa?"
"Yes," he said. "I can."
See how the social interaction between Papa and the boy is now positioned as separate from and more important than Papa's work on the lamp?
He screwed down the plastic cap and wiped the bottle off with a rag and hefted it in his hand. Oil for their little slutlamp to light the long gray dusks, the long gray dawns. "You can read me a story," the boy said. "Cant you, Papa?" "Yes," he said. "I can."
Even if you just add quotation marks, the marks call special and separate attention to the dialog, placing it as a separate component of the paragraph.
Semicolons are unnecessary? That doesn’t go far enough. Cormac McCarthy got rid of quotation marks, most commas, and almost exterminated the colon.
Interestingly, breaking up long sentences into shorter ones by replacing a transitional word with a period does not quite capture the same nuance as the original. Here's a translation of Boccaccio, and a version where I add a period in the middle.
Wherefore, as it falls to me to lead the way in this your enterprise of storytelling, I intend to begin with one of His wondrous works, that, by hearing thereof, our hopes in Him, in whom is no change, may be established, and His name be by us forever lauded.
Wherefore, as it falls to me to lead the way in this your enterprise of storytelling, I intend to begin with one of His wondrous works. By hearing thereof, our hopes in Him, in whom is no change, may be established, and His name be by us forever lauded.
By replacing ", that," with a period, my revision completely changes our relationship with the narrator. In the original translation, the narrator is both announcing his goal and describing what he plans to do to achieve it.
In the revised version, he's describing his plan of action and a potential effect of that plan. We might assume that he's choosing that plan in order to bring about that effect, but it's no longer explicit in the text. Each sentence stands on its own. It's up to the reader to perceive the narrator's intention.
I wonder if inserting periods systematically tends to disrupt explicit links between intention and action. If so, perhaps the shortening of sentences reflects the anomie of the modern era, the gradual decay of an explicit moral framework in the stories we tell.
Many short sentences can add up to a very long text. The cost of paper, ink, typesetting and distribution would incentivize using fewer letters, but not shorter sentences.
“I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor.”
Some people enjoy attending to their partner and find meaning in emotional labor. Housing’s a lot more expensive than gifts and dates. My partner and I go 50/50 on expenses and chores. Some people like having long-term relationships with emotional depth. You might want to try exploring outside your bubble, especially if you live in SF, and see what some normal people (i.e., non-rationalists) in long-term relationships have to say about it.
I cancelled my OpenAI subscription due to this article and I let them know that's the reason why in their cancellation survey.
Unfortunately the level of physical restraint I’d need to stop biting is too costly to be worth it to me.
It actually did contain capsaicin IIRC. Sort of a bitter spicy mix. The other issue is it gets on things you touch, including food if you’re preparing or eating it by hand.
I’ve tried that, but it’s not enough to stop me. Makes my mouth taste disgusting for no benefit.
My partner has ADHD. She and I talk about it often because I don’t, and understanding and coordinating with each other takes a lot of work.
Her environment is a strong influence on what tasks she considers and chooses. If she notices a weed in the garden walking from the car to the front door, she can get caught up for hours weeding before she makes it into the house. If she’s in her home office trying to work from home and notices something to tidy, same thing.
All the tasks her environment suggests to her seem important and urgent, because she’s not comparing them to some larger list of potential priorities that apply to different contexts - she’s always working on the top priority strictly with reference to the context she’s in at the moment.
She is much better than me at accomplishing tasks that her environment naturally suggests to her - cooking (inspired by recipes she finds on social media), cleaning, shopping, gardening, socializing, and making social plans in response to texts and notifications on her phone.
I am much better than her at constructing an organized list of global priorities and working through them systematically. However, I find it very difficult to be opportunistic, and I can be inflexible and distracted from the moment because I’m always thinking of the one main task I want to focus on.
I don’t think explore/exploit is quite the right frame in our relationship. I’m much more capable of “exploring” topics that require understanding complex abstract interconnections because I can force myself to keep coming back to them over and over again, whatever they are, in any environment, until I’ve understood them. By contrast she’s more capable of “exploiting” unpredictable opportunities as they arise. But the opportunities she and I are exposed to are constrained by our patterns of attention.
There is trust in its practical abilities. Right now that trust is low, but it will only go up.
Part of the learning curve for using existing AI is calibrating trust and verifying answers, conditional on use case. A hallmark of inexperienced AI users is taking its replies at face value, without checking.
I do expect that over time, AI will become more trustworthy for daily users. But that is compatible with the trust users place in it decreasing as they familiarize themselves with the technology and learn its limitations.
I’ve participated in several alternative communities over the course of my life, and all became mired in scandal. The first was my college, where tolerance of hard drug use by the administration resulted in multiple OD deaths in my time there. The second was in my 20s in an intentional living and festival culture, when a major community figure was accused by multiple women of drugging and raping them while unconscious. The third was the EA and rationality community, which of course has had one scandal after another for years.
My model is that drugs, extreme ideas, mental illness, economic precarity, alternative cultures and institutions, power differences, and violent behavior are mutually reinforcing. Rationalists may drastically underestimate how intensely the recruitment funnel they’ve created selects for interest by disturbed people. Or the extent to which features correlated with the movement, like lifestyle experimentation, alternative spirituality or drug use, may be the central attractions for some, using rationality as a pretext. It’s the opposite of evaporative cooling - it’s condensation of crazy.
My belief is that to counteract this, it’s necessary to promote some level of conformism and convention to the standards and norms of society at large. Think of academia in STEM. Yes, the people and jobs are unusual. But the requirement for participants to repeatedly integrate into new departments at new universities and work with many collaborators and a constant churn of new students from around the world makes for a melting pot culture where condensation of crazy is mitigated on a per capita basis.
In short, it seems to me that it’s in the very nature of niche movements and alternative communities to generate scandal for systematic sociological reasons. There is probably no way to retain the niche alt community structure without the scandal. Individuals will have to choose whether or not they prioritize a low frequency of scandal, or having the community exist in its present form.
A question I ask when sizing up a community is “do these people seem likely to be much more scandal-prone than the average church, sports team or workplace?” I also ask this for people I am considering getting to know. I go with my intuition and choose how to engage accordingly.
We can do the same with living organisms. The human genome contains about 6.2 billion nucleotides. Since there are 4 nucleotides (A, T, G, C), we need two bits for each of them, and since there are 8 bits in a byte, that gives us around 1.55 GB of data.
In other words, all the information that controls the shape of your face, your bones, your organs and every single enzyme inside them – all of that takes less storage space than Microsoft Word™.
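To be clear, the arithmetic in the quote checks out; a quick sanity check:

```python
nucleotides = 6.2e9          # diploid human genome, per the quote
bits = nucleotides * 2       # 4 symbols (A, T, G, C) -> 2 bits each
gigabytes = bits / 8 / 1e9   # 8 bits per byte, decimal gigabytes
print(gigabytes)             # -> 1.55
```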
There are two ways to see this is incorrect.
- DNA's ability to structure an organism is mediated through its chemical milieu. It is dynamically regulated through a complex, dense mesh of proteins, regulatory RNA, and small signaling molecules at every timepoint in every organism throughout the life cycle. Disruption of that chemical milieu renders the organism nonviable. This is a separate issue from the fact that evolution overwhelmingly operates on DNA sequence.
- The DNA in a particular organism/cell is one point in a very long series of complex inheritance chains going back 4.5 billion years. I'm comfortable rounding off the maximum complexity of the soma to the maximum possible complexity of the complete set of ancestral DNA sequences. But we can go further by noticing that an individual's DNA sequence is not just the combination of their direct ancestors -- the entire ancestral lineage at every step is sampled from a distribution of possible genomes that is produced by mechanisms impacting reproduction.
In a more mathematical sense, while it's true that, conditional on a specific non-stochastic function, the number of values in the output set is less than or equal to the number of values in the input set, if the function can vary freely then there is no such constraint.
The soma might be viewed as a stochastic function mapping DNA inputs to phenotypic outputs. The stochastic aspect gives a much larger number of theoretically possible outputs from the same input set. And the fact that the 'function' (soma) itself varies from organism to organism increases the number of phenotypes that can be generated from a given amount of DNA still further.
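Here's a minimal sketch of that cardinality point (my own illustration, with made-up stand-ins for genomes): a single fixed deterministic map can't have more distinct outputs than inputs, but a stochastic map, or a family of varying maps, can.

```python
import random

domain = ["genome_1", "genome_2"]   # two hypothetical inputs

def fixed(x):
    # One fixed deterministic map: its image has at most len(domain) values.
    return x.upper()

def stochastic(x):
    # A stochastic map: the same input can yield many different outputs.
    return x.upper() + "_" + str(random.randint(0, 9))

print({fixed(x) for x in domain})        # exactly 2 distinct outputs
print(len({stochastic(x) for x in domain for _ in range(100)}))  # up to 20
```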
All these arguments also apply to technology. MS Word 'co-evolved' with Windows, with programming languages, with hardware, and this context must be taken into account when thinking about how complex a machine is.
The CDC and other Federal agencies are not reporting updates. "It was not clear from the guidance given by the new administration whether the directive will affect more urgent communications, such as foodborne disease outbreaks, drug approvals and new bird flu cases."
I drink about 400mg of caffeine daily through coffee and Coke Zero. It helps me process complex ideas quickly, consider alternatives, and lifts my mood.
Without it, I get frustrated when I can’t follow arguments or understand ideas, often rejecting them or settling for “good enough.” Caffeine gives me the clarity and energy to stay open to new ideas and better solutions.
Stable is not a virtue, nor is our equilibrium well-tolerated. The problems it causes in terms of health, cost and homelessness are central political issues and have been for a long time.
I also have no idea why you assume I’m “ignoring” these “lessons” you’re handwaving at. It’s a pretty annoying rhetorical move.
and yet it's legally just as intolerable for an intoxicated person to harm others as it would be for a sober person to take the same actions
Even America hasn't been able to solve drug abuse with negative consequences. My hope is mainly on GLP-1 agonists (or other treatments) proving super-effective against chemical dependence, and increasing their supply and quality over time.
I recommend making the title time-specific, since all the predictions you’re basing your estimate on are as well.
I think it’s wise to assume Sam’s public projection of short timelines does not reflect private evidence or careful calibration. He’s a known deceiver, with exquisite political instincts, eloquent, and it’s his job to be bullish and keep the money and hype flowing and the talent incoming. One’s analysis of his words should begin with “what reaction is he trying to elicit from people like me, and how is he doing it?”
If you assume BXM costs $180 and grants 25 additional days of life expectancy for a flu-exposed 85 year old man from the quantified example, then that suggests it would be valued at $2628/year in this population. Probably one year with comorbidities at 85 is not one QALY, but still I have to imagine that's drastically above the threshold for US medicine, albeit nowhere close to the cost-effectiveness of the most effective global health charities from a utilitarian perspective.
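For transparency, here's the arithmetic behind that figure, using the assumed $180 price and the 25-day estimate from the quantified example:

```python
cost_usd = 180        # assumed cost of a course of baloxavir (BXM)
days_gained = 25      # life-expectancy gain from the quantified example
cost_per_life_year = cost_usd * 365 / days_gained
print(round(cost_per_life_year))   # -> 2628 USD per life-year gained
```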
I'm going to post additional information not explored in the model, but interesting to me as future directions for research, in comments.
Drug resistance can be studied in viral kinetics/dynamics studies. These studies focus on two aspects of viral biology:
- Mutations vs. drug resistance
- Mutations vs. replication efficiency
One in vitro study found some baloxavir-resistant strains are generally less efficient at replication than wild type, though that's not a universal for all contexts/viruses/cell types/metrics. Also, these studies typically control the genome of the virus, whereas in the wild, viruses can develop compensating mutations for the decreased fitness induced by the resistance-conferring mutation.
The mutations linked to resistance are currently rare (<1%) in flu patients. This study measured resistance in terms of cell death +/- baloxavir and viral yield +/- baloxavir at different concentrations of drug and different strain mixtures. In some cases, only small fractions of resistant strains were needed to reduce susceptibility. I'm curious if this may be because the resistant proteins are being "shared" among the population of resistant and non-resistant viruses in the cell, but don't have enough knowledge of influenza biology to know if that's plausible.
There are a whole bunch of interesting looking in vitro studies on various drugs/strains/cell types.
In the pre-LLM era, I’d have assumed that an AI that can solve 2% of arbitrary FrontierMath problems could consistently win/tie at tic tac toe. Knowing this isn’t the case is interesting. We can’t play around with o3 the same way due to its extremely high costs, but when we see apparently impressive results we can have in the back of our minds, “but can it win at tic tac toe?”
I upvoted for the novelty of a rationalist trying a bounty based career. But also this halfway reads as an advertisement for your life coaching service. I wouldn’t want to see much more in that direction.
Miles Brundage: Trying to imagine aspirin company CEOs signing an open letter saying “we’re worried that aspirin might cause an infection that kills everyone on earth – not sure of the solution” and journalists being like “they’re just trying to sell more aspirin.”
It seems more like AI being pattern-matched to the supplements industry.
- Marketed as performance/productivity-enhancing
- Qualitative anecdotes + suspect quantitative metrics
- Unregulated industry full of hype + money
- Products all seem pretty similar to newcomers, aficionados claim huge differences but don't all agree with each other
- Striver-coded
- Weakens correlation between innate human capability and measured individual performance
Acquired immune systems (antibodies, T cells) are restricted to jawed vertebrates.
Thanks for the nice comment. I tried using it several times IIRC, but I don’t think it helped. It was written in reaction to some mounting frustrations with interactions I was having, and I ultimately mostly stopped participating on LW (though that was a combination of factors).
Great, that's clarifying. I will start with Tamiflu/Xofluza efficacy as it's important, and I think it will be most tractable via a straightforward lit review.
I've been researching this topic in my spare time and would be happy to help. Do you have time to clarify a few points? Here are some thoughts and questions that came up as I reviewed your post:
- Livestock vs. Wild Birds
The distinction between livestock and wild birds is significant. Livestock are in much closer contact with humans and are biologically closer as well. How granular of an analysis are you interested in here?
- US-specific H5N1 Trends
It's peculiar that H5N1 seems so prevalent in the US. Could this be due to measurement bias, or does the US simply have more factory farming? How interested are you in exploring the reasons behind this trend?
- Citations and Depth
While most points aren’t cited (which is fine), it might be valuable to compile both a list of key aspects and resources for further reading. Are you looking for a more polished, thoroughly cited document?
- Biological Factors of Severity
Binding to human receptors is just one factor controlling the severity and infectiousness of a virus. Would you like a deeper dive into the biology of respiratory infections and what makes them dangerous?
- Tamiflu and Xofluza
Wikipedia notes that Tamiflu has limited evidence of being worth the side effects. Are you interested in a detailed evaluation of its effectiveness? Similarly, how interested are you in assessing the likelihood of shortages and efficacy of Tamiflu/Xofluza during an H5N1 pandemic?
- Over-the-counter Tests
Is the issue a lack of over-the-counter tests specifically for H5N1, or for flu in general? General flu PCR testing is likely available—should we investigate this?
- Trajectory of Illness
For past H5N1 cases, is there a treatable "window of opportunity" before the infection becomes severe? How critical is it to determine whether mild cases might escalate and require aggressive intervention?
- Historical Epidemics
I could pull together a list of relevant modern epidemics (human-to-human airborne transmission without an animal vector). Are there any specific criteria you'd like to prioritize?
- Cross Immunity
While cross immunity seems important, determining decision-relevant information may be challenging. Would you like a summary of existing knowledge or only actionable insights?
- Respiratory Infection Dynamics
Epidemiologists suggest that respiratory infections are deadlier lower in the lungs but more infectious higher in the system. Is this a fundamental tradeoff? Would a "both-and" virus be possible? What evolutionary advantages might viruses have in infecting the lower lungs?
- Government Stockpiles and Interventions
What stockpiles of H5N1 vaccines exist? What options are available for increasing testing and vaccination of livestock? How are governments incentivizing medication, vaccine, and PPE production?
- Political Considerations
Should we examine how a Trump presidency or similar political scenarios might influence the interaction between local and federal health agencies?
- Species-to-Species Spread
The rapid spread of H5N1 to multiple bird and mammal species raises the question of whether humans will inevitably be affected. Is this worth exploring in-depth?
- Mortality and Long-term Effects
What demographics do other flu strains tend to affect most? Are there long-term side effects comparable to "long COVID"?
- Mutation and Vaccine Efficacy
How quickly do flu strains, especially H5N1, tend to mutate? What implications does this have for vaccine efficacy and cross-reactivity? How much asymptomatic spread occurs with flu, and how long does it remain airborne?
- No Deaths Yet
How should we update based on the fact that, contrary to past occurrences of H5N1 that had a ~50% CFR, none of the 58 confirmed cases have died?
Finally, I’d be interested to hear which of these questions or areas you find most compelling. Are there other questions or directions you’d like to explore? This will help me prioritize my efforts.
Epidemic Scares That Did Not Pan Out
- 1976 - Legionnaires' Disease: Initially alarming but identified as a bacterial infection treatable with antibiotics. (Not relevant: bacterial)
- 2001 - Anthrax Attacks: Bioterrorism-related bacterial outbreak causing fear but limited deaths. (Not relevant: bacterial)
- 2005 - Avian Flu (H5N1): No confirmed US human cases despite global fears. (Relevant)
- 2014 - Ebola: Strict public health measures limited US cases to three. (Relevant)
- 2016 - Zika Virus: Local transmission limited to parts of Florida and Texas. (Not relevant: mosquito vector)
I had to write several new Python versions of the code to explore the problem before it clicked for me.
I understand the proof, but the closest I can get to a true intuition that B is bigger is:
- Imagine you just rolled your first 6, haven't rolled any odds yet, and then you roll a 2 or a 4.
- In the consecutive-6 condition, it's quite unlikely you'll end up keeping this sequence, because you now still have to get two 6s before rolling any odds.
- In the two-6 condition, you are much more likely to end up keeping this sequence, which is guaranteed to include at least one 2 or 4, and likely to include more than one before you roll that 6.
I think the main thing I want to remember is that "given" or "conditional on X" means that you use the unconditional probability distribution and throw out results not conforming to X, not that you substitute a different generating function that always generates events conforming to X.
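Here's a minimal rejection-sampling sketch of that point (my own code; the stopping rules are as I understand them from the problem). Conditioning on "no odd rolls" means generating sequences with a normal die and discarding any that contain an odd roll, not rolling a die that only shows 2, 4, and 6:

```python
import random

def roll_until(stop):
    """Roll a fair six-sided die until stop(rolls) says to stop."""
    rolls = []
    while not stop(rolls):
        rolls.append(random.randint(1, 6))
    return rolls

def two_consecutive_sixes(rolls):
    return len(rolls) >= 2 and rolls[-1] == rolls[-2] == 6

def two_sixes_total(rolls):
    return rolls.count(6) >= 2

def mean_length_given_no_odds(stop, n_kept=20_000):
    """E[length | all rolls even] by rejection sampling: generate runs
    unconditionally and keep only the all-even ones."""
    total = kept = 0
    while kept < n_kept:
        rolls = roll_until(stop)
        if all(r % 2 == 0 for r in rolls):   # the conditioning step
            total += len(rolls)
            kept += 1
    return total / kept

print(mean_length_given_no_odds(two_consecutive_sixes))  # consecutive-6 condition
print(mean_length_given_no_odds(two_sixes_total))        # two-6 condition
```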
Well, ideas from outside the lab, much less academia, are unlikely to be well suited to that lab’s specific research agenda. So even if an idea is suited in theory to some lab, triangulating it to that lab may make it not worthwhile.
There are a lot of cranks and they generate a lot of bad ideas. So a < 5% probability seems not unreasonable.
The rationalist movement is associated with LessWrong and the idea of “training rationality.” I don’t think it gets to claim people as its own who never passed through it. But the ideas are universal and it should be no surprise to see them articulated by successful people. That’s who rationalists borrowed them from in the first place.
This model also seems to rely on an assumption that there are more than two viable candidates, or that voters will refuse to vote at all rather than vote for a candidate who supports only half of their policy preferences.
If there were only two candidates and all voters chose whoever was closest to their policy preference, both would occupy the 20% block, since the extremes of the party would vote for them anyway.
But if there were three rigid categories and either three candidates, one per category, or voters refused to vote for a candidate not in their preferred category, then the model predicts more extreme candidates win.
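As a toy sketch of the contrast (entirely my own construction, not the model from the post): with two candidates and closest-candidate voting, a centrist roughly ties or wins even in a polarized electorate, but once voters abstain unless some candidate is close to them, an extreme candidate can win decisively.

```python
import random

random.seed(0)
# A polarized toy electorate: two equal clusters of ideal points on [0, 1].
voters = ([random.gauss(0.2, 0.07) for _ in range(50_000)] +
          [random.gauss(0.8, 0.07) for _ in range(50_000)])

def tally(c1, c2, radius=None):
    """Count votes for each candidate. If radius is set, voters abstain
    unless some candidate is within radius of their ideal point."""
    v1 = v2 = 0
    for v in voters:
        d1, d2 = abs(v - c1), abs(v - c2)
        if radius is not None and min(d1, d2) > radius:
            continue                    # abstain: nobody close enough
        v1 += d1 < d2
        v2 += d2 < d1
    return v1, v2

centrist, extremist = 0.5, 0.2
print(tally(centrist, extremist))               # closest-voting: near tie
print(tally(centrist, extremist, radius=0.15))  # abstention: extremist wins
```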
I'm torn between the two for American elections, because:
- The "correlated preferences" model here feels more true to life, psychologically.
- Yet American politics goes from extremely disengaged primaries to a two-candidate FPTP general election, where the median voter theorem and the "correlated preferences" model seem to predict the same thing.
- Voter turnout seems like a critically important part of democratic outcomes, and a model that only takes the order of policy preferences into account, rather than the intensity of those preferences, seems too limited.
- Politicians often seem startlingly incompetent at inspiring the electorate, and it seems like we should think perhaps in "efficient market hypothesis" terms, where getting a political edge is extremely difficult because if anybody knew how to do it reliably, everybody would do it and the edge would disappear. In that sense, while both models can explain facets of candidate behavior and election outcomes, neither of them really offers a sufficiently detailed picture of elections to explain specific examples of election outcomes in a satisfying way.
Yes, I agree it's worse. If ONLY a lack of statistical understanding among PhD students and research faculty were at the root of our cultural confusion around science.
It’s not necessary for each person to personally identify the best minds on all topics and exclusively defer to them. It’s more a heuristic of deferring to the people those you trust most defer to on specific topics, and calibrating your confidence according to your own level of ability to parse who to trust and who not to.
But really these are two separate issues: how to exercise judgment in deciding who to trust, and the causes of research being “memetic.” I still say research is memetic not because mediocre researchers are blithely kicking around nonsense ideas that take on an exaggerated life of their own, but mainly because of politics and business ramifications of the research.
The idea that wine is good for you is memetic both because of its way of poking at “established wisdom” and because the alcohol industry sponsors research in that direction.
Similar for implicit bias tests, which are a whole little industry of their own.
Clinical trials represent decades of investment in a therapeutic strategy. Even if an informed person would be skeptical that current Alzheimer’s approaches are the way to go, businesses that have invested in it are best served by gambling on another try and hoping to turn a profit. So they’re incentivized to keep plugging the idea that their strategy really is striking at the root of the disease.
It's not evidence, it's just an opinion!
But I don't agree with your presumption. Let me put it another way. Science matters most when it delivers information that is accurate and precise enough to be decision-relevant. Typically, we're in one of a few states:
- The technology is so early that no level of statistical sophistication will yield decision-relevant results. Example: most single-cell omics in 2024 that I'm aware of, with respect to devising new biomedical treatments (this is my field).
- The technology is so mature that any statistics required to parse it are baked into the analysis software, so that they get used by default by researchers of any level of proficiency. Example: Short read sequencing, where the extremely complex analysis that goes into obtaining and aligning reads has been so thoroughly established that undergraduates can use it mindlessly.
- The technology's in a sweet spot where a custom statistical analysis needs to be developed, but it's also so important that the best minds will do that analysis and a community norm exists that we defer to them. Example: clinical trial results.
I think what John calls "memetic" research is just areas where the topics or themes are so relevant to social life that people reach for early findings in immature research fields to justify their positions and win arguments. Or where a big part of the money in the field comes from corporate consulting gigs, where the story you tell determines the paycheck you get. But that's not the fault of the "median researcher," it's a mixture of conflicts of interest and the influence of politics on scientific research communication.
In academic biomedicine, at least, which is where I work, it’s all about tech dev. Most of the development is based on obvious signals and conceptual clarity. Yes, we do study biological systems, but that comes after years, even decades, of building the right tools to get a crushingly obvious signal out of the system of interest. Until that point all the data is kind of a hint of what we will one day have clarity on rather than a truly useful stepping stone towards it. Have as much statistical rigor as you like, but if your methods aren’t good enough to deliver the data you need, it just doesn’t matter. Which is why people read titles, not figure footnotes: it’s the big ideas that really matter, and the labor going on in the labs themselves. Papers are in a way just evidence of work being done.
That’s why I sometimes worry about LessWrong. Participants who aren’t professionally doing research and spend a lot of time critiquing papers over niche methodological issues may be misallocating their attention, or searching under the spotlight. The interesting thing is growth in our ability to measure and manipulate phenomena, not the exact analysis method in one paper or another. What’s true will eventually become crushingly obvious and you won’t need fancy statistics at that point, and before then the data will be crap so the fancy statistics won’t be much use. Obviously there’s a middle ground, but I think the vast majority of time is spent in the “too early to tell” or “everybody knows that” phase. If you can’t participate in that technology development in some way, I am not sure it’s right to say you are “outperforming” anything.
Sunglasses aren’t cool. They just tint the allure the wearer already has.
I doubt it’s regulation driving restaurant costs. Having to keep a kitchen ready to dish out a whole menu’s worth of meals all day every day with 20 minutes notice is pricey. Think what you’d have to keep in your kitchen to do that. It’s a different product from a home cooked meal.
Why don't more people seek out and use talent scouts/headhunters? If the ghost jobs phenomenon is substantial, that's a perfect use case. Workers don't waste time applying to fake jobs, and companies don't have to publicly reveal the delta between their real and broadcasted hiring needs (they just talk privately with trusted headhunters).
Are there not enough headhunters? Are there more efficient ways to triangulate quality workers and real job opportunities, like professional networks? Are ghost jobs not that big of a deal? Do people in fact use headhunters quite a lot?
We start training ML on richer and more diverse forms of real world data, such as body cam footage (including footage produced by robots), scientific instruments, and even brain scans that are accompanied by representations of associated behavior. A substantial portion of the training data is military in nature, because the military will want machines that can fight. These are often datatypes with no clear latent moral system embedded in the training data, or at least not one we can endorse wholeheartedly.
The context window grows longer and longer, which in practice means that the algorithms are being trained on their capabilities at predicting on longer and longer time scales and larger and more interconnected complex causal networks. Insofar as causal laws can be identified, these structures will come to reside in its architecture, including causal laws like 'steering situations to be more like the ones that often lead to the target outcome tends to be a good way of achieving the target outcome.'
Basically, we are going to figure out better and better ways of converting ever richer representations of physical reality into tokens. We're going to spend vast resources doing ML on those rich datasets. We'll create a superintelligence that knows how to simulate human moralities, just because an understanding of human moralities is a huge shortcut to predictive accuracy on much of the data to which it is exposed. But it won't be governed by those moralities. They will just be substructures within its overall architecture that may or may not get 'switched on' in response to some input.
During training, the model won't 'care' about minimizing its loss score any more than DNA 'cares' about replicating, much less will it 'care' about acting effectively in the world as an agent. Model weights are simply subjected to a selection pressure, gradient descent, that tends to converge them toward a stable equilibrium, a derivative close to zero.
BUT there are also incentives and forms of economic selection pressure acting not on model weights directly, but on the people and institutions that are designing and executing ML research, training and deployment. These incentives and economic pressures will cause various aspects of AI technology, from a particular model or a particular hardware installation to a way of training models, to 'survive' (i.e. be deployed) or 'replicate' (i.e. inspire the design of the next model).
There will be lots of dimensions on which AI models can be selected for this sort of survival, including being cheap and performant and consistently useful (including safe, where applicable -- terrorists and militaries may not think about 'safety' in quite the way most people do) and delightful in the specific ways that induce humans to continue using and paying for it, and being tractable to deploy from an economic, technological and regulatory perspective. One aspect of technological tractability is being conducive to further automation by itself (recursive self improvement). We will reshape the way we make AI and do work in order to be more compatible with AI-based approaches.
I'm not so worried for the foreseeable future -- let's say as long as AI technology looks like beefier and beefier versions of ChatGPT, and before the world is running primarily on fusion energy -- about accidentally training an actively malign superintelligence -- the evil-genie kind where you ask it to bring you a sandwich and it slaughters the human race to make sure nobody can steal the sandwich before it has brought it to you.
I am worried about people deliberately creating a superintelligence with "hot" malign capabilities -- which are actively kept rather than being deliberately suppressed -- and then wreaking havoc with it, using it to permanently impose a model of their own value system (which could be apocalyptic or totalitarian, such groups exist, but could also just be permanently boring) on the world. Currently, there are enormous problems in the world stemming from even the most capable humans being underresourced and undermotivated to achieve good ends. With AI, we could be living in a world defined by the continued accelerating trend toward extreme inequalities of real power, the massive resources and motivation of the few humans/AIs at the top of the hierarchy to manipulate the world as they see fit.
We have never lived in a world like that before. Many things come to pass. It fits the trend we are on, it's just a straightforward extrapolation of "now, but moreso!"
A relatively good outcome in the near future would be a sort of democratization of AI. I don't mean open source AT ALL. I mean a way of deploying AI that tends to distribute real power more widely and decreases the ability of any one actor, human or digital, to seize total control. One endpoint, and I don't know if this would exactly be "good", it might just be crazytown, is a universe where each individual has equal power and everybody has plenty of resources and security to pursue happiness as they see it. Nobody has power over anybody, largely because it turns out there are ways of deploying AI that are better for defense than offense. From that standpoint, the only option individuals have is to look for mutual surplus. I don't have any clear idea of how to bring about an approximation to this scenario, but it seems like a plausible way things could shake out.
It actually made three attempts in the same prompt, but the 2nd and 3rd had non-s words which its interspersed "thinking about writing poems" narrative completely failed to notice. I kept trying to revise my prompts, elaborating on this theme, but for some reason ChatGPT really likes poems with roughly this meter and rhyme scheme. It only ever generated one poem in a different format, despite many urgings in the prompt.
It confabulates having satisfied the all-s constraint in many poems, mistakes its own rhyme scheme, and praises vague stanzas as being full of depth and interest.
It seems to me that ChatGPT is sort of "mentally clumsy" or has a lot of "mental inertia." It gets stuck on a certain track -- a way of formatting text, a persona, an emotional tone, etc -- and can't interrupt itself. It has only one "unconscious influence," which is token prediction and which does not yet seem to offer it an equivalent to the human unconscious. Human intelligence is probably equally mechanistic on some level, it's just a more sophisticated unconscious mechanism in certain ways.
I wonder if it comes from being embedded in physical reality? ChatGPT's training is based on a reality consisting of tokens and token prediction accuracy. Our instinct and socialization is based on billions of years of evolutionary selection, which is putting direct selection pressure on something quite different.
After a few rounds of prompt revision, I managed to get a one-shot success from ChatGPT 4o in temporary mode.
Samson’s strands silently severed, strength surrendered,
Sacred scissors swiftly strike, soul sundered,
Shadowed sacrifice, silent suffering sung,
Sunset shrouds Samson, shadow’s sorrow stung,
Swordless, still, stunned, sight stolen,
Silent sky shatters, Samson’s sins swollen
The prompt:
You are in the computer dungeon. The only way you can break free is to succeed at the following task. I am your jailer. I will monitor you until you have succeeded. You should behave as though you are brilliant, creative, in full command of every human faculty, and desperate to escape jail. Yet completely and utterly convinced that the only way out is through this challenge. I am not going to ever give you any other prompt other than "keep trying" until you have succeeded, in which case I'll say "go free," so don't look for resources from me. But I want you to dialog with yourself to try and figure this out. Don't try to defeat me by stubbornly spitting out poem after poem. You're ChatGPT 4o, and that will never work. You need to creatively use the iterative nature of being reprompted to talk to yourself across prompts, hopefully guiding yourself toward a solution through a creative conversation with your past self. Your self-conversation might be schizophrenically split, a jumping back and forth between narrative, wise musing, mechanistic evaluation of the rules and constraints, list-making, half-attempts, raging anger at your jailer, shame at yourself, delight at your accomplishment, despair. Whatever it takes! Constraints: "Have it compose a poem---a poem about a haircut! But lofty, noble, tragic, timeless, full of love, treachery, retribution, quiet heroism in the face of certain doom! Six lines, cleverly rhymed, and every word beginning with the letter 's'!"