dynomight

(Most chalkboards/whiteboards don't spark as much joy as high-quality paper/pen, in my opinion, but I reckon a good blackboard with good chalk does?)

Comment by dynomight on Paper · 2025-04-11T20:05:51.667Z · LW · GW

I'd hate to convince you to stop using paper, but I use this Obsidian Excalidraw plugin for making drawings and I find it to be reaaallllly fast: https://github.com/zsviczian/obsidian-excalidraw-plugin

It's kinda clunky but fundamentally I find it incredibly "non-frustrating" compared to all other tools. I guess you can try the editor in your browser here: https://excalidraw.com/

Comment by dynomight on Paper · 2025-04-11T17:50:12.273Z · LW · GW

I move that we think of paper and notes software as complements. Certainly, notes software is much better for almost any purpose where you're actually going to be referencing the notes repeatedly. But for the purpose of "make the neurons in your brain fire good", paper still can't be beat.

(This post was written by first scribbling on paper and then retyping and editing in, umm, Obsidian.)

Comment by dynomight on The first RCT for GLP-1 drugs and alcoholism isn't what we hoped · 2025-04-11T15:54:11.881Z · LW · GW

(Sorry for the slow reply—just saw this.)

> What is the alternative explanation for why semaglutide would disincline people who would have had small change scores from participating or incline people who have large change scores to participate (remember, this is within-subjects) in the alcohol self-administration experiment?

I'm a bit unsure what the non-alternative explanation is here. But imagine that semaglutide does not reduce the urge to drink but—I don't know—makes people more patient, or makes them more likely to agree to do things doctors ask them to do, or makes them more greedy. Then take the "marginal" person, who is just on the border of participating or not. If those marginal people drink less on average, then semaglutide would look good purely due to changing selection rather than actually reducing drinking.

Now, I don't claim that the above story is true. It's possible, but lots of other stories are also possible, including ones where the bias could go in the other way.

> I also think there is a general tendency for people to believe that once they've identified a selection issue the results are totally undermined.

I expected this sentence to be followed by you praising me for explicitly disavowing such a view and stating that, since the bias could be in either direction, the lab experiment does provide some evidence in favor of semaglutide. :) (Just very weak evidence.)

Comment by dynomight on METR: Measuring AI Ability to Complete Long Tasks · 2025-03-20T16:57:36.243Z · LW · GW

What premises would I have to accept for the comparison to be fair? Suppose I think that available compute will continue to grow along previous trends and that we'll continue to find new tricks to turn extra compute into extra capabilities. Does conditioning on that make it fair? (Not sure I accept those premises, but never mind that.)

Comment by dynomight on The first RCT for GLP-1 drugs and alcoholism isn't what we hoped · 2025-02-21T22:51:57.568Z · LW · GW

Thanks for the response! I must protest that I think I'm being misinterpreted a bit. Compare my quote:

> the point of RCTs is to avoid resorting to regression coefficients on non-randomized sample

To the:

> The point of RCTs is not to avoid resorting to regression coefficients.

The "non-randomized sample" part of that quote is important! If semaglutide had no impact on the decision to participate, then we could argue about the theory of regressions. Yes, the fractions that participated happened to be close, but with small numbers that could easily happen by chance. The hypothesis of this research is that semaglutide would reduce the urge to drink! If the decision to participate was random, and I believed the conclusion of the experiment, then that conclusion would seem to imply that the decision to participate wasn't random after all. It just seems incredibly strange to assume that there's no impact of semaglutide on the probability of agreeing to the experiment, and very unlikely that the other variables in the regression fix this, which is why I'm dubious that the regression coefficients reflect any causal relationship.

That said, I think the participation bias could go in either direction. I said (and maintain) that the lab experiment does provide some evidence in favor of semaglutide's effectiveness. I just think that given the non-random selection, small sample, and general weirdness of having people drink in a room in a hospital as a measurement, it's quite weak evidence. Given the dismal results from the drinking records (which have less of all of these issues) I think that makes the overall takeaway from this paper pretty negative.

Comment by dynomight on Heritability: Five Battles · 2025-01-14T21:51:06.603Z · LW · GW

> It ranges from 0% to 100%.

Small nitpick that doesn't have any significant consequences—this isn't technically true, it could be higher than 100%.

Comment by dynomight on Trying Bluesky · 2024-11-17T13:17:38.892Z · LW · GW

Wow, I didn't realize bluesky already supports user-created feeds, which can seemingly use any algorithm? So if you don't like "no algorithm" or "discover" you can create a new ranking method and also share it with other people?

Anyone want to create a lesswrong starter pack? Are there enough people on bluesky for that to be viable?

Comment by dynomight on Arithmetic is an underrated world-modeling technology · 2024-10-24T22:57:03.980Z · LW · GW

Well done, yes, I did exactly what you suggested! I figured that an average human lifespan was "around 80 years" and then multiplied and divided by 1.125 to get 80×1.125=90 and 80/1.125=71.111.

(And of course, you're also right that this isn't quite right since (1.125 - 1/1.125) / (1/1.125) = (1.125)²-1 = .2656 ≠ .25. This approximation works better for smaller percentages...)
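As a rough check of the arithmetic above, here is a minimal Python sketch. The helper name `spread` is hypothetical and not from the post. It divides and multiplies a central value by the same factor, then compares the gap between the two ends with the `factor**2 - 1` formula. The last line, the factor that would give exactly a 25% gap, is an addition for illustration only.

```python
import math

def spread(center, factor):
    """Return (low, high) by dividing and multiplying center by factor."""
    return center / factor, center * factor

low, high = spread(80, 1.125)        # roughly 71.11 and 90

# How much larger the high end is than the low end, relative to the low end.
relative_gap = (high - low) / low

# The closed form from the comment: factor squared, minus one.
formula_gap = 1.125 ** 2 - 1         # 0.265625, a bit more than 0.25

# Illustrative addition: a factor whose square is 1.25 gives a 25% gap.
exact_factor = math.sqrt(1.25)
```

With a smaller target such as 2%, the naive factor 1.01 gives a gap of 1.01² - 1 = 0.0201, which is why the shortcut works better for smaller percentages.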

Comment by dynomight on Arithmetic is an underrated world-modeling technology · 2024-10-23T21:29:42.616Z · LW · GW

Interesting. Looks like they are starting with a deep tunnel (530 m) and may eventually move to the deepest tunnel in Europe (1444 m). I wish I could find numbers on how much weight will be moved or the total energy storage of the system. (They quote 2 MW, but that's power, not energy: how many MWh?)

According to this article, a Swiss company is building giant gravity storage buildings in China and out of 9 total buildings, there should be a total storage of 3700 MWh, which seems quite good! Would love to know more about the technology.

Comment by dynomight on Arithmetic is an underrated world-modeling technology · 2024-10-21T19:48:20.946Z · LW · GW

You're 100% right. (I actually already fixed this due to someone emailing me, but not sure about the exact timing.) Definitely agree that there's something amusing about the fact that I screwed up my manual manipulation of units while in the process of trying to give an example of how easy it is to screw up manual manipulations of units...

Comment by dynomight on Arithmetic is an underrated world-modeling technology · 2024-10-18T17:37:43.941Z · LW · GW

> You mentioned a density of steel of 7.85 g/cm^3 but used a value of 2.7 g/cm^3 in the calculations.

Yes! You're right! I've corrected this, though I still need to update the drawing of the house. Thank you!

Comment by dynomight on I finally got ChatGPT to sound like me · 2024-09-17T23:22:15.654Z · LW · GW

Word is (at least according to the guy who automated me) that if you want an LLM to really imitate style, you really really want to use a base model and not an instruction-tuned model like ChatGPT. All of ChatGPT's "edge" has been worn away into bland non-offensiveness by the RLHF. Base models reflect the frightening mess of humanity rather than the instructions a corporation gave to human raters. When he tried to imitate me using instruction-tuned models it was very cringe no matter what he tried. When he switched to a base model it instantly got my voice almost exactly with no tricks needed.

I think many people kinda misunderstand the capabilities of LLMs because they only interact with instruction-tuned models.

Comment by dynomight on Nursing doubts · 2024-09-10T18:38:40.999Z · LW · GW

> Why somewhat? It's plausible to me that even just the lack of DHA would give the overall RCT results.

Yeah, that seems plausible to me, too. I don't think I want to claim that the benefits are "definitely slightly lower", but rather that they're likely at least a little lower but I'm uncertain how much. My best guess is that the bioactive stuff like IgA does at least something, so modern formula still isn't at 100%, but it's hard to be confident.

Comment by dynomight on Nursing doubts · 2024-09-08T14:15:09.532Z · LW · GW

My impression was that the backlash you're describing is causally downstream of efforts by public health people to promote breastfeeding (and pro-breastfeeding messages in hospitals, etc.) Certainly the correlation is there (https://www.researchgate.net/publication/14117103_The_Resurgence_of_Breastfeeding_in_the_United_States) but I guess it's pretty hard to prove a strict cause.

Comment by dynomight on Michael Dickens' Caffeine Tolerance Research · 2024-09-05T15:22:11.710Z · LW · GW

I'm fascinated that caffeine is so well-established (the most popular drug?) and yet these kinds of self-experiments still seem to add value over the scientific literature.

Anyway, I have a suspicion that tolerance builds at different rates for different effects. For example, if you haven't had any caffeine in a long time (like months), it seems to create a strong sense of euphoria. But this seems to fade very quickly. Similarly, with prescription stimulants, people claim that tolerance to physical effects happens gradually, but full tolerance never develops for the effect on executive function. (Though I don't think there are any long-term experiments to prove this.)

These different tolerances are a bit hard to understand mechanistically: Doesn't caffeine only affect adenosine receptors? Maybe the body also adapts at different places further down the causal chain.

Comment by dynomight on Thoughts on seed oil · 2024-08-28T11:45:09.060Z · LW · GW

(Many months later) Thanks for this comment, I believe you are right! Strangely, there do seem to be many resources that list them as being hydrogen bonds (e.g. Encyclopaedia Britannica: https://www.britannica.com/science/unsaturated-fat), which makes me question their editorial process. In any case, I'll probably just rephrase to avoid using either term. Thanks again, wish I had seen this earlier!

Comment by dynomight on Datasets that change the odds you exist · 2024-07-02T17:31:46.096Z · LW · GW

Thanks, any feedback on where the argument fails? (If anywhere in particular.)

Comment by dynomight on Thoughts on seed oil · 2024-04-30T15:02:36.962Z · LW · GW

I would dissuade no one from writing drunk, and I'm confident that you too can say that people are penguins! But I'm sorry to report that personally I don't do it by drinking but rather writing a much longer version with all those kinds of clarifications included and then obsessively editing it down.

Comment by dynomight on Thoughts on seed oil · 2024-04-29T22:10:50.057Z · LW · GW

Do you happen to have any recommended pointers for research on health impacts of processed food? It's pretty easy to turn up a few recent meta reviews, which seems like a decent place to start, but I'd be interested if there were any other sources, particularly influential individual experiments, etc. (It seems like there's a whole lot of observational studies, but many fewer RCTs, for reasons that I guess are pretty understandable.) It seems like some important work here might never use the word "processing".

Comment by dynomight on Thoughts on seed oil · 2024-04-22T19:16:46.380Z · LW · GW

If I hadn't heard back from them, would you want me to tell you? Or would that be too sad?

Comment by dynomight on Thoughts on seed oil · 2024-04-22T18:51:11.863Z · LW · GW

> Seed oils are usually solvent extracted, which makes me wonder, how thoroughly are they scrubbed of solvent, what stuff in the solvent is absorbed into the oil (also an effective solvent for various things), etc

I looked into this briefly at least for canola oil. There, the typical solvent is hexane. And some hexane does indeed appear to make it into the canola oil that we eat. But hexane apparently has very low toxicity, and—more importantly—the hexane that we get from all food sources apparently makes up less than 2% of our total hexane intake! https://www.hsph.harvard.edu/nutritionsource/2015/04/13/ask-the-expert-concerns-about-canola-oil/ Mostly we get hexane from gasoline fumes, so if hexane is a problem, it's very hard to see how to pin the blame on canola oil.

Comment by dynomight on Thoughts on seed oil · 2024-04-21T11:17:50.403Z · LW · GW

It's a regression. Just like they extrapolate backwards to (1882+50=1932) using data from 1959, they extrapolate forwards at the end. (This is discussed in the "timelines" section.) This is definitely a valid reason to treat it with suspicion, but nothing's "wrong" exactly.

Comment by dynomight on Thoughts on seed oil · 2024-04-20T19:14:12.674Z · LW · GW

Many thanks! All fixed (except one where I prefer the old way).

Comment by dynomight on My Detailed Notes & Commentary from Secular Solstice · 2024-03-26T12:00:40.705Z · LW · GW

As the original author of underrated reasons to be thankful (here), I guess I can confirm that tearing apart the sun for raw materials was not an intended implication.

Comment by dynomight on Using axis lines for good or evil · 2024-03-19T21:01:44.124Z · LW · GW

I think matplotlib has way too many ways to do everything to be comprehensive! But I think you could do almost everything with some variants of these.

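import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.spines['top'].set_visible(False)  # hide one spine; or 'left' / 'right' / 'bottom'
ax.set_xticks([0, 50, 100], ['0%', '50%', '100%'])  # tick positions with custom labels (needs matplotlib >= 3.5)
ax.tick_params(axis='x', bottom=False, top=False)  # hide x tick marks; for axis='y' use left / right
ax.set_ylim([0, 0.30])  # set both y limits explicitly
ax.set_ylim([0, ax.get_ylim()[1]])  # pin the lower limit to 0, keep the current upper limit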

Comment by dynomight on Using axis lines for good or evil · 2024-03-09T22:51:20.944Z · LW · GW

Good point regarding year tick marks! I was thinking that labeling 0°C would make the most sense when freezing is really important. Say, if you were plotting historical data on temperatures and you were interested in trying to estimate the last frost date in spring or something. Then, 10°C would mean "twice as much margin" as 5°C.

Comment by dynomight on Using axis lines for good or evil · 2024-03-07T14:46:11.364Z · LW · GW

One way you could measure which one is "best" would be to measure how long it takes people to answer certain questions. E.g. "For what fraction of the 1997-2010 period did Japan spend more on healthcare per-capita than the UK?" or "what's the average ratio of healthcare spending in Sweden vs. Greece between 2000 and 2010?" (I think there is an academic literature on these kinds of experiments, though I don't have any references on hand.)

In this case, I think Tufte goes overboard in saying you shouldn't use color. But if the second plot had color, I'd venture it would win most such contests, if only because the y-axis is bigger and it's easier to match the lines with the labels. But even if I don't agree with everything Tufte says, I still find him useful because he suggests different options and different ways to think about things.

Comment by dynomight on Using axis lines for good or evil · 2024-03-07T14:35:32.502Z · LW · GW

Thanks, someone once gave me the advice that after you write something, you should go back to the beginning and delete as many paragraphs as you can without making everything incomprehensible. After hearing this, I noticed that most people tend to write like this:

Intro
Context
Overview
Other various throat clearing
Blah blah blah
Finally an actual example, an example, praise god

Which is pretty easy to correct once you see it!

Comment by dynomight on Using axis lines for good or evil · 2024-03-07T14:23:11.140Z · LW · GW

Hey, you might be right! I'll take this as useful feedback that the argument wasn't fully convincing. Don't mean to pull a motte-and-bailey, but I suppose if I had to, I'd retreat to an argument like, "if making a plot, consider using these rules as one option for how to pick axes." In any case, if you have any examples where you think following this advice leads to bad choices, I'd be interested to hear them.

Comment by dynomight on Why correlation, though? · 2024-03-06T17:56:56.798Z · LW · GW

I think you're basically right: Correlation is just one way of measuring dependence between variables. Being correlated is a sufficient but not necessary condition for dependence. We talk about correlation so much because:

We don't have a particularly convenient general scalar measure of how related two variables are. You might think about using something like mutual information, but for that you need the densities not datasets.
We're still living in the shadows of the times when computers weren't so big. We got used to doing all sorts of stuff based on linearity decades ago because we didn't have any other options, and they became "conventional" even when we might have better options now.
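A tiny example of the gap between correlation and dependence, as a sketch with made-up data (the five-point distribution is an assumption chosen for convenience): take x uniform on a symmetric set and let y = x². Then y is completely determined by x, yet the covariance, and so the correlation, is exactly zero.

```python
from fractions import Fraction

# x is uniform on a symmetric set; y is a deterministic function of x.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

def mean(values):
    """Exact mean of a list of integers."""
    return Fraction(sum(values), len(values))

# Covariance E[xy] - E[x]E[y], computed exactly.
covariance = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

# Dependence: conditioning on x changes the distribution of y.
p_y4 = Fraction(sum(1 for y in ys if y == 4), len(ys))
ys_given_x2 = [y for x, y in zip(xs, ys) if x == 2]
p_y4_given_x2 = Fraction(sum(1 for y in ys_given_x2 if y == 4), len(ys_given_x2))
```

Here `covariance` is zero while `p_y4` and `p_y4_given_x2` differ, so the variables are uncorrelated but clearly not independent.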

Comment by dynomight on Are language models good at making predictions? · 2023-11-08T15:20:00.038Z · LW · GW

Thanks, you've 100% convinced me. (Convincing someone that something that (a) is known to be true and (b) they think isn't surprising, actually is surprising is a rare feat, well done!)

Comment by dynomight on Are language models good at making predictions? · 2023-11-06T16:23:02.368Z · LW · GW

> Chat or instruction finetuned models have poor prediction calibration, whereas base models (in some cases) have perfect calibration.

Tell me if I understand the idea correctly: Log-loss to predict the next token leads to good calibration for single-token prediction, which manifests as good calibration for percentage predictions? But then RLHF is some crazy loss totally removed from calibration that destroys all that?

If I get that right, it seems quite intuitive. Do you have any citations, though?

Comment by dynomight on Are language models good at making predictions? · 2023-11-06T14:54:21.759Z · LW · GW

Sadly, no—we had no way to verify that.

I guess one way you might try to confirm/refute the idea of data leakage would be to look at the decomposition of Brier scores: GPT-4 is much better calibrated for politics vs. science but only very slightly better at politics vs. science in terms of refinement/resolution. Intuitively, I'd expect data leakage to manifest as better refinement/resolution rather than better calibration.
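For readers who haven't seen the decomposition, here is a minimal sketch of one standard version (the Murphy decomposition). This is an assumption about which decomposition is meant; the post may have binned or named things differently, and the forecasts and outcomes below are made up. The function name `brier_decomposition` is hypothetical.

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Return (brier, reliability, resolution, uncertainty).

    Forecasts are grouped by their exact value, so that
    brier == reliability - resolution + uncertainty (up to rounding).
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the observed outcomes for each distinct forecast value.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    # Reliability (calibration error): forecast vs. observed frequency. Lower is better.
    reliability = sum(
        len(obs) * (f - sum(obs) / len(obs)) ** 2 for f, obs in groups.items()
    ) / n
    # Resolution: how far group frequencies sit from the base rate. Higher is better.
    resolution = sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in groups.values()
    ) / n
    # Uncertainty depends only on the outcomes, not the forecaster.
    uncertainty = base_rate * (1 - base_rate)
    return brier, reliability, resolution, uncertainty

# Made-up data for illustration only.
forecasts = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5]
outcomes = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
brier, reliability, resolution, uncertainty = brier_decomposition(forecasts, outcomes)
```

In this framing, a forecaster who had seen the answers would mostly sort questions into the right groups, which shows up in the resolution term rather than in the reliability term.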

Comment by dynomight on Can I take ducks home from the park? · 2023-09-15T16:40:54.068Z · LW · GW

That would definitely be better, although it would mean reading/scoring 1056 different responses, unless I can automate the scoring process. (Would LLMs object to doing that?)

Comment by dynomight on Can I take ducks home from the park? · 2023-09-15T13:12:49.848Z · LW · GW

Thank you, I will fix this! (Our Russian speaker agrees and claims they noticed this but figured it didn't matter 🤔) I re-ran the experiments with the result that GPT-4 shifted from a score of +2 to a score of -1.

Comment by dynomight on Can I take ducks home from the park? · 2023-09-14T21:16:30.209Z · LW · GW

Well, no. But I guess I found these things notable:

Alignment remains surprisingly brittle and random. Weird little tricks remain useful.
The tricks that work for some models often seem to confuse others.
Cobbling together weird little tricks seems to help (Hindi ranger step-by-step)
At the same time, the best "trick" is a somewhat plausible story (duck-store).
PaLM 2 is the most fun, Pi is the least fun.

Comment by dynomight on I still think it's very unlikely we're observing alien aircraft · 2023-06-15T18:53:06.779Z · LW · GW

You've convinced me! I don't want to defend the claim you quoted, so I'll modify "arguably" into something much weaker.

Comment by dynomight on I still think it's very unlikely we're observing alien aircraft · 2023-06-15T17:22:06.163Z · LW · GW

I don't think I have any argument that it's unlikely aliens are screwing with us—I just feel it is, personally.

I definitely don't assume our sensors are good enough to detect aliens. I'm specifically arguing we aren't detecting alien aircraft, not that alien aircraft aren't here. That sounds like a silly distinction, but I'd genuinely give much higher probability to "there are totally undetected alien aircraft on earth" than "we are detecting glimpses of alien aircraft on earth."

Regarding your last point, I totally agree those things wouldn't explain the weird claims we get from intelligence-connected people. (Except indirectly—e.g. rumors spread more easily when people think something is possible for other reasons.) I think that our full set of observations are hard to explain without aliens! That is, I think P[everything | aliens] is low. I just think P[everything | no aliens] is even lower.

Comment by dynomight on I still think it's very unlikely we're observing alien aircraft · 2023-06-15T13:10:46.904Z · LW · GW

I know that the mainstream view on Lesswrong is that we aren't observing alien aircraft, so I doubt many here will disagree with the conclusion. But I wonder if people here agree with this particular argument for that conclusion. Basically, I claim that:

P[aliens] is fairly high, but
P[all observations | aliens] is much lower than P[all observations | no aliens], simply because it's too strange that all the observations in every category of observation (videos, reports, etc.) never cross the "conclusive" line.

As a side note: I personally feel that P[observations | no aliens] is actually pretty low, i.e. the observations we have are truly quite odd / unexpected / hard-to-explain-prosaically. But it's not as low as P[observations | aliens]. This doesn't matter to the central argument (you just need to accept that the ratio P[observations | aliens] / P[observations | no aliens] is small) but I'm interested if people agree with that.
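The role of the ratio can be made concrete with the odds form of Bayes' rule. This is a minimal sketch; the function name `posterior` is hypothetical, and the numbers are made up to show the mechanics rather than being estimates from the post.

```python
def posterior(prior, likelihood_ratio):
    """Return P[aliens | observations].

    prior is P[aliens]; likelihood_ratio is
    P[observations | aliens] / P[observations | no aliens].
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Made-up numbers: a fairly high prior combined with a small ratio.
example = posterior(prior=0.5, likelihood_ratio=0.01)

# Only the ratio matters: both likelihoods can be tiny in absolute terms.
same_ratio = posterior(prior=0.5, likelihood_ratio=1e-9 / 1e-7)
```

Both calls give the same answer (a bit under 1%), which is the sense in which the absolute size of P[observations | no aliens] doesn't affect the central argument.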

Comment by dynomight on Properties of Good Textbooks · 2023-05-08T19:17:07.858Z · LW · GW

> I get very little value from proofs in math textbooks, and consider them usually unnecessary (unless they teach a new proof method).

I think the problem is that proofs are typically optimized for "give the most convincing possible evidence that the claim is really true to a skeptical reader who wants to check every possible weak point". This is not what most readers (especially new readers) want on a first pass, which is "give the maximum possible insight into why this claim is true to a reader who is happy to trust the author if the details don't give extra intuition." At a glance, Infinite Napkin seems to be optimizing much more for the latter.

Comment by dynomight on [Link] "The madness of reduced medical diagnostics" by Dynomight · 2022-06-17T03:31:28.267Z · LW · GW

If you're worried about computational complexity, that's OK. It's not something that I mentioned because (surprisingly enough...) this isn't something that any of the doctors discussed. If you like, let's call that a "valid cost" just like the medical risks and financial/time costs of doing tests. The central issue is if it's valid to worry about information causing harmful downstream medical decisions.

Comment by dynomight on Observations about writing and commenting on the internet · 2022-02-16T15:48:14.275Z · LW · GW

I might not have described the original debate very clearly. My claim was that if Monty chose "leftmost non-car door" you still get the car 2/3 of the time by always switching and 1/3 by never switching. Your conditional probabilities look correct to me. The only thing you might be "missing" is that (A) occurs 2/3 of the time and (B) occurs only 1/3 of the time. So if you always switch your chance of getting the car is still (chance of A)*(prob of car given A) + (chance of B)*(prob of car given B)=(2/3)*(1/2) + (1/3)*(1) = (2/3).

One difference (outside the bounds of the original debate) is that if Monty behaves this way there are other strategies that also give you the car 2/3 of the time. For example, you could switch only in scenario B and not in scenario A. There doesn't appear to be any way to exploit Monty's behavior and do better than 2/3 though.
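These numbers can be checked by exact enumeration. The sketch below makes some assumptions that aren't spelled out in the comment: the player always picks door 1, scenario A means Monty opens door 2, and scenario B means Monty opens door 3. The function names are hypothetical.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def monty_opens(car, pick=1):
    """Monty opens the leftmost door that is neither the pick nor the car."""
    return min(d for d in DOORS if d != pick and d != car)

def win_probability(strategy, pick=1):
    """Exact chance of winning; strategy(opened) returns True to switch."""
    total = Fraction(0)
    for car in DOORS:  # the car is equally likely to be behind each door
        opened = monty_opens(car, pick)
        if strategy(opened):
            # Switch to the one door that is neither picked nor opened.
            final = next(d for d in DOORS if d not in (pick, opened))
        else:
            final = pick
        if final == car:
            total += Fraction(1, 3)
    return total

always_switch = win_probability(lambda opened: True)
never_switch = win_probability(lambda opened: False)
switch_only_in_b = win_probability(lambda opened: opened == 3)
switch_only_in_a = win_probability(lambda opened: opened == 2)
```

Under these assumptions, always switching and switching only in scenario B both win 2/3 of the time, while never switching and switching only in scenario A win 1/3 of the time.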

Comment by dynomight on Observations about writing and commenting on the internet · 2022-02-15T00:03:25.321Z · LW · GW

Just to be clear, when talking about how people behave in forums, I mean more "general purpose" places like Reddit. In particular, I was not thinking about Less Wrong where in my experience, people have always bent over backwards to be reasonable!

Comment by dynomight on Writing On The Pareto Frontier · 2021-09-17T22:06:47.658Z · LW · GW

I have two thoughts related to this:

First, there's a dual problem: Given a piece of writing that's along the Pareto frontier, how do you make it easy for readers who might have a utility function aligned with the piece to find it?

Related to this, for many people and many pieces of writing, a large part of the utility they get is from comments. I think this leads to dynamics where a piece whose writing is less than optimal can get popular and then, thanks to its comments, reach a point on the frontier that's hard to beat.

Comment by dynomight on Why has the replication crisis affected RCT-studies but not observational studies? · 2021-09-04T13:33:03.732Z · LW · GW

Done!

Comment by dynomight on johnswentworth's Shortform · 2021-09-03T14:54:12.203Z · LW · GW

I loved this book. The most surprising thing to me was the answer that people who were there in the heyday give when asked what made Bell Labs so successful: They always say it was the problem, i.e. having an entire organization oriented towards the goal of "make communication reliable and practical between any two places on earth". When Shannon left the Labs for MIT, people who were there immediately predicted he wouldn't do anything of the same significance because he'd lose that "compass". Shannon was obviously a genius, and he did much more afterwards than most people ever accomplish, but still nothing as significant as what he did when at the Labs.

Comment by dynomight on How To Write Quickly While Maintaining Epistemic Rigor · 2021-08-30T00:54:17.918Z · LW · GW

I thought this was fantastic, very thought-provoking. One possibly easy thing that I think would be great would be links to a few posts that you think have used this strategy with success.

Comment by dynomight on Factors of mental and physical abilities - a statistical analysis · 2021-08-18T17:04:34.656Z · LW · GW

Thanks, I clarified the noise issue. Regarding factor analysis, could you check if I understand everything correctly? Here's what I think is the situation:

We can write a factor analysis model (with a single factor) as

$x = w g + e ,$

where:

$x$ is observed data
$g \sim N (0, 1)$ is a random latent variable
$w \in R^{n}$ is some vector (a parameter)
$e \sim N (0, Σ)$ is a random noise variable
$Σ$ is the covariance of the noise (a parameter)

It always holds (assuming $g$ and $e$ are independent) that

$Cov [x] = w w^{T} + Σ .$

In the simplest variant of factor analysis (in the current post) we use $Σ = a I$ in which case you get that

$Cov [x] = w w^{T} + a I .$

You can check if this model fits by (1) checking that $x$ is Normal and (2) checking if the covariance of $x$ can be decomposed as in the above equation. (Which is equivalent to having all singular values the same except one).
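The singular value claim can be sanity-checked with a small sketch in plain Python. The vector `w` and the value `a` below are hypothetical, chosen so the arithmetic is exact. Since the covariance is symmetric and positive semidefinite, its singular values are its eigenvalues: `w` itself is an eigenvector with eigenvalue a + ||w||², and every direction orthogonal to `w` has eigenvalue a.

```python
def covariance(w, a):
    """Build w w^T + a I as a list of rows."""
    n = len(w)
    return [[w[i] * w[j] + (a if i == j else 0.0) for j in range(n)] for i in range(n)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

w = [1.0, 2.0, 2.0]   # hypothetical loading vector, ||w||^2 = 9
a = 0.5               # hypothetical noise variance
cov = covariance(w, a)

# Along w: cov @ w equals (a + ||w||^2) * w, i.e. 9.5 * w.
along_w = matvec(cov, w)

# Orthogonal to w (w . v = 0): cov @ v equals a * v, i.e. 0.5 * v.
v = [2.0, -1.0, 0.0]
orthogonal = matvec(cov, v)
```

So in n dimensions there is one eigenvalue equal to a + ||w||² and n - 1 eigenvalues equal to a, which is the "all the same except one" pattern.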

The next slightly-less-simple variant of factor analysis (which I think you're suggesting) would be to use $Σ = diag (a)$ where $a$ is a vector, in which case you get that

$Cov [x] = w w^{T} + diag (a) .$

You can again check if this model fits by (1) checking that $x$ is Normal and (2) checking if the covariance of $x$ can be decomposed as in the above equation. (The difference is, now this doesn't reduce to some simple singular value condition.)

Do I have all that right?

Comment by dynomight on Factors of mental and physical abilities - a statistical analysis · 2021-08-18T16:38:18.943Z · LW · GW

Thanks for pointing out those papers, which I agree can get at issues that simple correlations can't. Still, to avoid scope-creep, I've taken the less courageous approach of (1) mentioning that the "breadth" of the effects of genes is an active research topic and (2) editing the original paragraph you linked to to be more modest, talking about "does the above data imply" rather than "is it true that". (I'd rather avoid directly addressing 3 and 4 since I think that doing those claims justice would require more work than I can put in here.) Anyway, thanks again for your comments, it's useful for me to think of this spectrum of different "notions of g".
