Posts

"AI achieves silver-medal standard solving International Mathematical Olympiad problems" 2024-07-25T15:58:57.638Z
Humans, chimpanzees and other animals 2023-05-30T23:53:08.295Z
On "aiming for convergence on truth" 2023-04-11T18:19:18.086Z
Large language models learn to represent the world 2023-01-22T13:10:38.837Z
Suspiciously balanced evidence 2020-02-12T17:04:20.516Z
"Future of Go" summit with AlphaGo 2017-04-10T11:10:40.249Z
Buying happiness 2016-06-16T17:08:53.802Z
AlphaGo versus Lee Sedol 2016-03-09T12:22:53.237Z
[LINK] "The current state of machine intelligence" 2015-12-16T15:22:26.596Z
Scott Aaronson: Common knowledge and Aumann's agreement theorem 2015-08-17T08:41:45.179Z
Group Rationality Diary, March 22 to April 4 2015-03-23T12:17:27.193Z
Group Rationality Diary, March 1-21 2015-03-06T15:29:01.325Z
Open thread, September 15-21, 2014 2014-09-15T12:24:53.165Z
Proportional Giving 2014-03-02T21:09:07.597Z
A few remarks about mass-downvoting 2014-02-13T17:06:43.216Z
[Link] False memories of fabricated political events 2013-02-10T22:25:15.535Z
[LINK] Breaking the illusion of understanding 2012-10-26T23:09:25.790Z
The Problem of Thinking Too Much [LINK] 2012-04-27T14:31:26.552Z
General textbook comparison thread 2011-08-26T13:27:35.095Z
Harry Potter and the Methods of Rationality discussion thread, part 4 2010-10-07T21:12:58.038Z
The uniquely awful example of theism 2009-04-10T00:30:08.149Z
Voting etiquette 2009-04-05T14:28:31.031Z
Open Thread: April 2009 2009-04-03T13:57:49.099Z

Comments

Comment by gjm on Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes · 2024-10-13T02:53:53.033Z · LW · GW

Unless I misread, it said "mRNA" before.

Comment by gjm on Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes · 2024-10-09T22:46:10.163Z · LW · GW

Correction: the 2024 Nobel Prize in Medicine was for the discovery of microRNA, not mRNA, which is also important but a different thing.

Comment by gjm on Shortform · 2024-10-09T02:08:01.662Z · LW · GW

I think it's more "Hinton's concerns are evidence that worrying about AI x-risk isn't silly" than "Hinton's concerns are evidence that worrying about AI x-risk is correct". The most common negative response to AI x-risk concerns is (I think) dismissal, and it seems relevant to that to be able to point to someone who (1) clearly has some deep technical knowledge, (2) doesn't seem to be otherwise insane, (3) has no obvious personal stake in making people worry about x-risk, and (4) is very smart, and who thinks AI x-risk is a serious problem.

It's hard to square "ha ha ha, look at those stupid nerds who think AI is magic and expect it to turn into a god" or "ha ha ha, look at those slimy techbros talking up their field to inflate the value of their investments" or "ha ha ha, look at those idiots who don't know that so-called AI systems are just stochastic parrots that obviously will never be able to think" with the fact that one of the people you're laughing at is Geoffrey Hinton.

(I suppose he probably has a pile of Google shares so maybe you could squeeze him into the "techbro talking up his investments" box, but that seems unconvincing to me.)

Comment by gjm on My Apartment Art Commission Process · 2024-08-27T11:25:57.576Z · LW · GW

Pedantic correction: you have some sizes where you've written e.g. 20' x 20' and I'm pretty sure you mean 20" x 20".

(Also, the final note saying pixel art is good for crisp upscaling and you should start with the lowest-resolution version seems very weird to me, though the way it's worded makes it unlikely that this is a mistake; another sentence or so elaborating on why this is a good idea would be interesting to me.)

Comment by gjm on "AI achieves silver-medal standard solving International Mathematical Olympiad problems" · 2024-07-26T19:40:11.566Z · LW · GW

So maybe e.g. the (not very auto-) autoformalization part produced a theorem-statement template with some sort of placeholder where the relevant constant value goes, and AlphaProof knew it needed to find a suitable value to put in the gap.

Comment by gjm on "AI achieves silver-medal standard solving International Mathematical Olympiad problems" · 2024-07-26T19:37:05.543Z · LW · GW

I'm pretty sure what's going on is:

  • The system automatically generates candidate theorems it might try to prove, expressing possible answers, and attempts to prove them.
  • In this case, the version of the theorem it ended up being able to prove was the one with 2 in that position. (Which is just as well, since -- I assume, not having actually tried to solve the problem for myself -- that is in fact the unique number for which such a theorem is true.)
  • So the thing you end up getting a proof of includes the answer, but not because the system was told the answer in advance.

It would be nice to have this more explicitly from the AlphaProof people, though.

[EDITED to add:] Actually, as per the tweet from W T Gowers quoted by "O O" elsewhere in this thread, we do have it explicitly, not from the AlphaProof people but from one of the mathematicians the AlphaProof people engaged to evaluate their solutions.

Comment by gjm on "AI achieves silver-medal standard solving International Mathematical Olympiad problems" · 2024-07-26T09:49:54.908Z · LW · GW

The AlphaZero algorithm doesn't obviously not involve an LLM. It has a "policy network" to propose moves, and I don't know what that looks like in the case of AlphaProof. If I had to guess blindly I would guess it's an LLM, but maybe they've got something else instead.

Comment by gjm on "AI achieves silver-medal standard solving International Mathematical Olympiad problems" · 2024-07-26T01:30:35.372Z · LW · GW

I don't think this [sc. that AlphaProof uses an LLM to generate candidate next steps] is true, actually.

Hmm, maybe you're right. I thought I'd seen something that said it did that, but perhaps I hallucinated it. (What they've written isn't specific enough to make it clear that it doesn't do that either, at least to me. They say "AlphaProof generates solution candidates", but nothing about how it generates them. I get the impression that it's something at least kinda LLM-like, but could be wrong.)

Comment by gjm on [deleted post] 2024-07-25T18:08:56.397Z

Looks like this was posted 4 minutes before my https://www.lesswrong.com/posts/TyCdgpCfX7sfiobsH/ai-achieves-silver-medal-standard-solving-international but I'm not deleting mine because I think some of the links, comments, etc. in my version are useful.

Comment by gjm on On the CrowdStrike Incident · 2024-07-25T15:22:11.591Z · LW · GW

Nothing you have said seems to make any sort of conspiracy theory around this more plausible than the alternative, namely that it's just chance. There are 336 half-hours per week; when two notable things happen in a week, about half a percent of the time one of them happens within half an hour of the other. The sort of conspiracies you're talking about seem to me more unlikely than that.

(Why a week? Arbitrary choice of timeframe. The point isn't a detailed probability calculation, it's that minor coincidences happen all the time.)
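A minimal Monte Carlo sketch of that back-of-the-envelope figure (assuming, purely for illustration, that both event times are independent and uniformly distributed over one week):

```python
import random

WEEK_HOURS = 7 * 24  # 168 hours
trials = 200_000

# How often do two independent, uniformly-timed events in the same week
# land within half an hour of each other?
hits = sum(
    abs(random.uniform(0, WEEK_HOURS) - random.uniform(0, WEEK_HOURS)) <= 0.5
    for _ in range(trials)
)
print(hits / trials)  # comes out around 0.006, i.e. roughly half a percent
```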

Comment by gjm on On the CrowdStrike Incident · 2024-07-24T22:22:02.088Z · LW · GW

Given how much harm such an incident could do to CrowdStrike, and given how much harm it could do to an individual at CrowdStrike who turned out to have caused it on purpose, your second explanation seems wildly improbable.

The third one seems pretty improbable too. I'm trying to imagine a concrete sequence of events that matches your description, and I really don't think I can. Especially as Trump's formal acceptance of the GOP nomination can hardly have been any sort of news to anyone.

(Maybe I've misunderstood your tone and your comment is simply a joke, in which case fair enough. But if you're making any sort of serious suggestion that the incident was anything to do with Mr Trump, I think you're even crazier than he is.)

Comment by gjm on Friendship is transactional, unconditional friendship is insurance · 2024-07-20T01:44:01.835Z · LW · GW

Right: as I said upthread, the discussion is largely about whether terms like "spending" are misleading or helpful when we're talking about time rather than money. And, as you point out (or at least it seems clearly implied by what you say), whether a given term is helpful to a given person will depend on what other things are associated with that term in that person's mind, so it's not like there's even a definite answer to "is it helpful or misleading?".

(But, not that it matters all that much, I think you might possibly not have noticed that Ruby and Raemon are different people?)

Comment by gjm on Friendship is transactional, unconditional friendship is insurance · 2024-07-19T23:51:29.429Z · LW · GW

In such a world we'd presumably already have vocabulary adapted to that situation :-). But yes, I would feel fine using the term "spending" (but then I also feel fine talking about "spending time") but wouldn't want to assume that all my intuitions from the present world  still apply.

(E.g., in the actual world, for anyone who is neither very rich nor very poor, spending always has saving as an alternative[1], and how much you save can have a big impact on your future well-being. In the hypothetical spend-it-or-lose-it world, that isn't the case, and that feels like a pretty important difference.)

[1] For the very poor, much of their spending is basically compulsory and saving instead isn't an option. For the very rich, the choice is still there but for normal-size purchases it matters less because they're going to have ample resources whether they spend or save.

Comment by gjm on Friendship is transactional, unconditional friendship is insurance · 2024-07-19T17:23:17.702Z · LW · GW

I am not convinced by the analogy. If you have $30 in your bank account and you spend it on a book, you are $30 poorer; you had the option of just not doing that, in which case you would still have the $30. If you have 60 minutes ahead of you in the day and you spend it with a friend, then indeed you're 60 minutes older = poorer at the end of that time; but you didn't have the option of not spending those 60 minutes; they were going to pass by one way or another whatever you did.

You might still have given up something valuable! If you'd have preferred to devote those 60 minutes to earning money, or sleeping, or discovering timeless mathematical truths, then talking to your friend instead has had an opportunity cost. But the valuable thing you've foregone is that other thing you'd prefer to have done, not the time itself. That was always going to pass you by; it always does.

There aren't exactly definite right and wrong answers here; everyone agrees that "spending" time is not exactly the same thing as spending money, ditto for "giving" time, and the question is merely whether it's similar enough that the choice of language isn't misleading us. And it seems to me that, because the time is going to go anyway, the least-misleading way to think of it is that the default, no-action, case to compare with -- analogous to simply not spending money and having it sit in the bank -- is whatever you'd have been doing with your time otherwise. If you help a friend out by doing some tedious task that makes their life better, then you are "giving" or "spending" time. If you sit and chat with a friend because you both enjoy talking with one another, then you're not "giving" or "spending" in a sense that much resembles "giving" or "spending" money: you aren't worse off afterwards than if you hadn't done that; if you hadn't, then you'd likely have occupied the same time doing something no better.

I'm in two minds as to whether I believe what I just wrote. The counter-argument goes like this: comparing the situation after spending an hour with your friend with the situation after spending an hour doing something else begs the question; it's like comparing the situation after spending the $30 on a book with that after spending the $30 on something else. But I don't think the counter-argument really works, precisely because you have the option of just leaving the $30 in the bank and never spending it on something else, and there is no corresponding option for time.

Comment by gjm on Friendship is transactional, unconditional friendship is insurance · 2024-07-19T01:23:02.045Z · LW · GW

You would be in the same situation if you'd done something else during that hour. You're only "paying", in the sense of giving up something valuable to you, in so far as you would have preferred to do something else.

That's sometimes true of time spent with friends -- maybe your friend is moving house and you help them unload a lot of boxes or something -- but by and large we human beings tend to enjoy spending time with our friends. (Even when unloading boxes, actually.)

Comment by gjm on Baking vs Patissing vs Cooking, the HPS explanation · 2024-07-18T12:01:08.683Z · LW · GW

Agreed. I was going for "explain how he came to say the false thing", not "explain why it's actually true rather than false".

Comment by gjm on Baking vs Patissing vs Cooking, the HPS explanation · 2024-07-18T00:45:07.676Z · LW · GW

I took it as a kinda-joking backformation from "patisserie".

Comment by gjm on Baking vs Patissing vs Cooking, the HPS explanation · 2024-07-18T00:44:40.629Z · LW · GW

On the other hand, it seems like Adam is looking at breadmaking that uses a sourdough starter, and that does have both yeasts and bacteria in it. (And breadmaking that uses it is correspondingly more error-prone and in need of adjustment on the fly than most baking that just uses commercial yeast, though some of what makes it more error-prone isn't directly a consequence of the more complicated leavener.)

Comment by gjm on Brief notes on the Wikipedia game · 2024-07-16T00:25:14.338Z · LW · GW

I tried them all. My notes are full of spoilers.

Industrial Revolution (_2):

I didn't catch the deliberately-introduced error; instead I thought the bit about a recession in the 1830s was likely false. I took "the Industrial Revolution" to include the "Second Industrial Revolution" (as, indeed, the article seems to) and it seems to me that electricity was important for the progress of that, so I'm not entirely convinced that what was introduced to the article was exactly a falsehood.

Price gouging:

It seemed utterly unbelievable to me that a gathering of economists would disapprove wholeheartedly of "price gouging" and, lo, I was correct. (I feel like this one was pretty easy.)

World economy:

A few things seemed dubious and nothing seemed obviously wrong. There's a statement that economists usually use PPP to translate currencies and not official exchange rates, but then a table that (if I understand it right) uses both of those. (But this is kinda reconcilable: usually PPP is used for comparisons, and it's much better, but no harm in including nominal figures as well.) There's a claim that it's usual to consider only "human economic activity" and I don't know what that means -- what's the alternative? E.g., if it doesn't include making things with automatic machines then this is obviously false; if it's contrasting with aliens or something then it's so obviously true as not to be worth saying; probably it means something else but I don't know what, and looking it up feels a bit cheaty. This feels odd but my epistemic state is more "I don't understand something here" than "I know what this means and it's wrong". (Oh, I guess maybe it means "human economic activity as opposed to other human activities not classified as economic, like parents taking care of their own children", in which case it's not content-free and is probably correct.) There's a claim that before the Industrial Revolution "global output was dominated by China and India" and that India was the largest economy circa 1CE which seems iffy -- bigger than the Roman Empire? -- but, I dunno, those both had large populations and China at least was pretty advanced in some ways. There's a claim that the African Union has an economy bigger than $2T/year; I'm not sure just who's in the African Union but most of Africa is pretty poor and I'd have thought that if e.g. Nigeria and/or South Africa is at ~$1T then there ought to be a bigger list of $2T individual countries. The first of these feels like a nitpick; for the second, it feels like maybe I just don't understand a technical term; the third and fourth  feel like memorization more than understanding. The thing that seems most likely wrong is the claim about China and India, though I wouldn't be astonished to find it's completely true. ... The actual fabrication turns out to be none of those things. I did wonder about errors introduced in the table that was falsified, but didn't consider the thing that was actually done. It's a fair cop.

Cell (biology):

Surely 1-5nm is waaaay too small for even a small prokaryotic cell. The 100nm given as biggest typical size for a eukaryotic cell feels way too small too. Are these really microns rather than nanometres, perhaps? Of course some eukaryotic cells get much larger than 100um -- consider a hen's egg, for instance. But 100um feels like a plausible "seldom much bigger than this" figure. (Yup, this is spot on, and indeed these were originally microns.)

Fundamental theorems of welfare economics:

This is not a field I'm aware of knowing anything about, but the two theorems sound familiar (though I hadn't associated them with the specific title here). The only thing that seems obviously fishy is that the article says the theorems don't ensure existence of the equilibria, but also says that the second theorem guarantees that "any Pareto optimum can be supported as a competitive equilibrium for some initial set of endowments", which sure sounds like ensuring the existence of equilibria to me. There are a couple of other things that look strange but I think they're more likely just badly-written-article strange: "The production economy is more general and entails additional assumptions" (surely to be more general is to require fewer assumptions, but maybe what's meant is that in the broader setting you need extra assumptions that automatically hold in the narrower setting or something), and "The assumptions are all based on the standard graduate microeconomics textbook" which is just a really weird sentence. There are lots of technicalities wherein errors could be hiding, but I'm no economist and wouldn't notice. So, my guess is that it's the "don't ensure existence" thing. -- Nope, all those are apparently correct, and actually there are some further technical conditions that I didn't know about. This seems very much in the "memorization" rather than "understanding" category to me; e.g., for all I know "perfect information" is implied by "rationality" as those terms are used in economics. Maybe given infinite intelligence I could have figured out that there needed to be some sort of condition about externalities, but: bah.

List of causes of death by rate:

This says that neurological disorders are the biggest single cause of death. Ahead of both cardiovascular and respiratory problems? Really? Can't possibly be right. And, indeed, it isn't. On the one hand, this is "memorization"; on the other, surely this is pretty well known.

Natural selection:

Surely that description of "microevolution" and "macroevolution" can't be right; the distinction is about the scale of the changes, not the nature of the environmental pressures that cause them. ... Huh, apparently that isn't the issue; the actual error is one I should have spotted if I'd been reading more carefully and hadn't seen something else I was sure was wrong. I have to say I still think those sentences about microevolution and macroevolution are wrong. Aren't they? (I tend to avoid those terms, which I mostly associate with creationists who argue that one is real and the other not, so maybe I'm misunderstanding them, but e.g. the Wikipedia pages for them don't seem to match what's claimed in this article.)

Working time:

Surely the decreases in working hours shown in that table are too dramatic. Unless, I guess, they're the result of e.g. increasing numbers of part-time workers, rather than changes to typical full-time hours or something? (If so, then right next to the table there should be a load of caveats about this.) But this claims that, for instance, in France in 1950 the average worker worked nearly 3000 hours per year, versus about half that now. And can it really be true that Greece has longer-working workers than any other EU country? It's not that any of the individual figures is completely unbelievable, but in the aggregate something seems really off. I'm sure there has been a decrease in working hours, but it surely can't be this big. (I may have been encouraged toward this conclusion by noticing that the directory containing this example had what looked like data-processing code, suggesting that something quantitative has been changed in a nontrivial way.) Anyway, cheaty or not, I was spot on with this.

So, I got exactly half of them right (and alternated between right and wrong!). For two of the wrong ones I'm prepared to argue that the thing I thought I'd spotted is wrong even though it wasn't deliberately introduced as a test; for another, I'm maybe prepared to argue that the deliberate error introduced isn't exactly an error.

Comment by gjm on Brief notes on the Wikipedia game · 2024-07-15T23:05:44.581Z · LW · GW

There appear to be two edited versions of the Industrial Revolution article. Which one is recommended? (My guess: the _2 one because it's more recent.)

Comment by gjm on Brief notes on the Wikipedia game · 2024-07-15T16:34:47.974Z · LW · GW

Let's suppose it's true, as Olli seems to find, that most not-inconsequential things in Wikipedia are more "brute facts" than things one could reasonably deduce from other things. Does this tell us anything interesting about the world?

For instance: maybe it suggests that reasoning is less important than we might think, that in practice most things we care about we have to remember rather than working out. It certainly seems plausible that that's true, though "reasoning is less important than we might think" feels like a slightly tendentious way of putting it. (I suggest: Reasoning is very important on those occasions when you actually need to do it, but those occasions are rarer than those of us who are good at reasoning might like to think.)

Comment by gjm on Reliable Sources: The Story of David Gerard · 2024-07-15T13:29:46.417Z · LW · GW

Unfortunately, not being a NYT subscriber I think I can't see the specific video you mention (the only one with Biden allegedly being led anywhere that I can see before the nag message has his wife doing the alleged leading, and the point of it seems to have been not that someone was leading him somewhere but that he walked off instead of greeting veterans at a D-Day event, and there isn't anything in the text I can see that calls anything a cheap fake).

(Obviously my lack of an NYT subscription isn't your problem, but unless there's another source for what they're claiming I can't actually tell whether I agree with your take on it or not.)

Again, I wasn't claiming that the people spinning conspiracy theories about the NYT wanting Biden out of the way to lower billionaires' taxes are right. (I think they're almost certainly wrong.) But when I see, within a week of one another, one claim that you can tell the NYT is politically biased because of how they went out of their way to defend Biden from claims about his age/frailty/... and another claim that you can tell the NYT is politically biased because of how they went out of their way to propagate and dramatize claims about Biden's age/frailty/..., my instinctive reaction is to be skeptical about both those claims.

... Ah, I've found what I think is the text of the NYT article. What it actually says about "cheap fakes" is this:

Some of the videos of Mr. Biden circulating during this year's campaign are clearly manipulated to make him look old and confused. Others cut out vital context to portray him in a negative light, a process sometimes known as a "cheap fake" because it requires little expense or technological skill to create.

(Which is not at all what I assumed "cheap fake" meant on reading your comment, for what it's worth.) But that text doesn't say anything about Obama and doesn't use the word "fundraiser" (and doesn't include any of the videos) so I still can't tell what video it is you're saying isn't misleading and therefore have no opinion on whether or not it actually is.

I had a look at a YouTube video from "Sky News AU" which was about the D-Day thing and it looked to me like a mixture of places where Biden's behaviour was genuinely troubling and short clips of the sort that I'm pretty sure you could find for anyone of his age whether or not there was anything much wrong with them, if you did a bit of cherry-picking. The latter seems like the sort of thing the NYT article called "cheap fakes" and whatever Biden's current state it seems pretty clear to me that there was some of that going on in the (pretty mainstream, I take it) video I looked at.

Again, I don't know exactly what video you're talking about or what the NYT said about it, since in what looks like a copy of the article's text there's nothing about Obama leading him anywhere at any fundraiser. But from what I've looked at so far, I'm not seeing how the NYT article is misinformation.

(As for the actual question of Biden's current state of physical and/or cognitive health, which is somewhat relevant to this, I'm not sure what to think. The very worst things look pretty alarming; on the other hand, I took a sort of random sampling of short segments from That Debate and I thought that in them Trump was significantly more incoherent than Biden. And I know that Biden is a lifelong stutterer which will make any sort of slowness look more alarming, and which is a fact that never seems to be mentioned in any of the articles about how decrepit he allegedly is. It's not relevant to all the things that people are making noise about -- e.g., if he calls someone by the wrong name, that's probably nothing to do with his stutter. On the other hand, calling people by the wrong name happens all the time and e.g. Trump does it a lot too.)

Comment by gjm on Reliable Sources: The Story of David Gerard · 2024-07-14T00:48:16.724Z · LW · GW

What in that article is misinformation?

Elsewhere on the internet, people are complaining vociferously that the NYT's more recent articles about Biden's age and alleged cognitive issues show that the NYT is secretly doing the bidding of billionaires who think a different candidate might tax them less. I mention this not because I think those people are right but as an illustration of the way that "such-and-such a media outlet is biased!" is a claim that often says more about the position of the person making the complaint than about the media outlet in question.

Comment by gjm on On Claude 3.5 Sonnet · 2024-06-27T16:13:53.293Z · LW · GW

I gave it a few paragraphs from something I posted on Mastodon yesterday, and it identified me. I'm at least a couple of notches less internet-famous than Zvi or gwern, though again there's a fair bit of my writing on the internet and my style is fairly distinctive. I'm quite impressed.

(I then tried an obvious thing and fed it a couple of Bitcoin-white-paper paragraphs, but of course it knew that they were "Satoshi Nakamoto" and wasn't able to get past that. Someone sufficiently determined to identify Satoshi and with absurd resources could do worse than to train a big LLM on "everything except writings explicitly attributed to Satoshi Nakamoto" and then see what it thinks.)

Comment by gjm on Claude 3.5 Sonnet · 2024-06-23T21:57:25.531Z · LW · GW

If it's true that models are "starting to become barely capable of noticing that they are falling for this pattern" then I agree it's a good sign (assuming that we want the models to become capable of "general intelligence", of course, which we might not). I hadn't noticed any such change, but if you tell me you've seen it I'll believe you and accordingly reduce my level of belief that there's a really fundamental hole here.

Comment by gjm on Claude 3.5 Sonnet · 2024-06-23T01:00:56.532Z · LW · GW

I'm suggesting that the fact that things the model can't do produce this sort of whack-a-mole behaviour and that the shape of that behaviour hasn't really changed as the models have grown better at individual tasks may indicate something fundamental that's missing from all models in this class, and that might not go away until some new fundamental insight comes along: more "steps of scaling" might not do the trick.

Of course it might not matter, if the models become able to do more and more difficult things until they can do everything humans can do, in which case we might not be able to tell whether the whack-a-mole failure mode is still there. My highly unreliable intuition says that the whack-a-mole failure mode is related to the planning and "general reasoning" lacunae you mention, and that those might turn out also to be things that models of this kind don't get good at just by being scaled further.

But I'm aware that people saying "these models will never be able to do X" tend to find themselves a little embarrassed when two weeks later someone finds a way to get the models to do X. :-) And, for the avoidance of doubt, I am not saying anything even slightly like "mere computers will never be truly able to think"; only that it seems like there may be a hole in what the class of models that have so far proved most capable can be taught to do, and that we may need new ideas rather than just more "steps of scaling" to fill those holes.

Comment by gjm on Claude 3.5 Sonnet · 2024-06-22T22:55:26.935Z · LW · GW

That seems reasonable.

My impression (which isn't based on extensive knowledge, so I'm happy to be corrected) is that the models have got better at lots of individual tasks but the shape of their behaviour when faced with a task that's a bit too hard for them hasn't changed much: they offer an answer some part of which is nonsense; you query this bit; they say "I'm sorry, I was wrong" and offer a new answer some different part of which is nonsense; you query this bit; they say "I'm sorry, I was wrong" and offer a new answer some different part of which is nonsense; rinse and repeat.

So far, that pattern doesn't seem to have changed much as the models have got better. You need to ask harder questions to make it happen, because they've got better at the various tasks, but once the questions get hard enough that they don't really understand, back comes the "I'm sorry, I was wrong" cycle pretty much the same as it ever was.

Comment by gjm on Claude 3.5 Sonnet · 2024-06-22T01:25:29.323Z · LW · GW

It's pretty good. I tried it on a few mathematical questions.

First of all, a version of the standard AIW problem from the recent "Alice in Wonderland" paper. It got this right (not very surprisingly as other leading models also do, at least much of the time). Then a version of the "AIW+" problem which is much more confusing. Its answer was wrong, but its method (which it explained) was pretty much OK and I am not sure it was any wronger than I would be on average trying to answer that question in real time.

Then some more conceptual mathematical puzzles. I took them from recent videos on Michael Penn's YouTube channel. (His videos are commonly about undergraduate or easyish-olympiad-style pure mathematics. They seem unlikely to be in Claude's training data, though of course other things containing the same problems might be.)

One pretty straightforward one: how many distinct factorials can you find that all end in the same number of zeros? It wrote down the correct formula for the number of zeros, then started enumerating particular numbers and got some things wrong, tried to do pattern-spotting, and gave a hilariously wrong answer; when gently nudged, it corrected itself kinda-adequately and gave an almost-correct answer (which it corrected properly when nudged again) but I didn't get much feeling of real understanding.
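For reference, a quick sketch of the trailing-zero count involved (Legendre's formula: count the factors of 5 in n!), which makes the run structure visible:

```python
def trailing_zeros(n: int) -> int:
    """Number of trailing zeros of n!, i.e. the number of factors of 5 in n!."""
    count, p = 0, 5
    while p <= n:
        count += n // p
        p *= 5
    return count

# The count stays constant for five consecutive values of n and then jumps,
# which is the pattern the puzzle is really about.
print([trailing_zeros(n) for n in range(1, 31)])
```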

Another (an exercise from Knuth's TAOCP; he rates its difficulty HM22, meaning it needs higher mathematics and should take you 25 minutes or so; it's about the relationship between two functions whose Taylor series coefficients differ by a factor H(n), the n'th harmonic number) it solved straight off and quite neatly.

Another (find all functions with (f(x)-f(y))/(x-y) = f'((x+y)/2) for all distinct x,y) it initially "solved" with a solution with a completely invalid step. When I said I couldn't follow that step, it gave a fairly neat solution that works if you assume f is real-analytic (has a Taylor series expansion everywhere). This is also the first thing that occurred to me when I thought about the problem. When asked for a solution that doesn't make that assumption, it unfortunately gave another invalid solution, and when prodded about that it gave another invalid one. Further prompting, even giving it a pretty big hint in the direction of a nice neat solution (better than Penn's :-)), didn't manage to produce a genuinely correct solution.
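For what it's worth, here's a symbolic check of the easy direction -- that every quadratic satisfies the equation; the interesting part of the problem is showing that nothing else does:

```python
import sympy as sp

# Check that f(t) = a*t**2 + b*t + c satisfies (f(x)-f(y))/(x-y) = f'((x+y)/2).
x, y, t, a, b, c = sp.symbols("x y t a b c")
f = a * t**2 + b * t + c
lhs = (f.subs(t, x) - f.subs(t, y)) / (x - y)
rhs = sp.diff(f, t).subs(t, (x + y) / 2)
print(sp.simplify(lhs - rhs))  # prints 0
```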

I rate it "not terribly good undergraduate at a good university", I think, but -- as with all these models to date -- with tragically little "self-awareness", in the sense that it'll give a wrong answer, and you'll poke it, and it'll apologize effusively and give another wrong answer, and you can repeat this several times without making it change its approach or say "sorry, it seems I'm just not smart enough to solve this one" or anything.

On the one hand, the fact that we have AI systems that can do mathematics about as well as a not-very-good undergraduate (and quite a bit faster) is fantastically impressive. On the other hand, it really does feel as if something fairly fundamental is missing. If I were teaching an actual undergraduate whose answers were like Claude's, I'd worry that there was something wrong with their brain that somehow had left them kinda able to do mathematics. I wouldn't bet heavily that just continuing down the current path won't get us to "genuinely smart people really thinking hard with actual world models" levels of intelligence in the nearish future, but I think that's still the way I'd bet.

(Of course a system that's at the "not very good undergraduate" level in everything, which I'm guessing is roughly what this is, is substantially superhuman in some important respects. And I don't intend to imply that it doesn't matter whether Anthropic are lax about what they release just because the latest thing happens not to be smart enough to be particularly dangerous.)

Comment by gjm on Notes on Gracefulness · 2024-05-29T01:46:16.025Z · LW · GW

Even though it would have broken the consistent pattern of the titling of these pieces, I find myself slightly regretting that this one isn't called "Grace Notes".

Comment by gjm on How to be an amateur polyglot · 2024-05-09T01:34:44.776Z · LW · GW

A nitpick: you say

fun story, I passed the C2 exam and then I realized I didn’t remember the word faucet when I went to the UK to visit a friend

but here in the UK I don't think I have ever once heard a native speaker use the word "faucet" in preference to "tap". I guess the story is actually funnier if immediately after passing your C2 exam you (1) thought "faucet" was the usual UK term and (2) couldn't remember it anyway...

(I liked the post a lot and although I am no polyglot all the advice seems sound to me.)

Comment by gjm on Losing Faith In Contrarianism · 2024-04-26T09:26:41.875Z · LW · GW

Please don't write comments all in boldface. It feels like you're trying to get people to pay more attention to your comment than to others, and it actually makes your comment a little harder to read as well as making the whole thread uglier.

Comment by gjm on social lemon markets · 2024-04-25T10:42:34.351Z · LW · GW

It looks to me as if, of the four "root causes of social relationships becoming more of a lemon market" listed in the OP, only one is actually anything to do with lemon-market-ness as such.

The dynamic in a lemon market is that you have some initial fraction of lemons but it hardly matters what that is because the fraction of lemons quickly increases until there's nothing else, because buyers can't tell what they're getting. It's that last feature that makes the lemon market, not the initial fraction of lemons. And I think three of the four proposed "root causes" are about the initial fraction of lemons, not the difficulty of telling lemons from peaches.

  • urbanization: this one does seem to fit: it means that the people you're interacting with are much less likely to be ones you already know about, so you can't tell lemons from peaches.
  • drugs: this one is all about there being more lemons, because some people are addicts who just want to steal your stuff.
  • MLM schemes: again, this is "more lemons" rather than "less-discernible lemons".
  • screens: this is about raising the threshold below which any given potential interaction/relationship becomes a lemon (i.e., worse than the available alternative), so again it's "more lemons" not "less-discernible lemons".

Note that I'm not saying that "drugs", "MLM", and "screens" aren't causes of increased social isolation, only that if they are the way they're doing it isn't quite by making social interactions more of a lemon market. (I think "screens" plausibly is a cause of increased social isolation. I'm not sure I buy that "drugs" and "MLM" are large enough effects to make much difference, but I could be convinced.)

I like the "possible solutions" part of the article better than the section that tries to fit everything into the "lemon market" category, because it engages in more detail with the actual processes involved by actually considering possible scenarios in which acquaintances or friendships begin. When I think about such scenarios in the less-isolated past and compare with the more-isolated present, it doesn't feel to me like "drugs" and "MLM" are much of the difference, which is why I don't find those very plausible explanations.

Comment by gjm on A High Decoupling Failure · 2024-04-14T23:33:34.212Z · LW · GW

I think this is oversimplified:

High decouplers will notice that, holding preferences constant, offering people an additional choice cannot make them worse off. People will only take the choice if its better than any of their current options.

This is obviously true if somehow giving a person an additional choice is literally the only change being made, but you don't have to be a low-decoupler to notice that that's very very often not true. For a specific and very common example: often other people have some idea what choices you have (and, in particular, if we're talking about whether it should be legal to do something or not, it is generally fairly widely known what's legal).

Pretty much everyone's standard example of how having an extra choice that others know about can hurt you: threats and blackmail and the like. I might prefer not to have the ability to pay $1M to avoid being shot dead, or to prove I voted for a particular candidate to avoid losing my job.

This is pretty much parallel to a common argument for laws against euthanasia, assisted suicide, etc.: the easier it is for someone with terrible medical conditions to arrange to die, the more opportunities there are for others to put pressure on them to do so, or (this isn't quite parallel, but it seems clearly related) to make it appear that they've done so when actually they were just murdered.

Comment by gjm on Ackshually, many worlds is wrong · 2024-04-14T23:22:04.910Z · LW · GW

Then it seems unfortunate that you illustrated it with a single example, in which A was a single (uniformly distributed)  number between 0 and 1.

Comment by gjm on Ackshually, many worlds is wrong · 2024-04-12T01:26:19.886Z · LW · GW

I think this claim is both key to OP's argument and importantly wrong:

But a wavefunction is just a way to embed any quantum system into a deterministic system

(the idea being that a wavefunction is just like a probability distribution, and treating the wavefunction as real is like treating the probability distribution of some perhaps-truly-stochastic thing as real).

The wavefunction in quantum mechanics is not like the probability distribution of (say) where a dart lands when you throw it at a dartboard. (In some but not all imaginable Truly Stochastic worlds, perhaps it's like the probability distribution of the whole state of the universe, but OP's intuition-pumping example seems to be imagining a case where A is some small bit of the universe.)

The reason why it's not like that is that the laws describing the evolution of the system explicitly refer to what's in the wavefunction. We don't have any way to understand and describe what a quantum universe does other than in terms of the evolution of the wavefunction or something basically equivalent thereto.

Which, to my mind, makes it pretty weird to say that postulating that the wavefunction is what's real is "going further away from quantum mechanics". Maybe one day we'll discover some better way to think about quantum mechanics that makes that so, but for now I don't think we have a better notion of what being Truly Quantum means than to say "it's that thing that wavefunctions do".

I have the impression -- which may well be very unfair -- that at some early stage OP imbibed the idea that what "quantum" fundamentally means is something very like "random", so that a system that's deterministic is ipso facto less "quantum" than a system that's stochastic. But that seems wrong to me. We don't presently have any way to distinguish random from deterministic versions of quantum physics; randomness or something very like it shows up in our experience of quantum phenomena, but the fact that a many-worlds interpretation is workable at all means that that doesn't tell us much about whether randomness is essential to quantum-ness.

So I don't buy the claim that treating the wavefunction as real is a sort of deterministicating hack that moves us further away from a Truly Quantum understanding of the universe.

(And, incidentally, if we had a model of Truly Stochastic physics in which the evolution of the system is driven by what's inside those probability distributions -- why, then, I would rather like the idea of claiming that the probability distributions are what's real, rather than just their outcomes.)

Comment by gjm on Thinking harder doesn’t work · 2024-04-11T01:20:12.429Z · LW · GW

I don't know exactly what the LW norms are around plagiarism and plagiarism-ish things, but I think that introducing that basically-copied material with

I learned this by observing how beginners and more experienced people approach improv comedy.

is outright dishonest. OP is claiming to have observed this phenomenon and gleaned insight from it, when in fact he read about it in someone else's book and copied it into his post.

I have strong-downvoted the post for this reason alone (though, full disclosure, I also find the one-sentence-per-paragraph style really annoying and that may have influenced my decision[1]) and will not find it easy to trust anything else I see from this author.

[1] It feels to me as if the dishonest appropriation of someone else's insight and the annoying style may not be completely unrelated. One reason why I find this style annoying is that it gives me the strong impression of someone who is optimizing for sounding good. This sort of style -- punchy sentences, not too much complexity in how they relate to one another, the impression of degree of emphasis on every sentence -- feels like a public speaking style to me, and when I see someone writing this way I can't shake the feeling that someone is trying to manipulate me, to oversimplify things to make them more likely to lodge in the brain, etc. And stealing other people's ideas and pretending they're your own is also a thing people do when they are optimizing for sounding good. (Obviously everything in this footnote is super-handwavy and unfair.)

In case anyone is in doubt about abstractapplic's accusation, I've checked. The relevant passage is near the end of section 3 of the chapter entitled "Spontaneity"; in my copy it's on page 88. I'm not sure "almost verbatim" is quite right, but the overall claim being made is the same, "fried mermaid" and "fish" are both there, and "will desperately try to think up something original" is taken verbatim from Johnstone.

Comment by gjm on D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset] · 2024-04-10T22:32:52.538Z · LW · GW

One can't put a price on glory.

Comment by gjm on D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset] · 2024-04-09T22:36:49.361Z · LW · GW

Wow, that was incredibly close.

I think simon and aphyer deserve extra credit for noticing the "implicit age variable" thing.

Comment by gjm on Math-to-English Cheat Sheet · 2024-04-08T20:51:44.490Z · LW · GW

There are a few things in the list that I would say differently, which I mention not because the versions in the post are _wrong_ but because if you're using a crib-sheet like this then you might get confused when other people say it differently:

  • I say "grad f", "div f", "curl f" for ∇f, ∇·f, ∇×f. I more often say "del" than "nabla", and for the Laplacian ∇²f I would likely say either "del squared f" or "Laplacian of f".
  • I pronounce "cos" as "coss" not as "coz".
  • For derivatives I will say "dash" at least as often as "prime".

The selection of things in the list feels kinda strange (if it was mostly produced by GPT-4 then that may be why) -- if the goal is to teach you how to say various things then some of the entries aren't really pulling their weight (e.g., the one about the z-score, or the example of how to read out loud an explicit matrix transpose, when we've already been told how to say "transpose" and how to read out the numbers in a matrix). It feels as if whoever-or-whatever generated the list sometimes forgot whether they were making a list of bits of mathematical notation that you might not know how to say out loud or a list of things in early undergraduate mathematics that you might not know about.

It always makes me just a little bit sad when I see Heron's formula for the area of a triangle. Not because there's anything wrong with it or because it isn't a beautiful formula -- but because it's a special case of something even nicer. If you have a cyclic quadrilateral with sides a, b, c, d then (writing s = (a+b+c+d)/2) its area is sqrt((s-a)(s-b)(s-c)(s-d)). Heron's formula is just the special case where two vertices coincide so d = 0. The more general formula (due to Brahmagupta) is also more symmetrical and at least as easy to remember.
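A quick numerical sanity check of how Heron's formula drops out as the d = 0 case:

```python
from math import sqrt

def brahmagupta(a, b, c, d):
    """Area of a cyclic quadrilateral with sides a, b, c, d."""
    s = (a + b + c + d) / 2
    return sqrt((s - a) * (s - b) * (s - c) * (s - d))

print(brahmagupta(3, 4, 5, 0))  # 6.0 -- Heron's formula for the 3-4-5 right triangle
print(brahmagupta(1, 1, 1, 1))  # 1.0 -- the unit square
```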

Comment by gjm on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-03-30T18:01:02.134Z · LW · GW

With rather little confidence, I estimate for turtles A-J respectively:

22.93, 18.91, 25.47, 21.54, 17.79, 7.24, 30.36, 20.40, 24.25, 20.69 lb

Justification, such as it is:

The first thing I notice on eyeballing some histograms is that we seem to have three different distributions here: one normal-ish with weights < 10lb, one maybe lognormal-ish with weights > 20lb, and a sharp spike at exactly 20.4lb. Looking at some turtles with weight 20.4lb, it becomes apparent that 6-shell-segment turtles are special; they all have no wrinkles, green colour, no fangs, normal nostrils, no misc abnormalities, and a weight of 20.4lb. So that takes care of Harold. Then the small/large distinction seems to go along with (gray, fangs) versus (not-gray, no fangs). Among the fanged gray turtles, I didn't find any obvious sign of relationships between weight and anything other than number of shell segments, but there, there's a clear linear relationship. Variability of weight doesn't seem interestingly dependent on anything. Residuals of the model a + b*segs look plausibly normal. So that takes care of Flint. The other pets are all green or grayish-green so I'll ignore the greenish-gray ones. These look like different populations again, though not so drastically different. Within each population it looks as if there's a plausibly-linear dependence of weight on the various quantitative features; nostrils seem irrelevant; no obvious sign of interactions or nonlinearities. The coefficients of wrinkles and segments are very close to a 1:2 ratio and I was tempted to force that in the name of model simplicity, but I decided not to. The coefficient of misc abs is very close to 1 and I was tempted to force that too but again decided not to. Given the estimated mean, the residuals now look pretty normally distributed -- the skewness seems to be an artefact of the distribution of parameters -- with stddev plausibly looking like a + b*mean. The same goes for the grayish-green turtles, but with different coefficients everywhere (except that the misc abs coeff looks like 1 lb/abnormality again). Finally, if we have a normally distributed estimate of a turtle's weight then the expected monetary loss is minimized if we estimate mu + 1.221*sigma.
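In case it's useful, here's the sort of calculation that produces that last constant. The loss function below is a made-up stand-in (an 8:1 asymmetric penalty, not the Tyrant's actual penalty schedule, which is what really determines the 1.221); it's there only to illustrate the method:

```python
import numpy as np

# Expected loss of guessing mu + k*sigma when the true weight is N(mu, sigma^2),
# worked in units of sigma. The loss function is a HYPOTHETICAL asymmetric
# penalty: underestimates cost eight times as much as overestimates.
z = np.linspace(-8, 8, 40001)
dz = z[1] - z[0]
pdf = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

def loss(error):  # error = estimate - truth, in sigmas
    return np.where(error < 0, -8.0 * error, error)

ks = np.linspace(0, 3, 3001)
expected = [float(np.sum(loss(k - z) * pdf) * dz) for k in ks]
print(ks[int(np.argmin(expected))])  # about 1.22 for this made-up penalty
```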

I assume

that there's a more principled generation process, which on past form will probably involve rolling variable numbers of dice with variable numbers of sides, but I didn't try to identify it.

I will be moderately unsurprised if

it turns out that there are subtle interactions that I completely missed that would enable us to predict some of the turtles' weights with much better accuracy. I haven't looked very hard for such things. In particular, although I found no sign that nostril size relates to anything else it wouldn't be very surprising if it turns out that it does. Though it might not! Not everything you can measure actually turns out to be relevant! Oh, and I also saw some hints of interactions among the green turtles between scar-count and the numbers of wrinkles and shell segments, though my brief attempts to follow that up didn't go anywhere useful.

Tools used: Python, Pandas, statsmodels, matplotlib+seaborn. I haven't so far seen evidence that this would benefit much from

 fancier models like random forests etc.

Comment by gjm on Middle Child Phenomenon · 2024-03-16T18:19:57.911Z · LW · GW

Yes, I know what the middle-child phenomenon is in the more literal context. I just don't have any idea why you're using the term here. I don't see any similarities between the oldest / middle / youngest child relationships in a family and whatever relationships there might be between programmers / lawyers / alignment researchers.

(I think maybe all you actually mean is "these people are more important than we're treating them as". Might be true, but that isn't a phenomenon, it's just a one-off judgement that a particular group of people are being neglected.)

I still don't understand why the distribution of talent/success/whatever among law students is relevant. If your point is that very few of them are going to be in a position to make a difference to AI policy then surely that actually argues against your main claim that law students should be getting more attention from people who care about AI.

Comment by gjm on Middle Child Phenomenon · 2024-03-16T11:40:16.935Z · LW · GW

Having read this post, I am still not sure what "the Middle Child Phenomenon" actually is, nor why it's called that.

The name suggests something rather general. But most of the post seems like maybe the definition is something like "the fact that there isn't a vigorous effort to get law students informed about artificial intelligence".

Except that there's also all the stuff about the distribution of talent and interests among law students, and another thing I don't understand is what that actually has to do with it. If (as I'm maybe 75% confident) the main point of the post is that it would be valuable to have law students learn something about AI because public policy tends to be strongly influenced by lawyers, then it seems like this point would be equally strong regardless of how your cohort of 1000 lawyers is distributed between dropouts, nobodies, all-rounders, CV-chasers, and "golden children". (I am deeply unconvinced by this classification, by the way, but I am not a lawyer myself and maybe it's more accurate than it sounds.)

Comment by gjm on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-15T03:52:46.324Z · LW · GW

It looks as if you're taking a constructive Dedekind cut to involve a "set of real numbers" in the sense of a function for distinguishing left-things from right-things.

Is that actually how constructivists would want to define them? E.g., Bishop's "Foundations of Constructive Analysis", if I am understanding its definitions of "set" and "subset" correctly (which I might not be), says in effect that a set of rational numbers is a recipe for constructing elements of that set, along with a way of telling whether two things constructed in this way are equal. I'm pretty sure you can have one of those but not be able to determine explicitly whether a given rational number is in the set, in which case your central argument doesn't go through.

Are Cauchy sequences and Dedekind cuts equivalent if one thinks of them as Bishop does? There's an exercise in his book that claims they are. I haven't thought about this much and am very much not an expert on this stuff, and for all I know Bishop may have made a boneheaded mistake at this point. I'm also troubled by the apparent vagueness of Bishop's account of sets and subsets and whatnot.

More concretely, that exercise in Bishop's book says: a Dedekind cut is a pair of nonempty sets of rationals S,T such that we always have s<t and given rationals x<y either x is in S or y is in T. Unless I'm confused about Bishop's account of sets, all of this is consistent with e.g. S containing the negative rationals and T the positive rationals, and not being able to say that 0 is in either of them. And unless I'm confused about your "arbitration oracles", you can't build an arbitration oracle out of that setup.

(But, again: not an expert on any of this, could be horribly wrong.)

Comment by gjm on [deleted post] 2024-03-02T00:27:15.699Z

I did, in fact, read the post and the NYT articles, and I am not convinced that your description of what they do and what it means is correct. So, if my response to your article doesn't consist mostly of the gushing praise your first paragraph indicates you'd prefer, that's one reason why.

But, regardless of that: If you write something wrong, and someone points out that it's wrong, I don't think it's reasonable to respond with "how dare you point that out rather than looking only at the other parts of what I wrote?".

Scott is not using some weird eccentric definition of "lie". E.g., the main definition in the OED is: "An act or instance of lying; a false statement made with intent to deceive; a criminal falsehood." (Does that first clause soften it? Not really; it's uninformative, because they define the verb "lie" in terms of the noun "lie".) First definition in Wiktionary is " To give false information intentionally with intent to deceive". But, in any case, even with a very broad definition of "lie" the first four levels in his taxonomy are simply, uncontroversially, obviously not kinds of lying. Again, the first one is "reasoning well, and getting things right".

If I say "There are seven classes of solid objects in the solar system: dust motes, pebbles, boulders, mountains, moons, small planets, and large planets" and you identify something as a small planet, you should not call it "a Level 6 Planet, according to gjm's classification of planets".

And, while I understand a preference for being charitable and not leaping to calling things dishonest that aren't necessarily so ... I don't think you get to demand such treatment in the comments on an article that does the exact reverse to someone else.

Comment by gjm on [deleted post] 2024-03-01T01:57:29.215Z

Your justification seems to me almost completely non-responsive to the point I was actually making, which is not about whether it's reasonable to call what the NYT did in these cases "lying" but about whether it's reasonable to call something at level 6 in Scott's taxonomy a "level 6 lie".

Scott classifies utterances into seven types in ascending order of dishonesty. The first four are uncontroversially not kinds of lying. Therefore, something on the sixth level of Scott's taxonomy cannot reasonably be called a "level 6 lie", because that phrase will lead any reader who hasn't checked carefully to think that Scott has a taxonomy of levels of lying, where a "level 6 lie" is something worse than a level 5 lie, which is worse than a level 4 lie, ... than a level 1 lie, with all these things actually being kinds of lies.

Whereas in fact, even if we ignore Scott's own opinion that only "the most egregious cases of 6" (and also all of 7) deserve to be called lies at all, at the absolute worst a level-6 utterance is more-dishonest-than only one lower level of lie.

Further, you called these things "Scott Alexander's criteria for media lies", which is plainly not an accurate description because, again, more than half the levels in his taxonomy are completely uncontroversially not lying at all (and Scott's own opinion is that only the top level and "the most egregious cases of" the one below should be called lying).

So even if you were 100% sincere and reasonable in regarding what the NYT did as ("routinely and brazenly") lying, I do not see any way to understand your alleged application of Scott's taxonomy as a sincere and reasonable use of it. I do not find it plausible that you are really unable to understand that most of its levels are plainly not types of lie. I do not find it plausible that you really thought that something that begins with "reasoning well, and getting things right" followed by "reasoning well, but getting things wrong because the world is complicated and you got unlucky" can rightly be described as "criteria for media lies".

I could, of course, be wrong. Maybe you really are stupid enough not to understand that "according to X's criteria for media lies, Y is a level 6 lie" implies that what X presented is a classification of lies into levels, in which Y comes at level 6. Or maybe the stupidity is mine and actually most people wouldn't interpret it that way. (I would bet heavily against that but, again, I could be wrong.) Maybe you didn't actually read Scott's list, somehow. But you don't generally seem stupid or unable to understand the meanings and implications of words, so I still find it much much more plausible that you knew perfectly well that Scott was presenting a taxonomy of mostly-not-lies, and chose to phrase things as you did because it made what you were accusing the NYT of sound worse. Which is, I repeat, on at least level 6 of Scott's taxonomy.

And, again, none of this is about whether the NYT really did what you say, nor about whether it's reasonable to describe what you said the NYT did was lying. It's entirely about your abuse of Scott's taxonomy, which (1) is not a list of "criteria for media lies" and (2) is not something that justifies calling an utterance at its Nth level a "level N lie".

Comment by gjm on Intuition for 1 + 2 + 3 + … = -1/12 · 2024-02-19T02:50:23.939Z · LW · GW

It is not true that "no pattern that suggests a value suggests any other", at least not unless you say more precisely what you are willing to count as a pattern.

Here's a template describing the pattern you've used to argue that 1+2+...=-1/12:

We define numbers $a_{nk}$ (for $n, k = 1, 2, 3, \ldots$) with the following two properties. First, $\lim_{n\to\infty} a_{nk} = k$, so that for each $n$ we can think of $(a_{n1}, a_{n2}, a_{n3}, \ldots)$ as a sequence that's looking more and more like (1,2,3,...) as $n$ increases. Second, $\lim_{n\to\infty} S_n = -1/12$ where $S_n = \sum_k a_{nk}$, so the sums of these sequences that look more and more like (1,2,3,...) approach -1/12.

(Maybe you mean something more specific by "pattern". You haven't actually said what you mean.)

Well, here are some $a_{nk}$ to consider. When $k < n$ we'll let $a_{nk} = k$. When $k = n$ we'll let $a_{nk} = C - \frac{n(n-1)}{2}$. And when $k > n$ we'll let $a_{nk} = 0$. Here, $C$ is some fixed number; we can choose it to be anything we like.

This array of numbers satisfies our first property: $\lim_{n\to\infty} a_{nk} = k$. Indeed, once $n > k$ we have $a_{nk} = k$, and the limit of an eventually-constant sequence is the thing it's eventually constant at.

What about the second property? Well, as you'll readily see I've arranged that for each $n$ we have $\sum_k a_{nk} = C$. So the sequence of sums converges to $C$.

In other words, this is a "pattern" that makes the sum equal to $C$. For any value of $C$ we choose.
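
Here's a minimal executable sketch of that array (code and names are mine; putting the single correcting entry at position $k = n$ is just one convenient way to make every row sum to $C$):

```python
def row(n, C, length):
    """First `length` entries of the n-th row (a_{n,1}, a_{n,2}, ...)."""
    entries = []
    for k in range(1, length + 1):
        if k < n:
            entries.append(k)                      # agrees with 1, 2, 3, ...
        elif k == n:
            entries.append(C - n * (n - 1) // 2)   # cancels 1 + ... + (n-1) and adds C
        else:
            entries.append(0)                      # zero from position n+1 onwards
    return entries

C = 42  # any value works
for n in (3, 5, 8):
    r = row(n, C, 10)
    print(n, r, sum(r))  # each row sums to C, yet rows agree with (1,2,3,...) ever further out
```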

I believe there are more stringent notions of "pattern" -- stronger requirements on how the $a_{nk}$ approach $k$ for large $n$ -- for which it is true that every "pattern" that yields a finite sum yields $-1/12$. But does this actually end up lower-tech than analytic continuation and the like? I'm not sure it does.

(One version of the relevant theory is described at https://terrytao.wordpress.com/2010/04/10/the-euler-maclaurin-formula-bernoulli-numbers-the-zeta-function-and-real-variable-analytic-continuation.)
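
If it helps, here is a quick numerical sketch of the smoothed-sums version of this (my own illustration, not from the linked post; the Gaussian cutoff is chosen purely for convenience):

```python
import math

# A rough numerical sketch of "smoothed sums", with cutoff eta(x) = exp(-x^2).
# Euler-Maclaurin suggests  sum_{n>=1} n*eta(n/N)  =  C*N^2 - 1/12 + o(1),
# where C = integral_0^infty x*eta(x) dx = 1/2 for this eta.  So subtracting
# the divergent N^2/2 piece should leave something close to -1/12 = -0.0833...

def smoothed_sum(N):
    total = 0.0
    n = 1
    while n <= 20 * N:  # terms beyond this are utterly negligible
        total += n * math.exp(-((n / N) ** 2))
        n += 1
    return total

for N in (10, 100, 1000):
    print(N, smoothed_sum(N) - N ** 2 / 2)
```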

Comment by gjm on [deleted post] 2024-02-18T03:06:15.264Z

Once again you are making a ton of confident statements and offering no actual evidence. "is a high priority", "they want", "they don't want", "what they're aiming for is", etc. So far as I can see you don't in fact know any of this, and I don't think you should state things as fact that you don't have solid evidence for.

Comment by gjm on [deleted post] 2024-02-18T00:55:45.995Z

Let us suppose that social media apps and sites are, as you imply, in the business of trying to build sophisticated models of their users' mental structures. (I am not convinced they are -- I think what they're after is much simpler -- but I could be wrong, they might be doing that in the future even if not now, and I'm happy to stipulate it for the moment.)

If so, I suggest that they're not doing that just in order to predict what the users will do while they're in the app / on the site. They want to be able to tell advertisers "_this_ user is likely to end up buying your product", or (in a more paranoid version of things) to be able to tell intelligence agencies "_this_ user is likely to engage in terrorism in the next six months".

So inducing "mediocrity" is of limited value if they can only make their users more mediocre while they are in the app / on the site. In fact, it may be actively counterproductive. If you want to observe someone while they're on TikTok and use those observations to predict what they will do when they're not on TikTok, then putting them into an atypical-for-them mental state that makes them less different from other people while on TikTok seems like the exact opposite of what you want to do.

I don't know of any good reason to think it at all likely that social media apps/sites have the ability to render people substantially more "mediocre" permanently, so as to make their actions when not in the app / on the site more predictable.

If the above is correct, then perhaps we should expect social media apps and sites to be actively trying not to induce mediocrity in their users.

Of course it might not be correct. I don't actually know what changes in users' mental states are most helpful to social media providers' attempts to model said users, in terms of maximizing profit or whatever other things they actually care about. Are you claiming that you do? Because this seems like a difficult and subtle question involving highly nontrivial questions of psychology, of what can actually be done by social media apps and sites, of the details of their goals, etc., and I see no reason for either of us to be confident that you know those things. And yet you are happy to declare with what seems like utter confidence that of course social media apps and sites will be trying to induce mediocrity in order to make users more predictable. How do you know?

Comment by gjm on [deleted post] 2024-02-17T23:16:15.379Z

"Regression to the mean" is clearly an important notion in this post, what with being in the title and all, but you never actually say what you mean by it. Clearly not the statistical phenomenon of that name, as such.

(My commenting only on this should not be taken to imply that I find the rest of the post reasonable; I think it's grossly over-alarmist and like many of Trevor's posts treats wild speculation about the capabilities and intentions of intelligence agencies etc. as if it were established fact. But I don't think it likely that arguing about that will be productive.)

Comment by gjm on Opinions survey (with rationalism score at the end) · 2024-02-17T02:54:49.512Z · LW · GW

What's going on is that tailcalled's factor model doesn't in fact do a good job of identifying rationalists by their sociopolitical opinions. Or something like that.

[EDITED to add:] Here's one particular variety of "something like that" that I think may be going on: an opinion may be highly characteristic of a group even if it is very uncommon within the group. For instance, suppose you're classifying folks in the US on a left/right axis. If someone agrees with "We should abolish the police and close all the prisons" then you know with great confidence which team they're on, but I'm pretty sure the great majority of leftish people in the US disagree with it. If someone agrees with "We should bring back slavery because black people aren't fit to run their own lives" then you know with great confidence which team they're on, but I'm pretty sure the great majority of rightish people in the US disagree with it.
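
As a toy calculation of that first point (all numbers made up purely for the example):

```python
# Made-up numbers: suppose only 3% of leftish people endorse some extreme
# statement, versus 0.05% of rightish people, with a 50/50 prior.
p_left = 0.5
p_endorse_given_left = 0.03
p_endorse_given_right = 0.0005

p_endorse = (p_left * p_endorse_given_left
             + (1 - p_left) * p_endorse_given_right)
p_left_given_endorse = p_left * p_endorse_given_left / p_endorse
print(p_left_given_endorse)  # ~0.98: near-conclusive evidence, even though 97% of leftish people disagree
```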

Tailcalled's model isn't exactly doing this sort of thing to rationalists -- if someone says "stories about ghosts are zero evidence of ghosts" then they have just proved they aren't a rationalist, not done something extreme but highly characteristic of (LW-style) rationalists -- but it's arguably doing something of the sort to a broader fuzzier class of people that are maybe as near as the model can get to "rationalists". Roughly the people some would characterize as "Silicon Valley techbros".