video games > IQ tests

post by bhauth · 2023-08-05T13:27:54.697Z · LW · GW · 45 comments

IQ tests are a type of test given to people. What makes them different from other tests given to people?

Is it "consistency of results on a single IQ test type"? Not really. Wikipedia says:

For modern tests, the confidence interval can be approximately 10 points and reported standard error of measurement can be as low as about three points. Reported standard error may be an underestimate, as it does not account for all sources of error.

That's a best-case scenario for tests designed with that criterion as a priority, and the range is still significant.
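
For context, the arithmetic behind those numbers is simple; here is a minimal sketch, where the reliability figure is an illustrative assumption rather than a quoted value:

```python
import math

SD = 15             # IQ scores are scaled to a standard deviation of 15
reliability = 0.96  # illustrative test-retest reliability; an assumption, not a quoted figure

# Standard error of measurement (classical test theory)
sem = SD * math.sqrt(1 - reliability)  # = 3.0 points

# 95% confidence interval around an observed score
ci = 1.96 * sem                        # ~ +/- 5.9 points, i.e. a ~12-point-wide interval
print(f"SEM = {sem:.1f}, 95% CI = +/- {ci:.1f}")
```

So a reported SEM of "about three points" already implies roughly the 10-point interval Wikipedia mentions.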

Is it "consistency of results across different IQ test types"? Not really; that's obviously worse than the above, and many "non-IQ" tests have comparable consistency.

Is it "practice being mostly irrelevant"? Not really. A few practice runs can often be worth +8 points, and that's for kids that already to some extent do IQ-test-like stuff in school. This just mostly doesn't come up, because institutions don't generally use IQ test results.

Is it "working for an unusually wide range of intelligence"? Not really. IQ tests are notorious for working poorly above ~135, and I'd say they only really work well for -20 to +0 relative to the designers, with a somewhat wider range for teams.

Is it "unusually high correlation with general intelligence", as represented by a "g factor"? I don't think so. IQ tests, in general, don't seem to be any better for that than the SAT. Anyway, given modern understanding of AI and biology, I consider the entire "g factor" framework an archaic and crude way of understanding intelligence. Humans have multiple mental systems, those systems have performance on various tasks which vary depending on the amount of specialization, amount of training, quality of training data, and multiple types of management by other (meta-level) systems. Then there's correlation between performance of various systems on various tasks for obvious reasons.


If you're trying to evaluate intelligence in a broad sense, you should use tests with problems that are big enough to use most of a system and broad enough to use many systems. For example, chess is suboptimal because it's both too small and narrow to effectively apply the full power of the sort of general-purpose systems that are most important to evaluate when testing for "general intelligence". The same is true of, say, questions on IIT entrance exams. Leetcode, Math Olympiad, and Physics Olympiad problems have been proposed as better alternatives, but there's a lot of memorization of a specialized "toolbox" with those.

My view is that some combinations of video games are better "IQ tests" than actual IQ tests are, and better general standardized tests than the SAT. The term "video game" is very general (video games are a generalization of animation, which is a generalization of film, which is a generalization of photos, which are a generalization of text), so let me clarify: I'm talking about existing video games which were developed as entertainment.

I'm caught by a catch-22 here: if I don't talk about standardized test scores I got, people will think I'm bitter about doing poorly, but if I do, then I'm bragging about scores on standardized tests which is super lame. Plus, no matter how well you did on standardized tests, there's usually somebody around who did better and wants to brag about it. Well.

My ACT score was in the top 0.1%, but I don't feel particularly proud of that, because it wasn't evaluating any of my actual strengths. I left college after a semester (while that was a failure from the perspective of society, school was holding me back intellectually) but I still took the GRE for...reasons...and got a top 1% score without studying, but that's not something I consider particularly meaningful either. Here's a theory of Alzheimer's I developed - what test score does that correspond to? As for IQ tests, I had a couple proper ones as a kid, and my scores were probably as high as the tests could meaningfully measure, but probably less impressive than reading Feynman in 3rd grade.

Yes, many video games have design goals opposite those of IQ tests, being designed to be played only once and to give the player a strong feeling of progression, but there are many video games, and some are more appropriate. Roguelikes, many strategy games, and many simulation games are designed to be played many times. It might not be as objective, but people could compete on aesthetics too. If practice effects are inevitable, it's better for everyone to get practice to a point of diminishing returns instead of trying to prevent people from practicing (IQ tests) or charging money for it (SATs).

When people smarter than the test designers take an IQ test, they often have to guess what the designers were thinking, but with video games, evaluation can be completely objective.

The bandwidth and scope possible with video games are much higher than with IQ tests. You can test people with bigger problems, like remembering the units in Wargame Red Dragon, and multidisciplinary challenges, like optimizing both cost and visuals of fireworks in Kerbal Space Program; come to think of it, understanding the English of a game's wiki would be yet another aspect of the test.

Video games also have potential legal advantages over IQ tests for companies. You could argue that "we only hire people good at video games to get people who fit our corporate culture of liking video games" but that argument doesn't work as well for IQ tests.

Jobs in the US now often require a college degree even when the content of the degree is irrelevant to the job. Perhaps you're tempted here to note that college degrees aren't just an indication of intelligence, but also of diligence and tolerance for doing pointless BS work. But! Video gamers have already gone out of their way to create a category of competition to test those same things! That's right: speedruns. You can't quite match American universities, but you can get somewhat close.

45 comments

Comments sorted by top scores.

comment by gbear605 · 2023-08-05T13:55:30.601Z · LW(p) · GW(p)

One downside to using video games to measure "intelligence" is that they often rely on skills that aren't generally included in "intelligence", like how fast and precise you can move your fingers. If someone has poor hand-eye coordination, they'll perform less well on many video games than people who have good hand-eye coordination.

A related problem is that video games in general have a large element of a "shared language", where someone who plays lots of video games will be able to use skills from those when playing a new video game. I know people that are certainly more intelligent than I am, but who are less able when playing a new video game, because their parents wouldn't let them play video games growing up (or, they're older and didn't grow up with video games at all).

I like the idea of using a different tool to measure "intelligence", if you must measure "intelligence", but I'm not sure that video games are the right one.

Replies from: dkirmani, boris-kashirin
comment by dkirmani · 2023-08-05T17:31:18.912Z · LW(p) · GW(p)

often rely on skills that aren't generally included in "intelligence", like how fast and precise you can move your fingers

That's a funny example considering that (negative one times a type of) reaction time is correlated with measures of g-factor.

Replies from: gworley
comment by Gordon Seidoh Worley (gworley) · 2023-08-06T01:28:06.635Z · LW(p) · GW(p)

This seems an important point. I have a measured IQ of around 145 (at least as of when it was last measured, maybe 15 years ago when I was in my 20s). My reaction times are also unusually slow. Some IQ tests are timed. My score would come in a full 15 points lower (one standard deviation) on timed tests.

You might complain this is just an artifact of the testing protocol, but I think there's something real there. In everyday life I'm a lot smarter (e.g. come up with better ideas) when I can sit and think for a while. When I have to "think on my feet" I'm considerably dumber. The people I meet who feel significantly smarter than me usually feel that way because they can think quickly.

I've even gotten complaints before from friends and coworkers wondering why I don't seem as interesting or smart in person, and I think this is why. I'm not quite sure how to quantify it, but on reaction time tests I'm 100+ms slower than average. Maybe this adds up to being able to think 1-2 fewer thoughts per second than average. Obviously this is a difference that adds up pretty quickly, especially when you're trying to do something complex that requires a lot of thinking and less crystallized intelligence.
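
(A rough arithmetic check of that claim, assuming an average simple reaction time of around 250ms; the baseline is an assumption, not a figure from this thread:)

```python
avg_rt = 0.250            # seconds; assumed average simple reaction time
slow_rt = avg_rt + 0.100  # 100ms slower, as described above

# Reactions per second in each case, and the difference
print(1 / avg_rt, 1 / slow_rt, 1 / avg_rt - 1 / slow_rt)
# -> 4.0 vs ~2.9, i.e. roughly one fewer "reaction" per second
```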

Replies from: martin-von-berg, dr_s, caffemacchiavelli, mi-1
comment by MvB (martin-von-berg) · 2023-08-06T11:15:10.294Z · LW(p) · GW(p)

But consider: https://www.nature.com/articles/s41467-023-38626-y

"We found that participants with higher intelligence were only quicker when responding to simple questions, while they took more time to solve hard questions."

Replies from: Diziet
comment by Diziet · 2024-10-13T02:00:28.952Z · LW(p) · GW(p)

I thought the criticism of that specific quote was that the "higher intelligence" group, while taking more time, did solve the hard questions correctly, as opposed to not solving them correctly at all.

comment by dr_s · 2023-08-16T05:57:49.604Z · LW(p) · GW(p)

But some games require reactions that are blazing fast, and not only in thought - I think fast, but am still bad at most RTS (bad as in, can't play hard difficulties or in competitive multiplayer) because the specific kind of movement they require, clicking fast with the mouse, is just something I'm not very good at. Heck, it's very sensitive to details like hardware performance and the surface you keep your mouse on. You can standardise these things, but there's a lot of extraneous influence. Of course, some games are turn-based instead, or more forgiving, and those rely more purely on cognitive skills.

comment by caffemacchiavelli · 2023-08-09T22:25:41.908Z · LW(p) · GW(p)

This is purely speculative, but I wonder if slow reaction speed could be in any way conducive to intelligence. I also score subpar on reaction time tests and sometimes react over a second later than I'd consider typical. Afaik IQ does correlate positively with reaction speed, so this naturally isn't the whole story, but my hypothesis would be a kind of "deep" vs "shallow" processing of sensory data: the former being slower but able to find more subtle patterns in whatever you are perceiving, the latter being quick to respond but also quick to miss vital information.

comment by Mi (mi-1) · 2024-12-13T17:59:28.935Z · LW(p) · GW(p)
comment by Boris Kashirin (boris-kashirin) · 2023-08-05T16:20:46.803Z · LW(p) · GW(p)

Have you played something like Slay the Spire? Or Mechabellum, which is popular right now? Deck builders don't require coordination at all but demand an understanding of tradeoffs and risk management. If anything, those skills are neglected parts of intelligence. And how high is the barrier to entry for something like Super Auto Pets?

comment by StartAtTheEnd · 2023-08-06T16:33:34.415Z · LW(p) · GW(p)

Sure, as long as the video game isn't heavily loaded on timing, reaction time, hand-eye coordination, and experience. (I'm aware that these correlate with intelligence as well, but people who've played a lot of video games will do much better on these, even if they're not inherently smarter.)

I've actually been using games to estimate the intelligence of people, and my intuition has never been far off; I can guess my friends' results fairly accurately. All my bright friends do well on games we play, learning them quickly and performing well. Even verbal ability seems to correlate. This correlation is even better than with income or languages spoken, though those two correlations are solid too. Your friends with $5000 PCs are likely more than 1SD above the mean. Being in a well-off family is already a sign of good genes running in that family.

Intelligent people in my family are surprisingly good at card games and trivia games. Also dice games, even when it seems like luck places a hard limit on performance.

Another area where videogames are useful is diagnosing skewed cognitive profiles. I do very well in older shooters with simple graphics, and I'm absolutely trash in Overwatch and Apex. It feels clunky, and I think my specs might be too low, resulting in input lag. Still, I've narrowed down the cause to the visual effects: I simply don't process large amounts of visual information well. In older games, a single moving pixel means that there's an enemy. In newer games, everything moves, from the grass to the background.

I don't think there's a very large burden of proof on you here; it would be weird to evaluate this idea as if it were unlikely. If anything, it's almost obvious that video game performance correlates strongly with intelligence. The data is rather fuzzy, though (I'm lacking a word here, but essentially it's difficult to reverse-engineer the results into the factors which explain/cause them).

Finally, I think the only issue with IQ tests is that people reduce intelligence to a single number. That's like reducing your computer specs to a single number: you'll have a big loss of information. So of course people have evidence (personal experiences) of IQ test scores failing them as predictors. I think 5 numbers is enough to cover most nuances and prevent large differences not explainable by the numbers.

Replies from: alex-k-chen
comment by Alex K. Chen (parrot) (alex-k-chen) · 2023-08-07T00:37:35.540Z · LW(p) · GW(p)

Which other games did you use to estimate the intelligence of people, and do you do it only by watching their learning curves or seeing their twitch.tv streams?

What older shooters do you do well in? Counterstrike is one of the hardest ever. Overwatch makes it easier for newbies to have even K/D ratios than many other games (TF2 historically also did, as did Star Wars Battlefront (3rd one), but not Call of Duty and especially not Battlefield)

Replies from: StartAtTheEnd
comment by StartAtTheEnd · 2023-08-07T13:43:09.019Z · LW(p) · GW(p)

Intelligence is general enough for most games. Most of them are just people that I play together with, but a few post on Twitter quite often. My friends also have a high rate of being streamers (~10%) but that might be a coincidence as that's not how I met them. 

At times some person will impress me, and then next week I see them post about entering the global top 1000 of Apex or whatever.
I do well in older Counter-Strike games and games like TF2 and Paladins. My reaction time is better than what should be possible (humanbenchmark 117ms), so I mainly play sniper. I used to play CS: Source on those surf servers with points and leveling, and upgrade until other people would lose the ability to control their movement from going too fast.
In TF2 I had to switch teams every 5 minutes, in order not to destroy the team balance, even on 32-player servers.

I'm not good at Overwatch, but I used to play it with my friends who were top 0.1% Elo in Paladins, and they performed just as well in Overwatch, even with me on their team to drag them down a bit. I played just one class but they were good with any.

Some people get stuck for a while in Gunfire Reborn; others only need a few re-runs to find a game-breaking strategy. But if I play co-op puzzle games with my friends, we never get stuck. At worst we pause for 10-20 seconds and then go "alright, got it now".

Recently watched a friend playing "Patrick's Parabox" and from what I could tell he reached some of the hardest levels before slowing down. When I sent him the Mensa.dk IQ test he scored 135 on it.

I can't tell you which game works best as an IQ test, so this comment is likely disappointing, but performance generalizes well, so most games should be good enough. There are exceptions for myself, but only because my cognitive balance is bad (autism), which leads to bottlenecks. A good candidate game is Ark, though: it has a bit of everything. Exploration, exploitation, tracking, planning, etc.

You seem interested in the idea of improving intelligence with nootropics? But that's like overclocking. It gets the most out of hardware, but why not improve the hardware instead? If you want an interesting idea to work with, synesthesia is one, since learning is relational and some forms of memory (visual and spatial) can contain much more information than the standard 7 items. I believe that mental visualization is how chess players can keep track of 30+ items at a time with training. Synesthesia correlates with learning speed, but there must be evolutionary reasons for its rarity, i.e. disadvantages.

My working memory sort of sucks, but in TF2 I noticed that I could "feel" when enemies would respawn, and track their general location. I'd also know their line of sight (mental ray tracing), which has disadvantages now as I feel watched if I don't close my curtains all the way. 

Replies from: alex-k-chen
comment by Alex K. Chen (parrot) (alex-k-chen) · 2023-08-07T17:00:28.619Z · LW(p) · GW(p)

Wow, what are your other scores on humanbenchmark? Have your skills changed with age? Do you play RTS or games other than standard FPS?

Replies from: StartAtTheEnd
comment by StartAtTheEnd · 2023-08-07T18:13:07.189Z · LW(p) · GW(p)

Just logged into my second account, my reaction time is 106ms. Maybe it's anxiety, maybe it's nerve damage. The arealme test gives better times than humanbenchmark, and I've been able to get below 100ms on it before without cheating.

For the first account, it's 120ms. My first scores were bad, and the final score is the average of all submitted scores, I think. I'm sluggish right now so I can't go faster than 135ms.
Aim trainer 95%
Verbal Memory 287 points (100.0%)

Other scores are deleted. My old account was probably on a throwaway email which is gone now. But I made it a goal to reach the 99.9th percentile in all tests, and I did at least that (10 years ago, I think). Number memory I got 12, but now I don't think I could get more than 8 on average; my working memory has gotten worse and I'm not sure how to train it again.

The tests are too hardware specific, and technique-specific. For verbal memory, connect the word you see with something, creating a one-way hash. If you try to create the same hash twice, you will notice. This type of memory is basically unlimited. I remember a study about people being shown a lot of images briefly, maybe not even a second, and being able to tell if they had seen them before with 80% accuracy. They did this for like months, having shown over 20000 images in total with no loss in recall.

I don't remember the exact figures, and I can't find the study now. But the problem with memory is recall, and if the test doesn't require two-way association, then it's not really memory.
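
(My formalization of that trick: recognition only needs set membership, not recall:)

```python
# Recognition as set membership: you never recall the word itself, you only
# notice whether its "hash" is already present. A sketch of the technique
# described above, not anything from the actual test.
seen = set()

def have_seen(word: str) -> bool:
    h = hash(word)       # the one-way association
    already = h in seen  # "if you try to create the same hash twice, you will notice"
    seen.add(h)
    return already

print(have_seen("apple"), have_seen("apple"))  # False True
```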

For visual memory, just sit still. The image will remain on your retina. This is cheating and thus not interesting.

Now, I'm no genius by any means. I did one try on the chimp test, and while I did it half-assedly, I got below average.

On brainlabs.me I get between average and top 0.1%. I don't play RTS. Lately I'm running on auto-pilot most of the time; some days ADHD medicine helps me think, other days it's useless. IQ seems to fall about 15-20 points when one is on vacation, and I think my brain is on vacation-mode. My cognition has gotten lazy and I really need to fix that. I don't think I've put effort into anything for years now, since my intuition usually carries me through everything well enough.

It feels like I've always been halfway genius and halfway mentally challenged. At times I've improved so much that I couldn't recognize myself in just a week, at other times I struggle with basics. Could be a mix of ADHD and bipolar, so that I have months of being "stuck" and months of energy according to how well my neurotransmitters like me doing that period.

My experiences make for interesting data, but probably not useful data. I'm an anomaly who is only functional 5% of the time, but 5% is somehow enough, so I guess that's impressive.

comment by tailcalled · 2023-08-06T11:58:44.618Z · LW(p) · GW(p)

Probably relevant:

The study Intelligence and video games: Beyond "brain-games" models video game performance as being determined by a combination of general video game playing ability (gVG) and specific factors related to one or a few video games. It had a bunch of psychology students play a bunch of video games and take an IQ test, and it found a correlation of 0.79 between gVG and the g factor from the IQ test.

comment by Viliam · 2023-08-06T09:52:41.055Z · LW(p) · GW(p)

IQ tests are notorious for working poorly above ~135, and I'd say they only really work well for -20 to +0 relative to the designers, with a somewhat wider range for teams.

How is the designers' IQ relevant?

The problem with designing IQ tests for high values is that it makes calibration costly. To put it simply, if you want to figure out whether e.g. "getting 94 out of 100 questions right in your test" really corresponds to "only one person in a thousand can do it", you need to calibrate your test on maybe ten thousand randomly selected people -- which means you need to pay them (otherwise you are limited to the statistically non-random subset of volunteers) and some of them will refuse anyway. The costs grow exponentially with the top measured IQ, so at some point the authors decide that it's not worth it.
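
(A back-of-the-envelope version of that budget problem; the binomial approximation is my sketch, not Viliam's:)

```python
import math

p = 0.001  # the tail being calibrated: one person in a thousand
for n in (1_000, 10_000, 100_000):
    se = math.sqrt(p * (1 - p) / n)  # standard error of the estimated tail proportion
    print(f"n = {n:>7}: relative error ~ {se / p:.0%}")
# Even 10,000 random subjects pin down a 1-in-1000 threshold
# only to within ~30% relative error.
```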

This is unrelated to the myth spread by many "high IQ societies" that having a certain value of IQ allows you to create tests for given value of IQ but not higher. That is nonsense. A person with IQ 160 would face exactly the same budget problem as everyone else when constructing an IQ test for values 160 and higher. (Unless their "test" is just some nonscientific guesswork, in which case of course the budget is just the cost of pen and paper.)

Replies from: tailcalled
comment by tailcalled · 2023-08-07T11:04:35.009Z · LW(p) · GW(p)

Also, in order to test IQ in the upper range, you need items that are difficult enough to be informative in the upper range, e.g. items where people with 130 IQ would usually answer wrong but 150 IQ would usually answer right. But such items would be wasted on the vast majority of test-takers who would get them wrong due to being lower than 130ish IQ. So statistically you get very little value from improving their accuracy in this range.
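
(In item response theory terms, a framing I'm adding rather than one used above: an item's Fisher information peaks near its difficulty, so hard items are nearly useless for measuring typical test-takers. A sketch:)

```python
import math

def fisher_information(theta, a=2.0, b=2.0):
    """Information a 2-parameter-logistic item with discrimination a and
    difficulty b carries about ability theta (z-score-like units; b = 2.0
    roughly corresponds to an IQ-130-level item; values are illustrative)."""
    p = 1 / (1 + math.exp(-a * (theta - b)))  # probability of a correct answer
    return a**2 * p * (1 - p)                 # peaks where p = 0.5, i.e. theta = b

for theta in (0.0, 1.0, 2.0, 3.0):  # ~IQ 100, 115, 130, 145
    print(theta, round(fisher_information(theta), 3))
# The item is informative around theta = 2 and nearly worthless at theta = 0.
```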

comment by ErickBall · 2023-08-05T18:07:12.553Z · LW(p) · GW(p)

I think one of the major purposes of selecting employees based on a college degree (aside from proving intelligence and actually learning skills) is to demonstrate ability to concentrate over extended periods (months to years) on boring or low-stimulation work, more specifically reading, writing, and calculation tasks that are close analogues of office work. A speedrun of a video game is very different. The game is designed for visual and auditory stimulation. You can clearly see when you're making progress and how much, a helpful feature for entering a flow state. There is often a competitive aspect. And of course you don't have to read or write or calculate anything, or even interact with other people in a productive way. Probably the very best speed runners are mostly smart people who could be good at lots of things, because that's true of almost any competition. But I doubt skill at speedrunning otherwise correlates much with success at most jobs.

Replies from: ben-lang, going-durden
comment by Ben (ben-lang) · 2023-08-07T10:36:29.410Z · LW(p) · GW(p)

I think there is another important reason people are selected based on a degree. When I was at school there were a lot of people who were some combination of disruptive/annoying/violent "laddish" types who made me (and others) uncomfortable by deducting status points for niche, weird, or "nerdy" interests. A correlation (at least at my school) was that none of those people went to university, and (at least at my university) no equivalent people of that type were there. Similarly, I have not met any such people in the workplace. College/university filters them out. It overlaps with class-ism to some extent. Maybe, to overstate it wildly, you could say that employers are trying to select so that the workplace culture is dominated by middle-class social norms.

comment by Going Durden (going-durden) · 2023-08-16T08:30:21.821Z · LW(p) · GW(p)

OTOH, I have a hunch that the kinds of jobs that select against a "speed run gamer" mentality are more likely to be inefficient, or even outright bullshit jobs. In essence, speed-running is optimization, and jobs that cannot handle an optimizer are likely to have an error in the process, an error in the goal-choice, or possibly both.

In the admittedly small sample of cases I witnessed where a workplace could not handle optimization, it was because the "work" was a cover for some nefarious shenanigans, built for inefficiency for political reasons, or created for status games instead of useful work/profit.

Replies from: ErickBall
comment by ErickBall · 2023-08-16T12:41:41.770Z · LW(p) · GW(p)

I think optimizer-type jobs are a modest subset of all useful or non-bullshit office jobs. Many call more for creativity, or reliably executing an easy task. In some jobs, basically all the most critical tasks are new and dissimilar to previous tasks, so there's not much to optimize. There's no quick feedback loop. It's more about how reliably you can analyze the new situation correctly. 

I had an optimizing job once, setting up computers over the summer in college. It was fun. Programming is like that too. I agree that if optimizing is a big part of the job, it's probably not bullshit. 

But over time I've come to think that even though occasional programming is the most fun part of my job, the inscrutable parts that you have to do in a vacuum are probably more important. 

Replies from: going-durden
comment by Going Durden (going-durden) · 2023-08-30T09:31:21.097Z · LW(p) · GW(p)

I mostly agree with you, though I've noticed that if a job is mostly made of constantly changing tasks that are new and dissimilar to previous tasks, there is some kind of efficiency problem up the pipeline. It's the old Janitor Problem in a different guise: a janitor needs to perform a thousand small dissimilar tasks, inefficiently and often in an impractical order, because the building itself was inefficiently designed. Hence why we still haven't found a way to automate a janitor: for that we would need to redesign the very concept of a "building", and for that we would need to optimize how we build infrastructure, and for that we would have to redesign our cities from scratch... etc., until you find that we would need to build an entire new civilization from the ground up just to replace one janitor with a robot.
It still hints at a gross inefficiency in the system, just one not easily fixed.

comment by Alex K. Chen (parrot) (alex-k-chen) · 2023-08-05T21:50:32.742Z · LW(p) · GW(p)

Zachtronics games are great! (they have some integration with coding and later ones show your score relative to the rest of the distribution, though motivation may be higher for some than for others, since they aren't universally fun for people [Jordan Peterson once was skeptical of using video games to test conscientiousness, but Zachtronics games/Factorio are the kinds that require an involved effort that many don't quite have - even how you place things in city-builders is a test - eg earlier Anno games did not allow you to easily bulldoze buildings in the same way that later Anno games did]). As are spatially-involved 4x RTS games like Homeworld/Sins of a Solar Empire.

(and Satisfactory/Dyson Sphere Program)

Other games I'd recommend (esp for one-offs + have learning curves easy enough for quick-multiplayer even with n00bs): Forts (close to optimal game length), Offworld Trading Company, XCOM2, Portal, Kerbal Space Program, anything that Chelsea Voss has played [they seem to eerily correspond to the definition of nerd-friendly games]. I would like to help organize a video game decathlon prioritizing new games most people haven't played (but which high-openness people like trying out!) some day.

AOE2 RM would be good if the first 13 minutes weren't the same every time - DM/Empire Wars is better.

Some short intense video games are great for warming up one's day!

[games are better as tests if they don't involve a massive competition scene where the number of hours invested as a child explains a higher variance of skill than raw "quickness" or raw generalization ability does]. Also, current-gen games are not great for measuring creativity. Since generative AI now gives us the opportunity to make new games with decreasing amounts of effort, we may be able to quickly make better games for measuring cognition in the near term.

[it's beneficial to find games that allow you to access any puzzle from the beginning and don't force you to play the entire sequence from the beginning, even though some games have "finished savegame files" - also important to find games that don't give special advantages to people who "pay" for special loot giving them special advantages]

As data is cheap, it may be better for people to stream all their collective video game play somewhere (note twitch allows you to highlight everything in a video to save the entire video before it gets deleted), and have the data analyzed for reaction time/learning speed/perseverance (esp amount of repetitive actions)/indicators of working memory/transfer learning (esp between RTS games)/etc. I know Starcraft II often tested skill ceilings (+ gave you access to all your old replays + has a skill ceiling so high that skill declines after age 25), and there was once a group of MIT students (including both Jacob Steinhardt and Paul Christiano [though their roommates got to diamond league more than Jacob/Paul did]) who played SC2 to the max back when SC2 was popular (sadly, SC2 is not popular anymore, and the replacement "hit games" aren't as cognitively demanding)

There were some mini-games I played when I was a test subject for some MIT BCS labs, and some of those involved tracking the motion and type of cards you couldn't see until later.

Video games can also be mapped to fNIRS/brainwave data to test cognitive effort/cognitive load + multiscale entropy, and they can be used to train responses to BCI input (in a fun way, rather than a boring way), even possibly the kind of multimodal response that can distinguish more than 4 ways (I once did a test of this at Neurosity, but Neurosity later simplified)

Alex Milenkovic at curia.im is great to talk to on this!! (he has mapped my data on a Neurable while I played a random assortment of games - it would be important to use this to test player fatigue over time). Diversity/entropy of keyboard movements is also important (a good mark of brain quality/stamina is to maintain high diversity/entropy for hours on end, rather than ultimately spam-clicking the same things towards the very end of the game [eg AOE2 black forest maps])

In an era where it becomes easier and easier to track the historical evolution of a player's skill up to point X, it may be possible (just from screen recordings alone) to establish some index of cognitive variables. Planning (esp tracking one's mental representations of it) may be worth investigating even though it's harder to track than working memory (working memory can be estimated simply by seeing how long it takes one to transfer mental representations from one window to another without relying on multiple consults with YouTube playthroughs).

[tracking historical evolution of player skill is important because speed of learning matters way more for life outcomes than actual skill - we still rarely see Starcraft or AOE2 professionals becoming twice-exceptional and "making it" elsewhere in life, even though I know Techers who were once very highly skilled in Starcraft or AOE2 (though not as many of those who played more cognitively involving ones like Sins or Homeworld, nevermind that Caltech used to be notorious for its massive WoW-playing population)]. The Shopify CEO once said he would hire people straight for Starcraft skill, and Charlie Cheever of Quora was also known for his Starcraft prowess.

Note that some brains seem to be really specialized to speed on video games and they can't transfer-learn as well to other substrates, sometimes if they've been playing video games since they were so young/little that their brain organically grew into gaming and then their brain stays annealed to games (rather than if they had spent it on programming or on higher-openness pursuits). It's healthier for one to have an environment so rich and diverse that games only become a "side curiosity" rather than something to get super-immersed in for months.

Some food for thought here:

https://www.guineapigzero.com/

https://twitter.com/ShedworksGreg/status/1417083081589239808

More relevant reading: https://neurocenter-unige.ch/research-groups/daphne-bavelier/, Nick Yee, Jane McGonigal (psychology/psychometrics of gaming is still a very small field so it's unlikely that the small number of experts in the field are interested in all the right things)

https://twitter.com/togelius (he's in MANY of the right spheres, though I know some respectable ppl disagree with his take on AI)

PYMETRICS (https://www.careers.ox.ac.uk/article/the-pymetrics-games-overview-and-practice-guidelines ) though the games are often "so lame" compared to real games (still WORTH using these as the fundamental components to transfer-learn onto real games) - it MAY be worth it to go on subreddits/Steam forums for less popular cognitively-involving games and ask people about "achievement bottlenecks" - ones that fewer people tend to get particularly the kind of achievement bottlenecks that NO AMOUNT OF ADDITIONAL EFFORT/gamification can work for those who are naturally less-skilled at gaming (eg some missions have really hard bonus objectives like "very hard" difficulty ratings - even AOE2 and EU4 have lists of achievements that correspond to "nightmare mode" - and you want to find people who are just naturally skilled at getting to nightmare mode without investing extraordinary amounts of their precious time)

https://ddkang.github.io/ => video analytics (under Matei Zaharia, who was once an AOE2/AOM/EE forum megaposter)

[The global importance + Kardashev gradient of HeavenGames (AOMH/EEH/etc) will become recognized to LLMs/AGI due to its influence on Matei Zaharia alone (and it capturing a good fraction of his teenage years)]. Everything Matei touches will turn into Melange...

https://twitter.com/cremieuxrecueil/status/1690409880308293632

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291255/

https://www.reddit.com/r/cognitiveTesting/

Replies from: Dweomite, yitz
comment by Dweomite · 2023-08-16T00:53:47.678Z · LW(p) · GW(p)

I've played 3 Zachtronics games (SpaceChem, Infinifactory, Opus Magnum) and was ultimately disappointed by all of them.  (I didn't 100% any but got pretty far in all 3.)

Am I missing something about these games that makes them great, or is the following just what it looks like if I'm one of the people who doesn't find them fun?

The early levels made me think:  This is too easy, but early levels are effectively a tutorial and most players have less programming skill than me, so that's not very surprising.  Later on there should be harder levels, and I bet hard versions of this would be fun.

But then the levels never got harder, they only got bigger.  Maybe an early level has 6 steps to the solution, and a later level has 30 steps, but no individual step is hard and the overall direction is obvious, so it's not that much different from playing 5 easy levels in a row (and takes as long).

And when the levels get big, the lack of real programming tools really starts to pinch.  You can't annotate your code with comments, you can't write reusable subfunctions, you can't output logs.  Test runs take too long because break points are weak or non-existent (you can't e.g. break on the 12th iteration of a loop or when a condition is met) and in some of the games the max sim speed is also frustratingly low.

If solving these puzzles were my actual job, I'd invest heavily in building a better IDE.

I made some machines involving (IIRC) hundreds of sequential instructions where I had to hold in my mind the state the molecule was going to be in so I could keep track of what to do next.  But tracking that was the only hard part; if the game had given me a continuously-updating preview of what the machine's state would be at the end of what I'd written so far, the process would have been trivial.

Replies from: dr_s, MondSemmel, None, bhauth
comment by dr_s · 2023-08-16T06:05:58.290Z · LW(p) · GW(p)

Some Zachtronics games are more genuinely programming-like, as they include literal programming languages, and at least space for comments (TIS-100, Shenzhen I/O, Exapunks). That said, there's always an "artificial limitations of the system" factor, as they're going for emulating a certain kind of old-time experience of working with very low-level programs (assembly or very limited microcontrollers). I like them, though I must say I almost never finish them: after a whole work day of coding, my general idea of fun doesn't tend to coincide with "even more coding, but made more frustrating on purpose".

comment by MondSemmel · 2023-08-16T10:36:40.735Z · LW(p) · GW(p)

Did you not find the leaderboards compelling? My experience with Zachtronics games was that I'd solve a few levels, then try to optimize earlier levels based on new things I'd learned. Rinse and repeat. Sometimes I'd find a better solution; at other times I'd fail and would then marvel "how could this level possibly be solved any faster?". Just solving the levels was only half the fun, for me.

I finished most Zachtronics games, and the only game where I had a similar "this is just bigger" complaint, was the last chapter in Infinifactory, so I stopped playing there.

That said, if you program as a career or hobby, I can see how these games would offer more of the same, except with a worse work environment (IDE, editor, etc.), and so might be a somewhat poor fit.

Personally I liked how some of these games also yielded some pretty neat insights for me.

In particular, in Opus Magnum, I eventually realized that to achieve the fastest-possible solution to a level (except for a constant), you either need to fill the outputs as quickly as possible (IIRC every 2 cycles), or fetch from the inputs as quickly as possible (also every 2 cycles). But once you've done that, all other details of your actual design are almost irrelevant. Even the constant is just "how quickly can I produce the first output".

Anyway, this input/output thing generalizes to systems of all kinds: e.g. a factory is maximally efficient, given fixed input or output bandwidths, if either bandwidth is fully utilized. Once your assembly line produces items 24/7 at its maximum speed, the factory is maximally efficient until you can further speed up the assembly line or add another. Or, as a looser analogy, in electro- and hydrodynamics, you can characterize a system either by its contents or by just its boundaries; that's how in Maxwell's equations, the integral vs. differential equations are related.
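
(A minimal model of that observation; a sketch of the reasoning, not the game's actual scoring code:)

```python
def min_cycles(n_outputs, warmup, io_interval=2):
    """Lower bound on completion time when either the inputs or the outputs
    are saturated at one item per io_interval cycles; warmup is the time to
    produce the first output. All numbers here are illustrative."""
    return warmup + io_interval * (n_outputs - 1)

print(min_cycles(n_outputs=6, warmup=10))  # 20: only the warmup constant is design-dependent
```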

Replies from: Dweomite
comment by Dweomite · 2023-08-16T12:09:23.302Z · LW(p) · GW(p)

Programming is my career. I didn't find the leaderboards very challenging; I especially noticed this in Opus Magnum, which I partially blame on them picking boring optimization targets.  I typically picked one category to optimize on my first play of the level, and often tied the best score for that category on my first try.

Your realization that the fastest cycle time would be limited by the max input or output speed is something that I figured out immediately; once you're aware of it, reaching that cap is basically just a matter of parallelization.  Hitting the exact best possible "warm-up" time to produce the first output wasn't completely trivial, but getting in the top bucket of the histogram was usually a breeze for me.

Optimizing cost is even simpler.  You can put a lower bound on the cheapest possible cost by listing the obviously-necessary components (e.g. if the output has a bond that the inputs don't then you need at least one bonder), then calculating the shortest possible track that will allow a single arm to use all of those, then checking whether it's cheaper to replace the track with an extending arm instead.  As far as I can recall, I didn't find a single level where it was difficult to hit that lower bound once I'd calculated it; doing the entire level with only 1 arm is sometimes a bit tedious but it's not actually complicated.

Doing the minimum-cost solution will usually get you very close to the minimum-size solution automatically, since you've already crammed everything around one arm.  This is probably the hardest category if you want to be literally optimal, but I was often in the top bucket by accident.

I think they should have had players optimize for something like "rental cost" where you pay for (components + space) multiplied by running time, so that you have to compromise between the different goals instead of just doing one at a time.
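
(Formalized, the proposal is just a product metric; a sketch with made-up numbers:)

```python
def rental_cost(component_cost, area, cycles):
    """Hypothetical combined score: pay for parts and footprint for as long as
    the machine runs. My sketch of the proposal above, not an actual game metric."""
    return (component_cost + area) * cycles

# A cheap-but-slow machine and an expensive-but-fast one can now tie:
print(rental_cost(40, 10, 100), rental_cost(100, 25, 40))  # 5000 5000
```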

Replies from: MondSemmel
comment by MondSemmel · 2023-08-16T12:39:56.967Z · LW(p) · GW(p)

Wow, that sounds like those games really were way too easy for you. That said, after reading your comment, I can't help but think that you're obviously not the target audience for these games. A popular programming-style game marketed at gamers was unlikely to challenge a career programmer, otherwise it would've never gotten popular in the first place. For people like you, maybe code competition websites are more suitable?

Replies from: Dweomite
comment by Dweomite · 2023-08-16T19:56:55.397Z · LW(p) · GW(p)

I suppose I was hoping for a programming-based puzzle game, with some new clever insight required to solve each level, rather than pure programming.

Replies from: BrassLion
comment by BrassLion · 2024-10-11T22:49:25.840Z · LW(p) · GW(p)

That's definitely not Zachtronics, at least any of the games I've played.  If that game exists it would be pretty awesome - although probably even more niche than Zachtronics games (which weren't too niche to support the makers for a decade+, granted).

comment by [deleted] · 2023-08-16T05:33:25.151Z · LW(p) · GW(p)

Fun is subjective. I enjoyed how there are many valid routes to a solution: it's a constrained solution space, but the levels that come with the game are all still solvable many different ways. (All 3 are the same game. There is also TIS-100, Shenzhen IO, Exapunks, and Molek-Syntez. Same game.)

What others say is that a Zachtronics game makes you feel smart.  Because of the freedom you have to a solution, sometimes you get an "ah-ha" moment and pick a solution that may be different from the typical one.  You can also sometimes break the rules, like letting garbage pile up that doesn't quite fail your test cases.  

I agree with you that an IDE would make the game easier, though not necessarily more fun. FPS games do not give you an aimbot even though in some of them it is perfectly consistent with the theme of the game world. Kerbal Space Program does not give you anything like the flight control avionics that Apollo 11 actually had; you have to land on the Mun the hard way.

comment by bhauth · 2023-08-16T21:28:29.497Z · LW(p) · GW(p)

Would this have been trivial?

Replies from: Dweomite
comment by Dweomite · 2023-08-17T03:57:10.271Z · LW(p) · GW(p)

That is a rather long article that appears to be written for an audience that is already familiar with their community.  Could you summarize and/or explain why you think I should read it?

Replies from: BrassLion
comment by BrassLion · 2024-10-11T23:30:56.897Z · LW(p) · GW(p)

I read it, it's a summary of a weekly challenge in Opus Magnum by the author of the challenge, detailing how people managed to beat the author's cycles score and get reasonably close to the theoretical minimum cycles.  As someone who only got about halfway through Opus Magnum, the puzzle and solutions there are wildly complex.

comment by Yitz (yitz) · 2023-08-08T05:43:18.069Z · LW(p) · GW(p)

Any recommendations for smartphone games with similar properties? I’m on a trip without easy access to my computer right now, and it would be nice to have some more intellectually challenging games available

comment by Rob Harrison · 2023-08-16T14:32:22.208Z · LW(p) · GW(p)

A meta question to the question of "how to most accurately measure intelligence" is "how is accurately measuring intelligence actually useful?"

Just from my experience it seems that an accurate relative intelligence value of some sort, for example iq score, has surprisingly few useful applications.  I think this is mostly because making claims about superior intelligence, accurate or not, is considered socially repulsive (as you acknowledged).  Without accounting for social factors, I would expect an intelligence rating to be a very useful thing to include on a work resume for example.

Personally, I am most comfortable acting like a slightly above average intelligence guy, although objectively I think I am the most or one of the most intelligent people I know.  Most people would not think of me if asked to name a really smart person they know, but anyone who knows me well will notice that I have a mysterious tendency for accomplishing very hard complex tasks.  I guess it seems to me that trying to project my intelligence more would close off more opportunities than it would open just because of social factors.

Acting less smart than I am can sometimes be inconvenient or annoying especially if I'm arguing with someone who projects a "smart guy" vibe, and they're clearly wrong, but other people are impressed with their verbiage and confidence in their claims.  Usually I don't care that much about winning arguments, except occasionally when the outcome really matters in which case I can get very frustrated.

This may be obvious, but I'm not posting this out of vanity for being a high-intelligence individual.  Rather, it is a real issue that I have to deal with and I'm not completely sure I always get it right.  Sometimes it seems it would be better and more genuine to not have a projected self separate from my real self (whatever that is).  It would be nice to talk with people about my actual interests and say things the way I actually think about them.  But mostly I think that intelligence should be a tool rather than a goal in itself.  Also, the problems that come with high intelligence are far less than the benefits, so I should stop complaining and use the intelligence that I was lucky to get to accomplish something good.

comment by X4vier · 2024-08-05T09:26:02.475Z · LW(p) · GW(p)

I hear you that teenagers spending hours computing hundreds of analytic derivatives or writing a bunch of essays is a pretty sad time waste... But if incentives shifted so instead that time got spent perfecting a starcraft build order against the AI a hundred times or grinding for years to pull off carpetless star in SM64, this might be one of the few ways to make that time spent even more pointless... (And for most people the latter is no more fun than the former)

Replies from: bhauth
comment by bhauth · 2024-08-05T13:56:19.936Z · LW(p) · GW(p)

The comparison isn't to "education" or "study" in general, it's to IQ tests. SM64 speedruns might not have that much intellectual merit, but they're not really any worse than, say, studying Raven's Progressive Matrices problems until every basic type is memorized. And you can do better than SM64 challenges; that's not something I suggested here.

comment by Linch · 2024-09-20T02:17:55.008Z · LW(p) · GW(p)

Video games also have potential legal advantages over IQ tests for companies. You could argue that "we only hire people good at video games to get people who fit our corporate culture of liking video games" but that argument doesn't work as well for IQ tests.

IANAL, but unless you work for a videogame company (or a close analogue like chess.com), I think this is just false. If your job is cognitively demanding, having IQ tests (or things like IQ tests with a mildly plausible veneer) probably won't get you in legal trouble[1], whereas I think employment lawyers would have a field day if you put in place culture-fit questions with extreme disparate impact, especially when it's hard to directly link the games to job performance.

  1. ^

    The US Army has something like an IQ test.  So does the US Postal Service. So does the NFL. I've also personally worked in a fairly large tech company (not one of the top ones, before I moved to the Bay Area) that had ~IQ tests as one of the entrance criteria.  AFAIK there has never been any uproar about it. 

Replies from: None
comment by [deleted] · 2024-09-20T05:35:32.461Z · LW(p) · GW(p)

.

Replies from: Linch
comment by Linch · 2024-09-20T22:41:37.249Z · LW(p) · GW(p)

I'm aware of Griggs v Duke; do you have more modern examples? Note that the Duke case was about a company that was unambiguously racist in the years leading up to the IQ test (ie they had explicit rules forbidding black people from working in some sections of the company), so it's not surprising that judges will see their implementation of the IQ test the day after the Civil Rights Act was passed as an attempt to continue racist policies under a different name. 

"I've never had issue before" is not a legal argument. 

But it is a Bayesian argument for how likely you are to get in legal trouble. Big companies are famously risk-averse.

"The military, post office, and other government agencies get away with it under the doctrine of sovereign immunity"

  1. Usually the government bureaucracy cares more than the private sector about not being racist, or being perceived as racist, not less. It's also easier for governments to create rules for their own employees than in the private sector; see e.g. US Army integration in 1948.
  2. Also the NFL use IQ-test-like things for their football players, and the NFL is a) not a government agency, and b) extremely prominent, so unlikely to fly under the radar.
comment by Raemon · 2023-08-05T20:22:36.654Z · LW(p) · GW(p)

I have been thinking about things similar to this lately, for the more specific sub goal of ‘how would you train, and measure, ability to form plans that achieve complicated goals’ (in particular in domains with poor feedback loops)

comment by Fergus Fettes (fergus-fettes) · 2023-08-06T11:20:46.457Z · LW(p) · GW(p)

I think this is excellent, particularly because IQ tests quickly max out on skills that can't be examined quickly. It would be great to put people in tests that examine their longer-timeframe abilities via e.g. writing a longish story (perhaps containing a theory of Alzheimer's). But tests don't last that long.

Games however do last long and do manage to keep people's attention for a long time. So you might really be able to test how differentially skilled someone is over longer timeframes.

comment by Mary Chernyshenko (mary-chernyshenko) · 2024-09-24T10:10:06.376Z · LW(p) · GW(p)

(this is completely sideways, but recently I found myself thinking that other exams should not be testing intelligence instead of their stated purpose. When an English test requires you to hold in your head facts, ..., it starts to be less about English and more about "something else".)

Replies from: FireStormOOO
comment by FireStormOOO · 2024-10-20T00:04:16.561Z · LW(p) · GW(p)

"English" keeps ending up as a catch-all in K-12 for basically all language skills and verbal reasoning skills that don't obviously fit somewhere else.  Read and summarize fiction - English, Write a persuasive essay - English, grammar pedantry - English, etc.

comment by trevor (TrevorWiesinger) · 2023-08-05T18:53:27.312Z · LW(p) · GW(p)

Have you read about dath ilan? AFAIK, the best "core" post is this AMA [LW · GW] with Yud, and AFAIK this Yudfic [LW · GW] does the most worldbuilding (if anyone has better intros, please suggest).

I find it highly believable that in a world as optimized and Moloch-opposed as dath ilan, intelligence tests AND intelligence/rationality amplification exercises probably use the video game principles you've described here; it's often just outright better to structure user interfaces like that.

I don't know how long it will take for the current industry to pump out something that can work this well though; the current equilibrium incentivizes developers to send players into a trance-like "zombie mode" similar to social media, often even including the same skinner box dynamic [LW · GW] that ultimately results in sifting through garbage 95% of the time, and only getting what they came for 5% of the time. If you make a really well-worded argument that a video game could fix/improve large numbers of people, the way that TOTK inspires people to become engineers, or that The Sequences [? · GW], CFAR Handbook [LW · GW], and Cognitive Strategy Tuning [LW · GW] can increase intelligence directly, then you might be able to convince a critical mass of people in the industry to pump out something good in a timely fashion.