Posts

Prediction markets and Taxes 2024-11-01T17:39:35.191Z
Edmund's Short form 2022-03-05T02:03:37.694Z

Comments

Comment by Edmund Nelson (edmund-nelson) on Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red · 2025-04-21T18:23:47.043Z · LW · GW

It's interesting how many of these were computer vision problems: the inability to look at the screen and determine which set of pixels is the stairs, or the complete inability to tell cuttable trees from ones that cannot be cut. That part, at least, seems like the kind of problem that would go away within a year if significant effort were devoted to it.

I find it fascinating that this set of children's video games from the 90s shows off my frustrations with large language models better than anything else. When you give them small, concrete, narrow tasks and can reliably test their output, they are incredibly useful (e.g. they are superhuman at helping you write small functions in code). But do not try to get them to do a long-context task whose intermediate steps you can't test. (After all, if you could test intermediate steps, you could break the task down to the smallest intermediate step and prompt the model with that step.) The hallucination problem is much clearer when playing Pokémon than anywhere else, and so are the random issues in agentic ability.

The inability of models to have memory is really the major frustration currently preventing them from being used in longer contexts. Pokémon as a benchmark is, in theory, a 2-3 hour task from start to finish if you don't waste any time (under 2 hours is possible, but it takes a lot of resets).

Comment by Edmund Nelson (edmund-nelson) on Is Gemini now better than Claude at Pokémon? · 2025-04-20T16:48:30.477Z · LW · GW

>I'm not sure that TAS counts as "AI" since they're usually compiled by humans

Agreed, it's more "this is what the limit looks like."

>Still, I'd say this is more a programmed "bot" than an AI in the sense we care about.

Is Stockfish 8 not an AI? I feel like the goalposts of what counts as "AI" keep getting shifted. Pokebotbad is an "AI" that searches the Pokémon state space to solve the game.

Comment by Edmund Nelson (edmund-nelson) on Is Gemini now better than Claude at Pokémon? · 2025-04-20T14:32:09.262Z · LW · GW

I'll mention that beating Pokémon isn't that big of a challenge in and of itself; what's important here is that something that wasn't trained to play Pokémon can do it.

Depending on how strict you want to be about what counts as an AI beating Pokémon, we have AIs that beat Pokémon in less than 2 hours. Or, if you go with the interpretation that "an AI beating Pokémon is a program that beats Pokémon," we have "AIs" that beat it in less than 2 minutes, or in less than 1:30 if you want a stricter definition of "beat the game."

Comment by Edmund Nelson (edmund-nelson) on Prediction markets and Taxes · 2024-11-02T04:12:37.837Z · LW · GW

Yeah, that's fair. I'm just so used to American odds for gambling that I mentally use them all the time for these sorts of things.

I probably should have used good old-fashioned odds instead.

The reason casinos show something like "Yankees +110, Red Sox -120" is so you can easily see the casino's rake, or something like that.
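To make the rake point concrete, here is a minimal Python sketch that converts each side of a two-way American line into the break-even probability it implies and sums them. With no rake the two sides would sum to exactly 100%; the excess (the "overround") is the book's built-in margin. The function names are illustrative, and the sketch assumes the standard conversions: 100 / (odds + 100) for positive odds and |odds| / (|odds| + 100) for negative odds.

```python
def implied_probability(american_odds: int) -> float:
    """Break-even win probability implied by an American odds line."""
    # American odds strictly between -100 and +100 are not valid lines.
    if -100 < american_odds < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american_odds > 0:
        # +X: a $100 stake nets $X profit.
        return 100 / (american_odds + 100)
    # -X: a $X stake is needed to net $100 profit.
    risk = -american_odds
    return risk / (risk + 100)


def overround(*lines: int) -> float:
    """Sum of the implied probabilities minus 1: the book's margin."""
    return sum(implied_probability(o) for o in lines) - 1


yankees, red_sox = 110, -120
print(f"Yankees: {implied_probability(yankees):.4f}")  # Yankees: 0.4762
print(f"Red Sox: {implied_probability(red_sox):.4f}")  # Red Sox: 0.5455
print(f"Overround: {overround(yankees, red_sox):.2%}")  # Overround: 2.16%
```

The roughly 2% excess is a rough measure of how much the line is shaded in the book's favor; a fair coin-flip line of +100/-100 would give an overround of 0.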

Comment by Edmund Nelson (edmund-nelson) on Prediction markets and Taxes · 2024-11-01T21:39:37.380Z · LW · GW

American odds can basically be interpreted as "the net amount you win after making a $100 bet." So if the odds are +150, you win $150 PLUS your initial $100 stake back if you win the bet.

Negative American odds mean "the amount you have to bet to win $100," so odds of -230 mean you have to bet $230 to win $100.
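The two rules above fit in one small function. This is a minimal Python sketch (the name `payout` is illustrative, not from any betting library) that returns the net profit and the total amount returned for a winning stake, checked against the +150 and -230 examples.

```python
def payout(american_odds: int, stake: float) -> tuple[float, float]:
    """Return (net profit, total returned) for a winning bet of `stake`.

    Positive odds: the net profit on a $100 stake.
    Negative odds: the stake required to net $100 profit.
    """
    if -100 < american_odds < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american_odds > 0:
        profit = stake * american_odds / 100
    else:
        profit = stake * 100 / -american_odds
    # A winning bet also returns the original stake.
    return profit, profit + stake


print(payout(150, 100))   # (150.0, 250.0)
print(payout(-230, 230))  # (100.0, 330.0)
```

Scaling linearly by `stake` reflects how the odds are quoted: they describe the ratio of profit to stake, with $100 as the reference amount on one side or the other.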

Comment by Edmund Nelson (edmund-nelson) on Prediction markets and Taxes · 2024-11-01T21:37:43.497Z · LW · GW

As I said, epistemic status: "Trivial."

This is something trivial, but it's worth noticing.

Comment by Edmund Nelson (edmund-nelson) on Twitter Twitches · 2023-07-05T18:26:05.435Z · LW · GW

The number is way too high for that. I use Twitter almost an hour a day (way, way too much time), and I don't hit the rate limit.

Comment by Edmund Nelson (edmund-nelson) on [Link] A community alert about Ziz · 2023-02-25T10:53:09.192Z · LW · GW

Daniel was definitely strange in many socially awkward ways.

Comment by Edmund Nelson (edmund-nelson) on [Link] A community alert about Ziz · 2023-02-25T10:39:22.255Z · LW · GW

Knowing what I know about Daniel, I could easily psychologically manipulate him into doing terrible acts if need be. I can easily see a master cult leader (and Ziz is a top-tier cult leader) mentally breaking Daniel. He is (or at least was) a socially awkward person who took things extremely literally and was easy to push around. He's definitely up there in the "easy" category, and cult leaders like Ziz need somebody like that. Going over Ziz's tactics, while Daniel is unlikely to have developed multiple personalities, he'd easily enter a few of the states Ziz mentions.

However, I have no clue how Daniel would physically be able to commit certain acts: his hand-eye coordination is bad, and he's physically not very strong. If he tried to knife me to death, he'd get knocked out by my overhand right or heel-hooked into submission well before he got enough stabs in (and I'm not a great boxer).

Comment by Edmund Nelson (edmund-nelson) on [Link] A community alert about Ziz · 2023-02-25T10:30:33.790Z · LW · GW

That's fair from a mental POV, but Daniel Blank physically has poor hand-eye coordination and bad reflexes, meaning that if he tried to shoot somebody, he'd be extremely likely to miss at all but the shortest of ranges. (The murder in question was definitely gunshot-based.)

While he could be a part of the conspiracy, his ability to have physically committed the actual act is questionable.

Comment by Edmund Nelson (edmund-nelson) on Launching a new progress institute, seeking a CEO · 2022-07-19T07:48:47.790Z · LW · GW

Sadly, I lack the skills necessary for such a monumental task; otherwise I would apply. What are the "lesser" roles you are looking for?

Comment by Edmund Nelson (edmund-nelson) on Visible Homelessness in SF: A Quick Breakdown of Causes · 2022-05-27T03:17:57.501Z · LW · GW

"I also found that, controlling for rents, the partisanship of a state did not predict homelessness."

Did partisanship correlate with rents? If so, in what direction?

Good post. It makes me think much of the homelessness crisis is, more literally, "I can't buy a home."

Comment by Edmund Nelson (edmund-nelson) on Edmund's Short form · 2022-03-05T02:04:01.536Z · LW · GW

At what point is AI judged to have "superhuman performance"?

From what I can tell, there are roughly 4 stages of "AI performance":

Stage 1: Subhuman (this covers a lot of ground). The AI is unable to perform as well as a top human in the field (such AIs can still be useful).

Stage 2: The "superhuman" stage. AIs outperform humans on normal variations of the task.

Stage 3: The "adversarial" stage. Humans find adversarial examples that let them outperform the AI (this is most relevant in games), e.g. StarCraft 2 (an example of exploitative play allowing a human to beat an AI that had beaten then-top human Serral in a best of 5).

Stage 4: The "god" stage. Even with adversarial examples, humans are outperformed by the computer (e.g. chess).


Obviously AIs can skip stage 3 entirely, and that does happen, but I hear conflicting claims about stage 3. Many people argue we have superhuman results in StarCraft 2, but unless there is an AI more advanced than the BlizzCon AlphaStar, it appears we are at stage 3 (humans can reliably beat the AI with anti-AI tactics, but lose with normal play). Is stage 3 generally considered "superhuman"?

AlphaStar vs. Serral link (is there a way to collapse these?)

Comment by Edmund Nelson (edmund-nelson) on Edmund's Short form · 2022-03-05T00:57:33.088Z · LW · GW
Comment by Edmund Nelson (edmund-nelson) on How to Legally Conduct "Rationalist Bets" in California? · 2022-02-09T10:15:18.599Z · LW · GW

The main way I have gotten around these in practice has been to wager non-monetary items, things like "whoever loses has to file the other's taxes." The method tends to work much better when you live in the same house (since housework is very tradeable and always in demand). This obviously has the weakness that it works best for 50/50 wagers and is worse at covering the full continuum of odds. However, there are ways around this: instead of betting on IF something will happen, you can bet on WHEN, or on HOW MUCH (e.g. "I bet 30 guests will arrive at the party tomorrow." Person 2: "I'll take the under on that.")

Comment by Edmund Nelson (edmund-nelson) on How much chess engine progress is about adapting to bigger computers? · 2021-07-12T08:10:40.031Z · LW · GW

The oldest chess engine you can find for free online is Cray Blitz (https://craftychess.com/downloads/crayblitz/), which was the world computer chess champion in 1983. Unfortunately, (a) it is not UCI-compatible, and (b) the oldest version available is the 1989 version. I asked Harry Lewis Nelson himself yesterday, and he said that he doesn't have the source code for the 1983 version anymore. Unfortunately, any farther back and the original author doesn't appear to be alive anymore. (Harry himself is 89.)

 

The oldest chess engine that can be called a champion is Kaissa (https://www.chessprogramming.org/Kaissa); however, the best I can do is give you a rewrite in Turbo C (http://greko.su/index_en.html). I figure 1977 is as old as you're going to get. Hippke can probably do most of the heavy lifting from here.

I agree with the use of Stockfish 11, as it appears to be the best engine on poor hardware; NNUE requires more RAM than actually existed on a 1997 machine.