Comments

Comment by arabaga on Getting 50% (SoTA) on ARC-AGI with GPT-4o · 2024-06-19T08:27:24.668Z · LW · GW

I agree that there is a good chance that this solution is not actually SOTA, and that it is important to distinguish the three sets (train, public eval, and private test).

There's a further distinction between 3 guesses per problem (which is allowed according to the original specification as Ryan notes), and 2 guesses per problem (which is currently what the leaderboard tracks [rules]). 
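
To make the scoring distinction concrete, here is a minimal sketch of top-k ARC-style scoring. The function name and data format are my own illustration, not the actual leaderboard harness.

```python
# Minimal sketch of top-k scoring on ARC-style problems (illustrative only):
# a problem counts as solved if any of the first k submitted grids matches the target.

def arc_score(predictions, targets, k=2):
    """Fraction of problems solved when allowed k guesses per problem."""
    solved = 0
    for guesses, target in zip(predictions, targets):
        if any(guess == target for guess in guesses[:k]):
            solved += 1
    return solved / len(targets)

# The same set of predictions can score differently under k=2 (leaderboard rules)
# and k=3 (original specification), which is why those numbers aren't directly comparable.
```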

Some additional comments / minor corrections:

The past SOTA got [we don't know] on the first, 52% on the second, and 34% on the third.

AFAICT, the current SOTA-on-the-private-test-set with 3 submissions per problem is 37%, and that solution scores 54% on the public eval set.

The SOTA-on-the-public-eval-set is at least 60% (see thread).

Apparently, lots of people get worse performance on the public test set than the private one

I think this is a typo and you mean the opposite.

From looking into this a bit, it seems pretty clear that the public eval set and the private test set are not IID. They're "intended" to be the "same" difficulty, but AFAICT this essentially just means that they both consist of problems that are feasible for humans to solve.

It's not the case that a fixed set of eval/test problems were created and then randomly distributed between the public eval set and private test set. At your link, Chollet says "the [private] test set was created last" and the problems in it are "more unique and more diverse" than the public eval set. He confirms that here:

"This is *also* likely in part due to the fact that the eval set contains more "easy" tasks. The eval set and test set were not calibrated for difficulty. So while all tasks across the board are feasible for humans, the tasks in the test set may be harder on average. This was not intentional, and is likely either a fluke (there are only 100 tasks in the test set) or due to the test set having been created last."

Bottom line: I would expect Ryan's solution to score significantly lower than 50% on the private test set. 

Comment by arabaga on LessWrong's (first) album: I Have Been A Good Bing · 2024-04-01T15:21:39.897Z · LW · GW

You can directly write/paste your own lyrics (Custom Mode). And v3, which is better in general, came out fairly recently, in case you haven't tried it in a while.

Comment by arabaga on LessWrong's (first) album: I Have Been A Good Bing · 2024-04-01T15:13:54.507Z · LW · GW

They seem to be created by https://app.suno.ai/. And yes, it is really easy to create songs: you can either have it create the lyrics for you based on a prompt (the default), or you can write/paste the lyrics yourself (Custom Mode). Songs can be up to ~2 minutes long, I think.

Comment by arabaga on Prediction markets are consistently underconfident. Why? · 2024-01-12T05:05:03.856Z · LW · GW

Yeah, this seems to be a big part of it. If you instead switch it to the probability at market midpoint, Manifold is basically perfectly calibrated, and Kalshi is if anything overconfident (Metaculus still looks underconfident overall).
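
For concreteness, here is a minimal sketch of what a calibration check at market midpoint could look like. The data format, bucketing, and function name are my own assumptions, not how Manifold, Kalshi, or Metaculus actually expose their data.

```python
# Minimal sketch of a calibration check using the probability at market midpoint.
# Assumes `markets` is a list of (midpoint_probability, resolved_yes) pairs --
# the data format and bucketing choices here are illustrative.

def calibration_buckets(markets, n_buckets=10):
    """Group markets by midpoint probability and compare to actual resolution rates."""
    buckets = [[] for _ in range(n_buckets)]
    for prob, resolved_yes in markets:
        idx = min(int(prob * n_buckets), n_buckets - 1)
        buckets[idx].append(resolved_yes)
    results = []
    for i, outcomes in enumerate(buckets):
        if not outcomes:
            continue
        predicted = (i + 0.5) / n_buckets          # bucket midpoint, e.g. 0.65 for 60-70%
        observed = sum(outcomes) / len(outcomes)   # fraction that actually resolved YES
        results.append((predicted, observed, len(outcomes)))
    return results

# Perfect calibration means observed ≈ predicted in every bucket; observed rates
# consistently more extreme than predicted would indicate underconfidence.
```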

Comment by arabaga on OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns · 2023-11-21T04:39:55.181Z · LW · GW

No, the letter has not been falsified.

Just to clarify: ~700 out of ~770 OpenAI employees have signed the letter (~90%).

Out of the 10 authors of the autointerpretability paper, only 5 have signed the letter. One of the 10 is no longer at OpenAI and so couldn't have signed it, which makes it more sensible to count this as 5/9 rather than 5/10. Either way, that is well below the ~90% average rate.
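
As a rough, illustrative check (my own calculation, under a simplifying independence assumption, not something from the post): how likely would 5 or fewer signatures out of 9 be if each author signed at the overall ~90% rate?

```python
# Back-of-the-envelope check, assuming each of the 9 remaining authors
# signs independently at the overall ~700/770 ≈ 0.91 rate.
from math import comb

p = 700 / 770          # overall signing rate among OpenAI employees
n, k = 9, 5            # 9 authors still at OpenAI, 5 signed

# P(at most 5 of 9 sign) under the independence assumption
p_at_most_5 = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
print(f"P(<= {k} of {n} sign at rate {p:.2f}) ≈ {p_at_most_5:.4f}")
```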

Comment by arabaga on OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns · 2023-11-21T00:59:26.042Z · LW · GW

Ah, nice catch, I'll update my comment.

Comment by arabaga on OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns · 2023-11-20T23:30:01.906Z · LW · GW

There is an updated list of 702 who have signed the letter (as of the time I'm writing this) here: https://www.nytimes.com/interactive/2023/11/20/technology/letter-to-the-open-ai-board.html (direct link to pdf: https://static01.nyt.com/newsgraphics/documenttools/f31ff522a5b1ad7a/9cf7eda3-full.pdf)

Nick Cammarata left OpenAI ~8 weeks ago, so he couldn't have signed the letter.

Out of the remaining 6 core research contributors:

  • 3/6 have signed it: Steven Bills, Dan Mossing, and Henk Tillman
  • 3/6 have still not signed it: Leo Gao, Jeff Wu, and William Saunders

Out of the non-core research contributors:

  • 2/3 signed it: Gabriel Goh and Ilya Sutskever
  • 1/3 still has not signed it: Jan Leike

That being said, it looks like Jan Leike has tweeted that he thinks the board should resign: https://twitter.com/janleike/status/1726600432750125146

And that tweet was liked by Leo Gao: https://twitter.com/nabla_theta/likes

Still, it is interesting that this group is clearly underrepresented among people who have actually signed the letter.

Edit: Updated to note that Nick Cammarata is no longer at OpenAI, so he couldn't have signed the letter. For what it's worth, he has liked at least one tweet that called for the board to resign: https://twitter.com/nickcammarata/likes

Comment by arabaga on Altman firing retaliation incoming? · 2023-11-19T02:46:06.941Z · LW · GW

It seems like a strategy by investors or even large tech companies to create a self-fulfilling prophecy to create a coalition of OpenAI employees, when there previously was none.

How is this more likely than the alternative, which is simply that this is an already-existing coalition that supports Sam Altman as CEO? Considering that he was CEO until he was suddenly removed yesterday, it would be surprising if most employees and investors didn't support him. Unless I'm misunderstanding what you're claiming here?

Comment by arabaga on The Gods of Straight Lines · 2023-10-15T01:08:18.080Z · LW · GW

If you follow the link, under the section "Free Market Seen as Best, Despite Inequality", Vietnam is the country with the highest agreement by far with the statement "Most people are better off in a free market economy, even though some people are rich and some are poor" (95%!).

That being said, while it is the most pro-capitalism country, it is clearly not the most capitalist country (although it's not that bad, 72nd out of 176 countries ranked: https://www.heritage.org/index/ranking), and it would likely be more capitalist today if South Vietnam had won.

Comment by arabaga on State of Generally Available Self-Driving · 2023-08-24T19:34:06.457Z · LW · GW

Small typo/correction: Waymo and Cruise each claim 10k rides per week, not riders.

Comment by arabaga on A short calculation about a Twitter poll · 2023-08-15T19:06:31.130Z · LW · GW

Note that another way of phrasing the poll is:

Everyone responding to this poll chooses between a blue pill or red pill.

  • if you choose red pill, you live
  • if you choose blue pill, you die unless >50% of people choose blue pill

Which do you choose?

I bet the poll results would be very different if it was phrased this way.
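
For clarity, here is a minimal sketch that just encodes the rephrased rule; the function and example values are purely illustrative.

```python
# A direct encoding of the rephrased poll's rule (illustrative only):
# red always lives; blue lives only if blue-choosers exceed 50%.

def outcome(choice: str, blue_fraction: float) -> str:
    """Return 'live' or 'die' for a voter given their choice and the overall blue share."""
    if choice == "red":
        return "live"
    return "live" if blue_fraction > 0.5 else "die"

# Example: a blue-chooser survives at 51% blue but not at 49% blue,
# while a red-chooser survives either way.
print(outcome("blue", 0.51), outcome("blue", 0.49), outcome("red", 0.49))
```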

Comment by arabaga on Can we evaluate the "tool versus agent" AGI prediction? · 2023-04-26T19:37:23.714Z · LW · GW

Does anyone doubt that, with at most a few very incremental technological steps from today, one could train a multimodal, embodied large language model (“RobotGPT”), to which you could say, “please fill up the cauldron”, and it would just do it, using a reasonable amount of common sense in the process — not flooding the room, not killing anyone or going to any other extreme lengths, and stopping if asked?

Indeed, isn't PaLM-SayCan an early example of this?

Comment by arabaga on How should DeepMind's Chinchilla revise our AI forecasts? · 2022-09-16T10:48:54.721Z · LW · GW

To be precise, Alphabet owns DeepMind. Google and DeepMind are sister companies.

So it's possible for something to benefit Google without benefiting DeepMind, or vice versa.

Comment by arabaga on [Review] The Problem of Political Authority by Michael Huemer · 2022-08-25T19:28:34.011Z · LW · GW

"A scenario where a group of human thugs [rips and devours your entire family] is still okay-ish in some sense, because no state was involved; at least you have avoided the horrors of non-consensual taxation!"

Sorry, this doesn't pass the ITT.

Comment by arabaga on [Review] The Problem of Political Authority by Michael Huemer · 2022-08-25T19:26:30.711Z · LW · GW

Yes, anarcho-capitalists accept that ~everyone will hire a security agency. This isn't a refutation of anarchism.

The point is that security agencies have an incentive to compete on quality, whereas current governments don't (as much), so the quality of security agencies would be higher than the quality of governments today.