AlphaAndOmega's Shortform

post by AlphaAndOmega · 2024-12-27T20:19:40.990Z · LW · GW · 4 comments



comment by AlphaAndOmega · 2024-12-27T20:19:41.148Z · LW(p) · GW(p)

I happen to be a doctor with an interest in LW and associated concerns, who discovered a love for ML far too late for me to reskill and embrace it.

My younger cousin is a mathematician currently doing an integrated Masters and PhD. About a year back, I'd been trying to demonstrate to him the ever-increasing capability of SOTA LLMs at maths, and asked him to pose questions that they couldn't trivially answer.

He chose "is the one-point compactification of a Hausdorff space itself Hausdorff?".

At the time, all the models invariably insisted that the answer was no. I ran the prompt multiple times on the best models available then. My cousin said that was incorrect, and proceeded to sketch out a proof (which was quite simple once I finally understood that much of the jargon represented rather simple ideas at their core).
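
For reference (my gloss here, not his sketch): the standard characterization, which is what finally made the jargon click for me, is

$$X^{+} = X \cup \{\infty\}\ \text{is Hausdorff} \iff X\ \text{is locally compact and Hausdorff},$$

where the forward direction holds because $X$ is an open subspace of the compact Hausdorff space $X^{+}$, and the reverse holds because a compact neighbourhood $K$ of a point $x \in X$ separates $x$ from $\infty$ via the disjoint open sets $\mathrm{int}(K)$ and $X^{+} \setminus K$.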

I ran into him again when we were both visiting home, and I decided to run the same question through the latest models to gauge their improvements.

I tried Gemini 1206, Gemini Flash Thinking Experimental, Claude 3.5 Sonnet (New) and GPT-4o.

Other than reinforcing the fact that AI companies have abysmal naming schemes, the results surprised me: almost all of them gave the correct answer. The exception was Claude, though it was hampered by Anthropic being cheapskates and turning on concise-responses mode.

I showed him how the extended reasoning worked for Gemini Flash (unlike o1, it doesn't hide its thinking tokens), and I could tell he was shocked and impressed; he couldn't fault the reasoning process it and the other models went through.

To further shake him up, I had him dig up some recent homework problems he'd been assigned in his course (he's in a top-3 maths program in India) and used Gemini's inherent multimodality to simply take a picture of an extended question and ask it to solve it.* It did so, again, flawlessly.

*So I wouldn't have to go through the headache of reproducing it in LaTeX or markdown.

He then demanded we try another. This time he expressed doubts that the model could handle a problem that was compact, yet vague without context that hadn't been presented to it. No surprises again: it solved that too.

He admitted that this was the first time he took my concerns seriously, though he got a rib in by saying doctors would be off the job market before mathematicians. I conjectured that was unlikely, given that maths and CS performance are more immediately beneficial to AI companies, as they're easier to drop in and automate, while also having direct benefits for ML itself, the goal being to replace human programmers and have the models recursively self-improve. Not to mention that performance in those domains is easier to make superhuman using RL with automated theorem provers for ground truth. Oh well, I reassured him, we're probably all screwed, and in short order, to the point where there's not much benefit in quibbling over whose layoffs come a few months later.

Replies from: notfnofn
comment by notfnofn · 2024-12-28T13:25:59.271Z · LW(p) · GW(p)

I similarly felt in the past that by the time computers were Pareto-better than me at math, there would already be mass layoffs. I no longer believe this to be the case at all, and have been thinking about how I should orient myself going forward. I was very fortunate to land an offer for an applied-math research job starting in the next few months, but my plan is to devote a lot more energy to networking and building people skills while I'm there, instead of just hyperfocusing on learning the relevant fields.

o1 (standard, not pro) is still not the best at math reasoning, though. I occasionally give it linear algebra lemmas that I suspect it should be able to help with, but its attempts always contain major errors. Here are some examples:

  • I have a finite-dimensional real vector space $V$ equipped with a symmetric bilinear form $B$ which is not necessarily non-degenerate. Let $n$ be the dimension of $V$, let $V_0$ be the subspace of vectors $v \in V$ with $B(v, w) = 0$ for all $w \in V$, and let $k$ be the dimension of $V_0$. Let $W_1$ and $W_2$ be $(n+k)$-dimensional real vector spaces that contain $V$ and are equipped with symmetric non-degenerate bilinear forms that extend $B$. Show that there exists an isometry from $W_1$ to $W_2$. To its credit, it gave me some references that helped me prove this, but its argument was completely bogus. (One possible argument is sketched after this list.)

  • Let $V$ be a real finite-dimensional vector space equipped with a symmetric non-degenerate bilinear form $B$, and let $g$ be an isometry of $V$. Prove or disprove that the restriction of $B$ to the fixed-point subspace of $g$ on $V$ is non-degenerate. (Here it sort of had the right idea, but its counterexamples were never right; a counterexample sketch follows this list.)

  • Does there exist a symmetric irreducible square matrix with diagonal entries $2$ and non-positive integer off-diagonal entries such that the corank is more than $1$? Here it gave a completely wrong proof of "no" and kept gaslighting me into believing that the general idea must work, and that it's a standard result in the field following from a book that I happened to have actually read. It kept insisting this, no matter how many times I corrected its errors, until I presented it with an example of a corank-$1$ matrix that made it clear its idea was unfixable. (A corank-$1$ sanity check in code follows this list.)
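
For the first lemma, here is one way I'd sketch the argument (my summary, not necessarily the one from the references; symbols as in the statement above). Write $V = V_0 \oplus U$ with $B|_U$ non-degenerate, so $\dim U = n - k$. Inside each $W_i$, the orthogonal complement $U^{\perp}$ is non-degenerate of dimension $(n+k) - (n-k) = 2k$ and contains the $k$-dimensional totally isotropic subspace $V_0$; over $\mathbb{R}$, a non-degenerate symmetric space of dimension $2k$ containing a $k$-dimensional isotropic subspace must have signature $(k, k)$. So each $W_i$ is isometric to $U$ orthogonally summed with a space of signature $(k, k)$; the two therefore have the same signature, and Sylvester's law of inertia provides the isometry $W_1 \to W_2$.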
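
For the second, a counterexample sketch of the shape o1 kept fumbling (mine, and worth double-checking, though the computation is short). Take $V = \mathbb{R}^3$ with basis $e_1, e_2, e_3$, where $B(e_1, e_3) = B(e_3, e_1) = 1$, $B(e_2, e_2) = 1$, and all other pairings are $0$ (non-degenerate, signature $(2,1)$). The Eichler transvection

$$g(x) = x + B(x, e_2)\,e_1 - B(x, e_1)\,e_2 - \tfrac{1}{2}B(x, e_1)\,e_1$$

is an isometry, and $g(x) = x$ forces $B(x, e_1) = B(x, e_2) = 0$, so the fixed-point subspace is exactly $\mathrm{span}(e_1)$: a null line, on which the restriction of $B$ is identically zero, hence degenerate.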
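
And for the third, a quick numerical sanity check of a hypothetical corank-$1$ instance of the kind I mean (the affine Cartan matrix of type $\tilde{A}_2$; not necessarily the matrix I actually showed o1):

```python
import numpy as np

# Symmetric, irreducible, diagonal entries 2, non-positive integer
# off-diagonal entries: the affine Cartan matrix of type A~_2.
A = np.array([
    [ 2, -1, -1],
    [-1,  2, -1],
    [-1, -1,  2],
])

n = A.shape[0]
corank = n - np.linalg.matrix_rank(A)
print("corank:", corank)                 # prints: corank: 1
print("A @ (1,1,1):", A @ np.ones(n))    # prints: [0. 0. 0.] -- (1,1,1) spans the kernel
```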

I have a strong suspicion that o3 will be much better than o1 though.

Replies from: AlphaAndOmega
comment by AlphaAndOmega · 2024-12-28T14:27:39.884Z · LW(p) · GW(p)

Thank you for your insight. Out of idle curiosity, I tried putting your last query into Gemini 2 Flash Thinking Experimental, and it told me yes on the first shot.

Here's the final output; it's absolutely beyond my ability to evaluate, so I'm curious whether you think it went about it correctly. I can also share the full CoT if you'd like, but it's lengthy:

https://ibb.co/album/rx5Dy1

(Image since even copying the markdown renders it ugly here)

Replies from: notfnofn
comment by notfnofn · 2024-12-28T15:07:05.809Z · LW(p) · GW(p)

The corank has to be more than 1, not equal to 1. I'm not sure whether such a matrix exists; the reason I was able to change its mind by supplying a corank-1 matrix was that its kernel behaved in a way that significantly violated its intuition.