Jan Betley's Shortform
post by Jan Betley (jan-betley) · 2025-03-31T14:02:27.378Z · LW · GW · 8 comments
comment by Jan Betley (jan-betley) · 2025-03-31T14:02:27.377Z · LW(p) · GW(p)
There are many conflicting opinions about how useful AI is for coding. Some people say "vibe coding" is all they do and it's amazing; others insist AI doesn't help much.
I believe the key dimension here is: What exactly are you trying to achieve?
(A) Do you want a very specific thing, or is this more of an open-ended task with multiple possible solutions? (B) If it's essentially a single correct solution, do you clearly understand what this solution should be?
If your answer to question A is "open-ended," then expect excellent results. The most impressive examples I've seen typically fall into this category—tasks like "implement a simple game where I can do X and Y." Open-ended tasks also tend to be relatively straightforward. You probably wouldn't give an open-ended task like "build a better AWS."
If your answer to question A is "a specific thing," and your answer to B is "yes, I'm very clear on what I want," then just explain it thoroughly, and you're likely to get satisfying results. Impressive examples like "rewrite this large complex thing that particular way" fall into this category.
However, if you know you want something quite specific, but you haven't yet figured out exactly what that thing should be—and you have a lot of coding experience—you'll probably have a tough time. This is because "experienced coder doesn't know exactly what they want" combined with "they know they want something specific" usually means they're looking for something unusual. And if they're looking for something unusual but struggle to clearly communicate it (which they inevitably will, because they're uncertain), they'll constantly find themselves fighting the model. The model will keep filling in gaps with mundane or standard solutions, which is exactly what they don't want.
(There's of course also (C): how obscure is the technology? But that's obvious.)
↑ comment by Yair Halberstadt (yair-halberstadt) · 2025-03-31T17:26:24.670Z · LW(p) · GW(p)
My experience is that the biggest factors are how large the codebase is, and whether I can zoom into a specific spot where the change needs to be made and implement it divorced from all the other context.
Since the answers to those two questions at my day job are "large" and "only sometimes," the maximum benefit of an LLM to me is highly limited. I basically use it as a better search engine for things I can't remember offhand how to do.
Also, I care about the quality of the code I commit (this code is going to be continuously worked on), and I write better code than the LLM, so I tend to rewrite it all anyway, which again allows the LLM to save me some time, but severely limits the potential upside.
When I'm writing one off bash scripts, yeah it's vibe coding all the way.
↑ comment by Jan Betley (jan-betley) · 2025-03-31T21:14:45.103Z · LW(p) · GW(p)
Yeah, that makes sense. I think with a big enough codebase some specific tooling might be necessary; a generic "dump everything in the context" approach won't help.
↑ comment by β-redex (GregK) · 2025-03-31T18:34:26.523Z · LW(p) · GW(p)
If your answer to question A is "a specific thing," and your answer to B is "yes, I'm very clear on what I want," then just explain it thoroughly, and you're likely to get satisfying results. Impressive examples like "rewrite this large complex thing that particular way" fall into this category.
Disagree. It sounds like by "being specific" you mean that you explain to the AI how you want the task to be done, which in my opinion can only be mildly useful.
When I am specific to an AI about what I want, I usually still get buggy results unless the solution is easy. (And asking the AI to debug is only sometimes successful, so if I want to fix it I have to put in a lot of work to understand the code the AI wrote carefully to debug it.)
↑ comment by β-redex (GregK) · 2025-03-31T19:05:23.550Z · LW(p) · GW(p)
Just to give an example, here is the kind of prompt I am thinking of. I am being very specific about what I want, I think there is very little room for misunderstanding about how I expect the program to behave:
Write a Python program that reads a `.olean` file (Lean v4.13.0), and outputs the names of the constants defined in the file. The program has to be standalone and only use modules from the Python standard library; you cannot assume Lean to be available in the environment.
o3-mini gives pure garbage hallucination for me on this one, like it's not even close.
↑ comment by Yair Halberstadt (yair-halberstadt) · 2025-03-31T19:31:37.617Z · LW(p) · GW(p)
That seems like an example of (C): obscure technology.
↑ comment by β-redex (GregK) · 2025-03-31T21:50:01.449Z · LW(p) · GW(p)
What does "obscure" mean here? (If you label the above "obscure", I feel like every query I consider "non-trivial" could be labeled obscure.)
I don't think Lean is obscure; it's one of the most popular proof assistants nowadays. The whole Lean codebase should be in the AI's training corpus (in fact, that's why I deliberately made sure to specify an older version, since I happen to know that the `.olean` header changed recently). If you have access to the codebase, and you understand the object representation, the solution is not too hard.
Here is the solution I wrote just now:[1]
import sys, struct
assert sys.maxsize > 2**32
f = sys.stdin.buffer.read()
def u(s, o, l): return struct.unpack(s, f[o:o+l])
b = u("Q", 48, 8)[0]
def c(p, i): return u("Q", p-b+8*(i+1), 8)[0]
def n(p):
    if p == 1: return []
    assert u("iHBB", p-b, 8)[3] == 1 # 2 is num, not implemented
    s = c(p, 1) - b
    return n(c(p, 0)) + [f[s+32:s+32+u("Q", s + 8, 8)[0]-1].decode('utf-8')]
a = c(u("Q", 56, 8)[0], 1) - b
for i in range(u("Q", a+8, 8)[0]):
    print('.'.join(n(u("Q", a+24+8*i, 8)[0])))
(It's minified to both fit in the comment better and to make it less useful as future AI training data, hopefully causing this question to stay useful for testing AI skills.)
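As a side note on the mechanics: the script is essentially repeated `struct.unpack` calls pulling 64-bit fields out of a raw byte buffer at computed offsets. A minimal, self-contained illustration of that pattern, using hypothetical bytes rather than a real `.olean` header:

```python
import struct

# Hypothetical 16-byte buffer holding two unsigned 64-bit integers,
# standing in for fields of a binary file header.
buf = struct.pack("QQ", 48, 1024)

def u(fmt, offset, length):
    # Same helper shape as in the minified script: unpack `fmt`
    # from a slice of the buffer starting at `offset`.
    # Note: "Q" with no prefix uses native byte order, which matches
    # the file format only on little-endian machines.
    return struct.unpack(fmt, buf[offset:offset + length])

first = u("Q", 0, 8)[0]   # 48
second = u("Q", 8, 8)[0]  # 1024
print(first, second)
```

The real script builds on this by following pointer fields (rebased against a base offset from the header) to walk the serialized object graph.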
I wrote this after I wrote the previous comment; my expectation that this would be a not-too-hard problem was not informed by actually attempting it, only by rough knowledge of how Lean represents objects and the fact that they are serialized pretty much as-is. ↩︎
↑ comment by Yair Halberstadt (yair-halberstadt) · 2025-04-01T03:28:52.090Z · LW(p) · GW(p)
The quantity of Lean code in the world is orders of magnitude smaller than the quantity of Python code. I imagine that most people reporting good results are using very popular languages. My experience using Gemini 2.5 to write Lean was poor.