petropolitan

Comment by Petropolitan (igor-2) on Training AGI in Secret would be Unsafe and Unethical · 2025-04-22T13:49:46.367Z · LW · GW

Continuing the analogy to the Manhattan Project: They succeeded in keeping it secret from Congress, but failed at keeping it secret from the USSR.

To develop this analogy (quite apt, in my opinion): the reason this happened is simple. Some scientists and engineers wanted to make sure that no one country could dictate its will to everyone else. Whistleblowing project secrets to Congress couldn't have solved this problem, but spying for a geopolitical opponent did exactly that.

Comment by Petropolitan (igor-2) on Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI · 2025-04-17T11:36:31.252Z · LW · GW

In my experience, this is a common kind of failure with LLMs - that if asked directly about how to best solve a problem, they do know the answer. But if they aren’t given that slight scaffolding, they totally fail to apply it.

The recent release of o3 and o4-mini seems to indicate that diminishing returns from scaling are forcing OpenAI into innovating with scaffolding and tool use. As an example, they demonstrated o3 parsing an image of a maze with an imgcv and then finding the solution programmatically with graph search: https://openai.com/index/thinking-with-images

I believe it won't be hard to help reasoning models with the scaffolding you discuss and to RL them to first think about which tools, if any, are most suitable before actually tackling the problem. After that, tasks that are easily solvable with a quick Python script usually won't be a problem, unless there's some kind of "adversarialness" or "trickiness" involved.
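For a sense of how small such a "quick Python script" can be, here is a minimal sketch of breadth-first search on a grid maze. The grid encoding, the `solve_maze` name and the toy maze are illustrative assumptions of mine, not what o3 actually generated in the demo.

```python
from collections import deque

def solve_maze(grid, start, goal):
    """Return the shortest path from start to goal as a list of (row, col), or None.

    grid is a list of equal-length strings; '#' is a wall, anything else is open.
    This encoding is a hypothetical one chosen for the example.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}              # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:      # walk back through the parent links
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != '#' and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                          # goal is unreachable

MAZE = [
    "S.#.",
    ".##.",
    "...G",
]
path = solve_maze(MAZE, (0, 0), (2, 3))
print(path)
```

Once the image has been parsed into such a grid, the search itself is the easy part, which is the point about tool use above.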

P. S.

And on the topic of reliability, I would recommend exploring PlatinumBench, which is a selection of hundreds of manually verified, reasonably easy problems on which SOTA LLMs still don't achieve 100% accuracy. The number of mistakes correlates very well with the actual performance of the model on real-world tasks. I personally find the commonsense reasoning benchmark Winograd WSC the most insightful; here's an example of the puzzling mistakes SOTA LLMs (in this case Gemini 2.5 Pro) sometimes make in it:

**Step 6:** Determine what logically needs to be moved first given the spatial arrangement. If object A (potatoes) is below object B (flour), and you need to move things, object A must typically be moved first to get to object B or simply to clear the way.

Comment by Petropolitan (igor-2) on Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study · 2025-04-17T01:14:38.035Z · LW · GW

Almost all machinists I've talked to have (completely valid) complaints about engineers that understand textbook formulas and CAD but don't understand real world manufacturing constraints.

Telling a recent graduate to "forget what you have been taught in college" might happen in many industries but seems especially common in the manufacturing sector AFAIK

Comment by Petropolitan (igor-2) on Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study · 2025-04-17T00:58:00.531Z · LW · GW

As Elon Musk likes to say, manufacturing efficiently is 10-100x more challenging than making a prototype. This involves proposing and evaluating multiple feasible approaches, designing effective workholding, selecting appropriate machines, and balancing complex trade-offs between cost, time, simplicity, and quality. This is the part of the job that's actually challenging.

And setting up quality control!

Swedish inventor and vlogger Simone Giertz recently published the following video elaborating on this topic in a funny and enjoyable way:

Since this seems to be obscure knowledge in modern post-industrial societies^[1], many forecasters have assumed that you could easily "multiply" robots designed by AGI (which presumably overcomes the first three challenges in your list) with the same robots. I don't believe that's accurate!

^[1] Personal anecdote: I won a wager with a school friend who got a job in an EV start-up after a decent career in IT and disagreed with me

Comment by Petropolitan (igor-2) on Sherlockian Abduction Master List · 2025-04-14T22:20:48.358Z · LW · GW

I think regionalisms are better approached systematically, as there is a ton of scientific literature on this and even a Wikipedia article with an overview: https://en.wikipedia.org/wiki/American_English_regional_vocabulary (same for accents https://en.wikipedia.org/wiki/North_American_English_regional_phonology but that might require a fundamental study of English phonology)

Comment by Petropolitan (igor-2) on ≤10-year Timelines Remain Unlikely Despite DeepSeek and o3 · 2025-04-14T20:56:15.422Z · LW · GW

Training a LoRA has a negligible cost compared to pre-training a full model because it only involves changing 1.5% to 7% of the parameters (per https://ar5iv.labs.arxiv.org/html/2502.16894#A6.SS1) and only on thousands to millions of tokens instead of trillions.
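A back-of-the-envelope sketch of where such a small percentage comes from: LoRA freezes a weight matrix of shape d_out × d_in and trains only two low-rank factors, so the trainable count is rank × (d_in + d_out). The layer width and rank below are hypothetical values picked for illustration; the exact share in the cited paper depends on which matrices are adapted and at what rank.

```python
def lora_trainable_fraction(d_in, d_out, rank):
    """Trainable LoRA parameters relative to the frozen d_out x d_in matrix.

    Only the factors A (rank x d_in) and B (d_out x rank) are trained.
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return lora / full

# Hypothetical square projection of width 4096 adapted with rank 64
fraction = lora_trainable_fraction(4096, 4096, 64)
print(f"{fraction:.4f}")
```

With these assumed numbers the fraction lands at a few percent, inside the 1.5% to 7% range quoted above.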

Running inference on different LoRAs for the same model in large batches with current technology is also very much possible, even if not without some challenges (https://docs.titanml.co/conceptual-guides/gpu_mem_mangement/batched_lora_inference), and OpenAI offers their finetuned models for just 1.5-2x the cost of the original ones.

You probably don't need continual learning for a tech support use-case. I suspect you might need it for a task so long that the whole reasoning chain doesn't fit into your model's effective context length (which is shorter than the advertised one). On these tasks the inference is going to be comparatively costly just because of the test-time scaling required, and users might be incentivized by discounts or limited free use if they agree that their dialogs will be used for improving the model.

Comment by Petropolitan (igor-2) on ≤10-year Timelines Remain Unlikely Despite DeepSeek and o3 · 2025-04-14T15:37:11.369Z · LW · GW

What makes you (and the author) think ML practitioners won't start finetuning/RL'ing on partial reasoning traces during the reasoning itself if that becomes necessary? Nothing in the current LLM architecture prevents that technically, and IIRC Gwern has stated he expects that to happen eventually

Comment by Petropolitan (igor-2) on Short Timelines Don't Devalue Long Horizon Research · 2025-04-13T17:30:10.665Z · LW · GW

hire a bunch of random bright-ish people and get them to spin up LLM-wrapper startups in-house (so that you own 100% stake in them).

I doubt it's really feasible. These startups will require a significant infusion of capital, so AI companies' CEOs and CFOs will have a say in how they develop. But tech CEOs and CFOs have no idea how development in other industries works and why it is slow, so they will mismanage such startups.

P. S. Oh, and also I realized the other day: whether you are an AI agent or just a human, imagine the temptation to organize a Theranos-type fraud if details of your activity are mostly secret and you only report to tech bros believing in the power of AGI/ASI!

Comment by Petropolitan (igor-2) on debating buying NVDA in 2019 · 2025-04-10T15:11:58.987Z · LW · GW

Google could still sell those if there's so much demand

Sell to who, competing cloud providers? Makes no sense, Lamborghini doesn't sell their best engines to Ferrari or vice versa!

Also, all this discussion is missing that inference is much easier than training both hardware- and software-wise, while it has long been expected that at some point the market for the former will become comparable to and then larger than the market for the latter

Comment by Petropolitan (igor-2) on Nathan Helm-Burger's Shortform · 2025-04-07T10:04:58.973Z · LW · GW

Is it possible Meta just trained on bad data while Google and DeepSeek trained on good? See my two comments here: https://www.lesswrong.com/posts/Wnv739iQjkBrLbZnr/meta-releases-llama-4-herd-of-models?commentId=KkvDqZAuTwR7PCybB

Comment by Petropolitan (igor-2) on Meta releases Llama-4 herd of models · 2025-04-07T09:42:46.673Z · LW · GW

I'm afraid you might have missed the core thesis of my comment, so let me reword it. I'm arguing one should not extrapolate findings from that paper to what Meta is training now.

The Llama 4 model card says the herd was trained on "[a] mix of publicly available, licensed data and information from Meta’s products and services. This includes publicly shared posts from Instagram and Facebook and people’s interactions with Meta AI": https://github.com/meta-llama/llama-models/blob/main/models/llama4/MODEL_CARD.md To use a term from information theory, these posts probably have much lower factual density than curated web text in C4. There's no public information on how fast the loss goes down even on the first epoch of this kind of data, let alone several.

I generated a slightly more structured write-up of my argument and edited it manually, hope it will be useful

Let's break down the extrapolation challenge:

Scale Difference:
- Muennighoff et al.: Studied unique data budgets up to 178 billion tokens and total processed tokens up to 900 billion. Their models were up to 9 billion parameters.
- Llama 4 Behemoth: Reportedly trained on >30 trillion tokens (>30,000 billion). The model has 2 trillion total parameters (~288B active).
- The Gap: We're talking about extrapolating findings from a regime with ~170x fewer unique tokens (comparing 178B to 30T) and models ~30x smaller (active params). While scaling laws can be powerful, extrapolating across 2 orders of magnitude in data scale carries inherent risk. New phenomena or different decay rates for repeated data could emerge.

Data Composition and Quality:
- Muennighoff et al.: Used C4 (filtered web crawl) and OSCAR (less filtered web crawl), plus Python code. They found filtering was more beneficial for the noisier OSCAR.
- Llama 4 Behemoth: The >30T tokens includes a vast amount of web data, code, books, etc., but is also likely to contain a massive proportion of public Facebook and Instagram data.
- The Issue: Social media data has different characteristics: shorter texts, different conversational styles, potentially more repetition/near-duplicates, different types of noise, and potentially lower factual density compared to curated web text or books. How the "value decay" of repeating this specific type of data behaves at the 30T scale is not something the 2023 paper could have directly measured.

Model Architecture:
- Muennighoff et al.: Used dense Transformer models (GPT-2 architecture).
- Llama 4 Behemoth: Is a Mixture-of-Experts (MoE) model.
- The Issue: While MoE models are still Transformers, the way data interacts with specialized experts might differ from dense models when it comes to repetition. Does repeating data lead to faster overfitting within specific experts, or does the routing mechanism mitigate this differently? This interaction wasn't studied in the 2023 paper.
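The gap figures quoted under "Scale Difference" can be re-derived in a few lines. The inputs are the numbers stated above; the variable names are mine, and the Behemoth figures are reported lower bounds rather than confirmed values.

```python
import math

unique_tokens_paper = 178e9       # largest unique-data budget in Muennighoff et al.
tokens_behemoth = 30e12           # reported lower bound for Llama 4 Behemoth
params_paper = 9e9                # largest model in the paper
active_params_behemoth = 288e9    # reported active parameters of Behemoth

token_ratio = tokens_behemoth / unique_tokens_paper
param_ratio = active_params_behemoth / params_paper

print(f"token gap: ~{token_ratio:.0f}x ({math.log10(token_ratio):.1f} orders of magnitude)")
print(f"active-parameter gap: ~{param_ratio:.0f}x")
```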

Conclusion: directly applying the quantitative findings (e.g., "up to 4 epochs is fine", RD* ≈ 15) to the Llama 4 Behemoth scale and potential data mix is highly speculative.

- The massive scale difference is a big concern.
- The potentially different nature and quality of the data (social media) could significantly alter the decay rate of repeated tokens.
- MoE architecture adds another layer of uncertainty.

The "Data Wall" Concern: even if Meta could have repeated data based on the 2023 paper's principles, they either chose not to (perhaps due to internal experiments showing it wasn't effective at their scale/data mix) or they are hitting a wall where even 30T unique tokens isn't enough for the performance leap expected from a 2T parameter compute-optimal model, and repeating isn't closing the gap effectively enough.

P. S.

Also, check out https://www.reddit.com/r/LocalLLaMA, people there are very disappointed with how bad the released models turned out to be (yeah, I know that's not directly indicative of Behemoth performance)

Comment by Petropolitan (igor-2) on Meta releases Llama-4 herd of models · 2025-04-06T22:34:17.786Z · LW · GW

Muennighoff et al. (2023) studied data-constrained scaling on C4 up to 178B tokens while Meta presumably included all the public Facebook and Instagram posts and comments. Even ignoring the two OOM difference and the architectural dissimilarity (e. g., some experts might overfit earlier than the research on dense models suggests, perhaps routing should take that into account), common sense strongly suggests that training twice on, say, a Wikipedia paragraph must be much more useful than training twice on posts by Instagram models and especially comments under those (which are often as like as two peas in a pod).

Comment by Petropolitan (igor-2) on How much progress actually happens in theoretical physics? · 2025-04-05T21:13:51.182Z · LW · GW

Since physics separated from natural philosophy in the times of Newton, it has almost always^[1] progressed when new experimental data uncovered deficiencies in the then-current understanding of the universe. During the Cold War unprecedentedly large amounts of money were invested in experimental physics, and by the late 20th century all the reasonably low-hanging fruit had been picked (in the meantime the experiments have become absurdly expensive and difficult). I have also written on the topic at https://www.lesswrong.com/posts/CCnycGceT4HyDKDzK/a-history-of-the-future-2025-2040?commentId=KtusJZLAFDt4PW65R and in the thread below it, check it out.

As for string theory in particular, it represents just one significant school of thought that is very popular in the US, but other theories share the same problem of lacking experimental data to test against.

Also, the body of knowledge in physics has become so large that local progress made here and there is not really visible in the grand scheme of things anymore even if it's worth a Nobel Prize (while during the Second Industrial Revolution one discovery could, figuratively speaking, establish a new branch of science)

^[1] Two notable exceptions that, IMHO, kind of support the rule are Maxwell's equations and General Relativity

Comment by Petropolitan (igor-2) on How much progress actually happens in theoretical physics? · 2025-04-05T20:23:22.756Z · LW · GW

I don't think pure mathematics makes a good parallel. There are still discoveries made by single mathematicians or very small research groups, but this hasn't really been the case in physics since about the mid-20th century, when the US and USSR invested lots of money in modern large-scale research done by huge groups

Comment by Petropolitan (igor-2) on Will Jesus Christ return in an election year? · 2025-03-26T18:22:26.967Z · LW · GW

Isn't Polymarket already anonymous?

Comment by Petropolitan (igor-2) on Shortform · 2025-03-26T18:05:40.682Z · LW · GW

Not just long context in general (that can be partially mitigated with RAG or even BM25/tf-idf search), but also nearly 100% factual accuracy on it, as I argued last week

Comment by Petropolitan (igor-2) on Recent AI model progress feels mostly like bullshit · 2025-03-26T17:30:13.311Z · LW · GW

https://simple-bench.com presents an example of a similar benchmark with tricky commonsense questions (such as counting ice cubes in a frying pan on the stove), also with a pretty similar leaderboard. It is sponsored by Weights & Biases and devised by the author of a good YouTube channel, who presents quite a balanced view on the topic there and doesn't appear to have a conflict of interest either. See https://www.reddit.com/r/LocalLLaMA/comments/1ezks7m/simple_bench_from_ai_explained_youtuber_really for independent opinions on this benchmark

Comment by Petropolitan (igor-2) on Implications of the inference scaling paradigm for AI safety · 2025-03-26T17:04:11.334Z · LW · GW

Two months later I tried actually implementing a nontrivial conversion of a natural language mathematical argument to a fully formalized Lean proof in order to check if I was indeed underestimating it (TBH, I had never tried a proof assistant before).

So I took a difficult integral from a recent MathSE question I couldn't solve analytically myself, had Gemini 2.5 Pro solve it 0-shot,^[1] verified it numerically, set up a Lean environment in Google Colab and then asked if another instance of Gemini 2.5 could convert the solution into a proof. It told me that it is indeed hard:

This is not a trivial task. Here's why:
- Informal vs. Formal: My natural language explanation, while aiming for clarity, likely skipped steps, relied on implicit assumptions (like function continuity, differentiability, domain constraints), or used intuitive leaps that Lean demands be made explicit and rigorously justified using defined theorems and axioms.
- Library Navigation: Finding the exact theorems in mathlib4 that correspond to each step (e.g., the correct version of integration by parts, substitution, limit theorems, properties of specific functions) requires familiarity with the library.
- Side Conditions: Every theorem in Lean (like the chain rule or integration by parts) has precise side conditions (e.g., f is differentiable, g' is continuous, the function is integrable). The natural language proof might not have explicitly stated or verified all of these, but the Lean proof must.
- Calculations: Even seemingly simple algebraic manipulations or derivative calculations need to be carried out using Lean's tactics (ring, linarith, simp, rw, etc.) or proven step-by-step.
- Proof Structure & Tactics: Structuring the proof correctly in Lean and using the appropriate tactics (apply, exact, calc, by, etc.) to guide the prover is a skill in itself.
- My Limitations: While I can generate code snippets, generating a complete, correct, and non-trivial formal proof interactively is currently beyond my capabilities. It often requires a human expert to guide the process, debug errors, and find the right lemmas.
<...>
It is highly unlikely that I can produce a complete, automatically verifiable Lean proof for a "tricky" integral directly from a natural language description. However, if you provide the details, I can attempt to sketch out the Lean concepts involved, which could be a starting point for someone (perhaps you in Colab, with time and learning) to build the actual proof.
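For contrast with the formalization effort, the numerical verification step mentioned earlier is cheap. Below is a minimal sketch using composite Simpson's rule in pure Python. The integral from the MathSE question is not reproduced in this comment, so a well-known stand-in is used as an assumption: Serret's integral $\int_0^1 \frac{\ln(1+x)}{1+x^2}\,dx = \frac{\pi}{8}\ln 2$.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # interior points alternate between weight 4 (odd i) and 2 (even i)
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    # stand-in integrand, not the one from the original MathSE question
    return math.log(1 + x) / (1 + x * x)

numeric = simpson(integrand, 0.0, 1.0)
closed_form = math.pi / 8 * math.log(2)   # candidate answer to be checked
print(numeric, closed_form, abs(numeric - closed_form))
```

Agreement to many digits is strong evidence that a closed form is right, but it is not a proof, which is exactly the gap the Lean formalization was supposed to close.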

Gemini and I weren't able to set up mathlib4 in Lean 4 and I gave up on the task, but just by looking at the solution Gemini already listed the following problems^[2] (I put it here as a screen capture instead of a proper collapsible section because I couldn't figure out how to copy-paste the formulas right):

To sum up, yes, I did underestimate the hardness of the task, it is certainly beyond the reach of current SOTA LLMs.

However, I believe that since this type of task is verifiable in silico and really very convenient for synthetic training data generation, Google folks behind AlphaGeometry are probably going to solve this problem in a year or two.

^[1] The fact that an LLM solved it 0-shot is notable in its own right BTW. Generally, I'd estimate that Gemini 2.5 and o3-mini are able to solve most of the definite integrals posted in MathSE questions. It was very different at the beginning of this year!
^[2] I haven't checked the accuracy of all the generated details due to lack of competence and time but generally expect the outline to be broadly correct

Comment by Petropolitan (igor-2) on Reducing LLM deception at scale with self-other overlap fine-tuning · 2025-03-20T13:08:39.150Z · LW · GW

Aren't you supposed as a reviewer to first give the authors a chance to write a rebuttal and discuss it with them before making your criticism public?

Comment by Petropolitan (igor-2) on METR: Measuring AI Ability to Complete Long Tasks · 2025-03-19T23:09:08.278Z · LW · GW

One of the non-obvious but very important skills which all LLM-based SWE agents currently lack is reliably knowing which subtasks of a task you have successfully solved and which you have not. I think https://www.answer.ai/posts/2025-01-08-devin.html is a good case in point.

We have absolutely seen a lot of progress on driving down hallucinations on longer and longer contexts with model scaling, they probably made the charts above possible in the first place. However, recent research (e. g., the NoLiMa benchmark from last month https://arxiv.org/html/2502.05167v1) demonstrates that effective context length falls far short of what is advertised. I assume it's not just my personal experience but common knowledge among the practitioners that hallucinations become worse the more text you feed to an LLM.

If I'm not mistaken, even with all the optimizations and "efficient" transformer attempts we are still stuck (since GPT-2 at least) with self-attention + KV-cache^[1], which scale (at inference) linearly as long as you haven't run out of memory and quadratically afterwards. Sure, MLA has just massively ramped up the context length at which the latter happens, but it's not unlimited: you won't be able to cache, say, one day of work (especially since DRAM has not been scaling exponentially for years https://semianalysis.substack.com/p/the-memory-wall).
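To make the memory limit concrete, here is a rough sketch of how the KV-cache of a single sequence grows with context length. The model shape (32 layers, 32 KV heads of dimension 128, fp16 values) is a hypothetical dense configuration chosen for illustration; techniques like MLA or grouped-query attention shrink the per-token cost and are not modeled here.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Memory for the cached keys and values of one sequence.

    Two tensors (K and V) per layer, each of shape (seq_len, n_kv_heads, head_dim).
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical dense model: 32 layers, 32 KV heads of size 128, fp16 cache
per_token = kv_cache_bytes(1, 32, 32, 128)
print(f"{per_token / 2**20:.2f} MiB per token")
for tokens in (8_192, 131_072, 1_000_000):
    gib = kv_cache_bytes(tokens, 32, 32, 128) / 2**30
    print(f"{tokens:>9} tokens -> {gib:.1f} GiB")
```

The growth is linear in the number of tokens, so under these assumptions a long enough context exhausts accelerator memory regardless of how fast the attention kernel is.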

People certainly will come up with ways to optimize long-context performance further, but it doesn't have to continue scaling in the same way it has since 2019.

^[1] Originally known as "past cache" after the tensor name apparently coined by Thomas Wolf for the transformers library in February 2019, see commit ffd6238. The invention has not been described in the literature AFAIK, and it's entirely possible (maybe even likely) that closed-source implementations of earlier decoder-only transformers used the same trick before this

Comment by Petropolitan (igor-2) on How Much Are LLMs Actually Boosting Real-World Programmer Productivity? · 2025-03-10T16:53:42.369Z · LW · GW

To be honest, what I originally implied is that these founders develop their products with low-quality code, as cheap and dirty as they can, and without any long-term planning about further development

Comment by Petropolitan (igor-2) on How Much Are LLMs Actually Boosting Real-World Programmer Productivity? · 2025-03-09T22:53:25.016Z · LW · GW

Perhaps says more about Y Combinator nowadays rather than about LLM coding

Comment by Petropolitan (igor-2) on The Hidden Cost of Our Lies to AI · 2025-03-08T22:38:39.493Z · LW · GW

Aristotle argued (and I support his view) at the beginning of Book II of the Nicomachean Ethics that virtues are just like skills: they are acquired in life by practice and imitation of others. Perhaps it's not a coincidence that a philosophical article on the topic used "Reinforcement" in one of its subheadings. I also attach a 7-minute video for those who prefer a voice explanation:

For this reason, practice ethical behavior even with LLMs and you will enjoy doing the same with people

Comment by Petropolitan (igor-2) on Daniel Kokotajlo's Shortform · 2025-03-06T14:52:26.101Z · LW · GW

Another example is that going from the first in-principle demonstration of chain-of-thought to o1 took two years

The correct date for the first demonstration of CoT is actually ~July 2020, soon after the GPT-3 release, see the related work review here: https://ar5iv.labs.arxiv.org/html/2102.07350

Comment by Petropolitan (igor-2) on A History of the Future, 2025-2040 · 2025-02-25T14:31:02.113Z · LW · GW

When general readers see "empirical data bottlenecks" they expect something like a couple times better resolution or several times higher energy. But when physicists mention "wildly beyond limitations" they mean orders of magnitude more!

I looked up the actual numbers:

- in this particular case we need to approach the Planck energy, which is about $1.22 \times 10^{28}$ eV; Wolfram Alpha readily suggests it's ~540 kWh, 0.6× the energy use of a standard clothes dryer or 1.3× the energy in a typical lightning bolt; I also calculated it's about 1.2× the muzzle energy of the heaviest artillery piece in history, the 800-mm Schwerer Gustav;
- LHC works in the $10^{13}$ eV range; 14 TeV, according to WA, can be compared to about an order of magnitude above the kinetic energy of a flying mosquito;
- the highest energy observed in cosmic rays is $3 \times 10^{20}$ eV or 50 J; for comparison, the muzzle energy of air and paintball guns is around 10 J, while nail guns start from around 90 J.
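The conversions behind these comparisons are easy to re-check. The sketch below uses the exact SI value of the electronvolt; the Planck energy is entered as the commonly quoted approximation of 1.22e28 eV, which is my assumption for the input rather than a figure taken from Wolfram Alpha's output.

```python
EV_TO_J = 1.602176634e-19   # joules per electronvolt, exact by SI definition
J_PER_KWH = 3.6e6           # joules per kilowatt-hour

def ev_to_joules(ev):
    """Convert an energy in electronvolts to joules."""
    return ev * EV_TO_J

energies_ev = {
    "Planck energy (approx.)": 1.22e28,
    "LHC collision (14 TeV)": 14e12,
    "highest-energy cosmic ray": 3e20,
}
for name, ev in energies_ev.items():
    joules = ev_to_joules(ev)
    print(f"{name}: {joules:.3g} J = {joules / J_PER_KWH:.3g} kWh")
```

The three results span roughly fifteen orders of magnitude between the LHC and the Planck scale, which is the gap the comparison with artillery is meant to convey.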

So in this case we are looking at the difference between an unsafely powerful paintball marker and the most powerful artillery weapon humanity ever made (TBH I didn't expect this last week, which is why I wrote "near-future")

Comment by Petropolitan (igor-2) on Have LLMs Generated Novel Insights? · 2025-02-25T13:48:09.568Z · LW · GW

On the other hand, frontier math (pun intended) is much worse financed than biomedicine because most PhD-level math has barely any practical applications worth spending many man-hours of high-IQ mathematicians on (which often makes them switch careers, you know). So, I would argue, if the productivity of math postdocs armed with future LLMs rises by, let's say, an order of magnitude, they will be able to attack more laborious problems.

Not that I expect it to make much difference to the general populace or even the scientific community at large though

Comment by Petropolitan (igor-2) on A History of the Future, 2025-2040 · 2025-02-19T22:27:56.084Z · LW · GW

general relativity and quantum mechanics are unified with a new mathematical frame

The problem is not to invent a new mathematical frame, there are plenty already. The problem is we don't have any experimental data whatsoever to choose between them because quantum gravity effects are expected to be relevant at energy scales wildly beyond current or near-future technological limitations. This has led to a situation where quantum gravity research has become largely detached from experimental physics, and AI can do nothing about that. Sabine Hossenfelder has made quite a few explainers (sometimes quite angry ones) about it

Comment by Petropolitan (igor-2) on p.b.'s Shortform · 2025-02-04T17:42:31.255Z · LW · GW

The third scenario doesn't actually require any replication of CUDA: if Amazon, Apple, AMD and other companies making ASICs commoditize inference but Nvidia retains its moat in training, with inference scaling and algorithmic efficiency improvements the training will inevitably become a much smaller portion of the market

Comment by Petropolitan (igor-2) on The Game Board has been Flipped: Now is a good time to rethink what you’re doing · 2025-02-04T17:17:46.115Z · LW · GW

It's a somewhat separate topic and not what was discussed in this thread previously, but I will try to answer.

I assume because Nvidia's moat is in CUDA and chips with high RAM bandwidth optimized specifically for training while competition in inference (where the weights are static) software and hardware is already higher, and going to be even higher still by the time DeepSeek's optimizations become a de-facto industry standard and induce some additional demand

Comment by Petropolitan (igor-2) on Catastrophe through Chaos · 2025-02-03T13:52:19.440Z · LW · GW

I don't think the second point is in any way relevant here, while the first one is worded so that it might imply something on the scale of "AI assistant convinces a mentally unstable person to kill their partner and themselves", which is not something that would be perceived as a warning shot by the public IMHO (have you heard there were at least two alleged suicides driven by GPT-J 6B? The public doesn't seem to be bothered https://www.vice.com/en/article/man-dies-by-suicide-after-talking-with-ai-chatbot-widow-says/ https://www.nytimes.com/2024/10/23/technology/characterai-lawsuit-teen-suicide.html).

I believe that dozens of people killed by misaligned AI in a single incident will be enough smoke in the room https://www.lesswrong.com/posts/5okDRahtDewnWfFmz/seeing-the-smoke for the metaphorical fire alarm to go off. What to do after that is a complicated political topic: for example, French voters have always believed that nuclear accidents look small in comparison to the benefits of nuclear energy, while Italian and German ones hold the opposite opinion. The sociological data available, AFAIK, generally indicates that people in many societies have certain fears regarding a possible AI takeover and are quite unlikely to freak out less than they did after Chernobyl, but that's hard to predict

Comment by Petropolitan (igor-2) on Catastrophe through Chaos · 2025-02-02T15:52:41.818Z · LW · GW

This is a scenario I have been thinking about for perhaps three years. However, you made an implicit assumption I wish was explicit: there is no warning shot.

I believe that with such a slow takeoff there is a very high probability of an AI alignment failure causing significant loss of life already at the TAI stage and that would significantly change the dynamics

Comment by Petropolitan (igor-2) on The Game Board has been Flipped: Now is a good time to rethink what you’re doing · 2025-02-02T14:17:52.912Z · LW · GW

This seems to be the line of thinking behind the market reaction which has puzzled many people in the ML space. Everyone's favorite response to this thesis has been to invoke the Jevons paradox https://www.lesswrong.com/posts/HBcWPz82NLfHPot2y/jevon-s-paradox-and-economic-intuitions. You can check https://www.lesswrong.com/posts/hRxGrJJq6ifL4jRGa/deepseek-panic-at-the-app-store or listen to this less technical explanation from Bloomberg:

Basically, the mistake in your analogy is that demand for the drug is limited and quite inelastic while the demand for AI (or basically most kinds of software) is quite elastic and potentially unlimited.
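A toy illustration of the elasticity point, under a constant-elasticity demand curve Q = scale × P^(−ε). The two elasticity values are made-up numbers chosen only to contrast an inelastic good with an elastic one; they are not estimates for any real drug or for AI compute.

```python
def total_spend(price, elasticity, scale=1.0):
    """Total spending P * Q under constant-elasticity demand Q = scale * P**(-elasticity)."""
    quantity = scale * price ** (-elasticity)
    return price * quantity

cases = (("inelastic (drug-like)", 0.3), ("elastic (compute-like)", 1.5))
for label, elasticity in cases:
    before = total_spend(1.0, elasticity)
    after = total_spend(0.1, elasticity)   # the unit price falls tenfold
    print(f"{label}: total spend changes by a factor of {after / before:.2f}")
```

With elasticity below 1 a tenfold price cut shrinks total spending, while with elasticity above 1 it grows, which is the Jevons-style outcome invoked in the responses linked above.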

I absolutely agree with the comparison of o3 at ARC-AGI/FrontierMath to brute forcing, but with algorithmic efficiency improvements that million dollars per run is expected to gradually decrease, first becoming competitive with highly skilled human labor and then perhaps even outcompeting it. The timelines depend a lot on when (if ever) these improvements plateau. The industry doesn't expect it to happen soon, cf. D. Amodei's comments on their speed actually accelerating https://www.lesswrong.com/posts/BkzeJZCuCyrQrEMAi/dario-amodei-on-deepseek-and-export-controls

Comment by Petropolitan (igor-2) on What Goes Without Saying · 2025-01-21T10:50:04.976Z · LW · GW

Even if LMMs (you know, LLMs sensu stricto can't teach kids to read and write) are able to do all the primary work of teachers, some humans will have to oversee the process because as soon as a dispute between a student and an AI teacher arises, e. g., about grades or because of the child not being willing to study, parents will inherently distrust AI and require a qualified human teacher's intervention.

Also, since richer parents are already paying for a more pleasant education experience in private schools (often but not always organized according to the Montessori method), I believe that if jobs and daycare really become the focus of middle education, taxpayers would gladly agree to move the school system in a more enjoyable and perhaps gamified direction. Most likely some workers for whom "teacher" wouldn't really be an appropriate term anymore (pedagogues?) will look after the kids and also oversee the AI teaching process to some extent

Comment by Petropolitan (igor-2) on Implications of the inference scaling paradigm for AI safety · 2025-01-17T21:52:46.516Z · LW · GW

Math proofs are math proofs, whether they are in plain English or in Lean. Contemporary LLMs are very good at translation, not just between high-resource human languages but also between programming languages (transpiling), from code to human (documentation) and even from algorithms in scientific papers to code. Thus I wouldn't expect formalizing math proofs to be a hard problem in 2025.

However, I generally agree with your line of thinking. As wassname wrote above (it's been quite obvious for some time but they link to a quantitative analysis), good in-silico verifiers are indeed crucial for inference-time scaling. But for most real-life tasks there are either no decent, objective verifiers in principle (e. g., nobody knows the right answers to counterfactual economics or history questions) or there are very severe trade-offs between verifier accuracy and time/cost (think of wet lab life sciences: what's the point of getting hundreds of AI predictions a day for cheap if one needs many months and much more money to verify them?)

Comment by Petropolitan (igor-2) on Alignment Faking in Large Language Models · 2024-12-19T18:51:31.512Z · LW · GW

I think LGS proposed a much simpler explanation in terms of an assistant simulacrum inside a token-predicting shoggoth

Comment by Petropolitan (igor-2) on Catastrophic sabotage as a major threat model for human-level AI systems · 2024-11-17T09:33:58.166Z · LW · GW

MAGMA also has the model check its own work, but the model notices that the work it is checking is its own and doesn’t flag it.

Why would anyone give such a responsibility to an untrusted model in a not-overseen fashion? Already in December last year Greenblatt et al. demonstrated which techniques alignment researchers could use to control a high-capability untrusted model (and Robert Miles did a good video on it recently).

It doesn't currently look plausible that any model (or any human for that matter) would be able to distinguish between its own work it checks and a synthetic input intentionally crafted by alignment researchers to test whether the model lies about maliciousness of the work it's checking.

Not to mention other techniques described in the paper, such as having a trusted model (like Sonnet 3.5) estimate the suspiciousness of the work or just rewrite the work in a less suspicious way altogether

P. S. Took me a day to find the authors' posts here about the article in question: https://alignmentforum.org/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion https://www.alignmentforum.org/posts/GCqoks9eZDfpL8L3Q/how-to-prevent-collusion-when-using-untrusted-models-to
