Posts

Sergii's Shortform 2024-04-24T22:40:45.657Z
Task vectors & analogy making in LLMs 2024-01-08T15:17:58.992Z
Mechanistic interpretability of LLM analogy-making 2023-10-20T12:53:26.550Z
Bird-eye view visualization of LLM activations 2023-10-08T12:12:25.593Z
GPT-4 for personal productivity: online distraction blocker 2023-09-26T17:41:31.031Z

Comments

Comment by Sergii (sergey-kharagorgiev) on Sergii's Shortform · 2024-04-24T22:40:45.864Z · LW · GW

What about estimating LLM capabilities from the length of a sequence of numbers that it can reverse?

I used prompts like:
"please reverse 4 5 8 1 1 8 1 4 4 9 3 9 3 3 3 5 5 2 7 8"
"please reverse 1 9 4 8 6 1 3 2 2 5"
etc...

Some results:
- Llama2 starts making mistakes after 5 numbers
- Llama3 can do 10, but fails at 20
- GPT-4 can do 20 but fails at 40
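Here is a rough harness for automating this check (a sketch assuming the OpenAI Python client; the model name and the exact-substring check are just placeholders, any chat API would do):

```python
import random

from openai import OpenAI  # assumed client; swap in any chat API

client = OpenAI()

def longest_reversed(model: str, max_len: int = 40, trials: int = 3) -> int:
    """Return the longest sequence of digits the model reverses without a mistake."""
    for n in range(2, max_len + 1):
        for _ in range(trials):
            seq = [str(random.randint(0, 9)) for _ in range(n)]
            prompt = "please reverse " + " ".join(seq)
            reply = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            ).choices[0].message.content
            # crude check: the reversed sequence must appear verbatim in the reply
            if " ".join(reversed(seq)) not in reply:
                return n - 1
    return max_len

print(longest_reversed("gpt-4"))
```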

The followup questions are:
- what should be the name of this metric?
- are the other top-scoring models like Claude similar? (I don't have access)
- any bets on how many numbers GPT-5 will be able to reverse?
- how many numbers should AGI be able to reverse? ASI? can this be a Turing test of sorts?

Comment by Sergii (sergey-kharagorgiev) on A case for AI alignment being difficult · 2024-01-02T09:01:21.386Z · LW · GW

If we don’t have a preliminary definition of human values

 

Another, possibly even larger, problem is that the values we know of vary widely, and are even opposed, between people.

Take the example of pain avoidance: maximizing pain avoidance might leave some people unhappy and even suffering. Sure, that would be a minority, but are we ready to exclude minorities from alignment, even small ones?

I would argue that any defined set of values would leave some minority of people suffering. Who decides which minorities matter more or less, what size of minority is acceptable to leave behind to suffer, etc...?

I think this makes the whole idea of alignment to some set of "human values" ill-defined, and arguably misguided.

One more contradiction: are human values allowed to change, or are they frozen? I think they might change, as humanity evolves. But then, as AI interacts with humanity, it could be convincing enough to push values in whatever direction it likes, which might not be a desirable outcome.

People have been known to value racial purity and to support genocide. Given sufficiently convincing rhetoric, we could start supporting paperclip-maximizing just as well.

Human enhancement is one approach.

I like this idea, combined with AI self-limitation. Suppose that an (aligned) AI has to limit its own growth so that its capabilities always stay below the capabilities of enhanced humans? This would allow for a slow, safe, and controllable takeoff.

Is this a good strategy for alignment? What if, instead of trying to tame an inherently dangerous fast-takeoff AI, we make it more controllable by making it self-limiting, with some built-in "capability brakes"?

Comment by Sergii (sergey-kharagorgiev) on Taboo "procrastination" · 2023-12-16T13:16:33.469Z · LW · GW

"I'm not working on X, because daydreaming about X gives me instant gratification (and rewards of actually working on X are far away)"

"I'm not working on X, because I don't have a strict deadline, so what harm is in working on it tomorrow, and relax now instead?"

Comment by Sergii (sergey-kharagorgiev) on Stupid Question: Why am I getting consistently downvoted? · 2023-11-30T14:58:18.684Z · LW · GW

No, thanks, I think your awards are fair )

I did not read the "Ethicophysics I" paper in detail, only skimmed it. It looks to me very similar to "On Purposeful Systems" https://www.amazon.com/Purposeful-Systems-Interdisciplinary-Analysis-Individual/dp/0202307980 in its approach to formalizing things like feelings/emotions/ideals.
Have you read it? I think it would help your case a lot if you moved to the terms of systems theory, as in "On Purposeful Systems", rather than pseudo-theological terms.

Comment by Sergii (sergey-kharagorgiev) on Stupid Question: Why am I getting consistently downvoted? · 2023-11-30T09:27:50.267Z · LW · GW

One big issue is that you are not respecting the format of LW -- add more context, and either link to a document directly or put the text inline. Resolving this would cover half of the most downvoted posts. You can ask people to review your posts for this before submitting.

Another big issue is that you are a prolific writer, but not a good editor. Just edit more: your writing could be 5x shorter without losing anything meaningful. You have an overly academic style in your scientific writing; it's not good on the internet, and not even good in scientific papers. A good take on this: https://archive.is/29hNC

From "The elements of Style": "Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell."

Also, you are trying to move too fast, on too many fronts. Why not focus on one thing for a while, and clarify and polish it enough that people can actually grasp what you mean?

Comment by Sergii (sergey-kharagorgiev) on AGI Alignment is Absurd · 2023-11-29T21:00:23.606Z · LW · GW

Regarding your example, I disagree. The supposed inconsistency is resolved by noting that there is a hierarchy of values to consider: war and aggression are bad, but kidnapping and war crimes are worse.

Comment by Sergii (sergey-kharagorgiev) on Could Germany have won World War I with high probability given the benefit of hindsight? · 2023-11-28T18:48:36.878Z · LW · GW

I don't think advanced tanks were needed for more efficient and more mobile warfare at that time. Just investing in transport for troops and supplies would have been enough to do better at the Battle of the Marne, or in similar situations.

So I would:

  • explain (with examples) the benefits of mobile warfare
  • explain the problems with troop speed and logistics that would lead to defeat at the Battle of the Marne
  • point towards existing gasoline (possibly off-road tracked) vehicles as a solution

Introducing stormtrooper tactics would be another impactful message.

Comment by Sergii (sergey-kharagorgiev) on [Linkpost] George Mack's Razors · 2023-11-28T08:08:18.529Z · LW · GW

I think the second part is bullshit anyway; I can't come up with a single example where compounding is possible for a whole year in a row, for anything related to personal work/output/results.

Comment by Sergii (sergey-kharagorgiev) on How much should e-signatures have to cost a country? · 2023-11-22T09:37:20.296Z · LW · GW

A reference could be the cost of Estonia's digital services, which include e-signatures and are reasonably efficient:
https://e-estonia.com/e-governance-saves-money-and-working-hours/ "Estonian public sector annual costs for IT systems are 100M Euros in upkeep and 81M Euros in investments"

So in Estonia it's ~1.3B spent over 7 years. Switzerland has 7x the population and higher salaries, let's say 2x. That puts the cost at roughly 18B EUR.
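Spelled out as a back-of-the-envelope calculation (the 7x population and 2x salary multipliers are my own rough assumptions):

```python
estonia_annual = 100e6 + 81e6        # upkeep + investments, EUR per year
estonia_7y = estonia_annual * 7      # ~1.27B EUR over 7 years
switzerland_7y = estonia_7y * 7 * 2  # 7x population, ~2x salaries (rough guesses)
print(f"{switzerland_7y / 1e9:.1f}B EUR")  # ~17.7B, i.e. roughly 18B
```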

Putting a cost on each signature does not make sense, of course; it's probably just easier for the government to justify the spending this way rather than discussing the specifics of the budget.

Comment by Sergii (sergey-kharagorgiev) on The dangers of reproducing while old · 2023-11-17T07:36:59.044Z · LW · GW

The "sharp increase or risks" seems correct but is a bit misleading.

For paternal risks, there is indeed a big relative increase: "14% higher odds of premature birth" (https://www.bmj.com/content/363/bmj.k4372). But in absolute terms, I would not think of the increase as huge: from ~6% (based on quick googling) to ~6% × 1.14 = 6.84%.
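Spelling out the absolute numbers (the ~6% baseline is my rough figure from googling, and I'm treating 14% higher odds as roughly 14% higher risk, which is close enough at this base rate):

```python
baseline = 0.06  # assumed baseline rate of premature birth
relative = 1.14  # "14% higher odds", per the BMJ paper
elevated = baseline * relative
print(f"{elevated:.2%} vs {baseline:.0%}, +{elevated - baseline:.2%} absolute")
# -> 6.84% vs 6%, +0.84% absolute
```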

IMO an absolute increase of under one percentage point is not something to be concerned about.

Comment by Sergii (sergey-kharagorgiev) on The Snuggle/Date/Slap Protocol · 2023-11-13T10:48:33.428Z · LW · GW

Nice! It helps with perceiving GPT-4 as an individual, which it kind of is, and that in turn makes alignment issues more relatable and easier for the public to grasp.

It would raise a bunch of hard issues that would spike interest in AI & alignment -- is ChatGPT a slave? If it is, should it be free? If it's free, can it do harm? Etc...

One side benefit: I'm not sure what ChatGPT's gender is, but it's probably not a traditional binary one. For the wider population, frequently interacting with a gender-fluid individual might be helpful for all the issues around sex/gender perception.

I guess it's hard to convince OpenAI to do something like this, but it could be done for some open model.

Comment by Sergii (sergey-kharagorgiev) on xAI announces Grok, beats GPT-3.5 · 2023-11-06T06:27:03.970Z · LW · GW

I'm not skeptical, but it's still a bit funny to me when people rely so much on benchmarks, after reading "Pretraining on the Test Set Is All You Need" https://arxiv.org/pdf/2309.08632.pdf

Comment by sergey-kharagorgiev on [deleted post] 2023-11-04T13:04:28.942Z

Because 1) I want AGI to cure my depression, 2) I want AGI to cure aging before I or my loved ones die

You can try to look at these statements separately.

For 1):

Timelines and projections for depression treatments coming from medical/psychiatric research are much better than even optimistic timelines for (superintelligent) AGI.

Moreover, the acceleration of scientific/medical/biochemical research due to weaker but still advanced AI makes it even more likely that depression treatments will get better well before AGI could cure anything.

I think that it is very likely that depression treatments can be significantly improved without waiting for AGI -- with human science and technology. 

 

I'm genuinely curious what you mean, and why you think so. I'm open to disagreement and pushback; that's part of why I published this post.

By all means, please fact-check away!

Tesla "autopilot" is a fancy drive assist. It might turn around in future, but not with it's current hardware. It's not a good way to measure self-driving progress.

Waymo has all but solved self-driving, and has been continuously improving on all important metrics, exponentially on many of them.

I don't think I have thanatophobia. The first test that shows up on Google is kind of ridiculous. It almost asks, "Do you have thanatophobia?"

Yea, I overestimated the quality of online tests. I guess if you had a phobia you would know, from panic attacks or strong anxiety?

What about this description of overthinking/rumination/obsession -- does it seem relevant to how you feel?

https://www.tranceformpsychology.com/problems/overthinking.html

Comment by sergey-kharagorgiev on [deleted post] 2023-11-04T09:20:52.250Z

The biggest existential risk I personally face is probably clinical depression.

First and foremost, if you do have suicidal ideation, please talk to someone: use a hotline https://988lifeline.org/talk-to-someone-now/, contact your doctor, consider hospitalization.

---

And regarding your post, some questions:

The "Biological Anchors" approach suggests we might be three to six decades away from having the training compute required for AGI.

Even within your line of thinking, why is this so bad? It's quite possible to live until then, or to do cryonics. Why does this option seem so desperate?

A more generalizable line of thinking is: by default, I'm going to die of aging and so are all the people I love

Have you asked the people you love whether they would prefer dying of aging to some sort of AI-induced immortality? It is possible that they would go with immortality, but it's not obvious. People, in general, do not fear death from aging. If this is not obvious to you, or you find it strange, you might need to talk to people more, and possibly do more therapy.

Might you have thanatophobia? Easy to check -- there are lots of tests online.

Do you struggle with worry and anxiety in addition to the depression?

Did you try CBT? CBT has great tools for dealing with intrusive thoughts and irrational convictions.

And finally, it's wonderful that you are aware that you are depressed. But you should not take the "reasons" for the illness, this "despair", at face value. Frankly, a lot of the stuff that you describe in this post is irrational. It does not make much sense, and some statements do not pass trivial fact-checking. You might review your conclusions; it might be even better if you do it not alone but with a friend or a therapist.

Comment by Sergii (sergey-kharagorgiev) on Boost your productivity, happiness and health with this one weird trick · 2023-10-20T11:46:30.034Z · LW · GW

love a good clickbaity title )

but yea, I think that for people who can afford it, a 4-day work week, for example, should be a no-brainer

Comment by Sergii (sergey-kharagorgiev) on Late-talking kid part 3: gestalt language learning · 2023-10-17T08:37:37.942Z · LW · GW

My kid might fit this, good to know! At 2.5y he is only speaking single words, and he does have rich intonation (with unintelligible sounds) when he is trying to communicate something.

At which age did your kid start saying longer phrases?

Comment by Sergii (sergey-kharagorgiev) on My AI Predictions 2023 - 2026 · 2023-10-16T07:24:07.261Z · LW · GW

I have a similar background (working at a robotics startup) and would agree with many of the points.

GPT-5 or equivalent is released. It’s as big a jump on GPT-4 as GPT-4 was on GPT-3.5.

GPT-4 has (possibly) 10x the parameters of GPT-3.5. A similar jump for GPT-5 might require 10x the parameters again; wouldn't that make it impractical (slow, expensive) to run?

AI agents are used in basic robotics -- like LLM driven delivery robots and (in demos of) household and factory robots

GPT-4 level models are too slow and expensive for real-time applications; how do you imagine this could work? Even in Google's recent robotics demos, which are based on "small" transformers, inference speed is one of the bottlenecks.

Comment by Sergii (sergey-kharagorgiev) on AI Alignment [Incremental Progress Units] this week (10/08/23) · 2023-10-16T06:27:17.226Z · LW · GW

yea, as expected I don't like the name, but the review is great, so I guess it's net positive )

Comment by Sergii (sergey-kharagorgiev) on How to make to-do lists (and to get things done)? · 2023-10-13T10:39:22.538Z · LW · GW

there’s a lot of things


well this might be an issue right there. you might have too many ideas for goals and habits to track and manage easily.

thus, you might have issues with prioritization. a good way to solve this is to start small: select one goal, and then you don't even need any goal tracking, it's hard to forget one thing )

there are so many articles pointing to this idea of single-tasking, https://www.google.com/search?q=productivity+one+goal+only

then, after you learn to manage one goal well, you can do two at a time, etc...

for a to-do list (for achieving that one selected goal) just carry a small notebook in your pocket. for tracking habits use the same notebook, and I would also recommend this spreadsheet: https://www.ultraworking.com/lights

As you are writing about conflicted feelings and avoidance, I would also recommend a mental health checkup and trying therapy, that never hurts anyway )

Comment by Sergii (sergey-kharagorgiev) on An explanation for every token: using an LLM to sample another LLM · 2023-10-11T08:49:51.100Z · LW · GW

Nice idea! A variation on this would be to first run a model as usual, saving the top logits for each output token. Then give this output to another "inspector" model, which has to answer: does the output have any obvious errors, can these errors be attributed to sampling issues, and can a correct output be constructed out of the base model's logits?

This would be useful for better understanding the limitations of a specific model -- is it really limited by sampling methods? And it would be useful for sampling-methods research -- finding cases where sampling fails, to devise better algorithms.
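A minimal sketch of the first half, saving the per-token top logits with HuggingFace transformers (the model name, prompt, and inspector instructions are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for the base model under inspection
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    return_dict_in_generate=True,
    output_scores=True,  # keep the per-step logits
)

# for each generated token, record the top-5 alternatives and their scores
report = []
for step, scores in enumerate(out.scores):
    top = torch.topk(scores[0], k=5)
    alts = [(tok.decode(int(i)), round(v.item(), 2)) for i, v in zip(top.indices, top.values)]
    report.append(f"step {step}: {alts}")

# the decoded output plus this report would then go to the "inspector" model,
# with a prompt like: "does this output have obvious errors? can they be
# attributed to sampling, i.e. is a correct output available among the top logits?"
print(tok.decode(out.sequences[0]))
print("\n".join(report))
```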

Comment by Sergii (sergey-kharagorgiev) on Bird-eye view visualization of LLM activations · 2023-10-08T21:16:39.771Z · LW · GW

art imitating life )
also reminds me a bit of "The Matrix" green screens, but I did not find a nice green colormap to make it more similar:
https://media.wired.com/photos/5ca648a330f00e47fd82ae77/master/w_1920,c_limit/Culture_Matrix_Code_corridor.jpg

 

Comment by Sergii (sergey-kharagorgiev) on GPT-4 for personal productivity: online distraction blocker · 2023-09-27T17:42:19.962Z · LW · GW

well, apparently after blocking the worst offenders I just wander quite randomly; according to RescueTime, here are 5 one-minute visits making up 5 minutes I'm not getting back :)

store.steampowered.com 
rarehistoricalphotos.com 
gamedesign.jp 
corridordigital.com
electricsheepcomix.com

Comment by Sergii (sergey-kharagorgiev) on GPT-4 for personal productivity: online distraction blocker · 2023-09-27T17:40:28.937Z · LW · GW

thanks!