Posts

AI & Liability Ideathon 2024-11-26T13:54:01.820Z
Kabir Kumar's Shortform 2024-11-03T17:03:01.824Z

Comments

Comment by Kabir Kumar (kabir-kumar) on Automatically finding feature vectors in the OV circuits of Transformers without using probing · 2024-11-27T01:36:04.904Z · LW · GW

I think the Conclusion could serve well as an abstract

Comment by Kabir Kumar (kabir-kumar) on Automatically finding feature vectors in the OV circuits of Transformers without using probing · 2024-11-27T01:35:37.024Z · LW · GW

An abstract which is easier to understand and a couple sentences at each section that explain their general meaning and significance would make this much more accessible

Comment by Kabir Kumar (kabir-kumar) on AI & Liability Ideathon · 2024-11-27T00:30:50.636Z · LW · GW

I plan to send the winning proposals from this to as many governing bodies/places that are enacting laws as possible - one country is lined up atm. 

Comment by Kabir Kumar (kabir-kumar) on AI & Liability Ideathon · 2024-11-27T00:29:48.870Z · LW · GW

Let me know if you have any questions!

Comment by Kabir Kumar (kabir-kumar) on Yonatan Cale's Shortform · 2024-11-26T14:13:18.255Z · LW · GW

options to vary rules/environment/language as well, to see how the alignment generalizes ood. will try this today

Comment by Kabir Kumar (kabir-kumar) on Yonatan Cale's Shortform · 2024-11-26T14:12:33.736Z · LW · GW

it would basically be DnD like. 

Comment by Kabir Kumar (kabir-kumar) on Yonatan Cale's Shortform · 2024-11-26T14:10:59.408Z · LW · GW

Making a thing like Papers Please, but as a text adventure, popping an ai agent into that. 
Also, could literally just put the ai agent into a text rpg adventure - something like the equivalent of Skyrim, where there are a number of ways to achieve the endgame, level up, etc, both more and less morally. Maybe something like https://www.choiceofgames.com/werewolves-3-evolutions-end/ 
Will bring it up at the alignment eval hackathon

Comment by Kabir Kumar (kabir-kumar) on DeepSeek beats o1-preview on math, ties on coding; will release weights · 2024-11-25T18:42:24.277Z · LW · GW

I see them in o1-preview all the time as well. Also, french occasionally

Comment by Kabir Kumar (kabir-kumar) on DeepSeek beats o1-preview on math, ties on coding; will release weights · 2024-11-25T18:41:28.609Z · LW · GW

If developments like this continue, could open weights models be made into a case for not racing? E.g. if everyone's getting access to the weights, what's the point in spending billions to get there 2 weeks earlier?

Comment by Kabir Kumar (kabir-kumar) on Yonatan Cale's Shortform · 2024-11-25T18:31:46.181Z · LW · GW

this can be done more scalably in a text game, no? 

Comment by Kabir Kumar (kabir-kumar) on The Online Sports Gambling Experiment Has Failed · 2024-11-16T20:18:25.904Z · LW · GW

People Cannot Handle Gambling on Smartphones

this seems a very strange way to say "Smartphone Gambling is Unhealthy"
It's like saying "People's Lungs Cannot Handle Cigarettes"

Comment by Kabir Kumar (kabir-kumar) on The hostile telepaths problem · 2024-11-16T20:05:41.585Z · LW · GW

To be a bit less useless - I think this fundamentally misses the problem of respect and actually being able to communicate with yourself and fully do things, if you've done so - and that you can do these when you have full faith and respect in yourself (meaning all of yourself - may include love as well, not sure how necessary that is for this). Could maybe be done in other ways as well, but I find those less beautiful, personally. 

Comment by Kabir Kumar (kabir-kumar) on The hostile telepaths problem · 2024-11-16T20:01:33.286Z · LW · GW

I think this is really along the wrong path and misunderstanding a lot of things, but so far along the incorrect path of thought and misunderstanding so much, that it's hard to untangle

Comment by Kabir Kumar (kabir-kumar) on The hostile telepaths problem · 2024-11-16T19:57:39.771Z · LW · GW

I thought this was going to be an allegory for interpretability.

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2024-11-16T16:00:01.375Z · LW · GW

give better names to actual formal math things, jesus christ. 

Comment by Kabir Kumar (kabir-kumar) on How "Pause AI" advocacy could be net harmful · 2024-11-13T13:35:39.766Z · LW · GW

I think posts like this are net harmful, by discouraging people from joining those doing good things without providing an alternative and so wasting energy on meaningless ruminating that doesn't culminate in any useful action.

Comment by Kabir Kumar (kabir-kumar) on Lorec's Shortform · 2024-11-08T20:36:39.749Z · LW · GW

oh, sorry, I thought slatestar codex wrote something about it and you were saying that's where it comes from

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2024-11-05T01:27:59.334Z · LW · GW

I pretty much agree. I prefer rigid definitions because they're less ambiguous to test and more robust to deception. And this field has a lot of deception.

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2024-11-04T14:49:15.408Z · LW · GW

Yup, those are hard. Was just thinking of a definition for the alignment problem, since I've not really seen any good ones.

Comment by Kabir Kumar (kabir-kumar) on Shortform · 2024-11-03T17:19:50.054Z · LW · GW

what do you think of replit agent, stack blitz, etc?

Comment by Kabir Kumar (kabir-kumar) on Ricki Heicklen's Shortform · 2024-11-03T17:19:10.935Z · LW · GW

damn, those prices are wild

Comment by Kabir Kumar (kabir-kumar) on Lorec's Shortform · 2024-11-03T17:11:48.303Z · LW · GW

used before, e.g. Feynman: https://calteches.library.caltech.edu/51/2/CargoCult.htm

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2024-11-03T17:03:02.193Z · LW · GW

btw, thoughts on this for 'the alignment problem'?
"A robust, generalizable, scalable,  method to make an AI model which will do set [A] of things as much as it can and not do set [B] of things as much as it can, where you can freely change [A] and [B]"

Comment by Kabir Kumar (kabir-kumar) on Are we dropping the ball on Recommendation AIs? · 2024-11-03T17:02:47.630Z · LW · GW

Unfortunately this is a fundamental problem of Media, imo. 

Comment by Kabir Kumar (kabir-kumar) on Are we dropping the ball on Recommendation AIs? · 2024-11-03T17:01:35.540Z · LW · GW

Yes, this would be very very good. I might hold a hackathon/ideathon for this in January. 

Comment by Kabir Kumar (kabir-kumar) on The Rocket Alignment Problem · 2024-10-28T09:11:41.233Z · LW · GW

I didn't get the premise, no. I got that it was before a lot of physics was known, didn't know they didn't know calculus either. 
Just stating it plainly and clearly at the start would have been good. Even with that premise, I still find it very annoying. I despise the refusal to speak clearly, the way it's constantly dancing around the bush, not saying the actual point, to me this is pretty obviously because the actual point is a nothing burger(because the analogy is bad) and by dancing around it, the text is trying to distract me and convince me of the point before I realize how dumb it is. 

Why the analogy is bad: rocket flights can be tested and simulated much more easily than a superintelligence, with a lot less risk

Analogies are by nature lossy, this one is especially so. 

Comment by Kabir Kumar (kabir-kumar) on The Rocket Alignment Problem · 2024-10-26T12:51:03.225Z · LW · GW

personally, I found how Beth just kept saying 'not really' and not saying the actual physics very very annoying. 

Comment by Kabir Kumar (kabir-kumar) on Are we dropping the ball on Recommendation AIs? · 2024-10-26T12:30:42.526Z · LW · GW

Yup, I think research that studies the effect of recommendation algorithms on the brain, from various social media platforms and compares them to the effects of narcotics, would be extremely useful. 
I think we're really really lacking in decent legislation for recommendation algorithms atm - at the absolute bare minimum, platforms which use very addictive algorithms should have some kind of warning label informing users of the possibility of addiction - similarly to cigarettes - so that parents know clearly what might happen to their children. 
This is going to be even more important as things like character.ai grow. 

Comment by Kabir Kumar (kabir-kumar) on Survey: How Do Elite Chinese Students Feel About the Risks of AI? · 2024-09-05T18:55:03.303Z · LW · GW

rather than this, there should just be a better karma system, imo. 
one way to improve it - have the voting buttons for comments be on the opposite side of the username

Comment by Kabir Kumar (kabir-kumar) on Survey: How Do Elite Chinese Students Feel About the Risks of AI? · 2024-09-05T18:50:24.234Z · LW · GW

This is very useful, thank you. 
Something that might be interesting to add at the end of surveys such as these:
"How much has this survey changed your mind on things?" - sometimes just being asked a question about something can change your mind on it, would be interesting to see if it happens and how much so.

Comment by Kabir Kumar (kabir-kumar) on How I got 4.2M YouTube views without making a single video · 2024-09-04T00:29:55.625Z · LW · GW

Clickbait still works here, just with a different language. 

Comment by Kabir Kumar (kabir-kumar) on Sources of intuitions and data on AGI · 2024-08-31T01:02:47.124Z · LW · GW

Cons: Humans are opaque. Even from our inside view, it is very difficult to understand how they work, and very hard to modify. They are also the most difficult to talk about rigorously. There is also the failure mode of anthropomorphizing badly and attributing arbitrary properties of humans (and especially human goals) to AGI.

I don't think it's really correct to say that humans are opaque from an inside view, especially for people with high empathy. People who understand themselves well and have high empathy can very consistently predict and understand others.

Comment by Kabir Kumar (kabir-kumar) on Notes on Dwarkesh Patel’s Podcast with Demis Hassabis · 2024-03-05T11:25:46.871Z · LW · GW
Comment by Kabir Kumar (kabir-kumar) on If you weren't such an idiot... · 2024-03-04T21:55:27.796Z · LW · GW

Pretty much all of those reasons - what it's missing is that nicotine itself may also be a carcinogen- at least, it has the ability to be one: https://link.springer.com/article/10.1007/s10311-023-01668-1 
Although there aren't enough isolated studies done on nicotine in a long period to be conclusive: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5020336/ 
Some reviews disagree: https://pubmed.ncbi.nlm.nih.gov/26380225/ 
 

Comment by Kabir Kumar (kabir-kumar) on If you weren't such an idiot... · 2024-03-04T12:51:04.947Z · LW · GW

I strongly advise against taking nicotine.

Comment by Kabir Kumar (kabir-kumar) on MIRI announces new "Death With Dignity" strategy · 2023-12-15T21:25:45.260Z · LW · GW

Eliezer is extremely skilled at capturing attention. One of the best I've seen, outside of presidents and some VCs.
However, as far as I've seen, he's terrible at getting people to do what he wants. 
Which means that he has a tendency to attract people to a topic he thinks is important but they never do what he thinks should be done- which seems to lead to a feeling of despondence. 
This is where he really differs from those VCs and presidents- they're usually far more balanced.

For an example of an absolute genius in getting people to do what he wants, see Sam Altman.

Comment by Kabir Kumar (kabir-kumar) on How LDT helps reduce the AI arms race · 2023-12-12T16:25:31.194Z · LW · GW

Unless I'm missing something, this seems to disregard the possibility of deception. Or it handwaves deception away in a line or two.

The type of person to end up as the CEO of a leading AI company is likely (imo) someone very experienced in deception and manipulation- at the very least through experiencing others trying it on them, even if by some ridiculously unlikely chance they haven't used deception to gain power themselves.

A clever, seemingly logically sound argument for them to slow down and trust that their competitor will also slow down because of the argument, will ring all kinds of bells.

I think whistleblower protections, licenses, enforceable charters, mandatory 3rd party safety evals, etc have a much higher chance of working.

Comment by Kabir Kumar (kabir-kumar) on Shallow review of live agendas in alignment & safety · 2023-12-05T19:52:06.604Z · LW · GW

Yes, we host a bi-monthly Critique-a-Thon- the next one is from December 16th to 18th!

Judges include:
- Nate Soares, President of MIRI, 
- Ramana Kumar, researcher at DeepMind
- Dr Peter S Park, MIT postdoc at the Tegmark lab,
- Charbel-Raphael Segerie, head of the AI unit at EffiSciences.

Comment by Kabir Kumar (kabir-kumar) on Apocalypse insurance, and the hardline libertarian take on AI risk · 2023-11-30T12:15:14.909Z · LW · GW

What about regulations against implementations of known faulty architectures?