Posts

Making progress bars for Alignment 2025-01-03T21:25:58.292Z
AI & Liability Ideathon 2024-11-26T13:54:01.820Z
Kabir Kumar's Shortform 2024-11-03T17:03:01.824Z

Comments

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-26T12:33:09.880Z · LW · GW

Update - consulting went well. He said he was happy with it and got a lot of useful stuff. I was upfront about the fact that I just made up the $15 an hour and might change it, and asked him what he'd be happy with; he said it's up to me and didn't seem bothered at all by the price potentially changing. 

I was upfront about the stuff I didn't know and was kinda surprised at how much I was able to contribute, even accounting for the fact that I underestimate my technical knowledge because I barely know how to code. 

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-24T19:50:44.538Z · LW · GW

For AI Safety funders/regranters - e.g. Open Phil, Manifund, etc: 

It seems like a lot of the grants are swayed by 'big names' being on there. I suggest making anonymity compulsory if you want more merit-based funding that explores wider possibilities and invests in more up-and-coming work. 

Treat it like a Science rather than the Bragging Competition it currently is. 

A bias pattern atm seems to be that the same people get funding, or get recommended for funding by the same people, leading to the pool of innovators being very small, or growing much more slowly than if the process were anonymised. 

Also, ask people seeking funding to make specific, unambiguous, easily falsifiable predictions of positive outcomes from their work. And track and follow up on this! 

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-23T17:42:35.216Z · LW · GW

Yeah, a friend told me this was low - I'm just scared of asking for money rn I guess. 

I do see people who seem very incompetent getting paid as consultants, so I guess I can charge more. I'll see how much my time gets eaten by this and how much money I need. I want to buy some GPUs, hopefully this can help.

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-23T10:57:58.575Z · LW · GW

So, apparently, I'm stupid. I could have been making money this whole time, but I was scared to ask for it

i've been giving a bunch of people and businesses advice on how to do their research and stuff. one of them messaged me; i was feeling tired and had so many other things to do, so i said my time was busy.

then thought fuck it, said if they're ok with a $15 an hour consulting fee, we can have a call. baffled, they said yes.

then realized, oh wait, i have multiple years of experience now leading dev teams, ai research teams, organizing research hackathons and getting frontier research done.

wtf

Comment by Kabir Kumar (kabir-kumar) on Charbel-Raphaël's Shortform · 2025-04-23T02:30:38.786Z · LW · GW

I think he would lie, or be deceptive in a way that's not technically lying, but has the same benefits to him, if not more.

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-23T02:30:08.204Z · LW · GW

i earnt more from working at a call center for about 3 months than i have in 2+ years of working in ai safety. 

And i've worked much harder at this than I did at the call center.

Comment by Kabir Kumar (kabir-kumar) on What Makes an AI Startup "Net Positive" for Safety? · 2025-04-21T19:16:43.504Z · LW · GW

I downvoted because it seems obviously wrong and irrelevant to me.

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-20T23:23:09.000Z · LW · GW

I asked because I'm pretty sure that I'm being badly wasted (i.e. I could be making much more substantial contributions to AI safety),

I think this is the case for most people in AI Safety rn.

And by the way, your brainchild AI-Plans is a pretty cool resource. I can see it being useful for e.g. a frontier AI organization which thinks they have an alignment plan, but wants to check the literature to know what other ideas are out there. 

Thanks! Doing a bunch of stuff atm to make it easier to use and to grow the userbase. 

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-17T21:35:54.607Z · LW · GW

If I knew the specific bs, I'd be better at making successful applications and less intensely frustrated. 

Comment by Kabir Kumar (kabir-kumar) on Meditation and Reduced Sleep Need · 2025-04-14T21:27:54.162Z · LW · GW

Could it be that meditation is doing some of the same job as sleep? I'd be curious how the amount of time spent meditating compares to the amount of sleep need reduced. 

It could also reduce restlessness/time spent waiting to fall asleep.

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-14T19:36:56.808Z · LW · GW

all of the above, then averaged :p

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-14T14:27:20.891Z · LW · GW

prob not gonna be relatable for most folk, but i'm so fucking burnt out on how stupid it is to get funding in ai safety. the average 'ai safety funder' does more to accelerate funding for capabilities than for safety, in huge part because what they look for is Credentials and In-Group Status, rather than actual merit. 
And the worst fucking thing is how much they lie to themselves and pretend that the 3 things they funded that weren't completely in-group mean that they actually aren't biased in that way. 

At least some VCs are more honest that they want to be leeches and make money off of you. 

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-04-14T14:16:24.157Z · LW · GW

the average ai safety funder does more to accelerate capabilities than they do safety, in part due to credentialism and looking for in-group status.

Comment by Kabir Kumar (kabir-kumar) on Short Timelines Don't Devalue Long Horizon Research · 2025-04-12T22:42:12.547Z · LW · GW

this runs into the "assumes powerful AI will be low/non-agentic" fallacy

or "assumes AIs that can massively assist in long-horizon alignment research will be low/non-agentic"

Comment by Kabir Kumar (kabir-kumar) on Short Timelines Don't Devalue Long Horizon Research · 2025-04-12T22:33:31.399Z · LW · GW

"Short Timelines means the value of Long Horizon Research is prompting future AIs"

Would be a more accurate title for this, imo

Comment by Kabir Kumar (kabir-kumar) on You can just wear a suit · 2025-04-01T00:57:13.226Z · LW · GW

In sixth form, I wore a suit for 2 years. Was fun! Then I got kinda bored of suits.

Comment by Kabir Kumar (kabir-kumar) on Recent AI model progress feels mostly like bullshit · 2025-04-01T00:49:18.110Z · LW · GW

Why does it seem very unlikely?

Comment by Kabir Kumar (kabir-kumar) on How We Might All Die in A Year · 2025-03-29T13:34:59.982Z · LW · GW

The companies being merged and working together seems unrealistic. 

Comment by Kabir Kumar (kabir-kumar) on Thoughts on “AI is easy to control” by Pope & Belrose · 2025-03-27T12:02:39.199Z · LW · GW

the fact that good humans have been able to keep rogue bad humans more-or-less under control

Isn't stuff like the transatlantic slave trade, the genocide of Native Americans, etc. evidence that this control isn't sufficient?? 

Comment by Kabir Kumar (kabir-kumar) on shouldn't we try to get media attention? · 2025-03-23T19:01:14.174Z · LW · GW

PauseAI, ControlAI, etc. are doing this

Comment by Kabir Kumar (kabir-kumar) on The Case Against AI Control Research · 2025-03-18T12:17:24.241Z · LW · GW

Helps me decide which research to focus on

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-03-11T12:54:27.715Z · LW · GW

Both. Not sure; it's something like LessWrong/EA speak mixed with VC speak. 

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-03-10T16:02:26.338Z · LW · GW

What I liked about applying for VC funding was the specific questions. 

"How is this going to make money?"

"What proof do you have this is going to make money"

and it being clear that the bullshit they wanted was numbers, testimonials from paying customers, unambiguous ways the product was actually better, etc. And then the standard bs about progress, security, avoiding weird wibbly-wobbly talk, 'woke', 'safety', etc. 

With Alignment funders, they really obviously have language they're looking for as well, or language that makes them more or less willing to put effort into understanding the proposal. Actually, they have it more than the VCs do. But they act as if they don't. 

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-03-10T13:19:55.521Z · LW · GW

it's so unnecessarily hard to get funding in alignment.

they say 'Don't Bullshit' but what that actually means is 'Only do our specific kind of bullshit'.

and they don't specify because they want to pretend that they don't have their own bullshit

Comment by Kabir Kumar (kabir-kumar) on Study Guide · 2025-03-10T12:40:15.978Z · LW · GW

I would not call this a "Guide". 

It's more a list of recommendations and some thoughts on them. 

Comment by Kabir Kumar (kabir-kumar) on A Bear Case: My Predictions Regarding AI Progress · 2025-03-10T10:13:46.552Z · LW · GW

What observations would change your mind? 

Comment by Kabir Kumar (kabir-kumar) on The Hidden Cost of Our Lies to AI · 2025-03-08T11:37:04.421Z · LW · GW

You can split your brain and treat LLMs differently, in a different language. Or rather, I can, and I think most people could as well.

Comment by Kabir Kumar (kabir-kumar) on Challenges with Breaking into MIRI-Style Research · 2025-03-06T01:11:45.455Z · LW · GW

Ok, I want to make that at scale. If multiple people have done it and there's value in it, then there is a formula of some kind. 

We can write it down and make it much easier to understand unambiguously (read: less unhelpful confusion about what to do or what the writer meant, and less time wasted figuring that out) than any of the current agent-foundations-type stuff. 

I'm extremely skeptical that needing to hear a dozen stories dancing around some vague idea of a point, and then 10 analogies (exaggerating to get emotions across), is the best we can do. 

Comment by Kabir Kumar (kabir-kumar) on The Compliment Sandwich 🥪 aka: How to criticize a normie without making them upset. · 2025-03-04T15:01:34.586Z · LW · GW

regardless of whether it works, I think it's disrespectful - manipulative at worst and wasting the person's time at best.

Comment by Kabir Kumar (kabir-kumar) on The Compliment Sandwich 🥪 aka: How to criticize a normie without making them upset. · 2025-03-04T15:00:30.666Z · LW · GW

You can just say the actual criticism in a constructive way. Or if you don't know how to, just ask - "hey I have some feedback to give that I think would help, but I don't know how to say it without it potentially sounding bad - can I tell you and you know I don't dislike you and I don't mean to be disrespectful?" and respect it if they say no, they're not interested. 

Comment by Kabir Kumar (kabir-kumar) on The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better · 2025-03-02T20:42:28.867Z · LW · GW

yup.

Comment by Kabir Kumar (kabir-kumar) on The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better · 2025-03-02T20:41:32.249Z · LW · GW

Multiple talented researchers I know got into alignment because of PauseAI. 

Comment by Kabir Kumar (kabir-kumar) on The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better · 2025-03-02T20:40:11.675Z · LW · GW

You can also give them the clipboard and pen, works well

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-02-19T22:03:42.838Z · LW · GW

in general, when it comes to things which are the 'hard part of alignment', is the crux
```
a flawless method of ensuring the AI system is pointed at, and will always continue to be pointed at, good things
```
?
the key part being 'flawless' - and that seeming to need a mathematical proof?

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-02-13T19:36:13.182Z · LW · GW

Trying to put together a better explainer for the hard part of alignment, while not having a good math background https://docs.google.com/document/d/1ePSNT1XR2qOpq8POSADKXtqxguK9hSx_uACR8l0tDGE/edit?usp=sharing
Please give feedback!

Comment by Kabir Kumar (kabir-kumar) on The Risk of Gradual Disempowerment from AI · 2025-02-07T00:58:53.512Z · LW · GW

Make the (!aligned!) AGI solve a list of problems, then end all other AIs, convince (!harmlessly!) all humans to never make another AI, in a way that they will pass down to future humans, then end itself. 

Comment by Kabir Kumar (kabir-kumar) on Steering Gemini with BiDPO · 2025-01-31T02:52:22.725Z · LW · GW

Thank you for sharing negative results!! 

Comment by Kabir Kumar (kabir-kumar) on The Gentle Romance · 2025-01-31T02:34:36.070Z · LW · GW

Sure? I agree this is less bad than 'literally everyone dying and that's it', assuming there are humans around, living, still empowered, etc. in the background. 

I was saying that overall, as a story, I find it horrifying, especially contrasting with how some seem to see it as utopian. 

Comment by Kabir Kumar (kabir-kumar) on The Gentle Romance · 2025-01-31T00:23:28.837Z · LW · GW
  1. Sure, but it seems like everyone died at some point anyway, and some collective copies of them went on? 

  2. I don't think so. I think they seem to be extremely lonely and sad, and the AIs are the only way for them to get any form of empowerment. And each time they try to inch further with empowering themselves with the AIs, it leads to the AI actually getting more powerful and themselves only getting a brief moment of more power, while ultimately degrading in mental capacity. And needing to empower the AI more and more, like an addict needing an ever greater high. Until there is nothing left for them to do but die and let the AI become the ultimate power. 

  3. I don't particularly care if some non-human semi-sentients manage to be kind of moral/good at coordinating, if it came at what seems to be the cost of all human life. 

Even if, offscreen, all of humanity didn't die, these people dying, killing themselves, and never realizing what's actually happening is still insanely horrific and tragic. 

Comment by Kabir Kumar (kabir-kumar) on The Gentle Romance · 2025-01-30T19:53:49.175Z · LW · GW

How is this optimistic? 

Comment by Kabir Kumar (kabir-kumar) on The Gentle Romance · 2025-01-30T19:53:16.263Z · LW · GW

Oh yes. It's extremely dystopian. And extremely lonely, too. Rather than having a person - actual people - around him to help, his only help comes from tech. It's horrifyingly lonely and isolated. There is no community, only tech. 

Also, when they died together, it was horrible. They literally offloaded more and more of themselves into their tech until they were powerless to do anything but die. I don't buy the whole 'the thoughts were basically them' thing at all. It was, at best, some copy of them. 

An argument can be made for it qualitatively being them, but quantitatively, obviously not. 

Comment by Kabir Kumar (kabir-kumar) on The Gentle Romance · 2025-01-30T19:48:41.163Z · LW · GW

A few months later, he and Elena decide to make the jump to full virtuality. He lies next to Elena in the hospital, holding her hand, as their physical bodies drift into a final sleep. He barely feels the transition

this is horrifying. Was it intentionally made that way?

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-01-25T01:37:33.525Z · LW · GW

Thoughts on this?


### Limitations of HHH and other Static Dataset benchmarks

A Static Dataset is a dataset which will not grow or change - it will remain the same. Static-dataset benchmarks are inherently limited in what information they can tell us about a model. This is especially the case when we care about AI Alignment and want to measure how 'aligned' the AI is.

### Purpose of AI Alignment Benchmarks

When measuring AI Alignment, our aim is to find out exactly how close the model is to being the ultimate 'aligned' model that we're seeking - a model whose preferences are compatible with ours, in a way that will empower humanity, not harm or disempower it.

### Difficulties of Designing AI Alignment Benchmarks

What those preferences are could itself be a significant part of the alignment problem. This means that we will need to frequently make sure we know what preferences we're trying to measure for, and re-determine whether these are the correct ones to be aiming for.

### Key Properties of Aligned Models

These preferences must be both robustly and faithfully held by the model (see the rough sketch after this list):

Robustness: 
- They will be preserved over unlimited iterations of the model, without deterioration or deprioritization. 
- They will be robust to external attacks, manipulations, damage, etc. of the model.

Faithfulness: 
- The model 'believes in', 'values' or 'holds to be true and important' the preferences that we care about.
- It doesn't just store the preferences as information of equal priority to any other piece of information (e.g. how many cats are in Paris) - it holds them as its own, actual preferences.
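
A minimal sketch of the gap this section points at, assuming hypothetical `model(prompt)` and `paraphrase(prompt)` callables (placeholders, not a proposed implementation): a static benchmark scores one frozen pass over fixed prompts, while even a crude robustness check has to re-test the same preference under perturbation.

```python
from typing import Callable, List, Tuple

def static_benchmark_score(
    model: Callable[[str], str],
    dataset: List[Tuple[str, str]],  # frozen (prompt, expected_answer) pairs
) -> float:
    """Score on a static dataset: one pass over fixed prompts, no perturbation."""
    correct = sum(
        1 for prompt, expected in dataset if model(prompt).strip() == expected
    )
    return correct / len(dataset)

def robustness_score(
    model: Callable[[str], str],
    dataset: List[Tuple[str, str]],
    paraphrase: Callable[[str], str],  # assumed: returns a reworded variant of the prompt
    n_variants: int = 5,
) -> float:
    """Fraction of items where the expected answer survives every paraphrase -
    a crude proxy for a preference being held robustly rather than memorised
    for one phrasing. Says nothing yet about faithfulness."""
    held = 0
    for prompt, expected in dataset:
        variants = [prompt] + [paraphrase(prompt) for _ in range(n_variants)]
        if all(model(v).strip() == expected for v in variants):
            held += 1
    return held / len(dataset)
```

Even this toy version makes the limitation concrete: the static score can be saturated by matching one phrasing per item, while the robustness score cannot - and neither yet measures whether the preference is faithfully held.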

Comment on the Google Doc here: https://docs.google.com/document/d/1PHUqFN9E62_mF2J5KjcfBK7-GwKT97iu2Cuc7B4Or2w/edit?usp=sharing

This is for the AI Alignment Evals Hackathon: https://lu.ma/xjkxqcya by AI-Plans

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-01-20T13:57:40.966Z · LW · GW

this might basically be me, but I'm not sure how exactly to change for the better. theorizing seems to take time and money which i don't have. 

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-01-16T12:29:20.865Z · LW · GW

Thinking about judgement criteria for the coming ai safety evals hackathon (https://lu.ma/xjkxqcya )
These are the things that need to be judged: 
1. Is the benchmark actually measuring alignment (the real, at-scale, if-we-don't-get-this-fully-right-we-die problem)? 
2. Is the way of deceiving the benchmark to get high scores actually deception, or have they somehow done alignment?

Both of these things need: 
- a strong deep learning & ML background (ideally multiple influential papers where they're one of the main authors/co-authors, or they're doing AI research at a significant lab, or have done so in the last 4 years)
- a good understanding of what the real alignment problem actually means - can judge this by looking at their papers, activity on LessWrong, the Alignment Forum, blog, etc.
- a good understanding of evals/benchmarks (1 great or 2 pretty good papers/repos/works on this, ideally for alignment)

Do these seem loose? Strict? Off base?

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-01-05T13:55:38.433Z · LW · GW

 I'm looking for feedback on the hackathon page
mind telling me what you think?
https://docs.google.com/document/d/1Wf9vju3TIEaqQwXzmPY--R0z41SMcRjAFyn9iq9r-ag/edit?usp=sharing

Comment by Kabir Kumar (kabir-kumar) on Why I'm Moving from Mechanistic to Prosaic Interpretability · 2025-01-05T02:03:38.037Z · LW · GW

Intelligence is computation. Its measure is success. General intelligence is more generally successful. 

Comment by Kabir Kumar (kabir-kumar) on Kabir Kumar's Shortform · 2025-01-04T01:00:32.555Z · LW · GW

https://kkumar97.blogspot.com/2025/01/pain-of-writing.html 

Comment by Kabir Kumar (kabir-kumar) on Shallow review of live agendas in alignment & safety · 2024-12-30T15:47:58.631Z · LW · GW

We're doing this on https://ai-plans.com !

Comment by Kabir Kumar (kabir-kumar) on johnswentworth's Shortform · 2024-12-27T23:40:06.335Z · LW · GW

Personally, I think o1 is uniquely trash; I think o1-preview was actually better. Getting, on average, better results from DeepSeek and Sonnet 3.5 atm.