Posts

AI Law-a-Thon 2024-01-28T02:30:09.737Z
Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results 2024-01-15T19:37:07.984Z
Critique-a-Thon of AI Alignment Plans 2023-12-05T20:50:07.661Z
Proposal for improving state of alignment research 2023-11-06T13:55:39.015Z
Looking for judges for critiques of Alignment Plans 2023-08-17T22:35:40.666Z
Specific Arguments against open source LLMs? 2023-07-30T14:27:13.116Z
AI-Plans.com 10-day Critique-a-Thon 2023-07-27T11:44:01.660Z
Simple alignment plan that maybe works 2023-07-18T22:48:36.771Z
Even briefer summary of ai-plans.com 2023-07-16T23:25:44.076Z
LeCun says making a utility function is intractable 2023-06-28T18:02:13.721Z
Brief summary of ai-plans.com 2023-06-28T00:33:36.309Z
An overview of the points system 2023-06-27T09:09:54.881Z
AI-Plans.com - a contributable compendium 2023-06-25T14:40:01.414Z
A more effective Elevator Pitch for AI risk 2023-06-15T12:39:03.363Z
A more grounded idea of AI risk 2023-05-11T09:48:00.569Z
An Ignorant View on Ineffectiveness of AI Safety 2023-01-07T01:29:59.126Z

Comments

Comment by Iknownothing on Even briefer summary of ai-plans.com · 2024-03-06T12:11:28.032Z · LW · GW

Thank you! Changed it to that!

Comment by Iknownothing on AI Law-a-Thon · 2024-02-02T12:36:25.985Z · LW · GW

Yup, that's definitely something that can be argued by people Against during the Debate Stage!
And they might come to the same conclusion!

Comment by Iknownothing on E.T. Jaynes Probability Theory: The logic of Science I · 2023-12-28T14:19:13.976Z · LW · GW

I'd also read Elementary Analysis before

Comment by Iknownothing on E.T. Jaynes Probability Theory: The logic of Science I · 2023-12-28T04:23:01.294Z · LW · GW

I'm not a physics grad student- I don't have a STEM degree, or the equivalent- but I found the book very readable nonetheless. It's by far my favourite textbook- it feels like it was actually written by someone sane, unlike most.

Comment by Iknownothing on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-28T04:20:41.223Z · LW · GW

I'm really glad you wrote this! 
I think you address an important distinction there, but I think there might be a further one to be made- namely, how we measure/tell whether a model is aligned in the first place. 
There seems to be a growing voice which says that if a model's output seems to be the output we might expect from an aligned AI, then it's aligned. 
I think it's important to distinguish that from the idea that the model is aligned if you actually have a strong idea of what its values are, how it got them, etc. 

Comment by Iknownothing on Torture vs. Dust Specks · 2023-12-25T18:10:37.395Z · LW · GW
Comment by Iknownothing on AI Safety Chatbot · 2023-12-24T10:18:09.231Z · LW · GW

I'm really excited to see this!! 
I'd like it if this became embeddable so it could be used on ai-plans.com and on other sites!!
Goodness knows, I'd like to be able to get summaries and answers to obscure questions on some Alignment Forum posts!

Comment by Iknownothing on AI safety advocates should consider providing gentle pushback following the events at OpenAI · 2023-12-23T02:14:08.354Z · LW · GW
Comment by Iknownothing on Why aren't more people in AIS familiar with PDP? · 2023-12-23T02:11:12.819Z · LW · GW

What do you think someone who knows about PDP knows that someone with a good knowledge of DL doesn't?
And why would it be useful?

Comment by Iknownothing on Why Is No One Trying To Align Profit Incentives With Alignment Research? · 2023-12-23T02:08:33.927Z · LW · GW

I think folks in AI Safety tend to underestimate how powerful and useful liability and an established duty of care would be for this.

Comment by Iknownothing on Here's the exit. · 2023-12-19T16:08:21.280Z · LW · GW

I think calling things a 'game' makes sense to LessWrongers, but just seems unserious to non-LessWrongers.

Comment by Iknownothing on How dath ilan coordinates around solving alignment · 2023-12-11T21:47:30.241Z · LW · GW

I don't think a lack of IQ is the reason we've been failing to make AI sensibly. Rather, it's a lack of good incentive design. 
Making an AI recklessly is currently much more profitable than not doing so- which, imo, shows a flaw in the efforts that have gone towards making AI safe: not accepting that some people have very different mindsets/beliefs/core values, and not figuring out a structure/argument that would incentivize people across a broad range of mindsets.

Comment by Iknownothing on How dath ilan coordinates around solving alignment · 2023-12-11T15:03:43.928Z · LW · GW

Hasn't Eliezer Yudkowsky largely failed at solving alignment and at getting others to solve alignment? 
And wasn't he largely responsible for many people noticing that AGI is possible and potentially highly fruitful?
Why would a world where he's the median person be more likely to solve alignment?

Comment by Iknownothing on Critique-a-Thon of AI Alignment Plans · 2023-12-07T17:03:06.190Z · LW · GW

Update: Rob Miles will also be judging some critiques! He'll be judging Communication!

Comment by Iknownothing on Critique-a-Thon of AI Alignment Plans · 2023-12-05T20:54:24.788Z · LW · GW

Hi, I'm Kabir Kumar, the founder of AI-Plans.com, I'm happy to answer any questions you might have about the site or the Critique-a-Thon!

Comment by Iknownothing on Shallow review of live agendas in alignment & safety · 2023-12-05T19:40:33.283Z · LW · GW

Hi, we've already made a site which does this!

Comment by Iknownothing on Does bulemia work? · 2023-11-06T18:58:16.781Z · LW · GW

Probably much better for health overall to have a bowl of veg and fruit at your table for easy healthy snacking (carrots, cucumber, etc)

Comment by Iknownothing on AI Safety is Dropping the Ball on Clown Attacks · 2023-11-06T17:26:40.177Z · LW · GW

Most of my knowledge of dependencies and addictions comes from a brief study I did in school, for an EPQ, on neurotransmitters' roles in alcohol dependence/abuse, so I'm really not sure how much of this applies. Also, a lot of that study was finding that my assumptions were in the wrong direction (I didn't know about endorphins). But I think a lot of the stuff on neurotransmitters and receptors holds across different areas- take it with some salt, though. 

Quitting cold turkey rarely ever works for addictions/dependencies. The vast majority of the time, the person has a big resurgence of the addiction.
The balance of dopamine/sensitivity of the dopamine receptors often takes time to shift back. 
Tapering, I think for this reason, has been one of the most reliable ways of recovering from an addiction/dependence. I believe it's been shown to have a 70% success rate. 
Interestingly, the first study I found on tapering, which is testing tapering strips in assistance of discontinuing antidepressant use, also says 70% https://journals.sagepub.com/doi/full/10.1177/20451253211039327
Every site I read on reducing alcohol dependency with tapering said something similar, back in the day.

Comment by Iknownothing on AI Safety is Dropping the Ball on Clown Attacks · 2023-11-06T13:26:44.097Z · LW · GW

When I say media, I mean social media, movies, videos, books, etc.- any type of recording, or anything you believe you're using as entertainment. 

I'm trying this myself. I've done single days before, sometimes 2 or 3 days, but failed to keep it consistent. I did find that when I did it, my work output was far higher and of greater quality, I had a much better sleep schedule, and I was generally in a much more enjoyable mood.
I also ended up spending more time with friends and family, meeting new people, trying interesting things, spending time outdoors, etc. 

This time I'm building up to it- starting with 1 media-free hour a day, then 2 hours, then 3, etc. 
I think building up to it will let me build new habits which will stick more.

 

Comment by Iknownothing on AI Safety is Dropping the Ball on Clown Attacks · 2023-11-05T01:36:55.578Z · LW · GW

A challenge for folks interested: spend 2 weeks without media-based entertainment. 

Comment by Iknownothing on My thoughts on the social response to AI risk · 2023-11-04T23:07:47.682Z · LW · GW

"CESI’s Artificial Intelligence Standardization White Paper released in 2018 states
that “AI systems that have a direct impact on the safety of humanity and the safety of life,
and may constitute threats to humans” must be regulated and assessed, suggesting a broad
threat perception (Section 4.5.7).42 In addition, a TC260 white paper released in 2019 on AI
safety/security worries that “emergence” (涌现性) by AI algorithms can exacerbate the
black box effect and “autonomy” can lead to algorithmic “self-improvement” (Section
3.2.1.3).43"
From https://concordia-consulting.com/wp-content/uploads/2023/10/State-of-AI-Safety-in-China.pdf 
 

Comment by Iknownothing on An Ignorant View on Ineffectiveness of AI Safety · 2023-10-31T02:08:59.511Z · LW · GW

I disagree with this paragraph today: "A lot of what AI does currently, that is visible to the general public seems like it could be replicated without AI"

Comment by Iknownothing on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-10-02T17:58:40.286Z · LW · GW

I was talking about what a farmer could do. A consumer can get their eggs/milk from such a farmer and fund/invest in such a farm, if they can. 
Or talk to a local farm about setting aside some chickens, pay for them to be given extra space, better treatment, etc.

I don't really know what you mean about the EA reducetarian stuff. 

Also, if you as an individual want to be healthy, not contribute to harming animals, and have the time, space, money, willingness, etc. to raise some chickens, why not? 

Comment by Iknownothing on List of how people have become more hard-working · 2023-10-02T09:21:36.282Z · LW · GW

Exercise in general is pretty great, yes. Especially if done outdoors, imo.

Comment by Iknownothing on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-10-02T09:18:55.712Z · LW · GW

Could a solution to some of this be to raise some chickens for eggs, treat them nicely, give them space to roam, etc? 
Obviously the best would be to raise cows as well, treat them well, don't kill the male calves, etc- but that's much less of an option for most.

Comment by Iknownothing on AI Alignment Breakthroughs this Week [new substack] · 2023-10-02T09:12:27.657Z · LW · GW

This is great! Thank you for doing this! Might add some of these to ai-plans.com!

Comment by Iknownothing on The point of a game is not to win, and you shouldn't even pretend that it is · 2023-09-28T19:11:56.775Z · LW · GW

Yes, winning is fun!

Comment by Iknownothing on [deleted post] 2023-09-26T22:49:24.627Z

I think this kind of thing makes people feel like you're pushing a message, to which the automatic response is to push back.
What I've found works is to be agreeable, inviting, meet them at their own values, and present it as a hard problem to solve which isn't being competently tackled by this other dumb group (not us, we wouldn't do this). 
That kind of thing. Had a 100% success rate so far.
I'm simplifying my approach, since I'm not spending a lot of time on this, but if you imagine I'm not a dumbass and think about how an approach along these lines could work well, without being dumb in the sense of failing to actually address the problem, you'll probably get what I mean.

Comment by Iknownothing on Automatic Rate Limiting on LessWrong · 2023-09-26T20:07:52.609Z · LW · GW

I'm generally disincentivized to post, or to put effort into a post, by a system where someone can just heavily downvote my post without even giving a reason.
 

Comment by Iknownothing on Automatic Rate Limiting on LessWrong · 2023-09-26T20:04:40.588Z · LW · GW

A simple way to improve this system would be to require someone to comment/give a reason when heavily upvoting/heavily downvoting things. 
 

Comment by Iknownothing on Politics is the Mind-Killer · 2023-09-23T12:44:50.964Z · LW · GW
Comment by Iknownothing on Politics is the Mind-Killer · 2023-09-23T12:44:27.658Z · LW · GW

"In the ancestral environment, politics was a matter of life and death." - this is a pretty strong statement to make with no evidence to back it up.

Comment by Iknownothing on There should be more AI safety orgs · 2023-09-23T12:30:17.014Z · LW · GW

What about orgs such as ai-plans.com, which aim to be exponentially useful for AI Safety?

Comment by Iknownothing on There should be more AI safety orgs · 2023-09-23T12:29:24.963Z · LW · GW

I think your ideas are some of the most promising I've seen- I'd love to see them pursued further, though I'm concerned about the air-gapping.

Comment by Iknownothing on AI-Plans.com - a contributable compendium · 2023-09-17T02:49:35.620Z · LW · GW

Hi Ruby! Thanks for the great feedback!! Sorry for the late reply, I've been working on the site!

So, we're not doing just criticisms anymore- we're ranking plans by Total Strength score minus Total Vulnerabilities score. Quite a few researchers have been posting their plans on the site!
Going to do a full rebuild soon, to make the site look nicer and be even faster to work on.
We're also holding regular critique-a-thons. The last one went very well! 
We had 40+ submissions and produced what I think is really great work!
We also made a Broad List of Vulnerabilities in the first two days! https://docs.google.com/document/d/1tCMrvJEueePNgb2_nOEUMc_UGce7TxKdqI5rOJ1G7C0/edit?usp=sharing 

On not getting all of a plan's details without talking to the person a lot- I think this is a vulnerability in communication. 
A serious plan, with the intention of actually solving the problem, should have the effort put into it to make it clear to a reader what it actually is, what problems it aims to solve, why it aims to solve them and how it seeks to do so. 
A failure to do so is silly for any serious strategy. 

The good thing is, that if such a vulnerability is pointed out, on AI-Plans.com, the poster can see the vulnerability and iterate on it!
 

Comment by Iknownothing on AI presidents discuss AI alignment agendas · 2023-09-11T23:51:58.228Z · LW · GW

This was really great. Thanks for making it.

Comment by Iknownothing on AI presidents discuss AI alignment agendas · 2023-09-11T23:50:48.560Z · LW · GW

I was curious why Trump was dropping some of the best takes!

Comment by Iknownothing on [deleted post] 2023-08-24T11:02:54.779Z

Yeah, I think you're right- at least about the sequences. 

I think something more specific about attitudes would be more accurate and useful.

Comment by Iknownothing on AI-Plans.com 10-day Critique-a-Thon · 2023-07-27T21:16:51.815Z · LW · GW

Thank you! I've sorted that now!!

Please let me know if you have any other feedback!!

Comment by Iknownothing on Simple alignment plan that maybe works · 2023-07-19T22:16:48.409Z · LW · GW

From my very spotty info on evolution:
Humans got 'trained' to maximise reproduction, and in doing so maximised a bunch of other stuff along the way- including resource acquisition.

What I spoke about here is putting a more intelligent, faster agent into an environment deliberately crafted such that it can only survive by helping much dumber, slower agents- training it to act co-operatively. 

Writing this out, I may have just made an overcomplicated version of reinforcement learning.

Comment by Iknownothing on Simple alignment plan that maybe works · 2023-07-19T22:11:30.061Z · LW · GW

That was something like what I was thinking. But I think this won't work, unless modified so much that it'd be completely different. More an idea to toss around.
 

I'll start over with something else. I do think something that might have value is designing an environment that induces empathy/values/whatever, rather than directly trying to design the AI to be what you want from scratch. 
Environment design can be very powerful in influencing humans, but that's in huge part because we (or at least, those of us who put thought into designing environments for folks) understand humans far better than we understand AI. 

Like a lot of the not-ridiculously terrible and only extremely terrible plans, this kind of relies on a lot of interpretability. 

Comment by Iknownothing on Simple alignment plan that maybe works · 2023-07-19T22:06:33.656Z · LW · GW

That's very astute. True.

Comment by Iknownothing on Decision Theory with the Magic Parts Highlighted · 2023-07-17T11:10:23.643Z · LW · GW

On the porch/outside/indoors thing- maybe that's not a great example, because having the numbers there seems to add nothing of value to me. Other than maybe clarifying to yourself how you feel about certain ideas/outcomes, but that's something that anyone with decent thinking does anyway.

Comment by Iknownothing on Even briefer summary of ai-plans.com · 2023-07-17T10:37:29.046Z · LW · GW

Sorry, I think I have an idea of what you're saying, but I'm not really sure. Do you mind elaborating? With a little less LessWrong lingo, please.

Comment by Iknownothing on Even briefer summary of ai-plans.com · 2023-07-17T09:51:10.717Z · LW · GW

Absolutely! 

One of the reasons I've gone against the idea of tags, different ways of sorting, etc. (though they get brought up a lot) is that they could lead to the plans which are most attractive or most understandable at first glance getting the most attention.
It's very important that a criticism's points measure the validity of the criticism to the plan and not something else- though I think if there are two criticisms making the same point and one gets more points because it's more readable/better said/organized, that would actually be good. 
Some of the measures taken for this so far: 
Criticisms do not have author attribution- so someone such as Musk, Yudkowsky, etc can't just post 'this plan suxx, lmaooo~~' and get a thousand points (we're obviously working on a spam filter to catch obvious stuff like this).
Authors/posters of plans cannot vote on criticisms of their own plans (we're also thinking about solutions to sock puppets)
Criticizers cannot vote on their own criticisms.
We're thinking about having a system for measuring if users are just voting for the same people- this could help with sock puppets and also voting circles. We're working with TJ to integrate the EigenKarma Network, which I think may be able to help with this.
A lot of this is going to be stuff the average user never notices or sees- the goal is to make something that just works, by aggressively attacking the ways it might not work.
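
To make those restrictions concrete, here's a minimal sketch of the kind of checks involved- the function names and data shapes are purely illustrative, not the actual AI-Plans.com code:

```python
# Illustrative sketch only - not the actual AI-Plans.com implementation.

def can_vote(voter_id: str, criticism: dict, plan: dict) -> bool:
    """Apply the voting restrictions described above."""
    if voter_id == criticism["author_id"]:
        return False  # criticizers can't vote on their own criticisms
    if voter_id == plan["author_id"]:
        return False  # plan authors can't vote on criticisms of their own plans
    return True

def shared_target_fraction(votes: list[tuple[str, str]], voter: str, author: str) -> float:
    """Crude voting-circle signal: the fraction of a voter's votes cast on one author's items."""
    cast = [target_author for v, target_author in votes if v == voter]
    if not cast:
        return 0.0
    return sum(1 for t in cast if t == author) / len(cast)
```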

It's very important to get the 'root' right in the karma system- to make sure that the selection of the first few users, who might heavily influence which way the site's direction goes, is done right. I've been doing a lot of red-teaming of the ideas for this. 

Currently, I'm making a rigorous test for prospective moderators, to make sure they understand what the hard and important parts of the alignment problem are, which I'll be posting here and in other groups when it's done. Dr Roman Yampolsky has also sent some papers on why he believes aligning an AGI/ASI is impossible; I will be integrating those as well. 
 

Another problem is getting users. We've been emailing scientists whose papers we've added, both to avoid any copyright/stepping-on-toes problems and to generate interest and get feedback on the site- I've been very pleasantly surprised by many of the responses!!

Do you have any suggestions on improvements we could make or things we should be doing but haven't thought of? I'd love to hear them!!

Comment by Iknownothing on Brief summary of ai-plans.com · 2023-07-05T12:25:36.413Z · LW · GW

Thank you, I think there's an error in my phrasing. 
I should have said: 

Currently, it takes a very long time to get an idea of who is doing what in the field of AI Alignment and how good each plan is, what the problems are, etc.

Comment by Iknownothing on An overview of the points system · 2023-07-01T18:51:16.582Z · LW · GW

Thank you very much for this. 
I agree- it does seem like, this way, people will end up getting a bunch of karma even for bad criticisms, which would defeat the whole point of the points system.

I'm not sure I fully understand "So I would rather make sure that the bottom half of criticism gets an increasing potential for negative karma impact, by applying a weight on the upvote points starting from 1 for the median criticism, and progressing towards 0 for the worst criticism. (goodness can be measured as unweighted votes divided by number of votes.)"
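
Here's my best guess at what that weighting scheme means, written out as a rough sketch (purely illustrative- I may well be misreading the suggestion):

```python
# Rough sketch of one reading of the suggested weighting - illustrative only.

def goodness(upvotes: int, downvotes: int) -> float:
    """Goodness = unweighted vote total divided by number of votes."""
    total = upvotes + downvotes
    return 0.0 if total == 0 else (upvotes - downvotes) / total

def upvote_weight(score: float, all_scores: list[float]) -> float:
    """Weight upvote points from 1 at the median criticism down to 0 for the worst."""
    ranked = sorted(all_scores)
    median = ranked[len(ranked) // 2]
    worst = ranked[0]
    if score >= median or median == worst:
        return 1.0
    return (score - worst) / (median - worst)  # linear falloff below the median
```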

I think there's a lot of merit in giving karma just for the acts of criticising and voting. Perhaps 1 karma for every criticism that has net positive votes? And perhaps 1 karma for the first 5 votes, 25 votes, 125 votes, etc.? 

"For users in high karma range I would engourage to do much criticising and especially voting. For this reason I would apply a constant monthly karma reduction on them which can only be undone by sufficient karma collected through criticising and voting." This is really interesting - we've been talking about ideas such as -perhaps after some time karma dissolves or turns into a different form of karma that doesn't give weight - or create ways to spend karma(perhaps to give a criticism a stronger upvote/downvote?)

Comment by Iknownothing on Brief summary of ai-plans.com · 2023-06-30T16:19:51.813Z · LW · GW

Not just that- it's because the field isn't organized at all. 

Comment by Iknownothing on LeCun says making a utility function is intractable · 2023-06-29T18:10:01.890Z · LW · GW

Sorry, what do you mean?

Comment by Iknownothing on [deleted post] 2023-06-26T23:24:14.140Z