Posts
Comments
Thank you! Changed it to that!
Yup, that's definitely something that can be argued by people on the Against side during the Debate Stage!
And they might come to the same conclusion!
I'd also read Elementary Analysis before
I'm not a grad physics student- I don't have a STEM degree, or the equivalent- but I found the book very readable nonetheless. It's by far my favourite textbook- it feels like it was actually written by someone sane, unlike most.
I'm really glad you wrote this!
I think you address an important distinction there, but I think there might be a further one to be made: how we measure/tell whether a model is aligned in the first place.
There seems to be a growing voice which says that if a model's output seems to be the output we might expect from an aligned AI, then it's aligned.
I think it's important to distinguish that from the idea that the model is aligned if you actually have a strong idea of what its values are, how it got them, etc.
I'm really excited to see this!!
I'd like it if this became embeddable, so it could be used on ai-plans.com and on other sites!!
Goodness knows, I'd like to be able to get summaries and answers to obscure questions on some alignmentforum posts!
What do you think someone who knows about PDP knows that someone with a good knowledge of DL doesn't?
And why would it be useful?
I think folks in AI Safety tend to underestimate how powerful and useful liability and an established duty of care would be for this.
I think calling things a 'game' makes sense to LessWrongers, but just seems unserious to non-LessWrongers.
I don't think a lack of IQ is the reason we've been failing at making AI sensibly. Rather, it's a lack of good incentive design.
Making an AI recklessly is currently much more profitable than not doing so- which, imo, shows a flaw in the efforts that have gone towards making AI safe: not accepting that some people have very different mindsets/beliefs/core values, and not figuring out a structure/argument that would incentivize people across a broad range of mindsets.
Hasn't Eliezer Yudkowsky largely failed at solving alignment and at getting others to solve alignment?
And wasn't he largely responsible for many people noticing that AGI is possible and potentially highly fruitful?
Why would a world where he's the median person be more likely to solve alignment?
Update: Rob Miles will also be judging some critiques! He'll be judging Communication!
Hi, I'm Kabir Kumar, the founder of AI-Plans.com, I'm happy to answer any questions you might have about the site or the Critique-a-Thon!
Hi, we've already made a site which does this!
Probably much better for health overall to have a bowl of veg and fruit at your table for easy healthy snacking (carrots, cucumber, etc)
Most of my knowledge on dependencies and addictions comes from a brief study I did in school, for an EPQ, on neurotransmitters' roles in alcohol dependence/abuse, so I'm really not sure how much of this applies. Also, a lot of that study was finding that my assumptions were in the wrong direction (I didn't know about endorphins)- but I think a lot of the stuff on neurotransmitters and receptors holds across different areas. Take it with some salt, though.
Quitting cold turkey rarely ever works for addictions/dependencies. The vast majority of the time, the person has a big resurgence in the addiction.
The balance of dopamine/sensitivity of the dopamine receptors often takes time to shift back.
Tapering, I think for this reason, has been one of the most reliable ways of recovering from an addiction/dependence. I believe it's been shown to have a 70% success rate.
Interestingly, the first study I found on tapering, which is testing tapering strips in assistance of discontinuing antidepressant use, also says 70% https://journals.sagepub.com/doi/full/10.1177/20451253211039327
Every site I read on reducing alcohol dependency with tapering said something similar, back in the day.
When I say media, I mean social media, movies, videos, books, etc.- any type of recording or similar thing that you believe you're using as entertainment.
I'm trying this myself. I've done single days before, sometimes 2 or 3 days, but failed to keep it consistent. I did find that when I did it, my work output was far higher and of greater quality, I had a much better sleeping schedule, and I was generally in a much more enjoyable mood.
I also ended up spending more time with friends and family, meeting new people, trying interesting things, spending time outdoors, etc.
This time I'm building up to it- starting with 1 media free hour a day, then 2 hours, then 3, etc.
I think building up to it will let me build new habits which will stick more.
A challenge for folks interested: spend 2 weeks without media based entertainment.
"CESI’s Artificial Intelligence Standardization White Paper released in 2018 states
that “AI systems that have a direct impact on the safety of humanity and the safety of life,
and may constitute threats to humans” must be regulated and assessed, suggesting a broad
threat perception (Section 4.5.7).42 In addition, a TC260 white paper released in 2019 on AI
safety/security worries that “emergence” (涌现性) by AI algorithms can exacerbate the
black box effect and “autonomy” can lead to algorithmic “self-improvement” (Section
3.2.1.3).43"
From https://concordia-consulting.com/wp-content/uploads/2023/10/State-of-AI-Safety-in-China.pdf
I disagree with this paragraph today: "A lot of what AI does currently, that is visible to the general public seems like it could be replicated without AI"
I was talking about for a farmer. For a consumer, they can get their eggs/milk from such a farmer and fund/invest in such a farm, if they can.
Or talk to a local farm about setting aside some chickens, pay for them to be given extra space, better treatment, etc.
I don't really know what you mean about the EA reducetarian stuff.
Also, if you as an individual want to be healthy, not contribute to harming animals, and have the time, space, money, willingness, etc. to raise some chickens, why not?
Exercise in general is pretty great, yes. Especially if done outdoors, imo.
Could a solution to some of this be to raise some chickens for eggs, treat them nicely, give them space to roam, etc?
Obviously the best would be to raise cows as well, treat them well, not kill the male calves, etc.- but that's much less of an option for most.
This is great! Thank you for doing this! Might add some of these to ai-plans.com!
Yes, winning is fun!
I think this kind of thing makes people feel like you're pushing a message, to which the automatic response is to push back.
What I've found works is to be agreeable and inviting, meet them at their own values, and present it as a hard problem to solve which isn't being competently tackled by this other dumb group (not us, we wouldn't do this).
That kind of thing. Had a 100% success rate so far.
I'm simplifying my approach, since I'm not spending a lot of time on this, but if you assume I'm not a dumbass and think about what kind of approach like this could work well- without being dumb in the sense of not actually addressing the problem- you'll probably get what I mean.
I'm generally disincentivized to post or put effort into a post from the system where someone can just heavily downvote my post, without even giving a reason.
A simple way to improve this system would be to require someone to comment/give a reason when heavily upvoting/heavily downvoting things.
"In the ancestral environment, politics was a matter of life and death." - this is a pretty strong statement to make with no evidence to back it up.
What about orgs such as ai-plans.com, which aim to be exponentially useful for AI Safety?
I think your ideas are some of the most promising I've seen- I'd love to see them pursued further, though I'm concerned about the air-gapping.
Hi Ruby! Thanks for the great feedback!! Sorry for the late reply, I've been working on the site!
So, we're not doing just criticisms anymore- we're ranking plans by Total Strength score minus Total Vulnerabilities score. Quite a few researchers have been posting their plans on the site!
Going to do a full rebuild soon, to make the site look nicer and be even faster to work on.
We're also holding regular critique-a-thons. The last one went very well!
We had 40+ submissions and produced what I think is really great work!
We also made a Broad List of Vulnerabilities in the first two days! https://docs.google.com/document/d/1tCMrvJEueePNgb2_nOEUMc_UGce7TxKdqI5rOJ1G7C0/edit?usp=sharing
On not getting all of a plan's details without talking to the person a lot- I think this is a vulnerability in communication.
A serious plan, with the intention of actually solving the problem, should have the effort put into it to make it clear to a reader what it actually is, what problems it aims to solve, why it aims to solve them and how it seeks to do so.
A failure to do so is silly for any serious strategy.
The good thing is, that if such a vulnerability is pointed out, on AI-Plans.com, the poster can see the vulnerability and iterate on it!
This was really great. Thanks for making it.
I was curious why Trump was dropping some of the best takes!
Yeah, I think you're right- at least about the sequences.
I think something more specific about attitudes would be more accurate and useful.
Thank you! I've sorted that now!!
Please let me know if you have any other feedback!!
From my very spotty info on evolution:
Humans got 'trained' to maximise reproductive success, and in doing so maximised a bunch of other stuff along the way- including resource acquisition.
What I spoke about here is putting a more intelligent+fast agent in an environment that is deliberately crafted such that it can only survive by helping much dumber, slower agents- training it to act co-operatively.
Writing this out, I may have just made an overcomplicated version of reinforcement learning.
That was something like what I was thinking. But I think this won't work, unless modified so much that it'd be completely different. More an idea to toss around.
I'll start over with something else. I do think something that might have value is designing an environment that induces empathy/values/whatever, rather than directly trying to design the AI to be what you want from scratch.
Environment design can be very powerful in influencing humans, but that's in huge part because we (or at least, those of us who put thought in designing environments for folk) understand humans far better than we understand AI.
Like a lot of the not-ridiculously terrible and only extremely terrible plans, this kind of relies on a lot of interpretability.
That's very astute. True.
On the porch/outside/indoors thing- maybe that's not a great example, because having the numbers there seems to add nothing of value to me. Other than maybe clarifying to yourself how you feel about certain ideas/outcomes, but that's something that anyone with decent thinking does anyway.
Sorry, I think I have an idea of what you're saying, but I'm not really sure. Do you mind elaborating? With a little less LessWrong lingo, please.
Absolutely!
One of the reasons I've gone against the idea of tags, different ways of sorting, etc. (though they get brought up a lot) is that it could lead to the plans which are the most attractive, most understandable, or most appealing at first glance getting the most attention.
It's very important that a criticism's points measure the validity of the criticism to the plan, and not something else- though I think if there are two criticisms making the same point and one gets more points because it's more readable/better said/better organized, that would actually be good.
Some of the measures taken for this so far:
Criticisms do not have author attribution- so someone such as Musk, Yudkowsky, etc can't just post 'this plan suxx, lmaooo~~' and get a thousand points (we're obviously working on a spam filter to catch obvious stuff like this).
Authors/posters of plans cannot vote on criticisms of their own plans (we're also thinking about solutions to sock puppets)
Criticizers cannot vote on their own criticisms.
We're thinking about having a system for measuring if users are just voting for the same people- this could help with sock puppets and also voting circles. We're working with TJ to integrate the EigenKarma Network, which I think may be able to help with this.
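To make the intent of those measures concrete, here's a rough sketch of the vote-eligibility check they imply- just an illustration with made-up names, not the site's actual code:

```python
# Hypothetical sketch of the vote-eligibility rules described above.
# Not the actual AI-Plans.com implementation; all names are made up.

def can_vote_on_criticism(voter_id: str, criticism_author_id: str, plan_author_id: str) -> bool:
    """Return True if this voter is allowed to vote on this criticism."""
    if voter_id == criticism_author_id:
        return False  # criticizers cannot vote on their own criticisms
    if voter_id == plan_author_id:
        return False  # plan authors cannot vote on criticisms of their own plans
    return True
```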
A lot of this is going to be stuff the average user never notices or sees- the goal is to make something that just works, by aggressively attacking the ways it might not work.
It's very important to get the 'root' right in the karma system- to make sure that the selection of the first few users, who might heavily influence which way the site's direction goes, is done right. I've been doing a lot of red teaming of the ideas for this.
Currently, I'm making a rigorous test for prospective moderators, to make sure they understand what the hard and important parts of the alignment problem are, which I'll be posting here and in other groups when it's done. Dr Roman Yampolskiy has also sent some papers on why he believes aligning an AGI/ASI is impossible; I will be integrating those as well.
Another problem is getting users. We're emailing scientists whose papers we've added, both to avoid any copyright/stepping-on-toes problems and to generate interest and get feedback on the site- I've been very pleasantly surprised by many of the responses!!
Do you have any suggestions on improvements we could make or things we should be doing but haven't thought of? I'd love to hear them!!
Thank you, I think there's an error in my phrasing.
I should have said:
Currently, it takes a very long time to get an idea of who is doing what in the field of AI Alignment and how good each plan is, what the problems are, etc.
Thank you very much for this.
I agree, it does seem like this way, people will end up getting a bunch of karma even for bad criticisms. Which would defeat the whole point of the points system.
I'm not sure I fully understand "So I would rather make sure that the bottom half of criticism gets an increasing potential for negative karma impact, by applying a weight on the upvote points starting from 1 for the median criticism, and progressing towards 0 for the worst criticism. (goodness can be measured as unweighted votes divided by number of votes.)"
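If I'm reading it right, the weighting might look something like this (just my interpretation of the quoted proposal, with made-up names- not something we've implemented):

```python
# My reading of the proposal quoted above (hypothetical sketch, not site code):
# criticisms in the bottom half by "goodness" (unweighted votes / number of votes)
# have their upvote points scaled down linearly- weight 1 at the median criticism,
# weight 0 at the worst- so downvotes increasingly dominate for bad criticisms.

def upvote_weight(goodness: float, median_goodness: float, worst_goodness: float) -> float:
    """Weight applied to a criticism's upvote points, between 0 and 1."""
    if goodness >= median_goodness:
        return 1.0  # top half: upvotes count in full
    span = median_goodness - worst_goodness
    if span <= 0:
        return 1.0  # degenerate case: everything sits at the median
    return (goodness - worst_goodness) / span

def net_karma(upvotes: int, downvotes: int, weight: float) -> float:
    """Net karma impact of a criticism under the weighted scheme."""
    return upvotes * weight - downvotes
```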
I think there's a lot of merit in affecting karma points just for the action of criticism and voting. Perhaps 1 karma for every criticism that has net positive votes? And perhaps 1 karma for the first 5 votes, 25 votes, 125 votes etc?
"For users in high karma range I would engourage to do much criticising and especially voting. For this reason I would apply a constant monthly karma reduction on them which can only be undone by sufficient karma collected through criticising and voting." This is really interesting - we've been talking about ideas such as -perhaps after some time karma dissolves or turns into a different form of karma that doesn't give weight - or create ways to spend karma(perhaps to give a criticism a stronger upvote/downvote?)
Not just that- it's because the field isn't organized at all.
Sorry, what do you mean?