Posts
Comments
I feel pretty sympathetic to the desire not to do things by text; I suspect you get much more practiced and checked-over answers that way.
which privacy skills you are able to execute.
This link goes to a private google doc, just fyi.
This is great!
I really like this about slack:
- If you aren’t maintaining this, err on the side of cultivating this rather than doing high-risk / high-reward investments that might leave you emotionally or financially screwed.
- (or, if you do those things, be aware I may not help you if it fails. I am much more excited about helping people who don’t go out of their way to create crises)
Seems like a good norm and piece of advice.
I'm confused how much I should care whether an impact assessment is commissioned by some organization. The main thing I generally look for is whether the assessment / investigation is independent. The argument is that because AISC is paying for it, that will influence the assessors?
I have not read most of what there is to read here, just jumping in on "illegal drugs" ---> ADHD meds. Chloe's comment spoke to weed as the illegal drug on her mind.
"AI has immense potential, but also immense risks. AI might be misused by China, or get out of control. We should balance the needs for innovation and safety." I wouldn't call this lying (though I agree it can have misleading effects, see Issue 1).
Not sure where this slots in, but there's also a sense in which this contains a missing positive mood about how unbelievably good (aligned) AI could or will be, and how much we're losing by not having it earlier.
Thanks!
Interesting how many of these are "democracy / citizenry-involvement" oriented. Strongly agree with 18 (whistleblower protection) and 38 (simulate cyber attacks).
20 (good internal culture), 27 (technical AI people on boards) and 29 (three lines of defense) sound good to me, I'm excited about 31 if mandatory interpretability standards exist.
42 (on sentience) seems pretty important but I don't know what it would mean.
The top 6 of the ones in the paper (the ones I think got >90% somewhat or strongly agree, listed below), seem pretty similar to me - are there important reasons people might support one over another?
- Pre-deployment risk assessments
- Evaluations of dangerous capabilities
- Third-party model audits
- Red teaming
- Pre-training risk assessments
- Pausing training of dangerous models
Curious if you have any updates!
ChatGPT gives some interesting analysis when asked, though I think it's not amazingly accurate. (The sentence I gave it, from here, is a weird example, though.)
Does it say anything about AI risk that is about the real risks? (Have not clicked the links, the text above did not indicate to me one way or another).
This is great, and speaks to my experience as well. I have my own frames that map onto some of this but don't hit some of the things you've hit and vice versa. Thanks for writing!
Is this something Stampy would want to help with?
https://www.lesswrong.com/posts/WXvt8bxYnwBYpy9oT/the-main-sources-of-ai-risk
I think that incentivizes self-deception on probabilities. Also, probabilities below 10^-10 are pretty unusual, so I'd expect that to cause very little to happen.
Thanks!
When you say "They do, however, have the potential to form simulacra that are themselves optimizers, such as GPT modelling humans (with pretty low fidelity right now) when making predictions"
do you mean things like "write like Ernest Hemingway"?
Is it true that current image systems like stable diffusion are non-optimizers? How should that change our reasoning about how likely it is that systems become optimizers? How much of a crux is "optimizeriness" for people?
Why do people keep saying we should maximize log(odds) instead of odds? Isn't each 1% of survival equally valuable?
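To make the question concrete, here's a quick numeric sketch (assuming the standard definition of log-odds as log(p/(1-p))): the same 1-percentage-point gain in survival probability is worth far more in log-odds near p = 1% than near p = 50%, which is exactly the disagreement — each 1% is equally valuable in probability terms but very unequally valuable in log-odds terms.

```python
import math

def log_odds(p):
    """Log-odds (logit) of a probability p."""
    return math.log(p / (1 - p))

# The same 1-percentage-point improvement, at two different baselines:
gain_near_doom = log_odds(0.02) - log_odds(0.01)  # survival 1% -> 2%
gain_near_even = log_odds(0.51) - log_odds(0.50)  # survival 50% -> 51%

print(gain_near_doom)  # ~0.70
print(gain_near_even)  # ~0.04
```

So a log-odds maximizer treats the 1% → 2% move as roughly 17x more valuable than the 50% → 51% move, while an odds (or raw-probability) maximizer treats them the same — which is what the question is pushing back on.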
In addition to Daniel's point, I think an important piece is probabilistic thinking - the AGI will execute not based on what will happen but on what it expects to happen. What probability is acceptable? If none, it should do nothing.
Have you written about your update to slow takeoff?
Nice! Added these to the wiki on calibration: https://www.lesswrong.com/tag/calibration
Oh, whoops. I took from this later tweet in the thread that they were talking.
After years of tinkering and incremental progress, AIs can now play Diplomacy as well as human experts.[6]
Maybe this happened in 2022: https://twitter.com/polynoamial/status/1580185706735218689
Let me know if you have a cheerful price for this!
Here's the git! https://github.com/SonOfLilit/calibrate
I will talk to the developer about it being open source - I think that was both of our ideals.
Do you know how to do this kind of thing? I'd be happy to pay you for your time.
This seems interesting to me but I can't yet latch onto it. Can you give examples of secrets being one or the other?
Are you distinguishing between "secrets where the existence of the secret is a big part of the secret" and "secrets where it's not"?
One of my feature requests! Just hard to do.
Why would they be jokes?
Don't know what you mean in the latter sentence.
Conversational moves in EA / Rationality that I like for epistemics
- “So you are saying that”
- “But I’d change my mind if”
- “But I’m open to push back here”
- “I’m curious for your take here”
- “My model says”
- “My current understanding is…”
- “...I think this because…”
- “...but I’m uncertain about…”
- “What could we bet on?”
- “Can you lay out your model for me?”
- “This is a butterfly idea”
- “Let’s do a babble”
- “I want to gesture at something / I think this gestures at something true”
This is why LessWrong needs the full suite of emoji reacts.
Title changed!
I meant signposting to indicate things like saying "here's a place where I have more to say, but not in this context" during, for instance, a conversation, so I'm truthfully saying that there's more to the story.
Yeah, I think "intentionally causing others to update in the wrong direction" and "leaving them with their priors" end up pretty similar, provided you don't make strong distinctions between action and omission (which I think this test at least partially rests on) and you have a good model of their priors (which I think is potentially the hardest part here).
Kind is one of the four adjectives in your description of Iron Hufflepuff.
Hm, Keltham has a lot of good qualities here, but kind doesn't seem among them.
Sounds scary, but thank you for the model of what's actually going on!
Oh woah! Thanks for linking.
True! 65 Watts! That would really be something.
Unfortunately I'm not seeing anything close to that on the Amazon UK site :/
Might be bad search skills, though.
Your link's lightbulbs have a bayonet style, not the E27 threading :) Thanks for the other link! Amazon says currently unavailable.
ETA: Found some, will add to post
Tried to buy those, didn't have any luck finding ones that fit nicely into my sockets! (An embarrassing mistake I didn't describe in detail is buying corn bulbs that turned out to be...mini?) If you have an Amazon UK link for ones with E27 threading, that would be awesome.
ETA: Having looked, it looks like not all corn bulbs are brighter than the ones I have, though I have now found 2000 lumen ones. I don't know if corn bulbs are still better if they have lower lumens. I would guess not?
ETA 2: The link above does have E27 if you click through the multiple listings in the same link, wasn't obvious to me at first, thanks!
I saw people on Twitter discussing forecasting success on this, and some were saying that the intelligence agencies actually called this right. Does anyone know an easy link to what those agencies were saying?
Context: https://twitter.com/ClayGraubard/status/1496699988801433602
https://twitter.com/ClayGraubard/status/1496866236973658112
Nice! Welcome!
I definitely find it helpful to be surrounded by people who will do this for me and help me cultivate the habit over time. The case for it being very impactful is when it prompts a one-time action, like applying for something, or putting yourself in the running for something you otherwise wouldn't have, that makes a big difference. The ones that are about accountability ("Can I remind you about that in a week?") also work as a sort of conscientiousness loan, which can be cheap since it's often easier to check in on other people than on yourself.
It is definitely important to have a sense of who you're talking to and what they need (law of equal and opposite advice). For what it's worth, 5-10 and 13 are aimed to be disproportionately helpful for people who have trouble doing things (depending on the reason).
The other move, I think, is something like "my cat's not doing well", which is pretty fucked up to say if false, but does put the frame back on "you don't know what's going on with me and you don't get to assume".
What comes to mind immediately for me is that conveying the relatively empirical ones of these to newcomers in each field seems really valuable.
Oh yeah, I've seen you post this before, I liked it!
For what it's worth, I think some of those terrible ideas are great or close to great.
In particular:
- Hire a team of well-paid moderators for a three-month high-effort experiment of responding to every bad comment with a fixed version of what a good comment making the same point would have looked like. Flood the site with training data.
- Make a fork of LessWrong run by me, or some other hopeless idealist that still thinks that there might be something actually good that we can get if we actually do the thing (but not if we don't).
- Create an anonymous account with special powers called TheCultureCurators or something, and secretly give the login credentials to a small cadre of 3-12 people with good judgment and mutual faith in one another's good judgment. Give TheCultureCurators the ability to make upvotes and downvotes of arbitrary strength, or to add notes to any comment or post à la Google Docs, or to put a number on any comment or post that indicates what karma TheCultureCurators believe that post should have.
Rob Bensinger wants me to note that he agrees.
The first one would be costly and annoying to lots of people, but also time-boxed and super interesting. Training data is really good and very pedagogically valuable.
The second one just seems low cost to everyone except the idealist, so if they're willing, great!
The third would be controversial and complicated, but, for instance, putting a number on what karma they think a post should have wouldn't change the current voting system, would add information, and could be time-boxed like the first one.
Mostly I appreciate just the generation of lots of ideas to give my brain more to chew on and a sense that bigger things are possible.
Also more generally I really resonate with "dear God, I need the other people around me to be good at this to be my best self."
I'm curious what you and others think of Raelfin's post about the karma system: https://www.lesswrong.com/posts/xN2sHnLupWe4Tn5we/improving-on-the-karma-system