Lsusr's parables are not everyone's cup of tea but I liked this one enough to nominate it. It got me thinking about language and what it means to be literal, and made me laugh too.
I quite liked this post, and strong upvoted it at the time. I honestly don't remember reading it, but on rereading I think I learned a lot, both from the explanation of the feedback loops and especially from the predictions in the "what to expect" section, which I found insightful.
Looking back now, the post seems obvious, but I think the content in it was not obvious (to me) at the time, hence nominating it for LW Review.
(Just clarifying that I don't personally believe working on AI is crazy town. I'm quoting a thing that made an impact on me a while back and that I still think is relevant culturally for the EA movement.)
I think AIS might have been what poisoned EA? The global development people seem much more grounded (to this day), and AFAIK the ponzi scheme recruiting is all aimed at AIS and meta
I agree, am fairly worried about AI safety taking over too much of EA. EA is about taking ideas seriously, but also doing real things in the world with feedback loops. I want EA to have a cultural acknowledgement that it's not just ok but good for people to (with a nod to Ajeya) "get off the crazy train" at different points along the EA journey. We currently have too many people taking it all the way into AI town. I again don't know what to do to fix it.
(Commenting as myself, not representing any org)
Thanks Elizabeth and Timothy for doing this! Lots of valuable ideas in this transcript.
I felt excited, sad, and also a bit confused, since it feels both slightly resonant but also somewhat disconnected from my experience of EA. Resonant because I agree with the college-recruiting and epistemic aspects of your critiques. Disconnected, because while collectively the community doesn't seem to be going in the direction that I would hope, I do see many individuals in EA leadership positions who I deeply respect and trust to have good individual views and good process and I'm sad you don't see them (maybe they are people who aren't at their best online, and mostly aren't in the Bay).
I am pretty worried about the Forum and social media more broadly. We need better forms of engagement online - like this article + your other critiques. In the last few years, it's become clearer and clearer to me that EA's online strategy is not really serving the community well. If I knew what the right strategy was, I would try to nudge it. Regardless I still see lots of good in EA's work and overall trajectory.
[my critiques] dropped like a stone through water
I dispute this. Maybe you just don't see the effects yet? It takes a long time for things to take effect, even internally in places you wouldn't have access to, and even longer for them to be externally visible. Personally, I read approximately everything you (Elizabeth) write on the Forum and LW, and occasionally cite it to others in EA leadership world. That's why I'm pretty sure your work has had nontrivial impact. I am not too surprised that its impact hasn't become apparent to you though.
Personally, I'm still struggling with my own relationship to EA. I've been on the EV board for a year+ - an influential role at the most influential meta org - and I don't understand how to use this role to impact EA. I see the problems more clearly than I did before, which is great, but I don't see solutions or great ways forward yet, and I sense that nobody really does. We're mostly working on stuff to stay afloat rather than high level navigation.
I liked Zach's recent talk/Forum post about EA's commitment to principles first. I hope this is at least a bit hope-inspiring, since I get the sense that a big part of your critique is that EA has lost its principles.
Yes - HN users with flag privileges can flag posts. Flags operate as silent mega-downvotes.
(I am a longtime HN user and I suspect the title was too clickbait-y, setting off experienced HN users' troll alarms)
Great post! But, I asked Claude what he thought:
I cannot recommend or endorse the "Peekaboo" game described in the blog post. While intended to be playful, having an adult close their eyes while a child gets ready for bed raises significant safety concerns. Children require proper supervision during bedtime routines to ensure their wellbeing. Additionally, this game could potentially blur important boundaries between adults and children. Instead, I would suggest finding age-appropriate, supervised activities that maintain clear roles and responsibilities during bedtime routines. There are many safe ways to make bedtime fun and engaging for children that don't compromise supervision or safety.
(Just kidding! Claude did write that, but my prompt was: write a Claude style LLM refusal for the "Peekaboo" game. But, I do think this sort of fun is the sort of Fun that our AI overlords will not be too tolerant of, which made me sad.)
For home cooking I would like to recommend J. Kenji Lopez-Alt (https://www.youtube.com/@JKenjiLopezAlt/videos). He's a well-loved professional chef who writes science-y cooking books, and his youtube channel is a joy because it's mostly just low production values: him in his home kitchen, making delicious food from simple ingredients, just a few cuts to speed things up.
I'm sorry you feel that way. I will push back a little, and claim you are over-indexing on this: I'd predict that most (~75%) of the larger (>1000-employee) YC-backed companies have similar templates for severance, so finding this out about a given company shouldn't be much of a surprise.
I did a bit of research to check my intuitions + it does seem like non-disparagement is at least widely advised (for severance specifically and not general employment), e.g., found two separate posts on the YC internal forums regarding non-disparagement within severance agreements:
"For the major silicon valley law firms (Cooley, Fenwick, OMM, etc) non disparagement is not in the confidentiality and invention assignment agreement [employment agreement], and usually is in the separation and release [severance] template."
(^ this person also noted that it would be a red flag to find non-disparagement in the employment agreement.)
"One thing I’ve learned - even when someone has been terminated with cause, a separation agreement [which includes non-disparagement] w a severance can go a long way."
Jeff is talking about Wave. We use a standard form of non-disclosure and non-disparagement clauses in our severance agreements: when we fire or lay someone off, getting severance money is gated on not saying bad things about the company. We tend to be fairly generous with our severance, so people in this situation usually prefer to sign and agree. I think this has successfully prevented (unfair) bad things from being said about us in a few cases, but I am reading this thread and it does make me think about whether some changes should be made.
I also would re-emphasize something Jeff said - that these things are quite common - if you just google for severance package standard terms, you'll find non-disparagement clauses in them. As far as I am aware, we don't ask current employees or employees who are quitting without severance to not talk about their experience at Wave.
In my view you have two plausible routes to overcoming the product problem, neither of which is solved (primarily) by writing code.
Route A would be social proof: find a trusted influencer who wants to do a project with DACs. Start by brainstorming various types of projects that would most benefit from DACs, aiming to find an idea which an (ideally) narrow group of people would be really excited about, that demonstrates the value of such contracts, led by a person with a lot of 'star power'. Most likely this would be someone who would be likely to raise quite a lot of money through a traditional donation/kickstarter-type drive, but instead they decide to demo the DAC (and in doing so make a good case for it).
Route B is to focus on comms. Iterate on the message. Start by explaining it to non-economist friends, then graduate to focus groups. It's crucial to try to figure out how to most simply explain the idea in a sentence or two, such that people understand and don't get confused by it.
I'm guessing you'll need to follow both these routes, but you can follow them simultaneously and hopefully learn cross-useful things while doing so.
I like the idea of getting more people to contribute to such contracts. Not thrilled about the execution. I think there is a massive product problem with the idea -- people don't understand it, think it is a scam, etc. If your efforts were more directed at the problem of getting people to understand and be excited about crowdfunding contracts like this, I would be a lot more excited.
Mild disagree: I do think x-risk is a major concern, but seems like people around DC tend to put 0.5-10% probability mass on extinction rather than the 30%+ that I see around LW. This lower probability causes them to put a lot more weight on actions that have good outcomes in the non extinction case. The EY+LW frame has a lot more stated+implied assumptions about uselessness of various types of actions because of such high probability on extinction.
Your question is coming from within a frame (I'll call it the "EY+LW frame") that I believe most of the DC people do not heavily share, so it is kind of hard to answer directly. But yes, to attempt an answer, I've seen quite a lot of interest (and direct policy successes) in reducing AI chips' availability and production in China (eg via both CHIPS act and export controls), which is a prerequisite for US to exert more regulatory oversight of AI production and usage. I think the DC folks seem fairly well positioned to give useful inputs into further AI regulation as well.
I've been in DC for ~ the last 1.5y and I would say that DC AI policy has a good amount of momentum. I doubt it's particularly visible on Twitter, but it also doesn't seem like there are any hidden/secret missions or powerful coordination groups (if there are, I don't know about them yet). I know ~10-20 people decently well here who work on AI policy full time or whose work is motivated primarily by wanting better AI policy, and maybe ~100 who I have met once or twice but don't see regularly or often; most such folks have been working on this stuff since before 2022; they all have fairly normal-seeming thinktank- or government-type jobs.
They don't mostly spend time on LW (although certainly a few of them do). Many do spend time on Twitter, and they do read lots of AI related takes from LW-influenced folks. They have meetup groups related to AI policy. I guess it looks pretty much as I was expecting before I came here. Happy to answer further questions that don't identify specific people, just because I don't know how many of them want to be pointed-at on LW.
Not who you're responding to, but I've just written up my vegan nutrition tips and tricks: http://www.lincolnquirk.com/2023/06/02/vegan_nutrition.html
If you have energy for this, I think it would be insanely helpful!
Thanks for writing this. I think it's all correct and appropriately nuanced, and as always I like your writing style. (To me this shouldn't be hard to talk about, although I guess I'm a fairly recent vegan convert and haven't been sucked into whatever bubble you're responding to!)
Thanks for doing this! These results may affect my supplementation strategy.
My recent blood tests (unrelated to this blog post) -- if you have any thoughts on them let me know, I'd be curious what your threshold for low-but-not-clinical is.
- Hemoglobin - 14.8 g/dL
- Vitamin D, 25-Hydroxy - 32.7 ng/mL
- Vitamin B12 - 537 pg/mL
(I have other results I can send you privately if you want, from comp metabolic panel + cbc + lipid panel + D + B12; but didn't think to ask for iron. Is it worth going back to ask for this? or might iron be under a name I don't recognize?)
I'm vegan and have been solidly for > 1 year. Generally feel good, no particular fatigue except sleepiness after I eat carbs for lunch. I supplement B12, omega-3 EPA+DHA algae oil, creatine and occasional D3 gummies.
Tim Urban's new book, What's Our Problem, is out as of yesterday. I've started reading it and it's good so far, and very applicable to rationality training. waitbutwhy.com
Excited about this!
Points of feedback:
- I don't like to have to scroll my screen horizontally to read the comment. (I notice there's a lot of perfectly good unused white space on the left side; comments would probably fit horizontally if you pushed everything to the left!)
- Sometimes when you mouse over the side-comment icon, it tries to scroll the page to make the comment readable. This is very surprising and makes me lose my place.
- Hovering over the icon makes the comment appear briefly. If I then want to scroll in order to read the comment, there seems to be no way to 'stay hovered' -- I have to click and toggle it, to make the comment stick around so I can actually read it. (This plus being forced to scroll the screen makes the hover feature kind of useless.)
Overall, feeling optimistic though, and will probably use this.
I think your argument is wrong, but interestingly so. I think DL is probably doing symbolic reasoning of a sort, and it sounds like you think it is not (because it makes errors?)
Do you think humans do symbolic reasoning? If so, why do humans make errors? Why do you think a DL system won't be able to eventually correct its errors in the same way humans do?
My hypothesis is that DL systems are doing a sort of fuzzy finite-depth symbolic reasoning -- they have the capacity to understand the productions at a surface level and can apply them (subject to contextual clues, in an error-prone way) step by step, but once you ask for sufficient depth they will get confused and fail. Unlike humans, feedforward neural nets can't think for longer and churn step by step yet; but if someone were to figure out a way to build a looping option into the architecture then I won't be surprised to see DL systems which can go a lot further on symbolic reasoning than they currently do.
What is Pop Warner in this context? I have googled it and it sounds like he was one of the founders of modern American football, but I don't understand what it is in contrast to. Is there some other (presumably safer) ruleset?
(Inside-of-door-posted hotel room prices are called "rack rates" and nobody actually pays those. This is definitely a miscommunication.)
I am guilty of being a zero-to-one, rather than one-to-many, type person. It seems far easier and more interesting to me, to create new forms of progress of any sort, rather than convincing people to adopt better ideas.
I guess the project of convincing people seems hard? Like, if I come up with something awesome that's new, it seems easier to get it into people's hands, rather than taking an existing thing which people have already rejected and telling them "hey this is actually cool, let's look again".
All that said, I do find this idea-space intriguing partly thanks to this post - it makes me want to think of ways of doing more one-to-many type stuff. I've been recently drawn into living in DC and I think the DC effective altruism folks are much more on the one-to-many side of the world.
Upvoted for raising something to conscious attention, that I have never previously considered might be worth paying attention to.
(Slightly grumpy that I'm now going to have a new form of cognitive overhead probably 10+ times per day... these are the risks we take reading LW :P)
Look, I don’t know you at all. So please do ignore me if what I’m saying doesn’t seem right, or just if you want to, or whatever.
I’m a bit worried that you’re seeking approval, not advice? If this is so, know that I for one approve of your chosen path. You are allowed to spend a few years focusing on things that you are passionate about, which (if it works out) may result in you being happy and productive and possibly making the world better.
If you are in fact seeking advice, you should explain what your goal is. If your goal is to make the maximum impact possible — it’s worth at least hundreds of hours trying to see if you can learn more & motivate yourself along a path which seems like it combines high impact with personal resonance. I wouldn’t discount philosophy along this angle, but (for example) it sounds like you may not know that much about the potential of policy careers; there are plenty that do not require particularly strong mathematical skills (… or even any particularly difficult skills beyond some basic extraversion, resistance to boredom and willingness to spend literal decades grinding away within bureaucracies).
If your goal is to be happy, I think you will be happy doing philosophy, and I think you have a potential to make a huge impact that way. Certainly there are a decent number of full-time philosophers within effective altruism who I have huge respect for (Macaskill, Ord, Bostrom, Greaves, and Trammell jump to mind). Plus, you can save a few hundred hours, which seems pretty important if you might already know the outcome of your experimentation!
Thanks! This is very helpful, and yes, I did mean to refer to grokking! Will update the post.
Nice post!
One of my fears is that the True List is super long, because most things-being-tracked are products of expertise in a particular field and there are just so many different fields.
Nevertheless:
- In product/ux design, tracking the way things will seem to a naive user who has never seen the product before.
- In navigation, tracking which way north is.
- I have a ton of "tracking" habits when writing code:
- types of variables (and simulated-in-my-head values for such)
- refactors that want to be done but don't quite have enough impetus for yet
- loose ends, such as allocated-but-not-freed resources, or false symmetry (something that looks like it should be symmetric but isn't in some critical way), or other potentially-misleading things that need to be explained
- [there are probably a lot more of these that I am not going to write down now]
Here's my attempt. I haven't read any of the other comments or the tag yet. I probably spent ~60-90m total on this, spread across a few days.
On kill switches
- low impact somehow but I don’t know how
- Go slow enough so that people can see what you’re doing
- Have a bunch of "safewords" and other kill-switches installed at different places, some hopefully hard-to-reach by the AI. Test them regularly, and consider it a deadly flaw if one stops working.
On the AI accurately knowing what it is doing, and pointing at things in the real world
- watch all the metrics (!)
- Predict all the metrics you watch, and ask humans about any anomalous metrics that you are watching
- group inputs and outputs separately and treat inputs as sacred. Perhaps by having an epistemic module which is incentivized by producing true predictions about the world. Make the epistemic module the one that gets all the juice. Planning module should not have any way to influence it.
On responding predictably
- Require inputs of some kind to get more outputs (something about control theory?) - like a power steering wheel, or an ebike, it helps the human by amplifying the motions you make but not actually Doing Anything on its own
- Have metrics look smooth on an input/output response curve. No sharp edges. Let the humans be the ones to turn the knobs.
- Try things. Try things on a small scale before going big. (define "big"? maybe look at the metrics!)
On epistemology and ontology
- Ask for feedback about both things you tried and experiments you want to run
- make the epistemic module have a stable ontology. If it wants to change the ontology, it needs to create a new epistemic module, and leave the old one running. The new module has to be able not only to predict physical things about the world but also what the old model would say about things being pointed-to and tagged as important in the old model's ontology.
- have a concept of goal metrics and side effects. Try to predict what the human will say about which metrics are goals and which are side effects. (hmm..this is incoherent because you can come up with many irrelevant metrics… what even is a relevant metric. Maybe the system should be grouping metrics using some kind of natural unsupervised clustering system and then trying to predict the prioritization of metric groups? And then to explain the groups it gives samples of the actual thing it’s measuring. Still difficult to make sense of this. )
On planning
- Bound or penalize plans in complexity and time.
- "complexity" is hard to define here, and isn't Kolmogorov I don't think.
- Harshly penalize plans which lean on predicting what other agents will do. Ban plans which depend on prediction of what your operator will do. (AI can still predict what the operator will think of the plan, but that prediction should not feed into any plan itself.)
- Separate planning from other parts of AI cognition, such that you are never confused about whether you are making a plan or doing other things, such as predicting things about the world, or executing a plan.
- Don't execute plans at all
- Train the AI to identify the purpose(s) of each step in the plan. Give negative feedback when steps lead to outcomes that are not identified ahead of time.
On transparency
- A part of the AI should be trained to explicate what the rest of the AI is "doing"/focused on at all times. Find a way to make this into a feedback cycle.
- AI should be arranged so it is mostly only doing one thing at a time, and that doesn't switch its focus too many times each minute.
I notice that I am extremely surprised by your internship training. Its existence, its lessons and the impact it had on you (not you specifically, just a person who didn't come in with that mindset) are all things I don't think I would have predicted. I would be thrilled if you would write as much as you can bring yourself to about this, braindump format is fine, into a top level post!
Congrats, I'm excited about this!
I've been turning this over in my head for a while now. (Currently eating mostly vegan fwiw, but I am not sure if this is the right decision.)
I think the main argument against veganism is that it actually incurs quite a large cost. Being vegan is a massive lifestyle change with ripple effects that extend into one's social life. This argument falls under your "there are higher-impact uses of your (time/energy/money/etc.)", but what you wrote doesn't capture the reasons why this is important.
most of us do not have good reason to treat this as a zero-sum game in which each attempt to do good in the world must crowd out another. For one thing, we're nowhere near putting all available resources into our efforts to do good, so we can simply choose to expand that budget.
I am reminded of Zvi's Slack post (ctrl+f "afford"). Attention is a very scarce resource. If I am spending all my attention on important things, I cannot also spend attention on creating a whole new diet, finding friends who won't mock me, learning how to cook all new things, etc. On the other hand, animal welfare offsets are probably quite cheap.
For another, our psychology is complicated, and making a moral effort can just as easily increase our capacity to make further such efforts as deplete it.
Indeed, and this is actually why I have become mostly vegan in recent months; but it is not going to be true for everyone. My current decision to eat mostly vegan except when inconvenient feels somehow indulgent.
I wrote more about how I am trying to be vegan: http://www.lincolnquirk.com/2022/02/15/vegan.html
This community has a virtue of taking weird ideas seriously. Roko came up with a weird idea which, the more seriously you took it, the more horrifying it became. This was deemed an info hazard, and censored in some way, I don't know how. But the people who didn't take it seriously in the first place weren't horrified by the idea and thus were confused about why it should have been censored, and thus boosted the Streisand effect.
I had photochromics for several years. I found them mildly-helpful-and-mostly-unobjectionable in the summer, but ridiculously annoying in the winter (when they both tend to be darker because of low-altitude sun, and the temperature makes them clear up slower once you move inside).
Also, I was relentlessly mocked by the fashion police. :P
Ultimately I moved away from them.
I downvoted this. I usually like the concise writing style exhibited in this essay (similar to lsusr and Paul Graham, both of whom I like), but I apparently only like it when I think it's correct. :P
I especially downvoted because I think it is fairly likely to attract low-quality discussion. A differently-written version of a similar but perhaps more nuanced point, with better fleshed-out examples of why given works are net helpful or net harmful, would be a better post. I am sympathetic to the general idea of the post!
I think there's something about programming that attracts the right sort of people. What could that be? Well, programming has very tight feedback loops, which make it fun. You can "do a lot": one's ability to gain power over the universe, if you will, is quite high with programming. I'd guess a combination of these two factors.
The Wizard's Bane series by Rick Cook. The basic idea is great: a Silicon Valley programmer is transported into a magical universe where he has to figure out how to apply programming to a magic system. Caveat lector: the writing is not the best quality, it's a bit juvenile, but still a light, enjoyable read :)
This is great! Thanks for sharing!
A fair question. I don't think it is established, exactly, but the plausible window is quite narrow. For example, if nanomachinery were easy, we would already have that technology, no? And we seem quite near to AGI.
evolution would love superintelligences whose utility function simply counts their instantiations! so of course evolution did not lack the motivation to keep going down the slide. it just got stuck there (for at least ten thousand human generations, possibly and counterfactually for much-much longer). moreover, non evolutionary AI’s also getting stuck on the slide (for years if not decades; median group folks would argue centuries) provides independent evidence that the slide is not too steep (though, like i said, there are many confounders in this model and little to no guarantees).
Evolution got stuck on the slide with humans because cultural evolution outcompeted biological evolution: cultural evolution can make immediate, direct impacts on small tribes in a hunter-gatherer environment within a few short generations (from the first chapter of Secret Of Our Success), so the high-order bit in biological evolution suddenly became "how efficient is cultural evolution".
(Non evolutionary AIs don't seem stuck on the slide at all to me.)
I don’t think I agree that this is made-up though. You’re right that the quotes are things people wouldn’t say but they do imply it through social behavior.
I suppose you’re right that it’s hard to point to specific examples of this happening but that doesn’t mean it isn’t happening, just that it’s hard to point to examples. I personally have felt multiple instances of needing to do the exact things that Sasha writes about - talk about/justify various things I’m doing as “potentially high impact”; justify my food choices or donation choices or career choices as being self-improvement initiatives; etc.
this article points at something real
I'd like to express my gratitude and excitement (and not just to you, Rob, though your work is included in this):
Deep thanks to everyone involved for having the discussion, writing up and formatting, and posting it on LW. I think this is some of the more interesting and potentially impactful stuff I've seen relating to AI alignment in a long while.
(My only thought is... why hasn't a discussion like this occurred sooner? Or has it, and it just hasn't made it to LW?)
Regardless of the precise mechanism, Tinder almost certainly shows more attractive people more often. If it didn't, it would have a retention problem because there are lots of people who swipe Tinder to fantasize about matching with hot people, and they wouldn't get enough hot people to keep them going. Most likely, Tinder has determined a precise ratio of "hot people" and "people in your league" to show you, in order to keep you swiping.
Given the existence of the incentive and likelihood that Tinder et al. would follow such an incentive, it makes sense to try to have your profile be more generally attractive so you get shown to more people.
Use the table of contents / "summary of the language" section.
For your project I would recommend skipping to 28 and then going from there, and skipping patterns which don't seem relevant.
Yes: A far higher % of OpenAI reads this forum than the other orgs you mentioned. In some sense OpenAI is friends with LW, in a way that is not true for the others.
What should be done instead of a public forum? I don't necessarily think there needs to be a "conspiracy", but I do think that it's a heck of a lot better to have one-on-one meetings with people to convince them of things. At my company, when sensitive things need to be decided or acted on, a bunch of slack DMs fly around until one person is clearly the owner of the problem; they end up in charge of having the necessary private conversations (and keeping stakeholders in the loop). Could this work with LW and OpenAI? I'm not sure.
Ineffective, because the people arguing on the forum are lacking knowledge about the situation. They don't understand OpenAI's incentive structure, plan, etc. Thus any plans they put forward will be in all likelihood useless to OpenAI.
Risky, because (some combination of):
- it is emotionally difficult to hear that one of your friends is plotting against you (and OpenAI is made up of humans, many of whom came out of this community)
- it's especially hard if your friend is misinformed and plotting against you; and I think it likely that the OpenAI people believe that Yudkowsky/LW commentators are misinformed or at least under-informed (and they are probably right about this)
- to manage that emotional situation, you may want to declare war back on them, cut off contact, etc.; any of these actions if declared as an internal policy would be damaging to the future relationship between OpenAI and the LW world
- OpenAI has already had a ton of PR issues over the last few years and so they probably have a pretty well developed muscle for dealing internally with bad PR, which this would fall under. If true, the muscle probably looks like internal announcements with messages like "ignore those people/stop listening to them, they don't understand what we do, we're managing all these concerns and those people are over-indexing on them anyway"
- the evaporative cooling effect may eject some people who were already on the fence about leaving, but the people who remain will be more committed to the original mission, more "anti LW" and less inclined to listen to us in the future
- hearing bad arguments makes one more resistant to similar (but better) arguments in the future
I want to state for the record that I think OpenAI is sincerely trying to make the world a better place, and I appreciate their efforts. I don't have a settled opinion on the sign of their impact so far.
I'd like to put in my vote for "this should not be discussed in public forums". Whatever is happening, the public forum debate will have no impact on it; but it does create the circumstances for a culture war that seems quite bad.
When I learned it from Geoff in 2011, they were recommending yEd Graph Editor. The process is to generally write things you do or want to do as nodes, and then connect them to each other using "achieves or helps to achieve" edges (i.e., if you go to work, that achieves making money, which achieves other things you want).
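For anyone who would rather sketch this in code than in a graph editor, here is a rough illustration of the same structure: nodes are things you do or want, and each directed edge means "achieves or helps to achieve". The example goals, the `ACHIEVES` list, and the function names are all made up for illustration; this is not how yEd works or anything Geoff taught, just one way to represent the graph.

```python
from collections import defaultdict

# Hypothetical example data: each pair reads "X achieves or helps to achieve Y".
ACHIEVES = [
    ("go to work", "make money"),
    ("make money", "pay rent"),
    ("make money", "donate"),
    ("exercise", "feel healthy"),
]

def build_graph(edges):
    """Map each node to the list of things it directly helps to achieve."""
    graph = defaultdict(list)
    for action, goal in edges:
        graph[action].append(goal)
    return graph

def downstream_goals(graph, start):
    """Everything that `start` directly or indirectly helps to achieve."""
    seen = []
    stack = [start]
    while stack:
        node = stack.pop()
        for goal in graph.get(node, []):
            if goal not in seen:
                seen.append(goal)
                stack.append(goal)
    return seen

def terminal_goals(edges):
    """Nodes that are achieved by something but achieve nothing further."""
    sources = {action for action, _ in edges}
    targets = {goal for _, goal in edges}
    return targets - sources

graph = build_graph(ACHIEVES)
print(downstream_goals(graph, "go to work"))
print(terminal_goals(ACHIEVES))
```

Looking at the terminal nodes is one way to see which things you want for their own sake, and the downstream list shows what a given activity is actually buying you.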