That's great and all, but with all due respect:
Fuck. That. Noise.
Regardless of the odds of success and what the optimal course of action actually is, I would be very hard pressed to say that I'm trying to "help humanity die with dignity". Regardless of what the optimal action should be given that goal, on an emotional level, it's tantamount to giving up.
Before even getting into the cost/benefit of that attitude, in the worlds where we do make it out alive, I don't want to look back and see a version of me for whom that became the goal. I also don't think that, if that were my goal, I would fight nearly as hard to achieve it. I want a catgirl volcano lair, not "dignity". So when I try to negotiate with my monkey brain to expend precious calories, the plan had better involve the former, not the latter. I suspect that something similar applies to others.
I don't want to hear about genre-savviness from the de facto founder of the community that gave us HPMOR!Harry and the Comet King after he wrote this post, because it's so antithetical to the attitude present in those characters and in posts like this one.
I also don't want to hear about second-order effects when, as best as I can tell, the attitude present here is likely to push people towards ineffective doomerism, rather than actually dying with dignity.
So instead, I'm gonna think carefully about my next move, come up with a plan, blast some shonen anime OSTs, and get to work. Then, amongst all the counterfactual worlds, there will be a version of me that gets to look back and know that they faced the end of the world, rose to the challenge, and came out the other end having carved utopia out of the bones of Lovecraftian gods.
I disagree to an extent. The examples provided seem to me to be examples of "being stupid", which agents generally have an incentive to do something about, unless they're too stupid for that to occur to them. That doesn't mean that their underlying values will drift towards a basin of attraction.
The corrigibility thing is a basin of attraction specifically because a corrigible agent has preferences over itself and its future preferences. Humans do that too sometimes, but the examples provided are not that.
In general, I think you should expect dynamic preferences (cycles, attractors, chaos, etc.) anytime an agent has preferences over its own future preferences and the capability to modify its preferences.
If you have access to the episode rewards, you should be able to train an ensemble of NNs using Bayes + MCMC, with the final reward as output and the entire episode as input. Maybe using something like this: http://people.ee.duke.edu/~lcarin/sgnht-4.pdf
This gets a lot more difficult if you're trying to directly learn behaviour from rewards or vice versa, because now you need to make assumptions to derive "P(behaviour | reward)" or "P(reward | behaviour)".
Edit: Pretty sure OAI used a reward ensemble in https://openai.com/blog/deep-reinforcement-learning-from-human-preferences/ to generate candidate pairs for further data collection.
From the paper: "we sample a large number of pairs of trajectory segments of length k, use each reward predictor in our ensemble to predict which segment will be preferred from each pair, and then select those trajectories for which the predictions have the highest variance across ensemble members."
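A rough sketch of what that selection step could look like, to make the quote concrete. This is not the paper's code: the toy linear reward predictors, the logistic (Bradley-Terry style) link from summed reward to preference probability, and all of the names below are my own assumptions for illustration.

```python
import numpy as np

def segment_return(weights, segment):
    """Summed predicted reward over a segment of shape (k, obs_dim).
    Each ensemble member is a toy linear reward model (an assumption)."""
    return float((segment @ weights).sum())

def preference_prob(weights, seg_a, seg_b):
    """Probability that seg_a is preferred over seg_b under one member,
    using a logistic link on the difference in summed reward (assumed)."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    return 1.0 / (1.0 + np.exp(-diff))

def select_pairs(ensemble, pairs, n_select):
    """Return indices of the n_select pairs whose predicted preference
    has the highest variance across ensemble members, plus all variances."""
    probs = np.array([[preference_prob(w, a, b) for (a, b) in pairs]
                      for w in ensemble])      # shape: (members, pairs)
    variance = probs.var(axis=0)               # disagreement per pair
    order = np.argsort(variance)[::-1]         # most contested first
    return order[:n_select], variance

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs_dim, k = 4, 5                          # hypothetical sizes
    ensemble = [rng.normal(size=obs_dim) for _ in range(8)]
    pairs = [(rng.normal(size=(k, obs_dim)), rng.normal(size=(k, obs_dim)))
             for _ in range(100)]
    chosen, _ = select_pairs(ensemble, pairs, n_select=10)
    print("pairs to send for labelling:", chosen)
```

In a real setup the ensemble members would be trained networks (or MCMC samples of one network's weights) rather than random linear models; the variance-based selection would stay the same.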
I'm not convinced. Especially if this sort of underpay is a common policy across multiple orgs across the rationalist and EA communities. In a closed system with 2 people a "fair" price will balance the opportunity cost to the person doing the work and the value both parties assign to the fence.
But this isn't a closed system. I expect that lowballing pay has a whole host of higher-order negative effects. Off the top of my head:
- This strategy is not scalable. There's a limited pool of talent willing to take a pay cut because they value the output of their own work. There are probably better places to put that talent than something like generic software engineering, which is essentially a commodity.
- Pay is closely associated with social status, and status influences the appeal of ideas and value systems. If working in a sector pays less than industry, then it will lose support on the margin.
- Future pay is a function of current pay. Individuals deciding to take a pay cut from industry rates are not only temporarily losing money, but are forgoing potentially very large sums over their careers.
- Orgs like lightconeinfrastructure compete for talent not just with other EA orgs, but with earning to give, which pays industry rates, comes with big wads of status, and leaves the option to pocket the money if for whatever reason an individual decides to leave EA. I would expect this to create an over-allocation of manpower to earning to give and an under-allocation to actual EA work.
- This line of reasoning creates perverse incentives. Essentially you end up paying people less the more they share your values, which, given that people have malleable value systems, means that you're incentivizing them to not share your values or to lie about sharing your values.
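To put a rough shape on the "future pay is a function of current pay" point, here is a toy calculation. It assumes raises are a fixed percentage of current pay, so the discount never closes; the salary, raise rate, and time horizon are made-up numbers, not data.

```python
def forgone_earnings(industry_salary, discount, annual_raise, years):
    """Total pay given up over `years`, assuming (hypothetically) that the
    discounted salary and the industry salary both grow at the same
    percentage each year, so the gap persists and compounds."""
    total = 0.0
    salary = industry_salary
    for _ in range(years):
        total += salary * discount      # this year's gap
        salary *= 1.0 + annual_raise    # next year's industry salary
    return total

if __name__ == "__main__":
    # All inputs are illustrative assumptions.
    gap = forgone_earnings(industry_salary=150_000, discount=0.30,
                           annual_raise=0.05, years=10)
    print(f"forgone over 10 years: {gap:,.0f}")
```

The specific output matters less than the structure: under this assumption the cost scales with the whole career, not just the years spent at the discounted rate.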
I can also see some benefits of the policy, such as filtering and extra runway, but there are other, arguably better ways of doing the former, and the latter isn't all that important if you can maintain a 30% YoY growth rate.
Our current salary policy is to pay rates competitive with industry salary minus 30%.
What was the reasoning behind this? To me this would make sense if there was a funding constraint, but I was under the impression that EA is flush with cash.
If the following are the stated stakes:
If things go right, we can shape almost the full light cone of humanity to be full of flourishing life. Billions of galaxies, billions of light years across, for some 10^36 (or so) years until the heat death of the universe.
Then I would strongly advise against lowballing or cheaping out when it comes to talent acquisition and retention.
Here's a legitimate application: buying PornHub Premium. https://news.bitcoin.com/pornhubs-premium-services-crypto-payments-13-digital-assets-supported/
Online payment processors are an oligopoly and can at any moment revoke a business's ability to receive online payments, even if it's not breaking the law. Thus what business is and is not permissible online is entirely up to the whims of this oligopoly and the law. Crypto provides a way around this.
I'm liking where this story is going.
IMO 2020 wasn't a turning point, and Facebook is not special. The events that happened lately have been a predictable development in a steadily escalating trend toward censorship. I'll note that these censorship policies are widespread across every social media platform, and in fact extend well beyond social media and apply to the entire infrastructure stack. Everything from DDoS protection services, to cloud service providers, to payment processors has been getting bolder over the course of several years about pulling plugs on people saying the wrong things or providing platforms for others to say the wrong things. Here's how I think it went down:
1. From 2010-2020, social media and other SV companies gained a tremendous amount of power by gaining control over social media networks.
2. By virtue of all being near each other, they formed a political monoculture/ingroup.
3. They found themselves capable of deplatforming anyone they disagreed with.
4. They started banning people, starting with the most deplorable and the outgroup and working their way up from there. This seems to have become especially noticeable sometime around 2015.
5. First the deplorables complained about this by making appeals to free speech, which made free speech low status.
6. Then the outgroup complained about this and made appeals to free speech, which made supporting free speech an outgroup identifier.
7. Everyone falls in line because otherwise they might get unpersoned if they're mistaken for a member of the outgroup or the deplorables by defending free speech.
8. The Overton window of acceptable speech continues to shrink as opinions on the ever-changing fringe continue to get silenced in a process that's not too different from the evaporative cooling of group beliefs.
I expect that the trend towards more censorship will continue unabated, especially on public social media platforms.
I wouldn't look too deeply into that. The selection process for moderators on reddit is essentially first come, first served, plus how good you are at convincing existing moderators you should join the team. As far as I can tell, this process doesn't usually select for "good" moderation, especially once a sub gets big enough that network effects make it grow despite "bad" moderation. This holds for most values of "good" and "bad".