Comments
Not from OpenAI but the language sounds like this could be the board protecting themselves against securities fraud committed by Altman.
I am confused about the opening of your analysis:
In some sense, this idea solves basically none of the core problems of alignment. We still need a good-enough model of a human and a good-enough pointer to human values.
It seems to me that while the fixed point conception here doesn't uniquely determine a learning strategy, it should be possible to uniquely determine that strategy by building it into the training data.
In particular, if you have a base level of "reality" like the P_0 you describe, then it should be possible to train a model first on this reality, then present it with training scenarios that start by working directly on the "verifiable reality" subset, then build to "one layer removed" and so on.
My (very weak) shoulder-John says that just because this "feels like it converges" doesn't actually make any guarantees about convergence, but since P_0, P_1, etc. are very well specified it feels like a more approachable problem to try to analyze a specific basis of convergence. If one gets a basis of convergence, AND an algorithm for locating that basis of convergence, that seems to me sufficient for object-level honesty, which would be a major result.
I'm curious if you disagree with:
- The problem of choosing a basis of convergence is tractable (relative to alignment research in general)
- The problem of verifying that AI is in the basis of convergence is tractable
- Training an AI into a chosen basis of convergence could force that AI to be honest on the object level when object-level honesty is available
- Object level honesty is not a major result, for example because not enough important problems can be reduced to object level or because it is already achievable
Writing that out, I am guessing that 2 may be a disagreement that I still disagree with (e.g. you may think it is not tractable), and 3 may contain a disagreement that is compelling and hard to resolve (e.g. you may think we cannot verify which basis of convergence satisfies our honesty criteria--my intuition is that this would require not having a basis of convergence at all).
My issue isn't with the complexity of a Turing machine, it's with the term "accessible." Universal search may execute every Turing machine, but it also takes more than exponential time to do so.
In particular, if there are infinitely many Schelling points in the manipulation universe to be manipulated and referenced, then all of that computation has to causally precede the simplest such Schelling point for any answer that needs to be manipulated!
It's not clear to me what it actually means for there to exist a Schelling point in the manipulation universe that would be used by Solomonoff Induction to get an answer, but my confusion isn't about (arbitrarily powerful computer) or (Schelling point) on their own, it's about how much computation you can do before each Schelling point while still maintaining the minimality criteria for induction to be manipulated.
I'm confused by your intuition that team manipulation's universe has similar complexity to ours.
My prior is that scaling the size of the (accessible) things in a universe also requires scaling the complexity of the universe in an unbounded, probably even super-linear, way. Fully specifying "infinite computing power", or more concretely "sufficient computing power to simulate universes of complexity <=X for time horizons <=Y", then requires complexity f(X,Y) which is unbounded in X and Y. That falls apart completely as a practical solution (our universe is only about 10^62 Planck intervals old) unless f(X,Y) is ~O(log(Y)), whereas a pure counting method (e.g. the description simply counts how many universe states can be simulated) gives O(exp(Y)).
Since my intuition gives the complexity of Team Manipulation's raw universe at >10^(10^62), I'm curious what your intuition is that makes it clearly less than that of Team Science. There are approximately 10^185 Planck volumes in our observable universe so it takes only a few hundred bits to specify a specific instance of something inside a universe, plus a hundred or so to specify the Planck timestamp. In particular, this suggests that the third branch of Team Science is pretty small relative to the 10^8 specification of an observer architecture, not overwhelmingly larger.
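For concreteness, here is the arithmetic behind those bit counts, using only the figures already quoted above (~10^185 Planck volumes, ~10^62 Planck intervals); a minimal Python sketch:

```python
# Bits needed to single out one location and one time in the observable universe.
import math

bits_for_location = math.log2(10**185)  # one Planck volume out of ~10^185
bits_for_time = math.log2(10**62)       # one Planck interval out of ~10^62
print(round(bits_for_location), round(bits_for_time))  # ~615 and ~206 bits
```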
Many people commute to work in businesses in San Francisco who don't live there. I would expect GDP per capita to be misleading in such cases for some purposes.
Broadening to the San Francisco-San Jose area, there are 9,714,023 people with a GDP of $1,101,153,397,000/year, giving a GDP/capita estimate of $113,357. I know enough people who commute between Sunnyvale and San Francisco or even further that I'd expect this to be 'more accurate' in some sense, though obviously it's only slightly lower than your first figure and still absurdly high.
But the city of San Francisco likely has a much smaller tax base than its putative GDP/capita would suggest, so provision of city based public services may be more difficult to manage.
tl;dr: if models unpredictably undergo rapid logistic improvement, we should expect zero correlation in aggregate.
If models unpredictably undergo SLOW logistic improvement, we should expect positive correlation. This also means getting more fine-grained data should give different correlations.
To condense and steelman the original comment slightly:
Imagine that learning curves all look like logistic curves. The following points are unpredictable:
- How big of a model is necessary to enter the upward slope.
- How big of a model is necessary to reach the plateau.
- How good the performance at the plateau is.
Would this result in zero correlation between model jumps?
So each model is in one of the following states:
- floundering randomly
- learning fast
- at performance plateau
Then the possible transitions (small -> 7B -> 280B) are:
1->1->1 : slight negative correlation due to regression to the mean
1->1->2: zero correlation since first change is random, second is always positive
1->1->3: zero correlation as above
1->2->2: positive correlation as the model is improving during both transitions
1->2->3: positive correlation as the model improves during both transitions
1->3->3: zero correlation, as the model is improving in the first transition and random in the second
2->2->2: positive correlation
2->2->3: positive correlation
2->3->3: zero correlation
3->3->3: slight negative correlation due to regression to the mean
That's two cases of slight negative correlation, four cases of zero correlation, and four cases of positive correlation.
However positive correlation only happens if the middle state is state 2, so only if the 7B model does meaningfully better than the small model, AND is not already saturated.
If the logistic jump is slow (takes >3 OOM) AND we are able to reach it with the 7B model for many tasks, then we would expect to see positive correlation.
However if we assume that
- Size of model necessary to enter the upward slope is unpredictable
- Size of a model able to saturate performance is rarely >100x models that start to learn
- The saturated performance level is unpredictable
Then we will rarely see a 2->2 transition, which means the actual possibilities are:
- Two cases of slight negative correlation
- Four cases of zero correlation
- One case of positive correlation (1->2->3, which should be less common as it requires 'hitting the target' of state 2)
Which should average out to around zero or very small positive correlation, as observed.
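A quick way to sanity-check this is a toy simulation. The sketch below is my own construction, not a fit to any real benchmark data: each task gets a logistic curve with a random midpoint, width, and plateau (all assumed parameters, as are the three model sizes), and we correlate the two jumps across model sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks = 10_000
sizes = [8.0, 9.85, 11.45]                  # log10(params): ~100M, 7B, 280B (assumed)

midpoint = rng.uniform(7.0, 13.0, n_tasks)  # where each task's upward slope sits
width = rng.uniform(0.05, 0.2, n_tasks)     # "rapid": 10%-to-90% span under ~1 OOM
plateau = rng.uniform(0.3, 1.0, n_tasks)    # unpredictable final performance level

def perf(log_n):
    clean = plateau / (1 + np.exp(-(log_n - midpoint) / width))
    return clean + rng.normal(0, 0.02, n_tasks)   # floundering noise in state 1

p_small, p_7b, p_280b = (perf(s) for s in sizes)
jump1, jump2 = p_7b - p_small, p_280b - p_7b
print("correlation between jumps:", np.corrcoef(jump1, jump2)[0, 1])
# Widening `width` to several OOM (a slow logistic) pushes this clearly positive,
# matching the argument above.
```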
However, more precise data with smaller model size differences would be able to find patterns much more effectively, as you could establish which of the transition cases you were in.
However again, this model still leaves progress basically "unpredictable" if you aren't actively involved in the model production, since if you only see the public updates you don't have the more precise data that could find the correlations.
This seems like evidence for 'fast takeoff' style arguments--since we observe zero correlation, if the logistic form holds, that suggests that ability to do a task at all is very near in cost to ability to do a task as well as possible.
Seconded. AI is good at approximate answers, and bad at failing gracefully. This makes it very hard to apply to some problems, or requires specialized knowledge/implementation that there isn't enough expertise or time for.
Based on my own experience and the experience of others I know, I think knowledge starts to become taut rather quickly - I’d say at an annual income level in the low hundred thousands.
I really appreciate this specific calling out of the audience for this post. It may be limiting, but the audience it limits to likely has strong overlap with LW readership.
Everything money can buy is “cheap”, because money is "cheap".
I feel like there's a catch-22 here, in that there are many problems that probably could be solved with money, but I don't know how to solve them with money--at least not efficiently. As a very mundane example, I know I could reduce my chance of ankle injury during sports by spending more money on shoes. But I don't know which shoes will actually be cost-efficient for this, and the last time I bought shoes I stopped using two different pairs after just a couple months.
Unfortunately I think that's too broad of a topic to cover and I'm digressing.
Overall, coming back to this, I'm realizing that I don't actually have any way to act on this piece. Even though I am in the intended audience, and I have been making a specific effort in my life to treat money as cheap and plentiful, I am not seeing:
- Advice on which subjects are likely to pay dividends, or why
- Advice on how to recover larger amounts of time or effort by spending money more efficiently
- Discussion of when those tradeoffs would be useful
This seems like an especially silly omission given, for example, Zvi's Covid posts, which are a pretty clear modern-day example of the Louis XV smallpox problem.
I would be interested in seeing someone work through how it is that people on LW ended up trusting Zvi's posts and how that knowledge was built. But I would expect that to turn into social group dynamics and analysis of scientific reasoning, and I'm not sure that I see where the idea of money's abundance would even come into it.
I think this post does a good job of focusing on a stumbling block that many people encounter when trying to do something difficult. Since the stumbling block is about explicitly causing yourself pain, to the extent that this is a common problem and that the post can help avoid it, that's a very high return prospect.
I appreciate the list of quotes and anecdotes early in the post; it's hard for me to imagine what sort of empirical references someone could make to verify whether or not this is a problem. Well-known quotes and a long list of anecdotes are a substitute, though not a perfect one.
That said, the "Antidotes" section could easily contain some citations. For example:
If your wrists ache on the bench press, you're probably using bad form and/or too much weight. If your feet ache from running, you might need sneakers with better arch support. If you're consistently sore for days after exercising, you should learn to stretch properly and check your nutrition.
Such rules are well-established in the setting of physical exercise[...]
There are 4 claims being made here, but if the rules really are well established, shouldn't it be easy to find citations for them?
I don't doubt those claims, but the following claims:
If reading a math paper is actively unpleasant, you should find a better-written paper or learn some background material first (most likely both). If you study or work late into the night and it disrupts your Circadian rhythm, you're trading off long-term productivity and well-being for low-quality work.
I'm more skeptical of. In many cases there is only one definitive paper on a subject in math research. Often it's a poorly written paper, but there may not be a better writeup of the results (at least for modern research results). Studying late into the night could disrupt one person's Circadian rhythm, but it could be a way for someone else to actually access their productive hours, instead of wasting effort waking up early in the morning.
These aren't criticisms of the core point of the post, but they are places where I think the focus on examples without citations moves away from the core point and could be taken out of context.
The comments outline a number of issues with some of the framing and antidote points, and I think the post would be better served by drawing a clearer line between "measuring pain is not a good way to measure effort" and "painful actions can be importantly instrumental."
I can imagine an experiment in which two teams are asked to accomplish a task while focusing on remembering either "no pain no gain" or "pain is not the unit of effort", and then comparing their results; but whether one piece of advice is better on the margin seems likely to be very personal, and I don't know that I'd expect to get very interesting results from such an experiment.
Your model of supporters of farm animal welfare seems super wrong to me.
I would predict that actually supporters of the law will be more unhappy the more effect it has on the actual market, because that reveals info about how bad conditions are for farm animals. In particular if it means shifting pork distribution elsewhere, that means less reduction in pig torture and also fewer options to shift consumption patterns toward more humanely raised meat on the margins.
Those costs can be worth paying, if you still expect some reduction in pig torture, but obviously writing laws to be better defined and easier to measure would be a further improvement.
70% compute, 30% algo (give or take 10 percentage points) over the last 25 years. Without serious experiments, have a look at the Stockfish evolution at constant compute. That's a gain of +700 ELO points over ~8 years (on the high side, historically). For comparison, you gain ~70 ELO per double compute. Over 8 years one has on average gained ~400x compute, yielding +375 ELO. That's 700:375 ELO for compute:algo
Isn't that 70:30 algo:compute?
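For what it's worth, here's the arithmetic using just the two numbers quoted above (+700 Elo from Stockfish improvements at constant compute, +375 Elo from compute growth):

```python
algo_elo, compute_elo = 700, 375          # figures from the quoted comment
share_algo = algo_elo / (algo_elo + compute_elo)
print(f"algo: {share_algo:.0%}, compute: {1 - share_algo:.0%}")  # ~65% / ~35%
```

So on those numbers the split is roughly 65:35 in favor of algorithmic progress.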
I'm curious about what the state of evidence around long covid is now, and especially how protective vaccines are against it. I imagine there still isn't much data about it yet though.
A friend of mine on Facebook notes that the instances of blood clots in Germany were concerning because in Germany mostly young health care workers are getting vaccinated, where it's both more possible to distinguish small numbers of blood clots from chance and more concerning to see extreme side effects.
The rate is still low enough that pausing vaccination is (obviously) a dangerous move, but dismissing the case that blood clots may be caused by the vaccine isn't a fair assessment of the evidence, and that may be important in maybe two years when supply of non-AZ vaccines is no longer a limit for the world.
Do you have any thoughts on what you'd do differently to be more personally confident doing this again?
Maybe, but the US number lines up with 1% of the population, which lines up with the top-1% figure; if people outside the US are ~50x as likely to be top-1% at various hobbies, that's a bold statement that needs justification, not an obvious rule of thumb!
Or it could be across all time, which lines up with ~100 billion humans in history.
I think "a billion people in the world" is wrong here--it should only be about 75 million by pure multiplication.
I see, I definitely didn't read that closely enough.
Looks like the initial question was here and a result around it was posted here. At a glance I don't see the comments with counterexamples, and I do see a post with a formal result, which seems like a direct contradiction to what you're saying, though I'll look in more detail.
Coming back to the scaling question, I think I agree that multiplicative scaling over the whole model size is obviously wrong. To be more precise, if there's something like a Q-learning inner optimizer for two tasks, then you need the cross product of the state spaces, so the size of the Q-space could scale close-to-multiplicatively. But the model that condenses the full state space into the Q-space scales additively, and in general I'd expect the model part to be much bigger--like the Q-space has 100 dimensions and the model has 1 billion parameters, so adding a second model of 1 billion parameters and increasing the Q-space to 10k dimensions is mostly additive in practice, even if it's also multiplicative in a technical sense.
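To put rough numbers on that, using the same figures as above (two 1-billion-parameter model parts, a Q-space going from 100 to 10k dimensions when combined), a toy parameter count:

```python
model_a = model_b = 1_000_000_000     # the two narrow models' parameters
q_dims_single = 100
q_dims_combined = q_dims_single ** 2  # cross product of the two state spaces

total = model_a + model_b + q_dims_combined
print(total, total / (model_a + model_b))  # ~2e9 params, ratio ~1.000005
```

The multiplicative part is real but numerically swamped by the additive part.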
I'm going to update my probability that "GPT-3 can solve X, Y implies GPT-3 can solve X+Y," and take a closer look at the comments on the linked posts. This also makes me think that it might make sense to try to find simpler problems, even already-mostly-solved problems like Chess or algebra, and try to use this process to solve them with GPT-2, to build up the architecture and search for possible safety issues in the process.
I'm replying on my phone right now because I can't stop thinking about it. I will try to remember to follow up when I can type more easily.
I think the vague shape of what I think I disagree about is how dense GPT-3's sets of implicit knowledge are.
I do think we agree that GPT-5000 will be broadly superhuman, even if it just has a grab bag of models in this way, for approximately the reasons you give.
I'm thinking about "intelligent behavior" as something like the set of real numbers, and "human behavior" as covering something like rational numbers, so we can get very close to most real numbers but it takes some effort to fill in the decimal expansion. Then I'm thinking of GPT-N as being something like integers+1/N. As N increases, this becomes close enough to the rational numbers to approximate real numbers, and can be very good at approximating some real numbers, but can't give you incomputable numbers (unaligned outcomes) and usually won't give you duplicitous behavior (numbers that look very simple at first approximation but actually aren't, like .2500000000000004, which seems to be 1/4 but secretly isn't). I'm not sure where that intuition comes from but I do think I endorse it with moderate confidence.
Basically I think for minimal circuit reasons that if "useful narrowly" emerges in GPT-N, then "useful in that same domain but capable of intentionally doing a treacherous turn" emerges later. My intuition is that this won't be until GPT-(N+3) or more, so if you are able to get past unintentional turns like "the next commenter gives bad advice" traps, this alignment work is very safe, and important to do as fast as possible (because attempting it later is dangerous!)
In a world where GPT-(N+1) can do a treacherous turn, this is very dangerous, because you might accidentally forget to check if GPT-(N-1) can do it, and get the treacherous turn.
My guess is that you would agree that "minimal circuit that gives good advice" is smaller than "circuit that gives good advice but will later betray you", and therefore there exist two model sizes where one is dangerous and one is safe but useful. I know I saw posts on this a while back, so there may be relevant math about what that gap might be, or it might be unproven but with some heuristics of what the best result probably is.
My intuition is that combining narrow models is multiplicative, so that adding a social manipulation model will always add an order of magnitude of complexity. My guess is that you don't share this intuition. You may think of model combination as additive, in which case any model bigger than a model that can betray you is very dangerous, or you might think the minimal circuit for betrayal is not very large, or you might think that GPT-2-nice would be able to give good advice in many ways so GPT-3 is already big enough to contain good advice plus betrayal in many ways.
In particular if combining models is multiplicative in complexity, a model could easily learn two different skills at the same time, while being many orders of magnitude away from being able to use those skills together.
I think this is obscuring (my perception of) the disagreement a little bit.
I think what I'm saying is, GPT-3 probably doesn't have any general truth+noise models. But I would expect it to copy a truth+noise model from people, when the underlying model is simple.
I then expect GPT-3 to "secretly" have something like an interesting diagnostic model, and probably a few other narrowly superhuman skills.
But I would expect it to not have any kind of significant planning capacity, because that planning capacity is not simple.
In particular my expectation is that coherently putting knowledge from different domains together in generally useful ways is MUCH, MUCH harder than being highly superhuman in narrow domains. Therefore I expect Ajeya's approach to be both effective, because "narrowly superhuman" can exist, and reasonably safe, because the gap between "narrowly superhuman" or even "narrowly superhuman in many ways" and "broadly superhuman" is large so GPT-3 being broadly superhuman is unlikely.
Phrased differently, I am rejecting your idea of a smartness spectrum. My intuition is that levels of GPT-N competence will scale the way computers have always scaled at AI tasks--becoming usefully superhuman at a few tasks very quickly, while taking much, much longer to exhibit the kinds of intelligence that are worrying, like modeling human behavior for manipulation.
This seems like it's using the wrong ontology to me.
Like, in my mind, there are things like medical diagnostics or predictions of pharmaceutical reactions, which are much easier cognitive tasks than general conversation, but which humans are specialized away from.
For example, imagine that the severity of side effects from a specific medication can be computed by figuring out 15 variables about the person and putting them into a neural network with 5000 parameters, where the output is somewhere in a six-dimensional space, and this model is part of a general model of human reactions to chemicals.
Then GPT-3 would be in a great position to use people's reddit posts talking about medication side effects to find this network. I doubt that medical science in our current world could figure that out meaningfully. It would be strongly superhuman in this important medical task, but nowhere near superhuman in any other conversational task.
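To make the toy concrete, here is one hypothetical shape such a model could take; the layer widths and the forward pass are my own illustrative choices, not anything claimed above:

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [15, 64, 56, 6]   # 15 patient variables -> 6-dim side-effect profile
weights = [rng.normal(size=(a, b)) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=b) for b in layer_sizes[1:]]

n_params = sum(w.size for w in weights) + sum(b.size for b in biases)
print(n_params)                 # ~5000 parameters, matching the toy description

def side_effect_profile(x):
    """Map 15 input variables to a point in six-dimensional side-effect space."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(x @ w + b)
    return x @ weights[-1] + biases[-1]
```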
My intuition is that most professional occupations are dominated by problems like this, that are complex enough that we as humans can only capture them as intuitions, but simple enough that the "right" computational solution would be profoundly superhuman in that narrow domain, without being broadly superhuman in any autonomous sense.
Maybe a different reading of your comment is something like, there are so many of these things that if a human had access to superhuman abilities across all these individual narrow domains, that human could use it to create a decisive strategic advantage for themself, which does seem possibly very concerning.
This post matches and specifies some intuitions I've had for a while about empirical research and I'm very happy it has been expanded.
Upvoting this comment because it helped me understand why nobody seems to be engaging with what I think is the central point of my post.
After reading some of this reddit thread I think I have a better picture of how people are reacting to these events. I will probably edit this post or write a follow-up.
My high level takeaway is:
- people are afraid to engage in speech that will be interpreted as political, so are saying nothing.
- nobody is actually making statements about my model of alignment deployment, possibly nobody is even thinking about it.
In the edit, or possibly in a separate followup post, I will try to present the model at a further remove from the specific events and actors involved, which I am only interested in as inputs to the implementation model anyway.
I appreciate the thread as context for a different perspective, but it seems to me that it loses track of verifiable facts partway through (around here), though I don't mean to say it's wrong after that.
I think in terms of implementation of frameworks around AI, it still seems very meaningful to me how influence and responsibility are handled. I don't think that a federal agency specifically would do a good job handling an alignment plan, but I also don't think Yann LeCun setting things up on his own without a dedicated team could handle it.
I would want to see a strong justification before deciding not to discuss something that is directly relevant to the purpose of the site.
Noted that a statement has been made. I don't find it convincing, and even if I did, I don't think it changes the effect of the argument.
In particular, even if it was the case that both dismissals were completely justified, I think the chain of logic still holds.
I think this makes sense, but I disagree with it as a factual assessment.
In particular I think "will make mistakes" is actually an example of some combination of inner and outer alignment problems that are exactly the focus of LW-style alignment.
I also tend to think that the failure to make this connection is perhaps the biggest single problem in both ethical AI and AI alignment spaces, and I continue to be confused about why no one else seems to take this perspective.
I am currently writing fiction that features protagonists that are EAs.
This seems at least related to the infrastructure fund goal of presenting EA principles and exposing more people to them.
I think receiving a grant would make me more likely to aggressively pursue options to professionally edit, publish, and publicize the work. That feels kind of selfish and makes me self-conscious, but also wouldn't require a very large grant. It's hard for me to unwrap my feelings about this vs. the actual public good, so I'm asking here first.
Does this sound like a good use of a grant?
Any preliminary results on side effects so far?
How were you able to find someone who would give you an antibody test?
I made some effort to get an antibody test a few weeks ago but multiple sources refused to order or run one, even after I had an appointment that I showed up for in person.
Welp, I spent five minutes plus trying to switch to the markdown editor to fix my spoilers and failed. Giving up now.
I would expect the prior to be that we end up with something similar to the flu vaccine, which we try to get everyone to take approximately yearly, and where we have more safety concerns about people not taking it.
I find both directions plausible. I do agree that I don't see any existing institutions ready to take its place, but looking at secular solstice, for example, I definitely expect that better institutions are possible.
There might be a sufficiency-driven stagnation following mechanics similar to crowding out: since people have a "good enough" option, they don't try to build better things, and centralized leadership causes institutional conservatism.
I would bet this is supported by worse outcomes for more centralized churches, like Unitarians vs megachurches or orthodox Catholics, but that's a weakly held belief.
I think I find this plausible. An alternative to MichaelBowbly's take is that religion may crowd out other community organization efforts which could plausibly be better.
I'm thinking of unions, boys and girls clubs, community centers, active citizenship groups, meetup groups, and other types of groups that have never yet existed.
It could be that in practice introducing people to religious practices shows them examples of ways to organize their communities, but it could also be that religious community efforts are artificially propped up by government subsidies via being tax exempt.
The normative implication in this case, which I think is probably a good idea in general, is that you should focus on building intimate (not professionalized and distant) community groups to connect with people and exchange services.
A toy model that makes some sense to me is that the two population distinction is (close to) literally true; that there's a subset of like 20% of people who have reduced their risk by 95%+, and models should really be considering only the other 80% of the population, which is much more homogeneous.
Then because you started with effectively 20% population immunity, that means R0 is actually substantially higher, and each additional piece of immunity is less significant because of that.
I haven't actually computed anything with this model so I don't know whether it is actually explanatory.
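For what it's worth, here is roughly what that computation would look like. The observed R value below is purely illustrative; the 20% / 95% figures are the ones from the toy model:

```python
R_observed = 1.3        # hypothetical measured effective reproduction number
cautious_frac = 0.20    # share of people who have cut their risk by 95%+
risk_reduction = 0.95

# The cautious group behaves almost like pre-existing immunity of this size:
effective_immunity = cautious_frac * risk_reduction          # 0.19

# Back out the reproduction number within the homogeneous 80%:
R0_rest = R_observed / (1 - effective_immunity)
print(f"implied R0 among the non-cautious group: {R0_rest:.2f}")

# Classic herd-immunity threshold, applied to that group alone:
print(f"immunity needed within that group: {1 - 1 / R0_rest:.0%}")
```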
I did some calculations of basic herd immunity thresholds based on fractal risk (without an infection model) a few months back, and the difference between splitting the population into high exposure vs low exposure captures more than half the change from the limit of infinite splits. The threshold stopped changing almost entirely after three splits, which was only 6 subpopulations.
With as many other variables as exist here, I'm not confident that effect would persist, but my default guess is that adding fractal effects to the model will less than double the change from the homogeneous case, and possibly change very little at all, as the herd immunity threshold, and therefore the level of spread reduction, will change even less (especially with control systems).
That may end up being pretty significant in terms of actual number of deaths and infections at the end, but I would be very surprised if it changes whether or not there are peaks.
I'd like to use this feature, especially for when I meet a user in the walled garden or IRL but need consistency to remember which user they are. This is a common feature in video games and without it I would have no idea who most of my friends in League of Legends are.
I wouldn't be that worried about privacy for the notes, since I'd expect few of them to contain sensitive information, though they might contain some awkward information.
Yeah I think my main disagreements are 4 and 5.
Given stories I've heard about cryonics orgs, I'd put 10-50% on 5. Given my impression of neuroscience, I'd put 4 at 25-75%.
Given that I'm more pessimistic in general, I'd put an additional 2x penalty on my skepticism of their other guesses.
That puts me at around a 0.01%-20% spread, or a one-in-ten-thousand lower bound, which is better than I expected. If I were convinced that a cryo org was actually a responsible business, that would be enough for me to try to make it happen.
Even 0.2% seems quite optimistic to me. Without going into detail, anything from 3-8 seems like it could be 10% or lower and 12-14 seem nearly impossible to estimate. I wouldn't be surprised to find my personal estimate below one in a million.
I was trying to do a back-of-the-envelope calculation of total cost of work and total value created (where I'm using cost of rent as a (bad) proxy for (capturable) value created).
I definitely wouldn't assume that the government or any single agent would be doing the project, just that the overall amount of capturable value must be worth the investment costs; then different parties can pay portions of those costs in exchange for portions of, or rights to, that value. But I doubt adding in the different parties involved would make my estimates more accurate.
Do you have a source for cost of similar projects? My estimates are definitely very bad for many reasons.
I want to have this post in a physical book so that I can easily reference it.
It might actually work better as a standalone pamphlet, though.
I like that this responds to a conflict between two of Eliezer's posts that are far apart in time. That seems like a strong indicator that it's actually building on something.
Either "just say the truth", or "just say whatever you feel you're expected to say" are both likely better strategies.
I find this believable but not obvious. For example, if the pressure on you is that you'll be executed for saying the truth, saying nothing is probably better than saying the truth. If the pressure on you is remembering being bullied on tumblr, and you're being asked if you disagree with the common wisdom at a LW meetup, saying nothing is better than saying what you feel expected to say.
I find it pretty plausible that those are rare circumstances where the triggering uncertainty state doesn't arise, but then there are some bounds on when the advice applies that haven't been discussed at all.
a little cherry-picking is OK
I think the claim being made here is that in most cases, it isn't practical to review all existing evidence, and if you attempt to draw out a representative sub-sample of existing evidence, it will necessarily line up with your opinion.
In cases where you can have an extended discussion, you can bring up contradicting evidence and at least note that it is not persuasive, and possibly why. But in short conversations there might only be time for one substantial reference. I think that's distinct from what I would call "cherry-picking." (It does seem like it would create some weird dynamics where your estimate of the explainer's bias rises as you depart from uncertainty, but I think that's extrapolating too far for a review.)
I think the comment with examples is helpful here.
I wonder about the impact of including something like this, especially with social examples, in a curated text that is at least partly intended for reading outside the community.
The factual point that moderate liberals are more censorious is easy to lose track of, and I saw confusion about it today that sent me back to this article.
I appreciate that this post starts from a study, and outlines not just the headline from the study but the sample size. I might appreciate more details on the numbers, such as how big the error bars are, especially for the subgroup stats.
Historical context links are good, and I confirm that they state what they claim to state.
Renee DiResta is no longer at New Knowledge, though her previous work there is still up on her site. I really like the exploration of her background. It might be nice to see something similar about Justin Murphy as well.
Swearing is negatively correlated with agreeableness
The citation for this is in the link in the previous sentence; I might adjust the link so it's clear what it covers.
It’s often corporate caution that drives speech codes that restrict political controversy, obscenity, and muckraking/whistleblowing. It’s not just racist or far-right opinions that get silenced; media and social-media corporations worry about offending prudes, homophobes, Muslim extremists, the Chinese government, the US military, etc, etc.
This paragraph seems clearly true to me, but I'd prefer to see citations, especially since it's related to politics.
every guy with a printing press could publish a “newspaper” full of opinions and scurrilous insults
A citation for this would be nice, or just a link to an example. Here's a discussion with sources.
I really like Zvi's comment tying this back to a more detailed model of Asymmetric Justice.
I really like this post overall; especially in the context of Asymmetric Justice it feels like something that's simple and obvious to me after reading it, while being invisible to me beforehand.
it's not a simple enough question for easy answers.
It's also plausible to me that the set of people at the required intersection (owns a house; rents the house out on AirBnB; in a single metro area; measures success in a reasonable way; writes about it on the internet) gets small enough that there are no results.
Looking for general advice (how to succeed as an AirBNB host) might give a model that's easy to fill in, like "you will succeed if the location is X appealing and there are f(X) listings or fewer."
That still seems like a pretty easy answer to me, but it could only be found with slightly better Google Fu.
I think that leads to a need for heuristics on how hard to try rephrasing things or when to give up quickly rather than getting sucked down a two day wiki walk rabbit hole.
I think you're misunderstanding my analogy.
I'm not trying to claim that if you can solve the (much harder and more general problem of) AGI alignment, then it should be able to solve the (simpler specific case of) corporate incentives.
It's true that many AGI architectures have no clear analogy to corporations, and if you are using something like a satisficer model with no black-box subagents, this isn't going to be a useful lens.
But many practical AI schema have black-box submodules, and some formulations like mesa-optimization or supervised amplification-distillation explicitly highlight problems with black box subagents.
I claim that an employee that destroys documentation so that they become irreplaceable to a company is a misaligned mesa-optimizer. Then I further claim that this suggests:
- Company structures are an existing, well-studied setting for misaligned subagents. It's probably worth doing a literature review to see if some of those structures have insights that can be translated.
- Given a schema for aligning sub-agents of an AGI, either the schema should also work on aligning employees at a company or there should be a clear reason it breaks down
- If the analogy applies, one could test the alignment schema by actually running such a company, which is a natural experiment that isn't safely accessible for AI projects. This doesn't prove that the schema is safe, but I would expect aspects of the problem to be easier to understand via natural experiment than via doing math on a whiteboard.
As remizidae points out, most of these restrictions are not effectively enforced by governments, they are enforced by individuals and social groups. In California, certainly, the restaurants and bars thing is enforced mostly by the government, but that's mostly a "governments can't act with nuance" problem.
But for things like gatherings of friends, I think this question still applies. The government cannot effectively enforce limits on that, but your group of friends certainly can.
And I think in that context, this question remains. That is, I think groups of friends in California should start plans for how to handle social norms under partial immunity.
I have personally suggested this to friends a couple of times, and I've met with a lack of enthusiasm. I think part of that is that the question is so politically tribal that taking any action that isn't MAXIMALLY SERIOUS is a betrayal of the tribe, even if it has no practical value.
Also, making any such plans public, versus just keeping a google doc of who among your friends has been vaccinated, creates a lot of social awkwardness, so I'd expect that in practice people will come up with their own personal, secret, and highly error-prone ways of handling it.
I think this misunderstands my purpose a little bit.
My point isn't that we should try to solve the problem of how to run a business smoothly. My point is that if you have a plan to create alignment in AI of some kind, it is probably valuable to ask how that plan would work if you applied it to a corporation.
Creating a CPU that doesn't lie about addition is easy, but most ML algorithms will make mistakes outside of their training distribution, and thinking of ML subcomponents as human employees is an intuition pump for how or whether your alignment plan might interact with them.
I like this post and would like to see it curated, conditional on the idea actually being good. There are a few places where I'd want more details about the world before knowing if this was true.
- Who owns this land? I'm guessing this is part of the Guadalupe Watershed, though I'm not sure how I'd confirm that.
This watershed is owned and managed by the Santa Clara Valley Water District.
- What legal limits are there on use of the land? Wikipedia notes:
The bay was designated a Ramsar Wetland of International Importance on February 2, 2012.
I don't know what that means, but it might be important.
- How much does it cost to fill in land like this?
It looks like for pool removal there's a cost of between $20-$130 per cubic ~~foot~~ yard (thanks johnswentworth). Making the bad simplifying assumption of 6ft of depth and 50 square miles that's ~~8.3 billion ft^3~~ 310 million cubic yards. Since the state of CA is very bad at cutting costs, let's use the high end cost estimate which is about 1/8 of $1000 so that makes the cost estimate ~~$1 trillion~~ $300 billion.
With a trillion dollar price tag, this stops looking worthwhile pretty fast.
Spitballing about price estimates:
- People have filled in things like this in the past, which suggests lower costs
- Human effort may be much more expensive than it was previously
- Pool filling prices might include massive fixed costs and regulatory costs that wouldn't scale with volume
- The state could auction the land to a private company that might do a better job negotiating costs
If fixed costs are 90% of pool fillings and will be negligible by volume for this, and if we further use the lower bound of cost per filling, then we reduce cost by 60x to about $5 billion. Let's call that an 80% confidence interval, where the low end is clearly worth it and the high end clearly not.
- How much does it cost to build a bunch of housing there?
First Google result says $65k-86k per unit, though economies of scale might bring that down. Then the suggested 2 million units would cost ~$130-170 billion; potentially significantly more or less.
- How much value does the housing create?
The cheapest rents I could see with a casual search were something around $900/bedroom/month in Fremont.
Rounding up to $11k/year, it would take 6-8 years to recoup construction costs, not counting maintenance.
At the low end of land filling costs, $16 billion adds less than one year to the recoup timeline. At the high end ~~around $1 trillion, it would take about 50 years to recoup the costs.~~ $300 billion, that ~triples to ~20 years.
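Since a few of those numbers got corrected along the way, here's a sketch that just re-runs the arithmetic with the figures as they stand above (variable names and the per-unit framing are mine):

```python
SQFT_PER_SQMI = 5280 ** 2

area_sqmi, depth_ft = 50, 6
fill_volume_yd3 = area_sqmi * SQFT_PER_SQMI * depth_ft / 27
print(f"fill volume: {fill_volume_yd3:.2e} cubic yards")       # ~3.1e8 yd^3

units = 2_000_000
construction_per_unit = (65_000, 86_000)                       # $/unit, low/high
rent_per_unit_year = 11_000                                    # ~$900/bedroom/month

for land_cost in (16e9, 300e9):   # low / high land-filling scenarios from above
    years = [(c + land_cost / units) / rent_per_unit_year for c in construction_per_unit]
    print(f"land cost ${land_cost / 1e9:.0f}B -> recoup in {years[0]:.0f}-{years[1]:.0f} years")
```

This reproduces the ~20-year recoup figure at the $300 billion land cost.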
Reaching the end of this, I think I'm uncertain about how economical the idea is. This is mostly because of large error bars around my cost calculations.
An investment that pays off in value created 50 years down the line is probably worth it for society, but very unlikely to happen given the investment environment today.
My ending impression is I want this post curated, because I want city managers and real estate investors to run these numbers (ideally being nerd-sniped by my terrible naïve calculations) and make the decision for themselves.