I'm a bit torn regarding the "predicting how others react to what you say or do, and adjust accordingly" part. On the one hand this is very normal and human and makes sense. It's kind of predictive empathy in a way. On the other hand, thinking so very explicitly about it and trying to steer your behavior in a way so as to get the desired reaction out of another person also feels a bit manipulative and inauthentic. If I knew another person would think that way and plan exactly how they interacted with me, I would find that quite off-putting. But maybe the solution is just "don't overdo it", and/or "only use it in ways the other person would likely consent to" (such as avoiding accidentally saying something hurtful).
My take on this is that patching the more "obvious" types of jailbreaking and obfuscation already makes a difference and is probably worth it (as long as it comes at no notable cost to the general usefulness of the system). Sure, some people will put in the effort to find other ways, but the harder it is, and the fewer little moments of success you have when first trying it, the fewer people will get into it. Of course one could argue that the worst outcomes come from the most highly motivated bad actors, and they surely won't be deterred by such measures. But I think even for them there may be some path dependencies involved, where they only ended up in their position because over the years, while interacting with LLMs, they ran into just enough low-effort jailbreaking successes to keep their interest up. Of course that's an empirical question though.
Some other comments already discussed the issue that often neither A nor B is necessarily correct. I'd like to add that there are many cases where the truth, if it exists in any meaningful way, depends on many hidden variables, and hence A may be true in some circumstances, and B in some other circumstances, and it's a mistake to look for "the one static answer". Of course the questions "when are A or B correct?" / "What does it depend on?" are similarly hard. But it's possible that this different framing can already help, as inquiring why the two sides believe what they believe can sometimes uncover these hidden variables, and it becomes apparent that the two sides' "why"s are not always opposite sides of a single axis.
An argument against may be that for some people there's probably a risk of getting addicted / losing control. I'm not sure to what degree it's possible to predict such tendencies in advance, but for some people that risk may well outweigh any benefits of arbitrage opportunities or improvements to their calibration.
Note from the future: I asked a bunch of LLMs for Terry Pratchett quotes on the human stomach, and while there's no guarantee any of them are actual non-hallucinated quotes (in different conversations I got many different ones while no single one came up twice), I think they're all pretty good:
"All he knew was that his stomach had just started investigating some of the more revolutionary options available to it."
"The stomach is smarter than the mind, which is why it likes to make all the important decisions."
"His stomach was making the kind of noises that normally precede the arrival of a self-propelled meal."
"His stomach felt like it was trying to digest a live weasel while attempting to escape through his boots."
"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it. And the trouble with having an empty stomach is that it wants food all the time."
“The stomach is an essential part of the nervous system. It tells the brain what it wants far more clearly than the brain manages to tell it.”
"The human stomach is an amazing thing. It can stretch to accommodate all sorts of things. In theory, anyway. It just doesn’t appreciate it when you try to prove it."
Bonus Example: The Game Codenames
There’s a nice board + online game called Codenames. The basic idea is: you have two teams, each team split into two roles, the spymaster and the operatives. All players see an array of 25 cards with a single word on each of them. Everybody sees the words, but only the spymasters see the color of these cards. They can be blue or red, for the two teams, or white for neutral. The teams then take turns. Each turn, the spymaster tries to come up with a single freely chosen word that will allow their operatives to select, by association with that word, as many cards of their team’s color as possible. The spymaster then communicates that word, as well as the number of cards to be associated with it, to the rest of their team. The operatives then discuss among themselves which cards are most likely to fit the provided word[1].
I’ve played this game with a number of people, and noticed that many seem to play it in “forward” mode: spymasters often just try to find some plausible word that matches some of their team’s cards, almost as if the problem they were solving was “if somebody saw what I saw, they should agree this word makes sense”. The better question would be: which word, if my team heard it, would make them choose the cards that have our team’s color? The operatives, on the other hand, usually just check which cards fit the given word best. Almost nobody asks themselves: if the cards I’m about to pick really were the ones the spymaster had in mind, would they have chosen the word that they did?
To name a concrete example of the latter point, let’s say the spymaster said the word “transportation” and the number 2, so you know you’re looking for exactly two cards whose words relate to transportation. And after looking at all available cards, there are three candidates: “wheel”, “windshield” and “boat”. Forward reasoning would allow basically any 2 out of these 3 cards, so you’d basically have to guess. But with inverse reasoning you would at least notice that, if “wheel” and “windshield” were the two words the spymaster was hinting at, they would most certainly have used “car” rather than “transportation”. But as they did not, in fact, choose “car”, you can be pretty sure that “boat” should be among your selection.
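To make the forward/inverse distinction concrete, here is a small illustrative sketch; the association scores and the scoring rule are entirely made up for this example:

```python
from itertools import combinations

# Hypothetical association scores between possible clue words and candidate cards.
association = {
    "transportation": {"wheel": 0.7, "windshield": 0.6, "boat": 0.8},
    "car":            {"wheel": 0.95, "windshield": 0.9, "boat": 0.1},
}

clue_given = "transportation"
candidates = ["wheel", "windshield", "boat"]

# Forward mode: rank cards purely by how well they fit the clue that was given.
forward_ranking = sorted(candidates, key=lambda w: -association[clue_given][w])
print("forward ranking:", forward_ranking)  # all three fit reasonably well -> ambiguous

# Inverse mode: for each candidate pair, ask which clue the spymaster would most
# likely have chosen if that pair were the intended one.
def best_clue_for(pair):
    return max(association, key=lambda clue: min(association[clue][w] for w in pair))

for pair in combinations(candidates, 2):
    print(pair, "-> spymaster would probably have said:", best_clue_for(pair))
# ('wheel', 'windshield') -> 'car': since we heard "transportation" instead,
# that pair is unlikely, so "boat" is almost certainly part of the intended pair.
```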
Of course one explanation for all of this may be that Codenames is, after all, just a game, and doing all this inverse reasoning is a bit effortful. Still it made me realize how rarely people naturally go into “inverse mode”, even in such a toy setting where it would be comparably clean and easy to apply.
- ^
I guess explaining the rules of a game is another problem that can be approached in forward or inverse ways. The forward way would just be to explain the rules in whichever way seems reasonable to you as someone familiar with the game. Whereas the inverse way would be to think about how you can best explain things in a way such that somebody who has no clue about the game will quickly get an idea of what’s going on. I certainly tried to do the latter, but, ehhh, who knows if I succeeded.
Without having thought much about it, I would think that it's a) pretty addictive and b) "scales well". Many forms of consumption have some ~natural limit, e.g. you can only eat so much food, going to the movies or concerts or whatever takes some energy and you probably wouldn't want to do this every day. Even addictive activities like smoking tend to have at least somewhat of a cap on how much you spend on it. Whereas gambling (which sports betting probably basically is to many people) potentially can just eat up all your savings if you let it.
So it would at least seem that it has much more potential to be catastrophic for individuals with low self control, even though that's a different story than the average effect on household investment, I guess.
While much of this can surely happen to varying degrees, I think an important aspect in music is also recognition (listening to the same great song you know and like many times with some anticipation), as well as sharing your appreciation of certain songs with others. E.g. when hosting parties, I usually try to create a playlist where for each guest there are a few songs in there that they will recognize and be happy to hear, because it has some connection to both of us. Similarly, couples often have this meme of "this is our song!", which throws them back into nostalgic memories of how they first met.
None of this is to disagree with the post though. I mostly just wanted to point out that novelty and "personal fit" are just two important aspects in any person's music listening experience, and I think it's unlikely these two aspects will dominate the future of music that much.
I once had kind of the opposite experience: I was at a friend's place, and we watched the recording of a System of a Down concert from a festival that we both had considered attending but didn't. I thought it was terrific and was quite disappointed not to have attended in person. He however got to the conclusion that the whole thing was so full of flaws that he was glad he hadn't wasted money on a ticket.
Just like you, I was baffled, and to be honest just kind of assumed he was just trying to signal his high standards or something but surely didn't actually mean that.
Given that he was quite the musician himself, playing multiple instruments, and I'm quite the opposite, I now for the first time seriously consider whether he really did dislike that concert as much as he said.
I appreciate your perspective, and I would agree there's something to it. I would at first vaguely claim that it depends a lot on the individual situation whether it's wise to be wary of people's insecurities and go out of one's way to not do any harm, or to challenge (or just ignore) these insecurities instead. One thing I've mentioned in the post is the situation of a community builder interacting with new people, e.g. during EA or lesswrong meetups. For such scenarios I would still defend the view that it's a good choice to be very careful not to throw people into uncomfortable situations. Not only because doing so would be instrumentally suboptimal, but also because you're in a position of authority and have some responsibility not to e.g. push people to do something against their will.
However, when you're dealing with people you know well, or even with strangers but on an equal footing, then there's much more wiggle room, and you can definitely make the case that it's the better policy not to broadly avoid uncomfortable situations for others.
Thanks for sharing your thoughts and experience, and that first link indeed goes exactly in the direction I was thinking.
I think in hindsight I would adjust the tone of my post a bit away from "we're generally bad at thinking in 3D" and more towards "this is a particular skill that many people probably don't have as you can get through the vast majority of life without it", or something like that. I mostly find this distinction between "pseudo 3D" (as in us interacting mostly with surfaces that happen to be placed in a 3D environment, but very rarely, if ever, with actual volumes) and "real 3D" interesting, as it's probably rather easy to overlook.
I find your first point particularly interesting - I always thought that weights are quite hard to estimate and intuit. I mean of course it's quite doable to roughly assess whether one would be able to, say, carry an object or not. But when somebody shows me a random object and I'm supposed to guess the weight, I'm easily off by a factor of 2+, which is much different from e.g. distances (and rather in line with areas and volumes).
That github link yields a 404. Is it just an issue with the link itself, or did something change about the dataset being public?
Indeed! I think I remember having read that a while ago. A different phrasing I like to use is "Do you have a favorite movie?", because many people actually do and then are happy to share it, and if they don't, they naturally fall back on something like "No, but I recently watched X and it was great" or so.
Good point. I guess one could come up with examples that have less of this inefficiency but still are "computationally unkind". Although in the end, there's probably some correlation between these concepts anyway. So thanks for adding that. 👌
I would add 3) at the start of an event, everyone is asked to state their hopes and expectations about the event. While it's certainly useful to reflect on these things, I (embarrassingly?) often don't even have any concrete hopes or expectations in such situations and am rather in "let's see what happens" mode. I still think it's fair to ask this question, as it can provide very beneficial feedback for the organizer, but they should at least be aware that a) this can be quite stressful for some participants, and b) many of the responses may be "made up" on the fly, rather than statements backed by a sufficient level of reflection. Of course just being honest there and saying "I don't have any expectations yet and just thought the title of the event sounded interesting" is probably the best option, but I think 10-years-ago-me would probably not have been confident enough to say that, and would instead have made up some vague, plausible-sounding claims that had a higher chance of signaling "I've got my shit together and definitely thought deeply about why I'm attending this event beforehand".
I think it's a fair point. To maybe clarify a bit though, while potentially strawmanning your point a bit, my intention with the post was not so much to claim "the solution to all social problems is that sufficiently-assertive people should understand the weaknesses of insufficiently-assertive people and make sure to behave in ways that don't cause them any discomfort", but rather I wanted to try to shed some light on situations that for a long time I found confusing and frustrating, without being fully aware of what caused that perceived friction. So I certainly agree that one solution to these situations can be to "tutor the insufficiently-assertive". But still, such people will always exist in this world, and if you're, say, a community builder who frequently interacts with new people, then it can still be valuable to be aware of these traps.
Thanks a lot for the write-up, very interesting and a good resource to get back to for future workshops.
I would expect that they fare much better with a text representation. I'm not too familiar with how multimodality works exactly, but kind of assume that "vision" works very differently from our intuitive understanding of it. When we are asked such a question, we look at the image and start scanning it with the problem in mind. Whereas transformers seem like they just have some rather vague "conceptual summary" of the image available, with many details, but maybe not all for any possible question, and then have to work with that very limited representation. Maybe somebody more knowledgeable can comment on how accurate that is. And whether we can expect scaling to eventually just basically solve this problem, or some different mitigation will be needed.
Maybe I accidentally overpromised here :D this code is just an expression, namely 1.0000000001 ** 175000000000, which, as wolframalpha agrees, yields 3.98e7.
One crucial question in understanding and predicting the learning process, and ultimately the behavior, of modern neural networks, is that of the shape of their loss landscapes. What does this extremely high dimensional landscape look like? Does training generally tend to find minima? Do minima even exist? Is it predictable what type of minima (or regions of lower loss) are found during training? What role does initial randomization play? Are there specific types of basins in the landscape that are qualitatively different from others, that we might care about for safety reasons?
First, let’s just briefly think about very high dimensional spaces. One somewhat obvious observation is that they are absolutely vast. With each added dimension, the volume of the available space increases exponentially. Intuitively we tend to think of 3-dimensional spaces, and often apply this visual/spatial intuition to our understanding of loss landscapes. But this can be extremely misleading. Parameter spaces are vast to a degree that our brains can hardly fathom. Take GPT3 for instance. It has 175 billion parameters, or dimensions. Let’s assume somewhat arbitrarily that all parameters end up in a range of [-0.5, 0.5], i.e. live in a 175-billion-dimensional unit cube around the origin of that space (since this is not actually the case, the real parameter space is even much, much larger, but bear with me). Even though every single axis only varies by 1 – let’s interpret this as “1 meter” for the sake of it – the diagonal from one corner of this high-dimensional cube to the opposite one already has a length of ~420 km. So if, hypothetically, you were sitting in the middle of this high dimensional unit cube, you could easily touch every single wall with your hand. But nonetheless, all the corners would be more than 200 km away from you.
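Here is a quick sanity check of those distances (a sketch under the stated "1 meter per axis" assumption):

```python
import math

# Back-of-the-envelope check, assuming each of the 175 billion parameters
# is confined to an interval of length 1 ("1 meter").
n = 175_000_000_000
diagonal_km = math.sqrt(n) / 1000       # corner-to-corner diagonal of the unit n-cube
center_to_corner_km = diagonal_km / 2   # distance from the center to any corner

print(f"diagonal: ~{diagonal_km:.0f} km")                   # ~418 km
print(f"center to corner: ~{center_to_corner_km:.0f} km")   # ~209 km
```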
This may be mind-boggling, but is it relevant? I think it is. Take this realization for instance: if you have two minima in this high dimensional space, but one is just a tiny bit “flatter” than the other (meaning the second derivatives overall are a bit closer to 0), then the attractor basin of this flatter minimum is vastly larger than that of the other minimum. This is because the flatness implies a larger basin radius, and the volume scales with that radius raised to the power of the dimension. So, at 175 billion dimensions, even a microscopically larger radius means an overwhelmingly larger volume. If, for instance, one minimum’s attractor basin has a radius that is just 0.00000001% larger than that of the other minimum, then its volume will be roughly 40 million times larger (if my Javascript code to calculate this is accurate enough, that is). And this is only for GPT3, which is almost 4 years old by now.
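A minimal sketch of that calculation (in Python rather than the Javascript mentioned above), assuming the basin volume scales as the radius raised to the power of the dimension:

```python
import math

# An n-dimensional volume with a radius larger by a relative factor (1 + eps)
# is larger by a factor of (1 + eps)**n.
n = 175_000_000_000
eps = 1e-10  # a 0.00000001% larger radius

print((1 + eps) ** n)                  # ~3.98e7, i.e. roughly 40 million times the volume
print(math.exp(n * math.log1p(eps)))   # same quantity, computed a bit more stably
```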
The parameter space is just ridiculously large, so it becomes really crucial how the search process through it works and where it lands. It may be that somewhere in this vast space, there are indeed attractor basins that correspond to minima we would find extremely undesirable – certain capable optimizers perhaps, that have situational awareness and deceptive tendencies. If they do exist, what could we possibly tell about them? Maybe these minima have huge attractor basins that are reliably found eventually (maybe once we switch to a different network architecture, or find some adjustment to gradient descent, or reach a certain model size, or whatever), which would of course be bad news. Or maybe these attractor basins are so vanishingly small that we basically don’t have to care about them at all, because all the compute & search capacity of humanity over the next million years would have an almost 0 chance of ever stumbling onto these regions. Maybe they are even so small that they are numerically unstable, and even if your search process through some incredible cosmic coincidence happens to start right in such a basin, the first SGD step would immediately jump out of it due to the limitations of numerical accuracy on the hardware we’re using.
So, what can we actually tell at this point about the nature of high dimensional loss landscapes? While reading up on this topic, one thing that constantly came up is the fact that the more dimensions you have, the lower the number of minima becomes relative to saddle points. Meaning that whenever the training process appears to slow down and it looks like it found some local minimum, it’s overwhelmingly likely that what it actually found is a saddle point, hence the training process never halts but keeps moving through parameter space, even if the loss doesn't change that much. Do local minima exist at all? I guess it depends on the function the neural network is learning to approximate. Maybe some loss landscapes exist where the loss can just get asymptotically closer to some minimum (such as 0), without ever reaching it. And probably other loss landscapes exist where you actually have a global minimum, as well as several local ones.
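To get a feel for why saddle points dominate, here is a toy sketch of my own (not taken from any of the sources): model the Hessian at a random critical point as a random symmetric Gaussian matrix, which is a strong simplifying assumption, and check how often all of its eigenvalues are positive, i.e. how often the critical point would actually be a minimum.

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 50_000

for n in [2, 3, 4, 5, 6]:
    minima = 0
    for _ in range(trials):
        a = rng.normal(size=(n, n))
        hessian = (a + a.T) / 2                      # random symmetric "Hessian"
        if np.linalg.eigvalsh(hessian).min() > 0:    # all eigenvalues positive?
            minima += 1
    print(f"n={n}: fraction of critical points that are minima ~ {minima / trials:.5f}")

# The fraction drops off a cliff as the dimension grows; at realistic network
# sizes it is effectively zero, so essentially every critical point is a saddle.
```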
Some people argue that you probably have no minima at all, because with each added dimension it becomes less and less likely that a given point is a minimum (for a critical point to be a minimum, not only does the gradient have to be 0, but the Hessian's eigenvalues all need to be positive as well). This sounds compelling, but given that the space itself also grows exponentially with each dimension, we also have overwhelmingly more points to choose from. If you e.g. look at n-dimensional Perlin noise, the absolute number of its local minima within an n-dimensional cube of constant side length actually increases with each added dimension. However, the relative number of local minima compared to the available space still decreases, so it becomes harder and harder to find them.
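And a crude numerical illustration of that last point, using iid uniform noise on a grid as a stand-in for Perlin noise (an assumption on my part; real Perlin noise is smoother, but the counting argument is the same):

```python
import numpy as np

# Count local minima of iid uniform noise on an n-dimensional grid. A point counts
# as a local minimum if it is smaller than all of its 2n axis neighbors
# (with wrap-around boundaries for simplicity).
rng = np.random.default_rng(0)
side = 5  # grid points per dimension, kept small so side**n stays manageable

for n in range(1, 8):
    values = rng.random((side,) * n)
    is_min = np.ones_like(values, dtype=bool)
    for axis in range(n):
        is_min &= values < np.roll(values, 1, axis=axis)
        is_min &= values < np.roll(values, -1, axis=axis)
    count = int(is_min.sum())
    # The absolute count grows with n (there are side**n grid points), while the
    # fraction shrinks roughly like 1/(2n+1).
    print(f"n={n}: {count} local minima out of {values.size} points "
          f"({count / values.size:.3f})")
```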
I’ll keep it at that. This is already not much of a "quick" take. Basically, more research is needed, as my literature review on this subject yielded way more questions than answers, and many of the claims people made in their blog posts, articles and sometimes even papers seemed to be more intuitive / common-sensical or generalized from maybe-not-that-easy-to-validly-generalize-from research.
One thing I’m sure about however is that almost any explanation of how (stochastic) gradient descent works that uses 3D landscapes for intuitive visualizations is misleading in many ways. Maybe it is the best we have, but imho all such explainers should come with huge asterisks, explaining that the rules in very high dimensional spaces may look very different from our naive “oh look at that nice valley over there, let’s walk down to its minimum!” understanding, which happens to work well in three dimensions.
So how did it work out for you?
That seems like a rather uncharitable take. Even if you're mad at the company, would you (at least (~falsely) assuming this all may indeed be standard practice and not as scandalous as it turned out to be) really be willing to pay millions of dollars for the right to e.g. say more critical things on Twitter, that in most cases extremely few people will even care about? I'm not sure if greed is the best framing here.
(Of course the situation is a bit different for AI safety researchers in particular, but even then, there's not that much actual AI (safety) related intel that even Daniel was able to share that the world really needs to know about; most of the criticism OpenAI is dealing with now is on this meta NDA/equity level)
I would assume ChatGPT gets much better at answering such questions if you add to the initial prompt (or system prompt) to eg think carefully before answering. Which makes me wonder whether "ChatGPT is (not) intelligent" even is a meaningful statement at all, given how vastly different personalities (and intelligences) it can emulate, based on context/prompting alone. Probably a somewhat more meaningful question would be what the "maximum intelligence" is that ChatGPT can emulate, which can be very different from its standard form.
Just to note, your last paragraph reminds me of Stuart Russell's approach to AI alignment in Human Compatible. And I agree this sounds like a reasonable starting point.
Thanks for the post, I find this unique style really refreshing.
I would add to it that there's even an "alignment problem" on the individual level. A single human in different circumstances and at different times can have quite different, sometimes incompatible values, preferences and priorities. And even at any given moment their values may be internally inconsistent and contradictory. So this problem exists on many different levels. We haven't "solved ethics", humanity disagrees about everything, even individual humans disagree with themselves, and now we're suddenly racing towards a point where we need to give AI a definite idea of what is good & acceptable.
Aren't LLMs already capable of two very different kinds of search? Firstly, their whole deal is predicting the next token - which is a kind of search. They're evaluating all the tokens at every step, and in the end choose the most probable-seeming one. Secondly, across-token search when prompted accordingly. A prompt like "Please come up with 10 options for X, then rate them all according to Y, and select the best option" is something that current LLMs can handle very reliably - whether or not "within-token search" exists as well. But then again, one might of course argue that search happening within a single forward pass, and maybe even a type of search that "emerged" via SGD rather than being hard-baked into the architecture, would be particularly interesting/important/dangerous. We just shouldn't make the mistake of assuming that this would be the only type of search that's relevant.
I think across-token search via prompting already has the potential to lead to the AGI like problems that we associate with mesa optimizers. Evidently the technology is not quite there yet because PoCs like AutoGPT basically don't quite work, so far. But conditional on AGI being developed in the next few years, it would seem very likely to me that this kind of search would be the one that enables it, rather than some hidden "O(1)" search deeply within the network itself.
Edit: I should of course add a "thanks for the post" and mention that I enjoyed reading it, and it made some very useful points!
Great post! Two thoughts that came to mind while reading it:
- the post mostly discussed search happening directly within the network, e.g. within a single forward pass; but what can also happen, e.g. in the case of LLMs, is that search happens across token-generation rather than within it. E.g. you could give ChatGPT a chess position and then ask it to list all the valid moves, then check which move would lead to which state, and whether that state looks better than the last one. This would be search of depth 1 of course, but still a form of search. In practice it may be difficult because ChatGPT likes to give messages only of a certain length, so it probably stops prematurely if the search space gets too big, but still, search most definitely takes place in this case.
- somewhat of a project proposal, ignoring my previous point and getting back to "search within a single forward pass of the network": let's assume we can "intelligent design" our way to a neural network that actually does implement some kind of small search to solve a problem. So we know the NN is on some pretty optimal solution for the problem it solves. What does (S)GD look like at or very near to this point? Would it stay close to this optimum, or maybe instantly diverge away, e.g. because the optimum's attractor basin is so unimaginably tiny in weight space that it's just numerically highly unstable? If the latter (and if this finding indeed generalizes meaningfully), then one could assume that even though search "exists" in parameter space, it's impractical to ever be reached via SGD due to the unfriendly shape of the search space.
Thanks a lot! Appreciated, I've adjusted the post accordingly.
It just came to my mind that these are things I tend to think of under the heading "considerateness" rather than kindness.
Guess I'd agree. Maybe I was anchored a bit here by the existing term of computational kindness. :)
Fair point. Maybe if I knew you personally I would take you to be the kind of person that doesn't need such careful communication, and hence I would not act in that way. But even besides that, one could make the point that your wondering about my communication style is still a better outcome than somebody else being put into an uncomfortable situation against their will.
I should also note I generally have less confidence in my proposed mitigation strategies than in the phenomena themselves.
Thanks for the example! It reminds me of how I once was a very active Duolingo user, but then they published some update that changed the color scheme. Suddenly the Duolingo interface was brighter and lower contrast, which just gave me a headache. At that point I basically instantly stopped using the app, as I found no setting to change it back to higher contrast. It's not quite the same of course, but probably also something that would be surprising to some product designers -- "if people want to learn a language, surely something as banal as brightening up the font color a bit would not make them stop using our app".
Another operationalization for the mental model behind this post: let's assume we have two people, Zero-Zoe and Nonzero-Nadia. They are employed by two big sports clubs and are responsible for the living and training conditions of the athletes. Zero-Zoe strictly follows study results that had significant results (and no failed replications) in her decisions. Nonzero-Nadia lets herself be informed by studies in a similar manner, but also takes priors into account for decisions that have little scientific backing, following a "causality is everywhere and effects are (almost) never truly 0" world view, and goes for many speculative but cheap interventions that are (if indeed non-zero) more likely to be beneficial than detrimental.
One view is that Nonzero-Nadia is wasting her time and focuses on too many inconsequential considerations, so will overall do a worse job than Zero-Zoe as she's distracted from where the real benefits can be found.
Another view, and the one I find more likely, is that Nonzero-Nadia can overall achieve better results (in expectation), because she too will follow the most important scientific findings, but on top of that will apply all kinds of small positive effects that Zero-Zoe is missing out on.
(A third view would of course be "it doesn't make any difference at all and they will achieve completely identical results in expectation", but come on, even an "a non-negligible subset of effect sizes is indeed 0"-person would not make that prediction, right?)
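To make the second view concrete, here is a toy Monte Carlo sketch; all the numbers in it (200 interventions, a 60% chance each one helps, unit-sized effects) are invented purely for illustration:

```python
import numpy as np

# Each cheap speculative intervention adds or subtracts one "unit" of benefit,
# with a 60% chance of being beneficial (made-up numbers).
rng = np.random.default_rng(1)
n_interventions, n_worlds = 200, 10_000

effects = rng.choice([1.0, -1.0], p=[0.6, 0.4], size=(n_worlds, n_interventions))
nadia_bonus = effects.sum(axis=1)  # what Nonzero-Nadia gets on top of the shared science

print("mean extra benefit:", nadia_bonus.mean())                            # ~ +40 units
print("fraction of worlds where it backfires:", (nadia_bonus < 0).mean())   # well under 1%
```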
You're right of course - in the quoted part I link to the wikipedia article for "almost surely" (as the analogous opposite case of "almost 0"), so yes indeed it can happen that the effect is actually 0, but this is so extremely rare on a continuum of numbers that it doesn't make much sense to highlight that particular hypothesis.
For many such questions it's indeed impossible to say. But I think there are also many, particularly the types of questions we often tend to ask as humans, where you have reasons to assume that the causal connections collectively point in one direction, even if you can't measure it.
Let's take the question of whether improving air quality at someone's home improves their recovery time after exercise. I'd say that this is very likely. But I'd also be a bit surprised if studies were able to show such an effect, because it's probably small, and it's probably hard to get precise measurements. But improving air quality is just an intervention that is generally "good", and will have small but positive effects on all kinds of properties in our lives, and negative effects on far fewer properties. And if we accept that the effect on exercise recovery will not be zero, then I'd say there's a chance of something like 90% that this effect will be beneficial rather than detrimental.
Similarly, with many interventions that are supposed to affect behavior of humans, one relevant question that is often answerable is whether the intervention increases or reduces friction. And if we expect no other causal effect that may dominate that one, then often the effect on friction may predict the overall outcome of that intervention.
A basic operationalization of "causality is everywhere" is "if we ran an RCT on some effect with sufficiently many subjects, we'd always reach statistical significance" - which is an empirical claim that I think is true in "almost" all cases. Even for "if I clap today, will it change the temperature in Tokyo tomorrow?". I think I get what you mean by "if causality is everywhere, it is nowhere" (similar to "a theory that can explain everything has no predictive power"), but my "causality is everywhere" claim is an at least in theory verifiable/falsifiable factual claim about the world.
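As a rough illustration of the "sufficiently many subjects" part, here is a back-of-the-envelope sketch using the standard two-arm sample-size formula (the effect sizes are arbitrary examples, not estimates for any particular intervention):

```python
# Standard two-arm sample-size formula: n per arm ~ 2 * (z_alpha + z_beta)^2 / d^2
# for a standardized effect size d, 5% two-sided significance and 80% power.
z_alpha, z_beta = 1.96, 0.84

def n_per_arm(d):
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2

for d in [0.5, 0.1, 0.01, 0.001]:
    print(f"effect size d={d}: ~{n_per_arm(d):,.0f} subjects per arm")

# Any nonzero effect eventually becomes statistically significant; the required
# sample size just blows up like 1/d^2 (e.g. ~16 million per arm at d = 0.001).
```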
Of course "two things are causally connected" is not at all the same as "the causal connection is relevant and we should measure it / utilize it / whatever". My basic point is that assuming that something has no causal connection is almost always wrong. Maybe this happens to yield appropriate results, because the effect is indeed so small that you can simply act as if there was no causal connection. But I also believe that the "I believe X and Y have no causal connection at all" world view leads to many errors in judgment, and makes us overlook many relevant effects as well.
Indeed, I fully agree with this. Yet when deciding that something is so small that it's not relevant, it's (in my view anyway) important to be mindful of that, and to be transparent about your "relevance threshold", as other people may disagree about it.
Personally I think it's perfectly fine for people to consciously say "the effect size of this is likely so close to 0 we can ignore it" rather than "there is no effect", because the former may well be completely true, while the latter hints at a level of ignorance that leaves the door for conceptual mistakes wide open.
Just to note I wrote a separate post focusing on pretty much that last point:
Interesting, hadn't heard of this! Haven't fully grasped the "No evidence for nudging after adjusting for publication bias" study yet, but at first glance it looks to me as if it is evidence for small effect sizes rather than for no effect at all? Generally, when people say "nudging doesn't work", this can mean a lot of things, from "there's no effect at all" to "there often is an effect, but it's not very large, and it's not worth it to focus on this in policy debates", to "it has a significant effect, but it will never solve a problem fully because it only affects the behavior of a minority of subjects".
There's also this article making some similar points, overall defending the effectiveness of nudging while also pushing for more nuance in the debate. They cite one very large study in particular that showed significant effects while avoiding publication bias (emphasis mine):
The study was unique because these organizations had provided access to the full universe of their trials—not just ones selected for publication. Across 165 trials testing 349 interventions, reaching more than 24 million people, the analysis shows a clear, positive effect from the interventions. On average, the projects produced an average improvement of 8.1 percent on a range of policy outcomes. The authors call this “sizable and highly statistically significant,” and point out that the studies had better statistical power than comparable academic studies. So real-world interventions do have an effect, independent of publication bias.
(...)
We can start to see the bigger problem here. We have a simplistic and binary “works” versus “does not work” debate. But this is based on lumping together a massive range of different things under the “nudge” label, and then attaching a single effect size to that label.
Personally I have a very strong prior that nudging must have an effect > 0 - it would just be extremely surprising to me if the effect of an intervention that clearly points in one direction would be exactly 0. This may however still be compatible with the effects in many cases being too small to be worth to put the spotlight on, and I suspect it just strongly depends on the individual case and intervention.
Unless I misunderstand your comment, isn't it rather the opposite of odd that user stories are so popular, given that this is what the bias would predict? That being said, maybe I've argued a bit too strongly in one direction with this post - I wouldn't even say that user stories are detrimental or useless. Depending on your product, it may well be that a significant share of users do have strong intent. My main claim is that in most situations, the number of people who are closer to the middle of the spectrum is >0. But it's not necessary for that group to dominate the distribution.
So in my view, it can still make sense to focus on a subgroup of your users who know what they're doing, as long as you remain aware that this will not apply to all users. E.g. when A/B testing, you should expect by default that making any feature even mildly less convenient to use will have negative effects. So you should not be surprised to see that result - but it may still be the right choice to make such a change nonetheless, depending on what benefits you hope to get from it.
During winter, opening windows will raise your heating bills like mad.
Opening several windows/doors wide for a few minutes every couple of hours, rather than keeping one of them open for longer, is supposed to mostly prevent this: it exchanges the air in the room without significantly cooling down the floor/walls/furniture. But of course you're still right that it's a trade-off, and for some people it's much easier to achieve consistently good CO2 levels than for others. For many it may be worth at least getting a CO2 monitor to be able to make better informed decisions.
One could certainly argue that improving an existing system while keeping its goals the same may be an easier (or at least different) problem to solve than creating a system from scratch and instilling some particular set of values into it (where part of the problem is to even find a way to formalize the values, or know what the values are to begin with - both of which would be fully solved for an already existing system that tries to improve itself).
I would be very surprised if an AGI would find no way at all to improve its capabilities without affecting its future goals.
Side point: this whole idea is arguably somewhat opposed to what Cal Newport in Deep Work describes as the "any benefit mindset", i.e. people's tendency to use tools when they can see any benefit in them (Facebook being one example, as it certainly does come with the benefit of keeping you in touch with people you would otherwise have no connection to), while ignoring the hidden costs of these tools (such as the time/attention they require). I think both ideas are worth keeping in mind when evaluating the usefulness of a tool. Ask yourself both whether the usefulness of the tool can be deliberately increased, and whether the tool's benefits are ultimately worth its costs.
I think it does relate to examples 2 and 3, although I would still differentiate between perfectionism in the sense that you actually keep working on something for a long time to reach perfection on the one hand, and doing nothing because a hypothetical alternative deters you from some immediate action on the other hand. The latter is more what I was going for here.
Good point, agreed. If "pay for a gym membership" turns out to be "do nothing and pay $50 a month for it", then it's certainly worse than "do nothing at home".
I would think that code generation has a much greater appeal to people / is more likely to go viral than code review tools. The latter surely is useful and I'm certain it will be added relatively soon to github/gitlab/bitbucket etc., but if OpenAI wanted to start out building more hype about their product in the world, then generating code makes more sense (similar to how art generating AIs are everywhere now, but very few people would care about art critique AIs).
Can you elaborate? Were there any new findings about the validity of the contents of Predictably Irrational?
This is definitely an interesting topic, and I too would like to see a continued discussion as well as more research in the area. I also think that Jeff Nobbs' articles are not a great source, as he seems to twist the facts quite a bit in order to support his theory. This is particularly the case for part 2 of his series - looking into practically any of the linked studies, I found issues with how he summarized them. Some examples:
- he claims one study showed a 7x increase in cases of cardiovascular deaths and heart attacks, failing to mention that a) the test group was ~50% larger than the control group (so it was actually a ~5x rather than 7x increase), b) that the study itself claims these numbers are not statistically significant due to the low absolute number, and c) that you could get the opposite result from the study when looking at all-cause mortality, which happened to be ~4x as large for the control group as for the test group (which too is not statistically significant of course, but still)
- he cites a study on rats, claiming that it shows that replacing some fat in their diet with "fats that you usually find in vegetable oil" (quite a suspicious wording) increased cancer metastasis risk 4-fold - but looking into the study, a) these rats had a significantly increased caloric intake compared to the control group, and b) 90% of the fat they consumed came from lard rather than vegetable oils, making this study entirely useless for the whole debate
- for another study he points out the negative effects of safflower oil, but conveniently fails to mention that the same study found an almost as large negative effect for olive oil (which seems to be one of his favorites)
(note I wrote this up from memory, so possible I've mixed something up in the examples above - might be worth writing a post about it with properly linked sources)
I still think he's probably right about many things, and it's most certainly correct that oils high in Omega6 in particular aren't healthy (which might indeed include Canola oil, which I was not aware of before reading his articles). Still he seems to be very much on an agenda to an extent that it prevents him from summarizing studies accurately, which is not great. Doesn't mean he's wrong, but also means I won't trust anything he says without checking the sources.
I could well imagine that there are strong selection effects at play (more health-concerned people being more likely to give veganism a shot), and the positive effects of the diet just outweighing the possible slight increase in plant oil usage. And I wouldn't even be so sure that vegans on average consume more plant oil than non-vegans - e.g. vegans probably generally consume much less processed food, which is a major source of vegetable oil.
In The Rationalists' Guide to the Galaxy the author discusses the case of a chess game, and particularly when a strong chess player faces a much weaker one. In that case it's very easy to make the prediction that the strong player will win with near certainty, even if you have no way to predict the intermediate steps. So there certainly are domains where (some) predictions are easy despite the world's complexity.
My personal rather uninformed take on the AI discussion is that many of the arguments are indeed comparable in a way to the chess example, so the predictions seem convincing despite the complexity involved. But even then they are based on certain assumptions about how AGI will work (e.g. that it will be some kind of optimization process with a value function), and I find these assumptions pretty intransparent. When hearing confident claims about AGI killing humanity, then even if the arguments make sense, "model uncertainty" comes to mind. But it's hard to argue about that since it is unclear (to me) what the "model" actually is and how things could turn out different.