Comments
I believe that technical work which enhances safety culture is generally very positive.
All of the examples that you mentioned share one critical non-technical aspect though: their results are publicly available (I guess they were funded by the general public, e.g. in the case of "BadLlama" - by donations and grants to a foundation Palisade Research and IIIT, an Indian national institute). If you took the very same "technical" research and made it available only to a potentially shady private company, then that technical information could indeed help them circumvent Llama's safeguards. At that point, I'm not sure one could still confidently call it "overwhelmingly positive".
I agree that the works that you mentioned are very positive, but I think that the above non-technical aspect is necessary to take into consideration.
Thank you very much for the in-depth comment, it is indeed very helpful for me to hear. Let me try to address your points:
- "How does evolution go backward in time..." - I don't think it does. Let me try to explain the mechanism:
(1) groups of people, whether families, tribes, villages or larger units, have existed for pretty much as long as the human species (and even among pre-humans). We called it a "state", as that is the current incarnation, but generally speaking a system of cooperating groups of people has perhaps always been stronger than individual people, hence I think it has had as much impact on human evolution as individual fitness;
(2) individual people's traits are evolutionarily encouraged partially directly (how well people are progressing in their own society) and partially indirectly (via their society - how well their society is doing compared with other societies), and the latter part plays a role in selecting those individual traits which are helpful for the state;
(3) states are currently the strongest actors, which means that nowadays they dictate the rules of further evolution in many aspects;
(4) happiness (which is explained earlier in the article) is strongly connected with the goal function of an individual, which means that it drives which behaviours are displayed by an individual (those behaviours which bring happiness) and, conversely, which behaviours are not displayed even though they could give that individual an advantage over their peers within society (e.g. cheating, lying - assuming that they are undetected).
To summarise, we think that social traits have always been a crucial fitness factor of groups and, via them, indirectly also a crucial fitness factor of individuals, while happiness (as the goal function) is necessary to give an incentive for those social traits.
The above reasoning is a general expectation of how evolution behaves, which can be mitigated by many factors, e.g. emigration, to name just one.
- "The argument jumps between happiness and kindness..." - kindness is an example of a trait which is expected to allow a society to function more efficiently than one consisting of unkind individuals. This is only an expectation (or our own observation) and is not meant to be a definitive claim, but rather a hypothesis or prediction. I think the reasoning from the point above does include the answer to what sort of mechanism would cause kindness to become a crucial fitness factor of an individual (assuming that kind citizens really make society work better).
- "Where is the evidence that people with less kindness necessarily form less effective societies that fail to survive?" - I completely agree, there is none in this article. It is a hypothesis, which so far rests only on intuitive observation, but it would be a very interesting area to dive into. To be fair, kindness was only used here as an example to reason about what might happen to traits which help build effective societies.
- "Large states extinguishing both kindness and happiness is just stated as a fact. Where is the evidence?" - please note that we didn't say "large states", but "totalitarian states", and there is a fundamental difference. From large states we might only expect large power, but the story with totalitarian states is more complicated. An individual has a certain degree of freedom, within which they can be either helpful or unhelpful to the fitness of their society. That means that a society which has only useful citizens is stronger than one where all of its citizens try to cheat it. However, with time, a society can evolve in many different directions - one particularly troubling direction is the totalitarian state, which does everything it can to limit the freedom of an individual. Once that freedom is limited, the state is resistant to a citizen trying to be dishonest - they are immediately brought back to order. That in turn means that the old mechanisms which generated usefulness to the state are no longer needed. And as it happens, the old mechanism which generated usefulness was an individual's desire to be useful. That desire will no longer be needed in a totalitarian state. The more we look into it, the more similar desires become less and less relevant, as the concept of human choice has generally been removed. That would mean that the role of happiness in a totalitarian state would be diminished. Please note that I don't mean that a state wouldn't care about its citizens' happiness; I mean that the whole concept of happiness (understood as a goal function of an individual) would be less relevant.
As for the evidence - you are right that there is none in this article. This is because we are proposing a hypothesis - something that stems from studying systems theory and the mechanisms of the evolution of societies, and from forming expectations - rather than reporting an empirical study, e.g. surveys in totalitarian vs non-totalitarian states. Such data, if we could get our hands on it, would surely be very interesting, but we expect it to be hard to gather.
"... and so on" - if you could please provide more arguments, I'd be genuinely very grateful, as the ones you listed so far were very useful for me.
Thank you.
Thank you! I've updated the article.
We expected that there may be some minor things that people will not like about the article, but the current negative karma suggests that we have misjudged it. Since this is our first article, it would be especially helpful to hear your comments.
Thanks! The key to topic selection is where we find that we most disagree with popular opinion. For example, the number of times I can cope with hearing someone say "I don't care about privacy, I have nothing to hide" is limited. We're trying to have this article out before that limit is reached. But in order to reason about privacy's utility and to ground it in root axioms, we first have to dive into why we need freedom. That, in turn, requires thinking about the mechanisms of a happy society. And that depends on our understanding of happiness, hence that's where we're starting.
Hi LessWrong Community!
I'm new here, though I've been an LW reader for a while. I'm representing the complicated.world website, where we strive to use a rationality approach similar to the one here, and we also explore philosophical problems. The difference is that, instead of being a community-driven portal like you, we are a small team which works internally to achieve consensus and only then publishes its articles. This means that we are not nearly as pluralistic, diverse or democratic as you are, but on the other hand we try to present a single coherent view on all discussed problems, each rooted in basic axioms. I really value the LW community (our entire team does) and would like to start contributing here. I would also like to present a linkpost from our website from time to time - I hope this is ok. We are also a not-for-profit website.