I partly support the spirit behind this feature, of providing more information (especially to the commenter), making the readers more engaged and involved, and expressing a reaction with more nuance than with a mere upvote/downvote. I also like that, as with karma, there are options for negative (but constructive) feedback, which I mentioned here when reviewing a different social discussions platform that had only positive reactions such as "Aha!" and "clarifying".
On the other hand, I suspect (but could be wrong) that this extra information could also have the opposite effect of "anchoring" the readers of the comments and biasing them towards the reactions left by others. If they saw that a comment had received a "verified" or "wrong" reaction, for example, they could anchor on that before reading the comment itself. Maybe this effect would be less pronounced than in other communities, but I don't think LessWrong would be unaffected by it.
(Comment on UI: when there are nested comments, it can be confusing to tell whether the reaction corresponds to the parent or the child comment:
Certainly; it wasn't my intention to make it seem like an 'either-or'. I believe there's a lot of room for imported quality teaching, and a fairly well-educated volunteer might be better at teaching than the average local teacher. I didn't find the way they taught there very effective: a lot of repeating the teacher's words, no intuition built for maths or physics… I think volunteers could certainly help with that, as well as by teaching the subjects they are more proficient at than the local teachers (e.g. English). I agree there is the potential to use volunteers in a variety of ways to raise the level of education, and also to try to make the changes permanent once the volunteers leave.
Strong upvote. I found that almost every sentence was extremely clear and conveyed a transparent mental image of the argument being made. Many times I caught myself saying "YES!" or "This checks out" as I read a new point.
That might involve not working on a day you've decided to take off, even if something urgent comes up; or deciding that something is too far out of your comfort zone to try now, even if you know that pushing further would help you grow in the long term.
I will add that, for many routine activities or personal dilemmas with short- and long-term intentions pulling you in opposite directions (e.g. exercising, eating a chocolate bar), the boundaries you set internally should be explicit and unambiguous, and ideally be defined before you are faced with the choice.
This is to avoid rationalising momentary preferences (I am lazy right now + it's a bit cloudy -> "the weather is bad, it might rain, I won't enjoy running as much as if it were sunny, so I won't go for a run") that run counter to your long-term goals, where the cost of defecting a single time seems negligible in the long run. In these cases it can be helpful to imagine your current self in a bargaining game with your future selves, in a sort of prisoner's dilemma. If your current self defects, your future selves will be more prone to defecting as well; if you coordinate and resist temptation now, future resistance will be more likely. In other words, establish a Schelling fence.
At the same time, this Schelling fence shouldn't be too restrictive or merciless towards every possible circumstance, because that would leave you demotivated and even less inclined to stick to it. One should probably experiment with what works for oneself in order to find a compromise: a bucket broad and general enough for 70-90% of scenarios to fall into, while remaining merciful towards some needed exceptions.
Thank you very much for this sequence. I knew fear was a great influence (or impediment) over my actions, but I hadn't given it such a concrete form, and especially a weapon (= excitement) to combat it, until now.
Following matto's comment, I went through the Tuning Your Cognitive Strategies exercise, spotting microthoughts and extracting the cognitive strategies and deltas between such microthoughts. When evaluating a possible action, the (emotional as much as cognitive) delta "consider action X -> tiny feeling in my chest or throat -> meh, I'm not sure about X" seemed quite recurrent. Thanks to your pointers on fear and to introspecting about it, I have added "-> are you feeling fear? -> yes, I have this feeling in my chest -> is this fear helpful? -> Y, so no -> can you replace fear with excitement?" (a delta about noticing deltas) as a cognitive strategy.
Why I (beware of other-optimizing) can throw away fear in most situations is that I have developed the mental techniques, awareness and strength to counter the negatives which fear wants to point at.
Like many, I developed fear as a kid, in response to being criticised or rejected, at a time when I didn't have the mental tools to deal with those situations. For example, I took things too personally, thought others' reactions were about me and my identity, and failed to put myself in others' shoes and understand that when other kids criticise, it is often unfounded and just to have a laugh. To protect my identity I developed aversion, a bias towards inaction, and fear of failure and of being criticised. This propagated into demotivation, self-doubt, and underconfidence.
Now I can evaluate whether fear is an emotion worth having. Fear points at something real and valuable: the desire to do things well and be liked. But as I said, for me personally fear is something I can do away with in most situations because I have the tools to respond better to negative feedback. If I write an article and it gets downvoted, I won't take it as a personal issue that hurts my intrinsic worth; I will use the feedback to improve and update my strategies. In several cases, excitement can be much more useful (and motivating, leading to action) than fear: excitement of commenting or writing on LessWrong over fear of saying the wrong thing; excitement of talking or being with a girl rather than fear of rejection.
Thank you very much for this post, I find it extremely valuable.
I also find it especially helpful for this community, because it touches on what I believe are two main sources of anxiety over existential dread that might be common among LWers:
Doom itself (end of life and Earth, our ill-fated plans, not being able to see our children or grandchildren grow, etc.), and
uncertainty over one's (in)ability to prevent the catastrophe (can I do better? Even if it's unlikely I will be the hero or make a difference, isn't it worth wagering everything on this tiny possibility? Isn't the possibility of losing status, friends, resources, time, etc. better than the alternative of not having tried our best and humanity coming to an end?)
It depends on the stage of one's career, the job/degree/path one is pursuing, and other factors, but I expect that many readers here are unusually prone to the second concern compared to outsiders, perhaps due to their familiarity with coordination failures and defections, or their intuition that there's always a level above and room for doing better/optimisation… I am not sure how this angst over uncertainty, even if it's just a lingering thought in the back of one's mind, can really be cleared, but Fabricated Options in particular conceptualises a response to it and says "we'll always be uncertain, but don't stress too much, it's okay".
As others have pointed out, there's a difference between a) problems to be tackled for the sake of the solution, and b) problems to be tackled for the sake (or fun) of the problem. Humans like challenges and puzzles, and like to solve things themselves rather than having the answers handed down to them. Global efforts to fight cancer can be inspiring, and I would guess a motivation for most medical researchers is their own involvement in this same process. But if we could push a button to eliminate cancer forever, no sane person would refuse to press it.
I think we should aim to have all of a) solved ASAP (at least those problems above a certain threshold of importance), and to maintain b). At the same time, I suspect that the value we attach to b) also bears some relation to the importance of the solution to those problems. E.g. a theoretical problem can be much more immersive, and eventually rewarding, when the whole of civilisation is at stake than when it's a trivial puzzle.
So I wonder how to maintain b) once the important solutions can be provided much faster and more easily by another entity or superintelligence. Maybe with fully immersive simulations that reproduce e.g. the situation and experience of trying to find a cure for cancer, or with large-scale puzzles (such as escape rooms) which are not life-or-death (nor happiness-or-suffering).
The phases you mentioned in learning anything seem especially relevant for sports.
1. To have a particular kind of feelings (felt senses) that represent something (control, balance, singing right, playing the piano right, everything being done)
2. A range of intensity that we should keep that feeling sense in, in some given context (either trying to make sure we have some positive feeling, or that we avoid some negative feeling)
3. Various strategies for keeping it within that range
Below the surface, every sport is an extremely complex endeavour for the body, and mastering one is a marvellous achievement of the mind. You realise this particularly when starting out. I had my first golf class yesterday, and it's far from the laid-back activity I thought it was. Just knowing how to grip the club correctly is a whole new world: whether the hands overlap or interlock, where the thumb is pointing, getting the right pressure… This is before even starting with the backswing, impact, and follow-through.
In fact, though, knowing is not the right word. It's feeling. I have been playing tennis my whole life, and as I was shown the techniques for golf I constantly compared them with those of tennis, with which it shares many postures and motions. It is astonishing how complex and how sensitive each stroke or swing is, and yet how it gets done seamlessly and almost unthinkingly once one masters it. If one tried to get each tiny detail exactly right, it seems impossible we could even hit the ball. Timothy Gallwey, in The Inner Game of Tennis, presented this same process of focusing and of being aware of your body and sensations in order to enhance these felt senses and let your mind adjust the intensity to the right felt standards.
On a different note, a failure mode of mine as a youngster, and which I'm still trying to overcome, was related to the fear of being accused of something, but with completely different countermeasures than the example you gave; it's more like a contradictory failure mode.
My sister was often envious and critical of any dissonant action, so I became afraid of her disapproving of anything I did, at any moment. At the same time, if I made the same choices as her, she would accuse me of copying her. So this ended up making me try to settle into neutral territory and almost become a yes-boy.
For example, in a restaurant I would be afraid of ordering salmon, because my sister might order it, or because even if she didn't, it might seem like I was copying her predilection for healthy food. However, I would also be afraid of overcorrecting and of ordering something too unhealthy, or of asking for more food because I hadn't had enough. And so I would end up ordering a middle-ground option, like, say, steak.
Thank you for your explanations. My confusion was not so much from associating agency with consciousness and morality or other human attributes, but with whether it was judged from an inside, mechanistic point of view, or from an outside, predicting point of view of the system. From the outside, it can be useful to say that "water has the goal to flow downhill", or that "electrons have the goal to repel electrons and attract protons", inasmuch as "goal" is referred to as "tendency". From an inside view, as you said, it's nothing like the agency we know; they are fully deterministic laws or rules. Our own agency is in part an illusion, because we too act deterministically; the laws of physics, but more specifically, the patterns or laws of our own human behaviour. These seem much more complex and harder for us to understand than the laws of gravity or electromagnetism, but reasons do exist for every single one of our actions and decisions, of course.
A key property of agents is that the more agentic a being is, the more you can predict its actions from its goals since its actions will be whatever will maximize the chances of achieving its goals. Agency has sometimes been contrasted with sphexishness, the blind execution of cached algorithms without regard for effectiveness.
Although, at the same time, agency and sphexishness might not be truly opposed; one refers to an outside perspective, the other to an inside perspective. We are all sphexish in a sense, but we attribute this agency property to others and even to the "I" because we are ignorant of many of our own rules.
(I reply both to you and @Ericf here). I do struggle a bit to make up my mind on whether drawing a line of agency is really important. We could say that a calculator has the 'goal' of returning the right result to the user; we don't treat a calculator as an agent, but is that because of its very nature and the way in which it was programmed, or is it a matter of capabilities, it being incapable of making plans and considering a number of different paths to achieve its goals?
My guess is that there is something that makes up an agent and which has to do with the ability to strategise in order to complete a task; i.e. it has to explore different alternatives and choose the ones that would best satisfy its goals. Or at least a way to modify its strategy. Am I right here? And, to what extent is a sort of counterfactual thinking needed to be able to ascribe to it this agency property; or is following some pre-programmed algorithms to update its strategy enough? I am not sure about the answer, and about how much it matters.
There are some other questions I am unclear about:
Would having a pre-programmed algorithm/map on how to generate, prioritise and execute tasks (like for AutoGPT) limit its capacity for finding ways to achieve its goals? Would it make it impossible for it to find some solutions that a similarly powerful AI could have reached?
Is there a point at which it is unnecessary for this planning algorithm to be specified, since the AI would have acquired the capacity to plan and execute tasks on its own?
This seems to me more like a tool AI, much like a piece of software asked to carry out a task (e.g. an Excel sheet for doing calculations), but with the addition of processes or skills for creating plans and searching for solutions, which would endow it with agent-like behaviour. So, for the AutoGPT-style AI contemplated here, it appears to me that this agent-like behaviour would not emerge out of the AI's increased capabilities and achievement of the general intelligence to reason, devise accurate models of the world and of humans, and plan; nor out of a specified set of values. It would instead come from the planning capabilities that were specified.
I am not sure this AutoGPT-like AI counts as an agent in the sense of conferring the advantages of a true agent AI — i.e. having a clear distinction between beliefs and values. Although I would expect that it would still be able to produce the harmful consequences you mentioned (perhaps, as you said, starting with asking for permission from the user for access to his resources or private information, and doing dangerous things with those) as it was asked to carry out more complex and less-well-understood tasks, with increasingly complex processes and increasing capabilities. The level of capabilities and the way of specifying the planning algorithms, if any, seem very relevant.
The first few times I read LW articles, especially those by Eliezer, it was common for me to think that I simply wasn't smart enough to follow their lines of argumentation. It's precisely missing these buckets and handles, these modes of thought and the expressions/words used to communicate them, that made it hard at the start; as I acquired them, I could feel I belonged to the community. I suppose this happens to all newbies, and it's understandable to feel this helplessness and inability to contribute for as long as you haven't acquired the requisite material. High standards need high barriers to entry.
At the same time, however, I think the use of plenty of unknown-to-me terminology, all linking and referring to other posts, could as easily have put me off as made me curious to explore this new world. And it has probably repelled many potentially very capable contributors, who didn't lack the desire or talent, but simply experienced a discouraging introduction. (And kept them out of all the advantages of this new camp and perspective.)
Perhaps only a sufficient promotion/display of the benefits and value of this community can outweigh the starting costs for an outsider (as long as we're unwilling to reduce those costs; I think there are good reasons to keep them high). One way to increase the perceived interest could be, for example, to ensure that other sites that link to LW, and which might be a source of newcomers, sufficiently introduce and promote the benefits of LW and the rationalist community, rather than merely linking to this seemingly highly entangled and exclusive/elitist website. But I feel this would still be insufficient to properly adjust this filter of entry to LW; a more extensive list of measures would be better.
I like this model, much of which I would encapsulate in the tendency to extrapolate from past evidence, not only because it resonates with the image I have of the people who are reluctant to take existential risks seriously, but because it is more fertile for actionable advice than the simple explanation of "because they haven't sat down to think deeply about it". This latter explanation might hold some truth, but tackling it would be unlikely to make them take more actions towards reducing existential risks if they weren't aware of, and weren't able to fix, possible failure modes in their thinking, and weren't aware that AGI is fundamentally different and extrapolating from past evidence is unhelpful.
I advocate shattering the Overton window and spreading arguments on the fundamental distinctions between AGI and our natural notions of intelligence, and these 4 points offer good, reasonable directions for addressing that. But the difficulty also lies in getting those arguments across to people outside specific or high-end communities like LW; in building a bridge between the ideas created at LessWrong, and the people who need to learn about them but are unlikely to come across LessWrong.
The broad spirit they want to convey with the word "generalisation", which is that two systems can exhibit the same desired behaviour in training but end up with completely different goals in testing or deployment, seems fair as a statement of the general problem. But I agree that "to generalise" can give the impression that it's an "intentional act of extrapolation", to create a model that is consistent with a certain specification. And there are many more ways in which the AI can behave well in training and not in deployment, without needing to assume it's extrapolating a model.
And since two systems can tell jokes in training when the specification is to make people happy, with one ending up pumping people with opioids and the other having no consideration for happiness at all, any of these or other failure modes could happen even if we were sure their behaviours were consistent with the programmers' goal in training.
This is a really complicated issue because different priors and premises can lead you to extremely different conclusions.
For example, I see the following as a typical view on AI among the general public (the common person is unlikely to go this deep into his reasoning, but could come to these arguments if he had to debate the topic):
Premises: "Judging by how nature produced intelligence, and by the incremental progress we are seeing in LLMs, artificial intelligence is likely to be achieved by packing more connections into a digital system. This will allow the AI to generate associations between ideas and find creative solutions to problems more easily, think faster, have greater memory, or be more error-proof. This will at one point generate an intelligence superior to ours, but it will not be fundamentally different. It will still consist of an entangled network of connections, more powerful and effective than ever, but incapable of "jumping out of the system". These same connections will, in a sense, limit and prevent it from turning the universe into a factory of paperclips when asked to produce more paperclips. If bigger brains hadn't made childbirth dangerous or hadn't been more energy-consuming, nature could have produced a greater, more complex intelligence, without the risk of it destroying the Earth. Maybe this is not the only way to build an artificial superintelligence, but it seems a feasible way and the most likely path in light of the developments to date. Key issues will need to be settled — regarding AI consciousness, its training data, or the subsequent social changes it will bring —, but the AI will not be existentially threatening. In fact, greater existential risks would come from having to specify the functions and rules of the AI, as in GOFAI, where you would be more likely to stumble upon the control problem and the like. But in any case, GOFAI would take far too long to develop to be concerning right now."
Conclusion: "Stopping the development of AIs would make sense to solve the above problems, but not at the risk of creating big power conflicts or even of postponing the advent of the benefits of AI."
I do not endorse these views of AI (although I assign a non-negligible probability to superintelligence first coming through this gradual and connectivist, and existentially harmless, increase in capabilities), but if its main cruxes are not clarified and disputed, we might be unable to make people come to different conclusions. So while the Overton window does need to be widened to make the existential concerns of AI have any chance of influencing policies, it might require a greater effort that involves clarifying the core arguments and spreading ideas to e.g. overcome the mind projection fallacy or understand why artificial superintelligence is qualitatively different from human intelligence.
I have been using the Narwhal app for the past few days, a social discussion platform "designed to make online conversations better" that is still at its prototype stage. This is how it basically works: there are several topics of discussion posted by other users, formulated with an initial question (e.g. "How should we prioritise which endangered species to protect?" or "Should Silicon Valley be dismantled, reformed, or neither?") and a description, and you can comment on any or reply to others' comments. You can also suggest your own discussions.
Here are my initial impressions:
Like writing a post or like commenting on LessWrong, in a way it is a tool that helps you think. Expressing ideas helps to form them. Having a space for answering to a specific question motivates you to think about it, which you wouldn't have done in such depth otherwise. It inspires you to expand your web of original ideas.
So far there's a high level of respect in the conversations. This could be due to how the app is designed: discussion topics have to be approved by the Narwhal team, comments might be moderated, comments need to be 140+ characters long… Or it could be because it's just starting, with possibly no more than 10 users actively commenting, one or several of the most prominent ones being part of the Narwhal team, most of the rest having joined precisely because we were pursuing a platform for respectful discussions, and only around 2 discussions being posted each day. Or it could be a combination of both. It remains to be seen how this would change when more people, not all well-intentioned, joined in, with multiple discussions happening at the same time and a user being unable to keep track of them all. So far the algorithm hasn't really been put to the test.
Respect is promoted, but I haven't seen many disagreements or a collaborative practice of working to uncover cruxes and change one's beliefs. You can give 'Aha's to enlightening comments or rate them as 'provocative', 'clarifying' or 'new to me', but this is all positive feedback that doesn't encourage changing your mind. There are no negative options to signal when there's disagreement or a conflict of points of view that needs to be cleared up.
Relatedly, my main concern is that there aren't strong enough incentives for rational replies that allow for real progress on key questions. The number of 'Aha's a user has ever received only appears once you click on their profile, not when they leave a comment, so having accumulated many 'Aha's doesn't mean too much. Most importantly, you don't get penalised when you make a bad or untruthful comment. It can probably get deleted by the moderators if it is disrespectful, but if you fail to clarify a point, if you argue irrationally from your trench, or if you are overly biased, it might be irrelevant for your reputation. The incentive is for the comment to be good enough, not as good as it can be. Hence, I much prefer LessWrong's options to upvote or downvote comments and to state how much you agree or disagree with them, with positive and negative karma.
There are several other features missing, such as direct messages to continue the conversations in private, the ability to filter or search for topics, or to follow other people or topics. But I suppose the Narwhal team already has these in mind.
Besides, the quality of responses to AI, especially AI alignment and safety, seems a bit low, at least compared to the LW community.
All in all, it feels like a quiet, peaceful plaza for discussions, an arena for people looking for a respite from the lousy environments of conventional social media like Twitter, but I have questions about whether it really motivates progress or it's mostly signalling and trying to sound convincing, and how much can be extrapolated from this initial experience regarding how the app will scale up.
It's nice to hear about the high standards you continue to pursue. I agree that LessWrong should set itself much higher standards than other communities, even other rationality-centred or -adjacent communities.
My model of this big effort to raise the sanity waterline and prevent existential catastrophes contains three concentric spheres. The outer sphere is all of humanity; ever-changing yet more passive. Its public opinion is what influences most of the decisions of world leaders and companies, but this public opinion can be swayed by other, more directed forces.
The middle sphere contains communities focused on spreading important ideas and doing so by motivating a rationalist discourse (for example, ACX, Asterisk Magazine, or Vox's Future Perfect). It aims, in other words, for this capacity to sway public opinion, to make key ideas enter popular discussion.
And the inner sphere is LessWrong, which shares the same aims as the middle sphere, and in addition is the main source of generation of ideas and patterns of thought. Some of these ideas (hopefully a concern for AI alignment, awareness of the control problem, or Bayesianism, for instance) will eventually trickle down to the general public; others, such as technical topics related to AI safety, don't need to go down to that level because they belong to the higher end of the spectrum which is directly working to solve these issues.
So I very much agree with the vision to maintain LW as a sort of university, with high entry barriers in order to produce refined, high-quality ideas and debates, while at the same time keeping in mind that for some of these ideas to make a difference, they need to trickle down and reach the public debate.
Could we take from Eliezer's message the need to redirect more efforts into AI policy and into widening the Overton window to try, in any way we can, to give AI safety research the time it needs? As Raemon said, the Overton window might be widening already, making more ideas "acceptable" for discussion, but it doesn't seem to be enough. I would say the typical response from the overwhelming majority of the population and world leaders to misaligned-AGI concerns is still to treat them as a panicky sci-fi dystopia, rather than to say "maybe we should stop everything we're doing and not build AGI".
I'm wondering if not addressing AI policy sufficiently might be a coordination failure of the AI alignment community; i.e. from an individual perspective, the best option for a person who wants to reduce existential risks is probably to do technical AI safety work rather than AI policy work, because AI policy and advocacy work is most effective when done by a large number of people, shifting public opinion and the Overton window. Plus it's extremely hard to make yourself heard and influence entire governments, due to the election cycles, incentives, short-term thinking, bureaucracy… that govern politics.
Maybe, now that AI is starting to cause turmoil and enter popular debate, it's time to seize this wave and improve the coordination of the AI community. The main issue is not whether a solution to AI alignment is possible, but whether there will be enough time to come up with one. And the biggest factors that can affect the timelines probably are (1) big corporations and governments, and (2) how many people work on AI safety.