LessWrong 2.0 Reader
Yes, I agree that there is this difference in the few examples I gave, but I don't agree that this difference is crucial.
Even if the agent puts maximum effort into keeping its utility function stable over time, there is no guarantee it will not change. The future is unpredictable. There are unknown unknowns. And the effect of this fact is twofold:
It seems you agree with the first. I don't see why you don't agree with the second.
mateusz-baginski on DeepSeek beats o1-preview on math, ties on coding; will release weights
It's predictably censored on CCP-sensitive topics.
(In a different chat.) After the second question, it typed two lines (something like "There have been several attempts to compare Winnie the Pooh to a public individual...") and then overwrote them with "Sorry...".
dakara on How can we prevent AGI value drift?
My biggest concern with intent alignment of AGI is that we might run into the issue of AGI being used for something like totalitarian control over everyone who doesn't control AGI. It becomes a source of nearly unlimited power. The first company to create intent-aligned AGI (probably ASI at that point) can use it to stop all other attempts at building AGI. At that point, we'd have a handful of people wielding incredible power. It seems unlikely that they'd just decide to give it up. I think your "big if" is a really, really big if.
But other than that, your plan definitely seems workable. It avoids the problem of value drift, but unfortunately it incurs the cost of dealing with power-hungry humans.
gerardus-mercator on Claude seems to be smarter than LessWrong community
For the sake of clarity, let's discuss expected utility functions, which I mentioned above (or "pragmatism functions", say), from strategies to numbers, as opposed to utility functions from world-states to numbers, in order to make it clear that the actual utility function of an agent doesn't change.
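A minimal formalization of this distinction (the symbols U, V_t, and P_t are chosen here purely for illustration and are not from the comment):

U : \mathcal{W} \to \mathbb{R}   (terminal utility function over world-states; fixed)
V_t(s) = \mathbb{E}_{w \sim P_t(\cdot \mid s)}[U(w)]   (expected-utility / "pragmatism" function over strategies s \in \mathcal{S}; depends on the agent's current beliefs P_t)

On this reading, learning updates the beliefs P_t and therefore V_t, while U itself never changes.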
That's another one of the reasons that I wasn't persuaded by your new example; in your new example, the agent believes that its future self will still be trying to create paperclips (same terminal goal) and will be better at that thanks to its greater knowledge (different instrumental goals although it doesn't know what), but in your old example, the agent believes that its future self will be trying to destroy paperclips (opposite terminal goal). There's a difference between having the rule-of-thumb "my current list of incidental goals might be incomplete, I should keep an eye out for things that are incidentally good" and having the rule-of-thumb "I shouldn't try to protect my terminal goal from changes". The whole point of those rules of thumb is to fulfill the terminal goal, but the second rule of thumb is actively harmful to that.
I do think that the first rule of thumb would be prudent for an agent to have, to one extent or another, to be clear.
I just think that - stepping back from the new example, and revisiting the old example, which seems much more clear-cut - the agent wouldn't tolerate a change in its utility function, because that's bad according to its current utility function. This doesn't apply to the new example because the pragmatism function is a different thing that the agent is trying to improve (and thus change).
(I find myself again emphasizing the difference between terminal and instrumental. I think it's important to keep in mind that difference.)
It could be really interesting to see how employment looks before and after the camp.
clone-of-saturn on Lighthaven Sequences Reading Group #12 (Tuesday 11/26)
Okay, but you're not comparing like with like. Terminator 2 is an action movie, and I agree that action movies have gotten better since the 1960s. But in terms of sci-fi concepts introduced per second, I would suspect 2001 has more. Some movies from the 1990s that are more straight sci-fi would be Gattaca or Contact, but I don't think many people would consider these categorically better than 2001.
sohaib-imran on Akash's Shortform
One thing I’d be bearish on is visibility into the latest methods being used for frontier AI models, which would in turn reduce the relevance of alignment research except for the research within the Manhattan-like project itself. This is already somewhat true of the big labs, e.g. the methods used for o1-like models. However, there is still some visibility in the form of system cards and reports which hint at the methods. When the primary intention is racing ahead of China, I doubt there will be reports discussing the methods used for frontier systems.
jmh on Neutrality
Interesting, but I've only skimmed it so far and will need to come back. With that caveat made, I have a couple of recurring thoughts that seem compatible or complementary with yours.
First, where do we draw the line between public and private? It strikes me that a fair amount of social strife revolves around a tension here. We live in a dynamic world, so expecting the sphere of private action to remain static seems unlikely; as the world changes (knowledge, applied knowledge driving technological change, movement of people producing cultural transmission and tensions...), those forces will shift the line between public and private.
While I'm not entirely sure it is the best framing, I do think of this in terms of externalities. Negative externalities are the more challenging form. What I think happens is that at t=0 some set of private activities produces very little negative impact on others, but by t=10 we find that some of those private activities are producing a large enough total negative external effect that:
At some point either most people accept that a new definition of "private" exists and the old ways have changed, or society reaches the point where those who have not adapted are treated as criminals and removed from society.
The other thing I've been thinking about is related. One hears the "where's my flying car, it's the 21st century already" quip now and then. But I think a better one might be: it's the 21st century, why am I still living under an 18th-century form of government?
I think these relate to your post in that a lot of the social conflict you point to is driven by the shifting margin between the public and private spheres of action. As that margin shifts, people use the government to address the new conflicts within society. But few if any governments differ substantially from those that have existed for centuries. I would characterize that vision of government, even for representative democracies, as that of an actor/agent: government takes actions, just as the private members of society do. It should function, as you say, in a neutral way. Part of the failure there comes from the fact that government, being an actor/agent, has its own interests, agendas, and biases.
That government as an active participant contrasts a bit with how I think most people think of markets. Markets don't really do anything. They are simply an environment in which active entities come and interact with each other. Markets don't set price or quality or even really the type of item -- these are all unplanned outputs. The market itself is indifferent to all of those; it's neutral in the sense you use that term.
Well, in the 21st century might we not expect that how governments are structured could also shift? While I am far from sure the shift would be correctly called divestiture or privatization (which seems to be what most people mean when they talk about fixing government -- or, for some, calling for it to do more of what it's already doing), I do think the shift might be away from an acting entity and toward some type of passive environment that has something in common with markets. In a very real sense governments are already a type of market setting, just not a price/money-exchange one (representatives are not quite putting out bids and offers on votes), but clearly there is a demand-mediation and supply process going on. Currently, though, the market-like aspect of government is limited to integrating the demands of voters/members of society, after which the government makes a decision and takes the actions it wants. I would think some areas might be suitable for taking the government out of the actor role and letting the actions be decentralized among the people. Probably not individual action -- I suspect some sub-agent presence will exist to reduce organizational/transaction costs -- but the process would certainly look more market-like and be a more neutral setting. That might well remove a lot of the divisiveness and conflict we see with the existing "old school" forms of government.
That is all probably a bit poorly written and expressed but it's a quick dump of a couple of not fully thought out ideas.
christian-z-r on What are the good rationality films?
Riders of Justice: imdb.com/title/tt11655202/
Recognizing patterns in a mainly random world, psycho-therapeutic hacking strategies. Can't say much more without risking spoilers.
trevorone on Lighthaven Sequences Reading Group #12 (Tuesday 11/26)
It was more of a 1970s-90s phenomenon actually; if you compare the best 90s movies (e.g. Terminator 2) to the best 60s movies (e.g. 2001: A Space Odyssey), it's pretty clear that directors just got a lot better at doing more stuff per second. Older movies are absolutely a window into a higher/deeper culture/way of thinking, but OOMs less efficient than e.g. reading Kant/Nietzsche/Orwell/Asimov/Plato. But I wouldn't be surprised if modern film is severely mindkilling and older film is the best substitute.