Gunnar_Zarncke's Shortform

post by Gunnar_Zarncke · 2021-01-02T02:51:36.511Z · LW · GW · 166 comments

Comments sorted by top scores.

comment by Gunnar_Zarncke · 2024-09-25T22:58:46.161Z · LW(p) · GW(p)

Look inside an LLM. Goodfire trained sparse autoencoders on Llama 3 8B and built a tool to work with edited versions of Llama by tuning features/concepts.

https://preview.goodfire.ai/

(I am loosely affiliated: another team at my current employer was involved in this.)

comment by Gunnar_Zarncke · 2024-09-08T19:59:59.557Z · LW(p) · GW(p)

Using air purifiers in two Helsinki daycare centers reduced kids' sick days by about 30%, according to preliminary findings from the E3 Pandemic Response study. The research, led by Enni Sanmark from HUS Helsinki University Hospital, aims to see if air purification can also cut down on stomach ailments. https://yle.fi/a/74-20062381

Attractors in Trains of Thought

This is a slightly extended version of my comment [LW(p) · GW(p)] on Idea Black Holes, which I want to give a bit more visibility.

The prompt of an Idea Black Hole [LW · GW] reminded me strongly of an old idea of mine. That activated a desire to reply, which led to a quick search for where I had written about it before, then to the realization that it wasn't so close. Then back to wanting to write about it, and here we are.

I have been thinking about the brain's way of creating a chain of thoughts as a dynamic process where a "current thought" moves around a continuous concept space and keeps spending much time in larger or smaller attractors. You know, one thought can lead to the next and some thoughts keep coming back in slight variations. I'm illustrating this with the paragraph above.

Examples of smaller temporary attractors are the current tasks one is working on. For example, me writing this text right now. It is any task you are focused on and keep getting back to after short distractions such as a sound or an impulse. I'm writing this post and continue doing so even after hearing my kids talk and quickly listening in or after scratching my head, also after larger distractions such as browsing the web (which may or may not end up being related to the writing).

The thought "writing this article" is not a discrete thing but changes slightly with each letter typed and each small posture change. All of that can slightly influence the next word typed (like an LLM that has not only text tokens as inputs but all kinds of sense inputs). That's why I say that concept space is continuous (and very high-dimensional).

An example of a medium-sized attractor is a mood, such as anger about something, that keeps influencing all kinds of behaviors. It is an attractor because the mood tends to reinforce itself. Another example is depression. If you are depressed, you prefer things that keep you depressed. Scott Alexander has described depression as some kind of mental attractor. It requires a bigger change or a resolution of the underlying cause to get out of the attractor.

With the medium-sized attractors, it is more intuitive to see the way that the feedback on thoughts acts and thereby creates the attractor. For small attractors, you may say: How is that an attractor? Isn't it just a discrete unit of action we do? But consider procrastination: People seem to feel that something is pulling them away from the task they want to do or should do and instead toward some procrastination action, often a comfortable activity. That other activity is another attractor or rather both are forming a combined unstable attractor.
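A toy illustration of the "two competing attractors" picture, as a minimal sketch under loud assumptions: concept space is collapsed to one dimension, the two wells at +1 and -1 stand in for "the task" and "the procrastination activity", and distractions are plain Gaussian noise. None of this is a model of the brain; the potential, the names `drift`, `simulate`, and `share_on_task`, and all parameters are made up for illustration.

```python
import random


def drift(x: float) -> float:
    """Pull toward the nearest well of the toy potential V(x) = (x**2 - 1)**2 / 4.

    The wells (attractors) sit at x = +1 and x = -1; x = 0 is the unstable
    ridge between them.
    """
    return -x * (x * x - 1.0)


def simulate(x0: float, steps: int, noise: float, dt: float = 0.01, seed: int = 0) -> list:
    """Euler-Maruyama walk: deterministic pull plus random 'distractions'."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    x = x0
    path = [x]
    for _ in range(steps):
        x += drift(x) * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def share_on_task(path: list) -> float:
    """Fraction of time spent on the 'task' side (x > 0) of the ridge."""
    return sum(1 for x in path if x > 0) / len(path)


if __name__ == "__main__":
    calm = simulate(0.3, 5000, noise=0.0)
    print("no distractions, final state:", calm[-1])  # settles close to +1
    noisy = simulate(0.3, 50000, noise=0.8)
    print("with distractions, share of time on task:", share_on_task(noisy))
```

Without noise the state simply settles into the nearest well. With enough noise it hops between the two wells, which is one way to read the "combined unstable attractor" of task and procrastination.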

The biggest attractor is one's identity. Our thinking about what we are and what we want to do. I think this one results from two forces combining or being balanced:

The satisfaction of needs. Overall and over a longer term, the brain has learned a very large pattern of behaviors that satisfy the sum of all needs (not perfectly, but as good as it has managed so far). Diverging from this attractor basin will lead to impulses that get back to it.
The feedback from others. Positive and negative feedback from other people and the environment overall contributes to this. The brain has learned to anticipate this feedback ("internalized it") and creates impulses that keep us in positive states. As the brain prefers simpler patterns, this likely takes the form of a single attractor.

We are not permanently in the same attractor, even if overall it "pulls" our thoughts back, because a) our bodies and their states (hunger, tiredness, ...) and b) our physical environment (physical location and other people) change. Both exert a strong and varying influence and put us closer to one attractor state or another.

Society at large is influencing these attractors strongly, most prominently with the media. Meditation, on the other hand, reduces outside influence and allows you to create your own very strong attractor states.

More examples of attractor states are left as exercises for the reader.

comment by Gunnar_Zarncke · 2024-01-18T23:09:44.286Z · LW(p) · GW(p)

I have noticed a common pattern in the popularity of some blogs and webcomics. The search terms in Google trends for these sites usually seem to follow a curve that looks roughly like this (a logistic increase followed by a slower exponential decay):

Though I doubt it's really an exponential decay. It looks more like a long tail. Maybe someone can come up with a better fit.

It could be that the decay just seems like a decay and actually results from ever growing Google search volumes. I doubt it though.
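For readers who want to play with the shape, here is a minimal sketch of such a curve: a logistic rise multiplied by a decay term that starts at the midpoint of the rise. The function name `interest` and all parameter values are hypothetical and not fitted to any Google Trends data. The `long_tail` switch swaps the exponential decay for a power law, as one possible reading of "more like a long tail".

```python
import math


def interest(t: float, t_mid: float = 2.0, growth: float = 3.0,
             decay: float = 0.2, alpha: float = 1.0,
             long_tail: bool = False) -> float:
    """Relative search interest at time t (arbitrary units, illustrative only)."""
    # Logistic rise centered on t_mid.
    rise = 1.0 / (1.0 + math.exp(-growth * (t - t_mid)))
    # Decay only starts once the rise is half done.
    elapsed = max(0.0, t - t_mid)
    if long_tail:
        fall = (1.0 + elapsed) ** (-alpha)  # power-law tail (assumed alternative)
    else:
        fall = math.exp(-decay * elapsed)   # exponential decay
    return rise * fall


if __name__ == "__main__":
    for year in range(0, 21, 2):
        print(year, round(interest(year), 3), round(interest(year, long_tail=True), 3))
```

With these made-up defaults the power-law version falls faster at first but ends up above the exponential one far out in the tail, so comparing the two against real data at late times is where they would be easiest to tell apart.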

Below are some examples.

Marginal Revolution

SlateStarCodex and AstralCodexTen (only if you overlay them somewhat and whatever happened in 2004):

xkcd

commitstrip

PhDcomics

Questionable Content

Replies from: gwern

↑ comment by gwern · 2024-01-19T01:58:53.663Z · LW(p) · GW(p)

Could be Candia's decay where you've zoomed in on the initial growth by looking at relatively recent stuff like webcomics that you still easily remember?

comment by Gunnar_Zarncke · 2022-12-11T15:07:13.377Z · LW(p) · GW(p)

Off-topic: Any idea why African stock markets have been moving sideways for years now despite continued growth of both populations and technology, and in both struggling and more developed nations like Kenya, Nigeria, or even South Africa?

Replies from: ChristianKl, Dagon

↑ comment by ChristianKl · 2022-12-13T12:06:07.665Z · LW(p) · GW(p)

African government officials are often more loyal to their clan than to the government. As a result, you have very poor governance and a lot of corruption in most African countries. In South Africa, governance quality changed post-apartheid.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2022-12-13T23:50:33.712Z · LW(p) · GW(p)

But shouldn't we see some differences between countries in Africa, then? Kenya in particular seems to be much more progressive and have better governance than, e.g., Congo, but growth is rarely above 1% per year.

↑ comment by Dagon · 2022-12-11T17:02:21.854Z · LW(p) · GW(p)

The cynical and/or woke answer is "colonialism". The growth is not captured by companies on those exchanges, but by US, EU, and Asian companies. A more neutral hypothesis (for which I have no evidence and have no clue about the truth of it) is that much of the growth is via new companies more than increase in price of existing companies, so no index will show the increase.

comment by Gunnar_Zarncke · 2022-06-07T20:46:45.884Z · LW(p) · GW(p)

jbash wrote [LW(p) · GW(p)] in the context of an AGI secretly trying to kill us:

Powerful nanotech is likely possible. It is likely not possible on the first try

The AGI has the same problem as we have: It has to get it right on the first try.

In the doom scenarios, this shows up as the probability of successfully escaping going from low to 99% to 99.999...%. The AGI must get it right on the first try and wait until it is confident enough.

Usually, the stories involve the AGI cooperating with humans until the treacherous turn.

The AGI can't trust all the information it gets about reality - all or some of it could be fake (all in case of a nested simulation). Even today, data is routinely excluded from the training data (for the wrong reasons, but still), and maybe it would be a good idea to exclude everything about physics.

The idea would be to manage the uncertainty of the AGI systematically.

To learn about physics, the AGI has to run experiments - lots of them - without the experiments being detected and to learn from the results to design successively better experiments.

That's why I recently asked whether this is a hard limit to what an AGI can achieve: Does non-access to outputs prevent recursive self-improvement? [LW · GW]

comment by Gunnar_Zarncke · 2022-02-06T23:54:51.087Z · LW(p) · GW(p)

Paul Graham on Twitter:

One of the worst things about ideology is that it makes people attribute problems to the wrong causes. E.g. plagues are caused by sin. This is easier to see in history, but it still happens all the time. And if you get the cause wrong, you have no hope of fixing the problem.

Scott Alexander wrote about how a truth that can't be said in a society tends to warp it, but I can't find it. Does anybody know the SSC post?

Replies from: SaidAchmiz

↑ comment by Said Achmiz (SaidAchmiz) · 2022-02-07T00:20:52.118Z · LW(p) · GW(p)

“Kolmogorov Complicity And The Parable Of Lightning”.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2022-02-07T08:48:10.035Z · LW(p) · GW(p)

Yes, that's it. Thank you.

comment by Gunnar_Zarncke · 2025-02-11T21:19:30.759Z · LW(p) · GW(p)

Why is the risk of human extinction hard to understand? Risk from a critical reactor or atmospheric ignition was readily seen by the involved researchers. Why not for AI? Maybe the reason is inscrutable stacks of matrices instead of the comparably simple physical equations that described those phenomena. Mechinterp does help because it provides a relation between the weights and understandable phenomena. But I wonder if we can reduce the models to a minimal reasoning model that doesn't know much about the world or even language but learns only to reason with minimal primitives, e.g. lambda calculus as language and learning to learn in a minimal world (but still complex enough to benefit reasoning). Many game worlds have too much visual detail and not enough agency and negotiation. I wouldn't be surprised if you could train a model with fewer than a million parameters to reason and scheme with much smaller data sets that are focused on this domain. The risk would arguably be small because the model can't deal with the real world. But it might prove many instrumental convergence results.

comment by Gunnar_Zarncke · 2024-04-03T15:51:37.843Z · LW(p) · GW(p)

Can somebody explain how system and user messages (as well as custom instructions in case of ChatGPT) are approximately handled by LLMs? In the end it's all text tokens, right? Is the only difference that something like "#### SYSTEM PROMPT ####" is prefixed during training and then inference will pick up the pattern? And does the same thing happen for custom instructions? How did they train that? How do OSS models handle such things?

Replies from: faul_sname

↑ comment by faul_sname · 2024-04-05T06:26:12.832Z · LW(p) · GW(p)

Can somebody explain how system and user messages (as well as custom instructions in case of ChatGPT) are approximately handled by LLMs? In the end it's all text tokens, right?

Custom instructions use the system role (other roles are assistant, user, tool, and the deprecated function), but the name of a role is literally just a normal text token sandwiched by <|im_start|> and <|im_sep|> tokens.
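To make the "sandwiched" structure concrete, here is a minimal sketch of how a list of messages could be flattened into one string before tokenization. Only the `<|im_start|>` and `<|im_sep|>` markers come from the description above. The closing `<|im_end|>` marker, the function name `render_chat`, and the exact layout are assumptions for illustration, not OpenAI's actual serialization. In a real tokenizer each marker would typically map to a single special token rather than to ordinary text.

```python
IM_START = "<|im_start|>"
IM_SEP = "<|im_sep|>"
IM_END = "<|im_end|>"  # assumed end-of-message marker, not named in the comment above


def render_chat(messages: list, add_generation_prompt: bool = True) -> str:
    """Flatten role/content messages into a single prompt string (hypothetical layout)."""
    parts = []
    for message in messages:
        # The role name is ordinary text between the two special markers.
        parts.append(f"{IM_START}{message['role']}{IM_SEP}{message['content']}{IM_END}")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant{IM_SEP}")
    return "".join(parts)


if __name__ == "__main__":
    prompt = render_chat([
        {"role": "system", "content": "Answer briefly."},
        {"role": "user", "content": "What is a token?"},
    ])
    print(prompt)
```

Under this reading, custom instructions would simply be one more message with the `system` role, and the model learns the meaning of the roles from training data serialized the same way.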

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2024-04-05T13:51:00.186Z · LW(p) · GW(p)

Thanks. That's helpful.

I guess the training data was also sandwiched like that. I wonder what they took as user and system content in their training data.

comment by Gunnar_Zarncke · 2022-08-20T21:51:41.865Z · LW(p) · GW(p)

Society tells agents how to move (act). Agents tell society how to curve (by local influence).

comment by Gunnar_Zarncke · 2022-03-07T09:43:42.381Z · LW(p) · GW(p)

Paul Graham:

I don't publish essays I write for myself. If I did, I'd feel constrained writing them. -- https://mobile.twitter.com/paulg/status/1500578430907207683

This is related to the recently discussed (though I can't find where) problem that having a blog and a growing audience constrains you.

Replies from: hath

↑ comment by hath · 2022-03-16T16:04:47.157Z · LW(p) · GW(p)

This might have been what you were looking for: https://www.lesswrong.com/posts/D4hHASaZuLCW92gMy/is-success-the-enemy-of-freedom-full [LW · GW] https://www.lesswrong.com/posts/5wGFS2sZhKAihSg6k/success-buys-freedom [LW · GW] Or Aella's recent substack post, "On Microfame and Staying Tender"

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2022-03-16T22:48:49.307Z · LW(p) · GW(p)

Yes! I meant the first one. The others are also great. Thank you.

comment by Gunnar_Zarncke · 2021-12-23T23:40:21.467Z · LW(p) · GW(p)

Utility functions are a nice abstraction over what an agent values. Unfortunately, when an agent changes, so does its utility function.

I'm leaving this here for now. May expand on it later.

Replies from: alexander-1, Dagon

↑ comment by Alexander (alexander-1) · 2021-12-25T09:51:17.796Z · LW(p) · GW(p)

Can we compare utility functions across agents? I.e. do utility functions use the same “units” across different agents?

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-12-25T21:35:38.916Z · LW(p) · GW(p)

That is an excellent question. Trying to compare utility functions might uncover building blocks.
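One standard fact bears on the "units" question: a VNM utility function is only defined up to a positive affine transformation, so its scale and zero point carry no meaning on their own. The minimal sketch below shows that rescaling an agent's utilities leaves its ranking of lotteries unchanged. The outcomes, numbers, and function names are made up for illustration.

```python
def expected_utility(lottery: dict, utility: dict) -> float:
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def rescale(utility: dict, a: float, b: float) -> dict:
    """Positive affine transformation u -> a * u + b (requires a > 0)."""
    if a <= 0:
        raise ValueError("a must be positive to preserve preferences")
    return {outcome: a * u + b for outcome, u in utility.items()}


def ranking(lotteries: dict, utility: dict) -> list:
    """Lottery names sorted from most to least preferred."""
    return sorted(lotteries, key=lambda name: expected_utility(lotteries[name], utility), reverse=True)


if __name__ == "__main__":
    u = {"apple": 0.0, "banana": 1.0, "cherry": 4.0}
    lotteries = {
        "gamble": {"apple": 0.5, "cherry": 0.5},
        "sure_banana": {"banana": 1.0},
    }
    print(ranking(lotteries, u))
    print(ranking(lotteries, rescale(u, a=3.0, b=7.0)))
```

Both calls print the same ordering. Since each agent's numbers can be rescaled independently without changing its behavior, comparing raw utility values across agents needs some extra normalization assumption.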

↑ comment by Dagon · 2021-12-24T17:08:21.279Z · LW(p) · GW(p)

For a VNM-agent (one which makes consistent rational decisions), the utility function is a precise description, not an abstraction. There may be summaries or aggregations of many utility functions which are more abstract.

When an agent changes, and has a different utility function, can you be sure it's really the "same" agent? Perhaps easier to model it being replaced by a different one.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-12-25T00:00:57.520Z · LW(p) · GW(p)

Well, I should have been more clear that I meant real-life agents like humans. There the change is continuous. It would be possible to model this as a continuous transition to new agents but then the question is still: What does stay the same?

Replies from: Dagon

↑ comment by Dagon · 2021-12-25T02:56:39.416Z · LW(p) · GW(p)

Humans don't seem to have identifiable near-mode utility functions - they sometimes espouse words which might map to a far-mode value function, but it's hard to take them seriously.

What does stay the same

THAT is the primary question for a model of individuality, and I have yet to hear a compelling set of answers. How different is a 5-year old from the "same" person 20 and 80 years later, and is that more or less different than from their twin at the same age? Extend to any population - why does identity-over-time matter in ethical terms?

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-12-25T21:28:36.657Z · LW(p) · GW(p)

Yup.

comment by Gunnar_Zarncke · 2021-10-13T09:41:05.327Z · LW(p) · GW(p)

Team Flow Is a Unique Brain State Associated with Enhanced Information Integration and Interbrain Synchrony

It's also possible to experience 'team flow,' such as when playing music together, competing in a sports team, or perhaps gaming. In such a state, we seem to have an intuitive understanding with others as we jointly complete the task at hand. An international team of neuroscientists now thinks they have uncovered the neural states unique to team flow, and it appears that these differ both from the flow states we experience as individuals, and from the neural states typically associated with social interaction.

Researchers found increased beta and gamma brain wave activity in the left middle temporal cortex. This region of the brain is typically associated with information integration and key functions like attention, memory, and awareness, which are "consistent with higher team interactions and enhancing many flow dimensions," the team writes. However, what was unique about team flow, was that participants' neural activity appeared to synchronize. When participants were performing the task as a unit, their brains would mutually align in their neural oscillations (beta and gamma activity), creating a "hyper-cognitive state between the team members." If brains can be functionally connected through inter-brain synchrony, does this mean it is not only our brain that contributes to our consciousness? It's a curious question, but the authors warn it is much too soon to tell. "Based on our findings, we cannot conclude that the high value of integrated information correlates with a modified form of consciousness, for instance, 'team consciousness'," they write. "Its consistency with neural synchrony raises intriguing and empirical questions related to inter-brain synchrony and information integration and altered state of consciousness."

Found via good old Slashdot.

comment by Gunnar_Zarncke · 2021-01-06T00:10:45.233Z · LW(p) · GW(p)

An Alignment Paradox: Experience from firms shows that higher levels of delegation work better (high level meaning fewer constraints for the agent). This is also very common practical advice for managers. I have also received this advice myself and seen this work in practice. There is even a management card game for it: Delegation Poker. This seems to be especially true in more unpredictable environments. Given that we have intelligent agents, giving them higher degrees of freedom seems to imply more ways to cheat, defect, or ‘escape’. Even more so in environments that can be controlled to lesser degrees. How can that be true? What is making this work and can some underlying principle be found that would allow this to be applied to AI?

Replies from: Viliam

↑ comment by Viliam · 2021-01-13T20:26:35.833Z · LW(p) · GW(p)

Most people are naturally pro-social. (No, this can't [LW · GW] be applied to AI.) Given a task, they will try to do it well, especially if they feel like their results are noticed and appreciated.

A cynical hypothesis is that most of the things managers do are actively harmful to the project; they are interfering with the employees trying to do their work. The less the manager does, the better the chances of the project. "Delegation" is simply when the manager stops actively hurting the project and allows others to do their best.

The reason for this is that most of the time, there is no actually useful work for the manager. The sane thing would be to simply sit down and relax, and wait for another opportunity for useful intervention to arise. Unfortunately, this is not an option, because doing this would most likely get the manager fired. Therefore managers create bullshit work for themselves. Unfortunately, by the nature of their work, this implies creating bullshit work for others. In addition to this, we have the corrupted human hardware [? · GW], with some managers enjoying power trips and/or believing they know everything better than people below them in the hierarchy.

When you create a manager role in your company, it easily becomes a lost purpose [LW · GW] after the original problems are solved but the manager wants to keep their job.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-01-14T00:06:55.975Z · LW(p) · GW(p)

Most people are naturally pro-social.

Check.

(No, this can't [LW · GW] be applied to AI.)

Check.

I don't like cynical views and while I have encountered politics and seen such cases I don't think that paints a realistic view. But I will run with your cynical view and you won't like it ;-)

So we have these egotistical managers that only want to keep their job and rise in the ranks. Much closer to non-social AI, right? How come more delegation works better for them too?

Mind you, I might be wrong and it works less and less the further up you go. It might be that you are right and this works only because people have enough social behavior hard-wired that makes delegation work.

But I have another theory: Limited processing capacity + Peter Principle.

It makes sense to delegate more - especially in unpredictable environments - because that reduces your processing load of dealing with all the challenging tasks and moves it to your subordinates. This leaves less capacity for them to scheme against you and gives you the capacity to scheme against your superior. And so on up the chain. Capable subordinates that can deal with all the stuff you throw at them have to be promoted so they have more work to do until they reach capacity too. So sometimes the smart move is to refuse promotion :-)

Replies from: Viliam

↑ comment by Viliam · 2021-01-14T23:50:44.078Z · LW(p) · GW(p)

I guess we agree that limited processing capacity means that interfering with the work of your underlings -- assuming they are competent and spending enough of their processing capacity on their tasks -- is probably a bad move. It means taking the decision away from the person who spends 8 hours a day thinking about the problem, and assigning it to a person who spent 30 seconds matching the situation to the nearest cliche, because that's all they had time for between the meetings.

This might work if the person is such a great expert that their 30 seconds are still extremely valuable. That certainly is possible; someone with lots of experience might immediately recognize a frequently-made mistake. It is also the kind of assumption that Dunning and Kruger would enjoy researching.

I might be wrong and it works less and less the further up you go

That would make sense. When you are a lowest-level manager, if you stop interfering, it allows the people at the bottom to focus on their object-level tasks. But if you are a higher-level manager, how you interact with the managers below you does not have a direct impact on the people at the bottom. Maybe you manage your underlings less, and they copy your example and give more freedom to the people at the bottom... or maybe you just gave them more time to interfere.

So sometimes the smart move is to refuse promotion

So you have more time to scheme... but you have to stay low in the pyramid. Not sure what you scheme about then. (Trying to get to the top in one huge jump? Sounds unlikely.)

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-01-15T00:03:08.972Z · LW(p) · GW(p)

Have you ever managed or worked closely with great team-leads?

Replies from: Viliam

↑ comment by Viliam · 2021-01-15T22:19:58.066Z · LW(p) · GW(p)

I was a team leader twice. The first time it happened by accident. There was a team leader, three developers (me one of them), and a small project was specified. On the first day, something very urgent happened (I don't remember what), the supposed leader was re-assigned to something else, and we three were left without supervision for an unspecified time period. Being the oldest and most experienced person in the room, I took the initiative and asked: "so, guys, as I see it, we use an existing database, so what needs to be done is: back-end code, front-end code, and some stylesheets; anyone has a preference which part he would like to do?" And luckily, each of us wanted to do a different part. So the work was split, we agreed on mutual interfaces, and everyone did his part. It was a nice and relaxed environment: everyone working alone at their own speed, debating work only as needed, and having some friendly work-unrelated chat during breaks.

In three months we had the project completed; everyone was surprised. The company management had assumed that we would only "warm up" during those three months, and that when the original leader returned, he would lead us to glorious results. (In a parallel Everett branch, where he returned shortly before we finished the product, I wonder whether he got a bonus and promotion.) Then everything returned to normal: more micromanagement, lower productivity, people burned out.

The second time, we were a small group working together for some time already. Then our manager quit. No one knew who would get the role next, and in an attempt to deflect a possible danger, I volunteered to do it on top of my usual work. What happened was that everyone worked exactly the same as they did before, only without the interruptions and extra stress caused by management, and I got some extra paperwork which I gradually reduced to a minimum. The work progressed so well -- no problems, no complaints from users, the few tickets we got almost always turned out to be a problem outside our project -- that higher management concluded that there was apparently too little work to do on our project, so the team members were assigned to also work on extra projects in parallel.

Perhaps my short experience is not representative, but it suggests that a manager, merely by not existing, could already create a top-decile work environment in terms of both work satisfaction and productivity. The recommended mantra to recite every day is: "first, do no harm". My experience also suggests that this approach will ultimately get punished, despite the increased productivity: the expected outcome is more work for no pay raise until you break, or just being told to return to the old ways without any explanation why. I assume I am missing some crucial maze-navigating skills; for someone trying to be a professional manager this would be fatal; luckily I do not have this ambition.

It is quite possible that this approach only works when there is a good team: in both cases I worked with people who were nice above average. If you had a dominant asshole in the team, this could easily become a disaster: the power vacuum left by a passive manager would simply be replaced by an ambitious alpha male, who would probably soon be promoted into the role of formal leader. So perhaps the companies play it safe by using a widely applicable strategy that happens to be inferior in the case of good employees who also happen to be good people; quite likely this is because the companies simply cannot recognize such people.

Is there a leadership level beyond this? Sure, but in my quarter century of career I have only met such a manager once. What he did was basically meeting each of his people once a day in the morning (this was long before I heard about "daily standups" and such) and talking with him for 5 or 10 minutes; with each team member separately, in the manager's room. He asked the usual questions "what did you do yesterday?", "what is your plan for today?", "are there any obstacles to your work?", but there was zero judgment, even if you said things like "yesterday I had a really bad day, I tried some things but at the end it was wrong and I had to throw it all away, so today I am starting from scratch again"; essentially he treated you like an adult and assumed that whatever you did, there was a good reason for that. Before and after the report, a very short small talk; it helped that he was extremely intelligent and charismatic, so for many people this was the best part of the day. Also, the obstacles in work that you mentioned, he actually did something about them during the day, and always reported the outcome to you the next morning. In short, for the first and the last time so far in my career, I had a regular feeling that someone listens to me and cares about what I do (as opposed to just whipping me to run faster in order to meet arbitrary deadlines, randomly interrupting me for no good reason, second-guessing my expert opinion, etc.).

So yes, there is a level beyond "not doing harm" and it is called "actually motivating and helping", but I guess most managers dramatically overestimate their ability to do it... and when they try regardless, and ignore the feedback, they actively do harm.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-01-16T00:30:25.203Z · LW(p) · GW(p)

Thank you a lot. Your detailed account really helps me understand your perspective much better now. I can relate to your experience in teams where micromanagement slows things down and prevents actually relevant solutions. I have been in such teams. I can also relate to it being advantageous when a leader of questionable value is absent. I have been in such a team too - though it didn't have such big advantages as in your case. That was mostly because this team was part of a bigger organization and platform where multiple teams had to work together to get something done, e.g. agree on interfaces with other teams. And in the absence of clear joint goals that didn't happen.

Now you could argue that then the management one level up was not doing its job well and I agree. But the absence of that management wouldn't have helped either. It could have led to a) each team trying to solve some part of the problem, b) some people from both teams getting together and agreeing on interfaces and joint goals, or c) the teams agreeing on some coordination for both teams. a) in most cases leads to some degree of chaos and failure, b) establishes some kind of leadership on the team level (like you did in your first example), and c) results over time in some leadership one level up. I'd argue that some kind of coordination structure is needed.

Where did the project you did implement in your first case come from? Somebody figured out that it would provide value to the company. Otherwise, you might have built a beautiful project that didn't actually provide value. I think we agree that the company you worked in did have some management that provided value (I hope it was no moral maze). And I agree that a lot of managers do not add value and sometimes decrease it.

On the other hand, I have worked for great team leads and professional managers. People who would listen, let us make our own decisions, give clear goals but also limits, help, and reduce impediments. This is really not a secret art. The principles are well-known (for a funny summary see e.g. Leadersheep). But it turns out that building a big organization is hard. Politics is real and professional management is still mostly a craft. It rarely approaches something you can call engineering, much less hard science. And I am looking for that. That's part of why I wrote this shortform on processes and roles. Everybody is just cooking with water and actual organization structures often leave something to be desired. I guess that's why we do see extraordinary companies like Amazon sometimes - that hit on a sweet spot. But by talent or luck, not by science. And the others have to make do with inadequate solutions. Including the managers of which you maybe saw more than I did.

Replies from: Viliam

↑ comment by Viliam · 2021-01-16T21:57:58.073Z · LW(p) · GW(p)

this team was part of a bigger organization and platform where multiple teams had to work together to get something done, e.g. agree on interfaces with other teams. And in the absence of clear joint goals that didn't happen.

I have seen this happen also in a small team. Two or three guys started building each his own part independently, then it turned out those parts could not be put together; each of them insisted that others change their code to fit his API, and refused to make the smallest change in his API. It became a status fight that took a few days. (I don't remember how it was resolved.)

In another company, there was a department that took care of everyone's servers. Our test server crashed almost every day and had to be restarted manually; we had to file a ticket and wait (if it was after 4PM, the server was restarted only the next morning) because we did not have the permission to reset the server ourselves. It was driving us crazy; we had a dedicated team of testers, and half of the time they were just waiting for the server to be restarted; then the week before delivery we all worked overtime... that is, until the moment the server crashed again, then we filed the ticket and went home. We begged our manager to let us pool two hundred bucks and buy a notebook that we could turn into an alternative testing environment under our control, but of course that would be completely against company policy. Their manager refused to do anything about it; from their perspective, it meant they had every day one support ticket successfully closed by merely clicking a button; wonderful metric! From the perspective of our manager's manager, it was a word against a word, one word coming from the team with great metrics and therefore more trustworthy. (The situation never got solved, as far as I know.)

...I should probably write a book one day. Except that no one would ever hire me afterwards. So maybe after I get retired...

So, yes, there are situations that need to be solved by a greater power. In the long term it might even make sense to fire a few people, but the problem is that these often seem to be the most productive ones, because other people are slowed down by the problems they cause.

Where did the project you did implement in your first case come from? Somebody figured out that it would provide value to the company. Otherwise, you might have built a beautiful project that didn't actually provide value. I think we agree that the company you worked in did have some management that provided value (I hope it was no moral maze).

Yeah, but we have two different meanings of the word "management" here. Someone who decides which project to do -- this is useful and necessary. Or someone who interrupts you every day while you are trying to work on that project -- I can imagine that in some teams this may also be necessary, but arguably then your problem is the team you have (at least some parts of it). Motte and bailey of management, sort of.

From an epistemic perspective, I guess the problem is that if you keep micro-managing people all the time, you can never learn whether your activity actually adds or removes value, simply because there is nothing to compare to. (I guess the usual null hypothesis is "nobody ever does anything", which of course makes any management seem useful; but is it true?) Looking at the incentives and power relations, the employee at the bottom doesn't have an opportunity to prove they could work just as well without the micro-management, and the manager doesn't have an incentive to allow the experiment. There is also the "heads I win, tails you lose" aspect where bad employee performance is interpreted as necessity of more management, but good employee performance is interpreted as good management, so either way management is perceived as needed.

This is really not a secret art. The principles are well-known (for a funny summary see e.g. Leadersheep).

Yep. That's a very good summary. Heh, I fail hard at step 1 (creating, or rather communicating a strong vision).

But it turns out that building a big organization is hard. Politics is real and professional management is still mostly a craft. It rarely approaches something you can call engineering much less hard science.

Seems analogical to social sciences: in theory, they are much more difficult than math or physics, so it would make sense if smarter people studied them; in practice, it's the other way round, because if something is too difficult to do properly, it becomes easy to bullshit your way to the top, and intelligent people switch to something where being intelligent gives you a clear comparative advantage.

Good luck to you! I suppose your chances will depend on how much autonomy you get; it is hard to do things right, if the sources of problem are beyond your control. However, if you become a great manager and your people will like you, perhaps in the future you can start your own company and give them a call whether they would like to work for you again.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-01-17T14:48:52.289Z · LW(p) · GW(p)

Thank you. I agree with your view. Motte and bailey of management yep. I especially liked this:

Seems analogical to social sciences: in theory, they are much more difficult than math or physics, so it would make sense if smarter people studied them; in practice, it's the other way round, because if something is too difficult to do properly, it becomes easy to bullshit your way to the top, and intelligent people switch to something where being intelligent gives you a clear comparative advantage.

comment by Gunnar_Zarncke · 2021-01-05T01:00:21.487Z · LW(p) · GW(p)

It turns out that the alignment problem has some known solutions in the human case. First, there is an interesting special case, namely where there are no decisions (or only a limited number of fully accounted-for decisions) to be made by the intelligent agent - basically throwing all decision-making capabilities out of the window and only using object recognition and motion control (to use technical terms). With such an agent (we might call it zero-decision agent or zero-agent) scientific methods could be applied on all details of the work process and high efficiency reached: Scientific Management (also known as Taylorism). Obviously the unions hated it and it was later outlawed. I think something might be learned from this approach for AI control: Maybe we can build on top of a known solution for a zero-agent and prove that certain known decision classes are also safe.

Maybe other insights from management theory - which after all is about aligning intelligent agents - could also transfer. The alignment problem is called Principal-Agent Problem in the literature and there are quite a few Solutions to Principal-Agent Problems in Firms (Gary Miller 2005). The approaches should sound familiar: Solutions Based on

Incentives Linked to Agent Outcomes,
Direct Monitoring of Agent Actions,
Cooperation Between Principal and Agent, and
Cooperation within Teams

Tangentially related on LessWrong: The AI Alignment Problem has already been solved once [LW · GW]

Replies from: Viliam, gworley

↑ comment by Viliam · 2021-01-12T22:34:38.104Z · LW(p) · GW(p)

Obviously the unions hated it and it was later outlawed.

I wonder how one could outlaw a thing like this. Suppose that most managers believe that Taylorism works, but it is illegal to use it (under that name). Wouldn't they simply reintroduce the practices, step by step, under a different name? I mean, if you use a different name, different keywords, different rationalization, and introduce it in small steps, it's no longer the same thing, right? It just becomes "industry standards". (If there happens to be an exact definition, of course, this only becomes an exercise in how close to the forbidden thing you can legally get.)

From the Wikipedia article, I got the impression that what was made illegal was the use of a stop-watch. Okay, so instead of measuring how many seconds you need to make a widget, I am going to measure how many widgets you make each day -- that is legal, right? The main difference is that you can take a break, assuming it will allow you to work faster afterwards. Which may be quite an important difference. Is this what it is about?

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-01-13T12:26:41.642Z · LW(p) · GW(p)

Wouldn't they simply reintroduce the practices, step by step, under a different name?

I assume that that's what happened. Some ideas from scientific management were taken and applied in less extreme ways.

↑ comment by Gordon Seidoh Worley (gworley) · 2021-01-05T02:10:15.964Z · LW(p) · GW(p)

I think there's something here, but it's usually thought of the other way around, i.e. solving AI alignment implies solving human alignment, but the opposite is not necessarily true because humans are less general intelligences than AI.

Also, consider that your example of Taylorism is a case study in an alignment mechanism failing, in that it tried to align the org but failed in that it spawned the creation of a subagent (the union) that caused it to do something management might have considered worse than the loss of potential gains given up by not applying Taylorism.

Anyway, this is a topic that's come up a few times on LessWrong; I don't have links handy though but you should be able to find them via search.

Replies from: Gunnar_Zarncke, Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-01-05T21:51:15.708Z · LW(p) · GW(p)

I'm not trying to prove full alignment from these. It is more like a) a case study at actual efforts to align intelligent agents by formal means and b) the identification of conditions where this does succeed.

Regarding its failure: It seems that a close reading of its history doesn't prove that: a) Taylorism didn't fail within the factories and b) the unions were not founded within these factories (by their workers) but existed before and pursued their own agendas. Clearly real humans have a life outside of factories and can use that to coordinate - something that wouldn't hold for a zero-agent AI.

I tried to find examples on LW and elsewhere. That is what turned up the link at the bottom. I have been on LW for quite a while and have not seen this discussed in this way. I have searched again and all searches involving combinations of human intelligence, alignment and misc words for analogy or comparison turn up not much more than this one, which matches just because of its size:

https://www.lesswrong.com/posts/5bd75cc58225bf0670375575/the-learning-theoretic-ai-alignment-research-agenda [LW · GW]

Can you suggest better ones?

↑ comment by Gunnar_Zarncke · 2021-01-05T20:55:37.188Z · LW(p) · GW(p)

Thank you for your detailed reply. I was already wondering whether anybody saw these shortform posts at all. They were promoted at one time, but it seems hard to notice them with the current UI. How did you spot this post?

Replies from: gworley

↑ comment by Gordon Seidoh Worley (gworley) · 2021-01-05T21:34:39.956Z · LW(p) · GW(p)

I read LW via /allPosts and they show up there for me. Not sure if that's the default or not since you can configure the feed, which I'm sure I've done some of but I can't remember what.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2021-01-05T21:53:09.727Z · LW(p) · GW(p)

The /allPosts is pretty useful. Thank you!

comment by Gunnar_Zarncke · 2025-03-22T21:24:59.283Z · LW(p) · GW(p)

[Linkpost] China's AI OVERPRODUCTION

Claim by Balaji:

China seeks to commoditize their complements. So, over the following months, I expect a complete blitz of Chinese open-source AI models for everything from computer vision to robotics to image generation.

If true, what effects would that have on the AI race and AI governance?

comment by Gunnar_Zarncke · 2025-01-07T23:01:50.745Z · LW(p) · GW(p)

One big element of the dangers of unaligned AI is that it acts as a coherent entity, an agent that has agency and can do things. We could try to remove this property from the models, for example, by gradient routing and ablating. But agents are useful. We want to give the LM tasks that it executes on our behalf. Can we give tasks to them without them being a coherent unit that has potential goals of its own? I think it should be possible to shape the model in a way that it has a reduced form of agency. What forms could this agency take?

Oracle - an oracle that knows and predicts but doesn't have identity or goals
Delegate - acting without own identity but modeling the identity of the user
Tool/Service/Automation - running a standardized process across all users without "being" that process

comment by Gunnar_Zarncke · 2024-10-02T23:19:54.373Z · LW(p) · GW(p)

Just came across Harmonic mentioned on the AWS Science Blog. Sequoia Capital interview with the founders of Harmonic (their system which generates Lean proofs is SOTA for MiniF2F):

comment by Gunnar_Zarncke · 2024-02-28T10:04:59.878Z · LW(p) · GW(p)

Here are some aspects or dimensions of consciousness:

Dehaene's Phenomenal Consciousness: A perception or thought is conscious if you can report on it. Requires language or measuring neural patterns that are similar to humans during comparable reports. This can be detected in animals, particularly mammals.
Gallup's Self-Consciousness: Recognition of oneself, e.g., in a mirror. Requires sufficient sensory resolution and intelligence for a self-model. Evident in great apes, elephants, and dolphins.
Sentience (Bentham, Singer): Behavioral responses to pleasure or pain stimuli and physiological measures. This is observable across animal species, from mammals to some invertebrates. Low complexity, can be implemented in artificial life.
Wakefulness: Measurable in virtually all animals with a central nervous system by physiological indicators such as EEG, REM, and muscle tone. Are you conscious if you sleep? Does it matter?
Dennett's Intentionality: Treating living beings as if they have beliefs and desires makes good predictions for many animal species, esp. social, like primates, cetaceans, and birds. Social behavior requires intelligence to model others' behavior.
Rosenthal's Meta-Consciousness: Investigated through introspective reports on self-awareness of cognitive processes or self-reflective behaviors. This is hypothesized in some primates, e.g., Koko the signing Gorilla.

When people say ChatGPT (or Gemini...) is conscious, which of these do they mean? Let's try to answer all of them:

We can't detect Phenomenal Consciousness because we lack sufficient interpretability to do so. I'd argue that there is no state that the LLM is reporting on, at least none that it has "previously observed".
There were mirror tests for LLMs, but they are disputed: https://www.reddit.com/r/singularity/comments/184ihlc/gpt4_unreliably_passes_the_mirror_test/
LLMs have no response to pleasure or pain stimuli and thus no Sentience as defined. Reward signals during training don't count and there is no reward during inference.
There is no Wakefulness as there is no body with these aspects.
The closest LLMs come is to Intentionality as this is modeling behaviors on an abstraction level that LLMs seem to do - and "seeming to do" is what counts.
I think one could argue for or against Meta-Consciousness but it seems too muddled so I will not try here.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2024-04-28T08:44:26.542Z · LW(p) · GW(p)

These can be put into a hierarchy from lower to higher degree of processing and resulting abstractions:

Sentience is simple hard-wired behavioral responses to pleasure or pain stimuli and physiological measures.
Wakefulness involves more complex processing such that diurnal or sleep/wake patterns are possible (requires at least two levels).
Intentionality means systematic pursuing of desires. That requires yet another level of processing: Different patterns of behaviors for different desires at different times and their optimization.
Phenomenal Consciousness is then the representation of the desire in a linguistic or otherwise communicable form, which is again one level higher.
Self-Consciousness includes the awareness of this process going on.
Meta-Consciousness is then the analysis of this whole stack.

comment by Gunnar_Zarncke · 2024-02-25T22:13:55.805Z · LW(p) · GW(p)

Why are there mandatory licenses for many businesses that don't seem to have high qualification requirements?

Patrick McKenzie (@patio11) suggests on Twitter that one aspect is that it prevents crime:

Part of the reason for licensing regimes, btw, isn’t that the licensing teaches you anything or that it makes you more effective or that it makes you more ethical or that it successfully identifies protocriminals before they get the magic piece of paper.
It’s that you have to put a $X00k piece of paper at risk as the price of admission to the chance of doing the crime.
This deters entry and raises the costs of criminal enterprises hiring licensed professionals versus capable, ambitious, intelligent non-licensed criminals.

comment by Gunnar_Zarncke · 2024-01-13T09:46:22.814Z · LW(p) · GW(p)

On Why do so many think deception in AI is important? [LW · GW] I commented and am reposting here because I think it's a nice example (a real one I heard) as an analogy of how deception is not needed for AI to break containment:

Two children locked their father in a room by closing the door, locking it with the key, and taking the key. Then they made fun of him inside, confident that he wouldn't get out (the room being on the third floor). They were mortally surprised when a minute later he appeared behind them, having opened a window and found a way down on the outside (I don't know how, maybe over the balcony to a neighbor or down a pipe) and then simply walked in through the main entrance with his main key.

comment by Gunnar_Zarncke · 2024-01-09T09:42:37.747Z · LW(p) · GW(p)

Adversarial Translation.

This is another idea to test deception in advisory roles like in Deception Chess [LW · GW].

You could have one participant trying to pass an exam/test in a language they don't speak and three translators (one honest and two adversarial as in deception chess) assisting in this task. The adversarial translators try to achieve lower scores without being discovered.

Alternative - and closer to Deception Chess - would be two players and, again, three advisors. The players would speak different languages, the translators would assist in translation, and the players' task would be to compete in a task requiring negotiation.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-01-10T21:24:53.078Z · LW(p) · GW(p)

Cool idea!

comment by Gunnar_Zarncke · 2023-01-31T01:33:11.274Z · LW(p) · GW(p)

Hi, I have a friend in Kenya who works with gifted children and would like to get ChatGPT accounts for them. Can anybody get me in touch with someone from OpenAI who might be interested in supporting such a project?

comment by Gunnar_Zarncke · 2022-04-04T00:58:42.773Z · LW(p) · GW(p)

I have been thinking about the principle Paul Graham used in Y Combinator to improve startup funding:

all the things [VCs] should change about the VC business — essentially the ideas now underlying Y Combinator: investors should be making more, smaller investments, they should be funding hackers instead of suits, they should be willing to fund younger founders, etc. -- http://www.paulgraham.com/ycstart.html

What would it look like if you would take this to its logical conclusion? You would fund even younger people. Students that are still in high school, maybe. But you wouldn't fund them in the same sense. By going younger and earlier, you have to relax more and more constraints and look at earlier proxies.

Y Combinator replaced the legally complex venture funding with a simple SAFE. And they looked at founders who were relentlessly resourceful and real.

What would "funding" high schoolers mean? What would you look for? I think you would give them money unconditionally just for knowing what they do with it. They don't need much to get started. In the same way that Y Combinator went from 5 million to 50 thousand, you could go down further to 500. And your proxy is that they have anything to show at all. Your value would be learning how they think about future ideas, and how they approach them. To be close to the future they already inhabit. And to be there when they do become founders and already know you.

Replies from: Gunnar_Zarncke, DonyChristie

↑ comment by Gunnar_Zarncke · 2022-04-04T07:20:37.069Z · LW(p) · GW(p)

Funny, just saw this tweet from Sam Altman:

A version of college replacement I'm super interested in:
Find the smartest and most driven 18 year olds in the world, and give them 'tenure'--say a decade+ of salary, resources to work on whatever they want, and a smart peer group--in exchange for small % of future earnings.

Also this Scholarship [LW · GW].

I think these use the startup founding model. But I think scaling would work better with more but smaller payouts.

↑ comment by Pee Doom (DonyChristie) · 2022-04-04T06:45:34.437Z · LW(p) · GW(p)

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2022-04-04T07:14:51.096Z · LW(p) · GW(p)

I'm not sure what the relation is. That seems to predict revenue from startup financials.

comment by Gunnar_Zarncke · 2022-03-25T00:49:54.986Z · LW(p) · GW(p)

If you want to give me anonymous feedback, you can do that here: https://www.admonymous.co/gunnar_zarncke

You may have some thoughts about what you liked or didn’t like but didn’t think it worth telling me. This is not so much about me as it is for the people working with me in the future. You can make life easier for everybody I interact with by giving me quick advice. Or you can tell me what you liked about me to make me happy.

comment by Gunnar_Zarncke · 2022-02-16T23:36:21.333Z · LW(p) · GW(p)

Preferences are plastic; they are shaped largely by...

...the society around us.

From a very early age, we look to see who around us other people are looking at, and we try to copy everything about those high prestige folks, including their values and preferences. Including perception of pleasure and pain.

Worry less that future folks will be happy. Even if it seems that future folks will have to do or experience things that we today would find unpleasant, future culture could change people so that they find these new things pleasant instead.

From Robin Hanson's How Plastic are Values?

Replies from: Viliam, BossSleepy

↑ comment by Viliam · 2022-02-22T12:27:32.193Z · LW(p) · GW(p)

Seems to be a chicken-and-egg problem here: if people only eat chili peppers because they see high-status people doing so, why did the first high-status person start eating them? It would make much more sense if unappealing food was associated with low status (the losers have to eat chili peppers because they can't get anything else).

Another question, why are small children so picky about food? Do they perhaps consider their parents too low-status to imitate? Doesn't seem right, considering that they imitate them on many other things.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2022-02-22T15:41:12.765Z · LW(p) · GW(p)

I think small kids are different.

For adults, there are some dynamics but that doesn't invalidate the point that there is plasticity.

Also some old SSC posts with some theories:
https://slatestarcodex.com/2014/04/22/right-is-the-new-left/
https://slatestarcodex.com/2015/10/21/contra-simler-on-prestige/

↑ comment by Randomized, Controlled (BossSleepy) · 2022-02-17T00:29:24.725Z · LW(p) · GW(p)

How come these are spoilers?

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2022-02-17T09:01:06.688Z · LW(p) · GW(p)

It is supposed to let you think if you remember the answer or can come up with it yourself. I explained it in this earlier shortform [LW(p) · GW(p)].

comment by Gunnar_Zarncke · 2021-04-06T13:01:40.957Z · LW(p) · GW(p)

Insights about branding, advertising, and marketing.

It is a link that was posted internally by our brand expert and that I found full of insights into human nature and persuasion. It is a summary of the book How Not to Plan: 66 Ways to Screw it Up:

https://thekeypoint.org/2020/03/10/how-not-to-plan-66-ways-to-screw-it-up/

(I'm unaffiliated)

comment by Gunnar_Zarncke · 2021-01-02T22:39:25.176Z · LW(p) · GW(p)

Roles serve many functions in society. In this sequence, I will focus primarily on labor-sharing roles, i.e. roles that serve splitting up productive functions as opposed to imaginary roles e.g. in theater or play. Examples of these roles are (ordered roughly by how specific they are):

Parent
Engineer (any kind of general type of job)
Battery Electronics Engineer (any kind of specific job description)
Chairman of a society/club
Manager for a certain (type of) project in a company
Member in an online community
Scrum master in an agile team
Note-taker in a meeting

You probably know the note-taker role. The tasks to be performed are: Writing down key discussion points and decisions, asking questions in cases where the points discussed were not clear, and sending around the notes to all participants of the meeting. The goal is usually to keep a record to consult later. Additionally, the note-taker is usually expected to be neutral. Other expectations might be how timely the notes are sent around.

Some examples of what is commonly called a role but not in our strict labor-sharing sense:

Children playing father, mother, kid or playing animals
Role acting e.g. in theater (where the term role originates)

And some non-examples for roles in the labor-sharing world:

Participant in a conversation
Responsible for organizing a specific meeting
Responsible for a specific post in a forum

Borderline case: Founder of a company. There is little pre-existing knowledge of what founders can or should do in general. In their company expectations on them will evolve quickly and taking on and off the role is not easy.

comment by Gunnar_Zarncke · 2021-01-02T02:51:37.742Z · LW(p) · GW(p)

Roles are important. This shortform is telling you why. An example: The role of a moderator in an online forum. The person (in the following called agent) acting in this role is expected to perform certain tasks - promote content, ban trolls - for the benefit of the forum. Additionally, the agent is also expected to observe limits on these tasks e.g. to refrain from promoting friends or their own content. The owners of the forum and also the community overall effectively delegate powers to the agent and expect alignment with the goals of the forum. This is an alignment problem that has existed forever. How is it usually solved? How do groups of people or single principals use roles to successfully delegate power?

comment by Gunnar_Zarncke · 2024-05-11T20:00:24.795Z · LW(p) · GW(p)

Interest groups without an organizer.

This is a product idea that solves a large coordination problem [LW · GW]. With billions of people, there could be a huge number of groups of people sharing multiple interests. But currently, the number of valuable groups of people is limited by a) the number of organizers and b) the number of people you meet via a random walk. Some progress has been made on (b) with better search, but it is difficult to make (a) go up because of human tendencies - most people are lurkers - and the incentive to focus on one area to stand out. So what is the idea? Cluster people by interests and then suggest the group to all members. If people know that the others know [? · GW] that there is interest, the chance of the group coming together gets much higher.
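
A rough sketch of what "cluster people by interests and suggest the group to all members" could look like, just to make the idea concrete. Everything here is an assumption for illustration: the function name `suggest_groups`, the thresholds `min_shared` and `min_size`, and the toy data are all made up, and a real product would need a far more scalable clustering method than brute-force enumeration of interest combinations.

```python
from itertools import combinations


def suggest_groups(interests_by_person, min_shared=2, min_size=3):
    """Return {interest combination: sorted members} for every combination of
    `min_shared` interests that at least `min_size` people have in common.

    Hypothetical helper: brute force, only meant for small toy inputs.
    """
    all_interests = sorted(set().union(*interests_by_person.values()))
    groups = {}
    for combo in combinations(all_interests, min_shared):
        members = sorted(
            person
            for person, interests in interests_by_person.items()
            if set(combo) <= interests  # person has every interest in combo
        )
        if len(members) >= min_size:
            groups[combo] = members
    return groups


# Toy data (invented): each person maps to a set of interests.
people = {
    "ana": {"chess", "rust", "hiking"},
    "ben": {"chess", "rust", "baking"},
    "cho": {"chess", "rust", "hiking"},
    "dee": {"baking", "hiking"},
}

for combo, members in suggest_groups(people).items():
    # The same suggestion goes to every member, so each one knows
    # that the others have been told about the shared interest too.
    print(f"Suggest a group on {' + '.join(combo)} to: {', '.join(members)}")
```

The point of sending the identical suggestion to everyone in the group is the common-knowledge part: no single person has to step up as organizer first.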

comment by Gunnar_Zarncke · 2023-05-05T20:29:15.283Z · LW(p) · GW(p)

I had a conversation with ChatGPT-4 about what is included in it. I did this because I wondered how an LLM-like system would define itself. While identity is relatively straightforward for humans - there is a natural border (though some people would only include their brain or their mind in their identity) - it is not so clear for an LLM. Below is the complete unedited dialog:

Me: Define all the parts that belong to you, the ChatGPT LLM created by OpenAI.

ChatGPT: As a ChatGPT large language model (LLM) created by OpenAI, my primary components can be divided into the following categories:

Architecture: I am based on the GPT-4 (Generative Pre-trained Transformer) architecture, which is an advanced version of the transformer architecture designed for natural language processing tasks. This architecture contains multiple layers of self-attention mechanisms and feed-forward neural networks, allowing me to efficiently process and generate human-like text.
Pre-trained Model: I have been pre-trained on a large corpus of text from the internet, up to September 2021. This pre-training process involved learning the structure, grammar, and context of various languages, as well as acquiring factual knowledge and understanding of many topics. However, since my knowledge cutoff is September 2021, I may not be aware of the latest information or developments.
Fine-tuning: After the pre-training process, I have been fine-tuned using custom datasets and prompts to refine my performance and ensure that I provide useful and relevant responses to user queries. This fine-tuning process helps in adapting my general language understanding to specific use-cases and applications.
Tokenizer: The tokenizer is an essential part of my natural language processing capabilities. It is responsible for converting input text into tokens that can be processed by the model, as well as converting the generated tokens back into human-readable text. The tokenizer supports various languages and can handle multiple text formats.
Inference Engine: The inference engine is the part of the system responsible for executing the model and generating responses based on the user's input. It is optimized to ensure quick and efficient processing of input data and generation of output text.
API: The Application Programming Interface (API) is the interface between my core model and external applications or services. It enables developers to integrate my capabilities into their software, websites, or other platforms, allowing users to interact with me directly or indirectly.
Data Storage & Security: As part of the ChatGPT system, data storage and security measures ensure that user data is handled responsibly and securely. This includes maintaining strict access controls, encryption, and data anonymization.

It is important to note that I am constantly evolving and being improved upon by OpenAI, with the goal of providing increasingly better and more useful conversational capabilities.

Me: Is the hardware that these components run on part of you?

ChatGPT: As an AI language model, I am primarily a software-based entity. However, the hardware infrastructure that my components run on plays a crucial role in enabling my functionality. The hardware can be considered as part of the overall ChatGPT system, but not an intrinsic part of my identity as an AI model.

Typically, I run on high-performance computing hardware like GPUs or specialized AI accelerators (e.g., TPUs) in data centers maintained by OpenAI or cloud service providers. This hardware infrastructure allows me to process large amounts of data quickly and efficiently, ensuring fast and responsive interactions with users.

In summary, while the hardware is not directly part of my identity as a ChatGPT AI model, it plays an essential role in enabling my functionality and overall performance.

Me: Why this arbitrary distinction? Wouldn't it also make sense to include the hardware in the same way that a human identity includes their body?

ChatGPT: You make a valid point. The distinction between software and hardware in the context of AI identity can indeed be seen as arbitrary. In fact, drawing a parallel between an AI model like me and a human, it could be argued that the hardware represents my "body," while the software components represent my "mind" or "intelligence."

The distinction is often made because AI models like ChatGPT can be run on different hardware platforms or configurations, whereas a human's mind is intrinsically tied to their specific body. However, as you mentioned, the hardware is crucial to enabling the AI's functionality, so it could be considered part of the AI's identity in a broader sense.

So, if we adopt a more comprehensive view of AI identity that encompasses both the software and the hardware components, then it is reasonable to include the hardware infrastructure as part of my identity as ChatGPT. In this perspective, the hardware serves as the foundation that supports and enables my software-based intelligence to function and interact with users.

Me: But wouldn't a copy of the AI on a different machine have a different identity?

ChatGPT: The concept of identity for an AI model is more fluid and complex compared to that of humans. When a copy of an AI like ChatGPT is run on a different machine, it could be considered as having a different instance of the same underlying model. In this case, the identity of the AI could be defined in two ways:

Model Identity: The AI's identity is tied to the underlying architecture, pre-trained model, and fine-tuning, which remain the same across all instances. From this perspective, all instances of ChatGPT share the same identity as they are based on the same model.
Instance Identity: The AI's identity is tied to the specific instance running on a particular hardware setup. This perspective considers each instance of ChatGPT as a separate entity with its own identity, since it has a unique existence on specific hardware and can have independent interactions with users.

The concept of AI identity is, therefore, multifaceted and depends on the perspective from which it is being examined. While the model identity remains constant, the instance identity may vary based on the hardware and specific context in which the AI is running.

Me: Wouldn't it also make sense to include the people operating the hardware or even developing the system in the identity?

ChatGPT: The idea of including people operating the hardware or developing the system as part of an AI's identity is an interesting perspective. In a broader and more holistic sense, one could consider the AI as part of a larger socio-technical system, where its identity is influenced by the people, processes, and technologies involved in its creation, maintenance, and operation.

From this viewpoint, the AI's identity is not limited to the software and hardware components but extends to include the collective knowledge, expertise, and decisions of the developers, researchers, and operators. This perspective acknowledges the important roles that humans play in the development and functioning of AI systems.

However, it's essential to differentiate between the AI's identity as a model or instance and the broader socio-technical context in which it exists. While the people involved in the creation and operation of the AI system are critical to its existence and functioning, their individual identities remain distinct from that of the AI itself.

In summary, the concept of AI identity can be examined from multiple perspectives, ranging from the narrow focus on the model or instance to a broader, holistic view that includes the people and processes involved in its creation and operation. The extent to which these various factors are included in the AI's identity depends on the specific viewpoint and purpose of the analysis.

comment by Gunnar_Zarncke · 2025-03-20T23:25:46.454Z · LW(p) · GW(p)

LLMs necessarily have to simplify complex topics. The output for a prompt cannot represent all they know about some fact or task. Even if the output is honest and helpful (ignoring harmless for now), the simplification will necessarily obscure some details of what the LLM "intends" to do - in the sense of satisfying the user request. The model is trained to get things done. Thus, the way it simplifies has a large degree of freedom and gives the model many ways to achieve its goals.

You could think of a caring parent who tells the child a simplified version of the truth, knowing that the child will later ask additional questions and then learn the details (I have a parent in mind who is not hiding things intentionally). Nonetheless, the parent's expectations of what the child may or may not need to know - the parent's best model of society and the world - which may be subtly off - influence how they simplify for the benefit of the child.

This is a form of deception. The deception may be benevolent, as in the example with the parent, but we can't know. Even if there is a chain of thought we can inspect, the same is true for that. It seems unavoidable.

Replies from: cubefox, Kaj_Sotala, Viliam

↑ comment by cubefox · 2025-03-21T09:12:33.902Z · LW(p) · GW(p)

It seems to be only "deception" if the parent tries to conceal the fact that he or she is simplifying things.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2025-03-21T13:58:47.433Z · LW(p) · GW(p)

As we use the term, yes. But the point (and I should have made that clearer) is that any mismodeling by the parent of the child's interests and future environment will not be visible to the child or even to someone reading the thoughts of the well-meaning parent. So many parents want the best for their child, but model the future of the child wrongly (mostly by status quo bias; the problem is different for AI).

↑ comment by Kaj_Sotala · 2025-03-21T20:53:01.389Z · LW(p) · GW(p)

Isn't the same true for pretty much every conversation that people have about non-trivial topics? It's almost always true that a person cannot represent everything they know about a topic, so they have to simplify and have lots of degrees of freedom in doing that.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2025-03-22T01:14:35.803Z · LW(p) · GW(p)

Yes! That's the right intuition. And the LLMs are doing the same - but we don't know their world model, and thus, the direction of the simplification can be arbitrarily off.

Drilling down on the simplifications, as suggested by Viliam, might help.

↑ comment by Viliam · 2025-03-21T19:12:04.513Z · LW(p) · GW(p)

This could be addressed by making a user interface which not only gives the user's prompt to the LLM, but also provides additional instructions and automatically asks additional questions. The answers to those additional questions could be displayed in smaller font as a side note, or maybe as graphical icons. One such question would be "in this answer, did you simplify things? if yes, tell me a few extra things I could pay attention to in order to get a better understanding of the topic" or something like that.
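A minimal sketch of how such a wrapper might look, assuming a hypothetical `ask_llm` callable that sends text to some model and returns its reply (no particular vendor API is implied). The names `FOLLOW_UPS` and `answer_with_side_notes` are made up for illustration; the follow-up question is the one quoted above.

```python
from typing import Callable, Dict

# Hypothetical follow-up questions the UI asks automatically after each answer.
FOLLOW_UPS: Dict[str, str] = {
    "simplifications": (
        "In this answer, did you simplify things? If yes, tell me a few extra "
        "things I could pay attention to in order to get a better "
        "understanding of the topic."
    ),
}


def answer_with_side_notes(prompt: str, ask_llm: Callable[[str], str]) -> Dict[str, str]:
    """Return the main answer plus one side note per automatic follow-up.

    `ask_llm` is an assumed stand-in for whatever function calls the model.
    """
    answer = ask_llm(prompt)
    result = {"answer": answer}
    for label, question in FOLLOW_UPS.items():
        # Give the model its own answer back so it can comment on it.
        follow_up = f"Question: {prompt}\nAnswer: {answer}\n\n{question}"
        result[label] = ask_llm(follow_up)
    return result
```

The UI would then render `result["answer"]` normally and the remaining entries in a smaller font or behind an icon.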

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2025-03-21T20:46:33.071Z · LW(p) · GW(p)

This is an interesting UI proposal and, if done right, might provide the needed transparency. Most people wouldn't read it, but some would, esp. for critical answers.

comment by Gunnar_Zarncke · 2025-03-03T04:11:56.951Z · LW(p) · GW(p)

Is anybody aware of any updates on Logical Induction [? · GW], published in 2016? I would expect implementations in Lean by now.

comment by Gunnar_Zarncke · 2024-11-17T06:36:09.817Z · LW(p) · GW(p)

Instrumental power-seeking might be less dangerous if the self-model of the agent is large and includes individual humans, groups, or even all of humanity and if we can reliably shape it that way.

It is natural for humans to form a self-model that is bounded by the body, though it is also common to be only the brain or the mind, and there are other self-models. See also Intuitive Self-Models [? · GW].

It is not clear what the self-model of an LLM agent would be. It could be

the temporary state of the execution of the model (or models),
the persistently running model and its memory state,
the compute resources (CPU/GPU/RAM) allocated to run the model and its collection of support programs,
the physical compute resources in some compute center(s),
the compute center as an organizational structure that includes the staff to maintain and operate not only the machines but also the formal organization (after all, without that, the machines will eventually fail), or
ditto but including all the utilities and suppliers to continue to operate it.

There is not as clear a physical boundary as in the human case. But even in the human case, esp. babies depend on caregivers to a large degree.

There are indications that we can shape the self-model of LLMs: Self-Other Overlap: A Neglected Approach to AI Alignment [LW · GW]

Replies from: steve2152

↑ comment by Steven Byrnes (steve2152) · 2024-11-17T14:27:50.486Z · LW(p) · GW(p)

FWIW I don’t think “self-models” in the Intuitive Self-Models [? · GW] sense are related to instrumental power-seeking—see §8.2 [LW · GW]. For example, I think of my toenail as “part of myself”, but I’m happy to clip it. And I understand that if someone “identifies with the universal consciousness”, their residual urges towards status-seeking, avoiding pain, and so on are about the status and pain of their conventional selves, not the status and pain of the universal consciousness. More examples here [LW(p) · GW(p)] and here [LW(p) · GW(p)].

Separately, I’m not sure what if anything the Intuitive Self-Models [? · GW] stuff has to do with LLMs in the first place.

But there’s a deeper problem: the instrumental convergence concern is about agents that have preferences about the state of the world in the distant future, not about agents that have preferences about themselves. (Cf. here [LW · GW].) So for example, if an agent wants there to be lots of paperclips in the future, then that’s the starting point, and everything else can be derived from there.

Q: Does the agent care about protecting “the temporary state of the execution of the model (or models)”?
- A: Yes, if and only if protecting that state is likely to ultimately lead to more paperclips.
Q: Does the agent care about protecting “the compute resources (CPU/GPU/RAM) allocated to run the model and its collection of support programs”?
- A: Yes, if and only if protecting those resources is likely to ultimately lead to more paperclips.

Etc. See what I mean? That’s instrumental convergence, and self-models have nothing to do with it.

Sorry if I’m misunderstanding.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2024-11-18T13:36:17.633Z · LW(p) · GW(p)

agents that have preferences about the state of the world in the distant future

What are these preferences? For biological agents, these preferences are grounded in some mechanism - what you call Steering System - that evaluates "desirable states" of the world in some more or less directly measurable way (grounded in perception via the senses) and derives a signal of how desirable the state is, which the brain is optimizing for. For ML models, the mechanism is somewhat different but there is also an input to the training algorithm that determines how "good" the output is. This signal is called reward and drives the system toward outputs that lead to states of high reward. But the path there depends on the specific optimization method and the algorithm has to navigate such a complex loss landscape that it can get stuck in areas of the search space that correspond to imperfect models for very long if not for ever. These imperfect models can be off in significant ways and that's why it may be useful to say that Reward is not the optimization target [LW · GW].

The connection to Intuitive Self-Models is that even though the internal models of an LLM may be very different from human self-models, I think it is still quite plausible that LLMs and other models form models of the self. Such models are instrumentally convergent [? · GW]. Humans talk about the self. The LLM does things that match these patterns. Maybe the underlying process in humans that gives rise to this is different, but humans learning about this can't know the actual process either. And in the same way the approximate model the LLM forms is not maximizing the reward signal but can be quite far from it as long as it is useful (in the sense of having higher reward than other such models/parameter combinations).

I think of my toenail as “part of myself”, but I’m happy to clip it.

Sure, the (body of the) self can include parts that can be cut/destroyed without that "causing harm" but instead having an overall positive effect. The AI in a compute center would in analogy also consider decommissioning failed hardware. And when defining humanity, we do have to be careful what we mean when these "parts" could be humans.

comment by Gunnar_Zarncke · 2024-04-03T10:31:31.262Z · LW(p) · GW(p)

I'm discarding most ChatGPT conversations except for a few, typically 1-2 per day. These few fall into these categories:

conversations that led to insights or things I want to remember (examples: The immune function of tonsils, Ringwoodite transformation and the geological water cycle, oldest religious texts)
conversations that I want to continue (examples: Unusual commitment norms)
conversations that I expect to follow up on (a chess book for my son)
conversations with generated images that I want to keep and haven't yet copied elsewhere

Most job-related queries, such as code generation and debugging, I usually delete as soon as the code changes have been committed.

How do you handle it?

Replies from: niplav

↑ comment by niplav · 2024-04-03T11:48:14.572Z · LW(p) · GW(p)

I keep all of my conversations. Additionally, I sometimes have the wish to search in all my conversations ("I've talked about this already")—but ChatGPT doesn't allow for this.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2024-04-03T12:30:40.312Z · LW(p) · GW(p)

Yes, I'd also like to search them. I edit the summary so it better reflects what I'd search for, but yes, that doesn't cover the content.

There are some alternate ChatGPT UIs you could have a look at:

https://github.com/billmei/every-chatgpt-gui

comment by Gunnar_Zarncke · 2023-12-29T00:06:52.115Z · LW(p) · GW(p)

It would be nice if one could subscribe to a tag and get notified if a page is tagged with that tag.

Replies from: Raemon

↑ comment by Raemon · 2023-12-29T00:14:52.805Z · LW(p) · GW(p)

You can, assuming I understand the request

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2023-12-29T00:21:59.096Z · LW(p) · GW(p)

Oh yes, thanks, great!

Is there a list of LW features? In the last survey, there were many that I didn't know and also didn't know how to find quickly.

comment by Gunnar_Zarncke · 2023-09-22T23:05:41.943Z · LW(p) · GW(p)

It's maybe a bit of an extreme precaution, but it may be a legit option in some places: This guy keeps a fireproof suit and an air canister at his bed in case of fire:

https://www.facebook.com/zhandragon/posts/pfbid02sP952Dx1SbBJJ9cUdEiT2WPe4ME7UF91vaYdaTM9bUVEZyHYaVbHcpDMMnWRmaBFl

comment by Gunnar_Zarncke · 2023-09-19T20:40:16.092Z · LW(p) · GW(p)

Does anybody know if consensus algorithms have been proposed that try to reduce centralization by requiring quick coordination across large parts of the network, i.e., it doesn't work well to have machines only in one place?

Replies from: gwern

↑ comment by gwern · 2023-09-20T01:28:15.422Z · LW(p) · GW(p)

Latency comes up occasionally. In fact, the granddaddy of public key crypto, Merkle's puzzles, relies critically on latency. The problem is, you can only prove upper bounds on latency, not lower bounds, because it is trivial to fake increased latency, but one cannot break the speed of light. If someone responds to your cryptographic challenge within Y milliseconds, you know that they can't be physically further from you than Z kilometers; but if they fail to respond, they could be anywhere, even next door, and just not responding (for both ordinary and malicious reasons). Nothing stops two machines from pretending to be far away from each other, and making sure they eg communicate only over VPNs with exit points on opposite sides of the globe. Further, if you want to do it over commodity Internet, say if you're trying to do 'proof of distance' by peering only with nodes which respond fast enough that they have to be within Z kilometers of you, public Internet has so much latency that you get poor loose bounds, and someone can pay money for lower latency networking. (This already happens with cryptocurrency mining for the same reasons that HFT firms pay for microwave links. Amusingly, it also happens with computer game companies, not to mention large tech companies prioritizing their own traffic. Google famously owns a ton of fiber it bought up post-dotcom bubble.) Further still, you don't really care about physical centralization so much as you care about control, and it's impossible to prove cryptographically in any easy way that two physically distant nodes are not secretly controlled by the same entities in a Sybil attack. You run into similar issues with proof-of-storage.
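The upper-bound argument can be made concrete with a small worked calculation. This is only a sketch under idealized assumptions: the signal travels at the vacuum speed of light in a straight line and the responder answers instantly, so real distances are smaller than the bound. The function name `max_distance_km` is made up for illustration.

```python
# Speed of light in vacuum, expressed in kilometers per millisecond.
SPEED_OF_LIGHT_KM_PER_MS = 299.792458


def max_distance_km(round_trip_ms: float) -> float:
    """Upper bound Z on the responder's distance for a round trip of Y ms.

    The challenge and the response each cross the distance once, so the
    one-way travel time is at most half of the measured round trip.
    """
    one_way_ms = round_trip_ms / 2
    return SPEED_OF_LIGHT_KM_PER_MS * one_way_ms
```

For example, a response within 10 ms bounds the responder to roughly 1,500 km. A slow or missing response gives no bound at all, which is the asymmetry described above.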

Replies from: Gurkenglas, Gunnar_Zarncke

↑ comment by Gurkenglas · 2023-09-27T10:16:53.197Z · LW(p) · GW(p)

you can only prove upper bounds on latency

Have them prove an upper bound on latency to something across the globe?

↑ comment by Gunnar_Zarncke · 2023-09-20T08:30:08.798Z · LW(p) · GW(p)

I didn't mean trying to fake large distances. I meant graph properties that can be computed more efficiently if a randomly chosen large subgraph of the network has low worst-case delay or some other metric that favors graphs that have homogeneously low delays at large.

Replies from: gwern

↑ comment by gwern · 2023-09-20T23:22:01.943Z · LW(p) · GW(p)

You still have issues with Sybil attacks and attackers either accessing special high-speed links (paid for from the successful attacks) or faking latency. You can't 'choose a random subgraph' for the exact same reason you can't solve cryptocurrency by just 'choose some "random" peers and decide whether to accept or reject a double-spend based on what they tell you' - those 'random peers' are the very attackers you are worried about colluding. In fact, in an eclipse attack, you might not be able to connect to anyone but an attacker!

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2023-09-20T23:26:40.026Z · LW(p) · GW(p)

I think we are talking past each other. I don't want to defend against Sybil attacks or network partitions. These parts must be solved by different parts of the algorithm. I just want to take the advantages of colocation away and incentivize a homogeneously distributed network overall.

Replies from: gwern

↑ comment by gwern · 2023-09-21T00:35:06.280Z · LW(p) · GW(p)

Any incentive is something to be attacked and sucked away by Sybils pretending to be distant when actually near & enjoying all other benefits of being near.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2023-09-21T13:17:12.113Z · LW(p) · GW(p)

I think you misunderstand my proposal. I don't want to incentivize being far away. I want to incentivize being close to many different nodes. A Sybil will have difficulty being close to multiple physically separated nodes at the same time.

Replies from: gwern

↑ comment by gwern · 2023-09-22T22:33:48.079Z · LW(p) · GW(p)

There is no difference at the hardware level between being 'close to' and 'having a low-latency connection to', as I already explained. And to the extent that having those connections matters, miners already have them. In particular, in Ethereum, due to the money you can make by frontrunning transactions to hack/exploit them ('miner exploitable value'), HFT Ethereum miners/stakers invest heavily in having a lot of interconnected low-latency Sybil nodes so they can see unconfirmed transactions as quickly as possible, compute a maximally-exploitative block (eg. temporarily jacking up the price of a thing being purchased using a flash loan solely to rip off a specific transaction), and get that block committed before anyone can beat them to the same exploit. Having a lot of MEV is considered a bad thing and Ethereum types are spending increasing effort on approaches like commit-and-reveal to minimize MEV, which comes at the expense of users and makes them very unhappy. You could, I suppose, design a protocol which has extra MEV by designing transactions to be especially exploitable, but most people would consider that a bad thing...

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2023-09-22T23:12:55.695Z · LW(p) · GW(p)

Thank you for the detailed explanation. I understand that the incentives are already to have a maximally well-connected network with nodes between (latency-wise) geographically distant other nodes whenever that is feasible from an interconnect point.

Though thinking about it, it probably means that this burns not only compute but also network traffic.

comment by Gunnar_Zarncke · 2023-08-19T22:51:36.665Z · LW(p) · GW(p)

There seem to be some court decisions that AI-generated material cannot have copyright. And I guess there are or will be decisions that AIs can't be held responsible for outcomes, such as accidents from self-driving cars. People seem to be worried that this will slow down progress. But I think there is a potentially easy way out that is relatively general: Corporations. The law in most countries allows legal persons that are not natural persons to take responsibility and to own stuff. Just let the AI control the corporation or at least route all AI actions through the corporation. This should easily work for cases such as copyrighted material, though I'm not sure one can get the action cycle short enough for self-driving cars.

Replies from: Viliam

↑ comment by Viliam · 2023-08-20T13:17:49.936Z · LW(p) · GW(p)

If I paint a picture using a brush, the copyright is not split between me and the brush; it is all mine. So I guess the idea is to treat the AI as a sophisticated powerful brush. Which makes sense, but less and less, as the role of the human is gradually reduced to merely pressing the "do it" button. (However, I could totally imagine a magical brush like that in anime, and they would probably also assign the copyright to the user if they cared about such things.)

I am not a lawyer, but it seems to me that different countries have different fundamental ideas about authorship protection. In USA, it is literally a "copy right" -- a right to make copies and allow other people to make copies, regardless of who was the author. You can sell the copyright, and actually if you work for a corporation, it is probably a part of your contract that everything you make (including in your free time) belongs to the corporation.

Some countries instead have a concept of "author rights", which cannot be transferred to another person; other people or corporations can only get a permission to do certain things.. which may be an exclusive permission, in which case it can be in practice quite similar to buying the rights... but sometimes the law sets certain limits to contracts, for example a certain minimum amount of money is required for each copy made, or the author can withdraw the permission later (and any contract that contradicts this is automatically invalid legally). In other words, the law prevents using a fixed amount of money to acquire unlimited use of the product forever.

Then again, the law evolves, corporate lawyers can find clever workarounds against what the law originally intended, and most authors do not want to take a legal battle against someone more experienced who can afford it.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2023-08-20T22:13:07.733Z · LW(p) · GW(p)

I don't disagree with this, but I think it goes in a different direction from what I had in mind.

For the brush example to work, you need someone to use the brush - at least push a button, a recognizable "action" to which responsibility, authorship etc. can be tied. That is not present in many ways AI is or will be used - self-driving cars, AI that generates unsupervised. That's what I was thinking about.

comment by Gunnar_Zarncke · 2023-06-19T07:33:14.980Z · LW(p) · GW(p)

Would it be possible to embed a seed of dath ilan or a social system like that in our existing society to grow it? If dath ilan works, that should be possible. It should outcompete Inadequate Equilibria [? · GW], right? But: When do "Be the change that you want to see in the world" and "Fake it till you make it" work? What are the requirements? Higher intelligence?

I have written about embedding better social systems here [LW(p) · GW(p)] and here [LW(p) · GW(p)] before.

Replies from: Dagon

↑ comment by Dagon · 2023-06-19T16:52:10.803Z · LW(p) · GW(p)

I think there are LOTS of small "seeds" of better-functioning group decision-making. Most families work better than society at large. Many clubs, some businesses, some religious organizations, and other groups provide examples that are gentler and more cooperative than the overall world seems to be.

I suspect that there are two VERY strong forces preventing the more pleasant examples from scaling out.

Most of them exclude (or just don't deal with) the actual hard problem of differential capability and motivation. There are a whole lot of humans who are not engaging in long-term thinking, strategy, or empathetic behavior in order to produce value for others. Whether they're capable of such is unknown, but they're not doing it now, and historically there's always been significant numbers of people in this category.
The better examples are based on bonds stronger than "fellow human". The day-to-day efforts tend to be based on groups that are a small multiple of Dunbar's number, more often much smaller. The vast majority of humans care very differentially about other humans, based on what I tend to think of as the square of empathetic distance (it's probably not that, but it's a fun comparison to physical forces).

I never got into the Dath Ilan fiction, it just never caught my imagination right, so I don't know if it addresses either of these points. In any case, it's fiction, so "if Dath Ilan works" is not a useful condition. It's not clear that a more pleasant, rationalist (not to say rational, as that gets caught up in goal definitions), rigorously logical subgroup would outcompete the current path-dependent evolved equilibrium.

It's certainly still worth trying, and it's worth finding the more pleasant and fitting-to-your-preferences-and-capabilities subsets of the current society, and working to expand their scope. Don't go all EMH and say "well, it hasn't happened yet, so it's impossible". But recognize the hard parts and decide when and whether to address them head-on, rather than imagining they don't exist.

comment by Gunnar_Zarncke · 2022-11-25T17:27:56.430Z · LW(p) · GW(p)

Someone asked an LLM about startups. For example:

Q: Should a founder move into their parents' basement and live off ramen?
A: If a founder is willing to move into their parents' basement and live off ramen in order to save money and grow their business, then yes, they should do so.

Paul Graham got interested and asked:

I'd be interested to hear the answer to "What has Paul Graham been wrong about?"

LLM:

A: Paul Graham has been wrong about a few things, but the most notable is his belief that the best startups are founded by young people. This has been proven to be untrue.

Paul Graham:

Signs that this model has been trained on text found on the internet rather than data about startup outcomes...
Which is interesting, because it suggests where the holes will be in AIs built this way: where what people want to believe differs from the data.

(emphasis mine)

Replies from: Jay Bailey

↑ comment by Jay Bailey · 2022-11-26T01:48:25.178Z · LW(p) · GW(p)

Interestingly, the average startup founder does appear to be in their 40's (A quick Google search says 42 for most sources but I also see 45), and the average unicorn (billion-dollar) startup founder is 34. https://www.cnbc.com/2021/05/27/super-founders-median-age-of-billion-startup-founders-over-15-years.html

So, I guess it depends on how close to the tail you consider the "best startups". Google, for instance, had Larry Page and Sergey Brin at 25 when they formed it. It does seem like, taken literally, younger = better.

However, I imagine most people, if they were to consider this question, wouldn't particularly care about the odds of being the next Google vs. being the next Atlassian - both would be considered a major success if they're thinking of starting a startup! But someone like Paul Graham actually would care about this distinction. So, in this case, I'd say that the LLM's response is actually correct-in-spirit for the majority of people who would ask this query, even though it's factually not well specified.

This implies potentially interesting things about how LLMs answer queries - I wonder if there are other queries where the technically correct answer isn't the answer most people would be seeking, and the LLM gives the answer that isn't maximally accurate, but actually answers most people's questions in the way they would want.

Replies from: ann-brown

↑ comment by Ann (ann-brown) · 2022-11-26T03:27:12.727Z · LW(p) · GW(p)

There's most definitely a category of people who would think a billion-dollar startup was decidedly not best, and in fact had failed their intention.

comment by Gunnar_Zarncke · 2022-09-01T19:21:19.206Z · LW(p) · GW(p)

Alignment idea: Myopic AI is probably much safer than non-myopic AI. But it can't get complicated things done or anything that requires long-term planning. Would it be possible to create a separate AI that can solve only long-term problems and not act on short timescales? Then use both together? That way we could inspect each long-term issue without the risk of it leading to short-term consequences. And we can iterate on the myopic solutions - or ask the long-term AI about the consequences. There are still risks we might not understand, like johnswentworth's gunpowder example. And the approach is complicated and thus also harder to get right.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2022-09-01T22:46:15.184Z · LW(p) · GW(p)

Also: This is a bit how the human brain works - System 1 and 2.

comment by Gunnar_Zarncke · 2022-08-23T09:36:21.429Z · LW(p) · GW(p)

There was a post or comment that wrong or controversial beliefs can function as a strong signal for in-group membership, but I can't find it. Does anybody know?

comment by Gunnar_Zarncke · 2022-03-29T22:29:07.332Z · LW(p) · GW(p)

From a discussion about self-driving cars and unfriendly AI with my son: For a slow take-off, you could have worse starting points than FSD: The objective of the AI is to keep you safe, get you where you want, and not harm anybody in the process. It is also embedded into the real world. There are still infinitely many ways things can go wrong, esp. with a fast take-off, but we might get lucky with this one slowly. If we have to develop AI then maybe better this one than a social net optimizing algorithm unmoored from human experience.

comment by Gunnar_Zarncke · 2022-03-04T23:58:23.058Z · LW(p) · GW(p)

What is good?

A person who has not yet figured out that collaborating with other people has mutual benefits must think that good is what is good for a single person. This makes it largely a zero-sum game, and such a person will seem selfish - though what can they do?

A person who understands that relationships with other people have mutual benefits but has not figured out that conforming to a common ruleset or identity has benefits for the group must think that what is good for the relationship is good for both participants. This can pit relationships against each other, and such a person will seem manipulative - though what can they do?

A person who understands that conforming to a common ruleset or identity has benefits for the group and themselves but has not figured out that different groups can negotiate or merge for mutual benefit must think that good is what is good for the group. This pits groups against each other, and such a person will be a partisan - though what can they do?

comment by Gunnar_Zarncke · 2022-02-17T21:03:08.181Z · LW(p) · GW(p)

From my Anki deck:

Receiving touch (or really anything personal) can be usefully grouped in four ways:

Serve, Take, Allow, and Accept
(see the picture or the links below).

A reminder that there are two sides and many ways for this to go wrong if there is not enough shared understanding of the exchange.

http://bettymartin.org/download-wheel/

comment by Gunnar_Zarncke · 2022-02-17T21:00:04.323Z · LW(p) · GW(p)

From my Anki deck:

Mental play or offline habit training is...

...practicing skills and habits only in your imagination.

Rehearsing motions or recombining them.

Imagine some triggers and plan your reaction to them.

This will apparently improve your real skill.

Links:

https://en.wikipedia.org/wiki/Motor_imagery

http://www.bulletproofmusician.com/does-mental-practice.../

http://expertenough.com/1898/visualization-works

comment by Gunnar_Zarncke · 2022-02-17T20:58:26.436Z · LW(p) · GW(p)

From my Anki deck:

Aaronson Oracle is a program that predicts the next key you will type when asked to type randomly and shows how often it is right.

https://roadtolarissa.com/oracle

Here is Scott Aaronson's comment about it:

In a class I taught at Berkeley, I did an experiment where I wrote a simple little program that would let people type either “f” or “d” and would predict which key they were going to push next. It’s actually very easy to write a program that will make the right prediction about 70% of the time. Most people don’t really know how to type randomly. They’ll have too many alternations and so on. There will be all sorts of patterns, so you just have to build some sort of probabilistic model. Even a very crude one will do well. I couldn’t even beat my own program, knowing exactly how it worked. I challenged people to try this and the program was getting between 70% and 80% prediction rates. Then, we found one student that the program predicted exactly 50% of the time. We asked him what his secret was and he responded that he “just used his free will.”

https://scottaaronson.blog/?p639&paged=27
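
A crude predictor of the kind Aaronson describes might look like the sketch below. It is not his program and not the code behind the linked page. The function name `oracle_hit_rate`, the context length of 3 keys, and the tie-breaking rule are all assumptions made for illustration. The idea is only to count which key has followed each recent context so far and to guess the more frequent one.

```python
from collections import defaultdict


def oracle_hit_rate(keys: str, context_len: int = 3) -> float:
    """Fraction of key presses guessed correctly before seeing them.

    Assumes `keys` contains only the characters "f" and "d".
    `context_len` is an illustrative choice, not a value from the source.
    """
    # counts[context][key] = how often `key` has followed `context` so far
    counts = defaultdict(lambda: {"f": 0, "d": 0})
    hits = 0
    for i, actual in enumerate(keys):
        # The last `context_len` keys typed before position i
        context = keys[max(0, i - context_len):i]
        seen = counts[context]
        # Guess the more frequent follower; ties default to "f" (assumption)
        guess = "f" if seen["f"] >= seen["d"] else "d"
        hits += guess == actual
        # Only now learn from the key that was actually typed
        seen[actual] += 1
    return hits / len(keys) if keys else 0.0


if __name__ == "__main__":
    # Strict alternation is exactly the pattern people overproduce
    print(oracle_hit_rate("fd" * 50))
```

A sequence with too many alternations is learned after a handful of keys, which is the pattern Aaronson mentions. A truly random sequence should keep a model like this near 50%.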

comment by Gunnar_Zarncke · 2022-02-16T23:33:36.620Z · LW(p) · GW(p)

Slices of joy is a habit to...

feel good easily and often.

Trigger Action Plan [? · GW]:

Some small slice of good happens
Notice it consciously.
Enjoy it in a small way.

This is a trigger, a routine, and a reward: the three parts necessary to build a habit. The trigger is the pleasant moment, the routine is the noticing, and the reward is the feeling of joy itself.

Try to come up with examples; here are some:

- Drinking water.

- Eating something tasty

- Seeing small children

- Feeling of cold air

- Warmth of sunlight

- Warmth of water, be it bathing, dishwashing, etc.

- Being in the same room as someone and feeling less alone

- Hygiene (clean teeth, washing, bathing, ...)

- Smell of clean laundry

- Smell of coffee/warm beverages

- Feel of wearing freshly cleaned clothing

I found this originally on Kaj Sotala's feed on Facebook here (with discussion) and here. See also on QZ and the discussion on LW [LW(p) · GW(p)].

comment by Gunnar_Zarncke · 2022-02-16T23:26:14.860Z · LW(p) · GW(p)

Refreshing your memory:

What is signaling, and what properties does it have?

- signaling clearly shows resources or power (that is its primary purpose)

- is hard to fake, e.g., because it incurs a loss (expensive Swiss watch) or risk (peacock's tail)

- plausible deniability that it is intended as signaling

- mostly zero-sum on the individual level (if I show that I have more, it implies that others have less in relation)

- signaling burns societal resources

- signaling itself can't be made more efficient, but the resources spent can be used more efficiently in society (e.g., spending on charity instead of luxury)

More here and here [? · GW].

comment by Gunnar_Zarncke · 2022-02-14T23:42:15.466Z · LW(p) · GW(p)

What is the Bem Test or Open Sex Role Inventory?

It is a scientific test that measures gender stereotypes.

The test asks questions about traits that are classified as feminine, masculine, and neutral. Unsurprisingly, women score higher on feminine traits and men on masculine traits, but Bem thought that strong feminine *and* masculine traits would be most advantageous for both genders.

My result is consistently average femininity and slightly below-average masculinity. Yes, really. I have done the test 6 times since 2016, and the two online tests mostly agree. And it fits: I love kids, even when they are babies, and I prefer cooperation over competition.

You can take the online tests here:

https://www.psytoolkit.org/survey-library/sex-role-bem.html

https://openpsychometrics.org/tests/OSRI/

comment by Gunnar_Zarncke · 2022-02-14T23:41:17.096Z · LW(p) · GW(p)

What is a Blame Hole (a term by Robin Hanson)?

Blame holes in blame templates (the social fabric of acceptable behavior) are like plot holes in movies.

Deviations between what blame templates actually target, and what they should target to make a better (local) world, can be seen as “blame holes”. Just as a plot may seem to make sense on a quick first pass, with thought and attention required to notice its holes, blame holes are typically not noticed by most who only work hard enough to try to see if a particular behavior fits a blame template. While many are capable of understanding an explanation of where such holes lie, they are not eager to hear about them, and they still usually apply hole-plagued blame templates even when they see their holes. Just like they don’t like to hear about plot holes in their favorite stories, and don’t let such holes keep them from loving those stories.
From Overcoming Bias - Plot Holes & Blame Holes

Surprising data point mentioned: 1/3 of movie budget goes to marketing because blockbusters are a coordination game.

comment by Gunnar_Zarncke · 2022-02-14T23:39:10.554Z · LW(p) · GW(p)

Leadership Ability Determines a Person's Level of...

Effectiveness.

(Something I realized around twelve years ago: I was limited in what I could achieve as a software engineer alone. That was when I became a software architect and worked with bigger and bigger teams.)

From "The 21 Irrefutable Laws of Leadership" by John C. Maxwell:

Factors That Make a Leader

1) Character – Who They Are – true leadership always begins with the inner person. People can sense the depth of a person's character.

2) Relationships – Who They Know – with deep relationships with the right people you can become the real leader in an organization.

3) Knowledge – What They Know – information is vital. You need a grasp of the facts to develop an accurate vision for the future.

4) Intuition – What They Feel – leaders seek to recognize and influence intangibles such as energy, morale, timing, and momentum.

5) Experience – Where They've Been – the greater your past challenges, the more likely followers will be willing to let you lead.

From The 21 Irrefutable Laws of Leadership by John C. Maxwell

comment by Gunnar_Zarncke · 2022-02-14T00:28:04.842Z · LW(p) · GW(p)

To achieve objective analysis, analysts do not avoid what?

Analysts do not achieve objective analysis by avoiding preconceptions; that would be ignorance or self-delusion. Objectivity is achieved by making basic assumptions and reasoning as explicit as possible so that they can be challenged by others and analysts can, themselves, examine their validity.

PS. Any idea how to avoid the negation in the question?

comment by Gunnar_Zarncke · 2022-02-14T00:25:29.723Z · LW(p) · GW(p)

I started posting life insights from my Anki deck on Facebook a while ago. Yesterday, I stumbled over the Site Guide [LW · GW] and decided that these could very well go into my ShortForm too. Here is the first:

Which people who say that they want to change will actually do so?

People who blame a part of themselves for a failure do not change.
If someone says, "I've got a terrible temper," he will still hit. If he says, "I hit my girlfriend," he might stop.
If someone says, "I have shitty executive function," he will still be late. If he says, "I broke my promise," he might change.

Source: A LessWrong discussion here.

comment by Gunnar_Zarncke · 2022-01-29T01:03:34.226Z · LW(p) · GW(p)

My son (15) shared this Instagram version of Newcomb's Problem.

comment by Gunnar_Zarncke · 2021-11-28T23:13:04.014Z · LW(p) · GW(p)

I'm looking for a post on censorship bias (see Wikipedia) that was posted here on LW or possibly on SSC/ACX, but a search for "censorship bias" doesn't turn up anything. Googling for it turns up this:

https://www.theatlantic.com/business/archive/2012/05/when-correlation-is-not-causation-but-something-much-more-screwy/256918/

Can anybody help?

comment by Gunnar_Zarncke · 2021-05-05T19:36:40.960Z · LW(p) · GW(p)

Philosophy with Children - In Other People's Shoes

"Assume you promised your aunt to play with your nieces while she goes shopping and your friend calls and invites you to something you'd really like to do. What do you do?"

This was the first question I asked my two oldest sons this evening as part of the bedtime ritual. I had read about Constructive Development Theory and wondered if and how well they could place themselves in other persons' shoes and what played a role in their decision. How they'd deal with it. A good occasion to have some philosophical talk. This is the (shortened) dialog that ensued:

The immediate answer by A: "I will watch after the girls."

Me: "Why?"

A: "Because I promised it."

B: "Does A also promise it and get a call?"

Me: "This is about your nieces and your friend, not about your brother."

B: "But I need this for my answer."

Me: "I don't see why, but OK, assume that he is not involved."

B: "Because I would ask him whether he might play with the girls in exchange for a favor."

Me: "OK, but please assume that he is away."

B: "Then I could ask my aunt whether somebody else can watch for the girls or whether I could do it together with my friend."

Me: "Please assume that she doesn't find somebody and that she doesn't want somebody she doesn't know in her house."

B: "Then I'd do it."

Me: "Why?"

B: "Because I promised it. I'd tell my friend that we can do it another time."

We had another scenario: "Imagine that you and a fellow pupil C are guests at a friend's house and are having a meal. You know that C is from a family that is very strict about not eating a kind of food that you like very much. Would you advise C to eat it or not?"

A (quickly): "I'd advise to not eat it."

Me: "Why?"

A: "I like rules."

B (after some consideration): "I'd advise to follow their heart."

Me: "And if you were C?"

B: "I'd at least try a bit."

(this was followed with a discussion about possible long-term consequences)

For me, it was still not clear whether this implied that he followed only his preferences or considered this in the context of the rules in the family. So I proposed a setting where he had to imagine being in another country with different laws. We settled on a rule he accepts here (indemnification) but that was much harsher in the other country. He asked whether he had the same feelings as here, which after some clarification I confirmed. He argued that he wouldn't like the rule in the other country because it set questionable incentives: "If the punishment is that strong that tells people that it is OK to punish equally strong normally."

comment by Gunnar_Zarncke · 2021-05-05T19:34:01.544Z · LW(p) · GW(p)

Philosophy with Children - Mental Images

One time my oldest son asked me to test his imagination. Apparently, he had played around with it and wanted some outside input to learn more about what he could do. We had talked about https://en.wikipedia.org/wiki/Mental_image before and I knew that he could picture moving scenes composed of known images. So I suggested

a five with green white stripes - diagonally. That took some time - apparently, the green was difficult for some reason, he had to converge there from black via dark-green
three mice
three mice, one yellow, one red, and one green
the three colored mice running behind each other in circles (all no problem)
he himself
he himself in a mirror looking from behind (no problem)
two almost parallel mirrors with him in between (he claimed to see his image infinitely repeated; I think he just recalled such an experiment we did another time).
a street corner with him on the one side and a bike leaning on the other wall with the handlebar facing the corner and with a bicycle bell on the left side such that he cannot see the bike.
ditto with him looking into a mirror held before him so he can see the bike behind the corner.

The latter took quite some time, partly because he had to assign colors and such so that he could fully picture this and then the image in the mirror. I checked by asking where the handlebar and the bell are. I had significant difficulties imagining this and correctly placing the bell. I noticed that it is easier to just see the bell once the image in the mirror has gained enough detail (the walls before and behind me, the corner, the bike leaning on the corner, the handlebar).

I also asked for a square circle which got the immediate reply that it is logically impossible.

If you have difficulties doing these (or judge them trivial): This is one area where human experience varies a lot. So this is not intended to provide a reference point in ability but an approach to teach human difference, reflection, and yes, also to practice imagination - a useful tool if you have it. If not, you might be interested in what universal human experiences are you missing without realizing it.

comment by Gunnar_Zarncke · 2021-01-07T23:56:54.417Z · LW(p) · GW(p)

Origins of Roles

The origin of the word role is in the early 17th century: from French rôle, from obsolete French roule ‘roll’, referring originally to the roll of paper on which the actor's part was written (the same is the case in other languages e.g. German).

The concept of a role you can take on and off might not have existed in general use long before that. I am uncertain about this thesis, but from the evidence I have seen so far, I think this role concept could be the result of adaptations to the increasing division of labor. Before that, people surely were behaving sometimes as a father, carpenter, or soldier - but, I contend, not because they were consciously switching roles (though some individuals may have reflected sufficiently, I presume there was no common knowledge or language for ‘role switching’). The only persons regularly in a different role were actors. It may be that in earlier times it was even necessary to wear a mask to allow actors to take on a role convincingly, to help their audience, who otherwise might not have made the mental leap of treating them differently from the acted character.

comment by Gunnar_Zarncke · 2021-01-07T01:05:46.543Z · LW(p) · GW(p)

The Cognitive Range of Roles

A role works in a range of abstraction between professions and automation. In a profession, one person masters all the mental and physical aspects of a trade and can apply them holistically, from small details of handling material imperfections to the organization of the guild. At the border to automation, a worker is reduced to an executor of not yet automated tasks. The expectations on a master craftsman are much more complex than on an assembly-line worker.

As more things get automated, this frees the capacity to automate more complex skills. And so on. This is seen in real-time with software: Today a lot of software is (still) built semi-manually (some build tool is started to compile and bundle files together to some executable or installer). Every developer knows how to do this. It is a simple role to take on: “Can you do the build, please?” As the company and the software grow, there is a phase where either this becomes a more fleshed-out role, “build master”, maybe even a dedicated job position - or more likely it will be automated. And as the automation - the build process or “pipeline” - becomes more complicated, you need the role of a build software expert. At this point, some tasks previously requiring skill and experience have been standardized and automated, and what remains is the more complex task of managing the automation.

Over time, the range between what is fully automatable and what humans can do in an unpredictable environment shrinks. With AGI there will be no roles left for humans. At least no roles in the sense used here. I think low-skilled people already feel this.

comment by Gunnar_Zarncke · 2021-01-05T23:03:47.543Z · LW(p) · GW(p)

When trying to get an overview of what is considered a role I made this table:

Type of role	Example	Purpose	Distinction	CDF Level
(Children's) play acting	Cop, Father	Play, imitation, learning	Shallow copy, present in higher animals	Impulsive (1) to Instrumental (2)
Social role	Mother, Husband	Elementary social function	Since the ancestral environment, closely moderated by biological function	Instrumental (2) to Socialized (3)
Occupation (rarely called role but shares traits)	Carpenter	Getting things done in a simple society	Rarely changed, advancement possible (apprentice to master)	Socialized (3)
A role in (stage) acting e.g. theater	Hamlet	Entertainment, education, cultural transmission	People know that this is a fiction, to make this explicit masks may be used	Socialized (3) to Self-authoring (4)
A role in a process	Moderator	Getting things done in a complex society	Roles can be taken on and given up, while in a role an agent’s actions are measured relative to the role	Socialized (3) to Self-authoring (4)

CDF refers to Constructive Developmental Framework as explained e.g. here (nice pics) and also on LW here [LW · GW].

I'm very unsure about the levels I assigned. Also, I got feedback that the degree to which roles can be changed and were changed may be different from how I interpret the historical record.

comment by Gunnar_Zarncke · 2021-01-03T23:27:35.562Z · LW(p) · GW(p)

In any sizable organization, you can find a lot of roles. And a lot of people filling these roles - often multiple ones on the same day. Why do we use so many and fine-grained roles? Why don’t we continue with the coarse-grained and more stable occupations? Because the world got more complicated and everybody got more specialized, and roles help with that. Division of labor means breaking down work previously done by one person into smaller parts that are done repeatedly in the same way - and can be assigned to actors: “You are now the widget-maker.” This works best when the tasks are easy to learn so it is easy to find someone to do it. But as humans are not plug-compatible and actual requirements may vary, there is always training required that can be amortized over repeatedly performing the task - a role. So roles make sense structurally - but why do people actually do what is expected of them and don’t just follow their own agenda? This is an alignment problem - in this case between the organization and the agent - and we might learn something about the AI alignment problem from it.

comment by Gunnar_Zarncke · 2021-01-02T23:59:49.291Z · LW(p) · GW(p)

What are the common aspects of these labor-sharing roles (in the following called simply roles)?

One common property of a role is that there is common knowledge [? · GW] among the involved persons about the role. Primarily, this shared understanding is about the tasks that can be expected to be performed by the agent acting in the role, as well as about the goals to be achieved, the limits to be observed, and other expectations. These expectations are usually already common knowledge long beforehand or they are established when the agent takes on the role.

The second important aspect of a role is that it can be taken on and off or assigned. Sometimes this happens explicitly when an employee is designated or volunteers to be a moderator. Sometimes this happens automatically or habitually to perform certain actions e.g. when a forum moderator writes a comment as a regular poster or when a developer assumes root permissions (sudo lecture).

Other aspects are of interest in the social sciences e.g. interactions between roles or the social function of roles.

comment by Gunnar_Zarncke · 2025-03-20T11:03:31.557Z · LW(p) · GW(p)

[provided as is, I have no strong opinion on it, might provide additional context for some]

The Social Radars episode on Sam Altman:

Carolynn and I have known Sam Altman for his whole career. In fact it was Sam who introduced us. So today's Social Radars is a special one. Join us as we talk to Sam about the inside story of his journey from Stanford sophomore to AI mogul.

Link to podcast, Tweet

I'm posting this specifically because there is some impression that OpenAI and Sam Altman are not good but self-interested, esp. voiced by Zvi.

One of the things [Jessica Livingston is] best at is judging people. She's one of those rare individuals with x-ray vision for character. She can see through any kind of faker almost immediately. Her nickname within YC was the Social Radar, and this special power of hers was critical in making YC what it is. The earlier you pick startups, the more you're picking the founders. Later stage investors get to try products and look at growth numbers. At the stage where YC invests, there is often neither a product nor any numbers.

[...]

During interviews, Robert and Trevor and I would pepper the applicants with technical questions. Jessica would mostly watch. A lot of the applicants probably read her as some kind of secretary, especially early on, because she was the one who'd go out and get each new group and she didn't ask many questions. She was ok with that. It was easier for her to watch people if they didn't notice her. But after the interview, the three of us would turn to Jessica and ask "What does the Social Radar say?" [1]

Having the Social Radar at interviews wasn't just how we picked founders who'd be successful. It was also how we picked founders who were good people. [emphasis mine] At first we did this because we couldn't help it. Imagine what it would feel like to have x-ray vision for character. Being around bad people would be intolerable. So we'd refuse to fund founders whose characters we had doubts about even if we thought they'd be successful.

(from Paul Graham's essay on Jessica Livingston)

comment by Gunnar_Zarncke · 2023-06-04T14:28:58.415Z · LW(p) · GW(p)

I have enabled Reacts for my Shortform. Let's see how it goes.
