How Business Solved (?) the Human Alignment Problem

post by Gianluca Calcagni (gianluca-calcagni) · 2024-12-31T20:39:59.067Z

Contents

    Is This Relevant for AI Safety?
  Human Workers vs Robots
  Picking a Specific Context
    Best Practices in Team Structure
    TL;DR
  Some Lessons
    Product Owners & Project Managers
    Business Analysts
    Architects & SMEs
    Developers & Testers
    The Future of Human Work
  Further Links
  Who I am
  Revision History

When I looked at mesa-optimization for the first time, my mind immediately associated it with a familiar problem in business: human individuals may not be “mesa optimizers” in a strict sense, but they can act as optimizers, and they are expected to do so when a manager delegates (=base optimization) some job (=base objective).
Hence my impression was that alignment in mesa-optimization would somehow be connected to alignment in job-delegation.
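
To make the analogy concrete, here is a minimal toy sketch in Python - all names and numbers are invented for illustration - of a delegate that optimizes a proxy objective which diverges from the delegator’s base objective:

```python
from dataclasses import dataclass
from typing import Callable

# Base objective: what the manager actually wants from the delegated job.
def base_objective(outcome: dict) -> float:
    return outcome["customer_value"] - outcome["cost"]

# Proxy objective: what the worker is actually measured on (tickets closed),
# standing in for a misaligned "mesa objective".
def proxy_objective(outcome: dict) -> float:
    return outcome["tickets_closed"]

@dataclass
class Worker:
    """A delegate that optimizes whatever objective it is handed."""
    objective: Callable[[dict], float]

    def choose(self, options: list[dict]) -> dict:
        # The worker picks the option that scores best on *its* objective.
        return max(options, key=self.objective)

options = [
    {"tickets_closed": 10, "customer_value": 2, "cost": 5},  # busywork
    {"tickets_closed": 3, "customer_value": 9, "cost": 2},   # real value
]

worker = Worker(objective=proxy_objective)
chosen = worker.choose(options)

print("worker picked:", chosen)                       # the busywork option
print("base score of pick:", base_objective(chosen))  # -3
print("best achievable base score:", max(map(base_objective, options)))  # 7
```

The worker is perfectly "aligned" with the metric it was handed, yet the delegator's actual objective is left unoptimized - which is exactly the failure mode both managers and AI safety researchers worry about.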

Humans have been dealing with the latter problem for millions of years[1], and corporations have been dealing with it for centuries. What did they learn?

The purpose of this post is to present a few real-world best practices for delegating business projects, especially in terms of team structure and roles. Towards the end of the post, I am going to discuss how AI Agents may change the status quo of this job sector.

Is This Relevant for AI Safety?

In any case, I hope this post piques your curiosity and makes for enjoyable, fruitful reading.

Remark: in the following, for simplicity, I am describing all AI models as “robots” - it doesn’t matter whether the robots are physical or not.

Human Workers vs Robots

There are some obvious ways in which human workers behave differently from a robot that is built as an (agentic) optimizer:

  1. (Most) Humans care about the well-being of humanity.
  2. Human workers wouldn’t obsess over filling the universe with paperclips.
  3. Human workers are not rational agents with a definite utility function.
  4. Human workers don’t always optimize to the best of their abilities, even if they could.
  5. Human workers have physiological needs, social needs, self-actualization needs, etc.
  6. Finally, human workers demand compensation for their effort.

Interestingly, though, there are also some points where human workers and robots behave similarly:

  1. Just like robots, human workers may be misaligned with a given base objective, in the sense that they understand it but secretly disagree with it.
  2. Just like robots, human workers may easily misinterpret a task.
  3. Just like robots, human workers may not be able to meet expectations.
  4. Just like robots, human workers may be stubborn and refuse to stop doing something they want to do.
  5. Just like robots, human workers may behave differently in the workplace (=in deployment) than they did during the probationary period (=in training).
  6. Just like robots, human workers form a set of beliefs and a priori assumptions.
  7. Just like robots, human workers develop their own preferences and inclinations.
  8. Just like robots, human workers may deceive, prioritise their own interests, lie about their true intentions, and so on.
  9. Just like robots, human workers may try to build influence to convince and manipulate others.
  10. Just like robots, human workers may be tempted to exploit any weakness of their workplace / peers / managers.

If you look at the list above, you may wonder how corporations can survive at all! And yet, somehow, they found a way[2].

Picking a Specific Context

Let’s suppose you are the CEO of a company and you want to start a project - for example, you want to create a webshop where your customers can buy your products. If you don’t have the internal expertise to develop the project, you will be forced to involve third-party vendors to build the webshop for you: that is a typical example of job-delegation in business. As a CEO, you will need to ask yourself some questions.

Let’s focus now on how to structure the team that will be working on this project.

Best Practices in Team Structure

Based on my experience, most best practices can actually be summarised into a single sentence: provide clarity. That alone is usually a reliable predictor of success.
Clarity can take many forms: leadership provides clarity, and so do engagement, communication, planning, scoping, guidelines, and governance structure.

In the following, I am going to focus specifically on the team structure because I believe it contains many interesting lessons. Before doing that, I am going to define some typical project roles below - if you know them already, feel free to skip them.

TL;DR

In the end, the project manager will be blamed if the webshop is late or shoddy, while the product owner will be blamed if the webshop is disliked by too many customers.

Some Lessons

Let me split the main focus of each role in the following way:

In terms of AI Safety, we can now draw some conclusions for each role.

Product Owners & Project Managers

The terminal goal of a product owner is to maintain ownership of the product and act on it for good. I seriously doubt this is a goal that we should give to robots! Even if the AI alignment problem were 100% solved, the political complications (related to managing your sponsor’s consent) would make this role very sensitive.

The same considerations apply to project managers, although some tasks (especially those related to resource allocation) may, in principle, be automated. I can see a future where most projects are developed and deployed entirely automatically, without needing a project manager: in such a future, however, the product owner will need to absorb some of the tasks and responsibilities currently held by project managers.

Business Analysts

The purpose of the business analyst is to be a descriptor of reality: in terms of direction of fit, the only direction that matters is "The-Mind-Should-Fit-The-World", and the mind should describe what it sees in an unbiased way.

It may be difficult to train that mindset into a robot, but I don't see a reason why it shouldn't be achievable in principle: if we succeed, it would represent a tremendous success in terms of AI Safety, since pure descriptors are, by nature, non-agentic and mostly harmless (excluding corner cases - e.g. self-fulfilling prophecies; see the next section).
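
As a rough sketch of what such a pure descriptor could look like - hypothetical names, purely illustrative - consider a component whose output is fully determined by its observations, with no channel for acting back on the world:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Report:
    """An immutable description of the world as observed."""
    metric: str
    value: float

def describe(observations: dict[str, float]) -> list[Report]:
    # Mind-to-world direction of fit: the output is fully determined by
    # the observations; there is no action space to influence the world.
    return [Report(metric=k, value=v) for k, v in sorted(observations.items())]

print(describe({"conversion_rate": 0.031, "weekly_visitors": 12500.0}))
```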

Architects & SMEs

Can we trust a robot to the point that it becomes our advisor? While I would not recommend such a thing in general, there are some safeguards we could adopt when doing so: by using robots with narrow expertise, agnostic drives, zero access privileges, and no consent-management skills, we could minimise some potential issues.
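
As a loose illustration of those safeguards - a hypothetical configuration sketch with invented field names, not any real product's settings:

```python
# All field names are invented for illustration; a real system would
# enforce these as hard permissions, not as configuration values.
advisor_policy = {
    "expertise_scope": ["payments", "checkout"],  # narrow expertise only
    "drives": None,                # agnostic: it advises, it pursues nothing
    "access_privileges": [],       # zero access: it cannot touch any system
    "consent_management": False,   # it cannot negotiate or persuade sponsors
}
```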

The big problem here is self-fulfilling prophecies, which are not just "corner cases" for this role: they are the bread and butter of risk management! For that reason, I believe that human advisory will always be needed on top of robotic advisory. Accountability will become a major factor in the future of architects and SMEs.

Developers & Testers

The job market is clearly driving developers and testers into being:

  1. Myopic, in the sense that they only care about the current task, or the current sprint, or the current project - but nothing more than that.
  2. Narrow, in the sense that they are highly specialised and have trouble generalising their expertise.
  3. Replaceable, in the sense that they can be replaced by some other resource at any time, with minor consequences for the project.
  4. Parallelizable, in the sense that multiple resources can be employed simultaneously and their work can be combined.
  5. Incremental, in the sense that their work makes progress by means of small, iterative, auditable steps.
  6. Low Path-Dependent, in the sense that any delivered work does not have significant relevance for the future and can be rolled back, refactored, or replaced at any time.

I do not believe that such properties are necessary for generative/discriminative AIs - however, since we have proven over and over that such properties are sufficient for a project to succeed, I believe we should keep them in place as intended limitations (thus providing an additional form of safety).
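
As a loose illustration - a toy sketch with invented task names, not a claim about real delivery pipelines - these six properties map naturally onto how we already schedule work among interchangeable units:

```python
from concurrent.futures import ThreadPoolExecutor

# Each task is small, self-contained, and auditable (myopic + incremental).
tasks = ["build login page", "write checkout tests", "style product grid"]

def do_task(task: str) -> str:
    # A narrow worker only ever sees its own task, never the whole project.
    return f"done: {task}"

# Workers are interchangeable and parallelizable: any worker may pick up
# any task, and losing one simply means another picks up the slack.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(do_task, tasks))

# Results combine into one increment; each step can be reviewed or rolled
# back independently (low path-dependence).
print(results)
```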

The Future of Human Work

What is going to happen to the job market in a post-scarcity world[8]?
While the jobs of developers and project managers may be seriously at risk (excluding rare specialists), I believe that: (1) most private businesses will prefer working with human consultants for a long time yet; and (2) accountability will always be assigned to humans, especially for advanced enterprise endeavours.

Artistic direction, business prioritization, and technical advisory will (and shall!) stay in human hands - even if supported by machines. Although I believe the market will move in the direction I described, it is completely unclear to me where mankind will end up in the far future.

Further Links

Control Vectors as Dispositional Traits (my first post)

All the Following are Distinct (my second post)

An Opinionated Look at Inference Rules (my third post)

Can AI Quantity beat AI Quality? (my fourth post)

I Recommend More Training Rationales (my previous post)

Who I am

My name is Gianluca Calcagni, born in Italy, with a Master of Science in Mathematics. I am currently (2025) working in IT as a consultant with the role of Salesforce Certified Technical Architect. My opinions do not reflect the opinions of my employer or my customers. Feel free to contact me on Twitter or LinkedIn.

Revision History

[2024-12-31] Post published.

  1. Slavery comes to mind…

  2. I am not saying that corporations are always successful in their endeavours: according to some auditors, only about 35% of business projects are declared a total success, about 20% are considered a total failure, and the rest lie somewhere in the middle. Human alignment seems very hard!

  3. To be clear: the “ownership” is only nominal! Copyrights and assets are retained by the company that is sponsoring the project. What is owned is the artistic + functional direction of the product.

  4. According to this definition, even System Administrators are considered "developers".

  5. Also called "QA Engineer", where QA stands for Quality Assurance.

  6. In accordance with some monthly budget.

  7. Here I am referring to internal company politics, which can be very harsh.

  8. I am playing the optimist here, as I don't take for granted that AI will lead to a post-scarcity future. Existential risks feel very real.

Comments

comment by Gianluca Calcagni (gianluca-calcagni) · 2025-01-01T13:18:10.759Z

Thanks to anyone who took the time to read and vote on this post - regardless of whether the vote was positive or negative, I still appreciate it.

If you happen to downvote me, I'd appreciate it if you could explain why: this is the second time this has happened (for one of my previous posts, I chose a title that sounded like clickbait - I then corrected it), and I am curious to understand your feedback this time as well.

The reason I write posts here from time to time is simply to be challenged and exposed to different points of view: that cannot happen without an exchange (even a harsh one).

Let me take this chance to wish you all a happy new year 2025!
Gianluca