Comments
Excellent work! Thanks for what you do
fwiw while it's fair to call this "heavy nudging", this mirrors exactly what my prompts for agentic workflows look like. I have to repeat things like "Don't DO ANYTHING YOU WEREN'T ASKED" multiple times to get them to work consistently.
I found this post to be incredibly useful to get a deeper sense of Logan's work on naturalism.
I think his work on Naturalism is a great and unusual example of original research happening in the rationality community and what actually investigating rationality looks like.
Emailed you.
In my role as Head of Operations at Monastic Academy, every person in the organization is on a personal improvement plan that addresses the personal responsibility level, and each team in the organization is responsible for process improvements that address the systemic level.
In the weekly performance-improvement meetings, my goal is to constantly bring them back to the level of personal responsibility. Any time they start saying the reason they couldn't meet their improvement goal was because of X event or Y person, I bring it back. What could THEY have done differently, what internal psychological patterns prevented them from doing that, and what can they do to shift those patterns this week?
Meanwhile, each team also chooses process improvements weekly. In those meetings, my role is to do the exact opposite and bring it back to the level of process. Any time they're examining a team failure and come to the conclusion "we just need to prioritize it more, or try harder, or the manager needs to hold us to something", I bring it back to the level of process. How can we change the order or way we do things, or the incentives involved, such that it's not dependent on any given person's ability to work hard or remember or be good at a certain thing?
Personal responsibility and systemic failure are different levels of abstraction.
If you're within the system and doing horrible things while saying, "🤷 It's just my incentives, bro," you're essentially allowing the egregore to control you, letting it shove its hand up your ass and pilot you like a puppet.
At the same time, if you ignore systemic problems, you're giving the egregore power by pretending it doesn't exist—even though it’s puppeting everyone. By doing so, you're failing to claim your own power, which lies in recognizing your ability to work towards systemic change.
Both truths coexist:
- There are those perpetuating evil by surrendering their personal responsibility to an evil egregore.
- There are those perpetuating evil by letting the egregore run rampant and denying its existence.
The solution requires addressing both levels of abstraction.
I think the model of "Burnout as shadow values" is quite important and load-bearing in my own model of working with many EAs/Rationalists. I don't think I first got it from this post, but I'm glad to see it written up so clearly here.
An easy, quick way to test is to offer some free coaching in this method.
Can you say more about how you've used this personally or with clients? What approaches you tried that didn't work, and how this has changed if at all to be more effective over time?
There's a lot here that's interesting, but it's hard for me to tell from just your description how battle-tested this is.
What would the title be?
I still don't quite get it. We already have an Ilya Sutskever who can make type 1 and type 2 improvements, and we don't see the sort of jumps in days you're talking about (I mean, maybe we do, and they just look discontinuous because of the release cycles?)
Why do you imagine this? I imagine we'd get something like one Einstein from such a regime, which would maybe speed up progress relative to existing AI labs by 1.2x or something? Eventually this gain compounds, but I imagine that could be relatively slow and smooth, with the occasional discontinuous jump when something truly groundbreaking is discovered.
Right, and per the second part of my comment - insofar as consciousness is a real phenomenon, there's an empirical question of whether whatever frame-invariant definition of computation you're using is the correct one.
Do you think wants that arise from conscious thought processes are equally valid to wants that arise from feelings? How do you think about that?
while this paradigm of 'training a model that's an agi, and then running it at inference' is one way we get to transformative agi, i find myself thinking that probably WON'T be the first transformative AI, because my guess is that there are lots of tricks using lots of compute at inference to get not quite transformative ai to transformative ai.
my guess is that getting to that transformative level is gonna require ALL the tricks and compute, and will therefore eke out being transformative BY utilizing all those resources.
one of those tricks may be running millions of copies of the thing in an agentic swarm, but i would expect that to be merely a form of inference time scaling, and therefore wouldn't expect ONE of those things to be transformative AGI on its own.
and i doubt that these tricks can funge against train time compute, as you seem to be assuming in your analysis. my guess is that you hit diminishing returns for various types of train compute, then diminishing returns for various types of inference compute, and that we'll get to a point where we need to push both of them to that point to get transformative ai
This seems arbitrary to me. I'm bringing in bits of information on multiple layers when I write a computer program to calculate the thing and then read out the result from the screen
Consider: if the transistors on the computer chip were moved around, would it still process the data in the same way and yield the correct answer?
Yes under some interpretation, but no from my perspective, because the right answer is about the relationship between what I consider computation and how I interpret the results I'm getting.
But the real question for me is - under a computational perspective of consciousness, are there features of this computation that actually correlate to strength of consciousness? Does any interpretation of computation get equal weight? We could nail down a precise definition of what we mean by consciousness that we agreed on that didn't have the issues mentioned above, but who knows whether that would be the definition that actually maps to the territory of consciousness?
For me the answer is yes. There's some way of interpreting the colors of grains of sand on the beach as they swirl in the wind that would perfectly implement the Miller-Rabin primality test algorithm. So is the wind + sand computing the algorithm?
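For concreteness, here's a standard Miller-Rabin sketch in Python (just the textbook algorithm, nothing specific to this thread), to make vivid how much structured state any "sand interpretation" would have to track:

```python
import random

def miller_rabin(n: int, rounds: int = 20) -> bool:
    """Probabilistic primality test: True means n is probably prime."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^r with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # found a witness that n is composite
    return True
```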
No, people really do see it; that wispiness can be crisp and clear.
I'm not the most visual person. But occasionally when I'm reading I'll start seeing the scene. I then get jolted out of it when I realize I don't know how I'm seeing the words as they've been replaced with the imagined visuals
I used to think "getting lost in your eyes" was a metaphor, until I made eye contact with a particularly beautiful woman in college and found myself losing track of where I was and what I was doing.
Tad James has a fascinating approach called Timeline Therapy. In it, he explores how different people represent their timelines, and theorizes about how shifting those representations will change fundamental ways you relate to the world.
fwiw i think that your first sentence makes sense, but i don't understand why for the second sentence
i think people OBVIOUSLY have a sense of what meaning is, but it's really hard to describe
ah that makes sense
in my mind this isn't resources flowing to elsewhere, it's either:
- An emotional learning update
- A part of you that hasn't been getting what it wants speaking up.
this is great, thanks for sharing
in my model that happens through local updates, rather than a global system
for instance, if i used my willpower to feel my social anxiety completely (instead of the usual strategy of suppression) while socializing, i might get some small or large reconsolidation updates to the social anxiety, such that that part thinks it's needed in fewer situations or not at all
alternatively, the part that has the strategy of going to socialize and feeling confident may gain some more internal evidence, so it wins the internal conflict slightly more (but the internal conflict is still there and causes a drain)
i think the sort of global evaluation you're talking about is pretty rare, though something like it can happen when someone e.g. reaches a deep state of love through meditation, and is then able to access lots of their unloved parts that are downstream TRYING to get to that love, and suddenly a big shift happens to the whole system simultaneously (another type of global reevaluation can take place through reconsolidating deep internal organizing principles like fundamental ontological constraints or attachment style)
also, this 'subconscious parts going on strike' theory makes slightly different predictions than the 'is it good for the whole system/live' theory
for instance, i predict that you can have 'dead parts' that e.g. give people social anxiety based on past trauma, even though it's no longer actually relevant to their current situation.
and that if you override this social anxiety using 'live willpower' for a while, you can get burnout, even though the willpower is in some sense 'correct' about what would be good for the overall flourishing of the system given the current reality.
A lot of people are looking at the implications of o1's training process as a future scaling paradigm, but it seems to me that this implementation, applying inference-time compute to fine-tune the model just in time for hard questions, is equally promising: it may have equally impressive results if it scales with compute, and has equal potential in terms of low-hanging fruit to be picked to improve it.
Don't sleep on test time training as a potential future scaling paradigm.
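To gesture at what I mean by test-time training, here's a rough sketch (assuming a Hugging Face-style causal LM and tokenizer; the adaptation data and hyperparameters are made up for illustration, not anything the labs have described):

```python
import copy
import torch

def test_time_finetune(model, tokenizer, question: str, steps: int = 8, lr: float = 1e-5):
    """Sketch: clone the model and spend extra inference-time compute taking a few
    gradient steps on question-specific data before answering this one question."""
    tuned = copy.deepcopy(model)  # leave the base weights untouched
    optimizer = torch.optim.AdamW(tuned.parameters(), lr=lr)

    # In practice the adaptation data might be retrieved documents, self-generated
    # reasoning traces, or augmented variants of the question, not the bare question.
    adaptation_texts = [question]

    tuned.train()
    for _ in range(steps):
        for text in adaptation_texts:
            batch = tokenizer(text, return_tensors="pt")
            loss = tuned(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    tuned.eval()
    with torch.no_grad():
        prompt = tokenizer(question, return_tensors="pt")
        answer_ids = tuned.generate(**prompt, max_new_tokens=256)
    return tokenizer.decode(answer_ids[0], skip_special_tokens=True)
```

The point is just that compute gets spent updating weights per question before answering, rather than only on longer chains of thought.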
I often talk with clients about burnout as your subconscious/parts 'going on strike' because you've ignored them for too long
I never made the analogy to Atlas Shrugged and the live money leaving the dead money because it wasn't actually tending to the needs of the system, but now you've got me thinking
really, say more?
Another definition along the same vein:
Trauma is overgeneralization of emotional learning.
A real life use for smart contracts 😆
However, this would not address the underlying pattern of alignment failing to generalize.
Is there proof that this is an overall pattern? It would make sense that models are willing to do things they're not willing to talk about, but that doesn't mean there's a general pattern that e.g. they wouldn't be willing to talk about things, and wouldn't be willing to do them, but WOULD be willing to do some secret third option.
I don't remember them having the actual stats; I'm not watching it again to check, though. I wonder if they published those elsewhere.
They replicated it within the video itself?
Enjoyed this video by Veritasium with data showing how Politics is the Mind Killer
I'll send round 2 out to you when I've narrowed things down. Right now I'm looking for gut-check System 1 decisions, and if you have trouble doing that I'd recommend waiting.
Want to help me out?
Vote on the book cover for my new book!
It'll be up for a couple of days. The contest website only gives me a few days before I have to pick finalists.
https://form.jotform.com/243066790448060
IME you can usually see in someone's face or body when they have a big release, just from the release of tension.
But I think it's harder to distinguish this from other hypotheses I've heard like "negative emotions are stored in the tissues" or "muscular tension is a way of stabilizing intentions."
Oh yes, if you're going on people's words, it's obviously not much better, but the whole point of vibing is that it's not about the words. Your aesthetics, vibes, the things you care about will be communicated non-verbally.
I predict you would enjoy the free-association game better if you cultivated the skill of vibing more.
Yes, this is an excellent point I didn't get across in the post above.
Yes, if people were using Wikipedia in the way they are using the LLMs.
In practice that doesn't happen though, people cite Wikipedia for facts but are using LLMs for judgement calls.
Of course a random person is biased. Some people will have more authority than others, and we'll trust them more, and argument screens off authority.
What I don't want people to do is give ChatGPT or Claude authority. Give it to the wisest people you know, not Claude.
What they're saying is I got a semi-objective answer fast.
Exactly. Please stop saying this. It's not semi-objective. The trend of casually treating LLMs as an arbiter of truth leads to moral decay.
I doubt the orgs got much of their own bias into the RLHF/RLAIF process
This is obviously untrue, orgs spend lots of effort making sure their AI doesn't say things that would give them bad press for example.
I desperately want people to stop using "I asked Claude or ChatGPT" as a stand-in for "I got an objective third party to review"
LLMs are not objective. They are trained on the internet which has specific sets of cultural, religious, ideological biases, and then further trained via RL to be biased in a way that a specific for-profit entity wanted them to be.
I happened to log on at that time and thought someone had launched a nuke
So far I’m seeing data that’s strongly in favor of it being easy for me to facilitate rapid growth for many people in this space. But am I missing something here? If you have any ideas please let me know in the comments.
My take:
You can facilitate rapid growth in these areas.
I don't think you're particularly unique in this regard. There are several people who I know (myself included) who can create these sorts of rapid changes on a semi-consistent basis. You named a few as reviewers. There are far more coaches/therapists who are ineffective, but lots of highly effective practitioners who can create rapid change using experiential methods.
@PJ_Eby @Kaj_Sotala @Damon Sasi all come to mind as people on LW who can do this. Having worked with many coaches and therapists, I assure you that many others also have the skill.
Right now I think you're overestimating just how consistent what you do is, and the results focus you're taking is likely creating other negative effects in the psyche that will have to be cleaned up later. It will also mean that if you don't get to the issue in the first session, it will be harder and harder for your work to have an impact over time.
But in general the approach you're taking can and will create rapid results in some people that haven't seen results before.
I've really been enjoying Charlie Anderson's YouTube channel for this reason, trying to find the absolute best way to make pizza.
https://youtube.com/@charlieandersoncooking?si=uhpLcNDyE7jLbTMY
It seems like the obvious thing to do with a model like o1 trained on reasoning through problems would be to train it to write code that helps it solve reasoning problems.
Perhaps the idea was to not give it this crutch so it could learn those reasoning skills without the help of code.
But it seems from the examples that while it's great at high-level reasoning and figuring out where it went wrong, it still struggles with basic things like counting, which, if it had the instinct to write code in the areas where it's likely to get tripped up, would be easily solved.
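As a toy illustration (mine, not from any of the o1 examples) of how cheap the fix is once the counting is delegated to code:

```python
def count_letter(text: str, letter: str) -> int:
    """Count occurrences of a letter, the kind of task that trips up language models."""
    return text.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # -> 3
```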
Sorta surprised that this got so many upvotes with the clickbaity title, which goes against norms around here.
Otherwise the content seems good.
I'm not talking about 10-year time horizons, no.