Posts

Reprogramming the Mind: Meditation as a Tool for Cognitive Optimization 2024-01-11T12:03:41.763Z
How well does your research address the theory-practice gap? 2023-11-08T11:27:52.410Z
Jonas Hallgren's Shortform 2023-10-11T09:52:20.390Z
Advice for new alignment people: Info Max 2023-05-30T15:42:20.142Z
Respect for Boundaries as non-arbitrary coordination norms 2023-05-09T19:42:13.194Z
Max Tegmark's new Time article on how we're in a Don't Look Up scenario [Linkpost] 2023-04-25T15:41:16.050Z
The Benefits of Distillation in Research 2023-03-04T17:45:22.547Z
Power-Seeking = Minimising free energy 2023-02-22T04:28:44.075Z
Black Box Investigation Research Hackathon 2022-09-12T07:20:34.966Z
Announcing the Distillation for Alignment Practicum (DAP) 2022-08-18T19:50:31.371Z
Does agent foundations cover all future ML systems? 2022-07-25T01:17:11.841Z
Is it worth making a database for moral predictions? 2021-08-16T14:51:54.609Z
Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven't we started yet? 2021-02-25T22:06:04.695Z

Comments

Comment by Jonas Hallgren on Closed Limelike Curves's Shortform · 2024-07-19T07:46:44.657Z · LW · GW

Well, it seems like this story might have something to do with it?: https://www.lesswrong.com/posts/3XNinGkqrHn93dwhY/reliable-sources-the-story-of-david-gerard

I don't know to what extent that's the case, though; otherwise, I agree with you.

Comment by Jonas Hallgren on On saying "Thank you" instead of "I'm Sorry" · 2024-07-09T12:22:05.742Z · LW · GW

Sorry if this was a bad comment!

Comment by Jonas Hallgren on On saying "Thank you" instead of "I'm Sorry" · 2024-07-08T14:43:17.437Z · LW · GW

Damn, thank you for this post. I will put this into practice immediately!

Comment by Jonas Hallgren on Finding the Wisdom to Build Safe AI · 2024-07-05T09:09:11.913Z · LW · GW

I resonated with the post and I think it's a great direction to draw inspiration from!

A big problem with goodharting in RL is that you're handcrafting a utility function. In the wisdom traditions, we're encouraged to explore and gain insights into different ideas to form our utility function over time.

Therefore, I feel that setting up the right training environment together with some wisdom principles might be enough to create wise AI.

We, of course, run into all of the annoying inner alignment and deception-during-training style problems, yet it still seems like the direction to go in. I don't think the orthogonality thesis is fully true or false; it depends more on the environment, and if we can craft the right one, I think we can have wise AI that wants to create the most loving and kind future imaginable.
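To make the Goodharting worry concrete, here is a minimal toy sketch (my own made-up setup and numbers, nothing from the post): the harder you optimise a noisy proxy of the true objective, the more the selected outcome lags behind the best one actually available.

```python
# Toy Goodhart illustration: select the candidate that maximises a noisy proxy
# of the true objective, then compare the true value obtained with the best
# true value available. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def true_value_at_proxy_optimum(n_candidates: int, proxy_noise: float = 1.0) -> float:
    true = rng.normal(0.0, 1.0, n_candidates)                   # true objective values
    proxy = true + rng.normal(0.0, proxy_noise, n_candidates)   # handcrafted reward
    return float(true[np.argmax(proxy)])                        # what the proxy selects

for n in [10, 100, 10_000]:
    got = np.mean([true_value_at_proxy_optimum(n) for _ in range(200)])
    best = np.mean([rng.normal(0.0, 1.0, n).max() for _ in range(200)])
    print(f"candidates={n:>6}  proxy-selected true value={got:.2f}  best available={best:.2f}")
# The gap widens as optimisation pressure (number of candidates) grows -- the
# gap a richer, wisdom-style training signal would have to close.
```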

Comment by Jonas Hallgren on List of Collective Intelligence Projects · 2024-07-04T18:08:36.198Z · LW · GW

Eh, it's like self-plugging or something.

It should work again now. We're going to switch names soon, so we just had some technical difficulties around that.

Comment by Jonas Hallgren on List of Collective Intelligence Projects · 2024-07-03T10:11:10.170Z · LW · GW

Know of any I should add?

 

I do feel a bit awkward about it as I'm very much involved in both projects, but otherwise these two:

The Collective Intelligence Company: https://thecollectiveintelligence.company/company

Flowback/Digital Democracy World: https://digitaldemocracy.world/

Also, a paper on Predictive Liquid Democracy, which is part of both projects: https://www.researchgate.net/publication/377557844_Predictive_Liquid_Democracy

Comment by Jonas Hallgren on Live Theory Part 0: Taking Intelligence Seriously · 2024-06-27T08:13:20.798Z · LW · GW

Very intriguing, excited for the next post!

(We will watch your career with great interest.)

Comment by Jonas Hallgren on Matthew Barnett's Shortform · 2024-06-17T15:03:49.507Z · LW · GW

Often, disagreements boil down to a set of open questions to answer; here's my best guess at how to decompose your disagreements. 

I think that depending on what hypothesis you're abiding by when it comes to how LLMs will generalise to AGI, you get different answers:

Hypothesis 1: LLMs are enough evidence that AIs will generally be able to follow what humans care about and that they naturally don't become power-seeking. 

Hypothesis 2: AGI will have a sufficiently different architecture from LLMs, or will change so much, that current-day LLMs don't generally give evidence about AGI.

Depending on your beliefs about these two hypotheses, you will have different opinions on this question. 


The scenario outlined by Bostrom seems clearly different from the scenario with LLMs, which are actual general systems that do what we want and ~nothing more, rather than doing what we want as part of a strategy to seek power instrumentally. What am I missing here?

Let's say that we believe in hypothesis 1 as the base case; what are some reasons why LLMs wouldn't give evidence about AGI?

1. Intelligence forces reflective coherence.
This would essentially entail that the more powerful a system gets, the more it will notice internal inconsistencies and change towards maximising (and therefore not following human values).

2. Agentic AI acting in the real world is different from LLMs. 
If we look at an LLM from the perspective of an action-perception loop, it doesn't generally get any feedback when it changes the world. Instead, it is more like an autoencoder, predicting what the world will look like. It may be that power-seeking only arises in systems that can see the consequences of their own actions and how those affect the world.

3. LLMs optimise for a Goodharted RLHF signal that looks good on the surface but lacks fundamental understanding. Since human value is fragile, it will be difficult to hit the sweet spot in real-world cases, especially once you factor in the complexity of the future.

Personal belief: 
These are all open questions, in my opinion, but I do see how LLMs give evidence about some of these parts. I, for example, believe that language is a very compressed information channel for alignment information, and I don't really believe that human values are as fragile as we think. 

I'm more scared of 1 and 2 than I am of 3, but I would still love for us to have ten more years to figure this out, as it seems very non-obvious what the answers here are.

Comment by Jonas Hallgren on jacquesthibs's Shortform · 2024-06-11T14:39:57.383Z · LW · GW

I really like this take.

I'm kind of "bullish" on active inference as a way to scale existing architectures to AGI as I think it is more optimised for creating an explicit planning system.

Also, funnily enough, Yann LeCun has a paper on his beliefs about the path to AGI, which I think Steve Byrnes has a good post on. It basically says that we need system 2 thinking in the way you described here, so with your argument in mind, he kind of disproves himself to some extent. 😅

Comment by Jonas Hallgren on 2. Corrigibility Intuition · 2024-06-08T19:46:08.593Z · LW · GW

Very interesting, I like the long list of examples as it helped me get my head around it more.

So, I've been thinking a bit about similar topics, but in relation to a long reflection on value lock-in.

My basic thesis was that the concept of reversibility should be what we optimise for in general for humanity, as we want to be able to reach as large a part of the "moral search space" as possible.

The concept of corrigibility you seem to be pointing towards here seems very related to notions of reversibility. You don't want to take actions that cannot later be reversed, and you generally want to optimise for optionality.
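As a toy way of making "optimise for optionality" concrete (my own illustration with a made-up environment, not the post's formalism), one could score an action by how much of the state space remains reachable after taking it:

```python
# Reversibility as optionality: count reachable states after each action.
from collections import deque

# Hypothetical toy environment as a directed graph of states.
transitions = {
    "start": ["untouched", "built"],
    "untouched": ["start", "built"],
    "built": ["locked_in"],      # building is hard to undo
    "locked_in": ["locked_in"],  # absorbing state: options are gone
}

def reachable(state):
    """Breadth-first search over the transition graph."""
    seen, queue = {state}, deque([state])
    while queue:
        for nxt in transitions[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

for outcome in transitions["start"]:
    print(f"after '{outcome}': {len(reachable(outcome))} states still reachable")
# "untouched" preserves the whole space, while "built" funnels into the
# absorbing "locked_in" state -- the kind of irreversible, option-destroying
# move a reversibility-optimising agent would penalise.
```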

I then have two questions:

1) What do you think of the relationship between your measure of corrigibility and the notion of uncertainty in inverse reinforcement learning? It seems similar to what Stuart Russell points towards when it comes to being uncertain about the preferences of the principal the agent is serving. For example, in the following example that you give:

In the process of learning English, Cora takes a dictionary off a bookshelf to read. When she’s done, she returns the book to where she found it on the shelf. She reasons that if she didn’t return it this might produce unexpected costs and consequences. While it’s not obvious whether returning the book empowers Prince to correct her or not, she’s naturally conservative and tries to reduce the degree to which she’s producing unexpected externalities or being generally disruptive.

It kind of seems to me like the above can be formalised in terms of preference optimisation under uncertainty? (A toy sketch of what I mean is below, after question 2.)
(Side follow-up: What do you then think about the Eliezer–Russell VNM-axiom debate?)

2) Do you have any thoughts on the relationship between corrigibility and the notion of reversibility in physics? You can formalise irreversible systems as ones that are path-dependent; I'm just curious whether you see a connection between the two.
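Here is the toy sketch I mentioned under question 1 (my own framing with made-up numbers, not the post's definition of corrigibility): an agent uncertain about the principal's utility function picks the action with the highest expected utility under its posterior, and the conservative, low-externality option wins as soon as there is a small chance of a large unforeseen cost.

```python
# Preference optimisation under uncertainty, in miniature.
import numpy as np

actions = ["return book to shelf", "leave book out"]

# Hypothetical utility functions the principal *might* have (rows), evaluated
# on each action (columns). Values are illustrative.
candidate_utilities = np.array([
    [1.0, 0.0],   # principal cares about tidiness
    [1.0, 1.0],   # principal is indifferent
    [1.0, -5.0],  # leaving the book out has some unforeseen cost
])
posterior = np.array([0.4, 0.5, 0.1])  # agent's beliefs over these hypotheses

expected = posterior @ candidate_utilities
print(dict(zip(actions, expected.round(2))))
print("chosen:", actions[int(np.argmax(expected))])
# Expected utilities: return book = 1.0, leave it out = 0.0, so the agent
# behaves "conservatively" purely from uncertainty over preferences.
```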

Thanks for the interesting work!

Comment by Jonas Hallgren on Alignment Gaps · 2024-06-08T19:06:53.725Z · LW · GW

I really like this type of post. Thank you for writing it!

I found some interesting papers that I didn't know of before, so that is very nice.

Comment by Jonas Hallgren on Ethodynamics of Omelas · 2024-06-05T16:21:20.416Z · LW · GW

Just revisiting this post as probably my favourite one on this site. I love it!

Comment by Jonas Hallgren on Awakening · 2024-05-30T14:14:20.441Z · LW · GW

I was doing the same samadhi thing with TMI and I was looking for insight practices from there. My teacher (non-dual Thai Forest tradition) said that the Burmese traditions set up a bit of a strange reality dualism, and basically said that the dark night of the soul is often due to developing concentration before awareness, loving-kindness, and wisdom.

So I'm Mahamudra-pilled now (Pointing Out the Great Way is a really good book for this). I do still like the insight model you proposed; I'm still reeling a bit from the insights I got during my last retreat, so it seems true.

Thank you for sharing your experience!

Comment by Jonas Hallgren on Examples of Highly Counterfactual Discoveries? · 2024-05-19T12:14:08.539Z · LW · GW

Sure! Anything more specific that you want to know about? Practice advice or more theory?

Comment by Jonas Hallgren on Towards a formalization of the agent structure problem · 2024-04-30T13:18:43.985Z · LW · GW

There is a specific part of this problem that I'm very interested in: looking at the boundaries of potential sub-agents. It feels like part of the goal here is to filter away potential "daemons" or inner optimisers, so it seems important to think of ways one can do this.

I can see how this project would be valuable even without it, but do you have any thoughts about how you could differentiate between different parts of a system that's acting like an agent, in order to isolate the agentic part?

I otherwise find it a very interesting research direction.

Comment by Jonas Hallgren on The first future and the best future · 2024-04-25T18:58:56.029Z · LW · GW

Disclaimer: I don't necessarily support this view; I thought about it for like 5 minutes, but it made sense to me.

If we were to handle this the same way as other cases of slowing technology down through regulation, then that might make sense, but I'm uncertain that you can take the outside view here.

Yes, we can do the same as for other technologies by leaving it to the standard government procedures for making legislation, and then I might agree with you that slowing down might not lead to better outcomes. Yet we don't have to do this. We can use other processes that might lead to much better decisions. What about proper value-sampling techniques like digital liquid democracy? I think we can do a lot better than we have in the past by thinking about what mechanism we want to use.

Also, as a potential example, I thought of cloning technology in the last 5 minutes or so. If we had just gone full speed with that tech, things would probably have turned out badly?

Comment by Jonas Hallgren on Examples of Highly Counterfactual Discoveries? · 2024-04-24T14:06:53.460Z · LW · GW

The Buddha with dependent origination. I think it says somewhere that most of the stuff in Buddhism was from before the Buddha's time: things such as breath-based practices and loving-kindness, among others. He had one revelation that basically enabled the entire enlightenment thing, which is called dependent origination.*

*At least according to my meditation teacher. I believe him, since he was a neuroscientist and did an astrophysics master's at Berkeley before he left for India, so he's got some pretty good epistemics.

It basically states that any system is only true based on another system being true. It has some really cool parallels to Gödel's incompleteness theorem, but on a metaphysical level. Emptiness of emptiness and stuff. (On a side note, I can recommend TMI + Seeing That Frees if you want to experience some radical shit there.)

Comment by Jonas Hallgren on Neural uncertainty estimation review article (for alignment) · 2024-04-23T07:20:59.275Z · LW · GW

This was a great post, thank you for making it!

I wanted to ask what you think about the LLM forecasting papers in relation to this literature. Do you think there are any ways of applying the uncertainty-estimation literature to improve the forecasting ability of AI? For example:

https://arxiv.org/pdf/2402.18563.pdf
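As a small illustration of the kind of thing I have in mind (toy synthetic data, nothing from the linked paper), even the simplest uncertainty-style move of averaging an ensemble of noisy probabilistic forecasts tends to improve the Brier score over a typical single forecaster:

```python
# Ensemble averaging as a basic uncertainty-estimation trick for forecasting.
import numpy as np

rng = np.random.default_rng(0)
n_questions, n_members = 1000, 10

outcomes = (rng.random(n_questions) < 0.5).astype(float)   # binary ground truth
signal = np.where(outcomes == 1.0, 0.65, 0.35)             # informative base forecast
members = np.clip(signal + rng.normal(0, 0.25, (n_members, n_questions)), 0.01, 0.99)

def brier(p):
    """Mean squared error of probabilistic forecasts (lower is better)."""
    return float(np.mean((p - outcomes) ** 2))

single = np.mean([brier(m) for m in members])
ensemble = brier(members.mean(axis=0))
print(f"average single-member Brier: {single:.3f}")
print(f"ensemble Brier:              {ensemble:.3f}")
# Independent noise cancels across members, so the aggregate forecast is
# better calibrated than a typical individual one.
```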

Comment by Jonas Hallgren on Good Bings copy, great Bings steal · 2024-04-21T13:41:49.987Z · LW · GW

I like the post and generally agree. Here's a random thought on the OOD generalization: I feel that we often talk about how being good at 2 or 3 different things allows for new exploration. If you believe books such as Range, then we're a lot more creative when combining ideas from multiple different fields. I rather think of multiple "hulls" (I'm guessing this isn't technically correct, since I'm a noob at convex optimisation) and how to apply them together to find new truths.

 

Comment by Jonas Hallgren on My experience using financial commitments to overcome akrasia · 2024-04-16T18:28:50.949Z · LW · GW

Damn, great post, thank you!

I saw that you used Freedom; a random tip is to use the AppBlock app instead, as it is more powerful, along with Cold Turkey Blocker on the computer. (If you want to, there are ways to get around the other blockers.)

That's all I wanted to say, really; I will probably try it out in the future. I was thinking of giving myself an allowance or something similar that I could spend on the app, and seeing if it would increase my productivity.

Comment by Jonas Hallgren on [deleted post] 2024-04-02T10:04:23.500Z

This was a dig at interpretability research. I'm pro-interpretability research in general, so if you feel personally attacked by this, it wasn't meant to be too serious. Just be careful with infohazards, ok? :)

Comment by Jonas Hallgren on Gradient Descent on the Human Brain · 2024-04-02T08:49:11.507Z · LW · GW

I think Neuralink already did this, actually; a bit late to the point, but a good try anyway. Also, have you considered having Michael Bay direct the research effort? I think he did a pretty good job with the first Transformers.

Comment by Jonas Hallgren on Orthogonality Thesis seems wrong · 2024-03-26T15:33:06.934Z · LW · GW

Yeah, I agree with what you just said; I should have been more careful with my phrasing. 

Maybe something like: "The naive version of the orthogonality thesis where we assume that AIs can't converge towards human values is assumed to be true too often"

Comment by Jonas Hallgren on Orthogonality Thesis seems wrong · 2024-03-26T12:26:33.345Z · LW · GW

Compared to other people on this site, this is part of my alignment optimism. I think there are natural abstractions in the moral landscape that make agents converge towards cooperation and similar things. I read this post recently, and Leo Gao made an argument that concave agents generally don't exist because they stop existing. I think there are pressures that conform agents to parts of the value landscape.

I agree that the orthogonality thesis is presumed to be true way too often. It is more like an argument that alignment may not happen by default, but I'm also uncertain about how much evidence it actually gives you.

Comment by Jonas Hallgren on All About Concave and Convex Agents · 2024-03-26T12:18:52.926Z · LW · GW

Any SBF enjoyers?

Comment by Jonas Hallgren on GPT, the magical collaboration zone, Lex Fridman and Sam Altman · 2024-03-19T15:53:42.708Z · LW · GW

I have the same experience; I love having it connect two disparate topics, it is very fun. I had the thought today that I use GPT as a brainstorming partner for basically 80%+ of the work tasks I do.

Comment by Jonas Hallgren on Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing? · 2024-03-16T07:35:10.454Z · LW · GW

Hey! I saw that you had a bunch of downvotes, and I wanted to get in here before you become too disillusioned with the LW crowd. A big point for me is that you don't really have any sub-headings or examples that are more straight to the point. It is all one long text that reads like a direct transcript of your thinking, which makes it really hard to engage with what you say. Of course you're saying controversial things, but if there were more clarity, I think you would get more engagement.

(GPT is really OP for this nowadays.) Anyway, I wish you the best of luck! I'm also sorry for not engaging with any of your arguments, but I couldn't quite follow them.

Comment by Jonas Hallgren on Toward a Broader Conception of Adverse Selection · 2024-03-15T07:30:11.484Z · LW · GW

Alright, quite a ba(y)sed point there, very nice. My lazy ass is looking for a heuristic here. It seems like the more the EMH holds in a situation (i.e. the more optimisation pressure has been applied), the more you should expect to be disappointed with a trade.

But what is a good heuristic for how much worse it will be? Maybe one just has to think about the counterfactual option each time?
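For a rough quantitative feel (a toy simulation of my own, not anything from the post), you can treat this as a winner's-curse effect: the trade you actually get is the one the most optimistic estimate selected, and the expected disappointment grows with the number of competing optimisers.

```python
# Winner's curse under increasing optimisation pressure.
import numpy as np

rng = np.random.default_rng(0)

def expected_disappointment(n_competitors: int, n_trials: int = 50_000) -> float:
    true_value = rng.normal(0.0, 1.0, n_trials)                        # asset's true value
    estimates = true_value + rng.normal(0.0, 1.0, (n_competitors, n_trials))
    winning_estimate = estimates.max(axis=0)                           # you trade only if you bid highest
    return float((winning_estimate - true_value).mean())

for n in [1, 2, 5, 20, 100]:
    print(f"{n:>3} competitors: expected disappointment ~ {expected_disappointment(n):.2f}")
# Disappointment grows roughly like the expected maximum of n noise draws
# (about sqrt(2 * ln n)): more optimisation pressure means more regret,
# with diminishing marginal increases.
```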

Comment by Jonas Hallgren on Highlights from Lex Fridman’s interview of Yann LeCun · 2024-03-14T20:35:33.405Z · LW · GW

I thought the orangutan argument was pretty good when I first saw it, but then I looked it up and realised that it's not that they aren't power-seeking; it's more that they only are in interactions that matter for the future survival of their offspring. It actually is a very flimsy argument. Some of the things he says are smart, like some of the stuff on the architecture front, but he always brings up his aeroplane analogy in AI safety. It is really dumb, as I wouldn't get into an aeroplane without knowing that it has been safety-checked, and as a consequence I have a hard time taking him seriously when it comes to safety.

Comment by Jonas Hallgren on Explaining the AI Alignment Problem to Tibetan Buddhist Monks · 2024-03-07T18:38:30.818Z · LW · GW

Very cool! I want to mention that it might be interesting to draw the connection between what the Buddha called dependent origination and the formation of a self-view of being an agent.

The idea is that your self is built through a loop of expecting your self to be there in the future, thus creating a self-fulfilling prophecy. This is similar to how agents are defined in the intentional stance, since it is informationally more efficient to model yourself as an agent.

One way to view the alignment problem is through a self-loop taking over fully, or through dependent origination in artificial agents. Anyway, I think it seems very cool and I wish you the best of luck!

Comment by Jonas Hallgren on Many arguments for AI x-risk are wrong · 2024-03-05T07:26:37.826Z · LW · GW

I notice I'm confused about the relationship between power-seeking arguments and counting arguments. Since I'm confused, I'm assuming others are too, so I would appreciate some clarity on this.

In footnote 7, Turner mentions that the paper "Optimal Policies Tend to Seek Power" is an irrelevant counting-error post.

In my head, I think of the counting argument as saying that it is hard to hit an alignment target because there are a lot more non-alignment targets. This argument is (clearly?) wrong for the reasons specified in the post. Yet this doesn't address power-seeking, as that seems more like an optimisation pressure applied to the system, not something dependent on counting arguments?

In my head, power-seeking is more like saying that an agent's attraction basin is larger at one point of the optimisation landscape compared to another. The same can also be said about deception here.

I might be dumb, but I never thought of the counting argument as true or as crucial to either deception or power-seeking. I'm very happy to be enlightened on this issue.

Comment by Jonas Hallgren on Counting arguments provide no evidence for AI doom · 2024-02-28T21:04:22.552Z · LW · GW

I buy the argument that scheming won't happen, conditional on us not allowing much slack between different optimisation steps. As Quintin mentions in his AXRP podcast episode, SGD doesn't have close to the same level of slack that, for example, cultural evolution allowed. (See the entire free-energy-of-optimisation debate here from before; I can't remember the post names ;/) Iff that holds, then I don't see why the inner behaviour should diverge from what the outer alignment loop specifies.

I do, however, believe that ensuring that this is true by specifying the right outer alignment loop as well as the right deployment environment is important to ensure that slack is minimised at all points along the chain so that misalignment is avoided everywhere.

If we catch deception in training, we will be OK. If we catch actors that might create deceptive agents in training, then we will be OK. If we catch states developing agents to do this, or if defense > offense, then we will be OK. I do not believe that this happens by default.

Comment by Jonas Hallgren on AI #51: Altman’s Ambition · 2024-02-21T22:43:14.918Z · LW · GW

I know this is basically downvote farming on LW, but I find the idea of morality being downstream from the free energy principle very interesting.

Jezos obviously misses out on a bunch of game theoretic problems that arise, and FEP lacks explanatory power in such a domain, so it is quite clear to me that we shouldn't do this. I do think it's fundamentally true, just like how utilitarianism is fundamentally true. The only problem is that he's applying it naively.

I don't want to bet the future of humanity on this belief, but what if is = ought, and we have just misconstrued it by adopting proxy goals along the way? (IGF gang rise!)

Comment by Jonas Hallgren on Natural abstractions are observer-dependent: a conversation with John Wentworth · 2024-02-13T16:05:39.189Z · LW · GW

Uh, I binged like 5 MLST episodes with Friston, but I think it's a bit later in this one with Stephen Wolfram: https://open.spotify.com/episode/3Xk8yFWii47wnbXaaR5Jwr?si=NMdYu5dWRCeCdoKq9ZH_uQ

It might also be this one: https://open.spotify.com/episode/0NibQiHqIfRtLiIr4Mg40v?si=wesltttkSYSEkzO4lOZGaw

Sorry for the unsatisfactory answer :/

Comment by Jonas Hallgren on Natural abstractions are observer-dependent: a conversation with John Wentworth · 2024-02-13T10:46:44.357Z · LW · GW

Great comment. I just wanted to share a thought on my perception of the "why" in relation to the intentional stance.

Basically, my hypothesis that I stole from Karl Friston is that an agent is defined as something that applies the intentional stance to itself. Or, in other words, something that plans with its own planning capacity or itself in mind. 

One can relate it to the entire membranes/boundaries discussion here on LW as well in that if you plan as if you have a non-permeable boundary, then the informational complexity of the world goes down. By applying the intentional stance to yourself, you minimize the informational complexity of modelling the world as you kind of define a recursive function that acts within its own boundaries (your self). You will then act according to this, and then you have a kind of self-fulfilling prophecy as the evidence you get is based on your map which has a planning agent in it. 

(Literally self-fulfilling prophecy in this case as I think this is the "self"-loop that is talked about in meditation. It's quite cool to go outside of it.)

Comment by Jonas Hallgren on Value learning in the absence of ground truth · 2024-02-06T16:32:31.973Z · LW · GW

Good stuff.

I remember getting a unification vibe when talking to you about the first two methods. To some extent one of them is about time-based aggregation and the other one is about space or interpersonal aggregation at a point in time.

It feels to me like there is some way to compose the two methods in order to get a space-time aggregation over all agents and their expected utility functions. Maybe just as an initialisation step or similar (think Python __init__). Then, once you have the initialisation, you take Demski's method to converge to the truth. A rough sketch of the composition I have in mind is below.
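Here is that rough sketch (entirely my own toy construction with made-up weights and utilities, not either of the methods from the post): weight elicited utilities across agents and across time, and use the result as the initial value estimate that the truth-converging process then refines.

```python
# Space-time aggregation as an initialisation step for value learning.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_timesteps, n_outcomes = 5, 4, 3

# Hypothetical elicited utilities: utilities[agent, time, outcome].
utilities = rng.random((n_agents, n_timesteps, n_outcomes))

agent_weights = np.ones(n_agents) / n_agents        # interpersonal aggregation
time_weights = 0.9 ** np.arange(n_timesteps)[::-1]  # later elicitations weigh more
time_weights /= time_weights.sum()

# Space-time aggregate over agents and time, for each outcome.
init_values = np.einsum("a,t,ato->o", agent_weights, time_weights, utilities)
print("initial aggregated value estimate per outcome:", np.round(init_values, 3))
# This would only be the __init__ step; something like Demski's
# truth-converging update would then take over from this starting point.
```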

Comment by Jonas Hallgren on Trying to align humans with inclusive genetic fitness · 2024-01-12T10:21:29.469Z · LW · GW

FWIW, I think a better question to ask within this area might be if we can retain a meta-process rather than a specific value. 

You don't want to leave your footprint on the future, so why are we talking about having a specific goal such as IGF specified? Wouldn't it make more sense to talk about the characteristics of the general process of proxy-goal evolution for IGF instead? I think this would better reflect something closer to a long reflection, and to some extent I think that is what we should be aiming for.

Other than that, I like the concept of "slack" when looking at the divergence of models. If there is no way for humans to do anything but IGF, then they won't. If they're given abundance, then degrees of freedom arise and divergence is suddenly possible.

Comment by Jonas Hallgren on Reprogramming the Mind: Meditation as a Tool for Cognitive Optimization · 2024-01-11T18:55:37.820Z · LW · GW

Yeah, this is a fair point. In my personal experience, the elephant path works to build concentration, but as you say, it might be worth taking another, more holistic approach from the get-go to skip the associated problems.

Comment by Jonas Hallgren on A model of research skill · 2024-01-08T16:42:47.182Z · LW · GW

Great post!

I wanted to mention something cool I learnt the other day, which is that Buddhism was actually created with a lot of the cultural baggage already there. (This is a relevant point, let me cook.)

The Buddha actually only came up with the new invention of "dependent origination". This led to a view of the inherent emptiness (read: underdeterminedness) of phenomenology. Yet it was only one invention on top of the rest that led to a view that, in my opinion, reduces a lot of suffering.

Similarly, human evolution to where we are today is largely a process of cultural evolution, as described in The Secret of Our Success.

What I want to say is that ideas are built on other ideas and that Great Artists Steal. (Also a book)

A final statistic: interdisciplinary researchers generally have more influential papers than specialised researchers.

So what is the takeaway for me? Well, by sampling from independent sources of information, you gain a lot more richness in your models. I am therefore trying to slap together dynamical systems, active inference, and boundaries at the moment, as they seem to have a lot in common that is relevant for embedded agents.

(An extra note: GPT is actually really good at generating leads between different areas of study, especially biology + ML.)

Comment by Jonas Hallgren on Almost everyone I’ve met would be well-served thinking more about what to focus on · 2024-01-07T09:17:04.136Z · LW · GW

I find the relation between emotion and planning (system 1 and 2) very fascinating when it comes to explore/exploit tradeoffs.

My current hypothesis is that by trusting my emotions I can tap into a deeper part of my own non-linear architecture and get better exploration in my life over time.

Applying rules then feels like a very system 2 way of going about things and yet I know I'm irrational in a lot of ways and that I can't fully trust myself.

It then becomes a very interesting balance between these two and now I'm quite uncertain what is optimal.

Comment by Jonas Hallgren on Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense · 2023-11-25T07:59:09.594Z · LW · GW

Alright, I will try to visualise what I see as the disagreement here. 

It seems to me that Paul is saying that behaviourist abstractions will arise over shorter time periods as well, not only over long time horizons.

(Think of these shards as in the shard theory sense)

Nate is saying that the right picture creates stable wants more than the left, while Paul is saying that it is time-agnostic and that the relevant metric is how competent the model is.

The crux here is essentially whether longer time horizons are indicative of behaviourist shard formation. 

My thought here is that the process in the picture to the right induces more stable wants because a longer-time-horizon system is more complex, and therefore heuristics are the best decision rule. The complexity increases in such a way that there is a large enough difference between short-term tasks and long-term tasks.

Also, the Redundant Information Hypothesis might give credence to the idea that systems will over time create more stable abstractions? 
 

Comment by Jonas Hallgren on The Perils of Professionalism · 2023-11-07T08:40:47.693Z · LW · GW

I'm totally in the business of more free rationalist career advice, so please keep it going!

Comment by Jonas Hallgren on One Day Sooner · 2023-11-07T08:29:21.576Z · LW · GW

Nice, for me, this was one of those things in the business world that was kind of implicit in some models I had from before, and this post made it explicit. 

Good stuff!

Comment by Jonas Hallgren on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-03T21:53:41.826Z · LW · GW

Apparently, even being a European citizen doesn’t help.

I still think that we shouldn't have books on sourcing and building pipe bombs lying around, though.

Comment by Jonas Hallgren on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-03T21:51:01.158Z · LW · GW

Well, I'm happy to be a European citizen in that case, lol.

I really walked into that one.

Comment by Jonas Hallgren on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-03T21:46:00.659Z · LW · GW

I mean, I'm sure it isn't legal to openly sell a book on how to source material for and build a pipe bomb, right? It depends on the intent of the book and its content, among other things, so I'm half-hesitantly biting the bullet here.

Comment by Jonas Hallgren on Are humans misaligned with evolution? · 2023-10-19T07:38:50.416Z · LW · GW

I've been following this discussion from Jan's first post, and I've been enjoying it. I've put together some pictures to explain what I see in this discussion.

Something like the original misalignment might be something like this:

[figure]

This is fair as a first take, and if we want to look at it through a utility function optimisation lens, we might say something like this:

[figure]

Where cultural values is the local environment that we're optimising for. 

As Jacob mentions, humans are still very effective when it comes to general optimisation if we look directly at how well it matches evolution's utility function. This calls for a new model.

Here's what I think actually happens:

[figure]

Which can be perceived as something like this in the environmental sense:

[figure]

Based on this model, what is cultural (human) evolution telling us about misalignment? 


We have adopted proxy values (Y1, Y2, ..., YN), or culture, in order to optimise for X, or IGF. In other words, the shard of cultural values developed as a more efficient optimisation target in the new environment, where different tribes applied optimisation pressure on each other.

Also, I really enjoy the book The Secret of Our Success when thinking about these models, as it provides some very nice evidence about human evolution.

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2023-10-11T09:52:20.519Z · LW · GW

I was going through my old stuff and found this from a year and a half ago, so I thought I would post it here real quick, as I found the last idea funny and the first idea pretty interesting:

In normal business there exist consulting firms that specialise in certain topics, ensuring that organisations can take in an outside view from experts on the topic.

This seems like quite an efficient way of doing things, and something that, if built up properly within alignment, could lead to faster progress down the line. This is also something that the Future Fund seemed to be interested in, as they gave prizes for both the idea of creating an org focused on creating datasets and one focused on taking in human feedback. These are not the only possible ideas, however, and below I mention some more possible orgs that are likely to be net positive.

Examples of possible organisations:

Alignment consulting firm

Newly minted alignment researchers will probably have a while to go before they can become fully integrated into a team. One can, therefore, imagine an organisation that takes in inexperienced alignment researchers and helps them write papers. It then promotes these alignment researchers as being able to help with certain things, so that real orgs can easily take them in for contracting on specific problems. This should help involve market forces in the alignment area and should, in general, improve the efficiency of the space. There are reasons why consulting firms exist in real life, and creating the equivalent of McKinsey in alignment is probably a good idea. Yet I might be wrong on this, and if you can argue why it would make the space less efficient, I would love to hear it.

"Marketing firms"

We don't want the wrong information to spread: something between a normal marketing firm and the Chinese "marketing" agency. If it's an info-hazard, then shut the fuck up!


 

Comment by Jonas Hallgren on We don't understand what happened with culture enough · 2023-10-10T09:04:45.270Z · LW · GW

Isn't there an alternative story here where we care about the sharp left turn, but in the cultural sense, similar to Drexler's CAIS, where we have the same types of experimentation as happened during the cultural evolution phase?

You've convinced me that the sharp left turn will not happen in the classical way that people have thought about it, but are you that certain that there isn't that much free energy available in cultural style processes? If so, why?

I can imagine that there is something to say about SGD already being pretty algorithmically efficient, but I guess I would say that determining how much available free energy there is in improving optimisation processes is an open question. If the error bars are high here, how can we then know that the AI won't spin up something similar internally? 

I also want to add something about genetic fitness becoming twisted as a consequence of cultural evolutionary pressure on individuals. Culture in itself changed the optimal survival behaviour of humans, which meant that the meta-level optimisation loop changed the underlying optimisation loop. Isn't culture changing the objective function still a problem that we potentially have to contend with, even though it might not be as difficult as the normal sharp left turn?

For example, let's say that we deploy GPT-6 and it figures out that the loosely defined objective we have determined for it using (Constitutional AI)^2 should be discussed by many different iterations of itself, creating a democratic process of multiple CoT reasoners. This meta-process seems, in my opinion, like something that the cultural evolution hypothesis would predict is more optimal than just one GPT-6, and it also seems a lot harder to align than normal?

Comment by Jonas Hallgren on High-level interpretability: detecting an AI's objectives · 2023-09-29T09:59:07.447Z · LW · GW

Very nice! I think work in this general direction is more or less what is needed if we want to survive.

I just wanted to probe a bit when it comes to turning these methods into governance proposals. Do you see ways of creating databases/tests for objective measurement, or how do you see this being used in policy and the real world?

(Obviously, I get that understanding AI will be better for less doom, but I'm curious about your thoughts on the last implementation step)