Posts

Reprograming the Mind: Meditation as a Tool for Cognitive Optimization 2024-01-11T12:03:41.763Z
How well does your research adress the theory-practice gap? 2023-11-08T11:27:52.410Z
Jonas Hallgren's Shortform 2023-10-11T09:52:20.390Z
Advice for new alignment people: Info Max 2023-05-30T15:42:20.142Z
Respect for Boundaries as non-arbirtrary coordination norms 2023-05-09T19:42:13.194Z
Max Tegmark's new Time article on how we're in a Don't Look Up scenario [Linkpost] 2023-04-25T15:41:16.050Z
The Benefits of Distillation in Research 2023-03-04T17:45:22.547Z
Power-Seeking = Minimising free energy 2023-02-22T04:28:44.075Z
Black Box Investigation Research Hackathon 2022-09-12T07:20:34.966Z
Announcing the Distillation for Alignment Practicum (DAP) 2022-08-18T19:50:31.371Z
Does agent foundations cover all future ML systems? 2022-07-25T01:17:11.841Z
Is it worth making a database for moral predictions? 2021-08-16T14:51:54.609Z
Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven't we started yet? 2021-02-25T22:06:04.695Z

Comments

Comment by Jonas Hallgren on My experience using financial commitments to overcome akrasia · 2024-04-16T18:28:50.949Z · LW · GW

Damn, great post, thank you!

I saw that you used Freedom; a random tip is to use the AppBlock app instead, as it is more powerful, together with Cold Turkey Blocker on the computer. (If you want to, there are ways to get around the other blockers.)

That's all I wanted to say really; I will probably try it out in the future. I was thinking of giving myself an allowance or something similar that I could spend on the app and seeing whether it would increase my productivity.

Comment by Jonas Hallgren on [deleted post] 2024-04-02T10:04:23.500Z

This was a dig at interpretability research. I'm pro-interpretability research in general, so if you feel personally attacked by this, it wasn't meant to be too serious. Just be careful with infohazards, ok? :)

Comment by Jonas Hallgren on Gradient Descent on the Human Brain · 2024-04-02T08:49:11.507Z · LW · GW

I think Neuralink already did this, actually; a bit late to the point, but a good try anyway. Also, have you considered having Michael Bay direct the research effort? I think he did a pretty good job with the first Transformers.

Comment by Jonas Hallgren on Orthogonality Thesis seems wrong · 2024-03-26T15:33:06.934Z · LW · GW

Yeah, I agree with what you just said; I should have been more careful with my phrasing. 

Maybe something like: "The naive version of the orthogonality thesis where we assume that AIs can't converge towards human values is assumed to be true too often"

Comment by Jonas Hallgren on Orthogonality Thesis seems wrong · 2024-03-26T12:26:33.345Z · LW · GW

Compared to other people on this site, this is part of my alignment optimism. I think there are natural abstractions in the moral landscape that make agents converge towards cooperation and similar things. I read this post recently, and Leo Gao made an argument that concave agents generally don't exist because they stop existing. I think there are pressures that conform agents to parts of the value landscape.

Like, I agree that the orthogonality thesis is presumed to be true way too often. It is more an argument that convergence may not happen by default, but I'm also uncertain about how much evidence it actually gives you.

Comment by Jonas Hallgren on All About Concave and Convex Agents · 2024-03-26T12:18:52.926Z · LW · GW

Any SBF enjoyers?

Comment by Jonas Hallgren on GPT, the magical collaboration zone, Lex Fridman and Sam Altman · 2024-03-19T15:53:42.708Z · LW · GW

I have the same experience; I love having it connect two disparate topics together, it's very fun. I had the thought today that I use GPT as a brainstorming partner for basically 80%+ of the work tasks I do.

Comment by Jonas Hallgren on Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing? · 2024-03-16T07:35:10.454Z · LW · GW

Hey! I saw that you had a bunch of downvotes and I wanted to get in here before you became too disillusioned with the LW crowd. A big point for me is that you don't really have any sub-headings or examples that are more straight to the point. It is all one long text that seems to follow how you directly thought, which makes it really hard to engage with what you say. Of course you're saying controversial things, but if there were more clarity I think you would get more engagement.

(GPT is really op for this nowadays.) Anyway, I wish you the best of luck! I'm also sorry for not engaging with any of your arguments, but I couldn't quite follow.

Comment by Jonas Hallgren on Toward a Broader Conception of Adverse Selection · 2024-03-15T07:30:11.484Z · LW · GW

Alright, quite a ba(y)sed point there, very nice. My lazy ass is looking for a heuristic here. It seems like the more the EMH holds in a situation (the more optimisation pressure has been applied), the more you should expect to be disappointed with a trade.

But what is a good heuristic for how much worse it will be? Maybe one just has to think about the counterfactual option each time?
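To make that heuristic concrete, here is a minimal sketch (my own toy model in the optimiser's-curse spirit, not something from the original post): the more options you select over with noisy estimates, the larger the expected gap between how good the chosen trade looked and how good it actually is.

```python
# Toy model: each option has a true value plus estimation noise; you pick the
# option that looks best. More options ~ more optimisation pressure, and the
# expected disappointment (estimate minus true value of the pick) grows with it.
import numpy as np

rng = np.random.default_rng(0)

def expected_disappointment(n_options: int, n_sims: int = 100_000) -> float:
    """Average (estimated value - true value) of the option that looked best."""
    true_values = rng.normal(0.0, 1.0, size=(n_sims, n_options))
    estimates = true_values + rng.normal(0.0, 1.0, size=(n_sims, n_options))
    chosen = estimates.argmax(axis=1)                      # take whatever looks best
    rows = np.arange(n_sims)
    return float((estimates[rows, chosen] - true_values[rows, chosen]).mean())

for n in [1, 5, 25, 125]:                                  # increasing selection pressure
    print(n, round(expected_disappointment(n), 2))
```

So a crude heuristic might be: scale your expected disappointment with how much selection has already been applied to the trade before it reached you.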

Comment by Jonas Hallgren on Highlights from Lex Fridman’s interview of Yann LeCun · 2024-03-14T20:35:33.405Z · LW · GW

I thought the orangutan argument was pretty good when I first saw it, but then I looked it up and realised that it is not that they aren't power-seeking; it is more that they only are in interactions that matter for the future survival of their offspring. It actually is a very flimsy argument. Some of the things he says are smart, like some of the stuff on the architecture front, but he always talks about his aeroplane analogy in AI safety. That one is really dumb, as I wouldn't get into an aeroplane without knowing that it has been safety checked, and as a consequence I have a hard time taking him seriously when it comes to safety.

Comment by Jonas Hallgren on Explaining the AI Alignment Problem to Tibetan Buddhist Monks · 2024-03-07T18:38:30.818Z · LW · GW

Very cool! It might be interesting to mention the connection between what the Buddha called dependent origination and the formation of a self-view of being an agent.

The idea is that your self is built through a loop of expecting your self to be there in the future, thus creating a self-fulfilling prophecy. This is similar to how agents are defined in the intentional stance, as it is informationally more efficient to express yourself as an agent.

A way to view the alignment problem is through a self-loop taking over fully, or dependent origination in artificial agents. Anyway, I think it seems very cool and I wish you the best of luck!

Comment by Jonas Hallgren on Many arguments for AI x-risk are wrong · 2024-03-05T07:26:37.826Z · LW · GW

I notice being confused about the relationship between power-seeking arguments and counting arguments. Since I'm confused, I'm assuming others are too, so I would appreciate some clarity on this.

In footnote 7, Turner mentions that the paper "Optimal Policies Tend to Seek Power" is an example of the irrelevant counting error the post describes.

In my head, I think of the counting argument as saying that it is hard to hit an alignment target because there are a lot more non-alignment targets. This argument is (clearly?) wrong for reasons specified in the post. Yet this doesn't address power-seeking, as that seems more like an optimisation pressure applied to the system, not something dependent on counting arguments?

In my head, power-seeking is more like saying that an agent's attraction basin is larger at one point of the optimisation landscape compared to another. The same can also be said about deception here.

I might be dumb, but I never thought of the counting argument as true, nor as crucial to either deception or power-seeking. I'm very happy to be enlightened about this issue.

Comment by Jonas Hallgren on Counting arguments provide no evidence for AI doom · 2024-02-28T21:04:22.552Z · LW · GW

I buy the argument that scheming won't happen, conditional on us not allowing much slack between different optimisation steps. As Quintin mentions in his AXRP podcast episode, SGD doesn't have close to the same level of slack that, for example, cultural evolution allowed. (See the entire free-energy-of-optimisation debate here from before; I can't remember the post names ;/) Iff that holds, then I don't see why the inner behaviour should diverge from what the outer alignment loop specifies.

I do, however, believe it is important to make this true by specifying the right outer alignment loop as well as the right deployment environment, so that slack is minimised at all points along the chain and misalignment is avoided everywhere.

If we catch deception in training, we will be ok. If we catch actors that might create deceptive agents in training, we will be ok. If we catch states developing agents to do this, or if defense > offense, we will be ok. I do not believe that any of this happens by default.

Comment by Jonas Hallgren on AI #51: Altman’s Ambition · 2024-02-21T22:43:14.918Z · LW · GW

I know this is basically downvote farming on LW, but I find the idea of morality being downstream from the free energy principle very interesting.

Jezos obviously misses out on a bunch of game theoretic problems that arise, and FEP lacks explanatory power in such a domain, so it is quite clear to me that we shouldn't do this. I do think it's fundamentally true, just like how utilitarianism is fundamentally true. The only problem is that he's applying it naively.

I don't want to bet the future of humanity on this belief, but what if is = ought, and we have just misconstrued it by adopting proxy goals along the way? (IGF gang rise!)

Comment by Jonas Hallgren on Natural abstractions are observer-dependent: a conversation with John Wentworth · 2024-02-13T16:05:39.189Z · LW · GW

Uh, I binged like 5 MLST episodes with Friston, but I think it's a bit later in this one with Stephen Wolfram: https://open.spotify.com/episode/3Xk8yFWii47wnbXaaR5Jwr?si=NMdYu5dWRCeCdoKq9ZH_uQ

It might also be this one: https://open.spotify.com/episode/0NibQiHqIfRtLiIr4Mg40v?si=wesltttkSYSEkzO4lOZGaw

Sorry for the unsatisfactory answer :/

Comment by Jonas Hallgren on Natural abstractions are observer-dependent: a conversation with John Wentworth · 2024-02-13T10:46:44.357Z · LW · GW

Great comment, I just wanted to share a thought on my perception of the why in relation to the intentional stance. 

Basically, my hypothesis that I stole from Karl Friston is that an agent is defined as something that applies the intentional stance to itself. Or, in other words, something that plans with its own planning capacity or itself in mind. 

One can relate it to the entire membranes/boundaries discussion here on LW as well, in that if you plan as if you have a non-permeable boundary, then the informational complexity of the world goes down. By applying the intentional stance to yourself, you minimize the informational complexity of modelling the world, as you kind of define a recursive function that acts within its own boundaries (your self). You will then act according to this, and you get a kind of self-fulfilling prophecy, as the evidence you get is based on your map, which has a planning agent in it.

(Literally self-fulfilling prophecy in this case as I think this is the "self"-loop that is talked about in meditation. It's quite cool to go outside of it.)

Comment by Jonas Hallgren on Value learning in the absence of ground truth · 2024-02-06T16:32:31.973Z · LW · GW

Good stuff.

I remember getting a unification vibe when talking to you about the first two methods. To some extent one of them is about time-based aggregation and the other one is about space or interpersonal aggregation at a point in time.

It feels to me that there is some way to compose the two methods in order to get a space-time aggregation of all agents and their expected utility functions. Maybe just as an initialisation step or similar (think Python __init__). Then, once you have the initialisation, you take Demski's method to converge to the truth.
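Something like this minimal sketch of what I mean (my own illustration with made-up function names, not anything from the post): aggregate each agent over time first, then across agents, and use that as the initialisation.

```python
# Compose time-based aggregation with interpersonal aggregation as an init step.
from typing import Sequence

def time_aggregate(utilities_over_time: Sequence[float], discount: float = 0.99) -> float:
    """Collapse one agent's utility trajectory into a single discounted value."""
    return sum(u * discount ** t for t, u in enumerate(utilities_over_time))

def interpersonal_aggregate(agent_values: Sequence[float]) -> float:
    """Collapse the per-agent values into one number (here: a simple mean)."""
    return sum(agent_values) / len(agent_values)

def spacetime_init(trajectories: Sequence[Sequence[float]], discount: float = 0.99) -> float:
    """Space-time aggregation used as the initialisation for a later refinement step."""
    return interpersonal_aggregate([time_aggregate(traj, discount) for traj in trajectories])

# Example: two agents, three time steps each.
print(spacetime_init([[1.0, 0.5, 0.0], [0.0, 1.0, 1.0]]))
```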

Comment by Jonas Hallgren on Trying to align humans with inclusive genetic fitness · 2024-01-12T10:21:29.469Z · LW · GW

FWIW, I think a better question to ask within this area might be if we can retain a meta-process rather than a specific value. 

You don't want to leave your footprint on the future, so why are we talking about specifying a particular goal such as IGF? Wouldn't it make more sense to talk about the characteristics of the general process of proxy-goal evolution for IGF instead? I think this would better reflect something closer to a long reflection, and to some extent I think that is what we should be aiming for.

Other than that, I like the concept of "slack" when looking at the divergence of models. If there is no way for humans to do anything other than IGF, then they won't. If they're given abundance, then degrees of freedom arise and divergence is suddenly possible.

Comment by Jonas Hallgren on Reprograming the Mind: Meditation as a Tool for Cognitive Optimization · 2024-01-11T18:55:37.820Z · LW · GW

Yeah, this is a fair point. In my personal experience, the elephant path works to build concentration, but as you say, it might be worth doing another, more holistic approach from the get-go to skip the associated problems.

Comment by Jonas Hallgren on A model of research skill · 2024-01-08T16:42:47.182Z · LW · GW

Great post!

I wanted to mention something cool I learnt the other day, which is that Buddhism was actually created with a lot of the cultural baggage already there. (This is a relevant point, let me cook.)

The Buddha actually only came up with the new invention of "Dependent Origination". This led to a view of the inherent emptiness (read: underdeterminedness) of phenomenology. Yet it was only one invention on top of the rest that led to a view that, in my opinion, reduces a lot of suffering.

Similarly human evolution to where we are today is largely a process of cultural evolution as described in The Secret Of Our Success.

What I want to say is that ideas are built on other ideas and that Great Artists Steal. (Also a book)

A final statistic: interdisciplinary researchers generally have more influential papers than specialised researchers.

So what is the takeaway for me? Well, by sampling from independent sources of information, you gain a lot more richness in your models. I am therefore trying to slap together dynamical systems, Active Inference and Boundaries at the moment, as they seem to have a lot in common that is relevant for embedded agents.

(Extra note is that GPT is actually really good at generating leads in between different areas of study. Especially biology + ML.)

Comment by Jonas Hallgren on Almost everyone I’ve met would be well-served thinking more about what to focus on · 2024-01-07T09:17:04.136Z · LW · GW

I find the relation between emotion and planning (System 1 and System 2) very fascinating when it comes to explore/exploit tradeoffs.

My current hypothesis is that by trusting my emotions I can tap into a deeper part of my own non-linear architecture and get better exploration in my life over time.

Applying rules then feels like a very system 2 way of going about things and yet I know I'm irrational in a lot of ways and that I can't fully trust myself.

It then becomes a very interesting balance between these two and now I'm quite uncertain what is optimal.

Comment by Jonas Hallgren on Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense · 2023-11-25T07:59:09.594Z · LW · GW

Alright, I will try to visualise what I see as the disagreement here. 

It seems to me that Paul is saying that behaviourist abstractions will form over shorter time periods, not just over long time horizons.

(Think of these shards as in the shard theory sense)

Nate is saying that the right picture creates stable wants more than the left and Paul is saying that it is time-agnostic and that the relevant metric is how competent the model is. 

The crux here is essentially whether longer time horizons are indicative of behaviourist shard formation. 

My thought here is that the process in the picture to the right induces more stable wants because a longer-time-horizon system is more complex, and therefore heuristics are the best decision rule. The complexity increases in such a way that there is a large enough difference between short-term tasks and long-term tasks.

Also, the Redundant Information Hypothesis might give credence to the idea that systems will over time create more stable abstractions? 
 

Comment by Jonas Hallgren on The Perils of Professionalism · 2023-11-07T08:40:47.693Z · LW · GW

I'm totally in the business of more free rationalist career advice, so please keep it going!

Comment by Jonas Hallgren on One Day Sooner · 2023-11-07T08:29:21.576Z · LW · GW

Nice, for me, this was one of those things in the business world that was kind of implicit in some models I had from before, and this post made it explicit. 

Good stuff!

Comment by Jonas Hallgren on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-03T21:53:41.826Z · LW · GW

Apparently, even being a European citizen doesn’t help.

I still think that we shouldn't have books on sourcing and building pipe bombs lying around, though.

Comment by Jonas Hallgren on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-03T21:51:01.158Z · LW · GW

Well, I'm happy to be a European citizen in that case, lol.

I really walked into that one.

Comment by Jonas Hallgren on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-03T21:46:00.659Z · LW · GW

I mean, I'm sure it isn't legal to openly sell a book on how to source material for and build a pipe bomb, right? It depends on the intent of the book and its content, among other things, so I'm half-hesitantly biting the bullet here.

Comment by Jonas Hallgren on Are humans misaligned with evolution? · 2023-10-19T07:38:50.416Z · LW · GW

I've been following this discussion from Jan's first post, and I've been enjoying it. I've put together some pictures to explain what I see in this discussion.

Something like the original misalignment might be something like this:

                                                    
 

This is fair as a first take, and if we want to look at it through a utility function optimisation lens, we might say something like this:

[image]

Where cultural values are the local environment that we're optimising for.

As Jacob mentions, humans are still very effective when it comes to general optimisation if we look directly at how well it matches evolution's utility function. This calls for a new model.

Here's what I think actually happens:

[image]

Which can be perceived as something like this in the environmental sense:

[image]

Based on this model, what is cultural (human) evolution telling us about misalignment? 


We have adopted proxy values (Y1, Y2, ..., YN), or culture, in order to optimise for X, or IGF. In other words, the shard of cultural values developed as a more efficient optimisation target in the new environment, where different tribes applied optimisation pressure on each other.

Also, I really enjoy the book The Secret Of Our Success when thinking about these models as it provides some very nice evidence about human evolution. 

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2023-10-11T09:52:20.519Z · LW · GW

I was going through my old stuff and found this from a year and a half ago, so I thought I would post it here real quick, as I found the last idea funny and the first idea pretty interesting:

In normal business there exist consulting firms that are specialised in certain topics, ensuring that organisations can take in an outside view from experts on the topic.

This seems like quite an efficient way of doing things and something that, if built up properly within alignment, could lead to faster progress down the line. This is also something that the Future Fund seemed to be interested in, as they gave prizes both for the idea of creating an org focused on creating datasets and for one on taking in human feedback. These are not the only ideas that are possible, however, and below I mention some more possible orgs that are likely to be net positive.

Examples of possible organisations:

Alignment consulting firm

Newly minted alignment researchers will probably have a while to go before they can become fully integrated into a team. One can, therefore, imagine an organisation that takes in inexperienced alignment researchers and helps them write papers. It then promotes these alignment researchers as being able to help with certain things, and real orgs can easily take them in for contracting on specific problems. This should help involve market forces in the alignment area and should, in general, improve the efficiency of the space. There are reasons why consulting firms exist in real life, and creating the equivalent of McKinsey in alignment is probably a good idea. Yet I might be wrong on this, and if you can argue why it would make the space less efficient, I would love to hear it.

"Marketing firms"

We don't want the wrong information to spread; think something between a normal marketing firm and the Chinese "marketing" agency. If it's an info-hazard, then shut the fuck up!


 

Comment by Jonas Hallgren on We don't understand what happened with culture enough · 2023-10-10T09:04:45.270Z · LW · GW

Isn't there an alternative story here where we care about the sharp left turn, but in the cultural sense, similar to Drexler's CAIS where we have similar types of experimentation as happened during the cultural evolution phase? 

You've convinced me that the sharp left turn will not happen in the classical way that people have thought about it, but are you that certain that there isn't that much free energy available in cultural style processes? If so, why?

I can imagine that there is something to say about SGD already being pretty algorithmically efficient, but I guess I would say that determining how much available free energy there is in improving optimisation processes is an open question. If the error bars are high here, how can we then know that the AI won't spin up something similar internally? 

I also want to add something about genetic fitness becoming twisted as a consequence of cultural evolutionary pressure on individuals. Culture in itself changed the optimal survival behaviour of humans, which then meant that the meta-level optimisation loop changed the underlying optimisation loop. Isn't the culture changing the objective function still a problem that we have to potentially contend with, even though it might not be as difficult as the normal sharp left turn?

For example, let's say that we deploy GPT-6 and it figures out that, in order to solve the loosely defined objective we have determined for it using (Constitutional AI)^2, the objective should be discussed by many different iterations of itself to create a democratic process of multiple CoT reasoners. This meta-process seems, in my opinion, like something the cultural evolution hypothesis would predict is more optimal than just one GPT-6, and it also seems a lot harder to align than normal?

Comment by Jonas Hallgren on High-level interpretability: detecting an AI's objectives · 2023-09-29T09:59:07.447Z · LW · GW

Very nice! I think work in this general direction is what is more or less needed if we want to survive. 

I just wanted to probe a bit when it comes to turning these methods into governance proposals. Do you see ways of creating databases/tests for objective measurement or how do you see this being used in policy and the real world?

(Obviously, I get that understanding AI will be better for less doom, but I'm curious about your thoughts on the last implementation step) 

Comment by Jonas Hallgren on The Talk: a brief explanation of sexual dimorphism · 2023-09-20T07:17:17.221Z · LW · GW

When I inevitably have to explain to my future offspring why I didn't duplicate myself, I will link them to this post; thank you for this gem.

Comment by Jonas Hallgren on Incentives affecting alignment-researcher encouragement · 2023-08-29T15:00:37.057Z · LW · GW

Well, incubators and many smaller bets are usually the best approach in this type of situation since you want to black-swan farm, as you say. 

This is the approach I'm generally taking right now. Similar to the pyramid-scheme argument for getting more people into EA, I think it's worth mentoring new people on alignment perspectives.
So some stuff you could do:
1. Start a local organisation where you help people understand alignment and help them get perspectives. (Note that this only works above a certain quality of thinking, but getting that is also a numbers game IMO.)
2. Create and mentor isolated groups of independent communities that develop alignment theories. (You want people to be weird but not too weird, meaning that they should be semi-independent from the larger community)
3. Theoretical alignment conference/competition with proposals optimised for being interesting in weird ways. 

I'm trying to do this specifically in the Nordics, as I think there are a bunch of smart people here who don't want to move to the "talent hubs", so my marginal impact might be higher. To be honest, I'm uncertain how much to focus on this vs. developing my personal alignment theories more, but I'm a strong pyramid-scheme believer. I could, however, be wrong on this, and I would love to hear some more takes on it.
 

Comment by Jonas Hallgren on Ten Thousand Years of Solitude · 2023-08-21T09:51:33.539Z · LW · GW

I really like the overall picture that Guns, Germs and Steel presents. A book I believe complements it very well, if one is interested in the evolution of the human species, is The Secret of Our Success, which goes into more of the mechanisms of cultural evolution and our current theories for why Tasmanians, for example, fell behind mainland Australia as much as they did.

Comment by Jonas Hallgren on I'm consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful? · 2023-07-22T15:20:47.775Z · LW · GW

Maybe a shit answer, but becoming great at meditation allows you to be with emotions without trouble. Chores then become an opportunity to deepen your awareness and practice (more long-term, but you cut the root of the tree).

Comment by Jonas Hallgren on News : Biden-⁠Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI · 2023-07-21T21:27:47.027Z · LW · GW

Uhm, fuck yeah?

If someone had told me this 2 years ago I wouldn't have believed it; kinda feels like we're doing a Dr Strange on the timeline right now.

Comment by Jonas Hallgren on Thoughts on “Process-Based Supervision” · 2023-07-18T10:19:14.023Z · LW · GW

I feel like there's a point worth emphasizing here about myopia and the existing work around that, as OP stated. I don't think of myopia as generally promising because of FDT-style reasoning since a VNM-style agent would continually optimise for consistency over longer time periods. 

Therefore, it seems like myopic-reasoning RLHF will go towards the same failure modes as normal RLHF in the limit as the agent becomes more capable. (I'm happy to be shown I'm wrong here.)
This also depends on questions such as to what extent the underlying base model will be a maximiser, and on the agency model of the base model (which OP also states).

If someone were to provide a convincing story that showed;
1. How this method could be used whilst counteracting deception
2. An example of how this would look from the inside of the AI
3. How the model itself doesn't converge towards reasoning RLHF
4. How this then itself is happening inside a generally capable AI

Then I might be convinced that it is a good idea. 

Comment by Jonas Hallgren on Ethodynamics of Omelas · 2023-06-11T10:56:42.007Z · LW · GW

Man, this was a really fun and exciting post! Thank you for writing it. 

 

Maybe there's a connection to the FEP here? I remember Karl Friston saying something about how we can see morality as downstream from the FEP, which is (kinda?) used here, maybe? 

Comment by Jonas Hallgren on Advice for new alignment people: Info Max · 2023-06-01T08:28:43.991Z · LW · GW

I actually read fewer books than I used to; the 3x thing was that I listen to audiobooks at 3x speed, so I read less non-fiction but at a faster pace.

Also, weird but useful in my head is, for example, looking into population dynamics to understand alignment failures. When does ecology predict that mode collapse will happen inside of large language models? Understanding these areas and writing about them is weird, but it could also be a useful bet for at least someone to take.

However, this also depends on how saturated doing the normal stuff is. I would recommend trying to understand the problems and current approaches really well and then coming up with ways of tackling them. To get the bits of information on how to tackle them, you might want to check out weirder fields, since those bits aren't already in the common pool of "alignment information", if that makes sense?

Comment by Jonas Hallgren on The bullseye framework: My case against AI doom · 2023-05-31T07:07:15.443Z · LW · GW

Generally a well-argued post; I enjoyed it even though I didn't agree with all of it. 

I do want to point out the bitter lesson when it comes to capabilities increase. On current priors, it seems like intelligence should be something that can solve a lot of tasks at the same time. This would point towards higher capabilities in individual AIs, especially once you add online learning to the mix. The AGI will not have a computational storage limit for the amount of knowledge it can have. The division of agents you propose can most likely be made into the same agent; it's more about storage retrieval time here, and storing an activation module for "play chess" is something that will not be computationally intractable for an AGI to do.

This means that the most probable current path forward is towards highly capable general AIs that generalise across tasks.

Comment by Jonas Hallgren on Advice for new alignment people: Info Max · 2023-05-31T06:22:26.978Z · LW · GW

Since you asked so nicely, I can give you two other models. 

1. Meditation is like slow recursive self-improvement and reprogramming of your mind. It gives focus & mental health benefits that are hard to get from other places. If you want to accelerate your growth, I think it's really good. A mechanistic model of meditation & then doing the stages in the book The Mind Illuminated will give you this. (at least, this is how it has been for me)
2. Try to be weird and useful. If you have a weird background, you will catch ideas that might fall through the cracks for other people. Yet to make those ideas worth something you have to be able to actually take action on them, meaning you need to know how to, for example, communicate. So try to find the Pareto optimal between weird and useful by following & doing stuff you find interesting, but also valuable and challenging.

(Read a fuckton of non-fiction books as well if that isn't obvious. Just look up 30 different top 10 lists and you will have a good range to choose from.)

Comment by Jonas Hallgren on "Membranes" is better terminology than "boundaries" alone · 2023-05-29T06:23:12.690Z · LW · GW

I just wanted to say that you have my vote of confidence on this. It makes the intuitions behind the idea more salient as well.

Comment by Jonas Hallgren on Respect for Boundaries as non-arbirtrary coordination norms · 2023-05-16T08:13:00.492Z · LW · GW

Sorry for not responding earlier; working on a post that goes through related things in more detail. 

Well I'm just saying that the red blob goes outside the striped circle. The red blob is our viscera, which has now flowed outside our boundary. 

I imagine boundaries as a way of depicting the world: the viscera is the "object" that is in the territory, whilst the boundary is our map of that object. This means the viscera can change without the boundary changing, which in turn leads to a mismatch and, in this case, both exfiltration and infiltration.

Comment by Jonas Hallgren on [deleted post] 2023-05-15T12:28:21.555Z

I'm still in the hitting-my-head-against-a-brick-wall stage with Discovering Agents (even though I've read it twice), so I appreciate the overview. I didn't pick up what an agent would be in your overview, so if you could give a pointer, that would be highly appreciated.

Also, the goal of the paper, in my mind, is to "provide a potential way of inferring agentic behaviour in a system", so is the paper then an existence proof of this being possible?

Comment by Jonas Hallgren on Support Structures for Naturalist Study · 2023-05-15T11:24:41.426Z · LW · GW

I wanted to bring up a mode of cognition related to the undirected time, inspired by Kaj Sotala's multiagent models of mind, which I think of as directed undirected time. It's basically defining a vague area, such as how to bridge natural abstractions to interpretability, where I allow my thoughts to roam free. If I get a thought such as "Ah man, I should have been nicer in that situation yesterday", then I don't engage with it. It's related to "problem-solving meditations" and something very beneficial to me, especially in the context of a walk.
For you who like visualisations, you can imagine it as defining an area where thoughts are allowed to bounce around.


(A caveat is that I have meditated quite a bit, so I have a pretty good introspective awareness which might be required for directed undirected exploration, I'm pretty sure it helps at least.)

 

Comment by Jonas Hallgren on Clarifying and predicting AGI · 2023-05-05T08:10:43.879Z · LW · GW

I want to point out the two main pillars I think your model has to assume for it to be the best model for prediction. (I think they're good assumptions)

1. There is a steep difficulty increase in training NNs that act over larger time spans. 
2. This is the best metric to use as it outcompetes all other metrics when it comes to making useful predictions about the efficacy of NNs. 

I like the model, and I think it's better than not having one. I do think it misses out on some of the things Steven Byrnes responds with. There's a danger of it being too much of a Procrustean bed, or overfitted, as specific subtasks and kinds of cognition that humans have evolved might be harder to replicate than others. The main bottlenecks might then not lie in the temporal planning distance but in something else.

My prior on t-AGI not being overfitted is probably something like 60-80%, due to the bitter lesson, which to some extent tells us that cognition can be replicated quite well with Deep Learning. So I tend to agree, but I would have liked to see a bit more epistemic humility on this point, I guess.

Comment by Jonas Hallgren on A freshman year during the AI midgame: my approach to the next year · 2023-04-14T09:50:06.709Z · LW · GW

I feel like this is trying to say something important but my brain isn't parsing it.

First and foremost, what categorisation are we talking about? Secondly, in what way are the categories framed in terms of social perception? Thirdly, what do you mean by direction and how does Buck confuse the direction?

(Sorry if this is obvious)

Comment by Jonas Hallgren on Power-Seeking = Minimising free energy · 2023-04-08T20:02:40.986Z · LW · GW

Sorry for not responding earlier, these are great points and it's taking me a bit of time to digest them.

I can say that, with regards to the first point, I'm uncertain what I mean myself. It is rather that I'm pointing out that these mechanics should exist in the internals of LLMs with some type of RL training. (Or, to be more specific, some form of internal agentic competition dynamics, where an agent is defined as an entity that is able to build world-models based on its action outputs.)

I will give you a more well-thought-out answer to your symbiosis argument in a bit. The only thing I want to say for now is that it seems to me that humans are non-symbiotic on average. Also, shouldn't symbiosis only be productive if you have the same utility function (reproductive fitness in ecology)? I think a point here might be that symbiosis doesn't arise in AGI-human interactions for that reason.

Comment by Jonas Hallgren on Catching the Eye of Sauron · 2023-04-07T14:49:02.827Z · LW · GW

I will say that I thought Connor Leahy's talk on ML Street Talk was amazing, and that we should, if possible, get Connor to go on Lex Fridman?

The dude looks like a tech wizard and is smart, funny, charming and a short timeline doomer. What else do you want?

Anyway, we should create a council of charming doomers or something and send them at the media; it would be very epic. (I am in full agreement with this post, btw.)

Comment by Jonas Hallgren on Beren's "Deconfusing Direct vs Amortised Optimisation" · 2023-04-05T17:18:06.842Z · LW · GW

When reading this, I have a question about where amortised optimisation lies between a quantiliser and an optimiser. Like, how much do we run into maximised-VNM-utility-style problems if we were to scale this up into AGI-like systems?

My vibe is that it seems less maximising than a pure RL version would be, but then again, I'm not certain to what extent optimising for function approximation is different from optimising for a reward.
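For what I mean by the spectrum, here is a minimal toy sketch (my own illustration, not anything from Beren's post; the utility function and names are made up): a quantiliser samples from the top q fraction of actions under a base distribution, and shrinking q slides it towards a pure argmax optimiser.

```python
# Toy quantiliser: q = 1.0 samples from the whole base distribution,
# q -> 0 approaches argmax (a pure optimiser).
import random

def quantilise(actions, utility, q: float, rng: random.Random):
    """Sample uniformly from the top q fraction of actions ranked by utility."""
    ranked = sorted(actions, key=utility, reverse=True)
    cutoff = max(1, int(len(ranked) * q))
    return rng.choice(ranked[:cutoff])

rng = random.Random(0)
actions = list(range(100))
utility = lambda a: -(a - 70) ** 2          # toy utility peaked at action 70

for q in [1.0, 0.25, 0.05, 0.01]:
    picks = [quantilise(actions, utility, q, rng) for _ in range(5)]
    print(f"q={q}: {picks}")
```

The question is then roughly which effective q an amortised policy behaves like as it scales.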