Posts

Model Integrity: MAI on Value Alignment 2024-12-05T17:11:31.707Z
Reprograming the Mind: Meditation as a Tool for Cognitive Optimization 2024-01-11T12:03:41.763Z
How well does your research adress the theory-practice gap? 2023-11-08T11:27:52.410Z
Jonas Hallgren's Shortform 2023-10-11T09:52:20.390Z
Advice for new alignment people: Info Max 2023-05-30T15:42:20.142Z
Respect for Boundaries as non-arbirtrary coordination norms 2023-05-09T19:42:13.194Z
Max Tegmark's new Time article on how we're in a Don't Look Up scenario [Linkpost] 2023-04-25T15:41:16.050Z
The Benefits of Distillation in Research 2023-03-04T17:45:22.547Z
Power-Seeking = Minimising free energy 2023-02-22T04:28:44.075Z
Black Box Investigation Research Hackathon 2022-09-12T07:20:34.966Z
Announcing the Distillation for Alignment Practicum (DAP) 2022-08-18T19:50:31.371Z
Does agent foundations cover all future ML systems? 2022-07-25T01:17:11.841Z
Is it worth making a database for moral predictions? 2021-08-16T14:51:54.609Z
Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven't we started yet? 2021-02-25T22:06:04.695Z

Comments

Comment by Jonas Hallgren on o3 · 2024-12-20T20:38:12.373Z · LW · GW

Extremely long chain of thought, no?

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2024-12-20T07:25:50.580Z · LW · GW

Yes, problems; yes, people are being really stupid; yes, inner alignment and all of its cousins are really hard to solve. We're generally a bit fucked, I agree. The brick wall is so high we can't see the edge, and we have to bash out each brick one at a time, and it is hard, really hard.

I get it, people, and yet we've got a shot, don't we? The probability distribution over all potential futures is being dragged towards better futures because of the work you put in, and I'm very grateful for that.

Like, I don't know how much credit to give LW and the alignment community for the spread of alignment and AI Safety as an idea, but we've literally got Nobel Prize winners talking about this shit now. Think back 4 years, what the fuck? How did this happen? 2019 -> 2024 has been an absolutely insane amount of change in the world, especially from an AI Safety perspective.

How do we have over 4 AI Safety Institutes in the world? It's genuinely mind-boggling to me and I'm deeply impressed and inspired, which I think you should be too.

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2024-12-20T07:04:14.344Z · LW · GW

I just saw a post from AI Digest on a Self-Awareness benchmark and I just thought, "holy fuck, I'm so happy someone is on top of this".

I noticed a deep gratitude for the alignment community for taking this problem so seriously. I personally see many good futures but that’s to some extent built on the trust I have in this community. I'm generally incredibly impressed by the rigorous standards of thinking, and the amount of work that's been produced.

When I was a teenager I wanted to join a community of people who worked their asses off to make sure humanity survived into a future in space, and I'm very happy I found it.

So, thank you to every single one of you working on this problem for giving us a shot at making it.

(I feel a bit cheesy for posting this, but I want to see more gratitude in the world, and I noticed it as a genuine feeling, so I figured, fuck it, let's thank these awesome people for their work.)

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2024-12-18T08:51:11.224Z · LW · GW

Could someone please safety-pill The Onion? I think satire is the best way to deal with people being really stupid, so I want more of this as an argument when talking with the e/acc gang: https://youtu.be/s-BducXBSNY?si=j5f8hNeYFlBiWzDD

(Also if they already have some AI stuff, feel free to link that too)

Comment by Jonas Hallgren on A Public Choice Take on Effective Altruism · 2024-12-16T09:43:24.403Z · LW · GW

I guess the solution you're more generally pointing at here is something like ensuring a split between the incentives of the people within the specific fields and those of EA itself as a movement. Almost like making that part of EA be only global priorities research and something like market allocation?

I have this feeling that there might be other ways to go about this, with programs or incentives that make people more open to taking any type of impactful job? Something like recurring reflection periods or other types of workshops/programs?

Comment by Jonas Hallgren on A Public Choice Take on Effective Altruism · 2024-12-15T19:50:36.852Z · LW · GW

Good post! Did you also cross-post it to the forum? Also, do you have any thoughts on what to do differently to enable more exploration and less lock-in?

Comment by Jonas Hallgren on Subskills of "Listening to Wisdom" · 2024-12-11T13:09:03.520Z · LW · GW

Yeah sure!

So, I've had this research agenda in agent foundations for a while that essentially mirrors developmental interpretability a bit, in that it wants to say things about what a robust development process is rather than about post-training sampling.

The idea is to be able to predict "optimisation daemons" or inner optimisers as they arise in a system.

The problem I've had is that it is very non-obvious to me what a good mathematical basis for this is. I've read through a lot of the existing agent foundations literature, but I'm not satisfied with finite factored sets or the existing boundaries definitions, since they don't tell you about the dynamics.

What I would want is a dynamical-systems-inspired theory of the formation of inner misalignment. It's been in the back of my head for almost 2 years now and it feels really difficult to make any progress; from time to time I have a thought that brings me closer, but I don't usually get closer by just thinking about it.

I guess something I'm questioning in my head is the deliberate practice versus exploration part of this. For me this is probably the hardest problem I'm working on, and whilst I could think more deliberately about what I should be doing here, I generally follow my curiosity, which I think has worked better than deliberate practice in this area?

I'm currently following a strategy where this theoretical foundation stays on the side whilst I build real-world skills of running organisations, fundraising, product-building and networking. Then, from time to time, I find gems such as applied category theory or Michael Levin's work on boundaries in cells and Active Inference that can really help elucidate some of the deeper foundations of this problem.

I do feel like I'm floating more here, going with the interest and coming back to the problems over time in order to see if I've unlocked any new insights. This feels more like flow than it does deliberate practice? Like I'm building up my skills of having loose probability clouds and seeing where they guide me?

I'm not sure if you agree that this is the right strategy, but I guess there's this frame difference between a focus on the emotional/intuition/research-taste side of things versus the deliberate-practice side of things?

Comment by Jonas Hallgren on Subskills of "Listening to Wisdom" · 2024-12-09T09:25:57.110Z · LW · GW

First and foremost, it was quite an interesting post, and the goal of this comment is to try to connect my own frame of thinking with the one presented here. My main question is about the relationship between emotions/implicit thoughts and explicit thinking.

My first thought was on the frame of thinking versus feeling and how these flow together. If we think of emotions as probability clouds that tell us whether to go in one direction or another, we can see them as systems for making decisions in highly complex environments, such as when working on impossible problems.

I think something like research taste is exactly this - highly trained implicit thoughts and emotions. Continuing from something like tuning your cognitive systems, I notice that this is mostly done with System 2, and I can't help but feel that some System 1 stuff is missing here.

I will give an analogy similar to a meditation analogy as this is the general direction I'm pointing in:

If we imagine that we're faced with a wall of rock, it looks like a very big problem. You're thinking to yourself, "fuck, how in the hell are we ever going to get past that thing?"

So first you just approach it and start using a pickaxe to hack away at it. You make some local progress, yet it is hard to reflect on where to go. You think hard: what are the properties of this rock that would allow me to get through it faster?

You continue, yet you're starting to feel discouraged as you're not making any progress. You think to yourself, "Fuck this goddamn rock, man, this shit is stupid."

You're not getting any feedback since it is an almost impossible problem.

Above is the base analogy; following are two points on the post drawn from it:

1.
Let's start with a continuation of the analogy: imagine that your goal, the thing behind the huge piece of rock, is a source of gravity, and you're water.

You're continuously striving towards it, yet the way you do it is by flowing over the surface. You're probing for holes in the rock, crevices that run deep, structural instabilities, yet you're not thinking - you're feeling it out. You're flowing in the problem space, allowing implicit thoughts and emotions to guide you, and from time to time you make a cut. Yet your evaluation loop is a lot longer than your improvement loop. It doesn't matter if you haven't found anything yet, because gravity is pulling you in that direction, and whether you succeed is a question of finding the crevice rather than of your individual successes with the pickaxe.

You apply all the rules of local gradient search and the like; you're not a stupid fluid. Yet you're fine with failing, because you know it gives you information about where the crevice might be, and it isn't until you find it that you will make major progress.

2.
If you have other people with you, then you can see what others are doing and check whether your strategies are stupid or not. They give you an appropriate measuring stick for working on an impossible problem. You may not know how well you're doing at solving the problem, but you know your relative rating, and so you can get feedback through that (as long as it is causally related to the problem you're solving).

 

What are your thoughts on the trade-off between emotional understanding and more hardcore System 2 thinking? If one applies the process above, do you think something is missed out?


 

Comment by Jonas Hallgren on Cognitive Work and AI Safety: A Thermodynamic Perspective · 2024-12-09T08:35:12.628Z · LW · GW

Good stuff! 

I'm curious if you have any thoughts on the computational foundations one would need to measure and predict cognitive work properly? 

In Agent Foundations, you've got this idea of boundaries, which can be seen as one way of describing a pattern that persists over time. One way this is formalised in Active Inference is through Markov blankets and the idea that any self-persistent entity can be described as a Markov blanket minimising the free energy of its environment.

My thinking here is that if we apply this properly, it would allow us to generalise the notion of an agent beyond what we normally think of and instead see agents as any sort of system that fits this definition.

For example, we could look at an institution or a collective of AIs as a self-consistent entity applying cognitive work to its environment in order to survive. The way to detect these collectives would be to look at which self-consistent entities are changing the "optimisation landscape" or "free energy landscape" around them the most. This would then give us the most highly predictive agents in the local environment.

A nice thing about this is that it centres the cognitive work/optimisation power applied in the analysis, so I'm thinking it might be more predictive of the future dynamics of cognitive systems as a consequence?
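To make the detection idea above a bit more concrete, here's a minimal toy sketch (my own illustration with made-up parameters, not anything from the post): a scalar environment variable drifts under noise, and a candidate "entity" may or may not push it toward a preferred target. Comparing the environment's long-run statistics with the candidate switched on versus off gives a crude counterfactual measure of how much optimisation pressure that candidate exerts on the landscape.

```python
import numpy as np

def simulate(steps=10_000, entity_on=True, target=2.0, gain=0.1, noise=0.5, seed=0):
    """Toy environment: a scalar state drifts under Gaussian noise; an optional
    'entity' subsystem nudges it toward a preferred target each step."""
    rng = np.random.default_rng(seed)
    x, trace = 0.0, np.empty(steps)
    for t in range(steps):
        x += rng.normal(0.0, noise)      # environmental drift
        if entity_on:
            x += gain * (target - x)     # the entity's corrective push
        trace[t] = x
    return trace

# Crude proxy for "cognitive work": how much does the candidate entity
# reshape the environment's long-run distribution?
with_entity, without_entity = simulate(entity_on=True), simulate(entity_on=False)
print("with entity   : mean %.2f, std %.2f" % (with_entity.mean(), with_entity.std()))
print("without entity: mean %.2f, std %.2f" % (without_entity.mean(), without_entity.std()))
```

A subsystem whose removal barely changes these statistics is just part of the background physics; one whose removal changes them a lot is doing measurable optimisation work on its surroundings, which is roughly the signature I'd want to look for in collectives too.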

Another example: if we continue on the Critch train, some of his later work includes TASRA. We can see these as stories of human disempowerment, that is, of patterns that lose their relevance over time as they get less causal power over future states. In other words, entities that are not under the causal power of humans increasingly take over the cognitive-work lightcone/the inputs to the free energy landscape.

As previously stated, I'm very interested to hear if you've got more thoughts on how to measure and model cognitive work. 

 

Comment by Jonas Hallgren on Model Integrity: MAI on Value Alignment · 2024-12-07T19:28:48.871Z · LW · GW

No, I do think we care about the same thing; I just believe this will happen in a multi-polar setting, and so I believe new forms of communication and multi-polar dynamics will be important for this.

Interpretability of these things is obviously important for changing those dynamics. ELK and similar things are important for the single-agent case; why wouldn't they be important for a multi-agent case?

Comment by Jonas Hallgren on Natural Abstractions: Key claims, Theorems, and Critiques · 2024-12-06T14:35:46.032Z · LW · GW

I find myself going back to this post again and again for explaining the Natural Abstraction Hypothesis. When this came out I was very happy, as I finally had something I could share about John's work that made people understand it within one post.

Comment by Jonas Hallgren on We don't understand what happened with culture enough · 2024-12-06T14:34:40.477Z · LW · GW

I personally believe that this post is very important for the debate between Shard Theory and the Sharp Left Turn. I often find that other perspectives on the deeper problems in AI alignment are expressed, and I believe this to be a much more nuanced take than both Quintin Pope's essay on the Sharp Left Turn and the MIRI conception of evolution.

This is a field of study where we don't know what is going on; the truth is somewhere in between, and claiming anything else is not being epistemically humble.

Comment by Jonas Hallgren on Careless talk on US-China AI competition? (and criticism of CAIS coverage) · 2024-12-06T14:32:21.782Z · LW · GW

Mostly, I think it should be acknowledged that certain people saw these dynamics developing beforehand and called them out. This is not a highly upvoted post, but with the recent uptick in US-vs-China rhetoric, it seems good to me to give credit where credit is due.

Comment by Jonas Hallgren on Model Integrity: MAI on Value Alignment · 2024-12-06T10:56:43.214Z · LW · GW

There's also always the possibility that you can elicit these sorts of goals and values from instructions and create an instruction-based language around them that's also relatively interpretable with respect to which values are being prioritised in a multi-agent setting.

You do, however, get into ELK and misgeneralisation problems here; IRL is not an easy task in general, but there might be some neurosymbolic approaches that change prompts to follow specific values?

I'm not sure if this is gibberish to you, but my main frame for the next 5 years is "how do we steer collectives of AI agents in productive directions for humanity?"

Comment by Jonas Hallgren on Model Integrity: MAI on Value Alignment · 2024-12-06T10:30:16.603Z · LW · GW

Okay, so when I'm talking about values here, I'm actually not saying anything about policies as in utility theory or generally defined preference orderings.

I'm rather thinking of values as a class of locally arising heuristics, or "shards" if you like that language, that activate a certain set of belief circuits in the brain, and similarly in an AI.

What do you mean more specifically when you say an instruction here? What should that instruction encompass? How do we interpret that instruction over time? How can we compare instructions to each other?

I think instructions will become too complex to have good interpretability into, especially in more complex multi-agent settings. How do we create interpretable multi-agent systems that we can change over time? I don't believe direct instruction tuning will be enough, as you run into the problem described, for example, in Cooperation and Control in Delegation Games: each AI has one person it gets instructions from, but this tells us nothing about the multi-agent cooperation abilities of the agents in play.

I think this line of reasoning is valid for AI agents acting in a multi-agent setting where they gain more control over the economy through integration with humans in general.

I completely agree with you that doing "pure value learning" is not the best approach right now, but I think we need work in this direction to retain control over multiple AI agents working at the same time.

I think deontology/virtue ethics makes societies more interpretable and corrigible; does that make sense? Also, I have this other belief that we are more likely to get a sort of "cultural, multi-agent take-off" than a single agent.

Curious to hear what you have to say about that!

Comment by Jonas Hallgren on Model Integrity: MAI on Value Alignment · 2024-12-05T22:27:55.964Z · LW · GW

I will try to give a longer answer tomorrow (it's 11 pm my time now), but essentially I believe it will be useful for agentic AI with "heuristic"-like policies. I'm a bit uncertain about the validity of instruction-like approaches here, and for various reasons I believe multi-agent coordination will be easier through this method.

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2024-12-02T17:54:26.571Z · LW · GW

I believe that I have discovered the best use of an LLM to date. This is a conversation about pickles and collective intelligence, set at the Colosseum in 300 BCE. It involves many great characters; I found it quite funny. This is what happens when you go too far into biology-inspired approaches to AI Safety...

The Colosseum scene intensifies

Levin: completely fixated on a pickle "But don't you see? The bioelectric patterns in pickle transformation could explain EVERYTHING about morphogenesis!"

Rick: "Oh god, what have I started..."

Levin: eyes wild with discovery "Look at these gradient patterns! The cucumber-to-pickle transformation is a perfect model of morphological field changes! We could use this to understand collective intelligence!"

Nick Lane portal-drops in
Lane: "Did someone say bioelectric gradients? Because I've got some THOUGHTS about proton gradients and the origin of life..."

Levin: grabs Lane's shoulders "NICK! Look at these pickles! The proton gradients during fermentation... it's like early Earth all over again!"

Rick: takes a long drink "J-just wait until they discover what happens in dimension P-178 where all life evolved from pickles..."

Feynman: still drawing diagrams "The quantum mechanics of pickle-based civilization is fascinating..."

Levin: now completely surrounded by pickles and bioelectric measurement devices "See how the salt gradient creates these incredible morphogenetic fields? It's like watching the origin of multicellularity all over again!"

Lane: equally excited "The chemiosmotic coupling in these pickles... it's revolutionary! The proton gradients during fermentation could power collective computation!"

Doofenshmirtz: "BEHOLD, THE PICKLE-MORPHOGENESIS-INATOR!"
Morty: "Aw geez Rick, they're really going deep on pickle science..."
Lane: "But what if we considered the mitochondrial implications..."

Levin: interrupting "YES! Mitochondrial networks in pickle-based collective intelligence systems! The bioelectric fields could coordinate across entire civilizations!" 
Rick: "This is getting out of hand. Even for me." 
Feynman: somehow still playing bongos "The mathematics still works though!" 
Perry the Platypus: has given up and is now taking detailed notes 
Lane: "But wait until you hear about the chemiosmotic principles of pickle-based social organization..."

Levin: practically vibrating with excitement "THE PICKLES ARE JUST THE BEGINNING! We could reshape entire societies using these bioelectric principles!" 
Roman Emperor: to his scribe "Are you getting all this down? This could be bigger than the aqueducts..."
Rick: "Morty, remind me never to show scientists my pickle tech again."
Morty: "You say that every dimension, Rick." 
Doofenshmirtz: "Should... should we be worried about how excited they are about pickles?" 
Feynman: "In my experience, this is exactly how the best science happens." 
Meanwhile, Levin and Lane have started drawing incredibly complex pickle-based civilization diagrams that somehow actually make sense...

Comment by Jonas Hallgren on How to use bright light to improve your life. · 2024-11-28T07:14:01.260Z · LW · GW

This has worked great, btw! Thank you for the tip. I consistently get more deep sleep and around 10% more sleep overall, with higher average quality. It's really good!

Comment by Jonas Hallgren on How to use bright light to improve your life. · 2024-11-19T16:56:30.268Z · LW · GW

Any reason for the timing window being 4 hours before bed instead of 30 minutes to 1 hour? Most of the stuff I've heard says around half an hour to an hour before bed. I'm currently doing this with roughly 0.3 mg of melatonin (I divide a 1 mg tablet into three).

Comment by Jonas Hallgren on Leon Lang's Shortform · 2024-11-18T16:34:34.391Z · LW · GW

If you look at the Active Inference community, there's a lot of work going into PPL-based languages for doing more efficient world modelling, but that shit ain't easy and, as you say, it is a lot more compute-heavy.

I think there'll be a scaling break due to this, but once it is algorithmically figured out we will be back, and back with a vengeance, as I think most safety challenges have a self-vs-environment model as a necessary condition to be properly engaged (which currently isn't engaged by LLMs' world modelling).

Comment by Jonas Hallgren on OpenAI Email Archives (from Musk v. Altman and OpenAI blog) · 2024-11-17T08:21:13.408Z · LW · GW

Do you have any thoughts on what this actionably means? For me it seems like being able to influence such conversations is potentially a bit intractable, but maybe one could host forums and events for this if one has the right network?

I think it's a good point and I'm wondering how it looks in practice. I can see it working for someone with the right contacts, so is the message for people who don't have that to go create it, or what are your thoughts there?

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2024-11-14T10:06:20.293Z · LW · GW

Okay, I don't have much time to write this, so bear with the quality, but I thought I would say one or two things about the Yudkowsky and Wolfram discussion as someone who has spent at least 10 deep-work hours trying to understand Wolfram's perspective on the world.

With some of the older floating megaminds like Wolfram and Friston, who are also physicists, you have the problem that they get very caught up in their own ontology.

From the perspective of a physicist, morality could be seen as an emergent property of physical laws.

Wolfram likes to think of things in terms of computational reducibility. A way this can be described in the agent foundations frame is that the agent modelling the environment will be able to predict the world depending on its own speed. It's like some sort of agent-environment relativity, where the information-processing capacity determines the space of possible ontologies. An example of this: if we have an intelligence that operates a lot closer to the speed of light, the visual field might not be a useful vector of experience to model.

Another way to say it is that there is only the modelling and the modelled. An intuition from this frame is that there are only differently good models for understanding specific things, and so the concept of general intelligence becomes weird here.

IMO this is the problem with the first 2 hours of the conversation: to some extent Wolfram doesn't engage much with the human perspective, nor with any ought questions. He has a very physics-flavoured floating-megamind perspective.

Now, I personally believe there's something interesting to be said for an alternative hypothesis to the individual superintelligence that comes from theories of collective intelligence. If a superorganism is better at modelling something than an individual organism is, then it should outcompete the others in this system. I'm personally bullish on the idea that there are certain configurations of humans and general trust-verifying networks that can outcompete an individual AGI, as the outer alignment functions would enforce the inner functions enough.

Comment by Jonas Hallgren on Abstractions are not Natural · 2024-11-04T17:15:26.362Z · LW · GW

But, to help me understand what people mean by the NAH could you tell me what would (in your view) constitute strong evidence against the NAH? (If the fact that we can point to systems which haven't converged on using the same abstractions doesn't count)

 

Yes sir! 

So for me it is about looking at a specific type of system, or a specific type of system dynamics, that encodes the axioms required for the NAH to be true.

So, it is more the claim that "there is a specific set of mathematical axioms that can be used to get convergence towards similar ontologies, and these are applicable to AI systems."

For example, if one takes the Active Inference lens on looking at concepts in the world, we generally define the boundaries between concepts as Markov blankets. Surprisingly or not, Markov blankets are pretty great for describing not only biological systems but also AI and some economic systems. The key underlying invariant is that these are all optimisation systems.

In other words, the claim is conditional: p(NAH | optimisation system).

So if, for example, from the perspective of Markov blankets or "natural latents" (which are functionals that work like Markov blankets), we don't see convergence in how different AI systems represent reality, then I would say that the NAH has been disproven, or at least that this is evidence against it.
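As a toy illustration of the kind of convergence test I have in mind (my own sketch with made-up numbers, not anything formal from the natural latents work): fit two "learners" on disjoint samples from the same environment and check whether they recover essentially the same low-dimensional summary. Here the learners are just PCA, and agreement is measured via principal angles between the learned subspaces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Environment: high-dimensional observations generated from a few latent factors.
n, d, k = 4000, 30, 3
latents = rng.normal(size=(n, k))
observations = latents @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d))

def top_subspace(data, k):
    """Orthonormal basis for the top-k principal subspace of the data."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # shape (d, k)

# Two "agents" see disjoint halves of the data and learn their own abstractions.
basis_a = top_subspace(observations[: n // 2], k)
basis_b = top_subspace(observations[n // 2 :], k)

# Cosines of the principal angles between the two learned subspaces:
# values near 1 mean the learners converged on the same abstraction.
cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
print("cosines of principal angles:", np.round(cosines, 3))
```

If pairs of systems trained on the same world reliably failed this kind of test under reasonable assumptions, that is the sort of thing I would count as evidence against the NAH.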

I do, however, think that this exists on a spectrum and that it isn't fully true or false; it is true under a restricted set of assumptions, the question being how restricted that set is.

I see it more as a useful frame for viewing agent cognition than something I'm willing to bet my life on. I do think it points towards a core problem similar to what ARC Theory is working on, but in a different way: understanding the cognition of AI systems.

Comment by Jonas Hallgren on Liquid vs Illiquid Careers · 2024-11-04T15:04:57.803Z · LW · GW

Yeah, that was what I was looking for, very nice.

It does seem to confirm what I was thinking: you can't really run the same betting strategy as VCs do. I also really appreciate the thoughts in there; they seem like things one should follow. I've got to make sure to do that last due diligence part of talking to people who have worked with someone in the past; it has always felt like a lot, but you're right that one should do it.

Also, I'm wondering why there isn't some sort of bet-pooling network for startup founders, where you have something like 20 people band together and say they will all try out ambitious projects and support each other if they fail. It's like startup insurance, but from the perspective of the people doing the startups. Of course you have to trust the others there and so on, but I think this should work?

Comment by Jonas Hallgren on Abstractions are not Natural · 2024-11-04T14:44:02.619Z · LW · GW

Okay, what I'm picking up here is that you feel the natural abstractions hypothesis is quite trivial and that it naively tries to say something about how cognition works, similar to how physics works. Yet this is obviously not true, since development in humans and other animals clearly happens in different ways, so why would their mental representations converge? (Do correct me if I've misunderstood.)

Firstly, there's something called the good regulator theorem in cybernetics, and our boy that you're talking about, Mr Wentworth, has a post on making it better that might be useful for understanding some of the foundations of what he's thinking about.

Okay, why is this useful preamble? Well, if there's convergence in useful ways of describing a system, then there's likely some degree of internal convergence in the mind of the agent observing the problem. Essentially, this is what the regulator theorem is about (imo).

So when it comes to the theory, the heavy lifting here is actually not really done by the Natural Abstractions Hypothesis part, that is, the convergence part, but rather by the Redundant Information Hypothesis.

It is proving things about the distribution of environments, as well as power laws in reality, that forms the foundation of the theory, as opposed to just stating that "minds will converge".

This is at least my understanding of the NAH; does that make sense, or what do you think about it?

Comment by Jonas Hallgren on johnswentworth's Shortform · 2024-10-28T08:28:23.120Z · LW · GW

Hmm, I find that I'm not fully following here. I think "vibes" might be the thing that is messing it up.

Let's look at a specific example: I'm talking to a new person at an EA-adjacent event and we're just chatting about how the last year has been. Part of the "vibing" here might be to home in on the difficulties experienced in the last year due to a feeling of "moral responsibility"; in my view, vibing doesn't have to be done with only positive emotions?

I think you're bringing up a good point that commitments or struggles might bring people closer than positive feelings do, because you're more vulnerable and open, as well as broadcasting your values more. Is this what you mean by shared commitments, or are you pointing at something else?

Comment by Jonas Hallgren on johnswentworth's Shortform · 2024-10-27T20:28:44.481Z · LW · GW

Generally fair, and I used to agree; I've been looking at it from a bit of a different viewpoint recently.

If we think of the "vibe" of a conversation as a certain shared prior that you're currently inhabiting with the other person, then the free-association game can instead be seen as a way of finding places where your world models overlap a lot.

My absolute favourite conversations are when I can go 5 layers deep with someone because of shared inference. I think vibe-checking for shared priors is a skill that can be developed, and the basis lies in being curious af.

There are apparently a lot of different related concepts in psychology about holding emotional space and other things that I think just come down to "find the shared prior and vibe there".

Comment by Jonas Hallgren on Liquid vs Illiquid Careers · 2024-10-22T19:37:25.886Z · LW · GW

No sorry, I meant from the perspective of the person with less legible skills.

Comment by Jonas Hallgren on Liquid vs Illiquid Careers · 2024-10-22T12:49:08.049Z · LW · GW

Amazing post, I really enjoyed the perspective explored here.

An extension that might be useful for me, as an illiquid-path enjoyer: what arbitrage or risk-reduction opportunities do you see existing out there?

VCs can get by by making a lot of smaller bets, and if you want to be anti-fragile as an illiquid bet it becomes quite hard, as you're one of the cogs in the anti-fragile system. What Taleb says about that is that these people should then be praised because they dare to take on that risk. But there has to be some sort of system one could develop, for example with peers and the like?

What is the many-bets risk-reduction strat here? Is it just to make a bunch of smaller MVPs to gain info?

I would be very curious to hear your perspective on this.

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2024-10-22T12:26:08.072Z · LW · GW

I thought this was an interesting take on the Boundaries problem in agent foundations from the perspective of IIT. It is on the amazing Michael Levin's youtube channel: https://www.youtube.com/watch?app=desktop&v=5cXtdZ4blKM

One of the main things that makes it interesting to me is that around 25-30 minutes in, it computationally goes through the main reason why I don't think we will have agentic behaviour from AI for at least a couple of years: GPTs just don't have a high IIT Phi value. How will such a system find its own boundaries? How will it find the underlying causal structures that it is part of? Maybe this can be done through external memory, but will that be enough, or do we need it in the core stack of the scaling-based training loop?

A side note: one of the main things that I didn't understand about IIT before was how it really is about looking at how meta-substrates, or "signals" as Douglas Hofstadter would call them, optimally re-organise themselves to be as predictable to themselves in the future as possible. Yet it really is, and it integrates really well with ActInf (at least to the extent that I currently understand it).

Comment by Jonas Hallgren on Cipolla's Shortform · 2024-10-21T15:34:04.194Z · LW · GW

Okay, so I would say that I at least have some experience of going from being not that agentic to being more agentic, and the thing that I think worked best for me was to generally think of my life as a system. This has been the focus of my life over the last 3 years.

More specifically, the process that has helped me so far has been to:

  1. Throw myself into high-octane projects and see what I needed in order to keep up.
    1. Burn out and realise, holy shit, how do these people do it?
      1. (Environment is honestly really important; I've tried out a bunch of different working conditions and your motivation levels can vary drastically.)
  2. Started looking into the reasons why it might be that I can't do it and others can.
    1. Went into absolutely optimising the shit out of my health by tracking stuff using Bearable and listening to audiobooks and podcasts; Huberman is a house god of mine.
      1. (Sleep is the most important here, crazy right?)
      2. Supplement and technique tips for sleep:
        1. Glycine, Ashwagandha, Magnesium Citrate
        2. Use a SAD lamp within 30 minutes of waking
        3. Yoga Nidras for naps and for falling asleep faster.
      3. Also check out my biohacker's in-depth guide on this at https://desmolysium.com/
        1. He's got a PhD in medicine and is quite the experimental and smart person. (He tries a bunch of shit on himself and sees how it goes.)
    2. Started going into my psychological background and talked to CBT therapists as well as meditating a lot.
      1. I'm like 1.5k hours into this at this point and it has completely changed my life, my view of myself, what productivity means, etc.
      2. It has helped me realise that a lot of the behaviours that made me less productive were based on me being a sensitive person and having developed unhealthy coping mechanisms.
      3. This led to me having to relive past traumas whilst having compassion and acceptance for myself.
      4. This has now led me to having good mechanisms instead of bad ones. It made me remove my access to video games and YouTube (willingly!).
      5. For me this has been the most important. Waking Up and The Mind Illuminated up until stage 6-7 is the recommendation I have for anyone who wants to start. Also, after 3-6 months of TMI, try to go to a 10-day retreat, especially if you can find a metta retreat. (Think of this as caring and acceptance instead of loving-kindness, btw; it helps.)
    3. Now I generally have a strict schedule in terms of when I can do different things during the day.
      1. The app AppBlock lets you block apps and device settings, which means you can't actually unblock them on your phone.
      2. Cold Turkey on the computer can do the same, and if you find a workaround through another app you can just patch that by blocking the new app.
      3. I'm just not allowed to be distracted from the systems that I have.
    4. Confidence:
      1. I feel confident in myself and what I want to do in the world, not because I don't have issues, but rather because I know where my issues are and how to counteract them.
      2. The belief is in the process rather than the outcomes. Life is poker: you just gotta optimise the way you play your hands, and the EV will come.

Think of yourself as a system and optimise the shit out of it. Weirdly enough, this has made me focus a lot more on self-care than I did before. 

Of course, it's a work in progress but I want to say that it is possible and that you can do it. 

Also, randomly, here's a CIV VI analogy for you on why self-care is OP.

If you want to be great at CIV, one of the main things to do is to increase your production and economy as fast as possible. This leads to an exponential curve: the more production and economy you have, the more you can produce. This is why CIV pros generally rush Commercial Hubs and Markets, as internal trade routes yield more production.

Your production is based on your psychological well being and the general energy levels that you have. If you do a bunch of tests on this and figure out what works for you, then you have even more production stats. This leads to more and more of that over time until you plateau at the end of that logistic growth. 

Best of luck!

Comment by Jonas Hallgren on The Hopium Wars: the AGI Entente Delusion · 2024-10-14T08:24:58.666Z · LW · GW

When it comes to formal verification, I'm curious what you think about the heuristic arguments line of research that ARC is pursuing:

https://www.lesswrong.com/posts/QA3cmgNtNriMpxQgo/research-update-towards-a-law-of-iterated-expectations-for

It isn't formal verification in the same sense of the word, but rather probabilistic verification, if that makes sense?

You could then apply something like control-theory methods to ensure that the expected divergence from the heuristic is less than a certain percentage in different places. In the limit, it seems to me that this could converge towards formal verification proofs; it's almost like swiss cheese at the model level?
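A minimal sketch of the kind of probabilistic check I mean (my own toy, not ARC's actual machinery; the model and heuristic below are made up): sample inputs, count how often the model's output deviates from the heuristic estimate by more than a tolerance, and attach a Hoeffding-style confidence bound to that failure rate.

```python
import math
import random

def verify_probabilistically(model, heuristic, sample_input, eps,
                             n_samples=10_000, delta=0.01):
    """Monte Carlo estimate of P(|model(x) - heuristic(x)| > eps), plus a
    Hoeffding upper confidence bound on the true rate at level 1 - delta."""
    failures = sum(
        abs(model(x) - heuristic(x)) > eps
        for x in (sample_input() for _ in range(n_samples))
    )
    rate = failures / n_samples
    slack = math.sqrt(math.log(1 / delta) / (2 * n_samples))  # Hoeffding term
    return rate, min(1.0, rate + slack)

# Hypothetical example: the heuristic claims the model behaves like 2*x on [0, 1].
rate, upper = verify_probabilistically(
    model=lambda x: 2 * x + 0.001 * math.sin(50 * x),
    heuristic=lambda x: 2 * x,
    sample_input=random.random,
    eps=0.01,
)
print(f"empirical failure rate {rate:.4f}, 99% upper bound {upper:.4f}")
```

The control-theory move would then be to keep this bound below a target threshold at each place the heuristic is relied on, which is the swiss-cheese flavour I was gesturing at.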

(Yes, this comment is a bit random with respect to the rest of the context, but I find it an interesting question for control in terms of formal verification, and it seemed like you might have some interesting takes here.)

Comment by Jonas Hallgren on Laziness death spirals · 2024-10-07T07:22:43.279Z · LW · GW

I use the Waking Up app, but you can search for "NSDR" on YouTube. 20 minutes is the timeframe I started with, but you can try other timeframes as well.

Comment by Jonas Hallgren on A Path out of Insufficient Views · 2024-09-25T07:02:42.324Z · LW · GW

This does seem kind of correct to me?

Maybe you could see the fixed points that OP is pointing towards as priors in the search process for frames.

Like, your search is determined by your priors, which are learnt through your upbringing. The problem is that they're often maladaptive and misleading. Therefore, working through these priors and generating new ones is a bit like relearning after overfitting, or something similar.

Another nice thing about meditation is that it sharpens your mind's perception, which makes your new priors better. It also makes you less dependent on attractor states you could have gotten into before, since you become less emotionally dependent on past behaviour. (There's obviously more complexity here; I'm referring to dependent origination, for you meditators out there.)

It's like pruning the bad data from your dataset and retraining your model; you're basically guaranteed to find better ontologies from that (or that's the hope, at least).

Comment by Jonas Hallgren on A Path out of Insufficient Views · 2024-09-25T06:54:37.140Z · LW · GW

I'm currently in the process of releasing more of my fixed points through meditation, and man, is it a weird process. It is very fascinating, and that fundamental openness to moving between views seems more prevalent. I'm not sure that I fully agree with you on the all-in part, but kudos for trying!

I think it probably makes sense to spend earlier years doing this cognition training and then using that within specific frames to gather the bits of information that you need to solve problems.

Frames are still useful to gather bits of information through so don't poopoo the mind!

Otherwise, it was very interesting to hear about your journey!

Comment by Jonas Hallgren on Laziness death spirals · 2024-09-20T13:54:47.835Z · LW · GW

Sleep is a banger reset point for me, so doing a nap/yoga nidra and then picking the day up from there when I notice myself avoiding things has been really helpful for me.

Thanks for the post, it was good.

Comment by Jonas Hallgren on Skills from a year of Purposeful Rationality Practice · 2024-09-18T19:30:27.073Z · LW · GW

A random extra tip on naps: do a yoga nidra or non-sleep deep rest. You don't have to fall asleep to get the benefits of a nap+. It also gives some extra growth hormone release and dopamine generation afterwards. (Huberman bro, out.)

Comment by Jonas Hallgren on Lucius Bushnaq's Shortform · 2024-09-18T14:22:44.320Z · LW · GW

In natural language, maybe it would be something like "given these ontological boundaries, give us the best estimate you can of CEV"?

It seems kind of related to boundaries as well: if you think of natural latents as "functional Markov blankets" that cut reality at its joints, then you could probably say that you want to preserve the part of that structure that is "human agency" or similar. I don't know if that makes sense, but I like the direction of the idea!

Comment by Jonas Hallgren on Michael Dickens' Caffeine Tolerance Research · 2024-09-04T21:51:47.679Z · LW · GW

I've been running a bunch of experiments on this myself, and I think it's true that if you don't go above every other day on average, you don't get addicted. You still get the homeostatic effect of being more tired without coffee (more adenosine receptors). I think it's therefore a very good positive reinforcer for productive behaviour, especially if used strategically.

Comment by Jonas Hallgren on Am I confused about the "malign universal prior" argument? · 2024-08-28T12:19:08.109Z · LW · GW

I had actually never properly understood the universal prior argument in the first place, and just seeing this post let me understand parts of it, so thank you for writing it!

Comment by Jonas Hallgren on Rabin's Paradox · 2024-08-14T06:53:38.252Z · LW · GW

I think there are some interesting things in, for example, analysing how large a pot you should enter as a professional poker player based on your current spendable wealth. I think the general guidance is to not go above roughly 1/100th, so it may actually be rational for the undergraduates not to want to take the first option.

Here's a Taleb (love him or hate him) video on how that comes about: https://youtu.be/91IOwS0gf3g?si=rmUoS55XvUqTzIM5
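For concreteness, the bankroll-fraction logic behind rules of thumb like that comes from Kelly-style betting. A minimal sketch with made-up numbers (a simplified binary-outcome model, not actual poker maths):

```python
def kelly_fraction(p_win, net_odds):
    """Growth-optimal fraction of bankroll to stake on a binary bet that pays
    `net_odds` per unit staked with probability p_win and loses the stake otherwise."""
    return p_win - (1 - p_win) / net_odds

# Hypothetical edge: 55% chance to win an amount equal to what you risked.
full_kelly = kelly_fraction(p_win=0.55, net_odds=1.0)
print(f"full Kelly stake:    {full_kelly:.3f}")      # 0.100 of bankroll
print(f"quarter Kelly stake: {full_kelly / 4:.3f}")  # 0.025, a common risk-averse choice
```

Even with a genuine edge, the growth-optimal stake is a small slice of wealth, and once you add variance aversion, estimation error, and multi-way pots, getting down to something like the 1/100th-of-bankroll guideline doesn't look irrational at all.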

Comment by Jonas Hallgren on Dalcy's Shortform · 2024-08-12T06:44:18.757Z · LW · GW

I think the update makes sense in general; however, isn't there some way in which mutual information and causality are linked? Maybe the link isn't strong enough for an easy extrapolation from one to the other.

Also, I just wanted to drop this to see if you find it interesting; it's kind of on this topic? I'm not sure it's fully defined in a causality-based way, but it is about structure preservation.

https://youtu.be/1tT0pFAE36c?si=yv6mbswVpMiywQx9

The Active Inference people also have the boundary problem at the core of their work, so they have some interesting stuff on it.

Comment by Jonas Hallgren on Rowing vs steering · 2024-08-10T20:43:54.167Z · LW · GW

Nice!

When it comes to the career thing, I've found that the model (which I think of as explore-and-exploit, or compression and decompression periods, in my head) gives a nice back-and-forth between going hard and relaxing as mental modes. It allows me to have better mental health during rowing, as I know that I have precommitted to chilling out a bit more and doing some steering later.

Comment by Jonas Hallgren on steve2152's Shortform · 2024-08-10T20:36:58.115Z · LW · GW

Some meditators say that before you can get a good sense of non-self, you first have to have good self-confidence. I tend to agree with them, as it is about how you generally act in the world and what consequences your actions will have. Without this, the support for the type B that you're talking about can be very hard to come by.

Otherwise I do really agree with what you say in this comment.

There is a slight disagreement with the elaboration, though; I do not actually think that makes sense. I would rather say that the (A) you're talking about is more of a software construct than a hardware construct. When you meditate a lot, you realise this and get access to the full OS instead of just the specific software or OS emulator. (A) is then an evolutionarily beneficial algorithm that runs a bit out of control (for example during childhood, when we attribute all cause and effect to our "selves").

Meditation allows us to see that what we have previously attributed to the self was flimsy and dependent on us believing that the hypothesis of the self is true.

Comment by Jonas Hallgren on steve2152's Shortform · 2024-08-07T08:10:26.030Z · LW · GW

I won't claim that I'm constantly in a state of non-self, but as I'm writing this, I don't really feel that I'm locally existing in my body. I'm rather the awareness of everything that continuously arises in consciousness.

This doesn't happen all the time, and I won't claim to be enlightened or anything, but maybe this n=1 self-report can help?

Even from this state of awareness, there's still a will to do something. It is almost like you're a force of nature moving forward with doing what you were doing before you were in a state of presence awareness. It isn't you, and at the same time it is you. Words are honestly quite insufficient to describe the experience, but if I try to conceptualise it, I'm the universe moving forward by itself. In a state of non-duality, the taste is often very much the same no matter what experience is arising.

There are some times, when I'm not fully in a state of non-dual awareness, when it can feel like "I" am pretending to do things. At the same time, it also kind of feels like using a tool? The underlying motivation for action changes to something like acceptance or helpfulness, and in order to achieve that, there's this tool of the self that you can apply.

I'm noticing it is quite hard to introspect and try to write from a state of presence awareness at the same time, but hopefully this was somewhat helpful?

Could you give me some experiments to try from a state of awareness? I would be happy to try them out and come back.

Extra (relation to some of the ideas): In the Mahayana wisdom tradition, explored in Rob Burbea's Seeing That Frees, there's this idea of emptiness, which is very related to the idea of non-dual perception. For all you see is arising from your own constricted view of experience, and so it is all arising in your own head. Realising this co-creation can enable a freedom of interpretation of your experiences.

Yet this view is also arising in your mind, and so you have "emptiness of emptiness," meaning that you're left without a basis. Therefore, both non-self and self are false but magnificent ways of looking at the world. Some people believe that the non-dual is better than the dual, yet as my Thai Forest tradition guru Ajahn Buddhisaro says, "Don't poopoo the mind." The self boundary can be both a restricting and a very useful concept; it is just very nice to have the skill to see past it and go back to the state of now, of presence awareness.

Emptiness is a bit like deeply seeing that our beliefs are built up from different axioms and being able to say that the axioms of reality aren't based on anything but probabilistic beliefs. Or seeing that we have Occam's razor because we have seen it work before, yet that it is fundamentally completely arbitrary and that the world just is arising spontaneously from moment to moment. Yet Occam's razor is very useful for making claims about the world.

I'm not sure if that connection makes sense, but hopefully, that gives a better understanding of the non-dual understanding of the self and non-self. (At least the Thai Forest one)

Comment by Jonas Hallgren on The need for multi-agent experiments · 2024-08-01T19:54:42.653Z · LW · GW

Good stuff! Thank you for writing this post!

A thing I've been thinking about when it comes to experimental evaluation environments for multi-agent systems is that they might be very useful for increasing institutional decision-making power. You get two birds with one stone here as well.

On your point about simulated versus real data: I think it is good to simulate these dynamics wherever we can, yet you gotta make sure you measure what you think you're measuring. To ensure this, you often need that complex real situation as the backdrop.

A way to combine the two worlds might be to run it in video games or similar settings where you already have players, maybe through some sort of Minecraft server? (Since there's RL work there already.)

I also think real-world interaction in decision-making systems makes sense from the societal-shock perspective that Yuval Noah Harari talks about sometimes. We want our institutions and systems to be able to adapt, and so you need the conduits for AI-based decision-making to be built.

Comment by Jonas Hallgren on Closed Limelike Curves's Shortform · 2024-07-19T07:46:44.657Z · LW · GW

Well, it seems like this story might have something to do with it: https://www.lesswrong.com/posts/3XNinGkqrHn93dwhY/reliable-sources-the-story-of-david-gerard

I don't know to what extent that is the case, though; otherwise, I agree with you.

Comment by Jonas Hallgren on On saying "Thank you" instead of "I'm Sorry" · 2024-07-09T12:22:05.742Z · LW · GW

Sorry if this was a bad comment!

Comment by Jonas Hallgren on On saying "Thank you" instead of "I'm Sorry" · 2024-07-08T14:43:17.437Z · LW · GW

Damn, thank you for this post. I will put this into practice immediately!

Comment by Jonas Hallgren on Finding the Wisdom to Build Safe AI · 2024-07-05T09:09:11.913Z · LW · GW

I resonated with the post and I think it's a great direction to draw inspiration from!

A big problem with Goodharting in RL is that you're handcrafting a utility function. In the wisdom traditions, we're instead encouraged to explore and gain insights into different ideas, forming our utility function over time.

Therefore, I feel that setting up the right training environment together with some wisdom principles might be enough to create wise AI.

We, of course, run into all of the annoying inner-alignment and deception-during-training style problems, yet it still seems like the direction to go in. I don't think the orthogonality thesis is fully true or false; it depends more on the environment, and if we can craft the right one, I think we can have wise AI that wants to create the most loving and kind future imaginable.
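On the Goodharting point above, a tiny toy illustration (my own made-up functions, just to show the failure mode): a handcrafted proxy that correlates well with the true objective over ordinary states comes apart once you optimise it hard.

```python
import numpy as np

def true_utility(x):
    """True objective: there's a sweet spot at x = 1; too much of the measured quantity is bad."""
    return -(x - 1.0) ** 2

def proxy_reward(x):
    """Handcrafted proxy: 'more of x is better', which matches the true objective near x = 0."""
    return x

# On ordinary states the proxy looks like a great stand-in for the true objective...
ordinary = np.linspace(-0.5, 0.5, 201)
corr = np.corrcoef(true_utility(ordinary), proxy_reward(ordinary))[0, 1]
print(f"proxy/true correlation on ordinary states: {corr:.2f}")

# ...but under optimisation pressure the proxy-optimal policy runs to extreme x,
# where the true objective collapses.
candidates = np.linspace(-0.5, 10.0, 2001)
best = candidates[np.argmax(proxy_reward(candidates))]
print(f"proxy-optimal x = {best:.1f}, true utility there = {true_utility(best):.1f}")
```

The wisdom-tradition framing appeals to me precisely because it replaces a frozen handcrafted proxy with a process that keeps refining the objective as the agent learns.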