Posts

Model Integrity: MAI on Value Alignment 2024-12-05T17:11:31.707Z
Reprogramming the Mind: Meditation as a Tool for Cognitive Optimization 2024-01-11T12:03:41.763Z
How well does your research address the theory-practice gap? 2023-11-08T11:27:52.410Z
Jonas Hallgren's Shortform 2023-10-11T09:52:20.390Z
Advice for new alignment people: Info Max 2023-05-30T15:42:20.142Z
Respect for Boundaries as non-arbitrary coordination norms 2023-05-09T19:42:13.194Z
Max Tegmark's new Time article on how we're in a Don't Look Up scenario [Linkpost] 2023-04-25T15:41:16.050Z
The Benefits of Distillation in Research 2023-03-04T17:45:22.547Z
Power-Seeking = Minimising free energy 2023-02-22T04:28:44.075Z
Black Box Investigation Research Hackathon 2022-09-12T07:20:34.966Z
Announcing the Distillation for Alignment Practicum (DAP) 2022-08-18T19:50:31.371Z
Does agent foundations cover all future ML systems? 2022-07-25T01:17:11.841Z
Is it worth making a database for moral predictions? 2021-08-16T14:51:54.609Z
Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven't we started yet? 2021-02-25T22:06:04.695Z

Comments

Comment by Jonas Hallgren on Natural Abstractions: Key claims, Theorems, and Critiques · 2024-12-06T14:35:46.032Z · LW · GW

I find myself going back to this post again and again for explaining the Natural Abstraction Hypothesis. When this came out I was very happy, as I finally had something I could share on John's work that made people understand it within one post.

Comment by Jonas Hallgren on We don't understand what happened with culture enough · 2024-12-06T14:34:40.477Z · LW · GW

I personally believe that this post is very important for the debate between Shard Theory and the Sharp Left Turn. Other perspectives on the deeper problems in AI alignment are often expressed, and I believe this is a much more nuanced take than Quintin Pope's essay on the Sharp Left Turn as well as the MIRI conception of evolution.

This is a field of study where we don't know what is going on; the truth is somewhere in between, and claiming anything else is not being epistemically humble.

Comment by Jonas Hallgren on Careless talk on US-China AI competition? (and criticism of CAIS coverage) · 2024-12-06T14:32:21.782Z · LW · GW

Mostly, I think it should be acknowledged that certain people saw these dynamics developing beforehand and called them out. This is not a highly upvoted post, but with the recent uptick in US-vs-China rhetoric it seems good to me to give credit where credit is due.

Comment by Jonas Hallgren on Model Integrity: MAI on Value Alignment · 2024-12-06T10:56:43.214Z · LW · GW

There's also always the possibility that you can elicit these sorts of goals and values from instructions and create an instruction-based language around it that's also relatively interpretable in terms of which values are being prioritised in a multi-agent setting.

You do, however, get into ELK and misgeneralization problems here. IRL is not an easy task in general, but there might be some neurosymbolic approaches that change prompts to follow specific values?

I'm not sure if this is gibberish or not for you, but my main frame for the next 5 years is "how do we steer collectives of AI agents in productive directions for humanity".

Comment by Jonas Hallgren on Model Integrity: MAI on Value Alignment · 2024-12-06T10:30:16.603Z · LW · GW

Okay, so when I'm talking about values here, I'm actually not saying anything about policies in the sense of utility theory or generally defined preference orderings.

I'm rather thinking of values as a class of locally arising heuristics, or "shards" if you like that language, that activate a certain set of belief circuits in the brain, and similarly in an AI.

What do you mean more specifically when you say an instruction here? What should that instruction encompass? How do we interpret that instruction over time? How can we compare instructions to each other?

I think that instructions will become too complex to have good interpretability into, especially for more complex multi-agent settings. How do we create interpretable multi-agent systems that we can change over time? I don't believe that direct instruction tuning will be enough, as you run into the problem described in, for example, Cooperation and Control in Delegation Games: each AI gets an instruction from one person, but this tells us nothing about the multi-agent cooperation abilities of the agents in play.

I think this line of reasoning is valid for AI agents acting in a multi-agent setting where they gain more control over the economy through integration with humans in general.

I completely agree with you that doing "pure value learning" is not the best approach right now, but I think we need work in this direction to retain control over multiple AI agents working at the same time.

I think deontology/virtue ethics makes societies more interpretable and corrigible, does that make sense? Also, I have this other belief that this will be the case, and that we're more likely to get a sort of "cultural, multi-agent take-off" than a single agent.

Curious to hear what you have to say about that!

Comment by Jonas Hallgren on Model Integrity: MAI on Value Alignment · 2024-12-05T22:27:55.964Z · LW · GW

I will try to give a longer answer tomorrow (11 pm my time now), but essentially I believe it will be useful for agentic AI with "heuristic"-like policies. I'm a bit uncertain about the validity of instruction-like approaches here, and for various reasons I believe multi-agent coordination will be easier through this method.

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2024-12-02T17:54:26.571Z · LW · GW

I believe that I have discovered the best use of an LLM to date. This is a conversation about pickles and collective intelligence located at the Colosseum, 300 BCE. It involves many great characters, and I found it quite funny. This is what happens when you go too far into biology-inspired approaches for AI Safety...

The Colosseum scene intensifies

Levin: completely fixated on a pickle "But don't you see? The bioelectric patterns in pickle transformation could explain EVERYTHING about morphogenesis!"

Rick: "Oh god, what have I started..."

Levin: eyes wild with discovery "Look at these gradient patterns! The cucumber-to-pickle transformation is a perfect model of morphological field changes! We could use this to understand collective intelligence!"

Nick Lane portal-drops in. Lane: "Did someone say bioelectric gradients? Because I've got some THOUGHTS about proton gradients and the origin of life..."

Levin: grabs Lane's shoulders "NICK! Look at these pickles! The proton gradients during fermentation... it's like early Earth all over again!"

Rick: takes a long drink "J-just wait until they discover what happens in dimension P-178 where all life evolved from pickles..."

Feynman: still drawing diagrams "The quantum mechanics of pickle-based civilization is fascinating..."

Levin: now completely surrounded by pickles and bioelectric measurement devices "See how the salt gradient creates these incredible morphogenetic fields? It's like watching the origin of multicellularity all over again!"

Lane: equally excited "The chemiosmotic coupling in these pickles... it's revolutionary! The proton gradients during fermentation could power collective computation!"

Doofenshmirtz: "BEHOLD, THE PICKLE-MORPHOGENESIS-INATOR!"
Morty: "Aw geez Rick, they're really going deep on pickle science..."
Lane: "But what if we considered the mitochondrial implications..."

Levin: interrupting "YES! Mitochondrial networks in pickle-based collective intelligence systems! The bioelectric fields could coordinate across entire civilizations!" 
Rick: "This is getting out of hand. Even for me." 
Feynman: somehow still playing bongos "The mathematics still works though!" 
Perry the Platypus: has given up and is now taking detailed notes 
Lane: "But wait until you hear about the chemiosmotic principles of pickle-based social organization..."

Levin: practically vibrating with excitement "THE PICKLES ARE JUST THE BEGINNING! We could reshape entire societies using these bioelectric principles!" 
Roman Emperor: to his scribe "Are you getting all this down? This could be bigger than the aqueducts..."
Rick: "Morty, remind me never to show scientists my pickle tech again."
Morty: "You say that every dimension, Rick." 
Doofenshmirtz: "Should... should we be worried about how excited they are about pickles?" 
Feynman: "In my experience, this is exactly how the best science happens." 
Meanwhile, Levin and Lane have started drawing incredibly complex pickle-based civilization diagrams that somehow actually make sense...

Comment by Jonas Hallgren on How to use bright light to improve your life. · 2024-11-28T07:14:01.260Z · LW · GW

This has worked great btw! Thank you for the tip. I consistently get more deep sleep and around 10% more sleep with higher average quality; it's really good!

Comment by Jonas Hallgren on How to use bright light to improve your life. · 2024-11-19T16:56:30.268Z · LW · GW

Any reason for the timing window being 4 hours before bed instead of 30 min to 1 hour? Most of the stuff I've heard is around half an hour to an hour before bed. I'm currently doing this with 0.3-ish mg of melatonin (I divide a 1 mg tablet into 3).

Comment by Jonas Hallgren on Leon Lang's Shortform · 2024-11-18T16:34:34.391Z · LW · GW

If you look at the Active Inference community, there's a lot of work going into probabilistic programming languages (PPLs) to do more efficient world modelling, but that shit ain't easy and, as you say, it is a lot more compute-heavy.

I think there'll be a scaling break due to this, but when it is algorithmically figured out we will be back, and back with a vengeance, as I think most safety challenges have a self-vs-environment model as a necessary condition to be properly engaged with (which currently isn't the case with LLMs' world modelling).

Comment by Jonas Hallgren on OpenAI Email Archives (from Musk v. Altman) · 2024-11-17T08:21:13.408Z · LW · GW

Do you have any thoughts on what this actionably means? For me it seems like being able to influence such conversations is potentially a bit intractable, but maybe one could host forums and events for this if one has the right network?

I think it's a good point and I'm wondering how it looks in practice. I can see it for someone with the right contacts; is the message for people who don't have that to go and create it, or what are your thoughts there?

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2024-11-14T10:06:20.293Z · LW · GW

Okay, so I don't have much time to write this, so bear with the quality, but I thought I would say one or two things about the Yudkowsky and Wolfram discussion as someone who has spent at least 10 deep-work hours trying to understand Wolfram's perspective on the world.

With some of the older floating megaminds like Wolfram and Friston, who are also physicists, you have the problem that they get very caught up in their own ontology.

From the perspective of a physicist, morality could be seen as an emergent property of physical laws.

Wolfram likes to think of things in terms of computational reducibility. A way this can be described in the agent foundations frame is that an agent modelling the environment will be able to predict the world depending on its own speed. It's like some sort of agent-environment relativity where information-processing capacity determines the space of possible ontologies. An example: if we had an intelligence operating a lot closer to the speed of light, the visual field might not be a useful vector of experience to model.

Another way to say it is that there's only the modelling and the modelled. An intuition from this frame is that there are only differently good models for understanding specific things, and so the concept of general intelligence becomes weird here.

IMO this is the problem with the first 2 hours of the conversation: to some extent Wolfram doesn't engage with the human perspective much, nor with any ought questions. He has a very physics, floating-megamind perspective.

Now, I personally believe there's something interesting to be said about an alternative hypothesis to individual superintelligence that comes from theories of collective intelligence. If a superorganism is better at modelling something than an individual organism is, then it should outcompete the others in this system. I'm personally bullish on the idea that there are certain configurations of humans and general trust-verifying networks that can outcompete an individual AGI, as the outer alignment functions would constrain the inner functions enough.

Comment by Jonas Hallgren on Abstractions are not Natural · 2024-11-04T17:15:26.362Z · LW · GW

But, to help me understand what people mean by the NAH could you tell me what would (in your view) constitute strong evidence against the NAH? (If the fact that we can point to systems which haven't converged on using the same abstractions doesn't count)

 

Yes sir! 

So for me it is about looking at a specific type of system, or a specific type of system dynamics, that encodes the axioms required for the NAH to be true.

So, it is more the claim that "there is a specific set of mathematical axioms that can be used in order to get convergence towards similar ontologies, and these are applicable in AI systems."

For example, if one takes the Active Inference lens on looking at concepts in the world, we generally define the boundaries between concepts as Markov blankets. Surprisingly or not, Markov blankets are pretty great for describing not only biological systems but also AI and some economic systems. The key underlying invariant is that these are all optimisation systems.

p(NAH|Optimisation System).

So if, for example, through the lens of Markov blankets or "natural latents" (which are functionals that work like Markov blankets), we don't see convergence in how different AI systems represent reality, then I would say that the NAH has been disproven, or at least that this is evidence against it.
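
As a toy illustration of what I mean by a Markov blanket screening things off (purely a sketch I'm adding for intuition; the variables and numbers are made up, and nothing here is specific to Active Inference or natural latents):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy chain: outside -> blanket -> inside.
# The "inside" only sees the "outside" through the blanket variable.
outside = rng.normal(size=n)
blanket = outside + 0.5 * rng.normal(size=n)   # noisy summary of the outside
inside = blanket + 0.5 * rng.normal(size=n)    # depends on outside only via the blanket

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

print("corr(outside, inside):", round(corr(outside, inside), 3))  # clearly non-zero

# Condition on the blanket by restricting to a thin slice of its values:
mask = np.abs(blanket - 1.0) < 0.05
print("corr(outside, inside | blanket ~ 1):",
      round(corr(outside[mask], inside[mask]), 3))  # close to zero: the blanket screens off the outside
```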

I do, however, think that this exists on a spectrum and that it isn't fully true or false; it is true for a restricted set of assumptions, the question being how restricted that set is.

I see it more as a useful frame for viewing agent cognition processes rather than something I'm willing to bet my life on. I do think it is pointing towards a core problem, understanding the cognition of AI systems, similar to what ARC Theory is working on but in a different way.

Comment by Jonas Hallgren on Liquid vs Illiquid Careers · 2024-11-04T15:04:57.803Z · LW · GW

Yeah, that was what I was looking for, very nice.

It does seem to verify what I was thinking, that you can't really do the same bet strategy as VCs. I also really appreciate the thoughts in there; they seem like things one should follow. I gotta make sure to do the last due diligence part of talking to people who have worked with others in the past. It has always felt like a lot, but you're right that one should do it.

Also, I'm considering why there isn't some sort of bet pooling network for startup founders where you have like 20 people go together and say that they will all try out ambitious projects and support each other if they fail. It's like startup insurance but from the perspective of people doing startups. Of course you have to trust the others there and stuff but I think this should work?

Comment by Jonas Hallgren on Abstractions are not Natural · 2024-11-04T14:44:02.619Z · LW · GW

Okay, what I'm picking up here is that you feel that the natural abstractions hypothesis is quite trivial and that it seems like it is naively trying to say something about how cognition works similar to how physics works. Yet this is obviously not true, since development in humans and other animals clearly happens in different ways, so why would their mental representations converge? (Do correct me if I misunderstood.)

Firstly, there's something called the good regulator theorem in cybernetics and our boy that you're talking about, Mr Wentworth, has a post on making it better that might be useful for you to understand some of the foundations of what he's thinking about. 

Okay, why is this useful preamble? Well, if there's convergence in useful ways of describing a system, then there's likely some degree of internal convergence in the mind of the agent observing the problem. Essentially, this is what the regulator theorem is about (imo).

So when it comes to the theory, the heavy lifting here is actually not really done by the Natural Abstractions Hypothesis part, that is, the convergence part, but rather by the Redundant Information Hypothesis.

It is proving things about the distribution of environments, as well as power laws in reality, that makes up the foundation of the theory, compared to just stating that "minds will converge".

This is at least my understanding of NAH, does that make sense or what do you think about that?

Comment by Jonas Hallgren on johnswentworth's Shortform · 2024-10-28T08:28:23.120Z · LW · GW

Hmm, I find that I'm not fully following here. I think "vibes" might be the thing that is messing it up.

Let's look at a specific example: I'm talking to a new person at an EA-adjacent event and we're just chatting about how the last year has been. Part of the "vibing" here might be to home in on the difficulties experienced in the last year due to a feeling of "moral responsibility"; in my view, vibing doesn't have to be done with only positive emotions?

I think you're bringing up a good point that commitments or struggles might be something that brings people closer than positive feelings do, because you're more vulnerable and open as well as broadcasting your values more. Is this what you mean by shared commitments, or are you pointing at something else?

Comment by Jonas Hallgren on johnswentworth's Shortform · 2024-10-27T20:28:44.481Z · LW · GW

Generally fair, and I used to agree; I've been looking at it from a bit of a different viewpoint recently.

If we think of the "vibe" of a conversation as a certain shared prior that you're currently inhabiting with the other person, then the free-association game can rather be seen as a way of finding places where your world models overlap a lot.

My absolute favourite conversations are when I can go 5 layers deep with someone because of shared inference. I think vibe-checking for shared priors is a skill that can be developed, and the basis lies in being curious af.

There are apparently a lot of different related concepts in psychology about holding emotional space and other things that I think just come down to "find the shared prior and vibe there".

Comment by Jonas Hallgren on Liquid vs Illiquid Careers · 2024-10-22T19:37:25.886Z · LW · GW

No sorry, I meant from the perspective of the person with less legible skills.

Comment by Jonas Hallgren on Liquid vs Illiquid Careers · 2024-10-22T12:49:08.049Z · LW · GW

Amazing post, I really enjoyed the perspective explored here.

An extension that might be useful for me, as an illiquid-path enjoyer: what arbitrage or risk-reduction opportunities do you see existing out there?

VCs can get by by making a lot of smaller bets, and if you want to be anti-fragile as an illiquid bet it becomes quite hard, as you're one of the cogs in the anti-fragile system. What Taleb says about that is that these people should be praised because they dare to take on that risk. But there has to be some sort of system one could develop with peers and similar?

What is the many-bets risk-reduction strat here? Is it just to make a bunch of smaller MVPs to gain info?

I would be very curious to hear your perspective on this.

Comment by Jonas Hallgren on Jonas Hallgren's Shortform · 2024-10-22T12:26:08.072Z · LW · GW

I thought this was an interesting take on the Boundaries problem in agent foundations from the perspective of IIT. It is on the amazing Michael Levin's YouTube channel: https://www.youtube.com/watch?app=desktop&v=5cXtdZ4blKM

One of the main things that makes it interesting to me is that around 25-30 mins in, it computationally goes through the main reason why I don't think we will have agentic behaviour from AI for at least a couple of years: GPTs just don't have a high IIT Phi value. How will they find their own boundaries? How will they find the underlying causal structures that they are part of? Maybe this can be done through external memory, but will that be enough, or do we need it in the core stack of the scaling-based training loop?

A side note: one of the main things that I didn't understand about IIT before was how it really is about meta-substrates, or "signals" as Douglas Hofstadter would call them, optimally re-organising themselves to be as predictable to themselves in the future as possible. Yet it is, and it integrates really well with ActInf (at least to the extent that I currently understand it).

Comment by Jonas Hallgren on Cipolla's Shortform · 2024-10-21T15:34:04.194Z · LW · GW

Okay, so I would say that I at least have some experience of going from being not that agentic to being more agentic, and the thing that I think worked best for me was to generally think of my life as a system. This has been the focus of my life over the last 3 years.

More specifically, the process that has helped me so far has been to:

  1. Throw myself into high octane projects and see what I needed to keep up.
    1. Burn out and realise, holy shit, how do these people do it?
      1. (Environment is honestly really important, I've tried out a bunch of different working conditions and your motivation levels can vary drastically.)
  2. Started looking into the reasons why it might be that I can't do it and others can.
    1. Went into absolutely optimising the shit out of my health by tracking stuff using Bearable and listening to audiobooks and podcasts; Huberman is a house god of mine.
      1. (Sleep is the most important here, crazy right?)
      2. Supplement and technique tips for sleep:
        1. Glycine, Ashwagandha, Magnesium Citrate
        2. Use a SAD lamp within 30 minutes of waking
        3. Yoga Nidras for naps and for falling asleep faster.
      3. Also check out my biohacker's in-depth guide on this at https://desmolysium.com/
        1. He's got a PhD in medicine and is quite the experimental and smart person. (He tries a bunch of shit on himself and sees how it goes.)
    2. Started going into my psychological background and talked to CBT therapists as well as meditating a lot.
      1. I'm like 1.5k hours into this at this point and it has completely changed my life and my view of myself and what productivity means, etc.
      2. It has helped me realise that a lot of the behaviours that made me less productive were based on me being a sensitive person and having developed unhealthy coping mechanisms.
      3. This led to me having to relive past traumas whilst having compassion and acceptance for myself.
      4. This has now led to me having good mechanisms instead of bad ones. It made me remove my access to video games and YouTube (willingly!)
      5. For me this has been the most important. Waking Up and The Mind Illuminated up until stage 6-7 are the recommendations I have for anyone who wants to start. Also, after 3-6 months of TMI, try to go to a 10-day retreat, especially if you can find a metta retreat. (Think of this as caring and acceptance instead of loving-kindness btw, it helps.)
    3. Now I generally have a strict schedule in terms of when I can do different things during the day.
      1. The app AppBlock lets you block apps and device settings, which means you can't actually unblock them on your phone.
      2. Cold Turkey on the computer can do the same, and if you find a workaround through another app you can just patch that by blocking the new app.
      3. I'm just not allowed to be distracted from the systems that I have.
    4. Confidence:
      1. I feel confident in myself and what I want to do in the world not because I don't have issues but rather because I know where my issues are and how to counteract them.
      2. The belief is in the process rather than the outcomes. Life is poker, you just gotta optimise the way you play your hands, the EV will come. 

Think of yourself as a system and optimise the shit out of it. Weirdly enough, this has made me focus a lot more on self-care than I did before. 

Of course, it's a work in progress but I want to say that it is possible and that you can do it. 

Also, randomly, here's a CIV VI analogy for you on why self-care is op. 

If you want to be great at CIV, one of the main things to do is to increase your production and economics as fast as possible. This leads to an exponential curve where the more production and economy you have the more you can produce. This is why CIV pros in general rush Commercial Hubs and markets as internal trade routes yield more production. 

Your production is based on your psychological well being and the general energy levels that you have. If you do a bunch of tests on this and figure out what works for you, then you have even more production stats. This leads to more and more of that over time until you plateau at the end of that logistic growth. 

Best of luck!

Comment by Jonas Hallgren on The Hopium Wars: the AGI Entente Delusion · 2024-10-14T08:24:58.666Z · LW · GW

When it comes to formal verification, I'm curious what you think about the heuristic arguments line of research that ARC is pursuing:

https://www.lesswrong.com/posts/QA3cmgNtNriMpxQgo/research-update-towards-a-law-of-iterated-expectations-for

It isn't formal verification in the same sense of the word but rather probabilistic verification if that makes sense?

You could then apply something like control-theory methods to ensure that the expected divergence from the heuristic is less than a certain percentage in different places. In the limit, it seems to me that this could converge towards formal verification proofs; it's almost like swiss cheese at the model level?

(Yes, this comment is a bit random with respect to the rest of the context but I find it an interesting question for control in terms of formal verification and it seemed like you might have some interesting takes here.)

Comment by Jonas Hallgren on Laziness death spirals · 2024-10-07T07:22:43.279Z · LW · GW

I use the Waking Up app, but you can search for "nsdr" on YouTube. 20 mins is the timeframe I started with, but you can try other timeframes as well.

Comment by Jonas Hallgren on A Path out of Insufficient Views · 2024-09-25T07:02:42.324Z · LW · GW

This does seem kind of correct to me?

Maybe you could see the fixed points that OP is pointing towards as priors in the search process for frames.

Like, your search is determined by your priors which are learnt through your upbringing. The problem is that they're often maladaptive and misleading. Therefore, working through these priors and generating new ones is a bit like relearning from overfitting or similar.

Another nice thing about meditation is that it sharpens your mind's perception which makes your new priors better. It also makes you less dependent on attractor states you could have gotten into from before since you become less emotionally dependent on past behaviour. (there's obviously more complexity here) (I'm referring to dependent origination for you meditators out there)

It's like pruning the bad data from your dataset and retraining your model, you're basically guaranteed to find better ontologies from that (or that's the hope at least).

Comment by Jonas Hallgren on A Path out of Insufficient Views · 2024-09-25T06:54:37.140Z · LW · GW

I'm currently in the process of releasing more of my fixed points through meditation, and man is it a weird process. It is very fascinating, and that fundamental openness to moving between views seems more prevalent. I'm not sure that I fully agree with you on the all-in part, but kudos for trying!

I think it probably makes sense to spend earlier years doing this cognition training and then use it within specific frames to gather the bits of information that you need to solve problems.

Frames are still useful to gather bits of information through so don't poopoo the mind!

Otherwise, it was very interesting to hear about your journey!

Comment by Jonas Hallgren on Laziness death spirals · 2024-09-20T13:54:47.835Z · LW · GW

Sleep is a banger reset point for me, so doing a nap/yoga nidra and then picking the day up from there if I notice myself avoiding things has been really helpful for me.

Thanks for the post, it was good.

Comment by Jonas Hallgren on Skills from a year of Purposeful Rationality Practice · 2024-09-18T19:30:27.073Z · LW · GW

Random extra tip on naps: do a yoga nidra or non-sleep deep rest. You don't have to fall asleep to get the benefits of a nap+. It also has some extra growth-hormone release and dopamine generation afterwards. (Huberman bro, out)

Comment by Jonas Hallgren on Lucius Bushnaq's Shortform · 2024-09-18T14:22:44.320Z · LW · GW

In natural language, maybe it would be something like "given these ontological boundaries, give us the best estimate you can of CEV"?

It seems kind of related to boundaries as well: if you think of natural latents as "functional Markov blankets" that cut reality at its joints, then you could probably say that you want to preserve the part of that structure that is "human agency" or similar. I don't know if that makes sense, but I like the idea direction!

Comment by Jonas Hallgren on Michael Dickens' Caffeine Tolerance Research · 2024-09-04T21:51:47.679Z · LW · GW

I've been running a bunch of experiments on this myself and I think it's true that if you don't go above doing it every other day on average, you don't get addicted. You still get the homeostasis effect of being more tired without coffee (more adenosine receptors). I think it's therefore a very good positive reinforcer for productive behaviour, especially if used strategically.

Comment by Jonas Hallgren on Am I confused about the "malign universal prior" argument? · 2024-08-28T12:19:08.109Z · LW · GW

I have actually never properly understood the universal prior argument in the first place and just seeing this post made me able to understand parts of it now so thank you for writing it! 

Comment by Jonas Hallgren on Rabin's Paradox · 2024-08-14T06:53:38.252Z · LW · GW

I think there are some interesting things in, for example, analysing how large a pot you should enter if you're a professional poker player, based on your current spendable wealth. I think the general theory is to not go above 1/100th of it, so it may actually be rational for the undergraduates not to want to take the first option.

Here's a Taleb (love him, hate him) video on how that comes about: https://youtu.be/91IOwS0gf3g?si=rmUoS55XvUqTzIM5
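
To be concrete about where a "don't risk more than ~1/100th of your bankroll" rule could come from, here's a minimal sketch of fractional Kelly sizing. This is my own illustration with made-up numbers; I'm assuming fractional Kelly is the mechanism behind the rule of thumb, which isn't something the video or the post states.

```python
def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Full-Kelly fraction of bankroll to stake on a binary bet.

    p_win:    probability of winning
    net_odds: profit per unit staked if you win (1.0 = even-money bet)
    """
    return (net_odds * p_win - (1 - p_win)) / net_odds

# A professional with a small edge on an even-money bet:
full = kelly_fraction(p_win=0.53, net_odds=1.0)  # 0.06 of bankroll
quarter = full / 4                               # fractional Kelly to hedge against an overestimated edge

print(f"full Kelly: {full:.3f}, quarter Kelly: {quarter:.3f}")
# With a smaller or more uncertain edge this quickly lands around 1% of bankroll,
# which is one way to motivate the 1/100th heuristic.
```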

Comment by Jonas Hallgren on Dalcy's Shortform · 2024-08-12T06:44:18.757Z · LW · GW

I think the update makes sense in general, but isn't there some way mutual information and causality are linked? Maybe it isn't strong enough for there to be an easy extrapolation from one to the other.

Also, I just wanted to drop this to see if you find it interesting, kind of on this topic? I'm not sure it's fully defined in a causality-based way, but it is about structure preservation.

https://youtu.be/1tT0pFAE36c?si=yv6mbswVpMiywQx9

Active Inference people also have the boundary problem as a core part of their work, so they have some interesting stuff on it.

Comment by Jonas Hallgren on Rowing vs steering · 2024-08-10T20:43:54.167Z · LW · GW

Nice!

When it comes to the career thing, I've found that the model (which I think of as explore and exploit, or compression and decompression periods, in my head) has a nice back-and-forth between going hard and relaxing as mental modes. It allows me to have better mental health during rowing, as I know that I have precommitted to chilling out a bit more and doing some steering later.

Comment by Jonas Hallgren on steve2152's Shortform · 2024-08-10T20:36:58.115Z · LW · GW

Some meditators say that before you can get a good sense of non-self you first have to have good self-confidence. I think I would tend to agree with them, as it is about how you generally act in the world and what consequences your actions will have. Without this, the support for the type B that you're talking about can be very hard to come by.

Otherwise I do really agree with what you say in this comment.

There is a slight disagreement with the elaboration, though: I do not actually think that makes sense. I would rather say that the (A) you're talking about is more of a software construct than a hardware construct. When you meditate a lot, you realise this and get access to the full OS instead of just the specific software or OS emulator. A is then an evolutionarily beneficial algorithm that runs a bit out of control (for example during childhood, when we attribute all cause and effect to our "selves").

Meditation allows us to see that what we have previously attributed to the self was flimsy and dependent on us believing that the hypothesis of the self is true.

Comment by Jonas Hallgren on steve2152's Shortform · 2024-08-07T08:10:26.030Z · LW · GW

I won't claim that I'm constantly in a state of non-self, but as I'm writing this, I don't really feel that I'm locally existing in my body. I'm rather the awareness of everything that continuously arises in consciousness.

This doesn't happen all the time; I won't claim to be enlightened or anything, but maybe this n=1 self-report can help?

Even from this state of awareness, there's still a will to do something. It is almost like you're a force of nature moving forward with doing what you were doing before you were in a state of presence awareness. It isn't you and at the same time it is you. Words are honestly quite insufficient to describe the experience, but if I try to conceptualise it, I'm the universe moving forward by itself. In a state of non-duality, the taste is often very much the same no matter what experience is arising.

There are some times, when I'm not fully in a state of non-dual awareness, when it can feel like "I" am pretending to do things. At the same time it also kind of feels like using a tool? The underlying motivation for action changes to something like acceptance or helpfulness, and in order to achieve that, there's this tool of the self that you can apply.

I'm noticing it is quite hard to introspect and try to write from a state of presence awareness at the same time but hopefully it was somewhat helpful?

Could you give me some experiments to try from a state of awareness? I would be happy to try them out and come back.

Extra (relation to some of the ideas): In the Mahayana wisdom tradition, explored in Rob Burbea's Seeing That Frees, there's this idea of emptiness, which is very related to the idea of non-dual perception. For all you see is arising from your own constricted view of experience, and so it is all arising in your own head. Realising this co-creation can enable a freedom of interpretation of your experiences.

Yet this view is also arising in your mind, and so you have "emptiness of emptiness," meaning that you're left without a basis. Therefore, both non-self and self are false but magnificent ways of looking at the world. Some people believe that the non-dual is better than the dual, yet as my Thai Forest tradition guru Ajahn Buddhisaro says, "Don't poopoo the mind." The self boundary can be both a restricting and a very useful concept; it is just very nice to have the skill to see past it and go back to the state of now, of presence awareness.

Emptiness is a bit like deeply seeing that our beliefs are built up from different axioms and being able to say that the axioms of reality aren't based on anything but probabilistic beliefs. Or seeing that we have Occam's razor because we have seen it work before, yet that it is fundamentally completely arbitrary and that the world just is arising spontaneously from moment to moment. Yet Occam's razor is very useful for making claims about the world.

I'm not sure if that connection makes sense, but hopefully, that gives a better understanding of the non-dual understanding of the self and non-self. (At least the Thai Forest one)

Comment by Jonas Hallgren on The need for multi-agent experiments · 2024-08-01T19:54:42.653Z · LW · GW

Good stuff! Thank you for writing this post!

A thing I've been thinking about when it comes to experimental evaluation environments for multi-agent systems is that they might be very useful for increasing institutional decision-making power. You get two birds with one stone here as well.

On your point of simulated versus real data, I think it is good to simulate these dynamics wherever we can, yet you gotta make sure you measure what you think you're measuring. To ensure this, you often gotta have that complex situation as the backdrop.

A way to combine the two worlds might be to run it in video games or similar where you already have players, maybe through some sort of Minecraft server? (Since there's RL work there already?)

I also think real-world interaction in decision-making systems makes sense from the societal-shock perspective that Yuval Noah Harari talks about sometimes. We want our institutions and systems to be able to adapt, and so you need the conduits for AI-based decision making built.

Comment by Jonas Hallgren on Closed Limelike Curves's Shortform · 2024-07-19T07:46:44.657Z · LW · GW

Well, it seems like this story might have something to do with it?: https://www.lesswrong.com/posts/3XNinGkqrHn93dwhY/reliable-sources-the-story-of-david-gerard

I don't know to what extent that is, though; otherwise, I agree with you.

Comment by Jonas Hallgren on On saying "Thank you" instead of "I'm Sorry" · 2024-07-09T12:22:05.742Z · LW · GW

Sorry if this was a bad comment!

Comment by Jonas Hallgren on On saying "Thank you" instead of "I'm Sorry" · 2024-07-08T14:43:17.437Z · LW · GW

Damn, thank you for this post. I will put this to practice immediately!

Comment by Jonas Hallgren on Finding the Wisdom to Build Safe AI · 2024-07-05T09:09:11.913Z · LW · GW

I resonated with the post and I think it's a great direction to draw inspiration from!

A big problem with goodharting in RL is that you're handcrafting a utility function. In the wisdom traditions, we're encouraged to explore and gain insights into different ideas to form our utility function over time.

Therefore, I feel that setting up the right training environment together with some wisdom principles might be enough to create wise AI.

We, of course, run into all of the annoying inner alignment and deception-during-training style problems, yet still, it seems like the direction to go in. I don't think the orthogonality thesis is fully true or false; it is more dependent on your environment, and if we can craft the right one I think we can have wise AI that wants to create the most loving and kind future imaginable.

Comment by Jonas Hallgren on List of Collective Intelligence Projects · 2024-07-04T18:08:36.198Z · LW · GW

Eh, it's like self-plugging or something.

It should work again now. We're gonna switch names soon, so we just had some technical difficulties around that.

Comment by Jonas Hallgren on List of Collective Intelligence Projects · 2024-07-03T10:11:10.170Z · LW · GW

Know of any I should add?

 

I do feel a bit awkward about it as I'm very much involved in both projects, but these two otherwise? 

The Collective Intelligence Company: https://thecollectiveintelligence.company/company

Flowback/Digital Democracy World: https://digitaldemocracy.world/

Also a paper for Predictive Liquid Democracy which is a part of both projects: https://www.researchgate.net/publication/377557844_Predictive_Liquid_Democracy

Comment by Jonas Hallgren on Live Theory Part 0: Taking Intelligence Seriously · 2024-06-27T08:13:20.798Z · LW · GW

Very intriguing, excited for the next post!

(We will watch your career with great interest.)

Comment by Jonas Hallgren on Matthew Barnett's Shortform · 2024-06-17T15:03:49.507Z · LW · GW

Often, disagreements boil down to a set of open questions to answer; here's my best guess at how to decompose your disagreements. 

I think that depending on what hypothesis you're abiding by when it comes to how LLMs will generalise to AGI, you get different answers:

Hypothesis 1: LLMs are enough evidence that AIs will generally be able to follow what humans care about and that they won't naturally become power-seeking.

Hypothesis 2: AGI will have a sufficiently different architecture than LLMs or will change a lot, so much that current-day LLMs don't generally give evidence about AGI.

Depending on your beliefs about these two hypotheses, you will have different opinions on this question. 


The scenario outlined by Bostrom seems clearly different from the scenario with LLMs, which are actual general systems that do what we want and ~nothing more, rather than doing what we want as part of a strategy to seek power instrumentally. What am I missing here?

Let's say that we believe in hypothesis 1 as the base case; what are some reasons why LLMs wouldn't give evidence about AGI?

1. Intelligence forces reflective coherence.
This would essentially entail that the more powerful a system gets, the more it will notice internal inconsistencies and shift towards maximising (and therefore not following human values).

2. Agentic AI acting in the real world is different from LLMs.
If we look at an LLM from the perspective of an action-perception loop, it doesn't generally get any feedback on how it changes the world; instead, it is a predictor of what the world will look like. It may be that power-seeking only arises in systems that are able to see the consequences of their own actions and how those affect the world.

3. LLMs optimise for Goodharted RLHF that looks good on the surface but lacks fundamental understanding. Since human value is fragile, it will be difficult to hit the sweet spot when we get to real-world cases and carry that into the complexity of the future.

Personal belief: 
These are all open questions, in my opinion, but I do see how LLMs give evidence about some of these parts. I, for example, believe that language is a very compressed information channel for alignment information, and I don't really believe that human values are as fragile as we think. 

I'm more scared of 1 and 2 than I am of 3, but I would still love for us to have ten more years to figure this out, as it seems very non-obvious what the answers here are.

Comment by Jonas Hallgren on jacquesthibs's Shortform · 2024-06-11T14:39:57.383Z · LW · GW

I really like this take.

I'm kind of "bullish" on active inference as a way to scale existing architectures to AGI as I think it is more optimised for creating an explicit planning system.

Also, funnily enough, Yann LeCun has a paper on his beliefs about the path to AGI, which I think Steve Byrnes has a good post on. It basically says that we need system 2 thinking in the way you described it here. With your argument in mind, he kind of disproves himself to some extent. 😅

Comment by Jonas Hallgren on 2. Corrigibility Intuition · 2024-06-08T19:46:08.593Z · LW · GW

Very interesting, I like the long list of examples as it helped me get my head around it more.

So, I've been thinking a bit about similar topics, but in relation to a long reflection on value lock-in.

My basic thesis was that the concept of reversibility should be what we optimise for in general for humanity, as we want to be able to reach as large a part of the "moral search space" as possible.

The concept of corrigibility you seem to be pointing towards here seems very related to notions of reversibility. You don't want to take actions that cannot later be reversed, and you generally want to optimise for optionality.

I then have two questions:

1) What do you think of the relationship between your measure of corrigibility and the notion of uncertainty in inverse reinforcement learning? It seems similar to what Stuart Russell is pointing towards when it comes to the agent being uncertain about the preferences of the principal it is serving. For example, in the following example that you give:

In the process of learning English, Cora takes a dictionary off a bookshelf to read. When she’s done, she returns the book to where she found it on the shelf. She reasons that if she didn’t return it this might produce unexpected costs and consequences. While it’s not obvious whether returning the book empowers Prince to correct her or not, she’s naturally conservative and tries to reduce the degree to which she’s producing unexpected externalities or being generally disruptive.

It kind of seems to me like the above can be formalised in terms of preference optimisation under uncertainty? (I've put a rough toy sketch of what I mean below, after question 2.)
(Side follow-up: What do you then think about the Eliezer-Russell VNM-axiom debate?)

2) Do you have any thoughts on the relationship between corrigibility and the notion of reversibility in physics? You can formalise irreversible systems as ones that are path-dependent; I'm just curious if you have any thoughts on the relationship between the two.
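
For question 1, here's the kind of toy formalisation I have in mind by "preference optimisation under uncertainty". It's purely a sketch with made-up numbers; the candidate utility functions, the irreversibility penalty, and the action names are my own illustration, not anything from the post.

```python
# The agent keeps a posterior over what the principal cares about, scores actions by
# expected utility under that posterior, and penalises hard-to-reverse actions
# because they destroy option value.

actions = ["return_book", "keep_book", "shred_book"]

# (probability, utility function) pairs: a posterior over candidate preferences.
candidate_utilities = [
    (0.6, {"return_book": 1.0, "keep_book": 0.7, "shred_book": 0.0}),  # principal values tidiness
    (0.4, {"return_book": 0.8, "keep_book": 1.0, "shred_book": 0.1}),  # principal wants Cora to keep studying
]

# How costly each action is to undo (0 = trivially reversible).
irreversibility = {"return_book": 0.0, "keep_book": 0.1, "shred_book": 1.0}
LAMBDA = 0.5  # weight on preserving reversibility / optionality

def score(action: str) -> float:
    expected_utility = sum(p * u[action] for p, u in candidate_utilities)
    return expected_utility - LAMBDA * irreversibility[action]

print({a: round(score(a), 3) for a in actions})
print("chosen:", max(actions, key=score))
# "return_book" wins: it is decent under both candidate preferences and fully reversible,
# which is roughly the conservative behaviour described in the quoted example.
```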

Thanks for the interesting work!

Comment by Jonas Hallgren on Alignment Gaps · 2024-06-08T19:06:53.725Z · LW · GW

I really like this type of post. Thank you for writing it!

I found some interesting papers that I didn't know of before, so that is very nice.

Comment by Jonas Hallgren on Ethodynamics of Omelas · 2024-06-05T16:21:20.416Z · LW · GW

Just revisiting this post as probably my favourite one on this site. I love it!

Comment by Jonas Hallgren on Awakening · 2024-05-30T14:14:20.441Z · LW · GW

I was doing the same samadhi thing with TMI and I was looking for insight practices from there. My teacher (non-dual Thai Forest tradition) said that the Burmese traditions set up a bit of a strange reality dualism, and basically said that the dark night of the soul is often due to developing concentration before awareness, loving-kindness, and wisdom.

So I'm Mahamudra-pilled now (Pointing Out the Great Way is a really good book for this). I do still like the insight model you proposed; I'm still reeling a bit from the insights I got during my last retreat, so it seems true.

Thank you for sharing your experience!

Comment by Jonas Hallgren on Examples of Highly Counterfactual Discoveries? · 2024-05-19T12:14:08.539Z · LW · GW

Sure! Anything more specific that you want to know about? Practice advice or more theory?