Kajus's Shortform

post by Kajus · 2024-02-19T22:26:04.537Z · LW · GW · 13 comments

Comments sorted by top scores.

comment by Kajus · 2024-03-02T13:37:09.051Z · LW(p) · GW(p)

I recently started thinking through theories of change (to figure out a better career plan) and I have some questions. I hope somebody can point me to relevant posts or discuss this with me.

The scenario I have in mind is: AI alignment is figured out. We can create an AI that will pursue the goals we give it while still leaving humanity in control. This is all optional, of course: you can still create an unaligned, evil AI. What's stopping anybody from creating an AI that will, for instance, fight wars? My point is that even if we have the technology to align AI, we are still not out of the woods.

What would solve the problem here is creating a benevolent, omnipresent AGI that prevents things like this.

comment by Kajus · 2024-02-19T12:38:50.405Z · LW(p) · GW(p)

Did EA scale too quickly?  
 
A friend recommended that I read a note from Andy's working notes, which argues that scaling systems too quickly leads to rigid systems. Reading it vaguely reminded me of EA:

Once you have lots of users with lots of use cases, it’s more difficult to change anything or to pursue radical experiments. You’ve got to make sure you don’t break things for people or else carefully communicate and manage change.

Those same varied users simply consume a great deal of time day-to-day: a fault which occurs for 1% of people will present no real problem in a small prototype, but it’ll be high-priority when you have 100k users.

First, it is debatable whether EA experienced a quick scale-up in the last few years. In some ways it feels to me like it did, and there was a spike in the founding of EA organizations in 2022.

But it feels to me like the EA community didn't have things figured out properly. The SBF crisis could arguably have been averted by following common business practices, and the same goes for the recent drama with Nonlinear. The community norms were off and hard to change?

comment by Kajus · 2025-01-29T11:02:43.239Z · LW(p) · GW(p)

There is an attitude I see in AI safety from time to time when people write papers or do projects:

  • People think more about doing a cool project rather than having a clear theory of change.
  • They spend a lot of time optimizing for being "publishable."

I think this is bad if we want to solve AI safety. On the other hand, having a clear theory of change is hard. Sometimes it's just so much easier to focus on an interesting problem instead of constantly asking yourself, "Is this really solving AI safety?"

How should I approach this whole thing? Idk about you guys, but this is draining for me.

Why would I publish papers in AI safety? Do people even read them? Am I doing it just to gain credibility? Aren't there already too many papers? 

Replies from: weibac
comment by Milan W (weibac) · 2025-01-30T16:25:08.293Z · LW(p) · GW(p)

The incentives for early-career researchers are to blame for this mindset imo. Having legible output is a very good signal of competence for employers/grantors. I think it probably makes sense for a researcher's first project or two to be more of a cool demo than a clear step towards a solution.

Unfortunately, some mid-career and sometimes even senior researchers keep this habit of forward-chaining from what looks cool instead of backwards-chaining from good futures. Ok, the previous sentence was a bit too strong. No reasoning is pure backward-chaining or pure forward-chaining. But I think that a common failure mode is not thinking enough about theories of change.

Replies from: Kajus
comment by Kajus · 2025-02-02T17:09:36.627Z · LW(p) · GW(p)

Okay, this makes sense, but it doesn't answer my question. I want to publish papers at some point, but my attention just keeps going back to "Is this going to solve AI safety?" I guess people in mechanistic interpretability don't keep thinking about it; they are more like "Hm... I have this interesting problem at hand..." and they try to solve it. When do you judge that the problem at hand is good enough to shift your attention to it?

comment by Kajus · 2025-01-23T17:49:39.089Z · LW(p) · GW(p)

Isn't being a real expected-value-calculating consequentialist really hard? Like, this week an article about not ignoring bad vibes was trending. I think it's very easy to be a naive consequentialist, and it doesn't pay off: you get punished very easily because you miscalculate and get ostracized, or you fuck your emotions up. Why would we get a consequentialist AI?

Replies from: weibac
comment by Milan W (weibac) · 2025-01-25T03:32:45.166Z · LW(p) · GW(p)

Why would we get a consequentialist AI? 

Excellent question. Current AIs are not very strong-consequentialist[1], and I expect/hope that we probably won't get AIs like that either this year (2025) or next year (2026). However, people here are interested in how an extremely competent AI would behave. Most people here model them as instrumentally-rational agents that are usefully described as having a closed-form utility function. Here is a seminal formalization of this model by Legg and Hutter: link.
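
Roughly, that formalization scores a policy $\pi$ by its expected reward across all computable environments, weighted by simplicity. From memory (so check the paper for the exact statement):

$$\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)}\, V^{\pi}_{\mu}$$

where $E$ is the class of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward the policy $\pi$ obtains in $\mu$.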

Are these models of future super-competent AIs wrong? Somewhat. All models are wrong. I personally trust them less than the average person who has spent a lot of time here. I still find them a useful tool for thinking about limits and worst-case scenarios: the sort of AI system actually capable of single-handedly taking over the world, for instance. However, I think it is also very useful to think about how AIs (and the people making them) are likely to act before these ultra-competent AIs show up, or in case they don't.
 

  1. ^

    A term I just made up and chose to define like this: an AI that reasons like a naive utilitarian, independently of its goals [? · GW].

comment by Kajus · 2025-02-02T16:54:02.869Z · LW(p) · GW(p)

In a few weeks, I will be starting a self-experiment. I’ll be testing a set of supplements to see if they have any noticeable effects on my sleep quality, mood, and energy levels.

The supplements I will be trying:

Name | Amount | Purpose / Notes
Zinc | 6 mg |
Magnesium | 300 mg |
Riboflavin (B2) | 0 mg | I already consume a lot of dairy, so no need.
Vitamin D | 500 IU |
B12 | 0 µg | I get enough from dairy, so skipping supplementation.
Iron | 20 mg | I don't eat meat; I will get tested for deficiency first.
Creatine | 3 g | May improve cognitive function.
Omega-3 | 500 mg/day | Supposed to help with brain function and inflammation.

Things I want to measure:

  • Sleep quality – I will start using Sleep as Android to track my sleep. Has anyone tried it out? This seems somewhat risky: having a phone next to my bed may make it harder for me to fall asleep, because I will be tempted to use it.
  • Energy levels – Subjective rating (1-10) every evening. Prompt: "What is your energy level?"
  • I will also do journaling and use ChatGPT to summarize it.
  • Digestion/gut health – Any noticeable changes in bloating, gas, or gut discomfort. I used to struggle with this; I will probably not measure it every day, but I'll keep in mind that it might be related.
  • Exercise performance – I already track this via Hevy, so no added cost. (Also, add me on Hevy; my nick is tricular.)

I expect my sleep quality to improve, especially with magnesium and omega-3. I’m curious if creatine will have any effect on mental clarity and exercise. 

If anyone has tried these supplements already and has tips, let me know! Would love to hear what worked (or didn’t) for you.

I will start with a two-week period in which I develop the daily questionnaire and test whether the sleep tracking app works for me.
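
A minimal sketch of what that daily questionnaire could look like as a script appending to a CSV (the questions and file name are just placeholders):

```python
# Hypothetical sketch: append one row per evening with the subjective ratings.
import csv
from datetime import date
from pathlib import Path

LOG = Path("supplement_log.csv")  # placeholder file name
QUESTIONS = [
    ("energy", "What is your energy level? (1-10)"),
    ("sleep_quality", "How well did you sleep last night? (1-10)"),
    ("gut", "Any bloating/gas/discomfort today? (0=none, 1=some, 2=a lot)"),
]

def log_today() -> None:
    row = {"date": date.today().isoformat()}
    for key, prompt in QUESTIONS:
        row[key] = input(prompt + " ")
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["date"] + [k for k, _ in QUESTIONS])
        if new_file:
            writer.writeheader()  # write the header once, when the file is created
        writer.writerow(row)

if __name__ == "__main__":
    log_today()
```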

One risk is that I actually feel better but fail to see that reflected in the energy ratings. How do subjective measures like this perform in self-studies? Also, I don't have a control; I kind of think it's useless here. Convince me it is not! I just expect that I will notice getting smarter. Do you think that's stupid or not?

Also, I'm vegetarian. My diet is pretty unhealthy in that it doesn't include a big variety of foods.

I checked the maximum intake of supplements on https://ods.od.nih.gov/factsheets/Zinc-HealthProfessional/  

comment by Kajus · 2024-12-19T13:55:37.421Z · LW(p) · GW(p)

I think that AI labs are going to use LoRA to lock cool capabilities in models and offer a premium subscription with those capabilities unlocked.
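
A minimal sketch of how that gating could look at serving time, assuming a Hugging Face Transformers + PEFT stack; the model id, adapter id, and tier check are hypothetical:

```python
# Hypothetical sketch: serve the base model to free users and apply a LoRA
# adapter (which restores the locked capability) for premium subscribers.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "example-lab/base-model"            # hypothetical model id
PREMIUM_ADAPTER = "example-lab/premium-adapter"  # hypothetical LoRA adapter id

def load_model_for(tier: str):
    """Return the model variant matching the user's subscription tier."""
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    if tier == "premium":
        # Applying the LoRA weights "unlocks" the capability for this user;
        # the scheme could equally run the other way (an adapter that suppresses it).
        model = PeftModel.from_pretrained(model, PREMIUM_ADAPTER)
    return tokenizer, model

tokenizer, model = load_model_for("premium")
```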

comment by Kajus · 2024-12-19T13:52:32.774Z · LW(p) · GW(p)

I recently came up with an idea to improve my red-teaming skills. By red-teaming, I mean identifying obvious flaws in plans, systems, or ideas. 

First, find high-quality reviews on OpenReview or somewhere else. Then create a dataset of papers and their reviews, preferably in a field that is easy to grasp yet sufficiently complex. Read the papers, write your own critique, and compare it to the reviews.

An obvious flaw is that you see the reviews beforehand, so you might want to have someone else assemble the dataset. Doing this in a group is also really great.
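
Here's a minimal sketch of how I imagine the blinded exercise, assuming a hand-built JSON dataset of paper/review pairs (the file layout and field names are placeholders):

```python
# Hypothetical sketch: a "write your critique first, then compare" drill.
import json
import random
from dataclasses import dataclass

@dataclass
class ReviewedPaper:
    title: str
    paper_text: str        # e.g. abstract + key sections
    reference_review: str  # the high-quality review, hidden until you commit

def load_dataset(path: str) -> list[ReviewedPaper]:
    # Assumes a JSON list of {"title": ..., "paper_text": ..., "reference_review": ...}
    with open(path) as f:
        return [ReviewedPaper(**item) for item in json.load(f)]

def drill(papers: list[ReviewedPaper]) -> None:
    paper = random.choice(papers)
    print(paper.title)
    print(paper.paper_text)
    my_critique = input("List the flaws you see, then press Enter:\n")
    # Only now reveal the reference review, so you can't anchor on it.
    print("\n--- Reference review ---")
    print(paper.reference_review)
    print("\n--- Your critique ---")
    print(my_critique)

if __name__ == "__main__":
    drill(load_dataset("reviewed_papers.json"))
```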

comment by Kajus · 2024-04-05T17:40:37.992Z · LW(p) · GW(p)

I've just read "Against the singularity hypothesis" by David Thorstad, and some things there seem obviously wrong to me. I'm not totally sure about it, though, so I want to share it here, hoping that somebody else has read it as well. In the paper, Thorstad tries to refute the singularity hypothesis. In the last few chapters, he discusses the argument for x-risk from AI that rests on three premises: the singularity hypothesis, the orthogonality thesis, and instrumental convergence. He says that since the singularity hypothesis is false (or lacks proper evidence), we shouldn't worry that much about this specific scenario. It seems to me that we should still worry: we don't need recursively self-improving agents to have agents smart enough for instrumental convergence and the orthogonality thesis to apply to them.

Replies from: DavidThorstad
comment by DavidThorstad · 2024-05-29T00:59:00.769Z · LW(p) · GW(p)

Thanks for your engagement! 

The paper does not say that if the singularity hypothesis is false, we should not worry about reformulations of the Bostrom-Yudkowsky argument which rely only on orthogonality and instrumental convergence. Those are separate arguments and would require separate treatment.

The paper lists three ways in which the falsity of the singularity hypothesis would make those arguments more difficult to construct (Section 6.2). It is possible to accept that losing the singularity hypothesis would make the Bostrom-Yudkowsky argument more difficult to push without taking a stance on whether this more difficult effort can be done.

comment by Kajus · 2024-03-02T14:26:53.479Z · LW(p) · GW(p)

Power-seeking, agentic, deceptive AI is only possible if there is a smooth transition from non-agentic AI (what we have right now) to agentic AI. Otherwise, there will be a sign that an AI is agentic, and it will be monitored for those capabilities. If an AI mimics the human thinking process, which it might initially do, it will also mimic our biases and things like having pent-up feelings, which might cause it to slip and lose its temper. Therefore, power-seeking agentic AI is not likely to be a real threat (initially).