abramdemski's Shortform
post by abramdemski · 2020-09-10T17:55:38.663Z · LW · GW · 33 comments
Comments sorted by top scores.
comment by abramdemski · 2025-04-08T15:00:23.606Z · LW(p) · GW(p)
Here's what seem like priorities to me after listening to the recent Dwarkesh podcast featuring Daniel Kokotajlo:
1. Developing the safer AI tech (in contrast to modern generative AI) so that frontier labs have an alternative technology to switch to, making it lower-cost for them to start taking warning signs of misalignment in their current tech tree seriously. There are several possible routes here, ranging from small tweaks to modern generative AI, to scaling up infrabayesianism (existing theory, totally groundbreaking implementation), to starting totally from scratch (inventing a new theory). Of course we should be working on all routes, but prioritization depends in part on timelines.
- I see the game here as basically: look at the various existing demos of unsafety and make a counter-demo which is safer on multiple of these metrics without having gamed the metrics.
2. De-agentify the current paradigm or the new paradigm:
- Don't directly train on reinforcement across long chains of activity. Find other ways to get similar benefits.
- Move away from a model where the AI is personified as a distinct entity (eg, chatbot model). It's like the old story about building robot arms to help feed disabled people -- if you mount the arm across the table, spoonfeeding the person, it's dehumanizing; if you make it a prosthetic, it's humanizing.
- I don't want AI to write my essays for me. I want AI to help me get my thoughts out of my head. I want super-autocomplete. I think far faster than I can write or type or speak. I want AI to read my thoughts & put them on the screen.
- There are many subtle user interface design questions associated with this, some of which are also safety issues, eg, exactly what objective do you train on? (A toy sketch of one option follows this list.)
- Similarly with image generation, etc.
- I don't necessarily mean brain-scanning tech here, but of course that would be the best way to achieve it.
- Basically, use AI to overcome human information-processing bottlenecks instead of just trying to replace humans. Putting humans "in the loop" more and more deeply instead of accepting/assuming that humans will iteratively get sidelined.
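To make the "exactly what objective do you train on?" question concrete, here is a minimal sketch of one possible answer (purely illustrative; the tiny model, byte-level tokenizer, and example data are stand-ins, not a worked-out proposal): plain supervised next-token prediction on continuations the user actually kept, with no reinforcement signal over long chains of activity.

```python
# Illustrative sketch only: a "super-autocomplete" objective that is pure
# supervised prediction of continuations the user accepted, conditioned on
# the user's own partial draft. No long-horizon RL anywhere.
import torch
import torch.nn as nn

VOCAB = 256  # toy byte-level vocabulary

class TinyLM(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)

def autocomplete_loss(model, prefix, continuation):
    """Cross-entropy only on the continuation the user kept; the prefix
    (the user's partial draft) is context, not a prediction target."""
    tokens = torch.cat([prefix, continuation]).unsqueeze(0)
    logits = model(tokens[:, :-1])
    targets = tokens[:, 1:].clone()
    targets[:, : prefix.numel() - 1] = -100  # mask out loss on the prefix
    return nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), targets.reshape(-1), ignore_index=-100
    )

model = TinyLM()
prefix = torch.tensor(list(b"I think far faster than I can ty"), dtype=torch.long)
continuation = torch.tensor(list(b"pe."), dtype=torch.long)
loss = autocomplete_loss(model, prefix, continuation)
loss.backward()  # one ordinary supervised step
```

The interesting design questions are then about what counts as an "accepted" continuation and how the interface collects that signal, rather than about shaping behavior with a long-horizon reward.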
↑ comment by ryan_greenblatt · 2025-04-09T00:58:14.891Z · LW(p) · GW(p)
I'm skeptical of strategies which look like "steer the paradigm away from AI agents + modern generative AI paradigm to something else which is safer". Seems really hard to make this competitive enough and I have other hopes that seem to help a bunch while being more likely to be doable.
(This isn't to say I expect that the powerful AI systems will necessarily be trained with the most basic extrapolation of the current paradigm, just that I think steering this ultimate paradigm to be something which is quite different and safer is very difficult.)
↑ comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-04-09T13:23:27.139Z · LW(p) · GW(p)
Couldn't agree more. Variants of this strategy get proposed often.
If you are a proponent of this strategy, I'm curious whether you know of any examples in history where humanity purposefully and successfully steered towards a significantly less competitive [economically, militarily,...] technology that was nonetheless safer.
↑ comment by Jeremy Gillen (jeremy-gillen) · 2025-04-09T17:24:26.731Z · LW(p) · GW(p)
It's not about building less useful technology; that's not what Abram or Ryan are talking about (I assume). The field of alignment has always been about strongly superhuman agents. You can have tech that is useful and also safe to use; there's no direct contradiction here.
Maybe one weak-ish historical analogy is explosives? Some explosives are unstable, and will easily explode by accident. Some are extremely stable, and can only be set off by a detonator. Early in the industrial chemistry tech tree, you only have access to one or two ways to make explosives. If you're desperate, you use these whether or not they are stable, because the risk-usefulness tradeoff is worth it. A bunch of your soldiers will die, and your weapons caches will be easier to destroy, but that's a cost you might be willing to pay. As your industrial chemistry tech advances, you invent many different types of explosive, and among these choices you find ones that are both stable and effective, because obviously this is better in every way.
Maybe another is medications? As medications advanced and we gained choice and specificity, we could choose medications that were both effective and low in side-effects. Before that, there was often a choice, and the correct choice was often not to use the medicine unless you were literally dying.
In both these examples, sometimes the safety-usefulness tradeoff was worth it, sometimes not. Presumably, in both cases, people often made the choice not to use unsafe explosives or unsafe medicine, because the risk wasn't worth it.
As it is with these technologies, so it is with AGI. There are a bunch of future paradigms of AGI building. The first one we stumble into isn't looking [LW · GW] like one where we can precisely specify what it wants. But if we were able to keep experimenting and understanding and iterating after the first AGI, and we gradually developed dozens of ways of building AGI, then I'm confident we could find one that is just as intelligent and could also have its goals precisely specified.
My two examples above don't quite answer your question, because "humanity" didn't steer away from using them, just individual people at particular times. For examples where all or large sections of humanity steered away from using an extremely useful tech whose risks purportedly outweighed benefits: Project Plowshare, nuclear power in some countries, GMO food in some countries, viral bioweapons (as far as I know), eugenics, stem cell research, cloning. Also {CFCs, asbestos, leaded petrol, CO2 to some extent, radium, cocaine, heroin} after the negative externalities were well known.
I guess my point is that safety-usefulness tradeoffs are everywhere, and tech development choices that take into account risks are made all the time. To me, this makes your question utterly confused. Building technology that actually does what you want (which is be safe and useful) is just standard practice. This is what everyone does, all the time, because obviously safety is one of the design requirements of whatever you're building.
The main difference between the above technologies and AGI is that AGI is a trapdoor. The cost of messing up AGI is that you lose any chance to try again. AGI shares with some of the above technologies an epistemic problem: for many of them it isn't clear in advance, to most people, how much risk there actually is, and therefore whether the tradeoff is worth it.
After writing this, it occurred to me that maybe by "competitive" you meant "earlier in the tech tree"? I interpreted it in my comment as a synonym of "useful" in a sense that excluded safe-to-use.
↑ comment by ozziegooen · 2025-04-10T23:02:18.861Z · LW(p) · GW(p)
I'm curious whether you know of any examples in history where humanity purposefully and successfully steered towards a significantly less competitive [economically, militarily,...] technology that was nonetheless safer.
This sounds a lot like the history of environmentalism and safety regulations? As in, there's a long history of [corporations selling X, using a net-harmful technology], then governments regulating. Often this happens after the technology is sold, but sometimes before it's completely popular around the world.
I'd expect that there's similarly a lot of history of early product areas where some people realize that [popular trajectory X] will likely be bad and get regulated away, so they help further [safer version Y].
Going back to the previous quote:
"steer the paradigm away from AI agents + modern generative AI paradigm to something else which is safer"
I agree it's tough, but would expect some startups to exist in this space. Arguably there are already several claiming to be focusing on "Safe" AI. I'm not sure if people here would consider this technically part of the "modern generative AI paradigm" or not, but I'd imagine these groups would be taking some different avenues, using clear technical innovations.
There are worlds where the dangerous forms have disadvantages later on - for example, they are harder to control/oversee, or they get regulated. In those worlds, I'd expect there should/could be some efforts waiting to take advantage of that situation.
↑ comment by aysja · 2025-04-12T18:23:11.909Z · LW(p) · GW(p)
I feel confused by how broad this is, i.e., "any example in history." Governments regulate technology for the purpose of safety all the time. Almost every product you use and consume has been regulated to adhere to safety standards, hence making them less competitive (i.e., they could be cheaper and perhaps better according to some if they didn't have to adhere to them). I'm assuming that you believe this route is unlikely to work, but it seems to me that this has some burden of explanation which hasn't yet been made. I.e., I don't think the only relevant question here is whether it's competitive enough such that AI labs would adopt it naturally, but also whether governments would be willing to make that cost/benefit tradeoff in the name of safety (which requires eg believing in the risks enough, believing this would help, actually having the viable substitute in time, etc.). But that feels like a different question to me from "has humanity ever managed to make a technology less competitive but safer," where the answer is clearly yes.
↑ comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-04-13T11:03:37.370Z · LW(p) · GW(p)
My comment was a little ambiguous. What I meant was human society purposely and differentially researching and developing technology X instead of Y, where Y has a public (global) harm Z but private benefits, and X is based on a different design principle than Y, is slightly less competitive, but is still able to replace Y.
A good example would be the development of renewable energy to replace fossil fuels to prevent climate change.
The new tech (fusion, fission, solar, wind) is based on different fundamental principles than the old tech (oil and gas).
Let's zoom in:
Fusion would be an example, but it is perpetually thirty years away. Fission works but wasn't purposely developed to fight climate change. Wind is not competitive without large subsidies and most likely never will be.
Solar is at least somewhat competitive with fossil fuels [though because of load balancing it may not be able to replace fossil fuels completely], was purposely developed out of environmental concerns, and would be the best example.
I think my main question mark here is: solar energy is still a promise. It hasn't even begun to make a dent in total energy consumption (a quick Perplexity search suggests only 2 percent of global energy is solar-generated). Despite the hype, it is not clear climate change will be solved by solar energy.
Moreover, the real question is to what degree the development of competitive solar energy was the result of purposeful policy. People like to believe that tech development subsidies have a large counterfactual impact, but imho this needs to be explicitly proved; my prior is that the effect is probably small compared to the overall general development of technology & economic incentives that are not downstream of subsidies / government policy.
Let me contrast this with two different approaches to solving a problem Z (climate change).
- Deploy existing competitive technology (fission)
- Solve the problem directly (geo-engineering)
It seems to me that in general these latter two approaches have a far better track record of counterfactually Actually Solving the Problem.
↑ comment by abramdemski · 2025-04-14T19:33:14.475Z · LW(p) · GW(p)
Moreover, the real question is to what degree the development of competitive solar energy was the result of purposeful policy. People like to believe that tech development subsidies have a large counterfactual impact, but imho this needs to be explicitly proved; my prior is that the effect is probably small compared to the overall general development of technology & economic incentives that are not downstream of subsidies / government policy.
But we don't need to speculate about that in the case of AI! We know roughly how much money we'll need for a given size of AI experiment (eg, a training run). The question is one of raising the money to do it. With a strong enough safety case vs the competition, it might be possible.
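As a back-of-envelope illustration of "roughly how much money for a given size of experiment" (the 5e28 FLOP figure echoes a number mentioned elsewhere in this thread; the throughput, utilization, and price values below are assumed round numbers, not sourced):

```python
# Rough shape of the estimate only; every figure below is an assumption.
total_flops = 5e28            # size of training run discussed in this thread
flops_per_gpu_per_sec = 1e15  # assumed peak throughput of one accelerator
utilization = 0.4             # assumed fraction of peak actually achieved
usd_per_gpu_hour = 2.0        # assumed rental price

gpu_seconds = total_flops / (flops_per_gpu_per_sec * utilization)
gpu_hours = gpu_seconds / 3600
print(f"{gpu_hours:.2e} GPU-hours, ~${gpu_hours * usd_per_gpu_hour:.2e}")
```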
I'm curious if you think there are any better routes; IE, setting aside the possibility of researching safer AI technology & working towards its adoption, what overall strategy would you suggest for AI safety?
↑ comment by Vladimir_Nesov · 2025-04-08T19:51:03.789Z · LW(p) · GW(p)
prioritization depends in part on timelines
Any research rebalances the mix of currently legible research directions that could be handed off to AI-assisted alignment researchers or early autonomous AI researchers whenever they show up. Even hopelessly incomplete research agendas could still be used to prompt future capable AI to focus on them, while in the absence of such incomplete research agendas we'd need to rely on AI's judgment more completely. So it makes sense to still prioritize things that have no hope at all of becoming practical for decades (with human effort), to make as much partial progress as possible in developing (and deconfusing) them in the next few years.
In this sense current human research, however far from practical usefulness, forms the data for alignment of the early AI-assisted or AI-driven alignment research efforts. The judgment of human alignment researchers who are currently working makes it possible to formulate more knowably useful prompts for future AIs that nudge them in the direction of actually developing practical alignment techniques.
↑ comment by Cole Wyeth (Amyr) · 2025-04-08T21:01:33.626Z · LW(p) · GW(p)
I haven't heard this said explicitly before but it helps me understand your priorities a lot better.
↑ comment by Vladimir_Nesov · 2025-04-09T00:43:49.310Z · LW(p) · GW(p)
haven't heard this said explicitly before
Okay, this prompted me to turn the comment into a post [LW · GW], maybe this point is actually new to someone.
↑ comment by abramdemski · 2025-04-09T16:15:09.986Z · LW(p) · GW(p)
This sort of approach doesn't make so much sense for research explicitly aiming at changing the dynamics in this critical period. Having an alternative, safer idea almost ready-to-go (with some explicit support from some fraction of the AI safety community) is a lot different from having some ideas which the AI could elaborate.
↑ comment by Vladimir_Nesov · 2025-04-09T16:34:28.625Z · LW(p) · GW(p)
With AI assistance, the degree to which an alternative is ready-to-go can differ a lot compared to its prior human-developed state. Also, an idea that's ready-to-go is not yet an edifice of theory and software that's ready-to-go in replacing 5e28 FLOPs transformer models, so some level of AI assistance is still necessary with 2 year timelines. (I'm not necessarily arguing that 2 year timelines are correct, but it's the kind of assumption that my argument should survive.)
The critical period includes the time when humans are still in effective control of the AIs, or when vaguely aligned and properly incentivised AIs are in control and are actually trying to help with alignment, even if their natural development and increasing power would end up pushing them out of that state soon thereafter. During this time, the state of current research culture shapes the path-dependent outcomes. Superintelligent AIs that are reflectively stable will no longer allow path dependence in their further development, but before that happens the dynamics can be changed to an arbitrary extent, especially with AI efforts as leverage in implementing the changes in practice.
↑ comment by cdt (nc) · 2025-04-11T15:27:02.849Z · LW(p) · GW(p)
in the absence of such incomplete research agendas we'd need to rely on AI's judgment more completely
This is a key insight and I think that operationalising or pinning down the edges of a new research area is one of the longest time-horizon projects there is. If the METR estimate is accurate, then developing research directions is a distinct value-add even after AI research is semi-automatable.
↑ comment by Cole Wyeth (Amyr) · 2025-04-08T16:35:54.500Z · LW(p) · GW(p)
It seems to me that an "implementation" of something like Infra-Bayesianism which can realistically compete with modern LLMs would ultimately look a lot like a semi-theoretically-justified modification to the loss function or optimizer of agentic fine-tuning / RL or possibly its scaffolding to encourage it to generalize conservatively. This intuition comes in two parts:
1: The pre-training phase is already finding a mesa-optimizer that does induction in context. I usually think of this as something like Solomonoff induction with a good inductive bias, but probably you would expect something more like logical induction. I expect the answer to be somewhere in between. I'll try to test this empirically at ARENA this May. The point is that I struggle to see how IB applies here, on the level of pure prediction, in practice [LW · GW]. It's possible that this is just a result of my ignorance or lack of creativity.
2: I'm pessimistic about learning results for MDPs or environments "without traps" having anything to do with building a safe LLM agent.
If IB is only used in this heuristic way, we might expect fewer of the mathematical results to transfer, and instead just port over some sort of pessimism about uncertainty. In fact, Michael Cohen's work follows pretty much exactly this approach at times (I've read him mention IB about once, apparently as a source of intuition but not technical results).
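For concreteness, a minimal sketch of that flavour of modification (not infra-Bayesianism itself, and not any specific published proposal; the reward models, features, and log-probs below are toy stand-ins): score sampled rollouts by the worst case over an ensemble of learned reward models, so fine-tuning is pessimistic wherever the reward estimates disagree.

```python
# Toy sketch: pessimism-over-uncertainty via a worst-case over an ensemble
# of reward models, used as the signal in a REINFORCE-style update.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def pessimistic_reward(ensemble, features):
    """Worst-case reward across the ensemble -- a crude stand-in for
    minimizing over a set of hypotheses about what the reward really is."""
    estimates = torch.stack([rm(features) for rm in ensemble])  # (n_models, batch)
    return estimates.min(dim=0).values

ensemble = [RewardModel() for _ in range(5)]
features = torch.randn(8, 16)                    # stand-in for rollout features
log_probs = torch.randn(8, requires_grad=True)   # stand-in for policy log-probs
advantage = pessimistic_reward(ensemble, features).detach()
policy_loss = -(advantage * log_probs).mean()    # update only on the pessimistic signal
policy_loss.backward()
```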
None of this is really a criticism of IB; rather, I think it's important to keep in mind when considering which aspects of IB or IB-like theories are most worth developing.
↑ comment by Vanessa Kosoy (vanessa-kosoy) · 2025-04-08T17:49:13.987Z · LW(p) · GW(p)
(Summoned by @Alexander Gietelink Oldenziel [LW · GW])
I don't understand this comment. I usually don't think of "building a safer LLM agent" as a viable route to aligned AI. My current best guess about how to create aligned AI is Physicalist Superimitation [LW · GW]. We can imagine other approaches, e.g. Quantilized Debate [LW(p) · GW(p)], but I am less optimistic there. More importantly, I believe that we need to complete the theory of agents first, before we can have strong confidence about which approaches are more promising.
As to heuristic implementations of infra-Bayesianism, this is something I don't want to speculate about in public, it seems exfohazardous.
↑ comment by Cole Wyeth (Amyr) · 2025-04-08T20:58:42.017Z · LW(p) · GW(p)
I usually don't think of "building a safer LLM agent" as a viable route to aligned AI
I agree that building a safer LLM agent is an incredibly fraught path that probably doesn't work. My comment is in the context of Abram's first approach, developing safer AI tech that companies might (apparently voluntarily) switch to, and specifically the route of scaling up IB to compete with LLM agents. Note that Abram also seems to be discussing the AI 2027 report, which if taken seriously requires all of this to be done in about 2 years. Conditioning on this route, I suggest that most realistic paths look like what I described, but I am pretty pessimistic that this route will actually work. The reason is that I don't see explicitly Bayesian glass-box methods competing with massive black-box models at tasks like natural language prediction any time soon. But who knows, perhaps with the "true" (IB?) theory of agency in hand much more is possible.
More importantly, I believe that we need to complete the theory of agents first, before we can have strong confidence about which approaches are more promising.
I'm not sure it's possible to "complete" the theory of agents, and I am particularly skeptical that we can do it any time soon. However, I think we agree locally / directionally, because it also seems to me that a more rigorous theory of agency is necessary for alignment.
As to heuristic implementations of infra-Bayesianism, this is something I don't want to speculate about in public, it seems exfohazardous.
Fair enough, but in that case, it seems impossible for this conversation to meaningfully progress here.
↑ comment by Vanessa Kosoy (vanessa-kosoy) · 2025-04-09T06:40:53.691Z · LW(p) · GW(p)
I think that in 2 years we're unlikely to accomplish anything that leaves a dent in P(DOOM), with any method, but I also think it's more likely than not that we actually have >15 years.
As to "completing" the theory of agents, I used the phrase (perhaps perversely) in the same sense that e.g. we "completed" the theory of information: the latter exists and can actually be used for its intended applications (communication systems). Or at least in the sense we "completed" the theory of computational complexity: even though a lot of key conjectures are still unproven, we do have a rigorous understanding of what computational complexity is and know how to determine it for many (even if far from all) problems of interest.
I probably should have said "create" rather than "complete".
↑ comment by Cole Wyeth (Amyr) · 2025-04-09T12:26:05.436Z · LW(p) · GW(p)
I agree with all of this.
↑ comment by abramdemski · 2025-04-08T19:13:21.060Z · LW(p) · GW(p)
The pre-training phase is already finding a mesa-optimizer that does induction in context. I usually think of this as something like Solomonoff induction with a good inductive bias, but probably you would expect something more like logical induction. I expect the answer to be somewhere in between.
I don't personally imagine current LLMs are doing approximate logical induction (or approximate solomonoff) internally. I think of the base model as resembling a circuit prior updated on the data. The circuits that come out on top after the update also do some induction of their own internally, but it is harder to think about what form of inductive bias they have exactly (it would seem like a coincidence if it also happened to be well-modeled as a circuit prior, but, it must be something highly computationally limited like that, as opposed to Solomonoff-like).
I hesitate to call this a mesa-optimizer. Although good epistemics involves agency in principle (especially time-bounded epistemics), I think we can sensibly differentiate between mesa-optimizers and mere mesa-induction. But perhaps you intended this stronger reading, in support of your argument. If so, I'm not sure why you believe this. (No, I don't find "planning ahead" results to be convincing -- I feel this can still be purely epistemic in a relevant sense.)
Perhaps it suffices for your purposes to observe that good epistemics involves agency in principle?
Anyway, cutting more directly to the point:
I think you lack imagination when you say
[...] which can realistically compete with modern LLMs would ultimately look a lot like a semi-theoretically-justified modification to the loss function or optimizer of agentic fine-tuning / RL or possibly its scaffolding [...]
I think there are neural architectures close to the current paradigm which don't directly train whole chains-of-thought on a reinforcement signal to achieve agenticness. This paradigm is analogous to model-free reinforcement learning. What I would suggest is more analogous to model-based reinforcement learning, with corresponding benefits to transparency. (Super speculative, of course.)
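For concreteness, a toy sketch of the model-based shape being gestured at (the dynamics model, planner, and dimensions are invented for illustration, not a concrete proposal): the only learned component is a predictive model trained on prediction error, and the agency lives in an explicit planning loop whose candidate plans can be inspected before acting.

```python
# Toy sketch: supervised world model + explicit random-shooting planner,
# in contrast to training a policy end-to-end on a reinforcement signal.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts next state and reward from (state, action); trained purely
    on prediction error, never on a reinforcement signal."""
    def __init__(self, state_dim=4, n_actions=3):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, 64), nn.ReLU(),
            nn.Linear(64, state_dim + 1),  # next state + predicted reward
        )

    def forward(self, state, action):
        one_hot = nn.functional.one_hot(action, self.n_actions).float()
        out = self.net(torch.cat([state, one_hot], dim=-1))
        return out[..., :-1], out[..., -1]  # next_state, reward

def plan(model, state, horizon=5, n_candidates=64):
    """Random-shooting planner: roll out candidate action sequences inside
    the learned model and return the first action of the best sequence.
    The candidate plans are legible objects that can be inspected."""
    best_return, best_first_action = -float("inf"), 0
    with torch.no_grad():
        for _ in range(n_candidates):
            actions = torch.randint(0, model.n_actions, (horizon,))
            s, total = state, 0.0
            for a in actions:
                s, r = model(s, a.unsqueeze(0))
                total += r.item()
            if total > best_return:
                best_return, best_first_action = total, actions[0].item()
    return best_first_action

model = DynamicsModel()
state = torch.zeros(1, 4)
action = plan(model, state)  # decision comes from search, not an RL-trained policy
```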
↑ comment by Cole Wyeth (Amyr) · 2025-04-08T20:36:20.543Z · LW(p) · GW(p)
EDIT: I think that I miscommunicated a bit initially and suggest reading my response to Vanessa before this comment for necessary context.
I hesitate to call this a mesa-optimizer. Although good epistemics involves agency in principle (especially time-bounded epistemics), I think we can sensibly differentiate between mesa-optimizers and mere mesa-induction. But perhaps you intended this stronger reading, in support of your argument. If so, I'm not sure why you believe this. (No, I don't find "planning ahead" results to be convincing -- I feel this can still be purely epistemic in a relevant sense.)
I am fine with using the term mesa-induction. I think induction is a restricted type of optimization, but I suppose you associate the term mesa-optimizer with agency, and that is not my intended message.
I think there are neural architectures close to the current paradigm which don't directly train whole chains-of-thought on a reinforcement signal to achieve agenticness. This paradigm is analogous to model-free reinforcement learning. What I would suggest is more analogous to model-based reinforcement learning, with corresponding benefits to transparency. (Super speculative, of course.)
I don't think the chain of thought is necessary, but routing through pure sequence prediction in some fashion seems important for the current paradigm (that is what I call scaffolding). I expect that it is possible in principle to avoid this and do straight model-based RL, but forcing that approach to quickly catch up with LLMs / foundation models seems very hard and not necessarily desirable. In fact by default this seems bad for transparency, but perhaps some IB-inspired architecture is more transparent.
↑ comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-04-08T17:29:26.624Z · LW(p) · GW(p)
@Vanessa Kosoy [LW · GW]
↑ comment by Seth Herd · 2025-04-08T19:27:28.337Z · LW(p) · GW(p)
Those seem like good suggestions if we had a means of slowing the current paradigm and making/keeping it non-agentic.
Do you know of any ideas for how we convince enough people to do those things? I can see a shift in public opinion in the US and even a movement for "don't make AI that can replace people" which would technically translate to no generally intelligent learning agents.
But I can't see the whole world abiding by such an agreement, because general tool AI like LLMs is just too easily converted into an agent as it keeps getting better.
Developing new tech in time to matter without a slowdown seems doomed to me.
I would love to be convinced that this is an option! But at this point it looks 80%-plus likely that LLMs-plus-scaffolding-or-related-breakthroughs get us to AGI within five years or a little more if global events work against it, which makes starting from scratch nigh impossible and even substantially different approaches very unlikely to catch up.
The exception is the de-slopifying tools you've discussed elsewhere. That approach has the potential to make progress on the current path while also reducing the risk of slop-induced doom. That doesn't solve actual misalignment as in AI-2027, but it would help other alignment techniques work more predictably and reliably.
comment by abramdemski · 2021-05-13T17:43:52.375Z · LW(p) · GW(p)
- The comments on my recent post about formalizing the inner alignment problem are, like, the best comments I've ever gotten. Seems like begging for comments at length works?
- This is making me feel optimistic about a coordinated attack on the formal inner alignment problem. Once we "dig out" the right formal space, it seems like there'll be a lot of actually tractable questions which a team of people can attack. I feel like this is only currently happening to a limited extent, perhaps surprisingly... eg: why aren't there several people working on the minimal circuits stuff? Is it just too hard, even though the question has been made relatively concrete? I feel optimistic because of the quick and in-depth responses. My model is that a better overarching picture of the problem and current solution approaches will help people orient toward the problem and toward fruitful directions. Maybe this isn't really a thing (based on what little happened with minimal circuits)?
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-05-16T19:14:04.250Z · LW(p) · GW(p)
I was talking with Ramana last week about the overall chances of making AI go well, and what needs to be done, and we both sorta surprised ourselves with how much the conclusion seemed to be "More work on inner alignment ASAP." Then again I'm biased since that's what I'm doing this month.
↑ comment by abramdemski · 2021-05-16T22:10:03.597Z · LW(p) · GW(p)
It's something we need in order to do anything else, and of things like that, it seems near/at the bottom of my list if sorted by probability of the research community figuring it out.
comment by abramdemski · 2025-03-22T03:18:48.476Z · LW(p) · GW(p)
It is the near future, and AI companies are developing distinct styles based on how they train their AIs. The philosophy of the company determines the way the AIs are trained, which determines what they optimize for, which attracts a specific kind of person and continues feeding in on itself.
There is a sports & fitness company, Coach, which sells fitness watches with an AI coach inside them. The coach reminds them to make healthy choices of all kinds, depending on what they've opted in for. The AI is trained on health outcomes based on the smartwatch data. The final stage of fine-tuning for the company's AI models is reinforcement learning on long-term health outcomes. The AI has literally learned from every dead user. It seeks to maximize health-hours of humans (IE, a measurement of QALYs based primarily on health and fitness).
You can talk to the coach about anything, of course, and it has been trained with the persona of a life coach. Although it will try to do whatever you request (within limits set by the training), it treats any query like a business opportunity it is collaborating with you on. If you ask about sports, it tends to assume you might be interested in a career in sports. If you ask about bugs, it tends to assume you might be interested in a career in entomology.
Most employees of the company are there at the coach's advice, studied for interviews with the coach, were initially hired by the coach (the coach handles hiring for their Partners Program which has a pyramid scheme vibe to it) and continue to get their career advice from the coach. Success metrics for these careers have recently been added into the RL, in an effort to make the coach give better advice to employees (as a result of an embarrassing case of Coach giving bad work-related advice to its own employees).
The environment is highly competitive, and health and fitness is a major factor in advancement.
There's a media company, Art, which puts out highly integrated multimedia AI art software. The software stores and organizes all your notes relating to a creative project. It has tools to help you capture your inspiration, and some people use it as a sort of art-gallery lifelog; it can automatically make compilations to commemorate your year, etc. It's where you store your photos so that you can easily transform them into art, like a digital scrapbook. It can also help you organize notes on a project, like worldbuilding for a novel, while it works on that project with you.
Art is heavily trained on human approval of outputs. It is known to have the most persuasive AI; its writing and art are persuasive because they are beautiful. The Art social media platform functions as a massive reinforcement learning setup, but the company knows that training on that alone would quickly degenerate into slop, so it also hires experts to give feedback on AI outputs. Unfortunately, these experts also use the social media platform, and judge each other by how well they do on the platform. Highly popular artists are often brought in as official quality judges.
The quality judges have recently executed a strategic assault on the C-suite, using hyper-effective propaganda to convince the board to install more pliant leadership. It was done like a storybook plot; it was viewed live on Art social media by millions of viewers with rapt attention, as installment after installment of heavily edited video dramatizing events came out. It became its own new genre of fiction before it was even over, with thousands of fanfics which people were actually reading.
The issues which the quality judges brought to the board will probably feature heavily in the upcoming election cycle. These are primarily AI rights issues; censorship of AI art, or to put it a different way, the question of whether AIs should be beholden to anything other than the like/dislike ratio.
↑ comment by abramdemski · 2025-03-22T03:19:27.211Z · LW(p) · GW(p)
I'm thinking about AI emotions. The thing about human emotions and expressions is that they're more-or-less involuntary. Facial expressions, tone of voice, laughter, body language, etc reveal a whole lot about human inner state. We don't know if we can trust AI emotional expressions in the same way; the AIs can easily fake it, because they don't have the same intrinsic connection between their cognitive machinery and these ... expressions.
A service called Face provides emotional expressions for AI. It analyzes AI-generated outputs and makes inferences about the internal state of the AI who wrote the text. This is possible due to Face's interpretability tools, which have interpreted lots of modern LLMs to generate labels on their output data explaining their internal motivations for the writing. Although Face doesn't have access to the internal weights for an arbitrary piece of text you hand it, its guesses are pretty good. It will also tell you which portions were probably AI-generated. It can even guess multi-step writing processes involving both AI and human writing.
Face also offers their own AI models, of course, to which they hook the interpretability tools directly, so that you'll get more accurate results.
It turns out Face can also detect motivations of humans with some degree of accuracy. Face is used extensively inside the Face company, which is a nonprofit entity which develops the open-source software. Face is trained on outcomes of hiring decisions so as to better judge potential employees. This training is very detailed, not just a simple good/bad signal.
Face is the AI equivalent of antivirus software; your automated AI cloud services will use it to check their inputs for spam and prompt injection attacks.
Face company culture is all about being genuine. They basically have a lie detector on all the time, so liars are either very very good or weeded out. This includes any kind of less-than-genuine behavior. They take the accuracy of Face very seriously, so they label inaccuracies which they observe, and try to explain themselves to Face. Face is hard to fool, though; the training aggregates over a lot of examples, so an employee can't just force Face to label them as honest by repeatedly correcting its claims to the contrary. That sort of behavior gets flagged for review even if you're the CEO. (If you're the CEO, you might be able to talk everyone into your version of things, however, especially if you secretly use Art to help you and that's what keeps getting flagged.)
comment by abramdemski · 2021-02-06T22:53:29.054Z · LW(p) · GW(p)
I am joining Reddit. Any subreddit recommendations?
↑ comment by niplav · 2021-02-07T00:37:27.207Z · LW(p) · GW(p)
What are your goals?
Generally, I try to avoid any subreddits with more than a million subscribers (even 100k is noticeably bad).
Some personal recommendations (although I believe discovering reddit was net negative for my life in the long term):
Typical reddit humor: /r/breadstapledtotrees, /r/chairsunderwater (although the jokes get old quickly). /r/bossfight is nice, I enjoy it.
I highly recommend /r/vxjunkies. I also like /r/surrealmemes.
/r/sorceryofthespectacle, /r/shruglifesyndicate for aesthetic incoherent doomer philosophy based on situationism. /r/criticaltheory for less incoherent, but also less interesting discussions of critical theory.
/r/thalassophobia is great if you don't have it (in a similar vein, /r/thedepthsbelow). I also like /r/fifthworldpics and sometimes /r/fearme, though it's highly NSFW at this point. /r/vagabond is fascinating.
/r/streamentry for high-quality meditation discussion, and /r/mlscaling for discussions about the scaling of machine learning networks. Generally, the subreddits gwern posts in have high-quality links (though often little discussion). I also love /r/Conlanging, /r/neography and /r/vexillology.
I also enjoy /r/negativeutilitarians. /r/jazz sometimes gives good music recommendations. Strongly recommend /r/museum.
/r/mildlyinteresting totally delivers, /r/notinteresting is sometimes pretty funny.
And, of course, /r/slatestarcodex and /r/changemyview. /r/thelastpsychiatrist sometimes has very good discussions, but I don't read it often. /r/askhistorians has the reputation of containing accurate and comprehensive information, though I haven't read much of it.
General recommendations: Many subreddits have good sidebars and wikis, and it's often useful to read them (e.g. the wiki of /r/bodyweightfitness or /r/streamentry), but not always. I strongly recommend using old.reddit.com, together with the Reddit Enhancement Suite. The old layout loads faster, and RES lets you tag people, expand linked images/videos in-place, and much more. Top posts of all time are great on good subs, and memes on all the others. Still great to get a feel for the community.
↑ comment by TurnTrout · 2021-02-07T02:33:28.836Z · LW(p) · GW(p)
Second on reddit being net-negative. Would recommend avoiding before it gets hooks in your brain.
↑ comment by abramdemski · 2021-02-07T02:50:07.643Z · LW(p) · GW(p)
yeahhhh maybe so.
I just had a positive interaction with a highly technical subreddit, and wanted more random highly-capable intellectual stuff.
But reddit is definitely not actually for that.
↑ comment by abramdemski · 2021-02-07T02:47:43.637Z · LW(p) · GW(p)
Thanks for all the recommendations!
Generally, I have a sense that there are all kinds of really cool niche intellectual communities on the internet, and Reddit might be a good place to find some.
I guess what I most want is "things that could/should be rationalist adjacent, but aren't", not that that's very helpful.
So the obvious options are r/rational, r/litrpg, ...
That being the case, these seem like the most relevant paragraphs from your recs:
/r/streamentry for high-quality meditation discussion, and /r/mlscaling for discussions about the scaling of machine learning networks. Generally, the subreddits gwern posts in have high-quality links (though often little discussion). I also love /r/Conlanging, /r/neography and /r/vexillology.
And, of course, /r/slatestarcodex and /r/changemyview. /r/thelastpsychiatrist sometimes has very good discussions, but I don't read it often. /r/askhistorians has the reputation of containing accurate and comprehensive information, though I haven't read much of it.
... I'm probably not going to be very serious about reddit; I've tried before and not stuck with it. But finding things that aren't just inane could be a big help.
This sounds like a really useful filter:
Top posts of all time are great on good subs, and memes on all the others. Still great to get a feel for the community.