Posts

RAND report finds no effect of current LLMs on viability of bioterrorism attacks 2024-01-25T19:17:30.493Z
The View from 30,000 Feet: Preface to the Second EleutherAI Retrospective 2023-03-07T16:22:08.370Z

Comments

Comment by StellaAthena on RAND report finds no effect of current LLMs on viability of bioterrorism attacks · 2024-01-26T01:20:41.162Z · LW · GW

This is one area where I hope the USG will be able to exert coercive force to bring companies to heel. Early access evals, access to base models, and access to training data seem like no-brainers from a regulatory POV.

Comment by StellaAthena on RAND report finds no effect of current LLMs on viability of bioterrorism attacks · 2024-01-25T21:45:32.117Z · LW · GW

I think you're misrepresenting Gwern's argument. He's arguing that terrorists are not optimizing for killing the most people. He makes no claims about whether terrorists are scientifically incompetent.

Comment by StellaAthena on RAND report finds no effect of current LLMs on viability of bioterrorism attacks · 2024-01-25T21:38:23.515Z · LW · GW

Thanks! I like your title more :)

Comment by StellaAthena on Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · 2024-01-16T02:32:15.104Z · LW · GW

> It seems helpful to me if policy discussions can include phrases like "the evidence suggests that if the current ML systems were trying to deceive us, we wouldn't be able to change them not to".

I take this as evidence that TurnTrout's fears about this paper are well-grounded. This claim is not meaningfully supported by the paper, but I expect many people to repeat it as if it is supported by the paper.

Comment by StellaAthena on Integrity in AI Governance and Advocacy · 2023-12-31T13:18:50.632Z · LW · GW

We ended up talking about this in DMs, but the gist of it is:

Back in June Hoagy opened a thread in our "community research projects" channel and the work migrated there. Three of the five authors of the [eventual paper](https://arxiv.org/abs/2309.08600) chose to have EleutherAI affiliation (for any work we organize with volunteers, we tell them they're welcome to use an EleutherAI affiliation on the paper if they like) and we now have an entire channel dedicated to future work. I believe Hoagy has two separate paper ideas currently in the works and over a half dozen people working on them.

Comment by StellaAthena on Integrity in AI Governance and Advocacy · 2023-12-31T03:59:38.067Z · LW · GW

Oops. It appears that I deleted my comment (deeming it largely off-topic) right as you were replying. I'll reproduce the comment below, and then reply to your question.

> I separately had a very weird experience with them on the Long Term Future Fund where Conor Leahy applied for funding for Eleuther AI. We told him we didn't want to fund Eleuther AI since it sure mostly seemed like capabilities-research but we would be pretty interested in funding AI Alignment research by some of the same people. He then confusingly went around to a lot of people around EleutherAI and told them that "Open Phil is not interested in funding pre-paradigmatic AI Alignment research and that that is the reason why they didn't fund Eleuther AI". This was doubly confusing and misleading because Open Phil had never evaluated a grant to Eleuther AI (Asya who works at Open Phil was involved in the grant evaluation as a fund member, but nothing else), and of course the reason he cited had nothing to do with the reason we actually gave. He seems to have kept saying this for a long time even after I think someone explicitly corrected the statement to

While this anecdote is largely orthogonal to the broader piece, I randomly remembered today that this existed and wanted to mention that Open Phil has recommended a $2.6M/3-year grant to EleutherAI to pursue interpretability research. It was a really pleasant and very easy experience: Nora Belrose (head of interpretability) and I (head of everything) talked with them about some of our recent and ongoing work, such as Eliciting Latent Predictions from Transformers with the Tuned Lens, Eliciting Latent Knowledge from Quirky Language Models, and Sparse Autoencoders Find Highly Interpretable Features in Language Models, which they found very interesting, and once they knew we had shared areas of interest it was a really easy process.

I had no vibes along the lines of "oh we don't like EleutherAI" or "we don't fund pre-paradigmatic research." It was a surprise to some people at Open Phil that we had areas of overlapping interest, but we spent like half an hour clarifying our research agenda and half an hour talking about what we wanted to do next and people were already excited.

Comment by StellaAthena on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-02T22:55:29.241Z · LW · GW

> I agree that a control group is vital for good science. Nonetheless, I think that such an experiment is valuable and informative, even if it doesn't meet the high standards required by many professional science disciplines. I believe in the necessity of acting under uncertainty. Even with its flaws, this study is sufficient evidence for us to want to enact temporary regulation at the same time as we work to provide more robust evaluations.

But... this study doesn't provide evidence that LLMs increase bioweapon risk.

Comment by StellaAthena on Introducing the Center for AI Policy (& we're hiring!) · 2023-10-09T16:09:31.684Z · LW · GW

It doesn't let the government institute prior restraint on speech.

Comment by StellaAthena on Introducing the Center for AI Policy (& we're hiring!) · 2023-09-08T20:37:19.047Z · LW · GW

> So far, I'm confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn't true, we'll either rethink our proposals or remove this claim from our advocacy efforts.

It seems to me like you've received this feedback already in this very thread. The fact that you're going to edit the claim to basically say "this doesn't affect most people because most people don't work on LLMs" completely dodges the actual issue here, which is that there's a large non-profit and independent open source LLM community that this would heavily impact.

I applaud your honesty in admitting that one approach you might take is to "remove this claim from our advocacy efforts," but am quite sad to see that you don't seem to care about limiting the impact of your regulation to potentially dangerous models.

Comment by StellaAthena on Introducing the Center for AI Policy (& we're hiring!) · 2023-09-08T20:28:35.309Z · LW · GW
Comment by StellaAthena on Introducing the Center for AI Policy (& we're hiring!) · 2023-08-30T05:52:34.518Z · LW · GW

Nora didn't say that this proposal is harmful. Nora said that if Zach's explanation for the disconnect between their rhetoric and their stated policy goals is correct (namely that they don't really know what they're talking about) then their existence is likely net-harmful.

That said, yes, requiring everyone who wants to finetune LLaMA 2 to get a license would be absurd and harmful. 1a3orn and gallabyres articulate some reasons why in this thread.

Another reason is that it's impossible to enforce, and passing laws or regulations and then not enforcing them is really bad for credibility.

Another reason is that the history of AI is a history of people ignoring laws and ethics so long as doing so makes them money and they can afford to pay the fines. Unless this regulation comes with fines so harsh that they remove all possibility of making money off of models, OpenAI et al. won't be getting licenses. They'll just pay the fines, while small-scale and indie devs (whom the OP allegedly hopes specifically not to impact) grind their work to a halt and wait for the government to tell them it's okay for them to continue.

Also, such a regulation seems like it would be illegal in the US. While the government does have wide latitude to regulate commercial activities that impact multiple states, this is rather specifically a proposal that would regulate all activity (even models that never get released!). I'm unaware of any precedent for such an action; can you name one?

Comment by StellaAthena on Introducing the Center for AI Policy (& we're hiring!) · 2023-08-30T03:40:32.547Z · LW · GW

> CAIP is also advised by experts from other organizations and is supported by many volunteers.

Who are the experts that advise you? Are claims like "our proposals will not impede the vast majority of AI developers" vetted by the developers you're looking to avoid impacting?

Comment by StellaAthena on Specific Arguments against open source LLMs? · 2023-07-30T22:04:34.128Z · LW · GW

It’s always interesting to see who has legitimacy in the eyes of mainstream media. The “other companies” mentioned are EleutherAI and Open Future, both of whom co-authored the letter, and LAION who signed it. All three orgs are major players in the open source AI space, and EAI & LAION are arguably bigger than GitHub and CC given that this is specifically about the impact of the EU AI Act on open source large scale AI R&D. Of course, MSN’s target audience hasn’t heard of EleutherAI or LAION.

Note that other orgs have also done blog posts on this topic: EleutherAI (co-written by me), Hugging Face, Creative Commons, Open Future.

Comment by StellaAthena on Manifold Predicted the AI Extinction Statement and CAIS Wanted it Deleted · 2023-06-12T18:00:51.278Z · LW · GW

It's extremely difficult to create a fraudulent company and get it listed on the NYSE. Additionally, the Exchange can and does stop trading on both individual stocks and the exchange as a whole, though due to the downstream effects on consumer confidence this is only done rarely.

I don't know what lessons one should learn from the stock market regarding MM, but I don't think we should rush to conclude MM shouldn't intervene or shouldn't be blamed for not intervening.

Comment by StellaAthena on Terry Tao is hosting an "AI to Assist Mathematical Reasoning" workshop · 2023-06-06T22:23:40.366Z · LW · GW

I don’t understand the community obsession with Tao and recruiting him to work on alignment. This is a thing I hear about multiple times a year with no explanation of why it would be desirable other than “he’s famous for being very smart.”

I also don’t see why you’d think there’d be an opportunity to do this… it’s an online event, which heavily limits the ability to corner him in the hallway. It’s not even clear to me that you’d have an opportunity to speak with him… he’s moderating several discussions and panels, but any questions submitted to said events would go to the people actually in the discussions, not the moderator.

Can you elaborate on what you’re actually thinking this would look like?

Comment by StellaAthena on [deleted post] 2023-05-09T06:30:51.670Z

Red teaming has always been a legitimate academic thing? I don’t know what background you’re coming from but… you’re very far off.

But yes, the event organizers will be writing a paper about it and publishing the data (after it’s been anonymized).

Comment by StellaAthena on [deleted post] 2023-05-05T00:21:01.779Z

What deployed LLM system does Tesla make that you think should be evaluated alongside ChatGPT, Bard, etc?

Comment by StellaAthena on [deleted post] 2023-05-04T23:48:59.006Z

Hi, I’m helping support the event. I think that some mistranslation happened by a non-AI person. The event is about having humans get together and do prompt hacking and similar on a variety of models side-by-side. ScaleAI built the app that’s orchestrating the routing of info, model querying, and human interaction. Scale’s platform isn’t doing the evaluation itself. That’s being done by users on-site and then by ML and security researchers analyzing the data after the fact.

Comment by StellaAthena on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-22T00:14:00.779Z · LW · GW

I think there's a mistake here which kind of invalidates the whole post. Ice cream is exactly the kind of thing we’ve been trained to like. Liking ice cream is very much the correct response.

Everything outside the training distribution has some value assigned to it. Merely the fact that we like ice cream isn’t evidence that something’s gone wrong.

Comment by StellaAthena on The Waluigi Effect (mega-post) · 2023-03-08T13:32:29.753Z · LW · GW

I agree completely. This is a plausible explanation, but it’s one of many plausible explanations and should not be put forward as a fact without evidence. Unfortunately, said evidence is impossible to obtain due to OpenAI’s policies regarding access to their models. When powerful RLHF models begin to be openly released, people can start testing theories like this meaningfully.

Comment by StellaAthena on Basic facts about language models during training · 2023-02-22T15:10:30.992Z · LW · GW

Linear warm-up over the first 10% of training, then cosine decay to a minimum of one-tenth the peak LR, with the minimum set to occur at the end of training (300B tokens). Peak LRs vary by model but are roughly consistent with GPT-3 and OPT values. You can find all the config details on GitHub. The divergence from mainstream approaches most relevant to this conversation is that we use a constant batch size (2M tokens) throughout scaling. Prior work uses batch sizes up to 10x smaller for the smallest models, but we find that we can train small models with large batches without any problems. This enables us to achieve a substantial wall-clock speed-up for small models by throwing more GPUs at them. We continue to use this batch size for the 11B model for consistency, although the standard progression of batch sizes would suggest 3M or 4M by that point.
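
For concreteness, here's a minimal sketch of that schedule as a function of step (an illustrative helper written for this comment, not the actual GPT-NeoX config code):

```python
import math

def pythia_lr(step: int, total_steps: int, peak_lr: float,
              warmup_frac: float = 0.10, min_ratio: float = 0.10) -> float:
    """Linear warmup over the first 10% of steps, then cosine decay
    down to one-tenth of the peak LR at the end of training."""
    warmup_steps = int(warmup_frac * total_steps)
    min_lr = min_ratio * peak_lr
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```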

Checkpoint 20 and 40 are at 20k and 40k iterations respectively, and the entire training runs for 143k iterations. So they occur relatively shortly after the LR peaks, but don't coincide with anything I know to be particularly special.

Comment by StellaAthena on Basic facts about language models during training · 2023-02-21T17:38:36.251Z · LW · GW

This is really exciting work to see, and exactly the kind of thing I was hoping people would do when designing the Pythia model suite. It looks like you're experimenting with the 5 smallest models, but haven't done analysis on the 2.8B, 6.9B, or 12B models. Is that something you're planning on adding, or no?

I am really very surprised that the distributions don't seem to match any standard parameterized distribution. I was fully ready to say "okay, let's retrain some of the smaller Pythia models initialized using the distribution you think the weights come from" but apparently we can't do that easily. I suppose we can use an MCMC sampler? In general, it seems like a natural follow-up to the contents of this post is to change the way we initialize things in models, retrain them, and see what happens (esp. with the loss curve). If that's something you'd like to collaborate with EleutherAI about, I would be more than happy to arrange something :)
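
For example, here's a minimal sketch of a non-parametric version (my own illustrative helper, which just bootstrap-resamples observed weights rather than running a full MCMC):

```python
import torch

def empirical_init_(param: torch.Tensor, observed: torch.Tensor) -> None:
    """Re-initialize `param` in place by sampling i.i.d. (with replacement)
    from the empirical distribution of weights in a trained model,
    instead of from a standard parametric distribution."""
    idx = torch.randint(0, observed.numel(), (param.numel(),))
    param.data.copy_(observed.flatten()[idx].reshape(param.shape))
```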

In general, the reliability of the things you're seeing across model scales is really cool. I agree that it seems to refute some of the theoretical assumptions of the NTK literature, but I wonder if perhaps it's consistent with the Tensor Programs work by Greg Yang et al. that led to muP.

To clarify what's going on with the Pythia models:

  1. This work appears to be using the initial model release, which had an inconsistent naming scheme. Some models were named based on total parameters, while others were named based on the number of learnable parameters. The former is what models are typically named based on, but the latter is what people put on the x-axis of scaling-laws plots. This is a nomenclature change only, with no impact on results.
  2. Shortly after release, we renamed the models to be consistently named using the total number of parameters. The models studied in this post are currently named 70M, 160M, 410M, 1B, and 1.4B.
  3. When writing the paper for these models, we discovered a handful of inconsistencies in the suite's hyperparameters. Specifically, the batch size and some all-reduce optimizations were inconsistent across training. We expect this to have no impact on the OP or 90% of experiments using the suite. That said, if we're going to spend all this compute to design a suite for controlled scientific experiments, it should control for as many factors as possible. The current models will remain public and people are encouraged to compare results across them to further validate that various properties don't impact the behavior that they're finding.
Comment by StellaAthena on Anomalous tokens reveal the original identities of Instruct models · 2023-02-09T04:34:29.293Z · LW · GW

This is excellent work, though I want to generically recommend caution when making assumptions about the success of such attacks based only on blackbox evaluations. Thorough analysis of false positive and false negative rates with ground-truth access (ideally in an adversarially developed setting) is essential for validation. [Sidebar: this reminds me that I really need to write up my analysis in the EleutherAI discord showing why prompt extraction attacks can be untrustworthy]

That said, this is really excellent work and I agree it looks quite promising.

Comment by StellaAthena on Basic Facts about Language Model Internals · 2023-01-04T21:03:59.096Z · LW · GW

Do you have a reference to the work you’re talking about? I’m doing some stuff involving fitting curves to activation tails currently.

Comment by StellaAthena on Basic Facts about Language Model Internals · 2023-01-04T17:54:11.132Z · LW · GW

This is very interesting. The OP doesn’t contain any specific evidence of Gaussianness, so it would be helpful if they could provide an elaboration of what evidence led them to conclude these are Gaussian.

Comment by StellaAthena on Basic Facts about Language Model Internals · 2023-01-04T15:13:10.667Z · LW · GW

I’m not sure when you developed this work, but the LLM.int8 paper identifies outliers as an essential factor in achieving performance for models larger than 2.7B parameters (see Fig. 1 and Fig. 3 especially). There’s also some follow-up work here and here. Very curiously, the GLM-130B paper reports that they don’t see outlier features at all, nor any negative effects from their absence.

I’ve spoken about this a bit with Tim (the LLM.int8 lead author) and with some people in EleutherAI, and I’m wondering if there’s some kind of explicit or implicit regularizing effect in the GLM model that prevents it from learning outlier features. If this is the case, one might expect to find different patterns in outliers in models with sufficiently different architectures, perhaps GPT-2 vs Pythia vs GLM vs T5.
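
If I'm remembering the LLM.int8 criteria correctly (magnitude of at least 6 on a non-trivial fraction of tokens), the check for a new model is roughly something like this sketch:

```python
import torch

def outlier_dims(hidden_states: torch.Tensor,
                 magnitude: float = 6.0, token_frac: float = 0.06) -> torch.Tensor:
    """hidden_states: (n_tokens, d_model) activations from one layer.
    Returns the feature dimensions that exceed `magnitude` on at least
    `token_frac` of tokens -- the rough signature of an outlier feature."""
    frac_large = (hidden_states.abs() > magnitude).float().mean(dim=0)
    return torch.nonzero(frac_large >= token_frac).flatten()
```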

Comment by StellaAthena on Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers? · 2023-01-01T16:50:27.322Z · LW · GW

I think that the answer is no, and that this reflects a common mental barrier when dealing with gradient descent. You would like different experts to specialize in different things in a human-interpretable way, but Adam doesn’t care what you say you want. Adam only cares about what you actually write down in the loss function.

Generally, a useful check when dealing with arguments like this is to ask yourself whether your justification for why something should happen would also justify something that is known not to happen. If so, it’s probably flawed.

In this case it does: as far as I can tell, your justification applies equally well to multi-headed attention (as an improvement over single-headed attention). While there have been some attempts to use MHA as an interpretability-magnifying technique, in practice there hasn’t really been much success. Whatever story you tell about why this should work with MoE needs to distinguish MoE from MHA.

> I think this question matters because it doesn't seem implausible to me that MoE models could be at par with dense models in terms of capabilities.

There are two regimes when talking about scaling LLMs, and I think it’s very important to keep them separate when talking about things like this. The literature on scaling laws was written by researchers at a very small number of companies in a very important and non-standard situation: their analyses are predicated on the assumption that using twice as many GPUs for half as long doesn’t change costs. It’s hard to overstate how few people fall into this regime.

I run EleutherAI, the non-profit org that has trained more and larger multi-billion parameter LLMs than any other non-profit in the world, and have worked on three different models that held the title “largest publicly available GPT-3-like LLM in the world.” I have access to thousands of A100 GPUs to train models if I really want to, and recently won a USG grant for 6 million V100 hours. I generally do not operate in this regime.

The regime that almost everyone finds themselves in is one where one day the VRAM runs out. Maybe it’s at a pair of 3090 Tis, maybe it’s at a v3-8 TPU, maybe it’s at a DGX machine. But one day you lose the ability to halve your runtime by doubling the amount of VRAM you are using without impacting costs.

In this “VRAM-constrained regime,” MoE models (trained from scratch) are nowhere near competitive with dense LLMs. While there has been some success at turning dense models into MoE models with less performance loss, that work isn’t really relevant to your hypothesis without a substantial amount of additional intellectual work. MoE models are egregiously inefficient in terms of performance-per-VRAM, but compensate by being more efficient in terms of performance-per-FLOP.

How egregious, exactly? Well, the first MoE paper I grabbed claims that their 1.1T parameter MoE model performs similarly to a 6.7B parameter dense model and that their 207B parameter MoE model performs similarly to a 1.3B parameter dense model. To put these numbers in perspective: the (currently unverified) claims NVIDIA is making about quantization on their H100 GPUs would enable you to fit a 640B parameter model on an 8xH100 (80GB) device. So you can use an entire 8xH100 machine to fit a MoE model, or you can use a single 3090 Ti and get better performance (using LLM.int8).
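
The back-of-the-envelope version of that comparison (my numbers, assuming roughly 1 byte per parameter at int8 and ignoring activations and other overhead):

```python
bytes_per_param = 1          # int8: ~1 byte per parameter
h100_node_gb = 8 * 80        # 8xH100 (80GB) machine -> 640 GB
rtx_3090ti_gb = 24           # a single 3090 Ti

# Weights-only capacity, in billions of parameters:
print(h100_node_gb / bytes_per_param)   # ~640B params: a node-filling MoE
print(rtx_3090ti_gb / bytes_per_param)  # ~24B params: a dense model on one card

# Per the cited paper's equivalences (1.1T MoE ~ 6.7B dense, 207B MoE ~ 1.3B dense),
# a ~640B MoE filling the whole node lands below a 6.7B dense model in quality,
# which the single 3090 Ti can already hold several times over.
```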

Edit: in a reply to the other answer you say

> I'm not saying that MoE are more interpretable in general. I'm saying that for some tasks, the high level view of "which expert is active when and where" may be enough to get a good sense of what is going on.

I had misread your claim, but I think the intent of my response is still valid. Even with this more specific claim, people have hoped the same would be true of MHA and have come up (largely, albeit not entirely) empty. There’s still a significant burden on you to show why your position is better than the same position with the word “MoE” replaced with “MHA.”

Comment by StellaAthena on woke offline, anti-woke online · 2023-01-01T16:04:33.034Z · LW · GW

What sources do you have for your claim that “large groups” of people believe this?

Comment by StellaAthena on Extracting and Evaluating Causal Direction in LLMs' Activations · 2022-12-28T07:15:28.193Z · LW · GW

Hi! I recently trained a suite of models ranging from 19M to 13B parameters with the goal of promoting research on LLM interpretability. I think it would be awesome to try out these experiments on the model suite and look at how the results change as the models scale. If your code used the HF transformers library it should work more or less out of the box with my new model suite.
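
For example, loading one of the models looks roughly like this (an illustrative sketch using the current Hugging Face repo IDs rather than the original release names):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Final checkpoint of one of the smaller models in the suite.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

# Intermediate training checkpoints are also exposed as repo revisions,
# named (if I recall correctly) like revision="step40000".
```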

You can find out more here: https://twitter.com/AiEleuther/status/1603755161893085184?s=20&t=6xkBsYckPcNZEYG8cDD6Ag

Comment by StellaAthena on Is AI Progress Impossible To Predict? · 2022-05-17T02:25:16.521Z · LW · GW

Individual MMLU tasks are extremely noisy. They’re so noisy that the paper actually specifically recommends that you don’t draw conclusions from performance on individual tasks and instead look at four high-level topical categories. The individual tasks also vary enormously in their variance: some of them are pretty easy for a college-educated adult, while others have genuine experts scoring less than 80%.

This is compounded by the fact that the sample sizes vary wildly: many of the tasks have around 100 questions, while at the other extreme there is a task with 1,534 questions. The aggregated topics, however, have the same number of questions per topic, because the benchmark was explicitly designed for analysis along those lines.
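
To give a rough sense of the noise floor (my own back-of-the-envelope, treating a task score as a binomial proportion):

```python
import math

def accuracy_standard_error(acc: float, n_questions: int) -> float:
    """Standard error of a task accuracy treated as a binomial proportion."""
    return math.sqrt(acc * (1 - acc) / n_questions)

# A model scoring around 50% on a 100-question task vs. a 1,534-question task:
print(accuracy_standard_error(0.5, 100))   # ~0.050 -> roughly +/- 5 points
print(accuracy_standard_error(0.5, 1534))  # ~0.013 -> roughly +/- 1.3 points
```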

I don’t know the extent to which these issues plague the other evaluations, but I think more care needs to be taken before drawing conclusions with highly noisy data.

Comment by StellaAthena on Why hasn't deep learning generated significant economic value yet? · 2022-05-01T14:01:19.444Z · LW · GW

I agree with what Gwern said about things being behind-the-scenes, but it's also worth noting that there are many impactful consumer technologies that use DL. In fact, some of the things that you don't think exist actually do exist!

Examples of other DL-powered consumer applications

Comment by StellaAthena on Why hasn't deep learning generated significant economic value yet? · 2022-05-01T13:46:52.310Z · LW · GW
Comment by StellaAthena on Challenges to Yudkowsky's Pronoun Reform Proposal · 2022-03-20T17:20:31.085Z · LW · GW

Interesting. Thank you.

To be clear, you now understand that the content of the sentence "I am a transgender man" is more or less "contrary to popular opinion, I am in fact a man and not a woman"? And that pronouns only even come up because they are one of the many ways people convey assessments of gender?

Comment by StellaAthena on Challenges to Yudkowsky's Pronoun Reform Proposal · 2022-03-20T17:13:42.982Z · LW · GW

I'm not even going to pretend to address the first half of your comment. You're making extreme jumps of logic that are in no way justified by the conversation.

> So that is the strong-request/demand that it's reasonable for people to get from "society".  (If people in power were unambiguously saying "In order to be polite and not be called bad, you must think of these people in a certain way", then I think there would be revolts.)  If someone hasn't become emotionally close friends with any trans people, I'd say it's not too surprising if they haven't picked up on something subtler than "socially enforced rules".

The content of the sentence "I am a transgender man" is more or less "contrary to popular opinion, I am in fact a man and not a woman." This has nothing to do with socially enforced rules and everything to do with the basic meaning of language. I did not realize that it was common for people to not know what the word "transgender" means.

Comment by StellaAthena on Challenges to Yudkowsky's Pronoun Reform Proposal · 2022-03-16T18:57:53.219Z · LW · GW

> As a cis person who has interacted occasionally with trans people for the past ten years, it literally never occurred to me until last year that what trans people were asking me to do was actually reconsider my impression of their gender! I sincerely thought they were just asking me to memorize a different word to call them. I will at least try out a "reconsidering" process the next time I regularly interact with a trans person IRL and see whether it works. (I have also never read about what kind of "reconsidering" processes work for people, but I have some guesses for how I could approach it.)

Can you elaborate on this? I am extremely surprised by this attitude and want to learn how to prevent similar miscommunications in the future.

Comment by StellaAthena on [deleted post] 2022-02-20T08:03:08.120Z

> To do this, we'll start by offering alignment as a service for more limited AIs. Value extrapolation scales down as well as up: companies value algorithms that won't immediately misbehave in new situations, algorithms that will become conservative and ask for guidance when facing ambiguity.

What are examples of AIs you think you can currently align and how much (order of magnitude, say) would it cost to have you align one for me? If I have a 20B parameter language model, can you align it for me?

Comment by StellaAthena on Compute Trends Across Three eras of Machine Learning · 2022-02-17T18:09:17.149Z · LW · GW

The distinction between the "large scale era" and the rest of DL looks rather suspicious to me. You don't give a meaningful defense of which points you label "large scale era" in your plot, and largely it looks like you took a handful of the most expensive models from each year and gave them a different label.

On what basis can you conclude that Turing NLG, GPT-J, GShard, and Switch Transformers aren't part of the "large scale era"? The fact that they weren't literally the largest models trained that year?

There's also a lot of research that didn't make your analysis, including work explicitly geared towards smaller models. What exclusion criteria did you use? I feel like if I was to perform the same analysis with a slightly different sample of papers I could come to wildly divergent conclusions.

Comment by StellaAthena on Visible Thoughts Project and Bounty Announcement · 2021-12-02T05:52:39.103Z · LW · GW

> 1:  I expect that it's easier for authors to write longer thoughtful things that make sense;

I pretty strongly disagree. The key thing I think you are missing here is parallelism: you don't want one person to write you 100 different 600-page stories, you want one person to organize 100 people to write you one 600-page story each. And it's a lot easier to scale if you set the barrier to entry lower. There are many more people who can write 60-page stories than 600-page stories, and it's easier to find 1,000 people to write 60 pages each than it is to find 100 people to write 600 pages each. There's also much less risk on both your side and theirs. If someone drops out halfway through writing, you lose 30 pages, not 300.

Based on this comment:

> I state: we'd be happy, nay, ecstatic, to get nice coherent complete shorter runs, thereby disproving my concern that short runs won't be possible to complete, and to pay for them proportionally.

I'm now under the impression that you'd be willing to pay out the $20k for 10 runs of 100 steps each (subject to reasonable quality control), and bringing that about was my main goal in commenting.

The other major worry I have about this pitch is the experimental design. I'm still happy you're doing this, but it doesn't seem like the best-designed project to my mind. Briefly, my concerns are:

  1. This is a very topically specific ask of unclear generalization. I would prefer a more generic ask that is not directly connected to D&D.
  2. In my experience training large language models, the number of examples is more important than the length of examples. Training on 100 shorter sequences is better than training on 10 longer sequences if the total length is the same. In particular, I think "You would also expect scarier systems to have an easier time learning without overnarrowing from 100 big examples instead of 10,000 small examples." is not clearly true and very plausibly false.
  3. Using this dataset in a meaningful fashion requires making a priori unrelated breakthroughs, making it overly inaccessible. I think that your comment "I don't want to freeze into the dataset the weird limitations of our current technology, and make it be useful only for training dungeons that are weird the same way 2021 dungeons are weird," is thinking about this the wrong way. The goal should be to maximize the time that we can effectively use this dataset, not be content with the fact that one day it will be useful.
  4. This is a pilot for the real thing you're after, but the "pilot" is a multi-year million-dollar effort. That doesn't seem like a very well designed pilot to me.
Comment by StellaAthena on Visible Thoughts Project and Bounty Announcement · 2021-11-30T15:32:03.683Z · LW · GW

Hi! Co-author of the linked “exploration” here. I have some reservations about the exact request (left as a separate comment) but I’m very excited about this idea in general. I’ve been advocating for direct spending on AI research as a place with a huge ROI for alignment research for a while and it’s very exciting to see this happening.

I don’t have the time (or aptitude) to produce a really high quality dataset, but I (and EleutherAI in general) would be happy to help with training the models if that’s desired. We’d be happy to consult on model design or training set-up, or to simply train the models for you all. No compensation necessary, just excited to contribute to worthwhile alignment research.

Comment by StellaAthena on Visible Thoughts Project and Bounty Announcement · 2021-11-30T12:59:44.394Z · LW · GW

What is the purpose of requesting such extremely long submissions? This comes out to ~600 pages of text per submission, which is extremely far beyond anything that current technology could leverage. Current NLP systems are unable to reason about more than 2048 tokens at a time, and handle longer inputs by splitting them up. Even if we assume that great strides are made in long-range attention over the next year or two, it does not seem plausible to me that SOTA systems in the near future will be able to use this dataset to its fullest. There's inherent value in a more diverse set of scenarios, given the strong propensity of language models to overfit on repeated data. While this isn't strictly speaking repeated data, I am under the strong impression that having more diverse short scripts will train a much better model than less diverse long scripts, assuming that the short scripts are still at or beyond the maximum context length a language model can handle.
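
For a rough sense of scale (my own arithmetic, assuming about 500 words per single-spaced page and about 1.3 tokens per word):

```python
pages = 600
words_per_page = 500    # assumption: single-spaced prose
tokens_per_word = 1.3   # assumption: typical BPE tokenizer ratio

total_tokens = pages * words_per_page * tokens_per_word
print(total_tokens)          # ~390,000 tokens per submission
print(total_tokens / 2048)   # ~190 separate 2048-token context windows
```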

For the same reasons it is challenging to leverage, I think that this will also be very challenging to produce. I think that changing the request to 100 different 6-page (10-step) or 10 different 60-page (100-step) stories would be a) much easier to produce and b) much more likely to actually help train an AI. It also allows you to pare down the per-submission payouts, assuaging some concerns in the comments about the winner-take-all and adversarial nature of the competition. If you offer $20 per 10-step story for 1,000 stories, it greatly reduces the chances that someone will end up spending a ton of effort but be unable to get it in on time for the reward.

To put the length of this in perspective, a feature-length movie script is typically around 100-130 pages. The ask here is to write 1-2 novels, or 5-6 movie scripts. That’s a massive amount of writing, and not something anyone can complete quickly.

Comment by StellaAthena on Visible Thoughts Project and Bounty Announcement · 2021-11-30T12:44:12.734Z · LW · GW

> Also, I'm unclear on what constitutes a "run"... roughly how long does the text have to be, in words, to have a chance at getting $20,000?

Using the stated length estimates per section, a single run would constitute approximately 600 pages of single spaced text. This is a lot of writing.

Comment by StellaAthena on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-28T17:53:06.410Z · LW · GW

Interesting… I was busy and wasn’t able to watch the workshop. That’s good to know, thanks!

Comment by StellaAthena on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-24T18:43:00.739Z · LW · GW

For Sanh et al. (2021), we were able to negotiate access to preliminary numbers from the BIG Bench project and run the T0 models on it. However the authors of Sanh et al. and the authors of BIG Bench are different groups of people.

Comment by StellaAthena on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-24T18:41:11.425Z · LW · GW

What makes you say BIG Bench is a joint Google / OpenAI project? I'm a contributor to it and have seen no evidence of that.

Comment by StellaAthena on What exactly is GPT-3's base objective? · 2021-11-24T03:45:06.539Z · LW · GW

I think that (4) reflects a confusion about what people mean by "the GPT-3 training data." If someone said "there are strings of words found in the GPT-3 training data that GPT-3 never saw" I would tell them that they don't know what the words in that sentence mean. When an AI researcher speaks of "the GPT-3 training data" they are talking about the data that GPT-3 actually saw. There's data that OpenAI collected which GPT-3 didn't see, but that's not what the words "the GPT-3 training data" refer to.

Comment by StellaAthena on What exactly is GPT-3's base objective? · 2021-11-12T13:12:38.801Z · LW · GW

> Or is it "Predict the next word, supposing what you are reading is a random-with-the-following-weights sample from dataset D?" [where D is the dataset used to train GPT-3]

This is the correct answer.

> The problem with these last two answers is that they make it undefined how well GPT-3 performs on the base objective on any prompt that wasn't in D, which then rules out psuedo-alignment by definition.

This is correct, but non-problematic in my mind. If data wasn’t in the training dataset, then yes there is no fact of the matter as to what training signal GPT-3 received when training on it. We can talk about what training signal GPT-3 counterfactually would have received had it been trained on this data, but there is no answer to the question in the actual world.

Comment by StellaAthena on Discussion with Eliezer Yudkowsky on AGI interventions · 2021-11-11T20:17:24.042Z · LW · GW

My thinking is that prosaic alignment can also apply to non-super intelligent systems. If multimodal GPT-17 + RL = superintelligence, then whatever techniques are involved with aligning that system would probably apply to multimodal GPT-3 + RL, despite not being superintelligence. Superintelligence is not a prerequisite for being alignable.

Comment by StellaAthena on Discussion with Eliezer Yudkowsky on AGI interventions · 2021-11-11T19:21:05.518Z · LW · GW

If superintelligence is approximately multimodal GPT-17 plus reinforcement learning, then understanding how GPT-3-scale algorithms function is exceptionally important to understanding super-intelligence.

Also, if superintelligence doesn’t happen then prosaic alignment is the only kind of alignment.

Comment by StellaAthena on Discussion with Eliezer Yudkowsky on AGI interventions · 2021-11-11T19:18:11.723Z · LW · GW

Strong upvote.

My original exposure to LW drove me away in large part because of the issues you describe. I would also add that (at least circa 2010) you needed to have a near-deistic belief in the anti-messianic emergence of some AGI so powerful that it can barely be described in terms of human notions of “intelligence.”

Comment by StellaAthena on [deleted post] 2021-11-11T14:50:40.576Z

Yes, new information absolutely exists. Thinking about new information in some kind of absolute sense (“has anyone else ever had this thought?”) is the wrong approach in my mind. What we are really interested in is new information relative to an established set of knowledge. Information theory tells us that there’s a maximum amount of information that can be encoded in k bits, so (at least as long as our system is significantly smaller than the universe) we can find information that’s not encoded in the existing system.

Whether GPT-3 is likely to succeed at doing this is a statistical and empirical question, but at a minimum the answer to the title question is a resounding “yes.”