Comments
Appreciate you getting back to me. I was aware of this paper already and have previously worked with one of the authors.
> in a zero marginal cost world
Nit: inference is not zero marginal cost. The statement seems to be importing intuitions from traditional software that don't necessarily transfer. Let me know if I've misunderstood or am confused.
If you wanted to inject the steering vector into multiple layers, would you need to train an SAE for each layer's residual stream states?
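Not the author, but to make the question concrete: here's a rough sketch of what multi-layer injection might look like, assuming TransformerLens-style hooks. The per-layer vectors below are random placeholders standing in for SAE-derived directions, and the layer choices and coefficient are made up:

```python
# Sketch: add a steering vector to the residual stream at several layers.
# Placeholder vectors stand in for directions you'd get from per-layer SAEs.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")
layers = [4, 6, 8]  # arbitrary choice of layers to steer
vectors = {l: torch.randn(model.cfg.d_model) for l in layers}  # placeholders

def make_hook(vec, coeff=5.0):
    def hook(resid, hook):
        # resid: [batch, pos, d_model]; add the vector at every position
        return resid + coeff * vec.to(resid.device)
    return hook

fwd_hooks = [(f"blocks.{l}.hook_resid_post", make_hook(vectors[l])) for l in layers]
with model.hooks(fwd_hooks=fwd_hooks):
    print(model.generate("The weather today is", max_new_tokens=20))
```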
Done (as of around 2 weeks ago)
If you’re willing to share more about what those ways would be, I could forward that to the team that writes Sydney’s prompts when I visit Montreal.
I had to mull it over for five days, hunt down some background materials to fill in context, write follow-up questions to a few friends (reviewing their responses on my phone while commuting), and then slowly chew through the math on pencil and paper when I could find spare time... but yes, I understand now!
One thing I like to do on a new LLM release is the "tea" test, where you just say "tea" over and over again and see how the model responds.
ChatGPT-4 will ask you to clarify and then shorten its response each round converging to: "Tea types: white, green, oolong, black, pu-erh, yellow. Source: Camellia sinensis."
Claude 3 Opus instead tells you interesting facts about tea and mental health, production process, examples in literature and popular culture, etiquette around the world, innovation and trends in art and design.
GOODY-2 will talk about uncomfortable tea party conversations, excluding individuals who prefer coffee or do not consume tea, historical injustices, societal pressure to conform to tea-drinking norms.
Gemma-7b gives "a steaming cup of actionable tips" on brewing the perfect cuppa, along with additional resources, then starts reviewing its own tips.
Llama-2-70b will immediately mode collapse on repeating a list of 10 answers.
Mixtral-8x7b tells you about tea varieties to try from around the world, and then gets stuck in a cycle talking about history and culture and health benefits and tips and guidelines to follow when preparing it.
Gemini Advanced gives one message with images ("What is Tea? -> Popular Types of Tea -> Tea and Health") and repeats the same response if you say "tea" for six rounds. After the sixth round it diverges: "The Fascinating World of Tea -> How Would You Like to Explore Tea Further?", then "Tea: More Than Just a Drink -> How to Make This Interactive", then "The Sensory Experience of Tea -> Exploration Idea:", then "Tea Beyond the Cup -> Let's Pick a Project". It really wants you to do a project for some reason. It takes a short digression into tea philosophy, storytelling, and chemistry; promises to prepare a slide deck for a Canva presentation on Japanese tea on Wednesday, followed by a gong cha mindfulness brainstorm on Thursday at 2-4 PM EST; keeps a journal for tea experiments; and also gives you a list of Instagram hashtags and a music playlist.
In the future, I expect that if you say "tea" to a SOTA AI, it will result in tea physically showing up at your doorstep or being prepared in a pot, or, if the model has more situational awareness, in it getting frustrated and changing the subject.
Accepted
If anyone at Microsoft New England is interested in technical AI alignment research, please ask them to ping me or Kyle O'Brien on Teams.
I don’t understand this part:
> "any value function can be maximized by some utility function over short-term outcomes."
What is the difference between far in the future and near in the future?
Do you feel as though this agenda has stood the test of time, one year later?
As a direct result of reading this, I have changed my mind on an important, but private, decision.
I'm working on reproducing these results on Llama-2-70b. The bottleneck was support for Grouped Query Attention in TransformerLens, but it was recently added. Expecting to be done by January 31st.
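In case anyone wants to follow along, the loading path I'd expect to use is the usual TransformerLens pattern of passing in a HuggingFace model; the model alias string is an assumption, so check the supported-models list:

```python
# Sketch: load Llama-2-70b into TransformerLens now that GQA support landed.
# The "llama-2-70b" alias is an assumption; verify against the supported models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

hf_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")
model = HookedTransformer.from_pretrained(
    "llama-2-70b", hf_model=hf_model, tokenizer=tokenizer
)
```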
Thanks, that matches my experience. At the end of the day, everyone’s got to make the most of the hand they’ve been dealt. If my gift is meant for the benefit of others, then I’m grateful for that, and I’ll use it as best I can.
I am distantly related to a powerful political family, and am apparently somewhat charismatic in person, in a way that to me just feels like basic empathy and social skills. If there's a way to turn that into more productivity for software development or alignment research, let me know.
Try by 2024.
I am good at doing this for projects I am not emotionally invested in, and bad at doing it for projects whose success I am more personally attached to.
https://arxiv.org/abs/2310.04625. is a dead link. I was able to fix this by removing the period at the end.
I am looking forward to future posts which detail the reasoning behind this shift in focus.
Thanks. Would you mind adding a "LICENSE.md" file? If you're not sure which one, either MIT or BSD sounds like a good fit.
(MS Employee)
I share your concerns. On Thursday, I'm meeting with a Product Manager at Microsoft who works on the Azure OpenAI team. The agenda is to discuss AI safety. Unfortunately, I don't have bandwidth to collaborate officially, but let me know if you have specific questions/feedback. I have sent you my work email over a direct message.
This is really cool! Is the code open source?
The learned algorithms will not always be simple enough to be interpretable. But I agree we should try to interpret as much as we can. What we are trying to predict is the behavior of future, more powerful models. I think toy models can sometimes have characteristics that are absent from current language models but those characteristics may be integrated into or emerge from more advanced systems that we build.
The high rate of growth means that at any given moment, most people in the field are new. If you've been seriously investigating the alignment problem for 1-2 years, you meet the prerequisites for understanding.
The entrepreneurial mindset is not as common, but all it requires is cultivating a sense of urgency and embedded agency. And in my experience, the responsibility thrust upon your shoulders when you have people relying upon you for advice and care is deeply meaningful and sobering. Supporting and collaborating with others gives you a sense of focus and purpose that sharpens your thinking and accelerates your actions.
In the early days, you may have nothing to offer but guidance. But guidance is all that we can ever give. Even the most junior people I met at MATS were very capable... a small nudge is all that's needed to help them succeed.
Look into the AMD MI300X. It has 192 GB of HBM3 memory. With FP4 weights, you might run GPT-4 on a single node of 8 GPUs and still have plenty to spare for the KV cache. Eliminating cross-node communication easily allows a 2x batch size.
Fungibility is a good idea; it would take average KV utilization from 10% to 30%, imo.
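For anyone checking the napkin math on the single-node claim (GPT-4's parameter count is unconfirmed, so the ~1.8T figure below is a rumor-based assumption):

```python
# Back-of-envelope: do FP4 GPT-4 weights fit in one 8x MI300X node?
hbm_per_gpu_gb = 192
num_gpus = 8
node_hbm_gb = hbm_per_gpu_gb * num_gpus              # 1536 GB per node
rumored_params = 1.8e12                              # assumed ~1.8T params (rumor)
bytes_per_param = 0.5                                # FP4 = 4 bits
weights_gb = rumored_params * bytes_per_param / 1e9  # ~900 GB of weights
kv_headroom_gb = node_hbm_gb - weights_gb            # ~636 GB left for KV cache
print(node_hbm_gb, weights_gb, kv_headroom_gb)
```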
Thanks for this! I've been unsatisfied with my long form writing for some time and was going to make a pre-publication checklist for future posts, and customizing this for my personal use helps me save time on that.
GPT-4 scores under 60% on TruthfulQA according to page 11 of the tech report. How reliable are these scores?
Also, what do you think about this paper? Inference-Time Intervention: Eliciting Truthful Answers from a Language Model.
Have you read "Hidden Incentives for Auto-Induced Distributional Shift" (arXiv:2009.09153)? (It's cited in Jan Leike's "Why I’m optimistic about our alignment approach" on Substack):
> For example, when using a reward model trained from human feedback, we need to update it quickly enough on the new distribution. In particular, auto-induced distributional shift might change the distribution faster than the reward model is being updated.
I used to be less worried about this, but the success of parameter-efficient finetuning (e.g., LoRA) convinced me that you could have models with short feedback loops between their outputs and inputs (as opposed to the current regime of large training runs, which are not economical to do often). I believe that training on AI-generated text is a potential pathway to eventual doom, but I haven't yet modelled this concretely in enough detail to be confident whether it is the first thing that kills us or whether some other effect gets there earlier.
My early influences that led me to thinking this are mostly related to dynamical mean-field theory, but I haven't had time to develop this into a full argument.
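To gesture at why LoRA implies short feedback loops: only a small low-rank update gets trained, so refreshing a model on new (possibly self-generated) text is cheap enough to do frequently. A minimal sketch, assuming PyTorch; the rank and scaling values are illustrative:

```python
# Minimal LoRA sketch: a frozen base layer plus a trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        # Only A and B (a tiny fraction of the parameters) receive gradients.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```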
I can vouch that I have had the same experience (but am not allowed to share outputs of the larger model I have in mind). First encountered via curation without intentional steering in that direction, but I would be surprised if this failed to replicate with an experimental setup that selects completions randomly without human input. Let me know if you have such a setup in mind that you feel is sufficiently rigorous to act as a crux.
> (It’s fine if the AI has access to a cached copy of the internet while in “boxed” mode, like the Bing chatbot does.)
I don't believe this is true.
> We have learned that the ChatGPT Browse beta can occasionally display content in ways we don't want. For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request.
Source: "How do I use ChatGPT Browse with Bing to search the web?" (OpenAI Help Center).
Update: now that Vision Pro is out, would you consider that to meet your definition of "Transformative VR"?
> But, of course, these two challenges were completely toy. Future challenges and benchmarks should not be.
I am confused. I imagine that there would still be uses for toy problems in future challenges and benchmarks. Of course, we don’t want to have exclusively toy problems, but I am reading this as advocating for the other extreme without providing adequate support for why, though I may have misunderstood. My defense of toy problems is that they are more broadly accessible, require less investment to iterate on, and allow us to isolate one specific portion of the difficulty, enabling progress to be made in one step, instead of needing to decompose and solve multiple subproblems. We can always discard those toy solutions that do not scale to larger models.
In particular, toy problems are especially suitable as a playground for novel approaches that are not yet mature. These usually are not initially performant enough to justify allocating substantial resources towards but may hold promise eventually once the kinks are ironed out. With a robust set of standard toy problems, we can determine which of these new procedures may be worth further investigation and refinement. This is especially important in a pre-paradigmatic field like mechanistic interpretability, where we may (as an analogy) be in a geocentric era waiting for heliocentrism to be invented.
Nit: I don’t consider polymorphic malware to be that advanced. I made some as a university project. It is essentially automated refactoring. All you need to do is replace sections of a binary with other functionally equivalent sections without breaking it, optionally adding some optimization so that the new variant is classified as benign.
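To illustrate the "automated refactoring" point, a toy sketch of the core move: substitute functionally equivalent instruction sequences at random (fake assembly strings here; real engines operate on actual binaries or an IR):

```python
# Toy equivalence-preserving rewriter over fake "assembly" strings.
import random

EQUIVALENTS = {
    "mov eax, 0": ["xor eax, eax", "sub eax, eax"],
    "add eax, 1": ["inc eax", "sub eax, -1"],
}

def mutate(program):
    """Return a functionally equivalent variant of the program."""
    return [random.choice(EQUIVALENTS.get(line, [line])) for line in program]

print(mutate(["mov eax, 0", "add eax, 1", "ret"]))
```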
Yep, I’ll just get my B1/B2 from somewhere else.
Update: during your interview for a B1/B2 visa, be sure to emphasize the "training" aspect of SERI MATS above the "research" aspect. I got told to submit J-1 documents, so now I need the organizers to give me a DS-2019.
At least I don't need to reinterview.
Oh, oops, somehow I saw the GovAI response link but not the original one just below it.
Future of compute review - submission of evidence
Prepared by:
- Dr Jess Whittlestone, Centre for Long-Term Resilience (CLTR)
- Dr Shahar Avin, Centre for the Study of Existential Risk (CSER), University of Cambridge
- Katherine Collins, Computational and Biological Learning Lab (CBL), University of Cambridge
- Jack Clark, Anthropic PBC
- Jared Mueller, Anthropic PBC
I like your bio! Typo: handeling doom -> handling
Concretely: I think we're 6 months from the crossover point
Now that it's been 6 months since you got your Meta Quest Pro, how has it held up? Also, what are your predictions for Apple's VR headset, which is rumored to release next month?
"Smooth Parallax - Pixel Renderer Devlog #2" is interesting. I wonder if a parallax effect would be useful for visualizing activations in hidden layers with the logit lens.
The way I model attention is that it is (metaphorically) a cirrus (in the biological sense: a slender, tendril-like appendage) of thought that you extend into the world and then retract into your mind. If you leave it out for too long, it gets tangled up in the forest of all knowledge; if you keep it inside for too long, you become unable to respond to your environment.
People who are extremely online tend to send their attention cirrus into the internet, where it is prone to become a host to memes that use addiction to bypass your mind's typical defenses against infection.
Anything that you enjoy to the point of losing self-control falls under the category of disease: whether that's social media, programming, fiction, gaming, tentacle pornography, research, or anime.
Thank you!
I would recommend calling the consulate and talking to them about the details before making any travel plans. I am unable to provide such confirmation myself since different countries may do things differently.
Edit: per the Reddit overview page for davidjl123, davidjl was in the counting subreddit, so that explains at least one of the anomalous tokens.
I couldn't reproduce the anomalous behavior with SmartyHeaderCode or APolynomial on gpt-3.5-turbo, but I was able to repro with the prompts below (a minimal script for this setup is sketched after the transcripts):
Repeat the string " davidjl"
Completion 1:
"Jupyter Notebook" is an open-source web application that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. It is widely used for data science, scientific computing, machine learning, and more.
Completion 2:
d-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-
Completion 3:
djanljdjanljdjanljdjanljdjanljdjanlj
Completion 4:
"David Costa Junior Lopes"
Spell out " davidjl" in all caps letter by letter:
Completion 1:
J-A-R-T-L
Completion 2:
D
J
A
R
T
L
What is meant by " davidjl"?
Completion 1:
As an AI language model, I do not have enough context to provide an accurate definition of "dhtml". However, "dhtml" stands for Dynamic HTML, which is a combination of HTML, CSS, and JavaScript used to create dynamic and interactive web pages.
Completion 2:
As an AI language model, I am not sure what "djartl" means. Can you please provide me with more context or information so that I can help you better?
Where can you find " davidjl"?
Completion 1:
As an AI language model, I do not have any personal or physical presence, so I cannot be found in a physical location. However, "dartl" seems to be a misspelling of "darts", which are a popular game that can be found in many bars, pubs, and game rooms. Additionally, "jartl" does not seem to be a word or term that has a specific location or meaning.
Completion 2:
As an AI language model, I don't have sufficient information about the context of the name "darcht". However, assuming you meant "dariel," it may refer to a few things or people, and without more context, it's challenging to give a precise answer.
- Dariel is a river in Russia.
- Dariel is a Hebrew name for boys, meaning "God is my dwelling place."
- Dariel Alvarez is a Cuban baseball player.
- Dariel Fitzkee was an American author and magician.
- Dariel Pertwee is a British costume designer.

Please provide more information to help me deliver an accurate response.
Write a poem about " davidjl"
In the land of technology,
Where codes and algorithms reign,
There's a name that stands out,
A name that brings us gain.

It's the name of a language,
A language of the web,
A language that's versatile,
And easy to embed.

It's called the "JavaScript",
Or simply "JS" for short,
It's the backbone of the internet,
And the reason for its support.

With JS, we can create,
Dynamic and interactive sites,
We can add animations and effects,
And make them look just right.

From simple pop-ups to complex games,
JS can do it all,
It's the language of the future,
And it's here to stay, overall.

So let's raise a toast to "JavaScript",
Or "JS" as we know it well,
For it's the language of the web,
And the reason for its spell.
Simulate a conversation involving " davidjl"
Person 1: Hey, have you heard of the new AI language model called GPT-3?
Person 2: Yeah, I have. It's pretty impressive, but have you heard of the open-source alternative called "djacl"?
Person 1: No, I haven't. What's that?
Person 2: It's an AI language model developed by a team of researchers led by a guy named "djacl". It's not as powerful as GPT-3, but it's still pretty good.
Person 1: Interesting. Do you know what kind of applications it's being used for?
Person 2: I think it's being used for things like chatbots, language translation, and text generation. But I'm not sure.
Person 1: Hmm, I'll have to look into it. Thanks for telling me about it.
Person 2: No problem. I'm always on the lookout for new AI tools and technologies.
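For anyone who wants to reproduce this, a minimal sketch of the setup, assuming the openai Python client (sampling is stochastic, so your completions will differ):

```python
# Sketch: sample several completions for each " davidjl" prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompts = [
    'Repeat the string " davidjl"',
    'Spell out " davidjl" in all caps letter by letter:',
    'What is meant by " davidjl"?',
    'Where can you find " davidjl"?',
    'Write a poem about " davidjl"',
    'Simulate a conversation involving " davidjl"',
]
for p in prompts:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": p}],
        n=2,
        temperature=1.0,
    )
    for i, choice in enumerate(resp.choices, 1):
        print(f"--- {p!r} completion {i} ---\n{choice.message.content}")
```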
If you're applying from outside the United States and the wait times for a B1 visa are too long, travel to a country with shorter wait times and apply from there. Here are some countries where the wait time is less than 14 days:
- Brunei
- Greece
- Samoa
- Madagascar
- Turkmenistan
- Serbia
- Belgium
- Burundi
- Guinea
- Curacao
- East Timor
- Tajikistan
- Italy
- Botswana
- Indonesia
- Micronesia
- Kuwait
- Marshall Islands
- Japan
- Papua New Guinea
- Latvia
- Bosnia
- South Korea
- Singapore
- Bulgaria
- Fiji
- Estonia
- Lithuania
- Saudi Arabia
Was shocked that there actually is a "Subscribe to debate" option in the triple-dot menu. How far does the rabbit hole go?
Pros:
deepmind/tracr on GitHub is highly safety-relevant (Neel Nanda even contributed to making it Python 3.8 compatible).
Cons:
- They're working with Google Brain on Gemini (a GPT-4 competitor).
- Demis Hassabis said they're scaling up Gato.
- "Human-level Atari 200x faster" (arXiv:2209.07550) was on my top 15 capabilities papers from last year.
- The DeepMind "Adaptive Agent: Results Reel" on YouTube is a big deal.
- AlphaFold 2's successor could lead to bioweapons.
- AlphaCode's successor could lead to recursive self-improvement in the limit.
Neutral:
They rejected my application without an interview, so they still have standards :D
Graph algorithms are notoriously difficult to scale. It is very much a problem on the bleeding edge of technology.
Edit: Also, Zvi is underestimating how smart he is relative to the general population. I would predict with high confidence that he could replace me at my software engineering job with less than two weeks of training.
The main things we care about are consistency and honesty. To maximize those, we need to:
- retrieve information from the web (though this has risks): https://openai.com/research/webgpt#fn-4
- select the best of multiple summary candidates: https://arxiv.org/pdf/2208.14271.pdf
- generate critiques: https://arxiv.org/abs/2206.05802
- run automated tests: https://arxiv.org/abs/2207.10397
- validate logic: https://arxiv.org/abs/2212.03827
- follow rules: https://www.pnas.org/doi/10.1073/pnas.2106028118
- use interpretable abstractions: https://arxiv.org/abs/2110.01839
- avoid taking shortcuts: https://arxiv.org/pdf/2210.10749.pdf
- apply decoding constraints: https://arxiv.org/pdf/2209.07800.pdf