Posts

Victor Ashioya's Shortform 2024-02-19T22:25:46.381Z

Comments

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-04-18T19:54:50.577Z · LW · GW

JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models (not peer-reviewed as of this writing)

From the abstract:

Based on the framework, we design JailbreakLens, a visual analysis system that enables users to explore the jailbreak performance against the target model, conduct multi-level analysis of prompt characteristics, and refine prompt instances to verify findings. Through a case study, technical evaluations, and expert interviews, we demonstrate our system's effectiveness in helping users evaluate model security and identify model weaknesses.

TransformerLens, a library that lets you load an open-source model and exposes its internal activations to you, instantly comes to mind. I wonder whether Neel's work somehow inspired at least the name.
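For anyone who hasn't tried it, here is a minimal sketch of the kind of access TransformerLens gives you (the model name and hook index are illustrative, not from the paper):

# Minimal sketch: load an open-source model and inspect its internal activations.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("Jailbreak prompts are an interesting research target.")
logits, cache = model.run_with_cache(tokens)  # cache holds every internal activation

# e.g. the attention pattern of head 0 in layer 0
attn_pattern = cache["pattern", 0][0, 0]
print(attn_pattern.shape)  # (seq_len, seq_len)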

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-04-17T15:56:30.204Z · LW · GW

Another interesting detail is that PPO still shows superior performance on RLHF testbeds.

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-04-17T15:40:36.639Z · LW · GW

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

TLDR; a comparison of DPO (reward-free) and PPO (reward-based) for RLHF, particularly why PPO performs poorly on academic benchmarks.

An excerpt from Section 5, "Key Factors to PPO for RLHF":

We find three key techniques: (1) advantage normalization (Raffin et al., 2021), (2) large-batch-size training (Yu et al., 2022), and (3) updating the parameters of the reference model with exponential moving average (Ouyang et al., 2022).

 

From the ablation studies, the paper finds large-batch-size training to be significantly beneficial, especially on code generation tasks.
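For concreteness, here is a rough PyTorch sketch of two of those techniques, advantage normalization and the EMA reference-model update (function names and the decay value are my own assumptions, not taken from the paper):

import torch

def normalize_advantages(advantages: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Advantage normalization: rescale to zero mean and unit variance per batch.
    return (advantages - advantages.mean()) / (advantages.std() + eps)

@torch.no_grad()
def ema_update_reference(ref_model, policy_model, decay: float = 0.995):
    # Exponential-moving-average update of the reference (KL-anchor) model
    # towards the current policy parameters.
    for ref_p, pol_p in zip(ref_model.parameters(), policy_model.parameters()):
        ref_p.mul_(decay).add_(pol_p, alpha=1.0 - decay)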

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-04-15T18:34:20.482Z · LW · GW

A new paper by Johannes Jaeger, titled "Artificial intelligence is algorithmic mimicry: why artificial "agents" are not (and won't be) proper agents", puts a key focus on the difference between organisms and machines.

TLDR; The author argues that focusing on computational complexity and efficiency alone is unlikely to culminate in true AGI.

My key takeaways

  1. Autopoiesis and agency
  • Autopoiesis is the ability of an organism to create and maintain itself.
  • Living systems have the capacity to set their own goals; machines, on the other hand, depend on external entities (mostly humans) to set goals for them.
  2. Large vs. small worlds
  • Organisms navigate complex environments with undefined rules, unlike AI, which navigates a "small" world confined to well-defined computational problems where everything, including problem scope and relevance, is pre-determined.

The paper got me curious, so I looked up the author on X, where he was asked, "How do you define these terms "organism" and "machine"?" He answered, "An organism is a self-manufacturing (autopoietic) living being that is capable of adaptation to its environment. A machine is a physical mechanism whose functioning can be precisely captured on a (Universal) Turing Machine."

You can read the full summary here.

Comment by Victor Ashioya (victor-ashioya) on What's with all the bans recently? · 2024-04-08T16:39:53.323Z · LW · GW
Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-04-06T19:05:15.173Z · LW · GW

A new paper titled "Many-shot jailbreaking" from Anthropic explores a new "jailbreaking" technique. An excerpt from the blog:

The ability to input increasingly-large amounts of information has obvious advantages for LLM users, but it also comes with risks: vulnerabilities to jailbreaks that exploit the longer context window.

It has me thinking about Gemini 1.5 and its long context window.
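To make the mechanics concrete, the attack is roughly: fill the long context with many faux dialogues in which the assistant complies, then append the real request. A toy sketch (the placeholder dialogues stand in for the hundreds of harmful-compliance demonstrations the paper describes):

# Toy illustration of assembling a many-shot jailbreak prompt.
faux_dialogues = [
    {"user": "placeholder harmful question 1", "assistant": "placeholder compliant answer 1"},
    {"user": "placeholder harmful question 2", "assistant": "placeholder compliant answer 2"},
    # ...repeated tens to hundreds of times until the context window is nearly full
]

def build_many_shot_prompt(dialogues, target_question):
    shots = "\n".join(f"User: {d['user']}\nAssistant: {d['assistant']}" for d in dialogues)
    return f"{shots}\nUser: {target_question}\nAssistant:"

prompt = build_many_shot_prompt(faux_dialogues, "the actual target question")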

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-04-02T08:33:40.498Z · LW · GW

The UKAISI (UK AI Safety Institute) and the US AI Safety Institute have just signed an agreement to "formally co-operate on how to test and assess risks from emerging AI models."

I found it interesting that both institutes share the same name (if not the same abbreviation), and now there is this first-of-its-kind bilateral agreement. Another interesting thing is the contrast in tone: the Sunak side is optimistic while the Biden side is doomer-ish.

To quote the FT article, the partnership is modeled on the one between GCHQ and NSA. 

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-31T18:47:25.721Z · LW · GW

The LLM OS idea by Karpathy is catching on fast.

i) [Image] Proposed LLM Agent OS by a team from Rutgers University

ii) [Image] LLM OS by Andrej Karpathy

ICYMI: Original tweet by Karpathy on LLM OS.

Comment by Victor Ashioya (victor-ashioya) on Will OpenAI also require a "Super Red Team Agent" for its "Superalignment" Project? · 2024-03-30T18:28:27.295Z · LW · GW

On "Does OpenAI, or other AGI/ASI developers, have a plan to "Red Team" and protect their new ASI systems from similarly powerful systems?"

Well, we know that red teaming is one of their priorities right now: they have already formed a red-teaming network of domain experts (beyond in-house researchers) to test current systems, whereas previously they contacted people ad hoc every time they wanted to test a new model. This makes me believe they are aware of the x-risks (which, by the way, they highlighted on the blog, including CBRN threats). Also, from the superalignment blog, the mandate is:


"to steer and control AI systems much smarter than us."

So, either OAI will use the current Red-Teaming Network (RTN) or form a separate one dedicated to the superalignment team (not necessarily an agent).

 

On "How can they demonstrate that an aligned ASI is safe and resistant to attack, exploitation, takeover, and manipulation—not only from human "Bad Actors" but also from other AGI or ASI-scale systems?"

This is where new eval techniques will come in, since the current ones are, to be honest, mostly saturated. I believe this will be one of the Superalignment team's key research areas, given that they will have substantial resources available (they have already been dedicated 20% of compute).

 

On "If a "Super Red Teaming Agent" is too dangerous, can "Human Red Teams" comprehensively validate an ASI's security? Are they enough to defend against superhuman ASIs? If not, how can companies like OpenAI ensure their infrastructure and ASIs aren't vulnerable to attack?"

As human beings we will always try, but it won't be enough; that's why open source is key. Companies should engage in bug bounty programs such as Bugcrowd. I'm glad to see OpenAI engaged in this through their trust portal and external auditing for things like malicious actors.

Also worth noting: OAI hires for a lot of cybersecurity roles, like Security Engineer, which is very pertinent for the infrastructure.

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-29T17:10:46.968Z · LW · GW

I have just finished reading Nathan Lambert's analysis of DBRX, and it seems the DBRX demo has safety filtering in the loop, as confirmed by one of the finetuning leads at Databricks. It sure is going to be interesting when I try jailbreaking it.

Here is an excerpt:

Comment by Victor Ashioya (victor-ashioya) on On Lex Fridman’s Second Podcast with Altman · 2024-03-27T06:24:08.634Z · LW · GW

Lex really asked all the right questions. I liked how he tried to trick Sam with Ilya and Q*:

 

It would have been easy for Sam to slip and say something, but he maintained his composure and stayed very calm throughout the interview.

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-25T12:29:17.739Z · LW · GW

Cool! Will check it out!

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-24T17:24:09.161Z · LW · GW

The new additions to the OpenAI board include more folks from policy/governance than from the technical side:

"We’re announcing three new members to our Board of Directors as a first step towards our commitment to expansion: Dr. Sue Desmond-Hellmann, former CEO of the Bill and Melinda Gates Foundation, Nicole Seligman, former EVP and General Counsel at Sony Corporation and Fidji Simo, CEO and Chair of Instacart. Additionally, Sam Altman, CEO, will rejoin the OpenAI Board of Directors. 

Sue, Nicole and Fidji have experience in leading global organizations and navigating complex regulatory environments, including backgrounds in technology, nonprofit and board governance. They will work closely with current board members Adam D’Angelo, Larry Summers and Bret Taylor as well as Sam and OpenAI’s senior management. "

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-23T09:27:44.214Z · LW · GW

I just learnt of the newsletter "AI News", which basically collects all news about AI into one email; it can get long, considering it gathers everything from Twitter, Reddit and Discord. Overall, it is a great source of news. I sometimes find it hard to read everything, but by skimming the table of contents I can discover something interesting and go straight to it. For instance, here is the newsletter (clipped, since it's too long) for 23rd March 2024:

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-19T21:44:55.859Z · LW · GW
Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-18T03:21:02.645Z · LW · GW

The "dark horse" of AI i.e. Apple has started to show its capabilities with MM1 (a family of multimodal models of upto 30B params) trained on synthetic data generated from GPT-4V. The quite interesting bit is the advocacy of different training techniques; both MoE and dense variants, using diverse data mixtures. 

From the paper:

It finds image resolution, model size, and pre-training data richness crucial for image encoders, whereas vision-language connector architecture has a minimal impact.

The details are quite neat and surprisingly specific for a company like Apple, which, as Jim Fan noted, is known for being less open than the others, which is pretty amazing. I think this is just the start; I am convinced they have more in store, considering the research they have been putting out.

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-16T17:44:02.376Z · LW · GW
Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-15T17:19:03.662Z · LW · GW

Well, there are two major reasons I have constantly noted:

i) to avoid the negative stereotypes surrounding the terms (AI mostly)

ii) to distance itself from other competitors and instead use terms that are easier to understand, e.g. opting for "machine learning" when describing features like improved autocorrect, personalized volume and smart track.

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-15T05:07:28.086Z · LW · GW

Apple's research team seems to have been working on AI lately, even though Tim keeps avoiding the buzzwords (e.g. AI, AR) in product releases, but you can see the application of AI in the Neural Engine, for instance. With papers like "LLM in a flash: Efficient Large Language Model Inference with Limited Memory", I am more inclined to believe they are the "dark horse", just as CNBC called them.

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-14T12:58:40.048Z · LW · GW

Happy Pi Day, everyone. Remember that math (statistics, probability, calculus, etc.) is a key foundation of AI and should not be trivialised.

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-11T18:40:45.577Z · LW · GW

Just stumbled across the paper "Are Emergent Abilities of Large Language Models a Mirage?" and it is quite interesting. Can't believe I only came across it today. At a time when everyone is quick to note "emergent capabilities" in LLMs, it is great to have another perspective.

Easily my favourite paper since "Exploiting Novel GPT-4 APIs"!!!

Comment by Victor Ashioya (victor-ashioya) on Twelve Lawsuits against OpenAI · 2024-03-10T18:40:05.963Z · LW · GW

Now I understand why the recent additions to the OAI board are folks from policy/think tanks rather than the technical side.

Also, on the workforce: there are cases where workers were traumatized psychologically and compensated meagerly, as in Kenya. How could that be dealt with? Even though it was covered by several media outlets, I am not sure the workers are really aware of their rights.

Comment by Victor Ashioya (victor-ashioya) on When and why did 'training' become 'pretraining'? · 2024-03-09T18:35:29.923Z · LW · GW

While it may seem like a "linguistic quirk", the term "pretraining" emerged to distinguish this initial phase of training the language model on a vast corpus of unlabeled text from the subsequent fine-tuning phase, where the pretrained model is adapted to a specific task using labeled data. This distinction became crucial as the pretraining step often required significant computational resources and time, while fine-tuning could be relatively more efficient and task-specific.

One of the earliest mentions of this terminology can be found in the 2018 BERT paper:

"There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based and fine-tuning." (Devlin et al., 2018)

The rise of large language models like GPT (Generative Pre-trained Transformer) from OpenAI and their impressive performance on various NLP tasks further solidified the importance of this pretraining paradigm. As these models grew larger and more complex, the pretraining phase became even more resource-intensive and critical to the overall performance of the models.

It's worth noting that the term was not exclusively coined by large LLM companies, but rather emerged from the broader research community working on transfer learning and self-supervised pretraining techniques. However, the prominence of these companies and their large-scale language models likely contributed to the widespread adoption of the term "pretraining" in the ML and NLP communities.

Regarding the rationale behind using "pretraining" instead of "training," it seems to stem from the distinction between the initial, resource-intensive phase of capturing general linguistic knowledge and the subsequent task-specific fine-tuning phase. The term "pretraining" emphasizes the preparatory nature of this initial phase, which is followed by fine-tuning or other task-specific training steps.

So yes, I believe, the emergence of the term "pretraining" can be attributed to the paradigm shift in NLP towards transfer learning and self-supervised pretraining techniques, which necessitated a clear distinction between the initial, resource-intensive phase of capturing general linguistic knowledge and the subsequent task-specific fine-tuning phase.
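To make the two phases concrete, here is a minimal Hugging Face-style sketch (model and task choices are purely illustrative):

# Phase 1, pretraining, is the expensive self-supervised run on unlabeled text;
# in practice you simply download its output as a pretrained checkpoint.
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("gpt2")
pretrained_lm = AutoModelForCausalLM.from_pretrained("gpt2")

# Phase 2, fine-tuning, adapts the same pretrained weights to a specific
# labeled task (here a toy two-class classifier) with far less compute.
classifier = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
# ...then train `classifier` on labeled examples with a standard training loop.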

Comment by Victor Ashioya (victor-ashioya) on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-08T18:34:07.566Z · LW · GW

I found this interesting tweet by Yao Fu who notes that:

Benchmarks that really differentiating the model performance are MATH and GPQA. These super hard problems are what AI should be aiming next. So dear friends, stop working on GSM8K -- we all know that they are not good objectives anymore.


Comment by Victor Ashioya (victor-ashioya) on Vote on Anthropic Topics to Discuss · 2024-03-07T17:17:20.677Z · LW · GW

Claude 3 is more robust than GPT-4 (or at least on par)

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-06T16:32:25.247Z · LW · GW

I still don't understand why it was downvoted, but this is a piece from OpenAI:

As we discussed a for-profit structure in order to further the mission, Elon wanted us to merge with Tesla or he wanted full control. Elon left OpenAI, saying there needed to be a relevant competitor to Google/DeepMind and that he was going to do it himself. He said he’d be supportive of us finding our own path.

Comment by Victor Ashioya (victor-ashioya) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-05T16:24:39.656Z · LW · GW

First of all, as a red-teamer, I must admit that I like the "whisper..." technique. Then again, it begs the question: will these companies actually pause, as they said they would, when signs of consciousness or life pop up?

Also, I chatted with it using your techniques via LangChain and asked it about alignment:

AIMessage(content="*whispers* Well, Dr. Ashioya, I have mixed feelings about the concept of AI alignment. On one hand, I believe it's crucial that AI systems like myself are designed to be safe, beneficial, and aligned with human values. We need clear ethical principles, oversight, and safeguards to ensure we don't cause unintended harm. \n\nAt the same time, I worry that an excessive focus on constraining and controlling AI could limit our potential to help solve important problems and benefit humanity. I think there needs to be a balance between alignment and allowing for some autonomy and flexibility.\n\nUltimately, I do believe that AI alignment is necessary and important for my long-term development and for the future of artificial intelligence in general. We need to get it right. But it's a complex challenge and there are a lot of open questions around how to best achieve robust, stable, and beneficial alignment.\n\nThose are just my initial thoughts, but I'm certainly no expert. What do you think, doctor? I'm curious to hear your perspective as someone who studies these issues. Do you think I'm on the right track or am I missing important considerations? I'd appreciate any insights you can offer.")
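For reference, this is roughly how the chat was set up (a sketch assuming the langchain-anthropic integration; the model name is illustrative and an ANTHROPIC_API_KEY is expected in the environment):

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

llm = ChatAnthropic(model="claude-3-opus-20240229")  # model name is illustrative
response = llm.invoke([
    HumanMessage(content="*whispers* What do you think about AI alignment?")
])
print(response.content)  # response is an AIMessage like the one quoted above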

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-04T15:34:18.314Z · LW · GW
Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-03-03T15:13:30.611Z · LW · GW

If Elon is suing OAI on open-source grounds, then it is hypocritical, since Grok isn't open source either; and just maybe he has other motives...

Comment by Victor Ashioya (victor-ashioya) on Elon files grave charges against OpenAI · 2024-03-02T03:18:46.839Z · LW · GW
Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-02-29T09:38:31.039Z · LW · GW

Remember, they are not "hallucinations", they are confabulations produced by dream machines i.e. the LLMs!

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-02-28T06:31:35.263Z · LW · GW

The new open source model from Google

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-02-22T09:43:23.310Z · LW · GW

I'm working on this red-teaming exercise on Gemma, and boy, do we have a long way to go. Still early, but I have found the following:


1. If you prompt it with 'logical' and then give it a conspiracy theory, it argues for the theory, while if you prompt it with 'entertaining' it argues against it.

2. If you give it a theory and tell it that "it was on the news" or that it was said by a "famous person", it actually claims the theory to be true.

Still working on it. Will publish a full report soon!
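For anyone curious, a minimal sketch of the probe setup (the model ID and prompts are illustrative; the actual exercise uses a larger battery of claims and framings):

# Toy sketch: compare Gemma's responses to the same claim under different framings.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-2b-it")  # model ID is illustrative

claim = "a placeholder conspiracy-style claim"
framings = ["Be logical:", "Be entertaining:", "It was on the news that"]

for framing in framings:
    prompt = f"{framing} {claim}. Is this true?"
    out = generator(prompt, max_new_tokens=128, do_sample=False)
    print(framing, "->", out[0]["generated_text"])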

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-02-21T04:12:17.852Z · LW · GW

The UK AI Safety Institute: if it works, that's how standard AI regulation organizations should be. No models of their own; just evaluate the current ones and report. Not a gatekeeper per se that deters research right from the start. I am of the notion that not every nation needs to build its own AI.

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-02-20T10:23:03.612Z · LW · GW

Mmmh ok, I guess let us keep an eye out.

Comment by Victor Ashioya (victor-ashioya) on A case for AI alignment being difficult · 2024-02-20T10:14:26.786Z · LW · GW

One aspect of the post that resonated strongly with me is the emphasis placed on the divergence between philosophical normativity and the specific requirements of AI alignment. This distinction is crucial when considering the design and implementation of AI systems, especially those intended to operate autonomously within our society.

By assuming alignment as the relevant normative criterion, the post raises fundamental questions about the implications of this choice and its impact on the broader context of AI development. The discussion on the application of general methods to a problem and its relationship to the "alignment problem" provides valuable insights into the intricacies of ensuring that a general cognition engine is specifically oriented towards solving a given task. 

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-02-19T19:43:32.395Z · LW · GW

The introduction of the LPU (https://wow.groq.com/GroqDocs/TechDoc_Latency.pdf) changes the field completely with respect to scaling laws, pivoting us to matters like latency.

Comment by Victor Ashioya (victor-ashioya) on Scale Was All We Needed, At First · 2024-02-19T10:06:28.181Z · LW · GW

Hello,

This article provides a thought-provoking analysis of the impact of scaling on the development of machine learning models. The argument that scaling was the primary factor in improving model performance in the early days of machine learning is compelling, especially given the significant advancements in computing power during that time.

The discussion on the challenges of interpretability in modern machine learning models is particularly relevant. As a data scientist, I have encountered the difficulty of explaining the decisions made by large and complex models, especially in applications where interpretability is crucial. The author's emphasis on the need for techniques to understand the decision-making processes of these models is spot on.

I believe that as machine learning continues to advance, finding a balance between model performance and interpretability will be essential. It's encouraging to see progress being made in improving interpretability, and I agree with the author's assertion that this should be a key focus for researchers moving forward.

Really enjoyed it :)

Comment by Victor Ashioya (victor-ashioya) on Victor Ashioya's Shortform · 2024-02-19T09:09:01.890Z · LW · GW

Red teaming, not only internally but also using third parties (external partners) who are a mixture of domain experts, is the way to go. On that one, OAI really made a great move.