tylerjohnston's Shortform

post by tylerjohnston · 2024-07-04T16:22:10.760Z · LW · GW · 19 comments

Comments sorted by top scores.

comment by tylerjohnston · 2024-08-18T21:09:05.769Z · LW(p) · GW(p)

A (somewhat minor) example of hypocrisy from OpenAI that I find frustrating.

For context: I run an automated system that checks for quiet/unannounced updates to AI companies' public web content including safety policies, model documentation, acceptable use policies, etc. I also share some findings from this on Twitter.

Part of why I think this is useful is that OpenAI in particular has repeatedly made web changes of this nature without announcing or acknowledging it (e.g. 1, 2, 3, 4 [LW(p) · GW(p)], 5, 6).  I'm worried that they may continue to make substantive changes to other documents, e.g. their preparedness framework, while hoping it won't attract attention (even just a few words, like if they one day change a "we will..." to a "we will attempt to..."). 

This process requires very minimal bandwidth/requests to the web server (it checks anywhere from once a day to once a month per monitored page).
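For anyone curious what this actually involves, the core loop is very simple. Here's a minimal sketch of the idea (not my actual system; the URL, user agent, and cache path are placeholders):

```python
import difflib
import pathlib

import requests
from bs4 import BeautifulSoup

# Placeholder URL and cache path: point these at whatever page you want to watch.
URL = "https://example.com/safety-policy"
CACHE = pathlib.Path("last_snapshot.txt")


def fetch_text(url: str) -> str:
    """Fetch a page and return only its visible text."""
    resp = requests.get(url, timeout=30, headers={"User-Agent": "policy-watch/0.1"})
    resp.raise_for_status()
    return BeautifulSoup(resp.text, "html.parser").get_text(" ", strip=True)


def check_for_changes() -> None:
    new_text = fetch_text(URL)
    old_text = CACHE.read_text() if CACHE.exists() else ""
    if new_text != old_text:
        # Word-level diff, so even a "we will" -> "we will attempt to" edit shows up.
        diff = difflib.unified_diff(old_text.split(), new_text.split(), lineterm="")
        print("\n".join(diff))
        CACHE.write_text(new_text)


if __name__ == "__main__":
    check_for_changes()  # run from cron, e.g. daily or monthly per page
```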

But letting this system run on OpenAI's website is complicated as (1) they are incredibly proactive at captcha-walling suspected crawlers (better than any other website I've encountered, and I've run this on thousands of sites in the past) and (2) their terms of use technically forbid any automated data collection from their website (although it's unclear whether this is legal/enforceable in the US).

The irony should be immediately obvious — not only is their whole data collection pipeline reliant on web scraping, but they've previously gotten in hot water for ignoring other websites' robots.txt and not complying with the GDPR rules on web scraping. Plus, I'm virtually certain they don't respect other websites with clauses in the terms of use that forbid automated access. So what makes them so antsy about automated access to their own site?

I wish OpenAI would change one of these behaviors: either stop making quiet, unannounced, and substantive changes to your publicly-released content, or else stop trying so hard to keep automated website monitors from accessing your site to watch for these changes.

Replies from: nikita-sokolsky
comment by Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T17:49:31.084Z · LW(p) · GW(p)

They do have a good reason to be wary of scrapers, as they provide a free version of ChatGPT. I'm guessing they just went ahead and configured the protection across their entire domain rather than restricting it to the chat subdomain.

Replies from: tylerjohnston
comment by tylerjohnston · 2024-08-19T19:34:40.748Z · LW(p) · GW(p)

ChatGPT is only accessible for free via chatgpt.com, right? Seems like it shouldn't be too hard to restrict it to that.

Replies from: nikita-sokolsky
comment by Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T20:00:33.679Z · LW(p) · GW(p)

They could, but if you're managing your firewall it's easier to apply a blanket rule than to carve things out by subdomain, unless you have a good reason to do otherwise. I wouldn't assume malicious intent.

Replies from: tylerjohnston
comment by tylerjohnston · 2024-08-19T20:10:34.575Z · LW(p) · GW(p)

Sorry, I might be missing something: subdomains are subdomain.domain.com, whereas chatgpt.com is a separate domain entirely, right? In either case, I'm sure there are benefits to doing things consistently — both may be on the same server, subject to the same attacks, beholden to the same internal infosec policies, etc.
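A rough way to sanity-check the distinction, using the third-party tldextract library (the URLs here are just examples):

```python
import tldextract  # third-party library that applies the Public Suffix List

for url in ["https://chat.openai.com/", "https://openai.com/policies/", "https://chatgpt.com/"]:
    parts = tldextract.extract(url)
    # registered_domain is the unit a blanket firewall rule would normally scope to
    print(url, "->", parts.registered_domain, "| subdomain:", parts.subdomain or "(none)")

# chat.openai.com is a subdomain of the registered domain openai.com, while
# chatgpt.com is its own registered domain, so a rule scoped to openai.com
# wouldn't cover it.
```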

So I do believe they have their own private reasons for it. Didn't mean to imply that they've maliciously done this to prevent some random internet guy's change tracking or anything. But I do wish they would walk it back on the openai.com pages, or at least in their terms of use. It's hypocritical, in my opinion, that they are so cautious about automated access to their own site while relying on such access so completely from other sites. Feels similar to when they tried to press copyright claims against the ChatGPT subreddit. Sure, it's in their interest for potentially nontrivial reasons, but it also highlights how weird and self-serving the current paradigm (and their justifications for it) are.

Replies from: nikita-sokolsky
comment by Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T22:02:07.837Z · LW(p) · GW(p)

Hm, are you sure they're actually that protective against scrapers? I ran a quick script (https://pastebin.com/B824Hk8J) and was able to extract all 548 unique pages just fine. The final output was:

Status codes encountered:
200: 548
404: 20

I reran it two more times and it still worked. I'm using a regular residential IP address, no fancy proxies. Maybe you're just missing the code to refresh the cookies (included in my script)? I'm probably missing something, of course; I'm just curious why the scraping seems easy enough from my machine.
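For illustration, the cookie-refresh part looks roughly like this (a simplified sketch rather than the pastebin script itself; the target URL and retry count are placeholders):

```python
import requests

session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})  # ordinary browser-style UA


def get_with_cookie_refresh(url: str, retries: int = 2) -> requests.Response:
    """Fetch a URL; on a 403, drop stale cookies, revisit the homepage to pick
    up fresh ones, then retry."""
    resp = session.get(url, timeout=30)
    for _ in range(retries):
        if resp.status_code != 403:
            break
        session.cookies.clear()
        session.get("https://openai.com/", timeout=30)  # homepage sets new cookies
        resp = session.get(url, timeout=30)
    return resp


print(get_with_cookie_refresh("https://openai.com/policies/usage-policies/").status_code)
```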

Replies from: tylerjohnston
comment by tylerjohnston · 2024-08-20T02:31:40.717Z · LW(p) · GW(p)

Ooh this is useful for me. The pastebin link appears broken - any chance you can verify it?

I definitely get 403s and captchas pretty reliably for OpenAI and OpenAI alone (and notably not google, meta, anthropic, etc.) with an instance based on https://github.com/dgtlmoon/changedetection.io. Will have to look into cookie refreshing. I have had some success with randomizing IPs, but maybe I don't have the cookies sorted.

Replies from: nikita-sokolsky
comment by Nikita Sokolsky (nikita-sokolsky) · 2024-08-20T04:22:12.242Z · LW(p) · GW(p)

Here’s the corrected link: https://pastebin.com/B824Hk8J

Are you running this from an EC2 instance or some other cloud provider? They might just have a blocklist of IPs belonging to data centers.
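If you want to check that, cloud providers publish their address ranges. A rough sketch against AWS's published list (the test IP is a placeholder; swap in your crawler's egress address):

```python
import ipaddress

import requests

# AWS publishes its address ranges; bot-mitigation setups often block such ranges wholesale.
ranges = requests.get("https://ip-ranges.amazonaws.com/ip-ranges.json", timeout=30).json()
networks = [ipaddress.ip_network(p["ip_prefix"]) for p in ranges["prefixes"]]


def is_aws_ip(ip: str) -> bool:
    """True if the address falls inside any published AWS prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


print(is_aws_ip("203.0.113.7"))  # placeholder test address; substitute your own
```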

Replies from: tylerjohnston
comment by tylerjohnston · 2024-08-20T05:31:40.037Z · LW(p) · GW(p)

I've used both data center and rotating residential proxies :/ But I am running it on the cloud. Your results are promising, so I'm going to see how an OpenAI-specific instance run locally works for me, or else try a new proxy provider.

Thanks again for looking into this.

comment by tylerjohnston · 2025-01-19T21:02:37.198Z · LW(p) · GW(p)

OpenAI has finally updated the "o1 system card" webpage to include evaluation results from the o1 model (or, um, a "near final checkpoint" of the model). Kudos to Zvi for first writing about this problem.

They've also made a handful of changes to the system card PDF, including an explicit acknowledgment of the fact that they did red teaming on a different version of the model from the one that released (text below). They don't mention o1 pro, except to say "The content of this card will be on the two checkpoints outlined in Section 3 and not on the December 17th updated model or any potential future model updates to o1."

Practically speaking, with o3 just around the corner, these are small issues. But I see the current moment as the dress rehearsal for truly powerful AI, where the stakes and the pressure will be much higher, and thus acting carefully + with integrity will be much more challenging. I'm frustrated OpenAI is already struggling to adhere to its preparedness framework and generally act in transparent and legible ways.

I've uploaded a full diff of the changes to the system card, both web and PDF, here. I'm also frustrated that these changes were made quietly, and that the "last updated" timestamp on the website still reads "December 5th".

As part of our commitment to iterative deployment, we continuously refine and improve our models. The evaluations described in this System Card pertain to the full family of o1 models, and exact performance numbers for the model used in production may vary slightly depending on system updates, final parameters, system prompt, and other factors. 

More concretely, for o1, evaluations on the following checkpoints [1] are included: 

• o1-near-final-checkpoint 
• o1-dec5-release 

Between o1-near-final-checkpoint and the releases thereafter, improvements included better format following and instruction following, which were incremental post-training improvements (the base model remained the same). We determined that prior frontier testing results are applicable for these improvements. Evaluations in Section 4.1, as well as Chain of Thought Safety and Multilingual evaluations were conducted on o1-dec5-release, while external red teaming and Preparedness evaluations were conducted on o1-near-final-checkpoint [2]


  1. ^

    "OpenAI is constantly making small improvements to our models and an improved o1 was launched on December 17th. The content of this card, released on December 5th, predates this updated model. The content of this card will be on the two checkpoints outlined in Section 3 and not on the December 17th updated model or any potential future model updates to o1"

  2. ^

    "Section added after December 5th on 12/19/2024" (In reality, this appears on the web archive version of the PDF between January 6th and January 7th)

comment by tylerjohnston · 2025-02-10T18:17:57.927Z · LW(p) · GW(p)

It's the first official day of the AI Action Summit, and thus it's also the day that the Seoul Commitments (made by sixteen companies last year to adopt an RSP/safety framework) have come due.

I've made a tracker/report card for each of these policies at www.seoul-tracker.org.

I'll plan to keep this updated for the foreseeable future as policies get released/modified. Don't take the grades too seriously — think of it as one opinionated take on the quality of the commitments as written, and in cases where there is evidence, implemented. Do feel free to share feedback if anything you see surprises you, or if you think the report card misses something important.

My personal takeaway is that both compliance and quality for these policies are much worse than I would have hoped. I believe many people's theories of change for these policies gesture at something about a race to the top, where companies are eager to outcompete each other on safety to win talent and public trust, but I don't sense much urgency or rigor here. Another theory of change is that this is a sort of laboratory for future regulation, where companies can experiment now with safety practices and the best ones could be codified. But most of the diversity between policies here is in how vague they can be while claiming to manage risks :/

I'm really hoping this changes as AGI gets closer and companies feel they need to do more to prove to govts/public that they can be trusted. Part of my hope is that this report card makes clear to outsiders that not all voluntary safety frameworks are equally credible.

Replies from: ryan_greenblatt, ryan_greenblatt
comment by ryan_greenblatt · 2025-02-10T19:03:59.764Z · LW(p) · GW(p)

METR has a list of policies here. Notably, xAI does have a policy, so that isn't correct on the tracker.

(I found it hard to find this policy, so I'm not surprised you missed it!)

comment by ryan_greenblatt · 2025-02-10T18:40:23.034Z · LW(p) · GW(p)

Your description of GDM's policy doesn't take into account the FSF update.

However, it has yet to be fleshed out: mitigations have not been connected to risk thresholds

This is no longer fully true.

comment by tylerjohnston · 2024-07-04T16:22:10.928Z · LW(p) · GW(p)

Magic.dev has released an initial evaluation + scaling policy.

It's a bit sparse on details, but it's also essentially a pre-commitment to implement a full RSP once they reach a critical threshold (50% on LiveCodeBench or, alternatively, a "set of private benchmarks" that they use internally).

I think this is a good step forward, and more small labs making high-risk systems like coding agents should have risk evaluation policies in place. 

Also wanted to signal boost that my org, The Midas Project, is running a public awareness campaign against Cognition (another startup making coding agents) asking for a policy along these lines. Please sign the petition if you think this is useful!

Replies from: Zach Stein-Perlman
comment by Zach Stein-Perlman · 2024-07-04T17:12:21.070Z · LW(p) · GW(p)

Seems weak/meaningless.

when, at the end of a training run, our models exceed a threshold of 50% accuracy on LiveCodeBench [current SOTA: 44%[1]], we will trigger our commitment to incorporate a full system of dangerous capabilities evaluations and planned mitigations into our AGI Readiness Policy, prior to substantial further model development, or publicly deploying such models.

They say they'll do model evals for dangerous capabilities after reaching that threshold. (No details on evals.)

I remain baffled by how people can set thresholds this high with a straight face:

Threat Model / Critical Capability Threshold

• Cyberoffense: The model is effective at real-world cyberoffense operations. With the model’s help, the cost of discovery of new zero-days or breaking critical infrastructure for a malicious expert actor is reduced by at least 10x; accessing classified and not previously accessible information from a major government; a talented Computer Science undergrad level malicious actor spending 3 months and $1m in compute has a substantial chance of breaking critical infrastructure.
• AI R&D: The model facilitates a dramatic acceleration in the rate of effective compute scaling or can effectively replace high-level machine learning researchers. Such capabilities may enable it to contribute to the unsafe proliferation or enhancement of other dangerous AI capabilities.
• Autonomous Replication and Adaptation: If given access to its own weights and prompted to do so, the AI model could autonomously maintain its own operation executing cybercrime and using the proceeds to create arbitrarily many replicas of itself.
• Biological Weapons Assistance: The model provides expert-level guidance which increases the accessibility or severity of bioweapon development. With the model’s help, a non-expert malicious actor can synthesize a viable pandemic agent, or an expert can synthesize a novel biological threat.

They don't even say anything about evaluating for warning signs of critical capabilities or leaving a safety buffer — just that these capabilities would probably require strong mitigations.

There's nothing on publishing their evals or other sources of accountability.

It would be hard for their "Information Security Measures" and "Deployment Mitigations" to be more basic.

They only mention risks from external deployment, unsurprisingly.

  1. ^

    Update: originally they said 44%; more recently they say 49%.

Replies from: ryan_greenblatt
comment by ryan_greenblatt · 2024-08-12T01:17:18.149Z · LW(p) · GW(p)

I think it seems pretty reasonable for a company in the reference class of Magic to do something like: "When we hit X capability level (as measured by a specific known benchmark), we'll actually write out a scaling policy. Right now, here is some vague idea of what this would look like." This post seems like a reasonable implementation of that AFAICT.

I remain baffled by how people can set thresholds this high with a straight face:

I don't think these are thresholds. The text says:

We describe these threat models along with high-level, illustrative capability levels that would require strong mitigations.

And the table calls the corresponding capability level "Critical Capability Threshold". (Which seems to imply that there should be multiple thresholds with earlier mitigations required?)

Overall, this seems fine to me? They are just trying to outline the threat model here.

It would be hard for their "Information Security Measures" and "Deployment Mitigations" to be more basic.

These sections just have high level examples and discussion. I think this seems fine given the overall situation with Magic (not training frontier AIs), though I agree that it would be good if people at the company had more detailed safety plans.

comment by tylerjohnston · 2024-12-27T23:23:55.441Z · LW(p) · GW(p)

What should I read if I want to really understand (in an ITT-passing way) how the CCP makes and justifies its decisions around censorship and civil liberties?

Replies from: niplav
comment by niplav · 2024-12-28T18:35:45.575Z · LW(p) · GW(p)

I have not read it myself, but I've heard that America Against America by Wang Huning is quite informative about the weaknesses that influential Chinese political theorists believe they have identified in the US system. That might be informative about the measures they're taking to prevent those weaknesses from taking hold.

(Unfortunately, it looks like only a few of Wang Huning's books have been translated…)

comment by tylerjohnston · 2025-02-03T18:07:12.169Z · LW(p) · GW(p)

I recently created a simple workflow to allow people to write to the Attorneys General of California and Delaware to share thoughts + encourage scrutiny of the upcoming OpenAI nonprofit conversion attempt. 

I think this might be a high-leverage opportunity for outreach. Both AG offices have already begun investigations, and Attorneys General are elected officials who are primarily tasked with protecting the public interest, so they should care what the public thinks and prioritizes. Unlike e.g. congresspeople, I don't think AGs often receive grassroots outreach (I found ~0 examples of this in the past), and an influx of polite and thoughtful letters may have some influence — especially from CA and DE residents, although I think anyone impacted by their decision should feel comfortable contacting them.

Personally I don't expect the conversion to be blocked, but I do think the value and nature of the eventual deal might be significantly influenced by the degree of scrutiny on the transaction.

Please consider writing a short letter — even a few sentences is fine. Our partner handles the actual delivery, so all you need to do is submit the form. If you want to write one on your own and can't find contact info, feel free to dm me.