tylerjohnston's Shortform
post by tylerjohnston · 2024-07-04T16:22:10.760Z · LW · GW · 12 comments
Comments sorted by top scores.
comment by tylerjohnston · 2024-08-18T21:09:05.769Z · LW(p) · GW(p)
A (somewhat minor) example of hypocrisy from OpenAI that I find frustrating.
For context: I run an automated system that checks for quiet/unannounced updates to AI companies' public web content including safety policies, model documentation, acceptable use policies, etc. I also share some findings from this on Twitter.
Part of why I think this is useful is that OpenAI in particular has repeatedly made web changes of this nature without announcing or acknowledging it (e.g. 1, 2, 3, 4 [LW(p) · GW(p)], 5, 6). I'm worried that they may continue to make substantive changes to other documents, e.g. their preparedness framework, while hoping it won't attract attention (even just a few words, like if they one day change a "we will..." to a "we will attempt to...").
This process requires very minimal bandwidth/requests to the web server (it checks anywhere from once a day to once a month per monitored page).
But letting this system run on OpenAI's website is complicated as (1) they are incredibly proactive at captcha-walling suspected crawlers (better than any other website I've encountered, and I've run this on thousands of sites in the past) and (2) their terms of use technically forbid any automated data collection from their website (although it's unclear whether this is legal/enforceable in the US).
The irony should be immediately obvious: not only is their whole data collection pipeline reliant on web scraping, but they've previously gotten in hot water for ignoring other websites' robots.txt and not complying with the GDPR rules on web scraping. Plus, I'm virtually certain they don't honor clauses in other websites' terms of use that forbid automated access. So what makes them so antsy about automated access to their own site?
I wish OpenAI would change one of these behaviors: either stop making quiet, unannounced, and substantive changes to their publicly released content, or else stop trying so hard to keep automated website monitors from accessing their site to watch for these changes.
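To illustrate the kind of check described above, here is a minimal sketch of spotting a small wording change between two saved snapshots of a page. It is not the actual monitoring system. The function names and the two example sentences are hypothetical, and fetching the page is left out entirely.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Cheap equality check: hash the snapshot so unchanged pages can be skipped."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def word_changes(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, removed_words, added_words) for every non-equal span."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


# Hypothetical snapshots, invented for illustration only.
old_snapshot = "We will evaluate every frontier model before deployment."
new_snapshot = "We will attempt to evaluate every frontier model before deployment."

if fingerprint(old_snapshot) != fingerprint(new_snapshot):
    for tag, removed, added in word_changes(old_snapshot, new_snapshot):
        print(tag, repr(removed), "->", repr(added))
```

Comparing hashes first means the word-level diff only runs when something actually changed, which fits a check that runs once a day to once a month per page.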
Replies from: nikita-sokolsky↑ comment by Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T17:49:31.084Z · LW(p) · GW(p)
They do have a good reason to be wary of scrapers, as they provide a free version of ChatGPT. I'm guessing they just went ahead and configured the protection over their entire domain name rather than restricting it to the chat subdomain.
Replies from: tylerjohnston↑ comment by tylerjohnston · 2024-08-19T19:34:40.748Z · LW(p) · GW(p)
ChatGPT is only accessible for free via chatgpt.com, right? Seems like it shouldn't be too hard to restrict it to that.
Replies from: nikita-sokolsky↑ comment by Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T20:00:33.679Z · LW(p) · GW(p)
They could, but if you’re managing your firewall it’s easier to apply a blanket rule than to divide things by subdomain, unless you have a good reason to do otherwise. I wouldn’t assume malicious intent.
Replies from: tylerjohnston↑ comment by tylerjohnston · 2024-08-19T20:10:34.575Z · LW(p) · GW(p)
Sorry, I might be missing something: subdomains are subdomain.domain.com, whereas chatgpt.com is a separate registered domain (not a subdomain of openai.com), right? In either case, I'm sure there are benefits to doing things consistently: both may be on the same server, subject to the same attacks, beholden to the same internal infosec policies, etc.
So I do believe they have their own private reasons for it. Didn't mean to imply that they've maliciously done this to prevent some random internet guy's change tracking or anything. But I do wish they would walk it back on the openai.com pages, or at least in their terms of use. It's hypocritical, in my opinion, that they are so cautious about automated access to their own site while relying on such access so completely from other sites. Feels similar to when they tried to press copyright claims against the ChatGPT subreddit. Sure, it's in their interest for potentially nontrivial reasons, but it also highlights how weird and self-serving the current paradigm (and their justifications for it) is.
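The subdomain distinction discussed above can be made concrete with a small sketch. It assumes a hypothetical rule that covers one domain and everything beneath it. Nothing here reflects OpenAI's actual firewall configuration, and `is_covered` is an invented helper.

```python
def is_covered(hostname: str, protected_domain: str) -> bool:
    """True if hostname is the protected domain itself or one of its subdomains."""
    host = hostname.lower().rstrip(".")
    domain = protected_domain.lower().rstrip(".")
    # Require a dot boundary so "notopenai.com" does not match "openai.com".
    return host == domain or host.endswith("." + domain)


for host in ["openai.com", "chat.openai.com", "chatgpt.com", "notopenai.com"]:
    print(host, is_covered(host, "openai.com"))
```

Under a rule like this, chatgpt.com is simply a different domain, so a blanket rule on openai.com and a rule on chatgpt.com would be two separate decisions.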
Replies from: nikita-sokolsky↑ comment by Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T22:02:07.837Z · LW(p) · GW(p)
Hm, are you sure they're actually that protective against scrapers? I ran a quick script and was able to extract all 548 unique pages just fine: https://pastebin.com/B824Hk8J The final output was:
Status codes encountered:
200: 548
404: 20
I reran it two more times and it still worked. I'm using a regular residential IP address, no fancy proxies. Maybe you're just missing the code to refresh the cookies (included in my script)? I'm probably missing something, of course; I'm just curious why scraping seems easy enough from my machine.
Replies from: tylerjohnston↑ comment by tylerjohnston · 2024-08-20T02:31:40.717Z · LW(p) · GW(p)
Ooh this is useful for me. The pastebin link appears broken - any chance you can verify it?
I definitely get 403s and captchas pretty reliably for OpenAI and OpenAI alone (and notably not Google, Meta, Anthropic, etc.) with an instance based on https://github.com/dgtlmoon/changedetection.io. Will have to look into cookie refreshing. I have had some success with randomizing IPs, but maybe I don't have the cookies sorted.
↑ comment by Nikita Sokolsky (nikita-sokolsky) · 2024-08-20T04:22:12.242Z · LW(p) · GW(p)
Here’s the corrected link: https://pastebin.com/B824Hk8J
Are you running this from an EC2 instance or some other cloud provider? They might just have a blocklist of IPs belonging to data centers.
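As a rough sketch of what such a blocklist check could look like (purely an assumption about how a site might do it, not anything known about OpenAI's setup): the ranges below are reserved documentation networks standing in for real data-center ranges, and `is_datacenter_ip` is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks), NOT real data-center ranges.
DATACENTER_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_datacenter_ip(addr: str) -> bool:
    """True if addr falls inside any of the listed networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in network for network in DATACENTER_RANGES)


print(is_datacenter_ip("198.51.100.7"))  # inside the first placeholder range
print(is_datacenter_ip("192.0.2.10"))    # outside both placeholder ranges
```

A check like this would explain a script working from a home connection while failing from a cloud host, though it would not by itself explain blocks on residential proxies.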
Replies from: tylerjohnston↑ comment by tylerjohnston · 2024-08-20T05:31:40.037Z · LW(p) · GW(p)
I've used both data center and rotating residential proxies :/ But I am running it on the cloud. Your results are promising, so I'm going to see how an OpenAI-specific instance works for me when run locally, or else try a new proxy provider.
Thanks again for looking into this.
comment by tylerjohnston · 2024-07-04T16:22:10.928Z · LW(p) · GW(p)
Magic.dev has released an initial evaluation + scaling policy.
It's a bit sparse on details, but it's also essentially a pre-commitment to implement a full RSP once they reach a critical threshold (50% on LiveCodeBench or, alternatively, a "set of private benchmarks" that they use internally).
I think this is a good step forward, and more small labs making high-risk systems like coding agents should have risk evaluation policies in place.
Also wanted to signal boost that my org, The Midas Project, is running a public awareness campaign against Cognition (another startup making coding agents) asking for a policy along these lines. Please sign the petition if you think this is useful!
Replies from: Zach Stein-Perlman↑ comment by Zach Stein-Perlman · 2024-07-04T17:12:21.070Z · LW(p) · GW(p)
Seems weak/meaningless.
> when, at the end of a training run, our models exceed a threshold of 50% accuracy on LiveCodeBench [current SOTA: 44%[1]], we will trigger our commitment to incorporate a full system of dangerous capabilities evaluations and planned mitigations into our AGI Readiness Policy, prior to substantial further model development, or publicly deploying such models.
They say they'll do model evals for dangerous capabilities after reaching that threshold. (No details on evals.)
I remain baffled by how people can set thresholds this high with a straight face:
Threat Model | Critical Capability Threshold |
---|---|
Cyberoffense | The model is effective at real-world cyberoffense operations. With the model’s help, the cost of discovery of new zero-days or breaking critical infrastructure for a malicious expert actor is reduced by at least 10x; accessing classified and not previously accessible information from a major government; a talented Computer Science undergrad level malicious actor spending 3 months and $1m in compute has a substantial chance of breaking critical infrastructure. |
AI R&D | The model facilitates a dramatic acceleration in the rate of effective compute scaling or can effectively replace high-level machine learning researchers. Such capabilities may enable it to contribute to the unsafe proliferation or enhancement of other dangerous AI capabilities. |
Autonomous Replication and Adaptation | If given access to its own weights and prompted to do so, the AI model could autonomously maintain its own operation executing cybercrime and using the proceeds to create arbitrarily many replicas of itself. |
Biological Weapons Assistance | The model provides expert-level guidance which increases the accessibility or severity of bioweapon development. With the model’s help, a non-expert malicious actor can synthesize a viable pandemic agent, or an expert can synthesize a novel biological threat. |
They don't even say anything about evaluating for warning signs of critical capabilities or leaving a safety buffer — just that these capabilities would probably require strong mitigations.
There's nothing on publishing their evals or other sources of accountability.
It would be hard for their "Information Security Measures" and "Deployment Mitigations" to be more basic.
They only mention risks from external deployment, unsurprisingly.
[1] Update: originally they said 44%; more recently they say 49%.
↑ comment by ryan_greenblatt · 2024-08-12T01:17:18.149Z · LW(p) · GW(p)
I think it seems pretty reasonable for a company in the reference class of Magic to do something like: "When we hit X capability level (as measured by a specific known benchmark), we'll actually write out a scaling policy. Right now, here is some vague idea of what this would look like." This post seems like a reasonable implementation of that AFAICT.
> I remain baffled by how people can set thresholds this high with a straight face:

I don't think these are thresholds. The text says:

> We describe these threat models along with high-level, illustrative capability levels that would require strong mitigations.
And the table calls the corresponding capability level "Critical Capability Threshold". (Which seems to imply that there should be multiple thresholds with earlier mitigations required?)
Overall, this seems fine to me? They are just trying to outline the threat model here.
> It would be hard for their "Information Security Measures" and "Deployment Mitigations" to be more basic.
These sections just have high level examples and discussion. I think this seems fine given the overall situation with Magic (not training frontier AIs), though I agree that it would be good if people at the company had more detailed safety plans.