Who owns OpenAI's new language model?

post by ioannes_shade · 2019-02-14T17:51:26.367Z · LW · GW · 9 comments

This is a question post.

OpenAI published a post today about their very general language-writing & comprehension model.

Interestingly, they're not releasing the model itself:

Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights.

Does OpenAI hold a copyright for this model (and its other intellectual property)?

If the US government decided they wanted to requisition this model, would an OpenAI copyright defend against that? Would OpenAI have any real recourse against a government requisition?



Comments sorted by top scores.

comment by jbash · 2019-02-15T01:43:06.555Z · LW(p) · GW(p)

The "requisition" question isn't well formed. The US Government has various powers to demand various specific information from various specific people via various specific processes in various specific circumstances for various specific purposes, mostly but not all to do with law enforcement. I guess one or more of those could somehow apply, although the only one I can think of is a general Congressional fact-finding power.

The US Government has no general power to "requisition" anything from anybody. That's just not a thing at all. "Requisition" doesn't mean anything here.

However, if the US Government asked for it, I suspect OpenAI would be happy to hand it over voluntarily. They'd probably also give it to anybody else they thought of as "reputable". What would make you think that they'd want to resist such a request to begin with?

comment by Wei_Dai · 2019-02-15T03:10:06.592Z · LW(p) · GW(p)

The US Government has no general power to "requisition" anything from anybody. That's just not a thing at all. "Requisition" doesn't mean anything here.

Requisition ("an official order laying claim to the use of property or materials") is similar to eminent domain ("the right of a government or its agent to expropriate private property for public use, with payment of compensation") which definitely is a thing here. According to this Quora answer:

Yes, in the U.S. eminent domain can extend to intellectual property.

comment by ioannes_shade · 2019-02-15T06:57:29.071Z · LW(p) · GW(p)

Right, this is the area I'm curious about.

I imagine that if a private firm created a process for making nuclear suitcase bombs with a unit cost < $100k, the US government would show up and be like "nice nuclear-suitcase-bomb process – that's ours now. National security reasons, etc."

(Something like this is what I meant by "requisition.")

I wonder what moves the private firm could make, in that case. Could they be like "No, sorry, we invented this process, it's protected by our copyright, and we will sue if you try to take it"?

Would they have any chance of preserving their control of the technology through the courts?

Would they be able to just pack up shop & move operations to Ireland, or wherever?

I also wonder how distant the current OpenAI case is from the hypothetical nuclear-suitcase-bomb case, in terms of property rights & moves available to the private firm.

comment by jbash · 2019-02-15T13:05:09.424Z · LW(p) · GW(p)

So, there are several things that might be "property" here.

The method is probably patentable. The trained network is definitely NOT copyrightable by the clear intent of the copyright law, because it's obvious to any honest interpreter that it's nothing like a "creative work". However, based on their track record, if you took it to the Federal Circuit, they'd probably be willing to pervert the meaning of "creative work" to let somebody enforce a copyright on it based on curation of the training data or something equally specious. They may already have done that in some analogous case.

Property rights in patents or copyrights are separate from property rights in actual devices, copies of networks, or whatever. I can own a book without owning the copyright in the book. And if you own the copyright, that does NOT allow you to demand that I give you my copy of the book, even if you don't have a copy yourself.

The nuclear bomb case would involve a "patent secrecy order"... a power which was in fact created exactly for nuclear bombs. I don't think there's such a thing as a "copyright secrecy order".

They could also probably forcibly buy any patent (yes, under eminent domain). Eminent domain is NOT a "requisition", because eminent domain in the US requires compensation as a constitutional matter. I also don't know if they have any processes in place for exercising eminent domain in the case under discussion, and I doubt they do. Some particular agency has to be authorized and funded to exercise a power like that in any given case.

Even if the government forcibly bought a patent or copyright, that by itself would not entitle the government to be given a copy of the subject matter. I don't know if bits, as opposed to the media they were on, would even be "property".

... but if you REALLY want to go there, well, obviously the US Government, taken as a whole, could pass a law giving itself the power to force OpenAI to hand over copies, delete its own copies, relinquish any patent or copyright rights (possibly with a requirement for money compensation for those last two), stay out of Ireland, and whatever else.

What I'm really puzzled by is the extremely counterfactuality of the question. It just doesn't seem to have any connection at all with how people or institutions actually behave. A neural network that can sound like somebody isn't a nuclear bomb, and the political dynamics around it are completely different.

The upper echelons of the US Government won't notice it at all.

If some researcher working for the US Government (or any government) wants a copy of the network for some reason, that person will just send a polite email request to OpenAI, and OpenAI will probably hand it over without worrying about it. If OpenAI doesn't, the question will probably die there. From a practical point of view, that researcher won't be able to make it enough of a priority for the government to even stir itself to figure out which powers might apply.

If some agency of the government suggests to OpenAI that it never release the network to anybody, and gives any kind of meaningful reason, then OpenAI will probably take that into account and comply. That's extremely unlikely, though.

Some government agency trying to actually force OpenAI not to release is farfetched enough not to be worth worrying about, but it would probably come down to timing; OpenAI might be able to release before the government could create any binding order preventing it.

comment by ioannes_shade · 2019-02-15T15:52:19.281Z · LW(p) · GW(p)

Thanks, this is helpful.

What I'm really puzzled by is the extremely counterfactuality of the question.

It doesn't feel too extreme to me (powerful new dual-use technology), but probably our priors are just different here :-)

comment by ioannes_shade · 2019-02-15T17:08:41.830Z · LW(p) · GW(p)

Haven't read it yet, but here's an academic review of "Federal Patent Takings," which seems relevant.

comment by avturchin · 2019-02-14T19:58:07.216Z · LW(p) · GW(p)

The model is trained on web pages of varying (I think) copyright status. Most web-page owners didn't provide a license allowing their content to be used to train a neural net. So the model is a massive copyright violation and could be seized by the government. However, the model is nothing without the scientists who support it, so it can't be used by an outsider.

comment by jimrandomh · 2019-02-14T20:04:37.975Z · LW(p) · GW(p)

That's not how copyright works. US copyright law is civil, not criminal, which means that the US government can't act on infringements of its own initiative. A web site owner could theoretically sue OpenAI claiming that OpenAI infringed their copyright, but they'd probably lose, for a number of reasons.

comment by jbash · 2019-02-15T13:08:20.578Z · LW(p) · GW(p)

The US has criminal copyright law. I thought it was recent, but Wikipedia says it's actually been around since 1897.

The probability of the government trying to USE it in this kind of case is epsilon over ten, though. And as you say, they'd probably lose if they did, because the neural network isn't really derivative of the Web pages, and even if it is, it's probably fair use.