Is there some reason LLMs haven't seen broader use?

post by tailcalled · 2022-11-16T20:04:48.473Z · LW · GW · 27 comments

This is a question post.

When GPT-3 first came out, I expected that people would use it as a sort of "common-sense reasoning module". That is, if you want to process or generate information in some way, then you can give GPT-3 a relevant prompt, and repeatedly apply it to a bunch of different inputs to generate corresponding outputs. After GPT-3 came out, I had expected that people would end up constructing a whole bunch of such modules and wire them together to create big advanced reasoning machines. However, this doesn't seem to have panned out; you don't see much discussion about building LLM-based apps.
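To make the idea concrete, here is a minimal sketch of the kind of wiring I had in mind: each "module" is just a fixed prompt applied to many inputs, and modules compose. Everything here is illustrative; `llm` is a stand-in for any completion API (a canned stub below, so the sketch is self-contained rather than calling a real model):

```python
def llm(prompt: str) -> str:
    """Stand-in for a real completion API such as GPT-3; returns canned text."""
    if prompt.startswith("Summarize:"):
        return "A short, upbeat summary."
    if prompt.startswith("Sentiment"):
        return "positive"
    return ""

def make_module(template: str):
    """A 'common-sense reasoning module': one fixed prompt applied to many inputs."""
    def module(text: str) -> str:
        return llm(template.format(text=text)).strip()
    return module

summarize = make_module("Summarize: {text}")
sentiment = make_module("Sentiment (positive/negative) of: {text}")

def pipeline(text: str) -> str:
    # Wire modules together: summarize first, then classify the summary.
    return sentiment(summarize(text))
```

With a real model behind `llm`, "big advanced reasoning machines" would just be deeper versions of `pipeline`.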

Why not? I assume that there must be something that goes wrong along the way, but what exactly goes wrong? Seems like it has the potential to teach us a lot about LLMs.


answer by the gears to ascension (the gears to ascension) · 2022-11-17T09:33:57.174Z · LW(p) · GW(p)

They are. GPT-3 doesn't have a lot of common sense, but language models that large have lots of general intelligence due to their size, and are an incredible basis for doing stuff if trained on the task at hand. eg (mostly non-vetted): (which I used to find everything in this list besides the first three items; note that you can drop any of these links into metaphor and walk the semantic relatedness web near these sites! I probably didn't even have to paste this big list, but y'all consider link clicks a trivial inconvenience, turn up your clickthrough-and-close rate to match your usage of mindless recommenders and retake your agency over what websites you use! or something! uh anyway) (which you know about, of course) (also gpt3)

general research ai tools:

  • the paper recommender is wonderful, add your favorite safety papers to your feeds! my strongest recommendation on this list besides metaphor.
  • looks very cool but kinda expensive; probably not even available to individual researchers outside institutions
  • looks like a general related-work finding tool like semanticscholar, may have some tastefully chosen small ML models like "does this citation support or contrast?". [costs $16/mo individual. personally, that means SKIP]
  • looks cool, found via semantic search seeded with semantic scholar
  • looks cool, claims to have a cite-as-you-write tool, I wonder how it compares to galactica
  • looks cool, claims to compete with semanticscholar, just try semanticscholar though lol
  • looks cool but not free
  • bio papers only and also not free I think
  • not sure if this is for science or internal tools for companies or what
  • looks expensive
  • tries to summarize, doesn't do as well as the classic and locally crafted.. ...! which is incredible and is the only one of these things I actually use already besides semanticscholar
  • general site recommender based on contextual relevance, seems similar to metaphor in that respect [edit: ohmytofu appears nonfunctional]
  • suggests plausible research trees to learn about a subject [edit: tried! sorta works. feels like just another elicit prompt]
  • paper summarizer
  • another paper summarizer, this one is actually gpt3, like you asked for
  • not sure if this uses real ai or not but it looks maybe cool
  • who knows if this one is any good
  • another research summarizer
  • yet another paper recommender

legal:

  • (custom trained?)
  • (might be gpt3?)
  • (looks meh)
  • (tba?)
  • (idk)
  • (this one looks mildly cooler)
  • (ai powered contract linter)
  • (legal semantic search)
  • (contract review)
  • (contract review and editing)
  • (contract review and editing)
  • (some blend of the above)
  • (contract linter)
  • (yet another linter addon for word)
  • patent recommender

not ai, but came up and looks cool:

chem & bio (cell culture go foom!) - I'm even less qualified to evaluate most of these than the legal text stuff:

  • out loud <- wtf!
  • (actual ai bio lab)
  • (lab, no public usability)
valence discovery goes here, as do deepmind and stanford medai

sus: <- warning, this one looks like a manipulation tool

this one is specifically military; I'm sure that it, and many others like it, will detect this comment and categorize it somewhere:

honorable mentions:

  • <- amazing looking paper relationships tool, but not actually based on deep learning
  • actually a game, not an ai
  • just an ai research competition site
  • GIS + paper discovery? no ai tho
  • this is a list of tools made by people who like lists almost as much as I do. similar list to the stuff you find browsing my profile here
  • big list of tools, mildly curated but significant shopping still remains.

and of course, my purpose in sharing these link floods is to give people seeds to find stuff on the webbernet. you asked for how ai has been productive; the answer is, it's a bit of a mess, but here's a big list of starting points to comment on. if anyone browses through these, please share which ones were worth more than a few seconds to your intuition - I spent an hour and a half on this list and barely skimmed any of them!

comment by Scott Stevenson (scott-stevenson) · 2022-11-17T23:12:32.247Z · LW(p) · GW(p)

Scott here from (mentioned above)!

We are finding LLMs to be incredibly powerful tools for legal drafting & review, mind-blowingly good. It is a whole new way of thinking as a programmer though: results are non-deterministic! Chaining together non-deterministic queries is much more of an art than a science. I think it will take the software engineering profession a long time to get comfortable with that. It really requires tinkering at scale, but not necessarily formal methods.
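One hedged sketch of what "chaining non-deterministic queries" can mean in practice: wrap each query in validation and retry, since any individual completion may come back malformed. This is illustrative, not any real product's implementation; `ask` stands in for whatever completion API you use:

```python
def query_until_valid(ask, prompt, validate, max_tries=3):
    """Re-run a non-deterministic query until its output passes a validator.

    `ask` is any function mapping a prompt string to a completion string;
    `validate` decides whether a given completion is usable downstream.
    """
    last = None
    for _ in range(max_tries):
        last = ask(prompt)
        if validate(last):
            return last
    raise ValueError(f"no valid completion in {max_tries} tries; last was {last!r}")
```

Each stage of a chain gets its own validator, so a bad completion fails loudly at the stage that produced it instead of silently corrupting everything downstream.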

I also think there is a perception that GPT-3 is "too easy" and you have to "learn to do things from first principles first". I really disagree with that, and I wrote about that fallacy here.

One last point: GPT-3 has improved dramatically over the past 2 years. It's not the same product it was when it launched. I don't think many people have caught on to the level of improvement yet.

Replies from: lahwran, quanticle
comment by the gears to ascension (lahwran) · 2022-11-18T02:50:06.225Z · LW(p) · GW(p)

I thought I might catch the eyes of some of the folks I was mentioning, heh. I'm curious which notification system you use to find mentions of your work!

also, welcome to the safety nerds site, enjoy your stay, don't destroy the world with stark differences in effective agency, and let's end suffering using advanced technology! :)

comment by quanticle · 2022-11-18T02:03:37.636Z · LW(p) · GW(p)

Your link goes to a private page, I'm afraid.

comment by tailcalled · 2022-11-17T17:57:19.337Z · LW(p) · GW(p)

Interesting list, I had no idea there was so much.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-11-17T19:51:57.121Z · LW(p) · GW(p)

I'd love to hear which tools from the research section you end up using! My favorites are metaphor and semantic scholar at the moment. copilot is also great for doing less typing, although it introduced a mistake that I missed in some important code, and I am a bit sketched out about it now.

comment by quanticle · 2022-11-18T02:06:33.250Z · LW(p) · GW(p)

Another example: Notion, the popular wiki/information management tool, just announced an AI-powered writing assistant. Now, they haven't announced specifically that it's using an LLM, but if you look at the demo, it's hard to imagine what else it could be.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-11-18T10:48:04.685Z · LW(p) · GW(p)

to be honest I'm slightly confused about your phrasing; it looks like they demonstrate the output of a language model on the page, and so the only question left is whether it's transformers or some swanky high speed RWKV thing or other

Replies from: quanticle
comment by quanticle · 2022-11-18T15:42:36.946Z · LW(p) · GW(p)

I wasn't aware of RWKV until you mentioned it. Fair enough. It's possible that they're using that instead of an LLM.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-11-18T21:02:01.606Z · LW(p) · GW(p)

no I mean, that would still be an LLM, just not a transformer-based one. to not be an LLM you would have to train it on significantly less text data, I think. maybe I would also count training on sufficiently much other-modality data. from its behavior we can know it could only possibly be an LLM; there could be no other AI algorithm that outputs text like that without satisfying the constraint of "is LLM".

Replies from: quanticle
comment by quanticle · 2022-11-18T21:10:23.901Z · LW(p) · GW(p)

Oh, I guess I misunderstood what you were saying. Yes, I agree that nothing else produces output like that. I was just pointing out that Notion haven't come out and explicitly stated what, specifically, they're using to do this.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-11-18T21:14:51.005Z · LW(p) · GW(p)

yeah could be any LLM. It does feel like an ungrounded generative model like most LLMs right now, but maybe it's some swanky new physical model based thing, you never know.

comment by Lao Mein (derpherpize) · 2022-11-18T03:26:05.117Z · LW(p) · GW(p)

There's so many that I'm having trouble choosing just one. Can anyone recommend one for bioinformatics research? I would like something to help with hypothesis discovery, but am hoping to discover something that I currently don't know about.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-11-18T09:38:54.483Z · LW(p) · GW(p)

semanticscholar has been amazing, and I feel like I am often recommending new papers to people who haven't encountered them yet thanks to its feeds; the way you use them is by adding a paper to your library, which requires an account, but it only takes a few papers before you start getting ai recommendations. if you try just one, it's my recommendation. I've tried a few paper navigation tools, and my favorite so far is actually manually walking the citation graph on semanticscholar, followed by browsing its new-papers feeds.

I also have been absolutely blown away by metaphor. I'd definitely recommend trying metaphor for your paper search. it can't do everything but it provides an incredible component and is probably the most general tool I've recommended here.

if you find semanticscholar and metaphor disappointing, that's when I'd suggest you start trying a bunch of these tools in quick succession; set a goal of a kind of discovery you've had before that you'd like to have again, and see if the tool can replicate it. There are a lot of really cool papers, and that's how I find the coolest crazy-advanced-bio-whatever stuff so far; metaphor might end up replacing semanticscholar, but ultimately neither is as strong as iris or causaly, afaict.

that said - I suspect that the most advanced bio tool on this list is advanced enough to make a night-and-day difference in your research throughput, and that opening all the bio links and setting a ten minute timer to close all but three would really give you some solid candidates. if you describe what you're looking for further, I can try filtering further.

also, for baseline, I tossed your comment into metaphor with some prompt engineering; here are the results: (manually...)

foss or freeware:


$$$ (no price given)+:

research lab focused on the topic of bio hypothesis discovery:

not available yet but whoa cool:

wat, collective behavior aggregation thing but I'm not sure if it's good or not, or, what:

misc foss tools that were not what you seek, unrelated but cool:

Replies from: derpherpize
comment by Lao Mein (derpherpize) · 2022-11-27T15:24:13.851Z · LW(p) · GW(p)

Do you know of any AI tools where I can input a table of labeled genetic data and get out an interesting hypothesis? If nothing like that exists, I should probably make one myself.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2022-11-28T00:02:04.245Z · LW(p) · GW(p)

I don't know of one. Here's what I found looking on semanticscholar and metaphor for an hour or two of diffuse-focus multitasking:

near misses:

search trace:

Replies from: derpherpize
comment by Lao Mein (derpherpize) · 2022-12-08T14:21:23.605Z · LW(p) · GW(p)

Thanks! I know this is super late, but this has really improved my work productivity. I really appreciate you taking the time to help.

For what it's worth, Causaly is a disappointment. No strong LLM integration means it really struggles to compete with some of the other products out there.

answer by trevor (Trevor1) · 2022-11-16T21:51:21.733Z · LW(p) · GW(p)

I don't know about big reasoning machines, but I've heard a lot of rumors about LLMs being integrated into an extremely wide variety of extant ML systems that were already commercially viable on their own. It seems pretty intuitive to me that LLMs can provide some very good layers to support other systems. What have people heard about that?

answer by Lost Futures · 2022-11-16T20:34:38.829Z · LW(p) · GW(p)

GPT-3 was announced less than two and a half years ago. I don't think it's reasonable to assume that the market has fully absorbed its capabilities yet.

comment by tailcalled · 2022-11-16T20:43:25.831Z · LW(p) · GW(p)

I would just have expected at least an explosion in basic demo projects that use GPT-3 for reasoning. A skilled programmer can usually code up something simple over a weekend or two, even if it is too unstable and incomplete to be economically viable. But instead there seems to just be... almost nothing.

Replies from: gerald-monroe
comment by Gerald Monroe (gerald-monroe) · 2022-11-17T09:42:28.131Z · LW(p) · GW(p)

There is already github copilot, and clones.

There is an explosion of other llms.

What do you expect? The system was never intended to be usable commercially, and it has several problems. Many of its answers are wrong, often enough that you can't use it to automate most jobs. And it can unpredictably emit language embarrassing to the company running it, from profanity to racist and bigoted speech, and there is no known way to guarantee it will never do that.


Comments sorted by top scores.

comment by TekhneMakre · 2022-11-16T20:17:49.855Z · LW(p) · GW(p)

You could maybe use an LM for Babble. But how would you use an LM for Prune?

Replies from: tailcalled
comment by tailcalled · 2022-11-16T20:41:26.549Z · LW(p) · GW(p)

Indeed it works great for babble. OpenAI suggests the prompt of "Write a tagline for an ice cream shop" as an example. Using it I was able to generate the following three examples:

  • The best ice cream in town!
  • A scoop of happiness.
  • Delicious ice cream for all!

I would have assumed that you could do pruning just by prompting it to generate an evaluation and a rating. Evaluating the above taglines using the prompt "Explain the advantages and disadvantages of the following tagline for an ice cream shop, and then rate the tagline on a scale from 1 to 10.", I get:

The best ice cream in town!:

  • The advantages of the tagline are that it is short, sweet, and to the point. It tells customers exactly what they can expect from the ice cream shop. The disadvantages are that it is somewhat generic and does not really stand out from other ice cream shops. Overall, I would rate the tagline a 7 out of 10.
  • The advantage of the tagline is that it is succinct and to the point. The disadvantage is that it is vague and does not give any specific information about the ice cream shop. I would rate the tagline a 6.
  • The tagline is straightforward and to the point. It tells customers what they can expect from the ice cream shop. However, it is a bit generic and could apply to any ice cream shop. Rating: 7

A scoop of happiness.:

  • The advantage of the tagline is that it is short, simple, and to the point. It tells customers exactly what they can expect from the ice cream shop – happiness. The disadvantage of the tagline is that it is a bit cliché and might not stand out from the competition. Overall, I would rate the tagline a 7.
  • The advantage of the tagline is that it is short and catchy. The disadvantage is that it is somewhat corny. I would rate the tagline a 6.
  • The tagline is short, catchy, and easy to remember. It conveys a positive message about the product. However, it is somewhat generic and could be used for any type of business. 7/10

Delicious ice cream for all!:

  • The tagline is simple and straightforward, and it tells customers what they can expect from the shop. However, it is somewhat generic and does not really stand out from the crowd. I would rate it a 6.
  • The tagline is simple and straightforward, which is good. However, it is also somewhat generic and does not really convey anything unique about the shop. It is not a bad tagline, but it is not particularly great either. I would give it a 6 out of 10.
  • The tagline is straightforward and to the point. It tells customers that the shop sells delicious ice cream and that everyone is welcome. However, it is somewhat generic and doesn't really stand out from other ice cream shops. It also doesn't say anything about the quality of the ingredients or the variety of flavors available. Overall, I would rate this tagline a 6 out of 10.

😅 I suppose this doesn't tell us much. The evaluations are all very repetitive, but they seem to correctly apply to the generated ideas; it's just that the original ideas are not very distinct.

To try and generate more distinct ideas, I asked GPT-3 to come up with some themes for ice cream shops. It gave the following themes:

  • The ice cream shop would have a tropical theme! You could have all sorts of fruity and exotic flavors, and the decor could be bright and colorful.
  • The ice cream shop could be called "Frosty's" and would have a winter theme. The walls would be painted blue and white, and the floors would be covered in fake snow. The ceiling would be covered in twinkling lights, and the ice cream would be served in bowls made to look like snowballs.
  • The shop would be decorated like an old-fashioned ice cream parlor, with checkered tablecloths, vintage ice cream posters, and a staff dressed in traditional ice cream server uniforms. The menu would feature classic ice cream flavors like vanilla, chocolate, and strawberry, as well as more unique flavors like lavender, Earl Grey, and fig.

This yielded:

  • Tropical: The advantages of a tropical-themed ice cream shop would be the wide variety of flavors to choose from, as well as the bright and colorful decor. The disadvantages could include the potential for the shop to be too noisy or crowded, and the lack of variety in terms of toppings and mix-ins. Overall, I would rate this theme a 7.
  • The advantages of the theme are that it would be unique and eye-catching. It would also be appropriate for the winter season. The disadvantages are that it could be difficult to keep the floors clean, and the ceiling lights could be a fire hazard. I would rate the theme an 8 out of 10.
  • The advantages of this theme are that it would be unique and would likely attract customers who are looking for a more traditional ice cream experience. The disadvantages are that it would be more expensive to set up and maintain, and the menu would be limited to classic and unique flavors, which may not be what some customers are looking for. Overall, I would rate this theme a 7.

This critique doesn't seem great. For instance, ice cream is disproportionately eaten in the summer, yet it praises "Frosty's" for being "appropriate for the winter season". Possibly my prompt is bad though.

But at a first look, it seems like you might be right? Idk.
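The experiment above can be sketched end to end: generate candidates with one prompt, critique each with another, parse a 1-10 score out of the free-form critique, and keep the top scorers. Everything here is illustrative (the function names, the regex, and the stub model are mine, not from the experiment); a real run would put GPT-3 behind `llm`:

```python
import re

def parse_rating(critique: str):
    """Extract a 1-10 score from text like 'I would rate it a 7 out of 10'; None if absent."""
    m = re.search(r"\b(10|[1-9])\s*(?:/\s*10|out of 10)\b", critique)
    return int(m.group(1)) if m else None

def babble_and_prune(llm, idea_prompt, critique_prompt, n=3, keep=2):
    """Babble: generate n candidates. Prune: score each via a critique prompt, keep the best."""
    candidates = [llm(idea_prompt) for _ in range(n)]
    scored = []
    for c in candidates:
        rating = parse_rating(llm(critique_prompt.format(candidate=c)))
        scored.append((rating if rating is not None else 0, c))
    # Sort on the numeric score only, so tied ratings never compare candidate strings.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:keep]]
```

One caveat this makes visible: the pruning step is only as good as the model's critiques, and as the evaluations above show, those tend to be repetitive and weakly discriminating, so the scores cluster around 6-7.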

Replies from: tailcalled, TekhneMakre
comment by tailcalled · 2022-11-16T21:21:26.637Z · LW(p) · GW(p)

Another go at it. Using the prompt

Explain the advantages for each of the following taglines for an ice cream shop, and evaluate which one is the best: a. The best ice cream in town! b. A scoop of happiness. c. Delicious ice cream for all!

I'm getting the following results:

Attempt one:

The first tagline, "The best ice cream in town!" is a strong claim that could help to attract customers. However, it could also be seen as arrogant and could turn some customers away. The second tagline, "A scoop of happiness," is more positive and upbeat, and could help to create a positive association with the ice cream shop. The third tagline, "Delicious ice cream for all!," is inclusive and could help to attract a wide range of customers. Overall, the second tagline, "A scoop of happiness," is the best option.

Attempt two:

The best ice cream in town! - This tagline is straightforward and to the point. It tells customers that the shop has the best ice cream in town, and that they can expect a delicious treat.

A scoop of happiness. - This tagline is more whimsical and fun. It suggests that the ice cream shop is a place where customers can find happiness, and that their ice cream will be a delicious part of that experience.

Delicious ice cream for all! - This tagline is inclusive and welcoming. It tells customers that everyone is welcome at the shop, and that they can expect to find delicious ice cream.

The best one is The best ice cream in town!

Attempt three, changing it from explaining an advantage to explaining a disadvantage:

The disadvantage of the first tagline is that it is generic and does not stand out from other ice cream shops. The disadvantage of the second tagline is that it is too vague and does not tell the customer what they can expect from the shop. The disadvantage of the third tagline is that it is too general and does not focus on a specific type of ice cream. The best tagline would be something that is specific and tells the customer what they can expect from the shop, such as "The best ice cream in town!"

Replies from: TekhneMakre
comment by TekhneMakre · 2022-11-16T21:37:56.415Z · LW(p) · GW(p)

Yeah... I mean it's not thinking / comparing / reckoning / discerning, it's just.... saying things that are the sort of thing that someone says in that context...

Replies from: tailcalled
comment by tailcalled · 2022-11-16T22:26:26.483Z · LW(p) · GW(p)

I guess this would be the expected failure mode, too. I do wonder if people who have had more experience working with LLMs find this to be the failure mode, or if it's just because I'm not prompting them right.

comment by TekhneMakre · 2022-11-16T21:10:49.738Z · LW(p) · GW(p)

Yeah, not so impressive or useful-seeming. I would guess someone very skilled at prompting LLMs could get something slightly useful in this genre with a fair amount of work, but not very useful. 

An underlying issue is that, as you pointed out elsewhere IIRC, what we're wanting is the AI's own dynamic of acting agentically which induces an evaluation of which things are instrumentally useful. That discernment of what's useful for acting in the world isn't in GPT, so you can't evoke it through prompts. So it can't do the sort of pruning you could do if you have a familiarity with what sort of things are useful in the world. Maybe. 

(Also, "A scoop of happiness" is clearly the best one!)