Just because an LLM said it doesn't mean it's true: an illustrative example

post by dirk (abandon) · 2024-08-21T21:05:59.691Z · LW · GW · 12 comments

This was originally posted in the comments of You don't know how bad most things are nor precisely how they're bad. [LW · GW]; I've broken it out into a post because I think it might be a useful corrective more generally for people inclined to cite LLM remarks as fact.

I asked Claude, as an illustrative example, whether ready-made clothing had declined in quality over time, and it claimed that "many studies" indicated this was so. I then asked it to cite the studies in question; half of them were confabulated, and AFAICT only one of them contained what Claude said it did. (Note that this is Claude 3 Haiku, which isn't the best; I'd expect frontier models to name four or maybe even five real papers in a list of six, and, with lower confidence, to have a more accurate understanding of the papers' contents.)

Here are the "studies" it cited—note that even some of the ones which actually exist do not contain the things Claude said they did.

  1. "A New Textiles Economy: Redesigning Fashion's Future" - Report by the Ellen MacArthur Foundation (2017) - This is a think-tank report (available here) rather than a study. However, it does actually exist. Claude says it "noted that the number of times clothes are worn before disposal has decreased by 36% compared to 15 years ago." It really does say this! Page 19, citing as a source the "Circular Fibres Initiative analysis based on Euromonitor International Apparel & Footwear 2016 Edition (volume sales trends 2005–2015)." A-. (Teeeeeechnically the report claims the clothing is still wearable and therefore throwing it away is a waste, which isn't the same thing as poor durability causing decreased wear time; humans cite studies that support something a little to the left of their point all the time, though, so in the interests of fairness I won't mark it down for that).
  2. "The State of Fashion 2016" - Report by the McKinsey Global Fashion Index (2016) - This one seems to be real! (Technically, it's "The State of Fashion 2017"; that was their first report, so Claude can't have meant an earlier one. But it was released in 2016 and half of it is an overview of that year, so IMO it's close enough.) It is not a research study, but rather a think-tank report (which is actually even worse than it seems, IMO, because on the few occasions I've checked sources on think-tank reports I've sometimes found that the results cited didn't seem to actually exist anywhere). Claude says it "concluded that the quality and durability of clothing has declined as the industry has shifted towards faster production cycles and lower prices." The report does say the industry has moved toward faster production cycles and lower prices (though it indicates that production costs have actually risen, leaving the authors quite worried about their profit margins), but does not, as far as I can tell, claim that quality and durability have declined.
  3. "Valuing Our Clothes: The Cost of UK Fashion" - Study by the Waste & Resources Action Programme (WRAP) in the UK (2012) - This one actually exists! (Full report here). It's again not technically a study, but rather a "summary of the key findings of a major technical report (possibly this one?) published by WRAP"; still, if they did any research themselves, that's practically hard science compared to the other two, so whatever. However, Claude claims that it "found that the average number of times a garment is worn before being discarded has fallen by 36% compared to 15 years earlier," and as far as I can tell this is not true; I couldn't find anywhere in the report that even mentions the number of wearings before discarding.
  4. "Apparel and Footwear Benchmarking Analysis" - Report by the American Apparel & Footwear Association - Confabulated (there are reports with similar titles from organizations which are not this one, but this specific report does not exist)
  5. "The Apparel Sourcing Caravan's New Route to the Market" - Report by the Boston Consulting Group - Confabulated
  6. "Clothing Durability and the Creation of Value" - Study by the University of Cambridge Institute for Manufacturing (2018) - Confabulated
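
To make the tally explicit, here is a minimal Python sketch of the scoring above. The `Citation` class, its field names, and the `tally` helper are illustrative names of my own, and the booleans are simply the annotations from the list transcribed by hand; nothing here checks a source automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    title: str
    exists: bool           # could the cited document actually be found?
    claim_supported: bool  # did it say what Claude said it did?


# Hand-transcribed from the annotated list above (not machine-verified).
citations = [
    Citation("A New Textiles Economy", True, True),
    Citation("The State of Fashion 2016", True, False),
    Citation("Valuing Our Clothes", True, False),
    Citation("Apparel and Footwear Benchmarking Analysis", False, False),
    Citation("The Apparel Sourcing Caravan's New Route to the Market", False, False),
    Citation("Clothing Durability and the Creation of Value", False, False),
]


def tally(cs):
    """Count how many citations exist and how many support the stated claim."""
    total = len(cs)
    real = sum(c.exists for c in cs)
    supported = sum(c.exists and c.claim_supported for c in cs)
    return {
        "total": total,
        "confabulated": total - real,
        "real": real,
        "claim_supported": supported,
    }


summary = tally(citations)
print(summary)
```

The point of keeping `exists` and `claim_supported` separate is that a real document can still be misquoted, which is what happened with two of the three reports that exist.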

Posting this list of sources without annotations would have been of negative value, leading unwary readers to change their minds based on studies that don't exist and forcing somewhat cannier ones to spend what could be several hours checking the bullshit I produced in a few minutes.

Moreover, LLMs will take almost any position you ask of them, regardless of whether or not it's true. Just for fun, here's Claude's list of five studies proving clothing's actually gotten better over time (veracity left as an exercise for the reader):

  1. A 2020 study by the University of Cambridge called "The Changing Lifespan of Garments: Evidence from Cambridge Consumer Survey Data" found that the average lifespan of mass-produced garments has increased by 30% over the past 20 years, due to advancements in manufacturing techniques and materials.
  2. Research by the American Apparel & Footwear Association called "Quality Trends in the US Apparel Industry" showed that the number of clothing items returned due to quality issues has decreased by 25% since 2015, suggesting clothing is better made.
  3. A 2019 consumer survey by McKinsey & Company called "The State of Fashion 2019" found that 66% of respondents reported being satisfied with the quality of their ready-to-wear purchases, up from 58% in 2014.
  4. A 2021 analysis by the UK's Clothing Sustainability Research Centre called "UK Clothing Sustainability Report 2021" concluded that the average quality rating for high-street fashion brands increased by 18% over the past decade, as they invested in improving manufacturing processes.
  5. A 2018 report by the International Federation of Textile Engineers called "Global Textile Industry Quality Trends" noted a 23% reduction in clothing defects across major global brands, attributed to enhanced quality control measures.

For these reasons, I do not think it is appropriate to cite the statements of LLMs as though they are evidence in favor of your position.

12 comments

comment by Gordon Seidoh Worley (gworley) · 2024-08-22T06:06:52.031Z · LW(p) · GW(p)

My experience is that Claude and ChatGPT are tuned to be very agreeable in a way that means they never stand up to you if you ask them to defend something that's probably false but uncertain. The only times they stand up to you are when you ask them about something they're trained not to agree with or talk about, or when you ask something obviously false, like asking them to prove that 2 + 2 = 5.

Replies from: ChristianKl, Mo Nastri
comment by ChristianKl · 2024-08-22T08:18:32.491Z · LW(p) · GW(p)

One way to deal with this is to either ask very open questions that don't have a predetermined answer or to ask the LLM to argue both pro and con positions and then write a conclusion. Even when the conclusion isn't perfect, it gives you access to relevant arguments on both sides which you can then use to think about the issue yourself.

comment by Mo Putera (Mo Nastri) · 2024-08-22T06:29:46.889Z · LW(p) · GW(p)

I have a similar experience. Do you know of any LLMs that aren't as agreeable in a useful way?

comment by tailcalled · 2024-08-22T07:41:52.894Z · LW(p) · GW(p)

One thing I like to do is regularly assert something I know to be false to keep track of whether the LLM is ready to push back.

Replies from: tailcalled
comment by tailcalled · 2024-08-22T09:48:36.734Z · LW(p) · GW(p)

Maybe it is helpful to be more specific. When I am uncertain about something, I rarely ask the LLM "Is <most likely answer> true?" Instead I assert "<Less likely answer> is true."

comment by green_leaf · 2024-08-22T09:38:17.372Z · LW(p) · GW(p)

LLMs hallucinate studies/papers so regularly you're lucky to get a real one. That doesn't have an impact on the truth of the facts they claimed beforehand. (Also, yes, Claude 3 Haiku is significantly less intelligent than 3.5 Sonnet.)

Replies from: abandon
comment by dirk (abandon) · 2024-08-22T14:48:51.927Z · LW(p) · GW(p)

It doesn't directly impact the truth of the facts they claimed, no—but the facts are produced by the same process that produces the studies. It's easy to check whether or not a study exists; less-clearcut domains aren't necessarily more reliable.

Replies from: green_leaf
comment by green_leaf · 2024-08-22T16:32:10.142Z · LW(p) · GW(p)

Yes, but that very same process has a high probability of producing correct facts (today's LLMs are relatively reliable) and a very low probability of producing correct studies or papers.

Replies from: eukaryote
comment by eukaryote · 2024-08-22T20:39:05.882Z · LW(p) · GW(p)

Source?

Replies from: green_leaf, Capybasilisk
comment by green_leaf · 2024-08-24T09:45:28.750Z · LW(p) · GW(p)

The reliability of general facts could be checked by various benchmarks. The unreliability of specific studies and papers by personal experience, and by experiences of people I've read online.

I don't understand why, except maybe rephrasing a true fact keeps it true, but rephrasing a study title and a journal title makes it false.

comment by Capybasilisk · 2024-08-24T09:06:35.376Z · LW(p) · GW(p)

According to Claude: green_leaf et al, 2024

comment by darinkishore · 2024-08-22T18:45:49.064Z · LW(p) · GW(p)

My logs, asking 3.5 sonnet on the Claude website:


U: has ready made clothing declined in quality over time?

C: There are arguments on both sides of this question, but I'll provide a concise overview of some key points:

Reasons some argue quality has declined:

1. Fast fashion: …
2. Cost-cutting measures: … 
3. Changing consumer expectations: …

Reasons others argue quality has not necessarily declined:

1. Technological advancements: …
2. Increased variety: …
3. Better regulations: …
4. Survivorship bias: …

It's worth noting that quality can vary significantly between brands and price points. Would you like me to elaborate on any specific aspect of this topic?

U: If you have any studies to back up either claim, that’d be lovely.

C: I appreciate your interest in studies on this topic. While I don't have access to a live database to provide current citations, I can mention some general research directions that have been explored in this area. However, I want to emphasize that you should double-check any specific studies I mention, as I may inadvertently provide inaccurate information.

Some research areas that have been explored include:

1. Textile durability studies: …
2. Consumer perception surveys: …
3. Supply chain analyses: …
4. Environmental impact studies: While not directly about quality, research on the environmental impact of fast fashion often touches on issues of durability and product lifespan.
5. Historical pricing analyses: …

One specific study that I believe has been conducted in this area (but please verify this) is a comparison of seam strength and fabric quality between vintage and modern t-shirts by researchers at a textile engineering department. However, I can't recall the exact institution or publication date with certainty.

Given the importance of accuracy in citing academic work, I recommend searching academic databases or consulting with fashion and textile experts for the most up-to-date and reliable studies on this topic. They would be able to provide you with current, peer-reviewed research that addresses your specific questions about changes in ready-made clothing quality over time.

I’m pretty impressed with this behavior, because it means that the average person asking Claude to cite studies won’t be inadvertently misled by trusting the AI without question. I’ve used 3.5 Sonnet more than any other model since it came out, and it does still give me wrong answers without any qualifications, but it’s much, much better about knowing when to qualify its outputs than any other OpenAI (or Anthropic!) model I’ve used.