Testing "True" Language Understanding in LLMs: A Simple Proposal

post by MtryaSam · 2024-11-02T19:12:34.710Z · LW · GW · 2 comments

Contents

  The Core Idea
  Why This Matters
  Some Initial Thoughts
    Potential Setup
    Example (Very Simplified)
  Open Questions
  Limitations
  Call for Discussion
  About Me
  Feedback Welcome
2 comments

The Core Idea

What if we could test whether language models truly understand meaning, rather than just matching patterns? Here's a simple thought experiment:

  1. Create two artificial languages (A and B) that bijectively map to the same set of basic concepts R'
  2. Ensure these languages are designed independently (no parallel texts)
  3. Test if an LLM can translate between them without ever seeing translations

If successful, this would suggest the model has learned the underlying meanings, not just statistical patterns between languages. More formally: if Language A and Language B each map bijectively (via MA and MB) onto the same concept space R', then translation from A to B is the composition MB^(-1) ∘ MA, i.e. the model goes from Language A to concepts and then from concepts to Language B, without ever seeing parallel examples. Such emergent translation would be a strong indicator of genuine semantic understanding, because it requires the model to have internalized the relationship between symbols and meanings in each language independently.
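
As a minimal illustration of this composition (not of the LLM experiment itself), the sketch below treats the two languages as bijective dictionaries over a toy concept space. Everything beyond "zix", "kol", "nare", and "tup" is vocabulary invented for the example:

```python
# Toy illustration of translating A -> B via the shared concept space R',
# i.e. the composition MB^(-1) o MA. The extra vocabulary items are
# invented for this sketch and are not part of the proposal.

M_A = {"zix": "red", "kol": "circle", "vup": "blue", "dran": "square"}   # Language A -> R'
M_B = {"nare": "red", "tup": "circle", "miro": "blue", "seft": "square"} # Language B -> R'

# Invert M_B; this is well defined because the mapping is bijective.
M_B_inv = {concept: word for word, concept in M_B.items()}

def translate_a_to_b(phrase: str) -> str:
    """Map each Language A word to its concept, then to the Language B word."""
    return " ".join(M_B_inv[M_A[word]] for word in phrase.split())

print(translate_a_to_b("zix kol"))  # -> "nare tup"
```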

Why This Matters

This approach could help distinguish between:

  1. Genuine semantic understanding, where the model maps the symbols of each language onto shared underlying concepts
  2. Surface-level pattern matching, where the model only exploits statistical regularities within and between languages

It's like testing whether someone really understands two languages versus having merely memorized a translation dictionary.

Some Initial Thoughts

Potential Setup

  1. Define a small, enumerable concept space R' (e.g., colored shapes)
  2. Independently design two artificial languages, A and B, each with its own vocabulary and grammar, that bijectively map onto R'
  3. Train (or fine-tune) an LLM on monolingual text from both languages, with no parallel examples
  4. Prompt the model to translate phrases from A into B and compare the results against the known mappings

Example (Very Simplified)

Concept: "red circle"

Without ever showing the model that "zix-kol" = "nare-tup", can it figure out the translation by understanding that both phrases refer to the same concept?
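
To make the setup concrete, here is a hypothetical sketch of how the two independent monolingual corpora might be generated. The lexicons, scene space, and corpus sizes are all invented for illustration:

```python
import random

# Hypothetical corpus generator: both artificial languages describe the same
# toy scenes (a color plus a shape), but the two corpora are sampled
# independently and never paired, so no parallel text exists.

CONCEPTS = [("red", "circle"), ("red", "square"), ("blue", "circle"), ("blue", "square")]

LEXICON_A = {"red": "zix", "blue": "vup", "circle": "kol", "square": "dran"}
LEXICON_B = {"red": "nare", "blue": "miro", "circle": "tup", "square": "seft"}

def sample_sentence(lexicon: dict) -> str:
    """Describe a randomly chosen scene in the given artificial language."""
    color, shape = random.choice(CONCEPTS)
    return f"{lexicon[color]}-{lexicon[shape]}"

# Two independently sampled monolingual corpora (no shared seed, no pairing).
corpus_a = [sample_sentence(LEXICON_A) for _ in range(10_000)]
corpus_b = [sample_sentence(LEXICON_B) for _ in range(10_000)]
```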

Open Questions

  1. How do we ensure the languages are truly independent?
  2. What's the minimum concept space needed for a meaningful test?
  3. How do we efficiently validate successful translations?
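
On question 3, one straightforward (if simplistic) option is exact-match scoring against gold pairs derived from the two concept mappings and held out from training. A rough sketch, with illustrative phrase pairs and a stand-in for the actual model call:

```python
from typing import Callable, Dict

def exact_match_accuracy(model_translate: Callable[[str], str],
                         gold_pairs: Dict[str, str]) -> float:
    """Score A -> B translations against held-out gold pairs derived from the
    known concept mappings (used only for scoring, never for training)."""
    hits = sum(model_translate(a).strip() == b for a, b in gold_pairs.items())
    return hits / len(gold_pairs)

# Illustrative usage; the phrases and the dummy model are placeholders for
# real test items and an actual LLM call.
gold_pairs = {"zix-kol": "nare-tup", "vup-dran": "miro-seft"}
dummy_model = lambda phrase: "nare-tup"
print(exact_match_accuracy(dummy_model, gold_pairs))  # -> 0.5
```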

Limitations

As an undergraduate student outside the AI research community, I acknowledge:

  1. I may lack the resources and experience to develop or run this experiment myself
  2. Similar ideas may already exist in the literature without my knowing
  3. Many practical details of the setup remain to be worked out

Call for Discussion

I'm sharing this idea in hopes that:

  1. It might spark useful discussion about how to test genuine understanding in LLMs
  2. Others with more resources or experience might develop it into a more rigorous approach

About Me

I'm an engineering student interested in AI understanding and alignment. While I may not have the resources to develop this idea fully, I hope sharing it might spark useful discussions or inspire more developed approaches.

Feedback Welcome

If you have thoughts, suggestions, or see potential in this idea, I'd love to hear from you. Please feel free to comment or reach out.

2 comments

Comments sorted by top scores.

comment by Gunnar_Zarncke · 2024-11-03T13:01:49.236Z · LW(p) · GW(p)

I believe this has been done in Google's Multilingual Neural Machine Translation (GNMT) system that enables zero-shot translations (translating between language pairs without direct training examples). This system leverages shared representations across languages, allowing the model to infer translations for unseen language pairs.

comment by cubefox · 2024-11-04T10:37:30.514Z · LW(p) · GW(p)

I made basically the same proposal here, but phrased as a task of translating between a long alien message and human languages: https://www.lesswrong.com/posts/J3zA3T9RTLkKYNgjw/is-llm-translation-without-rosetta-stone-possible [LW · GW] See also the comments, which contain a reference to a paper with a related approach on unsupervised machine translation. Also this comment echoes your post:

I think this is a really interesting question since it seems like it should neatly split the "LLMs are just next token predictors" crowd from the "LLMs actually display understanding" crowd.