align your latent spaces

bhauth

post by bhauth · 2023-12-24T16:30:09.138Z · LW · GW · 8 comments

This is a link post for https://www.bhauth.com/blog/thinking/align%20latents.html

I learned very early the difference between knowing the name of something and knowing something.

— Richard Feynman

Here is a simplified diagram of the mind of someone reading English text and writing English text in response:

Here is a simplified diagram of the mind of someone fluent in English who is beginning to learn French and studying using flashcards with word pairs:

Red arrows indicate the main path of information from flashcards. Maybe you can see why I don't consider flashcards an optimal approach to learning languages. Flashcards can load word associations into a vector lookup system, but what's needed is encoder/decoder training. For that to happen with flashcards, the best-case scenario is training across the entire chain of two encoders to increase similarity at the concept latent space, and flashcards are inefficient for that because they provide a small amount of data without context. Anyone who understands neural networks knows that large amounts of real, in-context data are better than repeated passes over a small number of small fragments.
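As a rough illustration of the data argument, here is a minimal toy sketch in Python with NumPy. It is not a model of the brain or of the diagrams in this post. It assumes a purely linear setup in which each "language" is a fixed invertible map from a shared concept space to surface features, the English encoder is already correct, and the French encoder is fit by least squares so that its latents match the English latents on paired text. All names (`render_en`, `render_fr`, `encode_en`, `make_pairs`, and so on) and all sizes are hypothetical choices made for the example. It captures only the amount and variety of paired data, not the role of context.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # size of the shared "concept" latent space (arbitrary toy value)

# Hypothetical fixed "languages": each renders a concept vector as surface features.
render_en = rng.normal(size=(DIM, DIM))
render_fr = rng.normal(size=(DIM, DIM))

# Assumption: the reader is already fluent in English, so this encoder
# exactly inverts the English rendering.
encode_en = np.linalg.inv(render_en)


def make_pairs(n):
    """Return n paired (English, French) texts that express the same concepts."""
    concepts = rng.normal(size=(n, DIM))
    return concepts @ render_en, concepts @ render_fr


def fit_french_encoder(text_en, text_fr):
    """Least-squares fit of a French encoder whose latents match the English ones."""
    target_latents = text_en @ encode_en
    encode_fr, *_ = np.linalg.lstsq(text_fr, target_latents, rcond=None)
    return encode_fr


def alignment_error(encode_fr, text_en, text_fr):
    """Mean squared distance between the two encoders' latents on paired text."""
    return float(np.mean((text_fr @ encode_fr - text_en @ encode_en) ** 2))


few_pairs = make_pairs(5)     # a small deck of pairs
many_pairs = make_pairs(200)  # many distinct paired passages
held_out = make_pairs(100)    # unseen pairs, used only for evaluation

error_few = alignment_error(fit_french_encoder(*few_pairs), *held_out)
error_many = alignment_error(fit_french_encoder(*many_pairs), *held_out)
print(f"held-out alignment error, 5 pairs:   {error_few:.4f}")
print(f"held-out alignment error, 200 pairs: {error_many:.2e}")
```

In this toy setup, 5 pairs cannot pin down a 16-dimensional encoder, so the latents stay misaligned on unseen text, and repeating the same 5 pairs would not change the least-squares fit. With 200 distinct pairs the fit recovers the alignment almost exactly. Real language learning is of course far less tidy than a noise-free linear map.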

So, instead of using flashcards, I think it's better to read pairs of complete paragraphs with the same meaning, without repetition. For example, reading a book together with its translation. Not understanding the foreign text as you read it doesn't mean that reading it is ineffective.

One reason flashcards are overrated is that a lot of language tests are for "vocabulary" because association between word pairs is easy to test.

Align your spine.

— Malcolm in the Middle

Once someone has become fluent in a 2nd language, they typically gain the ability to write and understand sentences using words from both languages. That implies integration of the encoders and decoders, which presumably happens via distillation, as in this diagram:

Considering that this happens with more practice and is seen in more-fluent people, it's probably better overall.

This integration of multiple systems for different languages seems to me to be fundamentally the same process as integration of understanding of multiple scientific fields. The equivalent of reading a book together with its translation for that would be reading a book that considers questions and problems from the perspective of one field and then from the perspective of a 2nd field. However, that's less available than translated literature, because there are fewer multidisciplinary experts than bilingual people, and multidisciplinary expertise is harder to evaluate and thus harder to find. Also, people capable of doing that would probably instead write a single version that integrates their understanding of both fields. That would be fine for learning from but possibly undesirable for students trying to pass a class or people trying to fit into an established culture of a field.

8 comments

Comments sorted by top scores.

comment by mtaran · 2023-12-28T02:44:45.247Z · LW(p) · GW(p)

This is a cool model. I agree that in my experience it works better to study sentence pairs than single words, and that having fewer exact repetitions is better as well. Probably paragraphs would be even better, as long as they're tailored to be not too difficult to understand (e.g. with a limited number of unknown words/grammatical constructions).

One thing various people recommend for learning languages quickly is to talk with native speakers, and I also notice that this has an extremely large effect. I generally think of it as having to do with more of one's mental subsystems involved in the interaction, though I only have vague ideas as to the exact mechanics of why this should be so helpful.

Do you think this could somehow fit parsimoniously into your model?

Replies from: bhauth

↑ comment by bhauth · 2023-12-28T19:07:21.541Z · LW(p) · GW(p)

Yes; try tracing the path of that data from conversations, and you should see what systems it would be training.

comment by Matt Goldenberg (mr-hire) · 2023-12-24T22:08:10.318Z · LW(p) · GW(p)

The Hattie model of learning posits surface -> deep -> transfer learning as a general rule for how learning progresses. I suspect that flashcards are excellent for surface learning, and the integration you're talking about is transfer learning. It's possible that you could try to skip straight to transfer learning, but I suspect it would actually take longer, as you'd be using transfer learning methods to get the surface and deep learning done.

Replies from: bhauth

↑ comment by bhauth · 2023-12-24T23:07:32.978Z · LW(p) · GW(p)

That model is not useful for understanding my post. What reasons have you seen for thinking it's useful for anything at all? Given that its core feature seems to be division into 3 categories, does it have precise and consistent definitions for their natures and boundaries?

Replies from: mr-hire

↑ comment by Matt Goldenberg (mr-hire) · 2023-12-25T03:45:05.999Z · LW(p) · GW(p)

In general I've found it practically very useful for my own learning, which is reason enough for me.

The model itself fell out of Hattie's work trying to pull together all of the important meta-studies in education and understand the data. He found that by applying these 3 stages, he could better understand why certain interventions were effective in some cases and not others.

I would be very surprised if there are clear delineated boundaries between the 3, as such an abstraction rarely corresponds to reality so cleanly. And yet I still find it an incredibly useful model.

Replies from: bhauth

↑ comment by bhauth · 2023-12-25T12:24:53.352Z · LW(p) · GW(p)

"I would be very surprised if there are clear delineated boundaries between the 3, as such an abstraction rarely corresponds to reality so cleanly. And yet I still find it an incredibly useful model."

If there's a continuum, then it should be expressed in terms of parameters for each of its dimensions. Using categories for things not separated into definite clusters is shoddy thinking.

Going by the words used, if anything, "transfer" should come first, because systems for new tasks are initialized by transfer from existing skills. It's also ongoing at every stage of learning, not something that happens after learning is complete.

Replies from: mr-hire

↑ comment by Matt Goldenberg (mr-hire) · 2023-12-25T16:24:26.643Z · LW(p) · GW(p)

I don't think I would describe Hattie's thinking as shoddy. He's one of the more careful thinkers in educational theory, both doing a careful review of the literature and then testing his insights by implementing them in schools and being careful with the results. Of course, when you want your material implemented, there are tradeoffs you make between implementability and depth. But your assessment seems premature based on looking at my one comment.

It's true that you're doing transfer with the previous skills that already went through the process while doing surface learning with new skills that are going through it, but I don't think that means it comes first. For those previous skills, it comes last. If you want more specifics, you can of course read Hattie.

Replies from: mr-hire

↑ comment by Matt Goldenberg (mr-hire) · 2023-12-25T17:24:40.633Z · LW(p) · GW(p)

I'm curious about the down votes to these comments. Do people think they weren't adding to the discussion?
