Comment by Kiho Park (kihopark) on The Geometry of Feelings and Nonsense in Large Language Models · 2024-10-13T18:36:57.600Z

Thank you for reading our paper and providing such thoughtful feedback! I would like to respond to your points.

First, regarding your 2D projection plot, it appears that you are still using "the span of sadness and all emotions" without estimating $\bar{\ell}_{\text{emotion}}$. As a result, the plot carries no information about $\bar{\ell}_{\text{emotion}}$. When I estimate both $\bar{\ell}_{\text{sadness}}$ and $\bar{\ell}_{\text{emotion}}$ using your token collections, I find that $\bar{\ell}_{\text{sadness}} - \bar{\ell}_{\text{emotion}}$ and $\bar{\ell}_{\text{emotion}}$ are close to orthogonal, consistent with the hierarchical orthogonality we report.
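For concreteness, here is a minimal sketch of this kind of check on synthetic data. It uses a simple difference-of-means estimator as a stand-in for the estimator in our paper, and all dimensions, collection sizes, and array names are placeholders rather than the real token collections:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # placeholder embedding dimension

# Placeholder unembedding rows for each token collection; in the real
# analysis these rows come from the model's (transformed) unembedding matrix.
sadness_rows = rng.normal(size=(50, d))
emotion_rows = rng.normal(size=(400, d))
all_rows = rng.normal(size=(5000, d))

def concept_vector(rows, background):
    # Simplified estimator: mean of the concept's tokens minus the overall
    # mean. The paper uses a more careful estimator in a transformed space.
    return rows.mean(axis=0) - background.mean(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

ell_sadness = concept_vector(sadness_rows, all_rows)
ell_emotion = concept_vector(emotion_rows, all_rows)

# Hierarchical orthogonality check: the child-minus-parent direction
# should be nearly orthogonal to the parent direction.
print(cosine(ell_sadness - ell_emotion, ell_emotion))
```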

Second, I agree that our original experiments lacked sufficient baselines. In response, we have uploaded version 2 of the arXiv preprint, which now includes a comprehensive set of baselines.

As you mentioned, independently sampled vectors in high-dimensional spaces are nearly orthogonal with high probability, for example when they are drawn from a normal distribution with isotropic covariance. This raises the question of whether our orthogonality results arise purely from this high-dimensional effect.
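As a quick numerical illustration of this effect (the dimension 4096 is just an assumed, LLM-scale placeholder):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096  # assumed dimension, on the order of an LLM's unembedding space

u = rng.normal(size=d)
v = rng.normal(size=d)

# For independent isotropic Gaussian vectors, the cosine similarity
# concentrates around 0 with standard deviation about 1/sqrt(d) ~ 0.016.
print(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```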

In Figure 5, when using random parents (orange), the cosine similarities between the child-minus-parent difference vectors and the parent vectors are not zero. This indicates that the orthogonality of the original vectors (blue) is not simply a consequence of high-dimensional geometry.

Additionally, we introduced another baseline using randomly shuffled unembeddings. Shuffling the unembeddings is equivalent to using random concepts with the same set inclusion. For example, a child collection might now have {dog, sandwich, running} and a parent might have {dog, sandwich, running, France, scientist, diamond}. Interestingly, set inclusion still yields orthogonality (green), as shown in the left panel of Figure 5 (see Appendix G for further explanation). While it is possible to define hierarchy by set inclusion, this is not our focus. Instead, we are interested in whether semantic hierarchy produces orthogonality.
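A minimal sketch of this shuffled baseline, with a random stand-in for the unembedding matrix, difference-of-means estimates, and hypothetical token-id sets (the paper's estimator and collections differ):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d = 5000, 512                 # placeholder sizes
gamma = rng.normal(size=(vocab_size, d))  # stand-in for the unembedding matrix

# Shuffling rows reassigns each token id a random unembedding. On a real
# unembedding matrix this destroys token semantics while preserving which
# token-id collections are subsets of which.
gamma_shuffled = gamma[rng.permutation(vocab_size)]

child_ids = np.arange(50)    # hypothetical child collection
parent_ids = np.arange(400)  # superset of the child: inclusion preserved

mu = gamma_shuffled.mean(axis=0)
ell_child = gamma_shuffled[child_ids].mean(axis=0) - mu
ell_parent = gamma_shuffled[parent_ids].mean(axis=0) - mu

# Because the child's tokens are contained in the parent's, the expected
# inner product between (child - parent) and parent is zero even for
# random unembeddings, so this prints a value close to zero.
diff = ell_child - ell_parent
print(diff @ ell_parent / (np.linalg.norm(diff) * np.linalg.norm(ell_parent)))
```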

To test this, we disrupted set inclusion by splitting the training data when estimating the vector representation for each feature, so that the child and parent vectors are estimated from disjoint sets of tokens. In this case, the shuffled baseline no longer produced orthogonality, whereas the original unembeddings maintained it. This result suggests that the orthogonality of the original vectors is not merely a product of set inclusion or high-dimensional effects.
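Continuing the sketch above, the split might look like the following (the exact split used in the paper may differ): the child vector is estimated from one half of the child's tokens, and the parent vector from the parent's tokens excluding that half.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d = 5000, 512
gamma_shuffled = rng.normal(size=(vocab_size, d))  # shuffled stand-in, as before

child_ids = np.arange(50)
parent_ids = np.arange(400)

# Split: the child vector uses only half of the child's tokens, and the
# parent vector excludes those tokens, so set inclusion no longer holds.
child_half = child_ids[:25]
parent_rest = np.setdiff1d(parent_ids, child_half)

ell_child = gamma_shuffled[child_half].mean(axis=0)
ell_parent = gamma_shuffled[parent_rest].mean(axis=0)

# Without inclusion, the expected cosine is no longer zero; for these
# placeholder sizes it works out to roughly -0.25.
diff = ell_child - ell_parent
print(diff @ ell_parent / (np.linalg.norm(diff) * np.linalg.norm(ell_parent)))
```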

Lastly, with regard to the simplex, our contribution lies in defining vector representations with a notion of magnitude, which naturally leads to polytope representations for categorical concepts. In theory (Corollary 10), the 3D projection plot should show the token unembeddings for each category concentrated at the corresponding vertices. In practice, however, the projected tokens are somewhat dispersed, as shown in Figure 6.
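To illustrate what such a projection looks like, here is a hedged sketch on synthetic data: "token unembeddings" for three hypothetical categories are placed near the vertices of a simplex in a higher-dimensional space, and projecting onto the span of the vertex directions recovers the triangle, with each category's tokens clustered at its vertex.

```python
import numpy as np

rng = np.random.default_rng(0)
d, per_category = 512, 100

# Hypothetical vertex directions for a three-category concept (placeholders).
vertices = rng.normal(size=(3, d))

# Synthetic token unembeddings: each category's tokens sit near its vertex
# plus noise; real unembeddings are only approximately concentrated (Fig. 6).
tokens = np.concatenate(
    [v + 0.5 * rng.normal(size=(per_category, d)) for v in vertices]
)

# Project onto an orthonormal basis of span(vertices) to get 3D coordinates.
Q, _ = np.linalg.qr(vertices.T)  # (d, 3) orthonormal basis
proj = tokens @ Q                # (300, 3) coordinates for a 3D scatter plot

# The mean projection of each category's tokens lies near its vertex.
print(proj[:per_category].mean(axis=0))
print(vertices[0] @ Q)
```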