Ellipsoids for the "mammal" and "bird" feature regions projected onto 3D space.
```
python visualize.py
```

This project studies the geometry of contiguous regions of related features in text embeddings, finding that hyperellipsoids are a good model. Hyperspheres give poor results, which is somewhat surprising given that embeddings are most often compared with cosine similarity.
The project is inspired by feature splitting and spatial structure and seeks to explicitly model the geometry of these regions. It's related to hierarchical geometry but focuses on the geometry of individual categories rather than the structure between categories. It models a different kind of geometry than non-linear features.
An ideal region geometry should have all of these properties:
- Generative (as opposed to discriminative) — defined from one class's points only, so regions are modular.
- Bounded (finite volume) without needing O(d) points, for generalization to unseen points of different classes.
- Full-dimensional (>0 volume) without needing O(d) points, for generalization to unseen points of the same class.
- Precision & recall — accurately modeling the shape of the feature region.
- Simplicity - always good to have.
Some candidate geometries are:
- Linear separation polytope: a polytope formed by the linear separation boundaries between all classes.
- Pros: full-dimensional, near perfect precision & recall (almost all classes are linearly separable), simple.
- Cons: discriminative, not bounded without O(d) classes.
- Hypersphere: a hypersphere centered at the mean with radius equal to variance times a confidence threshold.
- Pros: generative, bounded, full-dimensional, simple.
- Cons: low precision & recall (see experiments).
- Hyperellipsoid + shrinkage: a hyperellipsoid centered at the mean with radii equal to the variance along each axis times a confidence threshold, plus shrinkage to make the shape full-dimensional when n < d.
- Pros: generative, bounded, full-dimensional with shrinkage, high precision & recall (see experiments), simple.
- Cons: requires shrinkage to be full-dimensional, which is extra complexity.
- Convex hull: a convex hull of the points of the class.
- Pros: generative, bounded, likely high precision & recall (due to linear separability).
- Cons: no simple method like shrinkage to make it full-dimensional without O(d) points. Also less simple than the hyperellipsoid in general.
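The two generative candidates can be sketched concretely. Below is a minimal numpy sketch (the function names and the shrinkage weight are illustrative, not the project's actual API): the hypersphere scores points by normalized Euclidean distance from the class mean, while the hyperellipsoid scores by Mahalanobis distance under a covariance shrunk toward a scaled identity, which stays full-rank (and so keeps the ellipsoid full-dimensional) even when n < d.

```python
import numpy as np

def fit_hypersphere(X):
    """Mean plus a scalar scale: score is Euclidean distance over sigma."""
    mu = X.mean(axis=0)
    sigma = np.sqrt(((X - mu) ** 2).sum(axis=1).mean())
    return mu, sigma

def fit_hyperellipsoid(X, shrinkage=0.1):
    """Mean plus shrunk inverse covariance: score is Mahalanobis distance.
    Shrinking toward (trace(S)/d) * I keeps S invertible even when n < d."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    d = X.shape[1]
    S = (1 - shrinkage) * S + shrinkage * (np.trace(S) / d) * np.eye(d)
    return mu, np.linalg.inv(S)

def sphere_score(x, mu, sigma):
    return np.linalg.norm(x - mu) / sigma

def ellipsoid_score(x, mu, S_inv):
    delta = x - mu
    return float(np.sqrt(delta @ S_inv @ delta))
```

A point is assigned to the region when its score falls below the confidence threshold; sweeping that threshold is what traces out a precision-recall curve.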
The precision-recall curve measures one-vs-rest classification quality across subtrees of a WordNet hypernym tree when building a hypersphere/ellipsoid over text embeddings of the nodes, sweeping the confidence threshold to trace out the curve. The hyperellipsoid (AUPRC 0.99) strongly outperforms the hypersphere (AUPRC 0.83), maintaining near-perfect precision across most recall levels.
The shapes are built with all points of the class (to test precision and recall). Generalization to unseen points of the same class is not tested.
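The evaluation can be illustrated in miniature (a hedged sketch, not `auprc.py` itself; the synthetic data, dimension, and plain in-sample covariance fit are made up for illustration): score in-class and out-of-class points by distance from the fitted shape, then let scikit-learn sweep the threshold and compute AUPRC. On anisotropic data like this, the Mahalanobis-based ellipsoid typically scores higher than the plain Euclidean sphere.

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
d = 16
scales = np.linspace(0.1, 2.0, d)          # anisotropic in-class spread
pos = rng.normal(size=(200, d)) * scales   # in-class points
neg = rng.normal(size=(200, d)) * 1.5      # out-of-class points

mu = pos.mean(axis=0)
S_inv = np.linalg.inv(np.cov(pos, rowvar=False))

def mahalanobis(X):
    """Per-row Mahalanobis distance from mu under the fitted covariance."""
    delta = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", delta, S_inv, delta))

y = np.concatenate([np.ones(200), np.zeros(200)])
X = np.vstack([pos, neg])
# Negate distances so that a higher score means "more likely in-class".
auprc_ellipsoid = average_precision_score(y, -mahalanobis(X))
auprc_sphere = average_precision_score(y, -np.linalg.norm(X - mu, axis=1))
print(f"ellipsoid AUPRC: {auprc_ellipsoid:.3f}, sphere AUPRC: {auprc_sphere:.3f}")
```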
To reproduce:
```
# animal is the largest tree and 768 is an intermediate dimension between 128 and 3072,
# the min and max of the embedding model.
python auprc.py graph --tree-name animal --dimension 768
```

Studying internal activations over text embeddings:
- This project studies text embeddings to avoid the issues of layer and token choice, so a natural next step is studying internal activations.
These findings may have implications for linear probing:
- Since hyperellipsoids outperform hyperspheres, some sort of direction-weighted cosine similarity might be more accurate than standard cosine similarity for probes.
- The confidence estimate of the shape could be a better metric than cosine similarity, as it naturally provides a probabilistic confidence level specific to that concept (the same cosine similarity may imply different confidence levels for different concepts).
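The second point can be made concrete: if the in-class embeddings are modeled as Gaussian (an assumption for this sketch; the project may calibrate differently), the squared Mahalanobis distance of an in-class point approximately follows a chi-squared distribution with d degrees of freedom, so the fitted shape yields a per-concept probabilistic confidence directly. A scipy sketch, where `region_confidence` is a hypothetical helper:

```python
import numpy as np
from scipy.stats import chi2

def region_confidence(x, mu, S_inv):
    """P(an in-class point lies at least this far out), under a Gaussian
    model of the class: squared Mahalanobis distance ~ chi2(d)."""
    d = mu.shape[0]
    m2 = (x - mu) @ S_inv @ (x - mu)
    return chi2.sf(m2, df=d)  # survival function, 1 - CDF
```

The same raw distance (or cosine similarity) maps to different confidences for different concepts, because the mean, covariance, and dimension are fit per concept rather than shared globally.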

