ODSC East 2026

In machine learning, there is one question you can't answer with a prediction.

What's in here? 

5,000 films. No labels. No schema.

Clustering answers the question no one has labeled yet.

I work on finding structure in customer experience.

Hundreds of clients have handed us piles of unlabeled text (tickets, reviews, survey responses) and asked, "What's in here?"

I can't show you client data. Same pipeline — UMAP, HDBSCAN, BERTopic — pointed at 5,000 movies. 47 clusters.

new_young_life_family

— one of them, verbatim.

What BERTopic shipped me:

new_young_life_family

What Claude labeled it:

Coming-of-age drama

The clustering didn't get smarter.
The labeling layer did.
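
A keyword label such as new_young_life_family is what you get when a cluster's top-weighted terms are joined with underscores. The sketch below illustrates that step, plus one possible way to hand the same keywords to an LLM for a readable name. The term weights, the sample text, and both function names (`keyword_label`, `build_label_prompt`) are hypothetical placeholders, not the talk's actual data or code.

```python
def keyword_label(term_weights: dict[str, float], top_n: int = 4) -> str:
    """Join the top_n highest-weighted terms into a keyword-style label."""
    ranked = sorted(term_weights.items(), key=lambda kv: kv[1], reverse=True)
    return "_".join(term for term, _ in ranked[:top_n])


def build_label_prompt(keywords: list[str], sample_docs: list[str]) -> str:
    """Assemble a plain-text prompt asking an LLM for a short cluster name."""
    docs = "\n".join(f"- {doc}" for doc in sample_docs)
    return (
        "These documents belong to one cluster.\n"
        f"Top keywords: {', '.join(keywords)}\n"
        f"Sample documents:\n{docs}\n"
        "Reply with a label of at most five words."
    )


# Made-up weights, chosen only so the ranking reproduces the label on the slide.
weights = {"new": 0.031, "young": 0.027, "life": 0.024, "family": 0.022, "town": 0.015}
label = keyword_label(weights)
print(label)  # new_young_life_family

# Placeholder document text; a real run would pass the cluster's own documents.
prompt = build_label_prompt(label.split("_"), ["<representative document from the cluster>"])
```

The clusters are identical in both cases; only the text that describes them changes.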

Clustering isn't a button. It's a pipeline.

Input Text → Sentence Representation → Dimensionality Reduction → Clustering → Cluster Naming → Topics
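
As a rough sketch, these stages can be wired together with BERTopic's pluggable sub-models. The embedding model name and every hyperparameter value here are assumptions chosen for illustration, not the settings behind the 47 clusters, and `PipelineConfig` and `build_topic_model` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # All values are illustrative assumptions, not the talk's settings.
    embedding_model: str = "all-MiniLM-L6-v2"  # sentence representation
    n_neighbors: int = 15                      # UMAP neighborhood size
    n_components: int = 5                      # UMAP output dimensions
    min_cluster_size: int = 15                 # smallest cluster HDBSCAN keeps
    random_state: int = 42                     # makes UMAP reproducible


def build_topic_model(cfg: PipelineConfig = PipelineConfig()):
    """Wire the stages from the diagram into one BERTopic model."""
    # Third-party imports are deferred so the config can be inspected
    # without the heavy dependencies installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        # Sentence Representation
        embedding_model=SentenceTransformer(cfg.embedding_model),
        # Dimensionality Reduction
        umap_model=UMAP(
            n_neighbors=cfg.n_neighbors,
            n_components=cfg.n_components,
            min_dist=0.0,
            metric="cosine",
            random_state=cfg.random_state,
        ),
        # Clustering
        hdbscan_model=HDBSCAN(
            min_cluster_size=cfg.min_cluster_size,
            metric="euclidean",
            cluster_selection_method="eom",
            prediction_data=True,
        ),
        # Cluster Naming: keyword extraction over each cluster's documents
        vectorizer_model=CountVectorizer(stop_words="english"),
    )
```

With the dependencies installed, `topics, probs = build_topic_model().fit_transform(docs)` assigns one topic id per document in `docs` (a list of strings); id -1 marks documents HDBSCAN treated as noise.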

Let's walk this.
