In this course, you will focus on measuring distance: the dissimilarity between documents. The goal is to discover how alike or unlike groups of text documents are to one another. At scale, this is a problem you might encounter if you need to group thousands of products using only their product descriptions, or if you would like to recommend a movie to someone based on whether they liked a different movie. You will work with several data sets and use both hierarchical and k-means clustering to create clusters, and you will practice with several distance measures to analyze document similarity. Finally, you will create visualizations that convey similarity clearly, so stakeholders can easily understand the key takeaways of any clustering or distance measure that you create.
You are required to have completed the following courses or have equivalent experience before taking this course:
- Natural Language Processing Fundamentals
- Transforming Text Into Numeric Vectors
- Classifying Documents With Supervised Machine Learning
- Topic Modeling With Unsupervised Machine Learning