If you want to compare two large bodies of text with each other, you can do that by making comparisons with the text itself: Turn the text into tokens then compare the overlap in tokens. Sometimes, however, you don't just want to know that two texts are different (a binary comparison), but you want to know how different, which is a fuzzy comparison. In this course, you will transform text into numeric vectors, which allows us to perform arithmetic operations on textual information to calculate similarity. This is a classical natural language processing (NLP) technique, and it begins by creating different kinds of vectors. You will create both sparse and dense vectors, and you will compare vectors of different sizes to see how information is captured. Finally, you will measure similarity among document vectors, which is the real power of turning text into vectors. The ability to determine how similar two or more documents are is a common use of NLP, and you will practice this technique through hands-on exercises and projects.
You are required to have completed the following course or have equivalent experience before taking this course:
- Natural Language Processing Fundamentals