Company: Pluralsight
Author: Janani Ravi
Full Title: Mining Data From Text
Year: 2019
Language: English
Genre: Educational: Big Data
Skill Level: Intermediate
Files: MP4 (+ Code Files, Slides .PDF)
Time: 02:08:36
Video: AVC, 1280 x 720 (1.778) at 30.000 fps, 450 kbps
Audio: AAC at 70 kbps, 2 channels, 44.1 kHz

This course covers text and document feature vectors that can be passed into machine learning models, topic modeling using Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF), and keyword extraction using RAKE.

A large part of the appeal of deep learning models is their ability to work with unstructured data types such as text, images, and video. However, such models are only as good as the feature vectors they operate on. In this course, Mining Data from Text, you will gain the ability to build highly optimized and efficient feature vectors from textual and document data. First, you will learn how to represent documents as numeric data, using simple numeric identifiers for individual words as well as more elegant methods such as term frequency-inverse document frequency (TF-IDF). Next, you will discover how to perform topic modeling using techniques such as latent semantic analysis, latent Dirichlet allocation, and non-negative matrix factorization. Finally, you will explore how to implement keyword extraction using a popular algorithm, RAKE (Rapid Automatic Keyword Extraction). When you’re finished with this course, you will have the skills and knowledge to build efficient and optimized feature vectors from a large document corpus and to use those feature vectors in powerful machine learning models.

Lessons:
1. Course Overview
   01. Course Overview
2. Modeling Text Using Natural Language Processing
   02. Module Overview
   03. Prerequisites and Course Outline
   04. Mining Data from Text
   05. Numeric Representations of Text: One Hot Encoding
   06. Numeric Representations of Text: Frequency Based Encodings
   07. Numeric Representations of Text: Prediction Based Embeddings
   08. Feature Hashing
   09. Bag of Words: Bag of N Grams
   10. Install and Setup
   11. Frequency Based Representation Using Bag of Words and Bag of N Grams Model
   12. Representing Documents Using TFIDF Scores and Feature Hashes
   13. Module Summary
3. Building Classification Models Using Text Data
   14. Module Overview
   15. Naive Bayes Classifier
   16. Sentiment Analysis Using the Naive Bayes Classifier
   17. scikit-learn Pipelines to Build Features
   18. Multiclass Classification
   19. Module Summary
4. Understanding Topic Modeling
   20. Module Overview
   21. Topic Modeling
   22. Topic Modeling Algorithms
   23. Module Summary
5. Implementing Topic Modeling
   24. Module Overview
   25. Latent Dirichlet Allocation: Topic Modeling with the Newspaper Headlines Dataset
   26. Visualizing Topic Assignments Using Manifold Learning to Reduce Dimensions
   27. Latent Dirichlet Allocation: Topic Modeling with the DBPedia Dataset
   28. Visualizing Topics Using Manifold Learning to Reduce Dimensions
   29. Interactive Topic Model Visualization Using PyLDAVis
   30. Non-negative Matrix Factorization: Topic Modeling with the DBPedia Dataset
   31. Interactive Topic Visualization Using Bokeh
   32. Latent Semantic Indexing: Preprocessing Text
   33. Concept Modeling Using LSI
   34. Module Summary
6. Understanding and Implementing Keyword Extraction
   35. Module Overview
   36. Understanding RAKE for Keyword Extraction
   37. Keyword Extraction Using RAKE
   38. Summary and Further Study
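The course description mentions representing documents with term frequency and inverse document frequency (lessons 11 and 12). As a rough, library-free illustration, and not the course's own code, the sketch below computes one common TF-IDF variant: the raw count of a term in a document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The whitespace tokenizer and the three sample documents are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive tokenizer (an assumption): lowercase, split on whitespace.
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: each term is counted at most once per document.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and the dogs",
]
for vec in tfidf(docs):
    print({t: round(w, 3) for t, w in sorted(vec.items())})
```

A term that occurs in every document (here "the") gets an IDF of log(1) = 0 and drops out, while terms unique to one document get the largest weight. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from this sketch.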
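Lesson 08 covers feature hashing. The sketch below shows the basic hashing trick using only the standard library: each token is mapped to one of a fixed number of buckets, so the vector length stays constant no matter how large the vocabulary grows, at the cost of occasional collisions. Using `zlib.crc32` as the hash function is an assumption made for determinism (Python's built-in `hash()` is salted per process for strings). scikit-learn's `HashingVectorizer`, by contrast, uses MurmurHash3 and by default alternates signs to approximately preserve inner products.

```python
import zlib


def hash_features(tokens, n_features=16):
    """Map a list of tokens to a fixed-length count vector via the hashing trick.

    Each token is hashed to an index in [0, n_features); tokens that collide
    simply add their counts to the same bucket.
    """
    vec = [0] * n_features
    for tok in tokens:
        # crc32 is stable across runs, unlike the salted built-in hash().
        idx = zlib.crc32(tok.encode("utf-8")) % n_features
        vec[idx] += 1
    return vec


v = hash_features("the cat sat on the mat".split(), n_features=8)
print(v)  # 8 bucket counts that sum to 6, the number of tokens
```

No vocabulary has to be stored, which is what makes hashing attractive for large or streaming corpora; the trade-off is that a bucket cannot be mapped back to the word that produced it.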
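Lessons 36 and 37 cover RAKE. A compact, standard-library sketch of its core scoring idea follows: candidate phrases are runs of words separated by stopwords or punctuation, each word is scored as degree / frequency (degree counts every word it co-occurs with inside candidate phrases, including itself), and a phrase's score is the sum of its word scores. The tiny `STOPWORDS` set and the sample sentence are assumptions for illustration; the course's own implementation and stopword list may differ.

```python
import re
from collections import defaultdict

# A tiny illustrative stopword list (an assumption); real use needs a fuller list.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "from", "in", "is",
             "of", "on", "the", "to", "with"}


def candidate_phrases(text):
    """Split text into candidate keyword phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Co-occurrence degree: every word in the phrase, including itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


text = ("Topic modeling and keyword extraction are techniques for mining data "
        "from text. Keyword extraction with RAKE scores candidate phrases.")
for phrase, score in rake(text):
    print(f"{score:5.2f}  {phrase}")
```

Because degree grows with phrase length, RAKE tends to favor longer multi-word phrases; here the four-word candidate "rake scores candidate phrases" outranks the two-word ones, and single words such as "text" score lowest.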