Improving Data Quality In Data Analytics & Machine Learning

Last updated 9/2022
MP4 | Video: h264, 1280×720 | Audio: AAC, 44.1 kHz
Language: English | Size: 2.04 GB | Duration: 5h 23m

Learn why, when, and how to maximize the quality of your data to optimize data-based decisions

What you’ll learn
Strategies for increasing data quality
Ways to assess data quality
Interpreting data visualizations
How to spot problems in data

Requirements
Interest in working with data
Interest in knowing more about data quality
Some Python skills are useful for the optional coding videos

Description
All of our decisions are based on data. Our sense organs gather data, our memories are data, and our gut instincts are data. If you want to make good decisions, you need high-quality data.

This course is about data quality: what it means, why it’s important, and how you can increase the quality of your data. In this course, you will learn:
High-level strategies for ensuring high data quality, including terminology, data documentation and management, and the different research phases in which you can check and increase data quality.
Qualitative and quantitative methods for evaluating data quality, including visual inspection, error rates, and outliers. Python code is provided to show how to implement these visualizations and scoring methods using pandas, numpy, seaborn, and matplotlib.
Specific methods and algorithms for cleaning data and rejecting bad or unusual data. As above, Python code is provided to show how to implement these procedures using pandas, numpy, seaborn, and matplotlib.

This course is for:
Data practitioners who want to understand both the high-level strategies and the low-level procedures for evaluating and improving data quality.
Managers, clients, and collaborators who want to understand the importance of data quality, even if they are not working directly with data.
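The quantitative checks and data-rejection methods described above can be sketched in a few lines of pandas and NumPy. This is not the course's downloadable code. The function names are hypothetical, the fraction of missing values stands in for an "error rate," and the thresholds are assumptions: 3 for the z-score and 3.5 for the modified z-score, which is a common convention.

```python
import numpy as np
import pandas as pd


def missing_rate(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing (NaN/None) entries per column, a simple error-rate proxy."""
    return df.isna().mean()


def zscore_outliers(x: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag values whose |z| exceeds `threshold`; z uses the mean and sample std."""
    z = (x - x.mean()) / x.std(ddof=1)
    return z.abs() > threshold


def modified_zscore_outliers(x: pd.Series, threshold: float = 3.5) -> pd.Series:
    """Flag values using the median and the median absolute deviation (MAD).

    The factor 0.6745 rescales the MAD so it is comparable to a standard
    deviation for normally distributed data. If more than half the values
    are identical, the MAD is 0 and the division yields inf/NaN; handle
    that case separately.
    """
    med = x.median()
    mad = (x - med).abs().median()
    m = 0.6745 * (x - med) / mad
    return m.abs() > threshold


# Hypothetical example data: seven plausible readings and one gross error.
values = pd.Series([10, 11, 9, 10, 12, 10, 11, 100], dtype=float)
df = pd.DataFrame({"reading": [1.0, np.nan, 3.0, 4.0], "id": [1, 2, 3, 4]})

print(missing_rate(df))                        # reading 0.25, id 0.0
print(zscore_outliers(values).sum())           # 0: the outlier inflates the std
print(modified_zscore_outliers(values).sum())  # 1: only 100 is flagged
cleaned = values[~modified_zscore_outliers(values)]  # reject flagged values
```

With only eight points, the value 100 inflates the standard deviation enough to hide itself from the plain z-score test. The median and MAD barely move, so the modified z-score still flags it.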


Section 1: Introduction

Lecture 1 Is this course right for you?

Section 2: Download course materials (Python code)

Lecture 2 Download the code

Section 3: Why data quality matters

Lecture 3 Section summary

Lecture 4 Is data or are data??

Lecture 5 On the origins and quality of data

Lecture 6 GIGO (garbage in, garbage out)

Lecture 7 Data quality influences data-driven decisions

Section 4: Ensuring high data quality

Lecture 8 Section summary

Lecture 9 Data management

Lecture 10 Data documentation

Lecture 11 Data audits

Lecture 12 Data cleaning phases

Lecture 13 Improve quality before getting data

Lecture 14 Improve quality during data collection

Lecture 15 Improve quality after data collection

Lecture 16 Improve quality during data analysis

Lecture 17 Risks of biased results

Section 5: Assessing data quality

Lecture 18 Section summary

Lecture 19 Qualitative vs. quantitative quality assessments

Lecture 20 Qualitative assessments via visual inspection

Lecture 21 Code: Visualizing data distributions

Lecture 22 Variance assessments

Lecture 23 Correlations and correlation matrices

Lecture 24 Data error rates

Lecture 25 Sample sizes

Lecture 26 Code: Measuring data quality

Section 6: Data transformations

Lecture 27 Section summary

Lecture 28 Z-score scaling

Lecture 29 Min/max scaling

Lecture 30 Binning (rounding)

Lecture 31 Unit normalization

Lecture 32 Rank transform

Lecture 33 Nonlinear transformations

Lecture 34 Code: Transforming data

Section 7: Outliers and missing data

Lecture 35 Section summary

Lecture 36 What are outliers?

Lecture 37 The z-score method

Lecture 38 The modified z-score method

Lecture 39 Dealing with missing data

Lecture 40 Code: Dealing with bad or missing data

Section 8: Be a high-quality data scientist

Lecture 41 Section summary

Lecture 42 Keeping up with data science developments

Lecture 43 Can you know everything?

Lecture 44 What data scientists want

Section 9: Bonus

Lecture 45 Bonus material
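The transformations listed in Section 6 can be roughly sketched with pandas and NumPy. The function names are hypothetical. The sketch assumes that "unit normalization" means dividing by the Euclidean (L2) norm and uses a log transform as the nonlinear example. The course may define or implement these differently.

```python
import numpy as np
import pandas as pd


def zscore_scale(x: pd.Series) -> pd.Series:
    """Center on the mean and divide by the sample standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)


def minmax_scale(x: pd.Series) -> pd.Series:
    """Linearly map the smallest value to 0 and the largest to 1."""
    return (x - x.min()) / (x.max() - x.min())


def bin_round(x: pd.Series, width: float) -> pd.Series:
    """Bin by rounding to the nearest multiple of `width` (NumPy rounds halves to even)."""
    return np.round(x / width) * width


def unit_normalize(x: pd.Series) -> pd.Series:
    """Assumed meaning: divide by the Euclidean (L2) norm so the vector has length 1."""
    return x / np.linalg.norm(x)


def rank_transform(x: pd.Series) -> pd.Series:
    """Replace values by their ranks (ties get the average rank)."""
    return x.rank(method="average")


def log_transform(x: pd.Series) -> pd.Series:
    """Compress a long right tail; log1p(x) = log(1 + x) requires x > -1."""
    return np.log1p(x)


x = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])  # hypothetical right-skewed data
print(minmax_scale(x).tolist())
print(rank_transform(x).tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

The rank and log transforms reduce the influence of the extreme value 10. The z-score and min/max scalings only shift and stretch the data, so the extreme value stays just as far from the others in relative terms.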

Who this course is for:
Data science practitioners
Data science students
Managers or colleagues who work with data practitioners
