What Is Data Cleaning and Why Do We Need It?

In data science, as we know, a large share of our time goes into data cleaning. So today I will share some suggestions on cleaning data and the precautions we should take while doing it.

First, what is data cleaning? Put simply, data cleaning is the process of detecting incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting that dirty data. Data cleaning is also called data cleansing, so don’t be confused if someone asks about data cleansing. It is the same thing.
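To make the definition concrete, here is a minimal sketch in plain Python of the detect-then-replace/modify/delete idea. Everything in it is an assumption made for illustration: the customer records, the field names, the 0 to 120 age range, and the rule that a record without an email gets deleted. Real cleaning rules depend on your own data.

```python
from typing import Optional

# Hypothetical customer records; field names and values are made up for illustration.
raw_records = [
    {"name": "Asha", "email": "asha@example.com", "age": "34", "notes": "vip"},
    {"name": "Ravi", "email": None, "age": "29", "notes": ""},                  # incomplete
    {"name": "Meena", "email": "meena@example.com", "age": "-5", "notes": ""},  # incorrect
    {"name": " john ", "email": "JOHN@EXAMPLE.COM", "age": "41", "notes": ""},  # inconsistent
]


def clean_record(record: dict) -> Optional[dict]:
    """Return a cleaned copy of the record, or None if it should be deleted."""
    # Delete: in this sketch a record without an email is treated as unusable.
    if not record.get("email"):
        return None

    cleaned = {
        # Modify: normalise whitespace and letter case.
        "name": record["name"].strip().title(),
        "email": record["email"].strip().lower(),
    }

    # Replace: an unparseable or out-of-range age becomes None (explicitly unknown).
    try:
        age = int(record["age"])
    except (TypeError, ValueError):
        age = None
    cleaned["age"] = age if age is not None and 0 <= age <= 120 else None

    # Irrelevant: the "notes" field is dropped simply by not copying it.
    return cleaned


cleaned_records = [c for c in map(clean_record, raw_records) if c is not None]

for row in cleaned_records:
    print(row)
```

The record with the missing email is deleted, the negative age is replaced with None, and the name and email with inconsistent formatting are modified into a standard form.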

Let's start with why we need to clean data, using some examples.

From a business perspective:

  • Sales: A sales representative may fail to contact previous customers because their data is incomplete or inaccurate.
  • Operations: Configuring robots and other production machines based on low-quality operational data can cause major problems for manufacturing companies.

We can find similar examples from an industry perspective:

  • Accounting & finance: Inaccurate and incomplete data can lead to regulatory breaches, delayed decisions due to manual checks, and sub-optimal trade strategies.
  • Manufacturing & Logistics: Inventory valuations depend on accurate data. If data is missing or inconsistent, this may lead to delivery problems and unsatisfied customers.

If the organization had clean data, all of these situations and the problems that come with them could be avoided.

We will publish more posts on data cleaning in the coming days.

Sunilkumar Prajapati

Data Science enthusiast | Machine Learning | Deep Learning | Data Analyst | Software Support Engineer