What Is Data Profiling?

Before you start any data integration or data analysis project (indeed, before you start any project involving data), you need to understand what data you really have, not what data you think you have.

Data Profiling is about building that understanding, and validating everyone’s assumptions about what data you have and what uses it can be put to.

Many data projects start off with data that was collected for one purpose and is now being put to some new, unanticipated use. Data Profiling is about finding gaps in your data that you may need to fill. It’s about finding out what uses the data will actually support. And most importantly of all, it’s about flagging these issues early in your project, before they become critical.

An issue found and fixed in the analysis stage of a project can be hundreds of times cheaper to resolve than one found during the testing or, worse, the rollout phase.

As such, Data Profiling is an essential first step for any data project, not just because it tells you what data you have, but because it is a quick way to uncover problems early, while they can still be addressed cost-effectively.

Traditionally, Data Profiling, if it has been done at all, has been a manual, error-prone process. But there are now Data Profiling tools and processes that do the heavy lifting, leaving you free to apply your knowledge of your business and your requirements, and to focus your time and energy where it really matters.

But we still haven’t truly said what Data Profiling is. At its simplest, Data Profiling is a collection of statistics and checks, simple both to generate and to understand, which you run against your data to find issues, outliers, missing data, or anomalies: all items that you need to address, or at least be aware of, as your project progresses.
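To make the idea of "statistics and checks" concrete, here is a minimal sketch in Python of the kind of per-column profile described above. It is an illustration only, not the method of any particular profiling tool: the function names profile_column and profile_rows are hypothetical, blank strings and None are assumed to mean "missing", and the outlier check uses the common 1.5 x interquartile-range rule, which is just one of many possible choices.

```python
from collections import Counter
from statistics import quantiles


def profile_column(values):
    """Return simple profile statistics for one column of raw values.

    Hypothetical helper for illustration; not taken from any real tool.
    """
    # Assumption: None and blank strings count as missing data.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }

    # Numeric checks only apply to numeric values (bool is excluded on purpose).
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if len(numeric) >= 4:
        q1, _median, q3 = quantiles(numeric, n=4)
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
        # Values far outside the middle 50% are flagged for review, not removed.
        profile["outliers"] = [v for v in numeric if v < low or v > high]
    return profile


def profile_rows(rows):
    """Profile every column found in a list of dict-shaped records."""
    columns = sorted({key for row in rows for key in row})
    return {
        col: profile_column([row.get(col) for row in rows])
        for col in columns
    }
```

Calling profile_rows on a list of records (for example, rows read with csv.DictReader and converted to numbers where appropriate) returns one small dictionary per column. Each non-zero "missing" count or non-empty "outliers" list is a question to take back to whoever knows how the data was collected.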

And while it would be great to have a business expert sitting alongside you during this process, you can quickly find issues and create a meaningful list of questions about any dataset with minimal prior knowledge. Of course, to get the best out of your data, you are going to need some knowledge of how the data was collected, the business needs, and any implicit assumptions.

Data Profiling is a fast, intuitive and cost-effective process. With the right tools anyone can do it, and gain real insight very quickly.

By: Colin Ross

Article Directory: http://www.articledashboard.com

Citrus Technology provide Data Quality and Data Profiling tools to help you understand your data and find patterns, issues and opportunities. Visit our website for a free trial of our software.