Make Data Science an Essential Part of Your Advanced Data Quality Program
September 8, 2016

Data quality is key to the productivity of any business. Bad data can have a significant negative impact on decision-making, slow down progress, and cost a fortune to fix.

Based on work pioneered by Thomas C. Redman, CoreValue understands the importance of implementing an advanced data quality program that uses data quality metrics to identify areas for improvement and to ensure resilient, consistently high data quality. This matters especially in large environments, where even small wins can add up to large savings. To provide an effective program for its clients, CoreValue follows these steps:

  • Identify a problem
  • Identify the impact of the problem
  • Build a model for reprocessing data
  • Reintegrate data
  • Update all reports

To illustrate the point, one of CoreValue’s clients needed to continuously download and evaluate huge quantities of data. The data was generated by customers’ activities across a wide spectrum of services, as well as by on-premise equipment provided by the company. Due to the complexity and volume of data processed daily, processing often missed service level agreements (SLAs), which impaired the client’s ability to understand and support its customers. It was critical for the client to establish effective data management and consistent reporting across all data sources.

By applying the five elements of an advanced data quality program noted above, CoreValue delivered a 360-degree view of the system’s data quality that identified data problems and their impact, allowed for the reprocessing and reintegration of all data, and updated all reports based on the newly cleaned data.

This also allowed us to address the individual items in depth. For example, data modeling encompasses the following:

  • Database capacity prediction. Predictive analysis uses statistical techniques and historical data to forecast future capacity. One of the most common methods in predictive modeling is linear regression. Unfortunately, applying regression directly is challenging because behavior changes over time: system administrators may change retention policies or simply delete data, which leads to poor predictions. Significantly more accurate models were obtained by finding the optimal subset of “clean” data for each database and applying linear regression to only that subset.
  • Automated anomaly detection. Because “big data” needs effective anomaly detection, we proposed enhancements that enable real-time anomaly identification. Using R and PostgreSQL, we built an alarm system to monitor jobs from the Scheduler so that users could react to issues immediately. The alarm system used storage, a model, and shell scripts to perform checks, then sent an alarm whenever an anomaly was detected. These alarms monitor upper and lower thresholds for the start time and duration of each module, for every weekday.
  • Uniform dashboards. Using Tableau for reporting, we created uniform dashboards for all systems, making correlations between different metrics easier to understand. In the reports we monitor:

          – KPIs

          – Metrics crossing into unacceptable ranges

          – Unexpected changes or trends

          – Variance in metrics data

          – Data sliced by host, cluster, geographical tags, etc.
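The clean-subset regression described in the first bullet above can be sketched as follows. The function names, the drop-detection heuristic (a size decrease marks deleted data or a retention change), and the sample data are illustrative assumptions, not the client’s actual implementation:

```python
def _linreg(xs, ys):
    """Ordinary least-squares fit of a line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_capacity_trend(days, sizes_gb, min_points=5):
    """Fit a linear trend to the most recent run of 'clean' data.

    A drop in database size suggests a retention-policy change or a
    bulk delete, so the regression uses only the points after the
    last such drop; they reflect current growth behavior.
    """
    start = 0
    for i in range(1, len(sizes_gb)):
        if sizes_gb[i] < sizes_gb[i - 1]:
            start = i  # last observed drop wins
    if len(sizes_gb) - start < min_points:
        start = max(0, len(sizes_gb) - min_points)  # fall back to recent tail
    return _linreg(days[start:], sizes_gb[start:])

# Daily sizes in GB, with a retention cleanup on day 5:
days = list(range(10))
sizes = [10, 12, 14, 16, 18, 8, 10, 12, 14, 16]
slope, intercept = fit_capacity_trend(days, sizes)
forecast_gb = slope * 40 + intercept  # extrapolate the clean segment to day 40
```

Fitting only the post-drop segment keeps the forecast aligned with current growth instead of averaging across unrelated regimes.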
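The per-weekday threshold alarms described in the second bullet can be sketched in a few lines. The original system was built with R and PostgreSQL; this Python version, its function names, and the mean ± 3·stddev band are illustrative assumptions:

```python
from statistics import mean, stdev

def weekday_thresholds(history, k=3.0):
    """Per-weekday (lower, upper) duration thresholds: mean +/- k * stddev.

    `history` maps a weekday (0=Mon .. 6=Sun) to a list of past job
    durations in minutes; at least two samples are needed for a stddev.
    """
    return {
        wd: (mean(durs) - k * stdev(durs), mean(durs) + k * stdev(durs))
        for wd, durs in history.items()
        if len(durs) >= 2
    }

def check_run(thresholds, weekday, duration_min):
    """True if the run falls inside the band; False should raise an alarm."""
    lo, hi = thresholds[weekday]
    return lo <= duration_min <= hi

history = {0: [30, 32, 31, 29, 33]}   # Monday durations in minutes
bands = weekday_thresholds(history)
check_run(bands, 0, 31)   # within the band
check_run(bands, 0, 60)   # outside the band: anomalous
```

The same check applies to start times; maintaining separate bands per weekday captures weekly seasonality, such as heavier Monday batch loads.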

As a result of the newly implemented data quality program, our client was able to save almost half a million dollars in processing time.

Obviously, the savings depend on the complexity of the environment, the volume of data, and the overall amount of processing applied to it. Depending on scale, a data quality program can save anywhere from a few hundred thousand dollars to several million.

In any case, data quality analysis is crucial to the success of any system of record used for official reporting.


Data Scientist, CoreValue


