Project Overview

Our Client, a leading telecommunications and media company, operates an extensive data ingestion and ETL processing environment and needs to monitor this complex data environment continuously. The Client faces the ongoing challenge of downloading and evaluating huge quantities of customer-activity data spanning the spectrum of services and equipment the company provides. Because of the complexity and quantity of data processed daily, problems in ingestion and ETL processing often led to missed SLAs, impairing the Client's ability to understand and support its customers. CoreValue Data Scientists helped the Client on a number of fronts, including: key metric selection and analysis, data source extensions, equipment failure and prediction analysis, data modeling (in R), ETL monitoring and enhancements, and key reporting and data visualization (Tableau).


Following a thorough analysis and exploration of the Client's data pipeline, CoreValue's Data Scientists suggested a number of enhancements to the overall data environment and then targeted specific trouble spots within it. Working in close cooperation with the Client's team, they quickly created data quality control reports and data visualizations to help identify problem areas. Once these reports had proven their value, CoreValue built several models in R to proactively predict potential data issues and data quality challenges.
Additionally, a number of specific challenges were met, as noted below, producing insights and actionable solutions.

Challenge I:

Because of the complexity and quantity of data analyzed by our Client, job failures and data anomalies caused nightly processing to fail on many occasions. In many cases, by the time the Client realized there was an issue, it was too late to remediate and reprocess the data within the agreed-upon SLAs. To ensure nightly data ingestion and ETL processes completed within the agreed time frames, the Client asked us to predict the likelihood of failure of key nightly processing jobs.


After a deep analysis of historical job process data and a complete Exploratory Data Analysis, the CoreValue team identified the key relevant variables and created models in R that produce an early failure alarm. This solution allows processing issues to be predicted early and, in many cases, resolved before SLAs are impacted. The solution was automated and runs nightly.
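The production models were built in R on the Client's own job metrics, which are not public. As an illustrative sketch only, the same early-alarm idea can be shown with a logistic-regression classifier trained on hypothetical job features (runtime, input volume, upstream retries — all invented here for the example):

```python
import math

# Hypothetical training history: (runtime_hours, input_rows_tens_of_millions,
# upstream_retries) for a nightly job, labeled 1 if the job later missed its SLA.
history = [
    ((0.70, 0.80, 0), 0), ((0.76, 0.83, 1), 0), ((0.85, 0.80, 0), 0),
    ((0.78, 0.82, 0), 0), ((0.73, 0.84, 1), 0),
    ((1.47, 0.99, 4), 1), ((1.59, 1.02, 5), 1),
    ((1.50, 1.00, 3), 1), ((1.65, 1.05, 6), 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, lr=0.1, epochs=2000):
    """Fit a logistic-regression failure model with plain stochastic gradient descent."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in rows:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def failure_risk(model, x):
    """Probability that a job with features x will miss its SLA."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

model = train(history)
print(round(failure_risk(model, (1.52, 1.00, 4)), 3))  # high-risk job profile
print(round(failure_risk(model, (0.77, 0.82, 0)), 3))  # low-risk job profile
```

In a nightly run, scores above a chosen threshold would raise the early alarm so the operations team can intervene before the SLA window closes.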

Challenge II:

Due to data collection issues created by network equipment cutovers, a significant set of valuable data (billions of records) was affected by "Data Level Changes" that made analysis significantly harder. To allow this data to be analyzed effectively, CoreValue's team was asked to identify these data shifts and provide data sets unaffected by the step changes.


Following consultation with the Client's team, CoreValue Data Scientists performed a time series analysis and created a model to automatically identify changes in the data. The team developed a customized algorithm to predict data shifts resulting from equipment changes over time. The successful delivery of this Data Science project by CoreValue has significantly reduced the resources and time needed to analyze this important data source.
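The team's actual model and thresholds are proprietary; a minimal sketch of the underlying level-shift idea, assuming a simple rolling-window comparison on a synthetic series, might look like this:

```python
def detect_level_shifts(series, window=5, threshold=3.0):
    """Flag indices where the mean of the next `window` points jumps by more
    than `threshold` standard deviations relative to the prior window."""
    candidates = []  # (index, z-score) pairs for every flagged position
    for i in range(window, len(series) - window + 1):
        before = series[i - window:i]
        after = series[i:i + window]
        mean_b = sum(before) / window
        mean_a = sum(after) / window
        var_b = sum((x - mean_b) ** 2 for x in before) / window
        std_b = max(var_b ** 0.5, 1e-9)  # guard against a constant window
        z = abs(mean_a - mean_b) / std_b
        if z > threshold:
            candidates.append((i, z))
    # Collapse runs of adjacent flags into one shift at the strongest point.
    shifts = []
    prev_idx = None
    for idx, z in candidates:
        if prev_idx is not None and idx - prev_idx <= 1:
            if z > shifts[-1][1]:
                shifts[-1] = (idx, z)
        else:
            shifts.append((idx, z))
        prev_idx = idx
    return [idx for idx, _ in shifts]

# Synthetic example: a flat series whose level steps up at index 20.
series = [10.0] * 20 + [25.0] * 20
print(detect_level_shifts(series))
```

Once the shift points are known, the series can be split into segments between shifts, yielding the internally consistent data sets the Client needed for analysis.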

Challenge III:

An external service delivery device was failing and causing customer dissatisfaction. The Client asked our Data Science team to investigate the situation, identify the root cause of the failures, and determine why the devices were malfunctioning.


CoreValue scientists performed a thorough analysis of log data acquired from these devices. Using a feature detection algorithm, the CoreValue team identified the combinations of technical features most likely to be causing the device failures. Based on this analysis, the device firmware team was able to locate the root cause of the failures and deploy a patch that significantly improved device uptime.
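The specific algorithm and device fields are the team's own; a toy sketch of the general approach, assuming invented log fields (`firmware`, `wifi_band`, `failed`), is to rank feature-value combinations by their observed failure rate:

```python
from collections import defaultdict

# Hypothetical parsed log records: device feature settings plus a failure flag.
logs = [
    {"firmware": "2.1", "wifi_band": "5GHz", "failed": True},
    {"firmware": "2.1", "wifi_band": "2.4GHz", "failed": False},
    {"firmware": "2.0", "wifi_band": "5GHz", "failed": False},
    {"firmware": "2.1", "wifi_band": "5GHz", "failed": True},
    {"firmware": "2.0", "wifi_band": "2.4GHz", "failed": False},
    {"firmware": "2.1", "wifi_band": "5GHz", "failed": False},
]

def rank_feature_combinations(records, features):
    """Return (combination, failure_rate, sample_count) sorted by failure rate."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for rec in records:
        key = tuple((f, rec[f]) for f in features)
        totals[key] += 1
        if rec["failed"]:
            failures[key] += 1
    ranked = [(key, failures[key] / totals[key], totals[key]) for key in totals]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

top = rank_feature_combinations(logs, ["firmware", "wifi_band"])[0]
print(top)  # the combination with the highest observed failure rate
```

In practice the highest-rate combinations become the hypotheses handed to the firmware team, with the sample counts indicating how much evidence backs each one.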


CoreValue Data Scientists and the Client's team worked closely to create an effective Big Data environment capable of providing key insights to business stakeholders. In addition to enhancing data quality and availability, the team continues to perform analyses that improve efficiency and customer satisfaction.