Securing Data Pipelines at Scale: Case Study  


Data leaders agree that as organisations adopt data-driven decision-making and move towards digital maturity, data pipelines have become increasingly complex. Data pipelines are critical to power and grow modern enterprises.

A data pipeline is the journey from raw data to its destination, typically a data lake or warehouse. Along the journey, transformation logic is applied to the data to make it ready for analysis, and at the destination, the data is analysed for actionable insights. Along the entire journey, data observability helps an organisation to fully understand the health of the data in the system.
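To make the idea concrete, here is a minimal sketch of a pipeline that moves rows from a raw source through transformation stages while recording simple health metrics at each stage. The names `StageMetrics`, `run_pipeline`, `drop_incomplete` and `add_revenue`, and the sample rows, are hypothetical and chosen only for illustration; they do not represent Acceldata's products or any specific tool.

```python
from dataclasses import dataclass


@dataclass
class StageMetrics:
    """Row counts observed at one pipeline stage (illustrative only)."""
    name: str
    rows_in: int = 0
    rows_out: int = 0

    @property
    def drop_rate(self) -> float:
        # Share of incoming rows that did not leave the stage.
        return 0.0 if self.rows_in == 0 else 1 - self.rows_out / self.rows_in


def run_pipeline(raw_rows, stages):
    """Apply each (name, function) stage in order and collect metrics."""
    rows = list(raw_rows)
    metrics = []
    for name, transform in stages:
        stage = StageMetrics(name, rows_in=len(rows))
        rows = transform(rows)
        stage.rows_out = len(rows)
        metrics.append(stage)
    return rows, metrics


def drop_incomplete(rows):
    # Transformation: discard rows with a missing price.
    return [r for r in rows if r.get("price") is not None]


def add_revenue(rows):
    # Transformation: derive a revenue column for analysis.
    return [{**r, "revenue": r["price"] * r["impressions"]} for r in rows]


# Hypothetical raw data standing in for the source system.
raw = [
    {"price": 2.0, "impressions": 10},
    {"price": None, "impressions": 5},
    {"price": 0.5, "impressions": 4},
    {"price": 1.0, "impressions": 3},
]

warehouse, metrics = run_pipeline(
    raw, [("drop_incomplete", drop_incomplete), ("add_revenue", add_revenue)]
)
for stage in metrics:
    print(stage.name, stage.rows_in, stage.rows_out, stage.drop_rate)
```

A sudden rise in a stage's drop rate is the kind of signal an observability layer would surface before the problem reaches the reports built on the destination data.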

Acceldata’s products offer enterprise solutions that deliver data observability at scale. Observability at this level improves the productivity of data engineering teams, speeds up application development, reduces data-related incidents and lowers application development costs. Here’s how the team at Acceldata delivers data success so companies can grow:

Reducing infrastructure costs

PubMatic, one of the largest AdTech companies in the US, empowers independent app developers and publishers to control and maximise their digital advertising businesses. The platform also enables advertisers to drive ROI by reaching and engaging their target audiences in brand-safe, premium environments across ad formats and devices. Since 2006, PubMatic has built infrastructure that includes eight global data centres.

As of December 2020, PubMatic had served 171 billion ad impressions, handled a trillion advertiser bids, and processed more than 2 petabytes of new data daily.

The challenge was that PubMatic consistently experienced high Mean Time to Resolution (MTTR), frequent outages and performance bottlenecks because of its massively scaled environment. That environment included more than 3,000 nodes, more than 150 petabytes of data and over 65 Hortonworks Data Platform (HDP) clusters running in hyper-scale mode. PubMatic uses YARN, Kafka, Spark, HBase and open-source HDP. The company was also planning to expand further.
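For readers unfamiliar with the metric, MTTR is the total time spent resolving incidents divided by the number of incidents. The sketch below shows that arithmetic; the function name `mean_time_to_resolution` and the incident timestamps are made up for illustration and are not PubMatic data.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average of (resolved - opened) over a list of (opened, resolved) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (resolved - opened for opened, resolved in incidents),
        timedelta(),  # start value so timedeltas can be summed
    )
    return total / len(incidents)


# Hypothetical incident log: (opened, resolved).
incidents = [
    (datetime(2020, 1, 1, 9, 0), datetime(2020, 1, 1, 11, 0)),   # 2 h
    (datetime(2020, 1, 3, 14, 0), datetime(2020, 1, 3, 18, 0)),  # 4 h
    (datetime(2020, 1, 7, 8, 0), datetime(2020, 1, 7, 9, 30)),   # 1.5 h
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 2:30:00
```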

An initial evaluation of the situation revealed that multiple issues stemmed from the large number of nodes. The existing data system’s performance could not keep pace with the company’s rapidly expanding business requirements. The system’s instability resulted in time-consuming operational issues and daily firefighting. The inability to correlate events across the infrastructure, data layers and pipelines meant that PubMatic could not materially improve its “cost per ad impression” metric, one of the most critical performance metrics for its business.

Moreover, the engineering team’s constant involvement in resolving operational issues distracted it from the real objective of scaling the data system to support the fast-growing business requirements.

The resolution came in the form of Acceldata’s Pulse product, which immediately provided improved visibility into the inner workings of PubMatic’s data applications and comprehensive observability for complex, interconnected data systems. It brought the ability to predict and prevent problems and to optimise the data system’s performance at the scale that today’s digital ad market requires.

Within PubMatic’s environment, the product was able to isolate bottlenecks and automate performance improvements. The product distinguished between mandatory and unnecessary data to ensure scaled growth that could reliably support all critical enterprise and customer-facing analytics requirements.

The result was that PubMatic was able to reduce the “cost per ad impression”, improve the reliability of its data pipelines and cut expenses on unnecessary software licences. In addition, the company was able to eliminate day-to-day engineering firefighting on outages and performance degradation issues, decrease OEM support costs, optimise the Hadoop Distributed File System (HDFS) to reduce its block footprint by 30 per cent, and consolidate its Kafka clusters to save infrastructure costs.
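The article does not say how the HDFS block footprint was reduced. As a rough illustration of what the metric measures, the sketch below counts the blocks a set of files occupies and shows how consolidating many small files lowers that count. It assumes the default HDFS block size of 128 MB; the helper `block_count` and the file sizes are hypothetical, and small-file consolidation is only one possible technique, not necessarily the one used here.

```python
import math

BLOCK_SIZE = 128 * 1024 ** 2  # default HDFS block size in bytes (assumed)


def block_count(file_sizes, block_size=BLOCK_SIZE):
    """Number of HDFS blocks needed for files of the given sizes in bytes.

    Each non-empty file occupies at least one block, because blocks are
    never shared between files.
    """
    return sum(math.ceil(size / block_size) for size in file_sizes if size > 0)


MB = 1024 ** 2
small_files = [16 * MB] * 10       # ten hypothetical 16 MB files
merged_file = [sum(small_files)]   # the same data as one 160 MB file

before = block_count(small_files)  # one block per small file
after = block_count(merged_file)   # 160 MB spans two 128 MB blocks
reduction = 1 - after / before

print(before, after, f"{reduction:.0%}")
```

Fewer blocks mean less metadata for the NameNode to track, which is why block footprint is a common target when tuning large Hadoop clusters.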

Dropping downtime

True Digital, a business unit of True Corporation, is a multinational digital technology company. With solutions in artificial intelligence (AI), big data, blockchain, Internet of Things (IoT) and robotics, the company enables digital transformation for consumers, merchants and enterprises.

The challenge was that True Digital had extensive data system performance issues that regularly left 50 per cent of ingested data unprocessed. In addition, the data operations team had limited visibility into their data pipelines and found themselves fighting multiple Sev 1 (a critical incident with very high impact) issues that caused system slowdowns and unplanned outages.

By way of background, True Digital’s data systems handle more than 500 million user impressions per month while streaming approximately 69,000 messages per second. To support an ever-increasing volume of users and activity, the data operations team managed a technical environment of more than 100 nodes and over eight petabytes, based on Hadoop, Hive, Spark, Ranger, Kafka and open-source HDP.

Here, the data ops team was sidetracked into managing daily system performance, taking them away from achieving business goals like scaling their data infrastructure to meet expanding business requirements.

Acceldata stepped in to isolate bottlenecks and automate performance improvements. The resolution included identifying unnecessary data to ensure that the data lake of over eight petabytes could reliably support all critical enterprise analytics requirements. Moreover, Pulse enabled True Digital to reduce the time required to produce critical daily business reports and to eliminate unplanned outages for a record seven months and counting.

As a result, HDFS storage was optimised by approximately 2 per cent, which allowed the company to expand system capacity without expanding infrastructure, saving over $1 million in projected CapEx. True Digital also reduced its annual software costs by over $2 million by identifying overprovisioned and unnecessary software licences.