Arize Debuts Data Lake Connectors


Launch enables machine learning (ML) teams to transform and analyse real-time data from their data lake without the need for complex real-time data pipelines

Arize AI, a market leader in machine learning observability, today launched a data lake connectivity solution for BigQuery, Delta Lake, Redshift, and Snowflake. With Arize Data Lake Connectors, Arize clients who keep inference data in a centralised store can connect their ML tables directly to Arize for robust model observability.

Arize leads the industry in both the number of models and the volume of predictions monitored, topping billions of predictions daily. To date, ML observability platforms have struggled to keep deployment simple while handling billions of predictions and complex monitoring tasks such as embedding drift detection. The new data connector release extends customers' integration options to the most widely used data lakes.

The launch comes as the ML ecosystem begins to converge on a small number of MLOps architectures. One modern approach to ML data architecture centres on storing inference data in a data lake. ML teams are designing these ML data lakes to power both feature stores for feature serving and inference stores for analysis.

Arize Data Lake Connectors are designed to fit seamlessly into modern data lake architectures. The advantages of connecting directly to the ML data store include:

  • Teams can work from a single source of truth
  • Integration and onboarding are faster and easier
  • Cost savings can be significant compared with other approaches to ML monitoring

“The growing pool of ML data that is stored and used for ad hoc operational analysis is largely sitting untapped by ML engineering teams,” notes Jason Lopatecki, CEO and co-founder of Arize. “That data, when connected to Arize, empowers iterative workflows around model performance analysis and data improvement – ultimately saving teams time and improving the ROI on AI investments.”

Arize already integrates with cloud storage providers (including Amazon Web Services, Google Cloud Platform, and Microsoft Azure), Python pipelines through an SDK, and Kafka streaming. With today's launch, it is now easier than ever for data lake users to access real-time model analytics. Arize's built-in connectors are fully managed as part of its cloud and virtual private cloud (VPC) platform. This removes the need for users to build and maintain complicated data pipelines or adopt a separate ETL tool, while enabling real-time model performance analysis and monitoring.