Feathr is an abstraction layer that provides the namespace for defining, computing, serving, and discovering common machine learning features
LinkedIn Engineering recently open-sourced its feature store Feathr, which helps engineers to develop machine Learning products by simplifying feature management and usage in production.
Feathr is the data management layer for machine learning applications. It defines features, computes them for training and inference purposes, and makes them discoverable by other machine learning developers. It helps scale and manage the machine learning products by reducing the common feature generation, maintenance, and observability steps.
As shown in the following picture, machine learning feature generation pipelines need to bring different time-sensitive data sources and join them. These features are persisted in databases or caches for training and inference purposes (real-time or batch). In this process, consistency is essential. Features should be prepared similarly for training and inferencing to avoid inconsistency and leakage in the machine learning models.
Feathr is an abstraction layer that provides the namespace for defining, computing, serving, and discovering common machine learning features. The high-level architecture is like the producer-consumer architecture, where the producers define, generate, and register machine learning features, and consumers use those features in training and inferencing. Feathr has a simple programming model. Developers just provide the names of the features that they want to import and use in their machine learning models. All the other background processes like how everything should be sourced and computed happens in Feathr.
“Under the hood, Feathr figures out how to provide the requested feature data in the required way for model training and production inferencing. For model training, features are computed and joined to input labels in a point-in-time correct way, and for model inferencing, features are pre-materialized and deployed to online data stores for low-latency online serving. Features defined by different teams and projects can easily be used together, enabling collaboration and reuse,” said LinkedIn in a blog post.
As part of this announcement, LinkedIn engineering open-sourced Feathr in Github and made this service available on Azure (Microsoft Cloud Service) for developers.
A feature store is one of the most important services essential in machine learning operations (MLOps). It expedites usage and democratises machine learning-enabled products in any enterprise. There is a special community around this topic which also has its summit.
AWS SageMaker (Amazon Machine learning Service) feature store and Google Cloud Vertex AI are some samples for feature store solutions on public clouds. Also, there are other open-source feature stores for the public like Feast, Databricks Feature Store, and Hopsworks.