Oracle announced that MySQL HeatWave now supports in-database machine learning (ML) in addition to the previously available transaction processing and analytics, making it the only MySQL cloud database service to do so.
MySQL HeatWave ML fully automates the ML lifecycle and stores all trained models inside the MySQL database, eliminating the need to move data or the model to a machine learning tool or service.
Eliminating extract, transform, and load (ETL) steps reduces application complexity, lowers cost, and improves data and model security. HeatWave ML is included with the MySQL HeatWave database cloud service in all 37 Oracle Cloud Infrastructure (OCI) regions.
Adding machine learning capabilities to MySQL applications has been prohibitively difficult and time-consuming for many developers. First, data has to be extracted from the database and moved into another system to create and deploy ML models. This approach creates multiple silos for applying machine learning to application data and introduces latency as data moves around.
It also leads to the proliferation of data outside the database, making it more vulnerable to security threats, and forces developers to program in multiple environments. Second, existing services expect developers to be experts in guiding the ML model training process; otherwise, the model is sub-optimal, which degrades the accuracy of predictions. Finally, most existing ML solutions don’t include functionality to explain why the models developers build deliver specific predictions.
MySQL HeatWave ML solves these problems by natively integrating machine learning capabilities inside the MySQL database, eliminating the need to ETL the data to another service. HeatWave ML fully automates the training process and creates a model with the best algorithm, optimal features, and the optimal hyper-parameters for a given data set and a specified task. All models generated by HeatWave ML can provide model and prediction explanations.
No other cloud database vendor provides such advanced ML capabilities directly inside their database service. Oracle published ML benchmarks performed across many publicly available machine learning classification and regression datasets such as Numerai, Namao, and Bank Marketing, among others. On average, on the smallest cluster, HeatWave ML trains machine learning models 25 times faster at one per cent of the cost of Redshift ML.
Additionally, the performance advantage over Redshift ML increases when training is done on a larger HeatWave cluster. Training is a time-consuming process, and since it can be done very efficiently and rapidly with MySQL HeatWave, customers can now retrain their models more often and keep up with data changes. This keeps the models up-to-date and improves the accuracy of predictions.
“Just as we integrated analytics and transaction processing within a single database, we are now bringing machine learning inside MySQL HeatWave,” said Edward Screven, Oracle’s chief corporate architect. “MySQL HeatWave is one of the fastest-growing cloud services at Oracle. Many customers have migrated from Amazon and other cloud database services to MySQL HeatWave and have gained significant performance improvements and lower costs. Today, we are also announcing several other innovations which enrich HeatWave’s capabilities, improve availability, and lower the cost. Our new and fully transparent benchmark results again demonstrate that Snowflake, AWS, Microsoft, and Google are slower and more expensive than MySQL HeatWave by a large margin.”
HeatWave ML offers the following capabilities:
Fully automated model training: All of the different stages in creating a model with HeatWave ML are fully automated and do not require any intervention from developers. This results in a more accurately tuned model that requires no manual work, and the training process is always completed. Other cloud database services such as Amazon Redshift provide integration with machine learning capabilities in external services, which require extensive manual inputs from developers during the ML training process.
Model and inference explanations: Model explainability helps developers understand the behaviour of a machine learning model. For example, if a bank denies a client a loan, the bank needs to determine which model parameters were taken into account or if the model contains any bias. Prediction explainability is a set of techniques that help answer why a machine learning model made a specific prediction. Prediction explanations are becoming increasingly important as companies must be able to explain the decisions made by their machine learning models.
HeatWave ML integrates both model explanations and prediction explanations as a part of its model training process. As a result, all models created by HeatWave ML can offer model and inference explanations without requiring the training data at the time an explanation is generated. Oracle has augmented existing explanation techniques to improve performance, interpretability, and quality. Other cloud database services do not offer such rich explainability for all of their machine learning models.
Hyper-parameter tuning: HeatWave ML implements a new gradient search-based reduction algorithm for hyper-parameter tuning. This enables the hyper-parameter search to be executed in parallel without compromising the model accuracy. Hyper-parameter tuning is the most time-consuming stage of ML model training. This unique capability provides HeatWave ML with a significant performance advantage over other cloud services for building machine learning models.
Algorithm selection: HeatWave ML uses the notion of proxy models, which are simple models exhibiting the properties of a full complex model, to determine the best ML algorithm for training. Using a simple proxy model, algorithm selection is done very efficiently without loss of accuracy. No other database service for machine learning offers this proxy modelling capability.
Intelligent data sampling: HeatWave ML samples a small percentage of the data to improve performance during model training. This sampling is done so that all representative data points are captured in the sample data set. Other cloud services for building machine learning models take a less efficient approach, random data sampling, which samples a small percentage of the data without considering its distribution characteristics.
Feature selection: Feature selection helps determine which attributes of the training data influence the behaviour of the machine learning model when making predictions. The techniques in HeatWave ML for feature selection have been trained over a broad swath of data sets across multiple domains and applications. HeatWave ML can efficiently identify the relevant features in a new data set from these gathered statistics and meta information.
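The contrast drawn above between distribution-aware sampling and plain random sampling can be illustrated with a short sketch. This is not Oracle's algorithm, which the announcement does not describe; it swaps in ordinary stratified sampling by class label as a stand-in, and the function names, the data set, and the 5 per cent rate are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, label_of, percent, seed=0):
    """Take about `percent` % of each class, keeping at least one row per class.

    Illustrative stand-in for distribution-aware sampling; not HeatWave ML's method.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    sample = []
    for group in by_label.values():
        # Integer ceiling division, so a small class is never rounded down to zero.
        k = max(1, -(-len(group) * percent // 100))
        sample.extend(rng.sample(group, k))
    return sample


def random_sample(rows, percent, seed=0):
    """Plain random sampling: ignores how the labels are distributed."""
    rng = random.Random(seed)
    k = max(1, -(-len(rows) * percent // 100))
    return rng.sample(rows, k)


# Hypothetical imbalanced data set: 990 "no" rows and 10 "yes" rows.
rows = [("no", i) for i in range(990)] + [("yes", i) for i in range(10)]

stratified = stratified_sample(rows, label_of=lambda r: r[0], percent=5)
uniform = random_sample(rows, percent=5)

stratified_counts = Counter(label for label, _ in stratified)
uniform_counts = Counter(label for label, _ in uniform)
print("stratified:", dict(stratified_counts))
print("random:    ", dict(uniform_counts))
```

The stratified sample is guaranteed to contain the rare class, while the random sample only contains it by chance, which is the weakness the announcement attributes to random sampling.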
In addition to machine learning capabilities, Oracle released more innovations to the MySQL HeatWave service. Real-time elasticity enables customers to upsize and downsize their HeatWave cluster to any number of nodes without downtime or read-only time and without manually rebalancing the cluster. Also included is data compression, which enables customers to process twice the amount of data per node and lowers costs by nearly 50 per cent, while maintaining the same price-performance ratio.
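The link between doubling the data per node and a saving of nearly 50 per cent can be sketched with simple arithmetic. The sketch assumes that cost scales only with the number of nodes; the data sizes and per-node capacity are hypothetical and are not taken from Oracle's pricing.

```python
def nodes_needed(data_gb, capacity_gb_per_node):
    """Smallest whole number of nodes that can hold the data (integer ceiling division)."""
    return -(-data_gb // capacity_gb_per_node)


def saving_from_compression(data_gb, capacity_gb_per_node):
    """Fractional cost saving if compression doubles the data each node can hold.

    Assumes cost is proportional to node count (a simplification).
    """
    before = nodes_needed(data_gb, capacity_gb_per_node)
    after = nodes_needed(data_gb, 2 * capacity_gb_per_node)
    return before, after, 1 - after / before


# Hypothetical workload: 4,000 GB of data, 400 GB per node before compression.
even_case = saving_from_compression(4000, 400)

# Hypothetical workload whose size does not divide evenly across the larger nodes.
uneven_case = saving_from_compression(4400, 400)

print(even_case)
print(uneven_case)
```

In this simplified model the saving is exactly half when the data divides evenly across nodes and slightly less when the node count has to be rounded up.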
Finally, a new pause-and-resume function enables customers to pause HeatWave to save costs. Upon resuming, the data and the statistics needed for MySQL Autopilot are automatically reloaded into HeatWave.