
Open Source SkyPilot Targets Cloud Cost Optimisation For ML And Data Science

December 5, 2022

A team of researchers at the RISELab at UC Berkeley recently released SkyPilot, an open-source framework for running machine learning workloads on the major cloud providers through a unified interface. The project focuses on cost optimisation: it automatically finds the cheapest availability zone, region, and cloud provider that offers the requested resources.

Given a job's resource requirements, the framework automatically determines which locations on AWS, Azure, and Google Cloud offer the required compute (CPUs, GPUs, or TPUs) and which of them is the most affordable. SkyPilot then performs three main tasks: it provisions the cluster, automatically failing over to other locations if it hits capacity or quota errors; it synchronises the user's code and files to the cluster; and it manages job queueing and execution.
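The "cheapest feasible location first, fail over on capacity or quota errors" behaviour described above can be illustrated with a minimal sketch. This is not SkyPilot's implementation or API: `Offering`, `CapacityError`, `candidate_locations`, `provision_with_failover`, and `fake_provision` are hypothetical names, the catalog is invented, and the hourly prices are made-up placeholders rather than real cloud pricing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Offering:
    """Hypothetical catalog entry: one location offering an accelerator."""
    cloud: str
    region: str
    zone: str
    accelerator: str
    count: int
    price_per_hour: float  # made-up placeholder price, not real pricing


class CapacityError(Exception):
    """Raised by a provisioner when a location lacks capacity or quota."""


def candidate_locations(catalog: Iterable[Offering], accelerator: str, count: int) -> list[Offering]:
    # Keep only locations that can satisfy the request, cheapest first.
    feasible = [o for o in catalog if o.accelerator == accelerator and o.count >= count]
    return sorted(feasible, key=lambda o: o.price_per_hour)


def provision_with_failover(
    catalog: Iterable[Offering],
    accelerator: str,
    count: int,
    provision: Callable[[Offering], None],
) -> Optional[Offering]:
    """Try the cheapest feasible location; on a capacity/quota error, try the next one."""
    for offering in candidate_locations(catalog, accelerator, count):
        try:
            provision(offering)
            return offering
        except CapacityError:
            continue  # failover: move on to the next-cheapest location
    return None  # no location could satisfy the request


# Invented catalog with placeholder prices, for illustration only.
catalog = [
    Offering("aws", "us-east-1", "us-east-1a", "V100", 1, 3.0),
    Offering("gcp", "us-central1", "us-central1-a", "V100", 1, 2.0),
    Offering("azure", "eastus", "eastus-1", "V100", 1, 2.5),
    Offering("gcp", "us-west1", "us-west1-b", "T4", 1, 0.5),
]

attempts: list[str] = []


def fake_provision(o: Offering) -> None:
    # Simulated provisioner: the cheapest zone is out of capacity.
    attempts.append(o.cloud)
    if o.zone == "us-central1-a":
        raise CapacityError(f"no capacity in {o.zone}")


chosen = provision_with_failover(catalog, "V100", 1, fake_provision)
print(chosen.cloud, chosen.region)  # azure eastus
```

Here the cheapest V100 location (the GCP zone) fails with a simulated capacity error, so the sketch falls back to the next-cheapest option (Azure) rather than aborting. A real system would also have to account for spot versus on-demand pricing, data egress costs, and live price catalogs, which this sketch deliberately ignores.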

Zongheng Yang, a postdoctoral researcher at UC Berkeley, and Ion Stoica, professor at UC Berkeley and co-founder of Anyscale, explain, “Cloud computing for ML and Data Science is already plenty hard, but when you start applying cost-cutting techniques, your overhead can multiply. Want to stop leaving machines up when they’re idle? You’ll need to spin them up and down repeatedly, redoing the environment and data setup. Want to use spot-instance pricing? That can add weeks of work to handle preemptions. What about exploiting the big price differences between regions or the even bigger price differences between clouds?”

SkyPilot is one of many open-source projects from the RISELab targeting cloud cost optimisation. As previously reported on InfoQ, the research centre released Skyplane to optimise the transfer of large datasets between cloud providers, reducing transfer times and costs.
