For AutoML, users need to be educated. This is a new set of functionalities, and the market is unfamiliar with using it, says Dr Ioannis Tsamardinos, CEO and Co-founder at JADBio, in an exclusive interview where he discusses how AutoML is changing the face of ML-based solutions.
Is AutoML the future of machine learning? Do you see growing demand for improved models to solve new problems?
Dr Ioannis Tsamardinos: Yes. What we now call AutoML is the basic building block for the machine learning (ML) of the future. In the near future, no one will be programming ML pipelines from scratch, just as no one programs in assembly language these days, except, of course, in special cases. Instead, the data analyst of the future will be customising AutoML tools. They will focus on formulating and representing the problem as an ML task, on interpreting results, or on extending AutoML tools with their own novel algorithms.
Demand is growing in terms of constructing better models and providing better explanations and interpretations for these models and decision support for taking action based on the models’ predictions.
There is a growing demand for solving new types of problems. “Standard” AutoML deals with tabular data for the most part. These are cross-sectional data: snapshots of measurements on human subjects, tissues, cells, etc. As AutoML proves itself in terms of quality and ease of use, non-analysts overcome their phobia of ML analyses and want to apply it to all types of data, problems, and situations. So, we see the rise of Auto-Deep Learning (AutoDL) systems specifically for image, speech, and text analysis, as well as AutoML for unsupervised learning. We are currently working on Auto Causal Discovery, trying to automate the induction of causal models and relations. There are many more ML tasks to automate, such as learning from multivariate time-series, repeated measurements and longitudinal data, relational data, graph data, and streaming data, as well as anomaly detection.
Healthcare companies often struggle with unlocking value from available data. How vital is AutoML in healthcare and life science?
Dr Tsamardinos: It’s extremely vital. Only 0.5 per cent of data worldwide are ever analysed. As you can imagine, the percentage that is correctly and optimally analysed, then converted to knowledge and actionable decisions, is arguably minuscule. This means lost opportunities for discovering new biology and medicine, for designing better diagnostics, identifying drug targets, repurposing drugs, optimising therapies, saving lives, and improving health in general.
AutoML can provide significant value to life sciences in several ways:
- It can drastically improve the productivity and throughput of analyses. Analyses that would otherwise take months, requiring a team of experts, can now be performed in minutes by a single non-expert.
- It directly connects the life scientist to the knowledge in the data. The analyst may help in preparing the analysis, but some AutoML platforms provide visualisations and interpretations so rich and interactive that the clinician can explore and comprehend the results of the analysis themselves. The analyst does not have to fully mediate the journey from data to knowledge. That could improve the quality of interpretations, as the life scientist has the necessary domain knowledge.
- AutoML can reduce statistical methodological errors that creep into manually coded analyses, at least when AutoML is performed correctly.
Taken together, these benefits could mean saving hundreds of thousands of dollars per year in analyst costs. They could also mean saving the millions that would otherwise be spent going after the wrong drug target, and avoiding months of waiting for results before designing the next diagnostic assay.
What are some of the distinctive features of JADBio’s platform?
Dr Tsamardinos: There are numerous features that distinguish JADBio. The first is the ease of use. I feel very proud when I see research papers published by our life scientist users, all by themselves, with no analyst involved. The biggest obstacle in using JADBio is a psychological block among life scientists, who may think it inconceivable that they could do state-of-the-art ML analyses. I’d respond: not anymore.
Second, we can handle molecular and clinical data. Such data often contain very few samples: at the time of writing, 88.5 per cent of the 4,348 curated datasets provided by Gene Expression Omnibus contain 20 or fewer samples. These often stem from expensive or technically difficult treatments and interventions, or from rare diseases. JADBio can handle small-sample datasets and guarantee correct results. JADBio can also handle high-dimensional data, i.e., data measuring numerous quantities. Just a couple of days ago, one of our clients analysed a dataset with 700,000 methylation expressions.
Third, we offer functionalities necessary to life scientists. In biomedicine, knowledge discovery is of primary concern. To enable knowledge discovery, JADBio identifies the features (biomarkers) that are predictive in combination. It filters out not only irrelevant but also redundant measured quantities. In a poster recently presented at ASCO, one of our clients used JADBio to identify just 2 biomarkers, out of tens of thousands of multi-omics quantities, that when examined in combination distinguish almost perfectly between right-sided and left-sided colorectal cancers. Other functionalities include the ability to perform survival analysis (time-to-event analysis in general), optimisation of clinical thresholds for diagnosis, interpretation of the role and added value of the biomarkers, identification of multiple, equally predictive sets of biomarkers, and several others.
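To make the distinction between irrelevant and redundant quantities concrete, here is a minimal sketch of a simple correlation-based filter. It is an illustration only, not JADBio's algorithm, and it is far cruder than selecting features that are predictive in combination. The feature names, the toy data, and both thresholds are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(features, target, min_relevance=0.3, max_redundancy=0.9):
    """Keep features correlated with the target but not with an already kept feature."""
    # Consider the features most strongly associated with the target first.
    ranked = sorted(features, key=lambda name: -abs(pearson(features[name], target)))
    selected = []
    for name in ranked:
        if abs(pearson(features[name], target)) < min_relevance:
            continue  # irrelevant: carries (almost) no signal about the target
        if any(abs(pearson(features[name], features[kept])) > max_redundancy
               for kept in selected):
            continue  # redundant: duplicates information in a kept feature
        selected.append(name)
    return selected


# Hypothetical toy data: eight samples, binary outcome.
TARGET = [0, 0, 0, 0, 1, 1, 1, 1]
FEATURES = {
    "marker_a": [1, 2, 1, 2, 5, 6, 5, 6],             # relevant
    "marker_a_scaled": [2, 4, 2, 4, 10, 12, 10, 12],  # redundant copy of marker_a
    "unrelated": [1, 5, 2, 6, 1, 5, 2, 6],            # uncorrelated with the outcome
}

print(select_features(FEATURES, TARGET))
```

Only one of the two correlated markers survives, and the unrelated quantity is dropped. A pairwise filter like this cannot find features that are informative only jointly, which is why it should be read as a sketch of the idea rather than of any production method.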
Fourth, JADBio emphasises the correctness of performance estimates. It does not systematically overestimate performance, which would mislead users. This is true even when the sample size is quite small. It also means that no samples are lost to estimation: there is no need to hold out samples for estimating the performance of the final predictive model. Estimation is performed correctly and fully automatically by JADBio. To put these functionalities into perspective, consider the standard practice in genomics (GWAS analysis) for creating a predictive model. The researcher needs one external study or dataset to select genetic markers, their own data to develop a predictive model using the pre-selected markers, and a third, independent dataset to externally validate the model. JADBio identifies the markers among a million SNPs, creates a possibly non-linear model, and correctly estimates performance using a single low-sample dataset.
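The overestimation problem mentioned above can be shown with a small simulation. The sketch below is not JADBio's estimation procedure. It only illustrates why reporting the score of the best of many candidate models is optimistic on small samples (selection-induced optimism, sometimes called the winner's curse). Every "model" here guesses at random, so its true accuracy is 0.5. The function names, sample size, and number of models are hypothetical choices.

```python
import random


def best_of_many_accuracy(n_samples, n_models, rng):
    """Highest observed accuracy among models that predict by coin flip."""
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_models):
        predictions = [rng.randint(0, 1) for _ in range(n_samples)]
        correct = sum(p == y for p, y in zip(predictions, labels))
        best = max(best, correct / n_samples)
    return best


def mean_selected_accuracy(n_samples=20, n_models=50, n_repeats=200, seed=0):
    """Average, over repeated experiments, of the accuracy reported for the winner."""
    rng = random.Random(seed)
    total = sum(best_of_many_accuracy(n_samples, n_models, rng)
                for _ in range(n_repeats))
    return total / n_repeats


print("true accuracy of every model: 0.50")
print(f"mean reported accuracy, one model tried: {mean_selected_accuracy(n_models=1):.2f}")
print(f"mean reported accuracy, best of 50 models: {mean_selected_accuracy():.2f}")
```

With a single candidate the reported accuracy stays close to 0.5, while the best of 50 candidates looks clearly better than chance even though none of them has learned anything. Any procedure that tries many pipelines on one small dataset has to correct for this effect before reporting a performance figure.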
Finally, JADBio has several unique deep tech algorithms. We have solved some long-standing problems in the ML community, such as scalable feature selection, multiple feature selection, estimation of performance in the presence of the winner’s curse, and others. Our research receives 1000 citations per year. JADBio has implemented meta-level ML, meaning that we learn from past analyses which algorithms work best on a given dataset. Then, the most promising algorithms are applied to the next dataset to be analysed. What the user experiences is a platform that keeps improving its ML quality with usage.
What are some of the unique lessons you have learned from analysing customer behaviour?
Dr Tsamardinos: If something can be misinterpreted, it will be, no matter how small. One has to constantly check that the majority of users are actually interpreting labels, buttons, colours, and results in general as intended. Assuming certain visuals and labels are obvious and self-explanatory is dangerous.
For AutoML, I feel users need to be educated. This is a new set of functionalities and the market is unfamiliar with using it. The whole field of AutoML products is still exploring the most intuitive way to structure and design such tools. We are far from reaching maturity. Just imagine, it took word-processing products three decades to reach a decent level of maturity.
What is the one leadership motto you live by?
Dr Tsamardinos: Lead by example. Inspire. Make people feel they are working with you, not for you.