Datatechvibe chats with Danial Entezari, Managing Director at BigNumber and a Data Science and Computer Science instructor at AstroLabs, about enterprises using technology to manage risk and uncertainty.
“The right tools help the business reduce uncertainty and risk. Good decisions mean good results and good results must be quantifiable. To make good decisions, the business must have the ability to explore data and forecast, quantifiably, the results of their decision,” says Entezari.
With businesses working on data-related problems, Entezari reflects on how managed and unmanaged tech are not mutually exclusive, and organisations can gradually transition to the latter.
Excerpts from the interview:
How will data science and cloud computing evolve in the next five years?
Let’s first begin with cloud computing. The short answer is better economic utility for everyone. By economic utility, I mean our experiences as consumers will improve in everything from retail and entertainment to finance and logistics, and even healthcare.
Cloud computing, in simple terms, is about outsourcing computation problems via the internet to specialised computers (data centres). This is also a definition for distributed computing, which is not a novel idea but one that started in the 1960s.
However, two factors have since made distributed computing a big deal: the internet and economies of scale. Despite the recent semiconductor shortage, computer components are better yet cheaper than ever. Internet speed and accessibility are greater too, especially with the advent of 5G.
More organisations will abandon on-premise infrastructure and migrate to the cloud. The costs of computation will be lower, and latency will become a negligible issue. Furthermore, rising consumer expectations for better digital experiences will drive companies to meet demand, increasing the number of providers and making the cloud market more competitive. Competition boosts innovation, and the cycle continues.
Now, about Data Science. It is an emergent field combining statistics and computer science. Another way to describe data science is statistical and mathematical techniques programmatically applied to large volumes of data. My forecast is that we'll see better data engineering, which in turn will lead to better mathematical models for understanding our world.
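As a toy illustration of that description, here is a statistical technique applied programmatically to data: an ordinary least-squares fit written with nothing but plain Python. The numbers are made up for the example.

```python
# Hypothetical data points, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 8.1, 9.9]

# Ordinary least squares in closed form:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # close to 2 and 0
```

In practice a library such as scikit-learn would do this (and far more) for you; the point is only that "data science" bottoms out in mathematics expressed as code.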
With these developments in cloud computing and data science, there will be stiffer competition for talent. Yes, cloud computing and data science are evolving, but not every business will benefit. Both fields demand advanced technical talent, which is hard to come by. Businesses will have to turn to their existing talent and be willing to train them routinely to keep pace with these developments.
What are the latest data and analytics technologies that can help business leaders make hard choices?
What makes a decision hard is when there’s uncertainty and risk about a problem. Thus, the right tools are ones that help the business reduce uncertainty and risk. Good decisions mean good results and good results must be quantifiable. To make good decisions, the business must have the ability to explore data and forecast, quantifiably, the results of their decision.
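One concrete way to "forecast, quantifiably, the results of a decision" is a simple Monte Carlo simulation. The scenario and every number below are hypothetical, purely to show the idea of turning uncertainty into measurable risk.

```python
import random

random.seed(42)  # fixed seed so runs are reproducible

# Hypothetical decision: launch a product whose monthly profit is
# uncertain. Profit = demand * margin - fixed cost, with demand and
# margin both modelled as random variables.
def simulate_month():
    demand = random.gauss(1000, 300)   # uncertain unit sales
    margin = random.uniform(4.0, 6.0)  # uncertain per-unit margin
    fixed_cost = 4500.0
    return demand * margin - fixed_cost

# Simulate many possible futures and summarise them.
outcomes = [simulate_month() for _ in range(10_000)]
expected = sum(outcomes) / len(outcomes)
loss_prob = sum(o < 0 for o in outcomes) / len(outcomes)

print(f"expected monthly profit: {expected:,.0f}")
print(f"probability of a loss:   {loss_prob:.1%}")
```

The decision-maker now has two quantities, an expected result and a downside risk, instead of a gut feeling.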
There are two categories here to consider: managed and unmanaged tech. Managed technologies are generally ones that require little to no technical expertise. Unmanaged tech, on the other hand, assumes the user already has the prerequisite technical knowledge and training.
An example of a managed tech is Gmail, a cloud service that allows you to send and receive emails without knowing the first thing about computer networking. All you need is an email address and a password to get started. Thanks to Gmail, the complicated system of emailing is "managed" for you. An example of unmanaged tech would be Amazon SES, a cloud email service for applications that requires advanced knowledge of networking.
In Data Science, similarly, there are managed tools that allow business leaders and their personnel to focus on analytics without the complexity of data warehousing. Examples of managed tools for analytics are Excel, Power BI, Tableau, and Zoho Reports. These tools, of course, are compatible with CRM platforms such as HubSpot and Salesforce.
Next, there are partially managed technologies that require users to have more knowledge of data warehousing and even machine learning. Examples include Alteryx and Databricks.
Finally, examples of unmanaged tech would be Hadoop's HDFS and Spark, both very comprehensive tools for processing and managing larger volumes of data (big data). There are also cloud vendors like AWS, Google Cloud, Microsoft Azure, and IBM Cloud that provide an entire suite of tools with which businesses can build data and analytics platforms from the ground up.
As a rule of thumb, business leaders must first consider managed technologies. If, however, the business is working on data-related problems for which no managed technology is adequate, then they may resort to unmanaged technologies. Managed and unmanaged tech are not mutually exclusive, and organisations can gradually transition to the latter.
Which online tools do you find critical, and why?
For basic analytics, I suggest Google Sheets because it is natively compatible with Excel, but you have the benefit of integrating other applications from the Google suite, like Analytics, and even third-party vendors like BigML for machine learning. For more advanced analytics, assuming you know Python and SQL, I would recommend Databricks, a managed web-based platform configured for Spark and AWS.
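The Python-plus-SQL workflow that a platform like Databricks assumes can be sketched with nothing more than Python's standard library. The table and figures here are invented for illustration; a real platform would run the same kind of query against far larger data.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("MENA", 120.0), ("MENA", 80.0), ("EU", 200.0), ("EU", 50.0)],
)

# SQL does the aggregation; Python consumes the result.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()

for region, total in rows:
    print(region, total)
```

The division of labour is the point: SQL expresses *what* to compute over the data, and Python handles everything around it.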
For cloud computing, I suggest Heroku and Microsoft Azure for managed and unmanaged cloud services, respectively. Heroku is a managed platform that makes it easy to deploy and scale services on the cloud, specifically on AWS. Of course, it is possible to work directly with AWS. However, in my view, AWS has a steeper learning curve compared to Google Cloud and Microsoft Azure. In fact, I would say that Azure would be the easiest to learn for two reasons: simple naming conventions and Microsoft’s plentiful learning resources.
How should analytics teaching be positioned from the supply side perspective – more business orientation or heavy emphasis on technical aspects?
Analytics is fundamentally a mathematical discipline. It is not necessary to complete a degree in statistics or mathematics, but it is necessary to be able to at least interpret the results of an analysis. My suggestion is to put emphasis on relevant technical aspects as much as possible.
What is the edge your data, analytics and coding program has over other institutes?
There are two aspects to consider here: the curriculum and the delivery.
With respect to the curriculum, we make sure that our courses have the right balance of practice and theory. We teach professionals practical skills that they can apply immediately in the real world, but also theoretical knowledge that allows them to develop themselves independently and continuously. Computer Science and Data Science are difficult subjects, but some topics within them are more difficult than others. Our curriculum puts emphasis on the most difficult parts; there is no point in teaching topics that are easy and that anyone can learn on their own.
The delivery is also crucial. As an instructor, I put emphasis on first principles and on teaching topics that most people struggle to learn on their own. Learning to code is easier than ever; there are many free online resources for it. The problem starts when the learner encounters more advanced topics with no suitable tutorial or book. In fact, in many cases you would need multiple reference materials to clearly understand a single concept.
Additionally, our instructors are experienced and active practitioners. Compared to the other training institutions I have researched so far, our curriculum places greater emphasis on crucial theoretical topics. Both Computer Science and Data Science are, ultimately, mathematical disciplines. There is no way around that, and we prepare our learners for a demanding career from the very start of the programs. Compared to universities, on the other hand, our practical teaching is more up to date and our communication with learners is more direct and immediate.
What tools and techniques are covered as part of the course?
Generally, the tools covered in the programs are ones used in production.
For example, in the Computer Science Bootcamp we use tools like Heroku for deployment, MongoDB for databases, Cloudinary for CDN, Postman for API development, and React for UI. There’s a lot more that I cannot list here but these are important tools and libraries used in building some of the most advanced platforms we know.
In the Data Science Bootcamp, we use tools like Colab for programming and visualisation, scikit-learn and TensorFlow for machine learning, Spark for big data, NLTK for text analytics, and RStudio for statistical analysis.
The most important component in both courses is programming. It’s an indispensable skill in today’s world, as we all know. But what most people may not appreciate is that programming is also a bridge to learning higher-level mathematics, the foundation for both computer science and data science.
What value does it offer the professionals?
More important than the tools we use in these programs are the fundamental skills that we teach our participants. Tools have a short shelf life; technology is always evolving. But there are skills that will remain important indefinitely, such as data structures and statistics. This is the value of our programs: teaching people skills that will serve them forever.