Theo Groenewald, Head of Data Management at Discovery Limited, shared exclusive insights into the ever-evolving data engineering landscape and the challenges that come with it. Learn more about how to tackle DataOps, MLOps, and privacy interventions in the enterprise.
Data teams should prioritise comprehensive data classification strategies that connect data to its relevant business context. This is essential whether streaming data to end-users from a topic or granting access to a warehouse. Proper classification helps organisations better understand the meaning of their data and maximise its potential, says Theo Groenewald.
Ahead of his session at Velocity – Data and Analytics Summit – taking place in South Africa on March 07-08, 2023, Groenewald spoke about balancing cloud, privacy and uptime in the enterprise.
Excerpts from the interview:
How do you see the role of data engineering changing in enterprises in the near future?
Adopting the cloud in the enterprise is likely the most significant force behind changes in data engineering as a discipline. Planning should take into account a hybrid environment, where tools and technologies can vary between on-premises and cloud-based workloads. Those supporting your current stack should be trained in cloud technology while maintaining their expertise in legacy tools, and younger recruits should complement their cloud-native skills with a deep understanding of legacy systems to ensure continuity. In addition to cloud technology, there is growing demand for expertise in Java and Python to complement existing SQL capabilities. A challenge for us is to embed data engineering into the systems development lifecycle, enabling DataOps and MLOps to coexist with existing DevOps processes.
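To make that idea concrete, here is a minimal sketch of a DataOps-style check that could run in the same CI pipeline as application tests, so data changes are validated the way code changes are. The dataset, file path, and rules are hypothetical, not Discovery's actual pipeline; it assumes pandas and a pytest-style test runner.

```python
# Hypothetical DataOps check: data-quality rules expressed as a test that
# runs in CI alongside existing DevOps processes, not as a manual step.
import pandas as pd

def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality violations; an empty list means the batch passes."""
    errors = []
    if df["order_id"].duplicated().any():
        errors.append("duplicate order_id values")
    if df["amount"].lt(0).any():
        errors.append("negative amounts")
    if df["customer_id"].isna().any():
        errors.append("missing customer_id values")
    return errors

def test_orders_batch():
    # In CI this would point at the latest staged extract; the path is illustrative.
    df = pd.read_csv("staging/orders.csv")
    assert validate_orders(df) == []
```

Failing the build on bad data, just as on failing unit tests, is one way DataOps can coexist with an existing DevOps process rather than sit beside it.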
What are the key considerations when investing in or building solutions for your technology stack?
Interoperability between data teams across the organisation, including integration between the data and tech stacks, is paramount. The availability of skilled personnel and the associated costs must be considered, especially in South Africa. Data sovereignty also has to be weighed, including whether cloud services are available in the region where the data must reside.
What advice would you give enterprise leaders to reduce data downtime?
In our current data strategy, we strongly emphasise defining and building data products. We have identified three types of products: operational, informational, and data-driven applications. These products should be built on our data platform, and for any data included in a product, uptime requirements should be defined upfront and managed throughout the product's lifecycle.
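One way to make "uptime defined upfront" concrete is to capture it in a machine-readable product contract. The sketch below is illustrative only; the field names, product, and thresholds are assumptions, not Discovery's actual schema.

```python
# Hypothetical data product "contract": availability and freshness targets
# are declared at design time so they can be monitored over the lifecycle.
from dataclasses import dataclass
from enum import Enum

class ProductType(Enum):
    OPERATIONAL = "operational"
    INFORMATIONAL = "informational"
    DATA_DRIVEN_APPLICATION = "data-driven application"

@dataclass(frozen=True)
class DataProductContract:
    name: str
    product_type: ProductType
    owner: str                   # team accountable for the product
    uptime_target: float         # agreed availability, e.g. 0.995 = 99.5%
    max_staleness_minutes: int   # how old the data may be before breaching SLA

    def meets_sla(self, observed_uptime: float, staleness_minutes: int) -> bool:
        """Compare observed behaviour against the targets defined upfront."""
        return (observed_uptime >= self.uptime_target
                and staleness_minutes <= self.max_staleness_minutes)

# Example: an informational product with a 99.5% availability target.
claims_dashboard = DataProductContract(
    name="claims-dashboard",
    product_type=ProductType.INFORMATIONAL,
    owner="data-platform-team",
    uptime_target=0.995,
    max_staleness_minutes=60,
)
print(claims_dashboard.meets_sla(observed_uptime=0.999, staleness_minutes=30))  # True
```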
How can technology leaders in the BFSI industry ensure data compliance in an evolving landscape of privacy laws?
Implementing privacy-related interventions can be seen as a grudge purchase when businesses prioritise the work they need done in the highly competitive landscape in which they operate. Privacy by design, which considers privacy upfront as part of the System Development Life Cycle (SDLC), would benefit from data product-related strategies; however, it would add to the complexity and time required to build new features. For IT to succeed in this, it needs strong support and champions for this cause from the business itself. Creating awareness of the importance of data privacy throughout the organisation should decrease the pushback on these initiatives.
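As a small illustration of what privacy by design can mean in practice, a pipeline might pseudonymise direct identifiers at ingestion so downstream features never handle raw personal data. This is a generic sketch, not Discovery's approach; the key, field names, and record are hypothetical, and in a real system the key would live in a secrets manager.

```python
# Hypothetical privacy-by-design step: pseudonymise a direct identifier at
# ingestion with a keyed hash, so the same person maps to the same token
# but the raw value cannot be recovered without the key.
import hashlib
import hmac

SECRET_KEY = b"example-key-rotate-via-secrets-manager"  # illustrative only

def pseudonymise(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"id_number": "8001015009087", "claim_amount": 1200.50}  # sample data
record["id_number"] = pseudonymise(record["id_number"])
```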
Data sharing is a complex undertaking. What advice would you give enterprises approaching the task to avoid common challenges?
There is always risk associated with sharing data outside one’s own platform. Once the data leaves your domain, the potential for misinterpretation and misuse is beyond your control. Data teams should consider implementing comprehensive data classification strategies, wherein the meaning of data can be connected to its business context. Whether streaming data for end-users to consume from a topic or providing access to a warehouse, it is essential to classify the data according to its meaning. Implementing standard quality assurance checks on data will help to foster trust in the data being shared.
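The sketch below shows one way classification metadata could travel with shared data, so that consumers inherit the business context whether they read from a topic or a warehouse. Field names, sensitivity levels, and stewards are assumptions for illustration, not an actual Discovery schema.

```python
# Hypothetical classification metadata: each shared field carries its
# business meaning, sensitivity, and steward alongside the data itself.
from dataclasses import dataclass
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PERSONAL = "personal"  # subject to privacy law, e.g. POPIA in South Africa

@dataclass(frozen=True)
class FieldClassification:
    name: str
    business_meaning: str    # plain-language definition for consumers
    sensitivity: Sensitivity
    steward: str             # who answers questions about this field

schema = [
    FieldClassification("member_id", "Pseudonymised member identifier",
                        Sensitivity.PERSONAL, "data-governance"),
    FieldClassification("policy_status", "Lifecycle state of the policy",
                        Sensitivity.INTERNAL, "policy-administration"),
]

def fields_safe_to_share(fields: list[FieldClassification]) -> list[str]:
    """Filter out personal fields before data leaves the platform."""
    return [f.name for f in fields if f.sensitivity is not Sensitivity.PERSONAL]

print(fields_safe_to_share(schema))  # ['policy_status']
```

Pairing this metadata with the standard quality assurance checks mentioned above gives consumers both the context and the confidence to use shared data correctly.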
For more information, visit Velocity – Data and Analytics Summit.