Three Considerations To Set Data Streaming Endeavours on the Course for Success

As a tech-savvy nation, the UAE has embraced new technologies, and a recent McKinsey assessment ranked it the most digitalised country in the region. Over the past decade, software developers in the region have increasingly turned to real-time event streaming (also known as data streaming). The event-streaming platform Apache Kafka was listed as one of the most popular frameworks in Stack Overflow’s 2022 Developer Survey, and its adoption is growing rapidly across the region. In recent years, more and more companies have used it for large-scale, latency-sensitive use cases such as food delivery and banking.

These use cases are the most common and the most celebrated, and this has fostered the notion that event streaming is only suitable for workloads with challenging real-time requirements. As a result, organisations fail to see its potential for historical data. That is a missed opportunity: designers and business leaders in the UAE stand to benefit from streaming no matter how quickly their company needs to process data.

Event streaming can make software more robust, less vulnerable to glitches and easier to understand. So, if you’re thinking about incorporating event streaming, here are three factors to consider.

The time/value curve of your data

Most data has a time/value curve: how valuable your data is depends on when the data point occurred. As a general rule, data becomes less valuable as it ages.

When people consider streaming, they don’t generally think about older data. This is because, until relatively recently, most streaming platforms had a fairly limited capacity for storage. This made sense at the outset when data was housed in bare-metal data centres. But since almost everything has now shifted onto the cloud, with its vast capacity for storage, the same logic no longer applies. 

Many streaming platforms now integrate directly with cloud object stores and inherit the same near-unlimited storage capacity. This means you no longer need to make forced retention decisions around streaming: instead of deciding how long you can afford to keep data within a stream, you can keep it for as long as is sensible.
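
If you are running on Apache Kafka, keeping a stream around indefinitely comes down to a topic-level setting. The sketch below, which assumes the confluent-kafka Python client and uses an illustrative broker address and topic name, creates a topic with time- and size-based deletion disabled.

```python
# Minimal sketch: creating a Kafka topic whose records are retained indefinitely,
# so the stream itself can serve as long-term historical storage.
# Assumes the confluent-kafka Python client; broker address and topic name are illustrative.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

orders_topic = NewTopic(
    "orders",                     # illustrative topic name
    num_partitions=6,
    replication_factor=3,
    config={
        "retention.ms": "-1",     # -1 disables time-based deletion: keep records indefinitely
        "retention.bytes": "-1",  # no size-based cap either
    },
)

# create_topics is asynchronous; wait on the returned futures to confirm creation.
for topic, future in admin.create_topics([orders_topic]).items():
    try:
        future.result()
        print(f"Created topic {topic}")
    except Exception as exc:
        print(f"Failed to create topic {topic}: {exc}")
```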

Recent developments in AI models such as ChatGPT demonstrate the immense potential of these systems when they are exposed to enough historical data. Back-testing online machine learning models is one of the most interesting use cases for old data streams: when rolling out a newly trained model, changes are usually needed, and replaying historical traffic through the new model is a great way to confirm it works well before it goes live. If you understand your data’s time/value curve, streaming helps you get the best value out of both ends of it.
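
As an illustration of back-testing against historical traffic, the sketch below replays a Kafka topic from its earliest offset and feeds each event to a candidate model. The topic name, payload fields and score_event() function are hypothetical placeholders, and the confluent-kafka Python client is assumed.

```python
# Minimal sketch: back-testing a model against historical traffic by replaying a
# Kafka topic from its earliest offset. Topic name, payload fields and the
# score_event() model function are hypothetical placeholders.
import json
from confluent_kafka import Consumer

def score_event(event: dict) -> float:
    """Placeholder for the candidate model being back-tested."""
    return 0.0

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "model-backtest-2024-06",   # fresh group id so offsets start from scratch
    "auto.offset.reset": "earliest",        # replay the stream from the beginning
    "enable.auto.commit": False,            # a back-test has no need to track progress
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue                        # no new records within the timeout
        if msg.error():
            raise RuntimeError(msg.error())
        event = json.loads(msg.value())
        prediction = score_event(event)
        # Compare the prediction against the outcome recorded in the historical event.
        print(event.get("order_id"), prediction, event.get("actual_outcome"))
finally:
    consumer.close()
```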

Decide data flow direction

Traditionally, software engineering has leaned heavily on polling: actively sampling a resource to check its status, for example periodically querying a database table to see whether a row has been changed or added. This is far from ideal, because many things can change between polls and you won’t be able to tell what they all were.
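
A minimal sketch of that polling pattern, assuming an illustrative SQLite table with an updated_at column, shows why intermediate changes go missing.

```python
# Minimal sketch of the polling pattern: periodically query a table for rows
# changed since the last check. Database file, table and column names are illustrative.
# A row updated more than once between polls only shows its latest state, so
# intermediate changes are lost.
import sqlite3
import time

conn = sqlite3.connect("app.db")
last_seen = 0.0  # timestamp of the most recent change we have processed

while True:
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders WHERE updated_at > ?",
        (last_seen,),
    ).fetchall()
    for row_id, status, updated_at in rows:
        print(f"order {row_id} is now {status}")
        last_seen = max(last_seen, updated_at)
    time.sleep(30)  # anything overwritten within this window is invisible to us
```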

Streaming, by contrast, is about lossless, unidirectional data flows rather than bidirectional request/response calls. This is a far simpler way of understanding how your systems communicate, and it doesn’t matter whether the data is historical or real-time. Instead of polling periodically, you can listen for updates and be certain you will see every change, in order. Rather than polling a database as described above, it is becoming much more common to use change data capture (CDC) to listen for changes in the database.
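
As a sketch of that push model, the consumer below listens to a stream of change events rather than polling the table. The Debezium-style topic name and the payload shape are assumptions for illustration only.

```python
# Minimal sketch of the push model: instead of polling the table, listen to a
# stream of change events. The topic name follows a Debezium-style convention
# ("dbserver.public.orders") purely as an illustration; the payload shape is assumed.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-listener",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["dbserver.public.orders"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        change = json.loads(msg.value())
        # Every insert, update and delete arrives in order, with before/after state.
        print(change.get("op"), change.get("before"), change.get("after"))
finally:
    consumer.close()
```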

So, when you’re considering streaming as a solution in your business, it’s worth asking whether your system would benefit from this kind of push model and whether lossless updates matter to you.

Set a strategy for expiration

Using historical streams is a smart choice, but data shouldn’t last forever. At some point, whether because your business has changed or to meet regulations such as the UAE data protection law or the EU’s GDPR, it’s probably going to be necessary to delete your data. 

You generally have two main choices here. One is an expiration policy such as a time-to-live (TTL), where the system deletes data once it is older than a set retention period. The other is compaction, where only the most recent record for each key is kept and older versions are discarded.
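
Both approaches map to per-topic settings in Apache Kafka. The sketch below, again assuming the confluent-kafka Python client, creates one illustrative topic with a roughly 90-day TTL and another with compaction enabled.

```python
# Minimal sketch of the two expiration approaches expressed as Kafka topic settings.
# Broker address, topic names and the retention period are illustrative.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topics = [
    # Time-to-live: records are deleted once they are older than the retention window.
    NewTopic(
        "clickstream",
        num_partitions=6,
        replication_factor=3,
        config={
            "cleanup.policy": "delete",
            "retention.ms": str(90 * 24 * 60 * 60 * 1000),  # roughly 90 days
        },
    ),
    # Compaction: only the most recent record per key is kept; older versions are discarded.
    NewTopic(
        "customer-profiles",
        num_partitions=6,
        replication_factor=3,
        config={"cleanup.policy": "compact"},
    ),
]

for topic, future in admin.create_topics(topics).items():
    try:
        future.result()
        print(f"Created topic {topic}")
    except Exception as exc:
        print(f"Failed to create topic {topic}: {exc}")
```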

Another, slightly more sophisticated option uses encryption. An encrypted payload is only useful with its decryption key, so deleting that key means the data can never be accessed again. Removing data can therefore be as simple as deleting the corresponding encryption keys.
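
A minimal sketch of this approach, sometimes called crypto-shredding, is shown below using the cryptography package. The in-memory key store and helper functions are illustrative; a real system would rely on a dedicated key-management service.

```python
# Minimal sketch of deletion via key removal (crypto-shredding) using the
# `cryptography` package. The in-memory key store is purely for illustration.
from cryptography.fernet import Fernet

key_store: dict[str, bytes] = {}  # per-subject encryption keys

def encrypt_payload(subject_id: str, payload: bytes) -> bytes:
    """Encrypt a record's payload with a key tied to the data subject."""
    key = key_store.setdefault(subject_id, Fernet.generate_key())
    return Fernet(key).encrypt(payload)

def decrypt_payload(subject_id: str, token: bytes) -> bytes:
    """Decrypt a payload; fails once the subject's key has been deleted."""
    return Fernet(key_store[subject_id]).decrypt(token)

def forget_subject(subject_id: str) -> None:
    """'Delete' a subject's data by discarding their key: the encrypted records
    may still sit in the stream, but they can no longer be read."""
    key_store.pop(subject_id, None)

token = encrypt_payload("customer-42", b'{"email": "someone@example.com"}')
print(decrypt_payload("customer-42", token))  # readable while the key exists
forget_subject("customer-42")
# decrypt_payload("customer-42", token) would now fail: the data is unrecoverable.
```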

As business leaders in the UAE know only too well, technology is ever-changing, and it’s vital to keep up. When weighing streaming as a solution, considering the three factors above will help you make the right choice for each use case.
