How To Build A Resilient Data Architecture


In most industries change is constant, so businesses need maximum resilience, and building a reliable data architecture helps enterprises remain competitive.

Pushing the limits of your system and software performance while still being able to recover is what resiliency is all about. Building resilient architectures is an engineering journey: choosing the right set of data resiliency techniques and technologies as part of an overall business continuity plan.

To reach a steady state, businesses need to keep a few principles in mind. These start with business goals and alignment, progress to the network and data, and extend to people and culture.

An early alignment with the business is critical for a successful enterprise data architecture, according to Dina Mohammad-Laity, an independent data strategy advisor and former director of Data Science at Talabat.

“As Cassie Kozyrkov at Google frames it, understand if you’re in the business of baking bread or building ovens. Once you clarify what the business needs from a data architecture, what problems are there to solve, you can design around that.”

Businesses also need a clear measurement framework before a single tool is purchased or a single line of code is written. “Make sure you know what you want to measure and why, and then understand the processes that the data is generated to create those measurements,” Mohammad-Laity said.

Mohammad-Laity also advises a bias towards simplicity, because with today’s data tooling it can be tempting to over-engineer. “Focus on a longer term frame: how easy will this be to maintain, communicate, diagnose issues, [with] data quality and governance built into the architecture and given equal consideration in the design process.”

Product thinking, meaning considering your end users, is another crucial principle for a successful enterprise data architecture. “They may be analysts, data scientists or business colleagues. Consider their user experience in architecture design,” Mohammad-Laity said.

“When I work on architecture design I imagine a persona, an analyst that will receive the outcome of what I am designing (say a set of tables in a data warehouse), and I want them to feel joy when they see it. I want them to understand the documentation and not feel dread at having to chunk through a lot of painful work to deliver value.”

As organisations move from multi-cloud architectures built ad hoc to deliberately designed architectures, data resilience becomes essential. When performance demands rise or a catastrophic failure occurs, enterprises need a well-thought-out and tested data recovery plan.
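A recovery plan is only trustworthy if restores are actually exercised. As a minimal sketch, assuming a SQLite database purely for illustration (a production drill would restore from off-site backups into separate infrastructure), the code below copies a database with the sqlite3 module's online backup API and then checks that a table in the copy matches the source row for row. The helper names table_fingerprint and backup_and_verify are hypothetical.

```python
import hashlib
import sqlite3


def table_fingerprint(conn: sqlite3.Connection, table: str) -> str:
    """Hash every row of a table in a deterministic (rowid) order."""
    digest = hashlib.sha256()
    # The table name is interpolated into SQL, so only pass trusted, known names.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY rowid"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def backup_and_verify(source: sqlite3.Connection, table: str) -> bool:
    """Back up `source` into a fresh database and confirm the copy matches."""
    restored = sqlite3.connect(":memory:")
    try:
        source.backup(restored)  # sqlite3's online backup API (Python 3.7+)
        return table_fingerprint(source, table) == table_fingerprint(restored, table)
    finally:
        restored.close()


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
src.executemany("INSERT INTO orders (amount) VALUES (?)", [(9.99,), (25.0,)])
src.commit()
print(backup_and_verify(src, "orders"))  # True
```

Running a check like this on a schedule turns "we have backups" into "we have restored from backups recently", which is the part of a recovery plan that tends to fail silently.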

“Architects are straining to keep up. They design according to a five year growth plan only to find that two years later demand for data access and usage has far exceeded original estimates,” said Sanjeev Mohan, a data and analytics expert and former Research Vice President at Gartner.

“Building a future-proof architecture has grown many times more complex, and everyone must invent a custom solution to this problem. The components of a data and analytics architecture have never been so accessible,” Mohan added.

Besides ensuring reliability and transparency in pipelines and optimising cost for high scale and performance, selecting technology according to the use case is crucial. The first and foremost challenge arises when the wrong technology is chosen for a business problem. “For example, the so-called ‘big data’ architectures, which focus on storing data first and querying it later, are not ideal for handling ‘small data’,” said Mohan.

“Imagine building an architecture to ingest streaming IoT data and writing every record as a file. This leads to millions of very small files or a very large number of partitions. These partitions then need to be specified in queries to reduce the amount of scanned data and hence reduce computation costs. It is imperative to make the right database choice,” he added.
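One common mitigation for the small-files problem Mohan describes is to buffer streaming records and write them in batches, organised into partitions that queries can target. The sketch below is illustrative only: it assumes JSON-lines files, ISO-8601 timestamps in a hypothetical "ts" field, and hive-style day=YYYY-MM-DD directories, and the class and function names are hypothetical. Real pipelines would typically use a columnar format and a streaming framework rather than hand-written files.

```python
import json
import os
import tempfile
from collections import defaultdict


class BatchingWriter:
    """Buffer incoming records and flush them as one file per batch,
    partitioned by day, instead of writing one file per record."""

    def __init__(self, root: str, batch_size: int = 1000) -> None:
        self.root = root
        self.batch_size = batch_size
        self.buffers = defaultdict(list)  # day -> list of pending records
        self.files_written = 0

    def write(self, record: dict) -> None:
        day = record["ts"][:10]  # assumes ISO-8601, e.g. "2024-05-01T12:00:00"
        self.buffers[day].append(record)
        if len(self.buffers[day]) >= self.batch_size:
            self._flush(day)

    def flush_all(self) -> None:
        for day in list(self.buffers):
            self._flush(day)

    def _flush(self, day: str) -> None:
        records = self.buffers.pop(day, [])
        if not records:
            return
        part_dir = os.path.join(self.root, f"day={day}")
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, f"batch-{self.files_written:06d}.jsonl")
        with open(path, "w", encoding="utf-8") as fh:
            for rec in records:
                fh.write(json.dumps(rec) + "\n")
        self.files_written += 1


def read_partition(root: str, day: str) -> list:
    """Partition pruning: open only the files under the requested day."""
    part_dir = os.path.join(root, f"day={day}")
    if not os.path.isdir(part_dir):
        return []
    rows = []
    for name in sorted(os.listdir(part_dir)):
        with open(os.path.join(part_dir, name), encoding="utf-8") as fh:
            rows.extend(json.loads(line) for line in fh)
    return rows


with tempfile.TemporaryDirectory() as root:
    writer = BatchingWriter(root, batch_size=1000)
    for i in range(10_000):
        writer.write({"ts": "2024-05-01T00:00:00", "sensor": i % 50, "value": i})
    writer.flush_all()
    print(writer.files_written)  # 10
    print(len(read_partition(root, "2024-05-01")))  # 10000
```

Ten thousand readings land in ten files rather than ten thousand, and a query for one day never touches other days' directories. The trade-off is latency: records sit in memory until a batch fills, so real systems usually also flush on a timer.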

Understanding where and how the data flows across the enterprise is vital to mitigate data breaches and data leaks. That knowledge also helps businesses to understand how to recover from a data loss and provide resilience for the data infrastructure.
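Knowing where data flows becomes practical once lineage is recorded as a graph. As a minimal sketch, assuming lineage is kept as a simple mapping from each dataset to the datasets it feeds (all dataset names below are hypothetical), a breadth-first traversal answers the question that matters during a breach or data loss: what else is affected?

```python
from collections import deque


def downstream(lineage: dict, start: str) -> set:
    """Return every dataset that directly or indirectly consumes `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: each key feeds the datasets listed as its values.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "pos.orders": ["staging.orders"],
    "staging.orders": ["warehouse.fct_orders"],
    "warehouse.dim_customer": ["dash.revenue_by_region", "ml.churn_features"],
    "warehouse.fct_orders": ["dash.revenue_by_region"],
}

print(sorted(downstream(LINEAGE, "crm.customers")))
# ['dash.revenue_by_region', 'ml.churn_features', 'staging.customers', 'warehouse.dim_customer']
```

Running the same traversal on the reversed graph gives the upstream sources a dataset depends on, which is what a team needs when deciding what to restore first.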

For Mohan, it’s important to standardise development, automate build and testing processes, and monitor or audit the system end to end. Businesses need to design for scalability, extensibility, ease of maintenance and deployment independence. DataOps and data observability tools have started to provide transparency into the performance of data pipelines, along with ML-driven recommendations.
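Automated testing applies to data as well as code. The sketch below shows one way a pipeline step might run a set of data-quality checks on a batch before publishing it; the check names, columns and rules are hypothetical, and dedicated tools in the data observability space offer far richer versions of the same idea.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    name: str
    passes: Callable[[List[dict]], bool]


def run_checks(rows: List[dict], checks: List[Check]) -> List[str]:
    """Return the names of failed checks (an empty list means all passed)."""
    return [c.name for c in checks if not c.passes(rows)]


# Hypothetical checks for an orders batch; columns and rules are illustrative.
ORDER_CHECKS = [
    Check("not_empty", lambda rows: len(rows) > 0),
    Check("order_id_unique", lambda rows: len({r["order_id"] for r in rows}) == len(rows)),
    Check("amount_non_negative", lambda rows: all(r["amount"] >= 0 for r in rows)),
    Check("customer_id_present", lambda rows: all(r.get("customer_id") for r in rows)),
]

batch = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},
    {"order_id": 2, "customer_id": "c2", "amount": -5.0},
    {"order_id": 2, "customer_id": None, "amount": 7.5},
]
failed = run_checks(batch, ORDER_CHECKS)
if failed:
    # In a CI job or orchestrator task, failing here stops bad data propagating.
    print("blocking publish, failed checks:", failed)
```

Because the checks are plain data, the same list can run in a test suite against sample data and in production against every batch.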

“Not only does this strategy promote incremental feature developments to an analytics stack, but it also reduces the blast radius of technological failures. System failures are contained so that a warehouse level failure won’t automatically cascade to mission critical dashboards,” said Mohan.
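One way to keep a warehouse failure from cascading to dashboards is a "last known good" fallback: consumers get the most recent successful result, flagged as stale, instead of an error. This is a swapped-in illustration of failure containment, not a technique Mohan specifically prescribes, and the class and fetch function below are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class LastKnownGood:
    """Wrap a data-fetching call so that, if the upstream source fails,
    consumers get the most recent successful result instead of an error."""

    def __init__(self, fetch: Callable[[], dict]) -> None:
        self.fetch = fetch
        self.cached: Optional[dict] = None
        self.cached_at: Optional[float] = None  # when the cache was last refreshed

    def get(self) -> Tuple[dict, bool]:
        """Return (data, is_stale)."""
        try:
            data = self.fetch()
        except Exception:
            if self.cached is None:
                raise  # nothing to fall back to, so surface the failure
            return self.cached, True
        self.cached, self.cached_at = data, time.time()
        return data, False


warehouse_up = True


def fetch_revenue() -> dict:
    # Hypothetical stand-in for a warehouse query.
    if not warehouse_up:
        raise ConnectionError("warehouse unavailable")
    return {"region": "EMEA", "revenue": 1200}


panel = LastKnownGood(fetch_revenue)
print(panel.get())  # ({'region': 'EMEA', 'revenue': 1200}, False)
warehouse_up = False
print(panel.get())  # ({'region': 'EMEA', 'revenue': 1200}, True)
```

The stale flag matters: a dashboard can show a "data as of" banner rather than silently presenting old numbers as current.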

Also, scaling your resources dynamically according to demand, as opposed to doing it manually, ensures your service can meet a variety of traffic patterns without anyone needing to plan for it.
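Dynamic scaling usually comes down to a simple proportional rule. The sketch below follows the formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × current load / target load), with a tolerance band to avoid flapping and min/max bounds. It assumes "load" means an average per-replica metric such as CPU utilisation; the function name and default values are hypothetical.

```python
import math


def desired_replicas(current: int, current_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Proportional scaling rule in the style of the Kubernetes HPA."""
    ratio = current_load / target_load
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: do nothing, avoid flapping
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))  # clamp to bounds


print(desired_replicas(4, current_load=90.0, target_load=60.0))  # 6
print(desired_replicas(4, current_load=20.0, target_load=60.0))  # 2
print(desired_replicas(4, current_load=63.0, target_load=60.0))  # 4
```

The upper bound doubles as a cost guard, which ties dynamic scaling back to the cost optimisation Mohan mentions.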

So what does a resilient data architecture require? According to Mohammad-Laity, “Success metrics; regular review and a strong surrounding team that understands data products are ephemeral. Also, quality documentation as an element of the architecture, not as an afterthought.”

“By biasing to simplicity, you’ll have fewer points of failure as well as being more flexible to changing business needs. To quote Fivetran, a major player in the modern data tooling space, ‘simple and automated always beats complex and handcrafted’,” added Mohammad-Laity.
