Bridging the gap between IT and data teams
A pattern repeats often enough in enterprise data work to be worth naming. The data team has built a model that demonstrably works on a laptop. The IT team has, for years, run a stable platform that demonstrably serves the business. Each side is competent. Each side is, in its own terms, doing exactly what it was hired to do. And yet the model never ships, the integration never lands, and the executive sponsor, six months and several review cycles later, quietly redefines the project as a 'learning experience.'
The post-mortems usually point at scope, talent, or vendor choice. Those are rarely the cause. The cause is almost always that the two organizations involved, IT and Data, are operating with different vocabularies, different review cycles, different definitions of 'done', and a shared, polite refusal to renegotiate any of them. The work fails on the seam between them.
This is the structural problem the data architect role exists to solve. It is not a senior engineer. It is not a smarter data scientist. It is a specific job: to make the seam visible, governable, and crossable, so that data products can leave the laptop and become operations.
“Most enterprise data initiatives are not blocked by technology. They are blocked at the interface between two functions that were never asked to design one.”
What follows is a practitioner's view of the role, written for the people who hire it, fund it, and work alongside it: an anatomy of the gap, the five core responsibilities, a worked example, a timing guide, the skill matrix executives can use to assess candidates, and a short playbook for the architecture reviews where the role earns or loses its mandate.
Why so many data products never ship
Over the last decade, enterprises have invested heavily in data platforms, lakehouses, AI/ML toolchains, hires, training. The investment is real. The output, in most organizations, is also real but disappointing: a portfolio of demonstrably-working prototypes, a long list of 'blocked on IT' tickets, and a gradually accumulating skepticism in the executive layer that anyone in the company knows how to convert data into operational value.
The standard explanations are not wrong, but they are downstream. The upstream cause is that data products are software products with data at the centre, and most organizations did not staff for that fact. They staffed for data science, expecting that a strong analytical team would somehow find its way to production. It does not, because production is not an analytical problem.
The three failure modes
In our advisory work we see the same three patterns, in roughly equal measure:
- The orphaned prototype. A data team builds something that works. Six months later, no IT team has accepted ownership of running it. There is no clear environment, no deployment pattern, no monitoring strategy, no on-call. The prototype works, but no one is allowed to use it.
- The blocked integration. A model is ready. The data it needs lives in three systems owned by three teams, each with a different access protocol, each protected by a control regime designed for a different threat model. The integration is not technically hard; it is organizationally undefined.
- The redundant build. A data team builds a pipeline that already exists somewhere in the organization, because no one in either function had the mandate or the time to map what was already in place. Two years later, the same data is being processed three different ways, with three different definitions of 'customer.'
None of these failures is a technology failure. They are interface failures: predictable consequences of expecting two specialised functions to coordinate without a role designed for the coordination.
“The model is not blocked by IT. It is blocked by the absence of an interface between the team that built it and the team expected to run it.”
Anatomy of the IT-Data gap
The gap is easier to close once it is described concretely. It is not a personality clash. It is a difference in what each function was set up, measured, and rewarded to do, compounded by a difference in vocabulary that makes the disagreement hard to surface in a meeting at all.
The IT worldview:
- Optimizes for stability and control
- Hardened environments with reproducible builds
- Identity, access, and audit baked into every path
- Fewer surprises is the explicit goal
- Default response to ambiguity: refuse
The Data worldview:
- Optimizes for discovery and iteration
- Notebooks, ad-hoc environments, fast feedback
- Access to as much data as possible, as fast as possible
- Models judged on metrics, not on operability
- "Done" means the result is convincing
- Default response to ambiguity: experiment
Both worldviews are, in their own context, correct. IT was not invented to slow data work down; it was invented to keep production trustworthy. Data was not invented to ignore controls; it was invented to find non-obvious answers fast. The friction is not malice; it is two correct local optima with no shared global one.
The same vocabulary makes the disagreement worse. When a data scientist says 'the model is in production', they often mean a notebook that runs on a schedule. When IT hears the same phrase, they hear: change-managed deployment, observability, on-call, rollback strategy, audit trail, and a named owner. The two are not arguing about the same artefact. They are using the same word to refer to different objects.
“The gap is not technical. It is a missing translation layer, and that translation layer has a job title.”
The further you move from the laptop, the more expensive the gap becomes. A misunderstanding caught at the design stage costs an afternoon. The same misunderstanding caught at deployment costs a quarter. The same misunderstanding caught after go-live costs the executive sponsor's budget cycle, and frequently the project itself.
What a data architect actually does
The role description matters because the wrong description leads to the wrong hire. Data architects are not enterprise architects with a Python certificate. They are not the most senior data engineer. They are not a data-platform vendor's pre-sales lead. The role has five specific responsibilities, all of which sit on the seam.
Map the existing IT estate honestly
Read the running landscape (applications, data stores, pipelines, identity, networks, deployment patterns) and document it as it is, not as the org chart suggests it should be. Without this, every data design is a guess at where it can land.
Document data products end-to-end
For every data initiative: stated purpose, business owner, source systems, data flows, models, downstream consumers, controls, and SLAs. The discipline forces vague projects to become concrete or fail honestly at the design stage rather than at deployment.
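The documentation discipline can be sketched as a simple record type. This is an illustrative sketch only: the field names and the Python shape are our assumptions, not a schema the article prescribes.

```python
# Illustrative sketch only: one possible shape for the end-to-end data
# product record described above. Field names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    purpose: str                  # stated purpose, one sentence
    business_owner: str           # a named human, not a department
    source_systems: list[str]
    data_flows: list[str]         # e.g. "crm -> staging -> feature store"
    models: list[str]
    consumers: list[str]          # downstream systems and teams
    controls: list[str]           # identity, access, audit, privacy
    slas: dict[str, str] = field(default_factory=dict)

    def undefined_fields(self) -> list[str]:
        """Every empty field is a design-stage blocker, not a deployment surprise."""
        return [name for name, value in vars(self).items() if not value]
```

A review that runs `undefined_fields()` at design time surfaces the blanks while they still cost an afternoon rather than a quarter.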
Find synergies before approving net-new
Before any new pipeline, store, or platform is approved: is something already doing this? Could it be extended? The default answer is rarely "build new" once the question is actually asked. Most enterprises duplicate platforms because no one is paid to look first.
Challenge architectural defaults
When the right answer is to move a workload off-prem, retire a monolith, or change the data contract between two teams, say so and engineer the path. The role exists in part to make the unpopular-but-correct architectural call possible to defend.
Identify gaps and stage the roadmap
Detect security, scalability, governance, and cost gaps in the current estate, and propose a phased plan that does not require a heroic year. A roadmap that cannot be done in increments will not be done.
The shape of the role implies the personality. A good data architect is by temperament a translator with authority: comfortable enough in both worlds to be trusted, and senior enough to make calls that close the seam rather than route around it. They write more than they code. They negotiate more than they architect, in the diagram-drawing sense of the word.
“The right test for a data architect is not how good their reference architecture is. It is whether the IT director and the head of data both look relieved when they walk into the room.”
The role sits at the intersection of three constituencies (IT, Data, and the business stakeholders who sponsor the work) and produces a small, specific set of artefacts that make the seam navigable: an estate map, a documented architecture, a roadmap, a portfolio of documented data products, and the process that ties them together.
A worked example: the platform that almost failed
To make the role concrete, consider a recent engagement, anonymised, at a mid-sized European commercial bank. The bank had spent two years and roughly €4.2M on a 'next-generation data platform' intended to consolidate analytics across retail, SME, and corporate banking. The technology selection was reasonable. The team was competent. At the eighteen-month review, three flagship use cases (fraud monitoring, SME credit scoring, and customer churn) had all stalled.
- Sector: Commercial banking (mid-sized, European)
- Status: Stalled at the eighteen-month review
- Sponsor: Chief Data Officer, reporting to the CFO (new role, eighteen months in)
- Spend to date: ~€4.2M across platform licensing, two consultancies, and an internal team of nine
- Stuck use cases: Fraud monitoring · SME credit decisioning · Retail churn (none in production after 18 months)
- Diagnosed problem: Platform built without an interface to the IT estate that was supposed to consume it
- Intervention: Embedded a data architect for ten weeks. No new tooling purchased.
- Recovered: Three use cases unblocked in fourteen weeks.
What the diagnosis revealed
An estate map produced in the first three weeks made the situation legible for the first time. The new platform held curated data, but the source systems still held the system of record; nightly extracts had been built without coordination with the systems team that owned the source. The fraud use case required sub-minute latency; the platform had been designed for daily batch. The SME credit model relied on three fields that, in the system of record, lived under a different governance regime than the analytical team had assumed.
Latency mismatch
Fraud needs < 60s; the platform delivers data on a 24-hour cycle. The use case can never work on this path.
Governance mismatch
Three fields used by SME credit are PII under the bank's policy, with different access rules than the team assumed.
Operational ownership gap
No IT team had agreed to operate the new platform's pipelines. They assumed Data would. Data assumed IT would.
Duplication
Two of the new platform's pipelines reproduced data already produced by an existing IT-owned pipeline.
What changed
None of the findings required new technology. The intervention was almost entirely organisational. Fraud monitoring was repointed to a streaming path that already existed in IT for a different purpose; the bank had built it for transaction monitoring two years earlier. SME credit was unblocked by relocating two of its features to a privacy-preserving aggregate already maintained by the risk team, removing the PII issue. Retail churn was the simplest: a clear ownership contract was written between Data (model lifecycle) and IT (operations and SLA), and the use case shipped within six weeks.
The most consequential change was not technical. It was that, at the next steering committee, the head of data and the CIO arrived with a single architecture document, not two competing ones. The use cases that shipped after that were no longer fighting the seam, because someone, finally, was paid to design across it.
“The bank did not need a new platform. It needed a role that was accountable for the interface between the platform and everything else.”
When to bring the role in
The most common timing mistake is to hire the data architect after the data platform is already chosen, the team is already standing up, and the first use case is already in flight. By that point, every decision the role exists to inform has already been made. The architect spends their first year unpicking commitments instead of shaping them.
· Wrong time
Brought in after the platform is bought and the first use cases are stuck. The architect spends the first year renegotiating decisions that were never theirs to make.
· Right time
Brought in before platform selection, early enough to map the IT estate, define the operating model, and shape the data product portfolio. The architect's first quarter is documentation, not procurement.
A useful rule: if the organisation is about to spend more than €1M on data tooling, or to staff a team of more than ten on a data initiative, the data architect is hired before either of those decisions, not after. The role is cheaper than the alternative, almost always by an order of magnitude.
The second timing question is reporting line. The temptation is to land the role inside Data, because that is where the energy is. The pattern that works better is for the architect to sit on the boundary, reporting to the CDO or CIO depending on where the centre of gravity is, but with a formal mandate to convene both sides. If the architect cannot convene IT without escalation, the role has been miscast.
The skill matrix
The role is unusually demanding to staff because the competency stack is wide, and shallowness in any of the layers tends to be costly. The matrix below is the one we use to evaluate candidates and the one we recommend hiring managers use as a reference.
Production experience
Has shipped, run, and been on-call for real systems, not just proofs of concept. Knows what fails at 3am.
Software engineering depth
Understands design patterns, trade-offs, and why a clean API is worth more than a clever model.
Data fundamentals
Ingestion, storage, transport, processing, quality, privacy, governance: fluent across the stack.
Architectural patterns
Knows when monolith, microservice, lakehouse, mesh, or warehouse is right, and when it is merely fashionable.
Cloud & deployment
Comfortable across major clouds and Kubernetes. Knows what gets cheap and what gets expensive at scale.
ML & AI literacy
Need not build models, but must read them, evaluate them, and judge what they need to operate.
Governance & risk
Understands legal, compliance, and risk constraints as design inputs, not afterthoughts to negotiate around.
Decision-making
Will commit to a recommendation under ambiguity, document the reasoning, and revise it when wrong.
Communication
Writes clearly. Presents to a board and to a platform team in the same week, in the same calm voice.
Of the nine, the two most often missing in candidates who otherwise look strong are production experience and communication. A candidate who has never been on-call rarely understands the cost of brittle integrations. A candidate who cannot write rarely produces the artefacts the role is paid to produce.
“The hiring filter that matters most: ask the candidate to describe a time they changed an architectural decision they had previously approved. The answer reveals more than any technical screen.”
The architecture review playbook
Even the right hire fails if the operating mechanism around them is wrong. The single most useful artefact the role owns is a recurring architecture review: a forum where every data initiative is examined against a shared set of questions, with both IT and Data in the room, and a single document at the end.
A useful review covers four lenses, each scored 1–5:
Fit
Does it match the estate?
Identity, access, deployment, monitoring: does it land on a path that already exists, or is one of those being newly invented for this initiative alone?
Reuse
Is anything being rebuilt?
Is there a pipeline, dataset, or pattern that already exists and could be extended? If yes, why is it not being used?
Run
Who operates it?
Named team, named on-call, named SLA, agreed funding. If any of those are blank at review time, the initiative is not yet ready to ship.
Risk
What can go wrong?
Failure modes, governance constraints, regulatory exposure. Stated explicitly, with controls mapped to each, not "we'll handle it later."
The discipline is to treat any unanswered question as a blocker, not a footnote. 'We'll figure it out at deployment' is the single most expensive sentence in enterprise data work. It is also the single most common.
Two patterns make the review actually function. First, it must end with a written decision: approve, conditionally approve (with named owners for each open question), or send back. Verbal consensus is not a decision; it is a mood. Second, the architect chairs but does not own outcomes. The decisions belong to the named accountable executives. The architect's job is to make sure the right question is in front of them at the right time.
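The review mechanics (four lenses scored 1 to 5, unanswered questions treated as blockers, a written decision at the end) can be sketched in a few lines. The function shape and the passing threshold are our assumptions, purely illustrative; the lens names come from the text.

```python
# Illustrative sketch only: the architecture-review discipline as code.
# The threshold of 3 and the data shapes are assumptions, not a prescription.
LENSES = ("fit", "reuse", "run", "risk")   # each scored 1-5 at review time


def review_decision(scores: dict[str, int], open_questions: list[str]) -> str:
    """Return a written decision; any unanswered question is a blocker."""
    unscored = [lens for lens in LENSES if lens not in scores]
    if unscored or open_questions:
        return "send back"                 # "we'll figure it out at deployment" fails here
    if all(scores[lens] >= 3 for lens in LENSES):
        return "approve"
    return "conditionally approve"         # with a named owner for each weak lens
```

The point of encoding the rule is not automation; it is that the rule is explicit enough to encode, which verbal consensus never is.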
The boardroom checklist
Executives sponsoring a data initiative (particularly a multi-year, multi-million-euro one) should be able to answer eight questions before the budget is committed:
Who owns the seam?
Named role, named human, with a mandate that includes both IT and Data. If no one, the initiative is unowned by default.
Is the IT estate documented?
Not the wishlist, not the roadmap: the running estate as it actually is, today.
What is the data product portfolio?
Use cases listed, sponsored, scoped; not "the platform will enable many things."
What is being reused vs. rebuilt?
An explicit list. If everything is "new," the initiative is duplicating something that already runs.
Who runs it once it ships?
Named operator, named SLA, named funding line. Pre-deployment, not post-incident.
What controls are in place?
Identity, access, audit, privacy, model governance: by design, not by remediation.
What is the architecture review cadence?
Recurring, attended by both functions, ending in written decisions. Not a one-time committee.
What does failure look like?
Named failure modes, named exit criteria, named off-ramp. If failure has not been imagined, it has not been planned for.
The most consequential of the eight is the first. Every other question can be answered if the seam has an owner. None of them can be answered honestly if it does not.
From translator to multiplier
The data architect is often described, including in our own early framing of the role, as a translator. The framing is useful but incomplete. Translators help two parties understand each other. The data architect's job is more demanding: to design an interface that did not exist before, and to make that interface durable as both organizations on either side of it evolve.
Done well, the role is a multiplier. Every data initiative that follows lands more cleanly because the estate is mapped, the controls are inherited, the review mechanism exists, and the IT and Data leaders have stopped privately rolling their eyes at each other. The first use case after the role is established is rarely cheap; the tenth is dramatically cheaper. The compounding shows up not in the first project's ROI but in the slope of the portfolio.
Leaving behind: Architecture as diagrams
- Reference architectures that nobody runs
- Estate documents that are six months out of date
- Reviews that produce slides, not decisions
- Roles that report to one side of the seam
Moving toward: Architecture as interface
- Living estate maps used in every review
- Data products documented end-to-end
- Reviews ending in written decisions
- A role whose mandate spans both functions
Every executive team that has built a data capability worth keeping has eventually staffed this role, sometimes by hiring it deliberately, more often by promoting the one engineer who happened to be quietly doing the job already. The cost of doing it deliberately is roughly one senior salary. The cost of doing it accidentally is the eighteen-month detour described in the worked example above, repeated across every initiative until someone is paid to stop it.
“The companies that turn data into operations are not the ones with the best platforms. They are the ones that hired for the seam and gave that role the authority to close it.”
The right boardroom question is no longer 'Have we picked the right data platform?'
“It is: 'Do we have a single human accountable for the interface between IT and Data, and have we given them the authority to use it?'”
Seam
The seam is the asset
Most failures are interface failures. Hire for the interface, not just for either side of it.
Timing
Hire early, not late
Before platform selection, before the team scales. After is salvage; before is leverage.
Mandate
Authority over both sides
The role only works if it can convene both functions without escalation. Otherwise it is decorative.
Closing the gap between IT and Data without a re-platforming year.
We help organisations design and staff the interface between IT and Data: estate mapping, data product portfolios, architecture review mechanisms, and the operating model that lets data initiatives reach production at the pace the business actually needs.