Data lineage is the process of tracking data usage within your organization. This includes how data originates, how it is transformed, how it is calculated, its movement between different systems, and ultimately how it is utilized in applications, reporting, analysis, and decision-making. This is a crucial capability for any modern data ecosystem, as the amount of data businesses generate and store increases every year.
As of 2024, 64% of organizations manage at least one petabyte of data — and 41% have at least 500 petabytes of information within their systems. In many industries, like banking and insurance, this includes legacy data that spans not just systems but eras of technology.
As data volume grows, so does the business's need to trust and access that data. Thus, it is important for companies to invest in data lineage initiatives to improve data governance, quality, and transparency. If you’re shopping for a data lineage tool, there are many cutting-edge options. The cloud-based Zengines platform uses an innovative artificial intelligence-powered model that includes data lineage capabilities to support clean, consistent, and well-organized data.
Whether you go with Zengines or something else, though, it’s important to be strategic in your decision-making. Here is a step-by-step process to help you choose the best data lineage tools for your organization’s needs.
Start by ensuring your selection team has a thorough understanding of not just data lineage as a concept but also the requirements that your particular data lineage tools must have.
First, consider core data lineage tool functionalities that every company needs. For example, you want to be able to access a clear visualization of the relationship between complex data across programs and systems at a glance. Impact analysis also provides a clear picture of how change will influence your current data system.
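To make impact analysis concrete, here is a minimal sketch of how a tool might answer "what is affected if this source changes?" It assumes lineage is stored as a simple mapping from each dataset to the datasets it feeds. The dataset names and the `downstream_impact` helper are hypothetical and illustrative only; they do not describe how any particular product works.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed as its values.
LINEAGE = {
    "core_banking.accounts": ["staging.accounts_clean"],
    "staging.accounts_clean": ["warehouse.customer_balances"],
    "warehouse.customer_balances": ["reports.liquidity", "reports.credit_risk"],
    "crm.contacts": ["warehouse.customer_profile"],
}


def downstream_impact(lineage, changed):
    """Return every dataset reachable from `changed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guard against revisiting shared or cyclic paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream_impact(LINEAGE, "core_banking.accounts")
print(impacted)
# ['staging.accounts_clean', 'warehouse.customer_balances',
#  'reports.liquidity', 'reports.credit_risk']
```

A real tool would layer visualization and metadata on top, but the underlying question is the same graph traversal: start at the thing that changes and walk every edge that leads away from it.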
In addition, review technology-specific data-lineage needs, such as the need to ingest legacy codebases like COBOL. Compliance and regulatory requirements vary from one industry to the next, too. They also change often. Make sure you’re aware of both business operations needs and what is expected of the business from a compliance and legal perspective.
Also, consider future growth. Can the tool you select support the data as you scale? Don’t hamstring momentum down the road by short-changing your data lineage capabilities in the present.
When you begin to review specific data lineage tools, you want to know what features to prioritize. Here are six key areas to focus on:
Keep these factors in mind and make sure whatever tool you choose satisfies these basic requirements.
Along with specific features, you want to assess how easy the tool is to implement and to use.
Start with setup. Consider how readily each data lineage software solution can be implemented within and configured for your system. If your business built its technology solutions before the 1980s, critical business operations may still run on mainframes. Make sure a data lineage tool can integrate easily into a complex system before signing off on it.
Consider the learning curve and usability too. Does the tool have an intuitive interface? Are there complex training requirements? Are its information and operations accessible?
When considering the cost of a data lineage software solution, there are a few factors to keep in mind. Here are the top elements that can influence expenses when implementing and using a tool like this over time:
Make sure to consider costs, benefits, total cost of ownership (TCO), and return on investment (ROI) when assessing your options.
If you’re looking for a comprehensive assessment of what makes the Zengines platform stand out from other data lineage solutions, here it is in a nutshell:
Our automation removes friction and speeds up lineage work, which reduces risk, lowers costs, and makes data lineage more accessible.
As you assess your data lineage tool choices, keep the above factors in mind. What are your industry and organizational requirements? Focus on key features like automation and integration capabilities. Consider implementation, training, user experience, ROI, and comprehensive cost analyses.
Use this framework to help create stakeholder buy-in for your strategy. Then, select your tool with confidence, knowing you are organizing your data’s past to improve your present and lay the groundwork for a more successful future.
If you have any follow-up questions about data lineage and what makes a software solution particularly effective and relevant in this field, our team at Zengines can help. Reach out for a consultation, and together, we can explore how to create a clean, transparent, and effective future for your data.
For nearly a decade, global banks have treated BCBS 239 compliance as an aspirational goal rather than a regulatory mandate. That era is ending.
Since January 2016, the Basel Committee's Principles for Effective Risk Data Aggregation and Risk Reporting (BCBS 239) have required global systemically important banks to maintain complete, accurate, and timely risk data. Yet enforcement was inconsistent, and banks routinely pushed back implementation timelines.
Now regulators are done waiting. According to KPMG, banks that fail to remediate BCBS 239 deficiencies are "playing with fire."
At the heart of BCBS 239 compliance sits data lineage - the complete, auditable trail of data from its origin through all transformations to final reporting. Despite being mandatory for nearly nine years, it remains the most consistently unmet requirement.
From 2016 through 2023, comprehensive data lineage proved extraordinarily difficult to verify and enforce. The numbers tell the story: as of November 2023, only 2 out of 31 assessed global systemically important banks fully complied with all BCBS 239 principles. Not a single principle has been fully implemented by all banks (PwC).
Even more troubling? Progress has been glacial. Between 2019 and 2022, the average compliance level across all principles barely moved - from 3.14 to 3.17 on a scale of 1 ("non-compliant") to 4 ("fully compliant") (PwC).
Throughout this period, banks submitted implementation roadmaps extending through 2019, 2021, and beyond, citing the technical complexity of establishing end-to-end lineage across legacy systems. Many BCBS 239 programs were underfunded and lacked attention from boards and senior management (PwC). For seven years past the compliance deadline, data lineage requirements remained particularly challenging to implement and even harder to validate.
The Basel Committee's November 2023 progress report marked a shift in tone. Banks' progress was deemed "unsatisfactory," and regulators signaled that increased enforcement measures - including capital surcharges, restrictions on capital distribution, and other penalties - would follow (PwC).
Then came the ECB's May 2024 Risk Data Aggregation and Risk Reporting (RDARR) Guide, which provides unprecedented specificity on what compliant data lineage actually looks like - requirements that were previously open to interpretation (EY).
In public statements, ECB leaders have hinted that BCBS 239 could be the next area for periodic penalty payments (PPPs)—daily fines that accrue as long as a bank remains noncompliant (KPMG). These penalties can reach up to 5% of average daily turnover for every day the infringement continues, for a maximum of six months (European Central Bank).
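To get a feel for the scale, the ceiling described above can be worked through as simple arithmetic. This is a rough sketch only: the turnover figure is hypothetical, the 183-day cap is our assumed approximation of "six months," and the result is an upper bound implied by the 5% rule, not a prediction of any actual fine.

```python
def penalty_ceiling(avg_daily_turnover, days_noncompliant,
                    rate_percent=5, max_days=183):
    """Upper bound on accrued periodic penalty payments.

    rate_percent: the "up to 5% of average daily turnover" per day from the text.
    max_days: ASSUMPTION - six months approximated as 183 days.
    """
    if days_noncompliant < 0:
        raise ValueError("days_noncompliant must be non-negative")
    billable_days = min(days_noncompliant, max_days)
    daily_cap = avg_daily_turnover * rate_percent / 100
    return daily_cap * billable_days


# Hypothetical bank with 1,000,000 in average daily turnover.
print(penalty_ceiling(1_000_000, 30))   # 1500000.0
print(penalty_ceiling(1_000_000, 400))  # capped at 183 days: 9150000.0
```

Because the penalty accrues daily, the exposure grows linearly with every day of noncompliance until the cap is reached.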
This enforcement mechanism is no longer theoretical. In November 2024, the ECB imposed €187,650 in periodic penalty payments on ABANCA for failing to comply with climate risk requirements—demonstrating the regulator's willingness to deploy this tool (European Banking Authority).
European enforcement now includes ECB letters with findings, Pillar 2 requirement (P2R) add-ons, and fines (McKinsey & Company). These aren't hypothetical consequences.
ABN AMRO's Pillar 2 requirement increased by 0.25% to 2.25% in 2024, with the increase "mainly reflecting improvements required in BCBS 239 compliance" (ABN AMRO). That's a tangible capital cost for risk data aggregation deficiencies.
The ECB's May 2024 RDARR Guide goes further, warning that banks must "step up their efforts" or face "escalation measures." It explicitly states that deficiencies may lead to reassessment of the suitability of responsible executives—and in severe cases, their removal (EY).
American regulators have demonstrated equal resolve on data management failures. The OCC assessed a $400 million civil money penalty against Citibank in October 2020 for deficiencies in data governance and internal controls (Office of the Comptroller of the Currency). When Citi's progress proved insufficient, regulators added another $136 million in penalties in July 2024 for failing to meet remediation milestones (FinTech Futures).
Deutsche Bank felt the consequences in 2018, failing the Federal Reserve's CCAR stress test specifically due to "material weaknesses in data capabilities and controls supporting its capital planning process"—deficiencies examiners explicitly linked to weak data management practices (CNBC, Risk.net).
The ECB's May 2024 RDARR Guide exceeds even the July 2023 consultation draft in requiring rigorous data governance and lineage frameworks (KPMG). The specificity is unprecedented: banks need complete, attribute-level data lineage encompassing all data flows across all systems from end to end—not just subsets or table-level views.
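The difference between attribute-level and table-level lineage is easier to see with a small example. The sketch below assumes lineage is recorded as edges between (table, column) pairs. All table and column names, and both helper functions, are hypothetical and purely illustrative; the recursive trace also assumes the lineage contains no cycles.

```python
# Hypothetical attribute-level edges: target (table, column) -> source (table, column) list.
COLUMN_LINEAGE = {
    ("risk_report", "total_exposure"): [("exposures", "amount_eur")],
    ("exposures", "amount_eur"): [("loans", "principal"), ("fx_rates", "eur_rate")],
    ("risk_report", "as_of_date"): [("calendar", "business_date")],
}


def trace_to_origin(lineage, attribute):
    """Return the origin attributes (those with no recorded upstream source)."""
    sources = lineage.get(attribute)
    if not sources:
        return {attribute}
    origins = set()
    for src in sources:
        origins |= trace_to_origin(lineage, src)
    return origins


def table_level(lineage):
    """Collapse attribute-level edges to table-level edges, discarding column detail."""
    collapsed = {}
    for (target_table, _), sources in lineage.items():
        collapsed.setdefault(target_table, set()).update(t for t, _ in sources)
    return collapsed


print(sorted(trace_to_origin(COLUMN_LINEAGE, ("risk_report", "total_exposure"))))
# [('fx_rates', 'eur_rate'), ('loans', 'principal')]
print(sorted(table_level(COLUMN_LINEAGE)["risk_report"]))
# ['calendar', 'exposures']
```

The table-level view only says that `risk_report` depends on `calendar` and `exposures`. It cannot say which of those feeds `total_exposure`, which is the question an examiner asks when tracing a single reported number back to its source.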
The ECB is testing these requirements through on-site inspections that typically last up to three months and involve as many as 15 inspectors. These examinations often feature risk data "fire drills" requiring banks to produce large quantities of data at short notice (KPMG). Banks without comprehensive automated data lineage simply cannot respond adequately.
The regulatory stance continues to intensify. The ECB has announced targeted reviews of RDARR practices, on-site inspections, and annual questionnaires as key activities in its supervisory priorities work program (EY). With clearer guidance on what constitutes compliant data lineage and explicit warnings of enforcement escalation, deficiencies that were difficult to verify in previous years have become directly testable.
BCBS 239 data lineage requirements are mandatory and now explicitly defined in regulatory guidance. But here's the uncomfortable truth: for most banks, the biggest gap isn't in modern cloud systems with well-documented APIs. It's in the legacy mainframes that still process the majority of core banking transactions.
These systems—built on COBOL, RPG, and decades-old custom code—are the "black boxes" that make BCBS 239 compliance so difficult. They hold critical risk data, but their logic is buried in thousands of modules written by engineers who retired years ago. When regulators ask "where did this number come from?", banks often cannot answer with confidence.
Zengines' AI-powered platform solves this specific challenge. We deliver complete, automated, attribute-level lineage for legacy mainframe systems - parsing COBOL code, tracing data flows through job schedulers, and exposing the calculation logic that determines how risk data moves from source to regulatory report.
This isn't enterprise-wide metadata management. It's targeted, deep lineage for the systems that have historically been impossible to document—the same systems that trip up banks during ECB fire drills and on-site inspections. Zengines produces the audit-ready evidence that satisfies examination requirements, with the granularity regulators now explicitly demand.
For banks facing P2R capital add-ons, the cost of addressing mainframe lineage gaps is minimal compared to ongoing capital charges for non-compliance - let alone the risk of periodic penalty payments accruing at up to 5% of daily turnover.
BCBS 239 has required comprehensive data lineage since January 2016. With the May 2024 RDARR Guide providing explicit requirements and regulators signaling enforcement escalation, banks can no longer defer implementation—especially for legacy systems.
Zengines provides the proven technology to shine a light into mainframe black boxes, enabling banks to demonstrate compliance when regulators arrive with data requests and their enforcement toolkit.
Learn more today.

The "I" in CIO has always stood for Information, but in 2026 that responsibility takes on new urgency.
As the market pours resources into AI and enterprises face mounting pressure to manage it - whether deploying it internally, partnering with third parties who use it, or satisfying regulators who demand clarity on its use - the CIO's priority isn't another technology platform. It's data lineage and provenance as an unwavering capability.
This is what separates CIOs who treat technology management as an operational function from those who deliver trustworthy information as a strategic outcome.
Three industry drivers make this imperative urgent:
First, AI's transformative impact on business: Gartner reports that, despite an average spend of $1.9 million on GenAI initiatives in 2024, fewer than 30% of AI leaders report their CEOs are happy with AI investment return, largely because organizations struggle to verify their data's fitness for AI use.
Second, the massive workforce retirement in legacy technology: according to Forrester Research, 79% of respondents cited acquiring the right resources and skills to get work done as their top mainframe-related challenge, as seasoned experts retire and take decades of institutional knowledge about critical data flows with them.
Third, the ever-increasing regulatory landscape: Cybersecurity vulnerabilities, data governance, and regulatory compliance are three of the most common risk areas expected to be included in 2026 internal audit plans, with regulators demanding verifiable data lineage across industries.
As the enterprise's Information Officer, the CIO must be accountable for the organization's ability to produce and trust information - not just operate technology systems. Understanding the complete journey of data, from origin through every transformation to final use, supports every strategic outcome CIOs need to deliver: enabling AI capabilities, satisfying regulatory requirements, and partnering confidently with third parties. Data lineage provides the technical foundation that makes trustworthy information possible across the enterprise.
Three forces converge to create a burning platform:
First, regulatory compliance demands now span every industry - from BCBS 239 and DORA in financial services to HIPAA in healthcare to SEC analytics requirements across public companies. Regulators are enforcing data lineage mandates with substantial penalties.
Second, every business needs to demonstrate AI innovation, yet AI initiatives succeed or fail based on verified training data quality and explainability.
Third, in a connected world demanding "always on," enterprises must be agile enough to globally partner with third parties, whether serving customers through partner ecosystems or trusting data from their own vendors and service providers.
The urgency intensifies because mainframe systems house decades of critical business logic while the workforce that understands these systems is retiring, making automated lineage extraction essential before institutional knowledge disappears.
Given these converging pressures, CIOs need enterprise-wide data lineage capability that captures information flows across the entire technology landscape, including legacy systems. This means automated lineage extraction from mainframes, mid-tier applications, cloud platforms, and third-party integrations - creating a comprehensive map of how data moves and transforms throughout the organization.
Manual documentation fails because it can't keep pace with system complexity and depends on human compliance. The solution requires technology that captures lineage at the technical level where data actually flows, then makes this intelligence accessible for business understanding.
For mainframe environments specifically, this means extracting lineage from COBOL and RPG code before retiring experts leave. The strategic outcome: a single, verifiable source of truth about data provenance that serves regulatory needs, AI development, and partnership confidence simultaneously.
This shift elevates the CIO's accountability from operational execution to strategic outcomes. Rather than simply providing systems, CIOs become accountable for the infrastructure that proves information integrity and lineage.
This transforms conversations with boards and regulators from "we operate technology systems" to "we can verify our information's complete journey and quality"—a fundamentally stronger position.
The CIO role expands from technology delivery to information assurance, directly supporting enterprise risk management, innovation initiatives, and strategic partnerships through verifiable capability.
Ultimately, data lineage capability delivers three strategic business outcomes:
The enterprise moves from defensive compliance postures to offensive information leverage, with the CIO providing infrastructure that turns data into a strategic asset rather than a regulatory liability.
For CIOs in 2026, owning Information means proving it - and data lineage is what makes that promise possible.
To learn more about how Zengines can support your data lineage priorities, schedule a call with our team.

Every enterprise eventually faces a pivotal question: should we connect our systems together, or move our data to a new home entirely? The answer seems simple until you're staring at a 40-year-old mainframe with dwindling support, a dozen point solutions held together by ever-growing integrations, and a budget that doesn't accommodate mistakes.
Data migration and data integration are often confused because they both involve moving data. But they serve fundamentally different purposes - and choosing the wrong approach can cost you years of technical debt, millions in maintenance, or worse, a failed transformation project.
Data migration is about transition and consolidation.
Systems reach end-of-life. Platforms get replaced. Acquisitions require consolidation. Companies outgrow their technology stack and need to move from functionally siloed point solutions to consolidated platforms.
Migration addresses all of these - relocating data from a source system to a target, transforming it to fit the new data model, then retiring the source. The result is a cleaner footprint: fewer systems, fewer dependencies, a tidier architecture.
Data integration is about coexistence.
You're connecting systems so they can share data continuously, in real-time or near-real-time. Both systems stay alive. Think of it like building a bridge between two cities - traffic flows in both directions, indefinitely.
On the surface, integration can seem more appealing - it preserves optionality and avoids the hard decision of retiring systems. But optionality has carrying costs. Every bridge you build is a bridge you must maintain, monitor, and update when either system changes. Migration delivers a leaner architecture with less operational overhead.
Migration makes sense when you're ready to consolidate and simplify - especially for operational systems.
Consider migration when:
Integration makes sense when systems genuinely need to coexist and communicate - particularly for analytical use cases.
Consider integration when:
Migration projects have traditionally been expensive upfront. Research shows that over 80% of data migration projects run over time or budget. A 2021 Forbes analysis found that 64% of data migrations exceed their forecast budget, with 54% overrunning on time.
But here's what those statistics don't capture: much of this cost and risk stems from outdated approaches to migration. Legacy migration projects often relied on manual analysis, hand-coded transformation scripts, and armies of consultants reverse-engineering undocumented systems. The migration itself wasn't inherently expensive - the lack of proper tooling made it expensive.
When migration succeeds, you have a clean slate. The old system is retired. There's no pipeline to maintain, no nightly sync jobs to monitor, no integration layer to update when either system changes. You've reduced your technology footprint.
Integration appears easier at first. You're not touching the legacy data - you're just building a bridge. The upfront cost looks manageable. But that bridge requires constant attention.
According to McKinsey, the "interest" on technical debt includes the complexity tax from "fragile point-to-point or batch data integrations." Engineering teams spend an average of 33% of their time managing technical debt, according to research from Stripe. When you build an integration instead of migrating, you're committing to that maintenance indefinitely.
Gartner estimates that about 40% of infrastructure systems across asset classes already carry significant technical debt. Organizations that ignore this debt spend up to 40% more on maintenance than peers who address it early.
The key insight: integration's "lower cost" is an illusion if you only look at upfront spend. When you factor in total cost of ownership - years of maintenance, incident response, and the opportunity cost of engineers maintaining pipes instead of building value - the calculus often favors migration.
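One way to test this against your own situation is to compare cumulative cost over time. Below is a minimal sketch that assumes each option has a one-time upfront cost and a flat annual running cost. Every figure in it is hypothetical, and real costs are rarely this linear, so treat it as a starting point for your own numbers rather than a model of any actual project.

```python
def cumulative_cost(upfront, annual, years):
    """Total spend after `years`, assuming a flat annual running cost."""
    return upfront + annual * years


def break_even_year(mig_upfront, mig_annual, int_upfront, int_annual, horizon=20):
    """First whole year in which migration's cumulative cost is no higher than
    integration's, or None if that never happens within `horizon` years."""
    for year in range(horizon + 1):
        migration = cumulative_cost(mig_upfront, mig_annual, year)
        integration = cumulative_cost(int_upfront, int_annual, year)
        if migration <= integration:
            return year
    return None


# Hypothetical figures: migration costs more upfront but little to run;
# integration is cheaper to start but carries ongoing maintenance.
year = break_even_year(
    mig_upfront=1_000_000, mig_annual=50_000,
    int_upfront=300_000, int_annual=250_000,
)
print(year)  # 4
```

With these example inputs, integration is cheaper for the first three years and more expensive from year four onward. If the annual costs of the two options were equal, the function would return `None`, because the higher upfront cost of migration would never be recovered.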
Integration preserves optionality. You can defer the retirement decision. You can keep both systems running while you figure out the long-term strategy. But optionality has carrying costs, and those costs compound over time.
Migration forces a constraint - and constraints drive clarity. When you commit to migration, you're forced to answer hard questions: What data do we actually need? What's the canonical source of truth? What business rules should govern this data going forward? The result is a tidier, more intentional data architecture.
Many organizations choose integration because migration feels too hard. But "too hard" often means "too hard to decide." Integration lets you defer decisions. Migration forces them - and in doing so, delivers a cleaner outcome.
Ask yourself these questions:
For years, integration was perceived as the lesser evil - not because it was the right choice, but because migration seemed too expensive and risky. Organizations built integrations they didn't really want because migration felt out of reach.
That calculation is changing. Modern migration platforms are lowering the barrier to making the right choice - automating the analysis, transformation, and validation work that used to require armies of consultants. When migration's entry cost drops, total cost of ownership (TCO) becomes the deciding factor. And on TCO, migration often wins.
If you're modernizing legacy systems, consolidating point solutions into an ERP, or keeping operational systems lean for faster troubleshooting, migration gives you a cleaner footprint and eliminates technical debt. Yes, it requires commitment upfront. But you're trading short-term focus for long-term simplicity.
If you're feeding analytical systems, connecting platforms that both serve ongoing purposes, or need real-time data flow between coexisting systems, integration is the right tool. Just go in with your eyes open about the maintenance commitment you're making.
The worst outcome is choosing integration because migration seemed too hard - and then spending the next decade maintaining pipes to systems you should have retired years ago.
Zengines is an AI-native data migration platform built to lower the barrier to making the right choice. If you're weighing migration against integration - or stuck maintaining integrations you wish were migrations - we'd love to show you what's now possible. Let's talk.