
How to Choose the Best Data Lineage Tools for Your Organization's Needs

June 19, 2025
Greg Shoup

Data lineage is the process of tracking data usage within your organization. This includes how data originates, how it is transformed, how it is calculated, its movement between different systems, and ultimately how it is utilized in applications, reporting, analysis, and decision-making. This is a crucial capability for any modern data ecosystem, because the amount of data businesses generate and store increases every year.

As of 2024, 64% of organizations manage at least one petabyte of data — and 41% have at least 500 petabytes of information within their systems. In many industries, like banking and insurance, this includes legacy data that spans not just systems but eras of technology.

As data volume grows, so does the need to help the business trust and access that data. Companies should therefore invest in data lineage initiatives to improve data governance, quality, and transparency. If you’re shopping for a data lineage tool, there are many cutting-edge options. The cloud-based Zengines platform uses an artificial intelligence-powered model that includes data lineage capabilities to support clean, consistent, and well-organized data.

Whether you go with Zengines or something else, though, it’s important to be strategic in your decision-making. Here is a step-by-step process to help you choose the best data lineage tools for your organization’s needs.

Understanding Data Lineage Tool Requirements

Start by ensuring your selection team has a thorough understanding of not just data lineage as a concept but also the requirements that your particular data lineage tools must have.

First, consider core data lineage tool functionalities that every company needs. For example, you want to be able to access a clear visualization of the relationship between complex data across programs and systems at a glance. Impact analysis also provides a clear picture of how change will influence your current data system.

In addition, review technology-specific data-lineage needs, such as the need to ingest legacy codebases like COBOL. Compliance and regulatory requirements vary from one industry to the next, too. They also change often. Make sure you’re aware of both business operations needs and what is expected of the business from a compliance and legal perspective.

Also, consider future growth. Can the tool you select support the data as you scale? Don’t hamstring momentum down the road by short-changing your data lineage capabilities in the present.

Key Features to Look for in Data Lineage Tools

When you begin to review specific data lineage tools, you want to know what features to prioritize. Here are six key areas to focus on:

  1. Automated metadata collection: Automation should be a feature throughout any data lineage tool at this point. However, the specific ability to automate the collection of metadata from internal solutions and data catalogs is critical to sustainable data lineage activity over time.
  2. Integration capabilities: Data lineage tools must be able to comprehensively integrate across entities that store data: past, present, and future. Accessing legacy data in legacy technology has been the #1 challenge for most organizations, and this is where a tool like Zengines shines.
  3. End-to-end visibility: Data lineage must provide a clear picture of the respective data paths from beginning to end. This is a fundamental element of quality data lineage analysis.
  4. Impact analysis and research capabilities: Leading solutions make it easy for users to obtain and understand impact analysis for any data path or data change, showing data relationships and dependencies. Further, the ability to research seamlessly across data entities builds confidence in that analysis.
  5. Tracking and monitoring: Data lineage is an ongoing activity, so data lineage tools must be able to keep up with ongoing data changes within an organization.
  6. Visualization features: Visualizations - such as logic graphs - should provide comprehensive data paths across the data life cycle.

Keep these factors in mind and make sure whatever tool you choose satisfies these basic requirements.
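To make impact analysis concrete, here is a minimal sketch of the idea: lineage is stored as a graph of upstream-to-downstream edges, and a change to one asset is followed forward to everything that depends on it. The asset names and the `impacted_assets` helper are hypothetical, made up for illustration, and do not represent any particular vendor's implementation.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
EDGES = [
    ("core_banking.accounts", "staging.accounts"),
    ("staging.accounts", "warehouse.customer_balances"),
    ("warehouse.customer_balances", "report.liquidity_summary"),
    ("warehouse.customer_balances", "dashboard.branch_kpis"),
    ("crm.contacts", "dashboard.branch_kpis"),
]


def build_downstream_index(edges):
    """Map each asset to the assets that read from it directly."""
    index = defaultdict(list)
    for source, target in edges:
        index[source].append(target)
    return index


def impacted_assets(edges, changed_asset):
    """Return every asset downstream of changed_asset, in breadth-first order."""
    index = build_downstream_index(edges)
    seen = set()
    order = []
    queue = deque([changed_asset])
    while queue:
        current = queue.popleft()
        for child in index[current]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Which assets should be reviewed if staging.accounts changes?
    for asset in impacted_assets(EDGES, "staging.accounts"):
        print(asset)
```

In this toy graph, a change to `staging.accounts` reaches the warehouse table first, then the report and the dashboard built on it. A real tool would populate the edges automatically from metadata rather than by hand.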

Implementation and User Experience Considerations

Along with specific features, you want to assess how easy it is to implement the tool and how easy it is to use the tool.  

Start with setup. Consider how well each data lineage software solution is designed to be implemented within and configured for your system. If your business built technology solutions before the 1980s, you may have critical business operations that run on mainframes. Make sure a data lineage tool can integrate easily into a complex system before signing off on it.

Consider the learning curve and usability too. Does the tool have an intuitive interface? Are there complex training requirements? Are the information and operations accessible?

Cost Analysis and ROI

When considering the cost of a data lineage software solution, there are a few factors to keep in mind. Here are the top elements that can influence expenses when implementing and using a tool like this over time:

  • Direct cost: Take the time to compare the up-front cost of each option. Recognize and account for capability differences, such as the ability to integrate with legacy technology, which is necessary to generate comprehensive end-to-end data lineage. The lowest-cost option is unlikely to include this capability.
  • Benefit analysis: Consider both quantitative benefits (e.g., time savings from automated data lineage versus manual data lineage search) and qualitative benefits (e.g., cost avoidance for compliance and regulatory fines).
  • Total cost of ownership (TCO): Consider the big picture with your investment. Are there licensing and subscription fees? Implementation costs? Ongoing maintenance and support expenses? These should be clarified before making a decision.
  • Expected return on investment: Your ROI will depend on things like speed of implementation, reduced costs, and accelerated digital transformations. Automated AI/ML solutions, like those that power Zengines, can provide meaningful benefit to TCO over time and should be factored into the cost analysis.

Make sure to consider costs, benefits, TCO and ROI when assessing your options.
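As a rough illustration of how these pieces fit together, the sketch below adds up a multi-year TCO and computes a simple ROI. Every figure, the `ToolCosts` structure, and the simple (benefit - cost) / cost formula are assumptions chosen for the example, not pricing or results from any vendor.

```python
from dataclasses import dataclass


@dataclass
class ToolCosts:
    """Illustrative cost inputs for one candidate tool (all figures hypothetical)."""
    license_per_year: float
    implementation_one_time: float
    support_per_year: float


def total_cost_of_ownership(costs: ToolCosts, years: int) -> float:
    """One-time implementation cost plus recurring license and support fees."""
    recurring = costs.license_per_year + costs.support_per_year
    return costs.implementation_one_time + years * recurring


def roi(total_benefit: float, total_cost: float) -> float:
    """Simple ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


if __name__ == "__main__":
    costs = ToolCosts(
        license_per_year=100_000,
        implementation_one_time=50_000,
        support_per_year=20_000,
    )
    tco = total_cost_of_ownership(costs, years=3)
    # Assumed benefit: 200,000 per year in time savings and avoided rework.
    benefit = 3 * 200_000
    print(f"3-year TCO: {tco:,.0f}")
    print(f"3-year ROI: {roi(benefit, tco):.0%}")
```

With these made-up inputs, the three-year TCO comes to 410,000 and the ROI to about 46%. The value of the exercise is in running the same calculation for each shortlisted tool with your own numbers.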

The Zengines Advantage

If you’re looking for a comprehensive assessment of what makes the Zengines platform stand out from other data lineage solutions, here it is in a nutshell:

  • Zengines Mainframe Data Lineage offers a unique approach to data lineage with its robust ability to ingest and parse COBOL programs so that businesses can now have data lineage inclusive of legacy technology. Zengines de-risks the mainframe “black box” and enables businesses to manage, modernize, or migrate mainframes better and more quickly.
  • Zengines comes backed by a wide range of software companies, businesses, consulting firms, and other enterprises that have successfully used our software solutions.
  • Our data lineage is part of our larger frictionless data conversion and integration solutions designed to speed up the notoriously slow process of proper data management. We use AI/ML to automate and accelerate the process, from implementation through ongoing use, helping your data work for you rather than get in the way of your core functions.
  • Our ZKG (Zengines Knowledge Graph) is a proprietary, industry-specific database that is always growing and providing more detailed information for our algorithms.

Our automation removes friction and speeds up the work, reducing risk, lowering costs, and making data lineage more accessible.

Making the Final Decision

As you assess your data lineage tool choices, keep the above factors in mind. What are your industry and organizational requirements? Focus on key features like automation and integration capabilities. Consider implementation, training, user experience, ROI, and comprehensive cost analyses. 

Use this framework to help create stakeholder buy-in for your strategy. Then, select your tool with confidence, knowing you are organizing your data’s past to improve your present and lay the groundwork for a more successful future.

If you have any follow-up questions about data lineage and what makes a software solution particularly effective and relevant in this field, our team at Zengines can help. Reach out for a consultation, and together, we can explore how to create a clean, transparent, and effective future for your data.

You may also like

The 2008 financial crisis exposed a shocking truth: major banks couldn't accurately report their own risk exposures in real-time.

When Lehman Brothers collapsed, regulators discovered that institutions didn't know their actual exposure to toxic assets -- not because they were hiding it, but because they genuinely couldn't aggregate their own data fast enough.

Fifteen years and billions in compliance spending later, only 2 out of 31 Global Systemically Important Banks fully comply with BCBS-239 -- the regulation designed to prevent this exact problem.

The bottleneck? Data lineage.

Who Must Comply and When

BCBS-239 applies from January 1, 2016 for Global Systemically Important Banks (G-SIBs) and is recommended by national supervisors for Domestic Systemically Important Banks (D-SIBs) three years after their designation. In practice, this means hundreds of banks worldwide are now expected to comply.

Unlike regulations with fixed annual filing deadlines, BCBS-239 is an ongoing compliance requirement. Supervisors can test a bank's compliance with occasional requests on selected risk issues with short deadlines, gauging a bank's capacity to aggregate risk data rapidly and produce risk reports.

Think of it as a fire drill that can happen at any moment -- and with increasingly serious consequences for failure.

The Sobering Statistics

More than a decade after publication and eight years past the compliance deadline, the results are dismal. Only 2 out of 31 assessed Global Systemically Important Banks fully comply with all principles, and no single principle has been fully implemented by all banks.

Even more troubling, the compliance level across all principles barely improved from an average of 3.14 in 2019 to 3.17 in 2022 on a scale of 1 ("non-compliant") to 4 ("fully compliant"). At this rate of improvement, full compliance is decades away.
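To put "decades away" in numbers, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the average score keeps improving at the same linear rate seen between 2019 and 2022; the function name is hypothetical, and this is a naive extrapolation rather than a forecast.

```python
def years_to_target(start_score, end_score, start_year, end_year, target):
    """Years needed after end_year to reach target at the observed linear rate.

    Returns None if the score did not improve over the observed period.
    """
    rate_per_year = (end_score - start_score) / (end_year - start_year)
    if rate_per_year <= 0:
        return None
    return (target - end_score) / rate_per_year


if __name__ == "__main__":
    # Scores from the text: 3.14 in 2019, 3.17 in 2022, on a 1-to-4 scale.
    remaining = years_to_target(3.14, 3.17, 2019, 2022, target=4.0)
    print(f"Roughly {remaining:.0f} more years at the 2019-2022 pace")
```

The observed pace is about 0.01 points per year, and 0.83 points remain, so the straight-line estimate lands at roughly 83 years.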

What Happens If Your Bank Fails BCBS-239 Compliance?

The consequences are escalating. The ECB guide explicitly mentions:

  • Enforcement actions against the institution
  • Capital add-ons to compensate for data risk
  • Removal of responsible executives who fail to drive compliance
  • Operational restrictions on new business lines or acquisitions

The Basel Committee makes it clear that banks' progress towards BCBS 239 compliance in recent years has not been satisfactory and that increased measures on the part of the supervisory authorities are to be expected to accelerate implementation.

What Banks Are Doing (And Why It's Not Enough)

Most banks have responded to BCBS-239 with predictable tactics:

  • Governance restructuring: Creating Chief Data Officer roles and data governance committees
  • Policy documentation: Writing comprehensive data management policies and frameworks
  • Technology investments: Purchasing disparate tools like data catalogs, metadata management tools, and master data management platforms
  • Remediation programs: Launching multi-year, multi-million dollar compliance initiatives

These tactics are positive steps forward: necessary, but not sufficient for compliance. In other words, banks are checking boxes without fundamentally solving the problem.

The issue? Banks are treating BCBS-239 like a project with an end date, when it's actually an operational capability that must be demonstrated continuously.

The Data Lineage Bottleneck

Among the 14 principles, one capability has emerged as the make-or-break factor for compliance: data lineage.

Data lineage has been identified as one of the key challenges that banks have faced in aligning to the BCBS-239 principles, as it is one of the more time consuming and resource intensive activities demanded by the regulation.

Why Data Lineage Is Different

Data lineage -- the ability to trace data from its original source through every transformation to its final destination -- sits at the intersection of virtually every BCBS-239 principle. The European Central Bank refers to data lineage as "a minimum requirement of data governance" in the latest BCBS 239 recommendations.

Here's why lineage is uniquely difficult:

It's invisible until you need it.
Unlike a data governance policy you can show an auditor or a data quality dashboard you can pull up, lineage is about proving flows, transformations, and dependencies that exist across dozens or hundreds of systems. You can't fake it in a PowerPoint.

It crosses organizational and system boundaries.
Complete lineage requires cooperation between IT, risk, finance, operations, and business units -- each with their own priorities, systems, and definitions. Further, data hand-offs occur in and between systems, databases, and files, which adds to the complexity of connecting what happens at each hand-off. Regulators are increasingly requiring detailed traceability of reported information, which can only be achieved through lineage across organizations and systems.

It must be current and complete.
The ECB requires "complete and up-to-date data lineages on data attribute level (starting from data capture and including extraction, transformation and loading) for the risk indicators, and their critical data elements." A lineage document from six months ago is worthless if your systems have changed.

It must work under pressure.
Supervisors increasingly require institutions to demonstrate the effectiveness of their data frameworks through on-site inspections and fire drills, with data lineage providing the audit trail necessary for these reviews. When a regulator asks "prove this number came from where you say it came from," you have hours -- not days -- to respond.

The Eight Principles That Demand Data Lineage Proof

While 11 of the 14 principles benefit from good data lineage, regulatory guidance makes it explicitly mandatory for eight:

  • Principle 2 (Data Architecture): Demonstrate integrated data architecture through documented lineage flows
  • Principle 3 (Accuracy & Integrity): Prove data accuracy by showing traceable lineage from source to report
  • Principle 4 (Completeness): Demonstrate comprehensive risk coverage through lineage mapping
  • Principle 6 (Adaptability): Respond to ad-hoc requests using lineage to quickly identify relevant data
  • Principle 7 (Report Accuracy): Validate report numbers through documented lineage and audit trails
  • Principles 12-14 (Supervisory Review): Provide lineage evidence during audits and fire drills

The Technology Gap: Why Traditional Tools Fall Short

Most banks have invested heavily in data catalogs, metadata management platforms, and governance frameworks. Yet they still can't produce lineage evidence under audit conditions. Why?

Traditional approaches have three fatal flaws:

1. Manual Documentation

Excel-based lineage documentation becomes outdated within weeks as systems change. By the time you finish documenting one data flow, three others have been modified. Manual approaches simply can't keep pace with modern banking environments.

2. Point Solutions That Only Support Newer Applications

Modern data lineage tools can map cloud warehouses and APIs, but they hit a wall when they encounter legacy mainframe systems. They can't parse COBOL code, decode JCL job schedulers, or trace data through decades-old custom applications -- exactly where banks' most critical risk calculations often live.

3. Incomplete Coverage

Lineage that stops at the data warehouse is fundamentally incomplete under BCBS-239's end-to-end data lineage requirements. Regulators want to see the complete path -- from original source system through every transformation, including hard-coded business logic in legacy applications, to the final risk report. Most tools miss 40-70% of the actual transformation logic.

How AI-Powered Data Lineage Changes the Game

This is where AI-powered solutions like Zengines fundamentally differ from traditional approaches.

Instead of manually documenting lineage, Zengines can automatically and comprehensively:

  • Parse legacy mainframe code (COBOL, RPG, Focus, etc.) to extract data flows and transformation logic
  • Trace calculations backward from any report field to ultimate source systems
  • Document relationships between tables, fields, programs, files and job schedulers
  • Generate audit-ready evidence in minutes instead of months
  • Maintain relevancy and currency through lineage updates as code changes
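As a simplified picture of what tracing backward from a report field means, the sketch below walks a field-level lineage map upstream until it reaches fields with no recorded parent. The field names and the `trace_to_sources` helper are hypothetical, and this toy traversal is not a description of how Zengines works internally.

```python
# Hypothetical field-level lineage: each field maps to the fields it is derived from.
DERIVED_FROM = {
    "risk_report.total_exposure": ["warehouse.exposure_by_desk"],
    "warehouse.exposure_by_desk": ["staging.positions", "staging.fx_rates"],
    "staging.positions": ["mainframe.POSMAST.POS-AMT"],
    "staging.fx_rates": ["vendor_feed.fx_rates"],
}


def trace_to_sources(derived_from, field):
    """Return the ultimate source fields, meaning those with no recorded upstream."""
    sources = set()
    visited = set()
    stack = [field]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # guards against cycles and repeated work
        visited.add(current)
        parents = derived_from.get(current, [])
        if parents:
            stack.extend(parents)
        else:
            sources.add(current)
    return sources


if __name__ == "__main__":
    for source in sorted(trace_to_sources(DERIVED_FROM, "risk_report.total_exposure")):
        print(source)
```

Here the report field resolves to two origins: a mainframe file field and a vendor feed. The hard part in practice is building the map itself, especially where the derivations live inside legacy code.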

Solving the "Black Box" Problem

For many banks, the biggest lineage gap isn't in modern systems -- it's in legacy mainframes where critical risk calculations were encoded 20-60 years ago by developers who have long since retired. These systems are literal "black boxes": they produce numbers, but no one can explain how.

Zengines' Mainframe Data Lineage capability specifically addresses this challenge by:

  • Parsing COBOL and RPG modules to expose calculation logic and data dependencies
  • Tracing variables across millions of lines of legacy code
  • Identifying hard-coded values, conditional logic, and branching statements
  • Visualizing data flows across interconnected mainframe programs and external files
  • Extracting "requirements" that were never formally documented but are embedded in code

This capability is essential for banks that need to prove how legacy calculations work -- whether for regulatory compliance, system modernization, or simply understanding their own risk models.
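For a feel of what "identifying hard-coded values" involves, here is a deliberately small sketch that scans COBOL lines for MOVE statements whose source is a literal. The sample program, the variable names, and the regular expression are all assumptions made for illustration; production-grade analysis needs a real COBOL parser, and this is not how any specific product does it.

```python
import re

# Matches MOVE <numeric or quoted literal> TO <data-name>.
MOVE_LITERAL = re.compile(
    r"""\bMOVE\s+('[^']*'|"[^"]*"|[+-]?\d+(?:\.\d+)?)\s+TO\s+([A-Z0-9-]+)""",
    re.IGNORECASE,
)

# Hypothetical fixed-format COBOL lines; column 7 holds the comment indicator.
SAMPLE = [
    " " * 7 + "MOVE 0.0525 TO WS-INT-RATE.",
    " " * 6 + "* MOVE 1 TO WS-OLD-FLAG.",
    " " * 7 + "MOVE WS-BALANCE TO WS-PRINCIPAL.",
    " " * 7 + "COMPUTE WS-INTEREST = WS-PRINCIPAL * WS-INT-RATE.",
    " " * 11 + "MOVE 'Y' TO WS-SURCHARGE-FLAG.",
]


def find_hard_coded_moves(lines):
    """Return (line number, literal, target) for each literal MOVE found."""
    findings = []
    for number, line in enumerate(lines, start=1):
        if len(line) > 6 and line[6] in "*/":
            continue  # skip comment lines
        match = MOVE_LITERAL.search(line)
        if match:
            findings.append((number, match.group(1), match.group(2)))
    return findings


if __name__ == "__main__":
    for number, literal, target in find_hard_coded_moves(SAMPLE):
        print(f"line {number}: {literal} is hard-coded into {target}")
```

In the sample, the interest rate and the surcharge flag are flagged, while the commented-out line and the variable-to-variable MOVE are ignored. Findings like these are often where undocumented business rules hide.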

Assessment: Can Your Bank Prove Compliance Right Now?

The critical question isn't "Do we have data lineage?" It's "Can we prove compliance through data lineage right now, under audit conditions, with short notice?"

Most banks would answer: "Well, sort of..."

That's not good enough anymore.

We've translated ECB supervisory expectations into a practical, principle-by-principle checklist. This isn't about aspirational capabilities or future roadmaps -- it's about what you can demonstrate today, under audit conditions, with short notice.

The Bottom Line

The bottleneck to full BCBS-239 compliance is clear: data lineage.

Traditional approaches -- manual documentation, point solutions, incomplete coverage -- can't solve this problem fast enough. The compliance deadline was 2016. Enforcement is escalating. Fire drills are becoming more frequent and demanding.

Banks that solve the lineage challenge with AI-powered automation will demonstrate compliance in hours instead of months. Those that don't will continue struggling with the same gaps, facing increasing regulatory pressure, and risking enforcement actions.

The technology to solve this exists today. The question is: how long can your bank afford to wait?

Schedule a demo with our team today to get started.

BOSTON, MA – October 29, 2025 – Zengines, the AI-powered data migration and data lineage platform, announces expanded support for RPG (Report Program Generator) language in its Data Lineage product. Organizations running IBM i (AS/400) systems can now rapidly analyze legacy RPG code alongside COBOL, dramatically accelerating modernization initiatives while reducing dependency on scarce programming expertise.

Breaking Through the RPG "Black Box"

Many enterprises still rely on mission-critical applications written in RPG decades ago, creating what Zengines calls the "black box" problem – legacy technology where business logic, data flows, and requirements are locked away in legacy code with little to no documentation. As companies undertake digital transformation and cloud migration initiatives, understanding these legacy systems has become a critical bottleneck.

The challenge with RPG is particularly acute. While COBOL's descriptive, English-like syntax makes it easier to "read," RPG's fixed-format column specifications and cryptic operation codes require developers to decode what goes in which column while tracing through numbered indicators to follow the logic. This complexity, combined with a shrinking pool of RPG expertise, makes understanding these systems even more critical—and difficult—than their COBOL counterparts.

"The majority of our enterprise customers are running legacy technology across multiple platforms – both mainframe COBOL environments and IBM i systems with RPG code," said Caitlyn Truong, CEO of Zengines. "By expanding our support to include RPG alongside COBOL, we can now address the full spectrum of legacy code challenges these organizations face. This means our customers can leverage a single AI-powered platform to comprehensively analyze, understand and modernize their legacy technology estate, rather than cobbling together multiple point solutions or relying on increasingly scarce programming expertise across different languages and systems."

Minutes, Not Months: AI-Powered Legacy Code Analysis

The enhanced Zengines Data Lineage platform automatically ingests RPG code, job schedulers, and related artifacts to deliver:

  • Interactive data lineage visualization – Graphical representation of data paths, sources, and hard-coded values
  • Comprehensive code intelligence – Relationships between modules, tables, fields, variables, and files
  • Business logic extraction – Calculation logic, branching conditions, and transformation rules
  • Actionable insights – Tables and fields inventory, profiling, and impact analysis

This capability is critical for organizations navigating system replacements, M&A integrations, compliance initiatives, and technology modernization programs where understanding legacy RPG logic is essential for de-risking implementations.

Real-World Impact: From Guesswork to Precision

Efforts to manage and modernize legacy systems break down when teams lack complete understanding of existing logic. Migrations stall when teams cannot achieve functional coverage or resolve test failures. When validating new systems against legacy outputs, discrepancies inevitably emerge – but without understanding why the old system produces specific results, teams cannot effectively test, replicate, or improve functionality.

"Our customers use Zengines to reverse-engineer business requirements from legacy code," added Truong. "When a new system returns a different result for an interest calculation compared to that of  the 40-year-old RPG program, teams need to understand the original logic to make informed decisions about what to preserve and what to update. That's the power of shining a light into the black box."

Immediate Availability

RPG parsing capability is now available on the Zengines Data Lineage platform. Organizations can analyze both COBOL and RPG codebases within a single integrated platform.

About Zengines

Zengines is a technology company that transforms how organizations handle data migrations and modernization initiatives. Zengines serves business analysts, developers, and transformation leaders who need to map, change, and move data across systems. With deep expertise in AI, data migration, and legacy systems, Zengines helps organizations reduce time, cost, and risk associated with their most challenging data initiatives.

Media Contact:

Todd Stone

President, Zengines

todd@zengines.ai

IBM's RPG (Report Program Generator) began in 1959 with a simple mission: generate business reports quickly and efficiently. What started as RPG I evolved through multiple generations - RPG II, RPG III, RPG LE, and RPG IV - each adding capabilities that transformed it from a simple report tool into a full-featured business programming language. Today, RPG powers critical business applications across countless AS/400, iSeries, and IBM i systems. Yet for modern developers, understanding RPG's unique approach and legacy codebase presents distinct challenges that make comprehensive data lineage essential.

The Strengths That Made RPG Indispensable

Built-in Program Cycle: RPG's fixed-logic cycle automatically handled file operations, making database processing incredibly efficient. The cycle read records, processed them, and wrote output with minimal programmer intervention. Because it processed data sequentially, it was ideal for report generation and business data handling.

Native Database Integration: RPG was designed specifically for IBM's database systems, providing direct interaction with database files and making it ideal for transactional systems where fast and reliable data processing is essential. It offered native access to DB2/400 and its predecessors, with automatic record locking, journaling, and data integrity features.

Rapid Business Application Development: For its intended purpose - business reports and data processing - RPG was remarkably fast to code. The fixed-format specifications (H, F, D, C specs) provided a structured framework that enforced consistency and simplified application creation.

Exceptional Performance and Scalability: RPG applications typically ran with exceptional efficiency on IBM hardware, processing massive datasets with minimal resource consumption, so the language handles large volumes of data well.

Evolutionary Compatibility: The language's evolution path meant that RPG II code could often run unchanged on modern IBM i systems - a testament to IBM's commitment to backward compatibility that spans over 50 years.

The Variations That Created Complexity

RPG II (Late 1960s): The classic fixed-format version with its distinctive column-specific coding rules and built-in program logic cycle, used on System/3, System/32, System/34, and System/36.

RPG III (1978): Added subroutines, improved file handling, and more flexible data structures while maintaining the core cycle approach. Introduced with System/38, later rebranded as "RPG/400" on AS/400.

RPG LE - Limited Edition (1995): A simplified version of RPG IV designed for smaller systems, notably including a free compiler to improve accessibility.

RPG IV/ILE RPG (1994): The major evolution that introduced modular programming with procedures, prototypes, and the ability to create service programs within the Integrated Language Environment - finally bringing modern programming concepts to RPG.

Free-Format RPG (2013): Added within RPG IV, this broke away from the rigid column requirements while maintaining backward compatibility, allowing developers to write code similar to modern languages.

The Weaknesses That Challenge Today's Developers

Steep Learning Curve: RPG's fixed-logic cycle and column-specific formatting are unlike any modern programming language. New developers must understand both the language syntax and the underlying program cycle concept, which can be particularly challenging.

Limited Object-Oriented Capabilities: Even modern RPG versions lack full object-oriented programming capabilities, making it difficult to apply contemporary design patterns and architectural approaches.

Cryptic Operation Codes: Traditional RPG used operation codes like "CHAIN," "SETLL," and "READE" with rigid column requirements that aren't intuitive to developers trained in modern, free-format languages.
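To show why column rules matter, the sketch below slices a fixed-format calculation line into its fields by position. The column ranges are an assumption based on the commonly documented ILE RPG calculation specification layout and should be checked against IBM's reference for your RPG version; the helper names and the sample CHAIN statement are hypothetical.

```python
# Assumed column ranges (1-indexed, inclusive) for an ILE RPG fixed-format C-spec.
C_SPEC_FIELDS = {
    "factor1": (12, 25),
    "opcode": (26, 35),
    "factor2": (36, 49),
    "result": (50, 63),
}


def parse_c_spec(line):
    """Split a calculation line into named fields, or return None if it is not one."""
    padded = line.ljust(80)
    if padded[5].upper() != "C":  # column 6 holds the specification type
        return None
    if padded[6] == "*":  # an asterisk in column 7 marks a comment line
        return None
    return {
        name: padded[start - 1:end].strip()
        for name, (start, end) in C_SPEC_FIELDS.items()
    }


def make_c_spec(factor1="", opcode="", factor2="", result=""):
    """Build an aligned sample line so the example avoids hand-counted spaces."""
    return (
        " " * 5 + "C" + " " * 5
        + factor1.ljust(14) + opcode.ljust(10)
        + factor2.ljust(14) + result.ljust(14)
    )


if __name__ == "__main__":
    # Look up a record in the CUSTMAST file using the key field CUSTNO.
    line = make_c_spec(factor1="CUSTNO", opcode="CHAIN", factor2="CUSTMAST")
    print(parse_c_spec(line))
```

The meaning of each token depends entirely on where it sits in the line, which is why a one-column shift can change or break a statement, and why developers used to free-format languages find this style hard to read.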

Complex Maintenance Due to Evolution: The evolution from RPG II (late 1960s) through RPG III (1978) to RPG IV/ILE RPG (1994) and finally free-format coding (2013) created hybrid codebases mixing multiple RPG styles across nearly 50 years of development, making maintenance and understanding complex for teams working across different generations of the language.

Proprietary IBM-Only Ecosystem: Unlike standardized languages, RPG has always been IBM's proprietary language, creating vendor lock-in and concentrating expertise among IBM specialists rather than fostering broader community development.

The Legacy Code Challenge: Why RPG Is Particularly Difficult Today

RPG presents unique challenges that go beyond typical legacy system issues, rooted in decades of development practices:

  • Multiple Format Styles in Single Systems: A single system might contain RPG II fixed-format code (1960s-70s), RPG III subroutines (1978+), RPG LE simplified code (1995+), and RPG IV/ILE procedures with free-format sections (1994+) - all working together but following different conventions and programming paradigms developed across 50+ years, making unified understanding extremely challenging.
  • Embedded Business Logic: RPG's tight integration with IBM databases means business rules are often embedded directly in database access operations and the program cycle itself, making them hard to identify, extract, and document independently.
  • Minimal Documentation Culture: The RPG community traditionally relied on the language's self-documenting nature and the assumption that the program cycle made logic obvious, but this assumption breaks down when dealing with complex business logic or when original developers are no longer available.
  • Proprietary Ecosystem Isolation: RPG development was largely isolated within IBM midrange systems, creating knowledge silos. Unlike languages with broader communities and extensive online resources, RPG expertise became concentrated among IBM specialists, limiting knowledge transfer.
  • External File Dependencies: RPG applications often depend on externally described files (DDS) where data structure definitions live outside the program code, making data relationships and dependencies difficult to trace without specialized tools.

Making Sense of RPG Complexity: The Data Lineage Solution

Given these unique challenges - multiple format styles, embedded business logic, and lost institutional knowledge - how do modern teams gain control over their RPG systems without risking business disruption? The answer lies in understanding what your systems actually do before attempting to change them.

Modern data lineage tools provide exactly this understanding by:

  • Analyzing all RPG variants within a single system, providing unified visibility across decades of development spanning RPG II through modern free-format code.
  • Mapping database relationships from database fields through program logic to output destinations, since RPG applications are inherently database-centric.
  • Discovering business rules by analyzing how data transforms as it moves through RPG programs, helping teams reverse-engineer undocumented logic.
  • Assessing impact before making changes, identifying all downstream dependencies - crucial given RPG's tight integration with business processes.
  • Planning modernization by understanding data flows, helping teams make informed decisions about which RPG components to modernize, replace, or retain.

The Bottom Line

RPG systems represent decades of business logic investment that often process a company's most critical transactions. While the language may seem archaic to modern eyes, the business logic it contains is frequently irreplaceable. Success in managing RPG systems requires treating them not as outdated code, but as repositories of critical business knowledge that need proper mapping and understanding.

Data lineage tools bridge the gap between RPG's unique characteristics and modern development practices, providing the visibility needed to safely maintain and enhance these systems, plan modernization initiatives, extract business rules, and ensure data integrity during system changes. They make these valuable systems maintainable and evolutionary rather than simply survivable.

Interested in preserving and understanding your RPG-based systems? Schedule a demo today.
