How to Choose the Best Data Lineage Tools for Your Organization's Needs

June 19, 2025
Gregory Shoup

Data lineage is the process of tracking how data is used within your organization: where it originates, how it is transformed and calculated, how it moves between systems, and ultimately how it is used in applications, reporting, analysis, and decision-making. This is a crucial capability for any modern data ecosystem, as the amount of data businesses generate and store increases every year.

As of 2024, 64% of organizations manage at least one petabyte of data — and 41% have at least 500 petabytes of information within their systems. In many industries, like banking and insurance, this includes legacy data that spans not just systems but eras of technology.

As data volume grows, so does the need to give the business trusted access to that data. That makes it important for companies to invest in data lineage initiatives that improve data governance, quality, and transparency. If you’re shopping for a data lineage tool, there are many cutting-edge options. The cloud-based Zengines platform uses an innovative artificial intelligence-powered model that includes data lineage capabilities to support clean, consistent, and well-organized data.

Whether you go with Zengines or something else, though, it’s important to be strategic in your decision-making. Here is a step-by-step process to help you choose the best data lineage tools for your organization’s needs.

Understanding Data Lineage Tool Requirements

Start by ensuring your selection team has a thorough understanding of not just data lineage as a concept but also the requirements that your particular data lineage tools must have.

First, consider core data lineage tool functionalities that every company needs. For example, you want to be able to access a clear visualization of the relationship between complex data across programs and systems at a glance. Impact analysis also provides a clear picture of how change will influence your current data system.

In addition, review technology-specific data-lineage needs, such as the need to ingest legacy codebases like COBOL. Compliance and regulatory requirements vary from one industry to the next, too. They also change often. Make sure you’re aware of both business operations needs and what is expected of the business from a compliance and legal perspective.

Also, consider future growth. Can the tool you select support the data as you scale? Don’t hamstring momentum down the road by short-changing your data lineage capabilities in the present.

Key Features to Look for in Data Lineage Tools

When you begin to review specific data lineage tools, you want to know what features to prioritize. Here are six key areas to focus on:

  1. Automated metadata collection: Automation should be a feature throughout any data lineage tool at this point. However, the specific ability to automate the collection of metadata from internal solutions and data catalogs is critical to sustainable data lineage activity over time.
  2. Integration capabilities: Data lineage tools must be able to integrate comprehensively across the entities that store data, whether past, present, or future. This is where a tool like Zengines shines, because accessing legacy data in legacy technology has been the #1 challenge for most organizations.
  3. End-to-end visibility: Data lineage must provide a clear picture of the respective data paths from beginning to end. This is a fundamental element of quality data lineage analysis.
  4. Impact analysis and research capabilities: Leading solutions make it easy for users to obtain and understand impact analysis for any data path or data change, showing data relationships and dependencies. Further, the ability to research seamlessly across data entities builds confidence in that analysis.
  5. Tracking and monitoring: Data lineage is an ongoing activity, so data lineage tools must be able to keep up with ongoing data change within an organization.
  6. Visualization features: Visualizations - such as logic graphs - should provide comprehensive data paths across the data life cycle.

Keep these factors in mind and make sure whatever tool you choose satisfies these basic requirements.
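
To make impact analysis concrete, here is a minimal sketch of the idea: lineage is treated as a directed graph, and the "impact" of a change is everything reachable downstream of the changed asset. The asset names and the `downstream_impact` helper are hypothetical and chosen only for illustration. They do not describe how Zengines or any other product implements this.

```python
from collections import deque

# Hypothetical lineage edges: each asset feeds the assets listed for it.
LINEAGE = {
    "mainframe.CUSTOMER_FILE": ["etl.load_customers"],
    "etl.load_customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.churn_dashboard", "api.customer_lookup"],
    "report.churn_dashboard": [],
    "api.customer_lookup": [],
}


def downstream_impact(graph, changed_asset):
    """Return every asset reachable from changed_asset, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([changed_asset])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream_impact(LINEAGE, "etl.load_customers")
print(impacted)
```

A real tool has to build this graph automatically from metadata and code, which is the hard part. Once the graph exists, the traversal itself is simple.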

Implementation and User Experience Considerations

Along with specific features, you want to assess how easy the tool is to implement and to use.

Start with setup. Consider how well each data lineage software solution is designed to be implemented within, and configured to, your system. If your business built technology solutions before the 1980s, you may have critical business operations that run on mainframes. Make sure a data lineage tool will be able to easily integrate into a complex system before signing off on it.

Consider the learning curve and usability too. Does the tool have an intuitive interface? Are there complex training requirements? Are the information and operations accessible?

Cost Analysis and ROI

When considering the cost of a data lineage software solution, there are a few factors to keep in mind. Here are the top elements that can influence expenses when implementing and using a tool like this over time:

  • Direct cost: Take the time to compare the up-front cost of each option. Recognize and account for differences in capability, such as the ability to integrate with legacy technology, which is necessary to generate a comprehensive end-to-end data lineage. The lowest-cost option is unlikely to include this capability.
  • Benefit analysis: Consider both quantitative benefits (e.g., time savings for automated data lineage versus manual data lineage search) and qualitative benefits (e.g., cost avoidance for compliance and regulatory fines).
  • Total cost of ownership (TCO): Consider the big picture with your investment. Are there licensing and subscription fees? Implementation costs? Ongoing maintenance and support expenses? These should be clarified before making a decision.
  • Expected return on investment: Your ROI will depend on things like speed of implementation, reduced costs, and accelerated digital transformations. Automated AI/ML solutions, like those that power Zengines, can meaningfully improve TCO over time and should be factored into the cost analysis.

Make sure to consider costs, benefits, TCO and ROI when assessing your options.
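
As a rough illustration of how these pieces fit together, the sketch below computes a simple multi-year TCO and ROI. All figures are invented placeholders, and the formulas are the common textbook definitions (TCO as one-time plus recurring costs, ROI as net benefit divided by cost), not a pricing model for any specific vendor.

```python
def total_cost_of_ownership(implementation, license_per_year, support_per_year, years):
    """One-time implementation cost plus recurring yearly costs."""
    return implementation + years * (license_per_year + support_per_year)


def simple_roi(total_benefit, total_cost):
    """Net benefit expressed as a fraction of total cost."""
    return (total_benefit - total_cost) / total_cost


# Placeholder figures for a three-year comparison (assumptions, not real prices).
tco = total_cost_of_ownership(
    implementation=50_000,
    license_per_year=100_000,
    support_per_year=20_000,
    years=3,
)
roi = simple_roi(total_benefit=615_000, total_cost=tco)

print(f"TCO: {tco}")
print(f"ROI: {roi:.0%}")
```

Running the same calculation for each shortlisted tool, with your own estimates, makes it easier to compare options whose up-front prices differ but whose long-term costs and benefits do not.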

The Zengines Advantage

If you’re looking for a comprehensive assessment of what makes the Zengines platform stand out from other data lineage solutions, here it is in a nutshell:

  • Zengines Mainframe Data Lineage offers a unique approach to data lineage with its robust ability to ingest and parse COBOL programs so that businesses can have data lineage inclusive of legacy technology. Zengines de-risks the mainframe “black box” and enables businesses to manage, modernize, or migrate mainframes better and more quickly.
  • Zengines comes backed by a wide range of software companies, businesses, consulting firms, and other enterprises that have successfully used our software solutions.
  • Our data lineage is part of our larger frictionless data conversion and integration solutions designed to speed up the notoriously slow process of proper data management. We use AI/ML to automate and accelerate the process, from implementation through ongoing use, helping your data work for you rather than get in the way of your core functions.
  • Our ZKG (Zengines Knowledge Graph) is a proprietary, industry-specific database that is always growing and providing more detailed information for our algorithms.

Our automated solutions remove friction and speed up the work, reducing risk, lowering costs, and making data lineage more accessible.

Making the Final Decision

As you assess your data lineage tool choices, keep the above factors in mind. What are your industry and organizational requirements? Focus on key features like automation and integration capabilities. Consider implementation, training, user experience, ROI, and comprehensive cost analyses. 

Use this framework to help create stakeholder buy-in for your strategy. Then, select your tool with confidence, knowing you are organizing your data’s past to improve your present and lay the groundwork for a more successful future.

If you have any follow-up questions about data lineage and what makes a software solution particularly effective and relevant in this field, our team at Zengines can help. Reach out for a consultation, and together, we can explore how to create a clean, transparent, and effective future for your data.

You may also like

RPG and Its Evolution: From Report Generator to Business Logic Powerhouse

IBM's RPG (Report Program Generator) began in 1959 with a simple mission: generate business reports quickly and efficiently. What started as RPG I evolved through multiple generations - RPG II, RPG III, RPG/400, and RPG IV (ILE RPG) - each adding capabilities that transformed it from a simple report tool into a full-featured business programming language. Today, RPG powers critical business applications across countless AS/400, iSeries, and IBM i systems. Yet for modern developers, understanding RPG's unique approach and legacy codebase presents distinct challenges that make comprehensive data lineage essential.

The Strengths That Made RPG Indispensable

Built-in Program Cycle: RPG's fixed-logic cycle automatically handled file operations, making database processing incredibly efficient. The cycle read records, processed them, and wrote output with minimal programmer intervention. Because it processed data sequentially, it was ideal for report generation and business data handling.

Native Database Integration: RPG was designed specifically for IBM's database systems, providing direct interaction with database files and making it ideal for transactional systems where fast and reliable data processing is essential. It offered native access to DB2/400 and its predecessors, with automatic record locking, journaling, and data integrity features.

Rapid Business Application Development: For its intended purpose - business reports and data processing - RPG was remarkably fast to code. The fixed-format specifications (H, F, D, C specs) provided a structured framework that enforced consistency and simplified application creation.

Exceptional Performance and Scalability: RPG applications typically ran with exceptional efficiency on IBM hardware, processing massive datasets with minimal resource consumption.

Evolutionary Compatibility: The language's evolution path meant that RPG II code could often run unchanged on modern IBM i systems - a testament to IBM's commitment to backward compatibility that spans over 50 years.

The Variations That Created Complexity

RPG II (Late 1960s): The classic fixed-format version with its distinctive column-specific coding rules and built-in program logic cycle, used on System/3, System/32, System/34, and System/36.

RPG III (1978): Added subroutines, improved file handling, and more flexible data structures while maintaining the core cycle approach. Introduced with System/38, later rebranded as "RPG/400" on AS/400.

RPGLE: Often mistaken for a separate edition, RPGLE is the source type name for ILE RPG (RPG IV) code. The "LE" comes from the Integrated Language Environment, and the names RPGLE, ILE RPG, and RPG IV are commonly used interchangeably.

RPG IV/ILE RPG (1994): The major evolution that introduced modular programming with procedures, prototypes, and the ability to create service programs within the Integrated Language Environment - finally bringing modern programming concepts to RPG.

Free-Format RPG (2013): Added within RPG IV, this broke away from the rigid column requirements while maintaining backward compatibility, allowing developers to write code similar to modern languages.

The Weaknesses That Challenge Today's Developers

Steep Learning Curve: RPG's fixed-logic cycle and column-specific formatting are unlike any modern programming language. New developers must understand both the language syntax and the underlying program cycle concept, which can be particularly challenging.

Limited Object-Oriented Capabilities: Even modern RPG versions lack full object-oriented programming capabilities, making it difficult to apply contemporary design patterns and architectural approaches.

Cryptic Operation Codes: Traditional RPG used operation codes like "CHAIN," "SETLL," and "READE" with rigid column requirements that aren't intuitive to developers trained in modern, free-format languages.

Complex Maintenance Due to Evolution: The evolution from RPG II (late 1960s) through RPG III (1978) to RPG IV/ILE RPG (1994) and finally free-format coding (2013) created hybrid codebases mixing multiple RPG styles across nearly 50 years of development, making maintenance and understanding complex for teams working across different generations of the language.

Proprietary IBM-Only Ecosystem: Unlike standardized languages, RPG has always been IBM's proprietary language, creating vendor lock-in and concentrating expertise among IBM specialists rather than fostering broader community development.

The Legacy Code Challenge: Why RPG Is Particularly Difficult Today

RPG presents unique challenges that go beyond typical legacy system issues, rooted in decades of development practices:

Multiple Format Styles in Single Systems: A single system might contain RPG II fixed-format code (1960s-70s), RPG III subroutines (1978+), and RPG IV/ILE procedures with free-format sections (1994+) - all working together but following different conventions and programming paradigms developed across 50+ years, making unified understanding extremely challenging.

Embedded Business Logic: RPG's tight integration with IBM databases means business rules are often embedded directly in database access operations and the program cycle itself, making them hard to identify, extract, and document independently.

Minimal Documentation Culture: The RPG community traditionally relied on the language's self-documenting nature and the assumption that the program cycle made logic obvious, but this assumption breaks down when dealing with complex business logic or when original developers are no longer available.

Proprietary Ecosystem Isolation: RPG development was largely isolated within IBM midrange systems, creating knowledge silos. Unlike languages with broader communities and extensive online resources, RPG expertise became concentrated among IBM specialists, limiting knowledge transfer.

External File Dependencies: RPG applications often depend on externally described files (DDS) where data structure definitions live outside the program code, making data relationships and dependencies difficult to trace without specialized tools.

Making Sense of RPG Complexity: The Data Lineage Solution

Given these unique challenges - multiple format styles, embedded business logic, and lost institutional knowledge - how do modern teams gain control over their RPG systems without risking business disruption? The answer lies in understanding what your systems actually do before attempting to change them.

Modern data lineage tools provide exactly this understanding by:

Analyzing all RPG variants within a single system, providing unified visibility across decades of development spanning RPG II through modern free-format code.

Mapping database relationships from database fields through program logic to output destinations, since RPG applications are inherently database-centric.

Discovering business rules by analyzing how data transforms as it moves through RPG programs, helping teams reverse-engineer undocumented logic.

Assessing impact before making changes, identifying all downstream dependencies - crucial given RPG's tight integration with business processes.

Planning modernization by understanding data flows, helping teams make informed decisions about which RPG components to modernize, replace, or retain.

The Bottom Line

RPG systems represent decades of business logic investment that often process a company's most critical transactions. While the language may seem archaic to modern eyes, the business logic it contains is frequently irreplaceable. Success in managing RPG systems requires treating them not as outdated code, but as repositories of critical business knowledge that need proper mapping and understanding.

Data lineage tools bridge the gap between RPG's unique characteristics and modern development practices, providing the visibility needed to safely maintain, enhance, plan modernization initiatives, extract business rules, and ensure data integrity during system changes. They make these valuable systems maintainable and evolutionary rather than simply survivable.

Interested in preserving and understanding your RPG-based systems? Call Zengines for a demo today.

Understanding COBOL: The Backbone of Business Computing That Still Powers Our World

When the CODASYL committee designed COBOL (Common Business-Oriented Language) in 1959, drawing heavily on Grace Hopper's FLOW-MATIC, it created something revolutionary: a programming language that business people could actually read. Today, over 65 years later, COBOL still processes an estimated 95% of ATM transactions and 80% of in-person transactions worldwide. Yet for modern development teams, working with COBOL systems presents unique challenges that make data lineage tools absolutely critical.

The Strengths That Made COBOL Legendary

English-Like Readability: COBOL's English-like syntax is self-documenting and nearly self-explanatory, with an emphasis on verbosity and readability. Commands like MOVE CUSTOMER-NAME TO PRINT-LINE or IF ACCOUNT-BALANCE IS GREATER THAN ZERO made business logic transparent to non-programmers, setting it apart from more cryptic languages like FORTRAN. This was revolutionary - before COBOL, business logic looked like assembly language (L 5,CUSTNAME followed by ST 5,PRINTAREA) or early FORTRAN with mathematical notation that business managers couldn't decipher.

Precision Decimal Arithmetic: One of COBOL's biggest strengths is its strong support for large-precision fixed-point decimal calculations, a feature that many general-purpose programming languages do not provide natively. This capability helped set COBOL apart and drove its adoption by many large financial institutions. Fixed-point decimals avoid the binary floating-point rounding errors that are unacceptable in financial calculations.
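
The difference is easy to see in a language that defaults to binary floating point. The short Python sketch below contrasts `float` arithmetic with the standard-library `decimal` module, used here only as a stand-in for COBOL's decimal fields (for example, a field declared as PIC 9(5)V99). Python's `Decimal` is not identical to COBOL's fixed-point types, so treat this as an analogy rather than an exact equivalent.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly.
float_total = 0.10 + 0.20

# Decimal arithmetic keeps the values exactly as written.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True
```

Across millions of transactions, errors like the first one accumulate, which is why decimal arithmetic matters so much in banking and insurance systems.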

Proven Stability and Scale: COBOL's imperative, procedural and (in its newer iterations) object-oriented configuration serves as the foundation for more than 40% of all online banking systems, supports 80% of in-person credit card transactions, handles 95% of all ATM transactions, and powers systems that generate more than USD 3 billion of commerce each day.

The Weaknesses That Challenge Today’s Teams

Excessive Verbosity: COBOL uses over 300 reserved words compared to more succinct languages. What made COBOL readable also made it lengthy, often resulting in monolithic programs that are hard to comprehend as a whole, despite their local readability.

Poor Structured Programming Support: COBOL has been criticized for its poor support for structured programming. The language lacks modern programming concepts like comprehensive object orientation, dynamic memory allocation, and advanced data structures that developers expect today.

Rigid Architecture and Maintenance Issues: By 1984, maintainers of COBOL programs were struggling to deal with "incomprehensible" code, leading to major changes in COBOL-85 to help ease maintenance. The language's structure makes refactoring challenging, with changes cascading unpredictably through interconnected programs.

Limited Standard Library: COBOL lacks a large standard library, specifying only 43 statements, 87 functions, and just one class, limiting built-in functionality compared to modern languages.

Problematic Standardization Journey: While COBOL was standardized by ANSI in 1968, standardization was more aspirational than practical. By 2001, around 300 COBOL dialects had been created, and the 1974 standard's modular structure permitted 104,976 possible variants. COBOL-85 faced significant controversy and wasn't fully compatible with earlier versions, with the ANSI committee receiving over 2,200 mostly negative public responses. Vendor extensions continued to create portability challenges despite formal standards.

The Legacy Challenge: Why COBOL Is Hard to Master Today

The biggest challenge isn't the language itself - it's the development ecosystem and practices that evolved around it from the 1960s through 1990s:

Inconsistent Documentation Standards: Many COBOL systems were built when comprehensive documentation was considered optional rather than essential. Comments were sparse, and business logic was often embedded directly in code without adequate explanation of business context or decision rationale.

Absence of Modern Development Practices: Early COBOL development predated modern version control systems, code review processes, and structured testing methodologies. Understanding how a program evolved - or why specific changes were made - is often impossible without institutional knowledge.

Monolithic Architecture: COBOL applications were typically built as large, interconnected systems where data flows through multiple programs in ways that aren't immediately obvious, making impact analysis extremely difficult.

Proprietary Vendor Extensions: While COBOL had standards, each vendor added extensions and enhancements. IBM's COBOL differs from Unisys COBOL, creating vendor lock-in that complicates understanding and portability.

Lost Institutional Knowledge: The business analysts and programmers who built these systems often retired without transferring their institutional knowledge about why certain design decisions were made, leaving current teams to reverse-engineer business requirements from code.

Why Data Lineage Is Your COBOL Lifeline

This is where modern data lineage tools become invaluable for teams working with COBOL systems:

  • Automated Documentation: Lineage tools can map data flows across hundreds of COBOL programs, creating the documentation that was never written
  • Impact Analysis: Before making changes, teams can see exactly which programs, files, and downstream systems will be affected
  • Business Context: By tracing data from source to consumption, teams can understand the business purpose behind complex COBOL logic
  • Risk Reduction: Visual data flows help prevent the costly mistakes that come from modifying poorly understood legacy systems
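
To give a feel for what "mapping data flows" means at the smallest scale, here is a deliberately naive sketch that pulls source-to-target field pairs out of simple COBOL MOVE statements. The `extract_move_edges` function and the sample fields are hypothetical. A production lineage tool needs a real COBOL parser to handle copybooks, REDEFINES, multi-line statements, literals, MOVE CORRESPONDING, and multiple targets, none of which this regex attempts.

```python
import re

# Matches the simple form "MOVE <field> TO <field>" only.
MOVE_PATTERN = re.compile(
    r"\bMOVE\s+([A-Z0-9-]+)\s+TO\s+([A-Z0-9-]+)", re.IGNORECASE
)


def extract_move_edges(cobol_source):
    """Return (source_field, target_field) pairs for simple MOVE statements."""
    edges = []
    for line in cobol_source.splitlines():
        # In fixed-format COBOL, an asterisk in column 7 marks a comment line.
        if len(line) > 6 and line[6] == "*":
            continue
        for source, target in MOVE_PATTERN.findall(line):
            edges.append((source.upper(), target.upper()))
    return edges


SAMPLE = "\n".join([
    "      * MOVE OLD-NAME TO PRINT-LINE.  (commented out)",
    "           MOVE CUSTOMER-NAME TO PRINT-LINE.",
    "           MOVE ACCOUNT-BALANCE TO WS-BALANCE.",
])

for source, target in extract_move_edges(SAMPLE):
    print(f"{source} -> {target}")
```

Collected across hundreds of programs, edges like these become the graph that impact analysis and documentation are built on.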

The Bottom Line

COBOL's deep embedding in critical business processes represents a significant business challenge and risk that organizations must address. Success with COBOL modernization - whether maintaining, replacing, or transforming these systems - requires treating them as the complex, interconnected ecosystems they are. Data lineage tools provide the missing roadmap that makes COBOL systems understandable and manageable, enabling informed decisions about their future.

The next time you make an online payment, remember: there's probably COBOL code processing your transaction. And somewhere, a development team is using data lineage tools to keep that decades-old code running smoothly in our modern world.

To see and navigate your COBOL code in seconds, call Zengines.

The Biggest Mistakes in Mainframe Modernization

Mistake #1: Underestimating embedded complexity.

Mainframe systems combine complex data formats AND decades of embedded business rules that create a web of interdependent complexity. VSAM files aren't simple databases - they contain redefinitions, multi-view records, and conditional logic that determines data values based on business states. COBOL programs embed business intelligence like customer-type based calculations, regulatory compliance rules, and transaction processing logic that's often undocumented. Teams treating mainframe data like standard files discover painful surprises during migration when they realize the "data" includes decades of business logic scattered throughout conditional statements and 88-level condition names. This complexity extends to testing: converting COBOL business rules and EBCDIC data formats demands extensive validation that most distributed-system testers can't handle without deep mainframe expertise.
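
A small example shows why mainframe records cannot be read like ordinary text files. The sketch below decodes a hypothetical 13-byte record made of a 10-character EBCDIC name followed by a packed-decimal (COMP-3) balance with two implied decimal places. The record layout and the `unpack_comp3` helper are assumptions made for illustration. In practice the layout comes from the COBOL copybook, and a REDEFINES clause may give the same bytes several valid interpretations.

```python
from decimal import Decimal


def unpack_comp3(raw, scale=0):
    """Decode packed-decimal (COMP-3) bytes into a Decimal.

    Each byte holds two decimal digits, except the last byte, which holds
    one digit and a sign nibble (0xD means negative).
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    last = raw[-1]
    digits.append(last >> 4)
    sign_nibble = last & 0x0F
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:
        value = -value
    # Shift the implied decimal point left by `scale` places.
    return Decimal(value).scaleb(-scale)


# Hypothetical record: 10 bytes of EBCDIC text, then 3 bytes of COMP-3.
record = "JANE DOE  ".encode("cp037") + bytes([0x12, 0x34, 0x5D])

name = record[:10].decode("cp037").rstrip()
balance = unpack_comp3(record[10:13], scale=2)

print(name, balance)
```

Opening the same bytes as ASCII or UTF-8 text would produce unreadable characters for the name and nothing usable for the balance, which is the kind of surprise teams hit when they treat mainframe data like standard files.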

Mistake #2: Delaying dependency discovery.

Mainframes feed dozens of systems through complex webs of middleware like WebSphere, CICS Transaction Gateway, Enterprise Service Bus, plus shared utilities, schedulers, and business processes. The costly mistake is waiting too long to thoroughly map all these connections, especially downstream data feeds and consumption patterns. Your data lineage must capture every system consuming mainframe data, from reporting tools to partner integrations. Otherwise, teams discover late in development that preserving these data feeds and business process expectations requires extensive rework that wasn't budgeted or planned, and the project can't go live.

Mistake #3: Tolerating knowledge bottlenecks.

Relying on two or three mainframe experts for a million-line modernization project creates a devastating traffic jam where entire teams sit idle waiting for answers. Around 60% of mainframe specialists are approaching retirement, yet organizations attempt massive COBOL conversions with skeleton crews already stretched thin by daily operations. Your expensive development team, cloud architects, and business analysts become inefficient and underutilized because everything funnels through the same overworked experts. The business logic embedded in decades-old COBOL programs often exists nowhere else, creating dangerous single points of failure that can derail years of investment and waste millions in team resources.

Mistake #4: Modernizing everything indiscriminately.

Organizations waste enormous effort converting obsolete, duplicate, and inefficient code that should be retired or consolidated instead. Mainframe systems often contain massive amounts of redundant code - programs copied by developers who didn't understand dependencies, inefficient routines that were never optimized, and abandoned utilities that no longer serve any purpose. Research shows that 80% of legacy code hasn't been modified in over 5 years, yet teams spend months refactoring dead applications and duplicate logic that add no business value. The mistake is treating all millions of lines of code equally rather than analyzing which programs actually deliver business functionality. Proper assessment identifies code for retirement, consolidation, or optimization before expensive conversion, dramatically reducing modernization scope and cost.
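
One simple way to start that assessment is to ask which programs are never reached from any known entry point. The sketch below does this over a hypothetical call graph. The program names and the `unreachable_programs` helper are invented for illustration. Dynamic calls, JCL, and scheduler definitions can hide real usage, so the output should be read as candidates for review, not as a retirement list.

```python
def unreachable_programs(calls, entry_points):
    """Return programs in `calls` that no entry point reaches, sorted by name."""
    reachable = set()
    stack = list(entry_points)
    while stack:
        program = stack.pop()
        if program in reachable:
            continue
        reachable.add(program)
        stack.extend(calls.get(program, []))
    return sorted(set(calls) - reachable)


# Hypothetical call graph: each program maps to the programs it calls.
CALLS = {
    "NIGHTLY-BATCH": ["CALC-INTEREST", "PRINT-STMT"],
    "CALC-INTEREST": ["ROUND-UTIL"],
    "PRINT-STMT": [],
    "ROUND-UTIL": [],
    "OLD-TAX-1998": ["ROUND-UTIL"],
    "CALC-INTEREST-COPY": [],
}

candidates = unreachable_programs(CALLS, entry_points=["NIGHTLY-BATCH"])
print(candidates)
```

Here the two programs that nothing calls surface immediately, which is exactly the kind of scope reduction worth confirming with the business before any conversion work begins.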

Mistake #5: Starting without clear business objectives.

Many modernization projects fail because organizations begin with technology solutions rather than business outcomes. Teams focus on "moving to the cloud" or "getting off COBOL" without defining what success looks like in business terms. According to research, 80% of IT modernization efforts fall short of savings targets because they fail to address the right complexity. The costly mistake is launching modernization without stakeholder alignment on specific goals - whether that's reducing operational costs, reducing risk in business continuity, or enabling new capabilities. Projects that start with clear business cases and measurable objectives have significantly higher success rates and can demonstrate ROI that funds subsequent modernization phases.

If you want to avoid these mistakes or need help overcoming these challenges, reach out to Zengines.
