Gregory Shoup

Customer Success Analyst

Gregory Shoup is a Customer Success Analyst at Zengines, where he leads client onboarding and delivers ongoing training and professional services to help customers realize the full value of the platform. With a unique blend of technical and consulting experience, he enhances the customer experience.

Previously, Gregory was a Principal Consultant at ACA Group, managing GIPS compliance projects for leading asset managers. He also completed a 500-hour full-stack engineering fellowship at General Assembly, building applications with modern frameworks and APIs.

Gregory’s background in data analysis, compliance, and software development makes him a vital partner to Zengines clients navigating complex system migrations and data conversions.

Posts by this Author

Data lineage is the process of tracking data usage within your organization. This includes how data originates, how it is transformed, how it is calculated, its movement between different systems, and ultimately how it is utilized in applications, reporting, analysis, and decision-making. This is a crucial capability for any modern ecosystem, as the amount of data businesses generate and store increases every year. 

As of 2024, 64% of organizations manage at least one petabyte of data — and 41% have at least 500 petabytes of information within their systems. In many industries, like banking and insurance, this includes legacy data that spans not just systems but eras of technology.

As data volumes grow, so does the need to give the business trusted access to that data. That makes it important for companies to invest in data lineage initiatives that improve data governance, quality, and transparency. If you’re shopping for a data lineage tool, there are many cutting-edge options. The cloud-based Zengines platform uses an innovative artificial intelligence-powered model that includes data lineage capabilities to support clean, consistent, and well-organized data.

Whether you go with Zengines or something else, though, it’s important to be strategic in your decision-making. Here is a step-by-step process to help you choose the best data lineage tools for your organization’s needs.

Understanding Data Lineage Tool Requirements

Start by ensuring your selection team has a thorough understanding of not just data lineage as a concept but also the requirements that your particular data lineage tools must have.

First, consider core data lineage tool functionalities that every company needs. For example, you want to be able to access a clear visualization of the relationship between complex data across programs and systems at a glance. Impact analysis also provides a clear picture of how change will influence your current data system.
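To make impact analysis concrete, here is a minimal Python sketch. It assumes lineage has already been captured as a directed graph of "feeds into" relationships. The dataset names and the `downstream_impact` helper are hypothetical and do not come from any particular tool.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed as its value.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "core.accounts": ["staging.accounts"],
    "staging.customers": ["warehouse.customer_dim"],
    "staging.accounts": ["warehouse.balance_fact"],
    "warehouse.customer_dim": ["report.risk_summary"],
    "warehouse.balance_fact": ["report.risk_summary", "report.daily_balances"],
}


def downstream_impact(lineage, changed):
    """Return every dataset reachable from `changed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # also guards against cycles
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream_impact(LINEAGE, "core.accounts"))
# ['staging.accounts', 'warehouse.balance_fact', 'report.risk_summary', 'report.daily_balances']
```

A change to `core.accounts` touches both reports in this toy graph, which is the kind of answer an impact analysis view should surface at a glance.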

In addition, review technology-specific data lineage needs, such as the need to ingest legacy codebases like COBOL. Compliance and regulatory requirements vary from one industry to the next, and they change often. Make sure you’re aware of both business operations needs and what is expected of the business from a compliance and legal perspective.

Also, consider future growth. Can the tool you select support the data as you scale? Don’t hamstring momentum down the road by short-changing your data lineage capabilities in the present.

Key Features to Look for in Data Lineage Tools

When you begin to review specific data lineage tools, you want to know what features to prioritize. Here are six key areas to focus on:

  1. Automated metadata collection: Automation should be a feature throughout any data lineage tool at this point. However, the specific ability to automate the collection of metadata from internal solutions and data catalogs is critical to sustainable data lineage activity over time.
  2. Integration capabilities: Data lineage tools must be able to integrate comprehensively across every entity that stores data, whether past, present, or future. This is where a tool like Zengines shines, because accessing legacy data in legacy technology has been the #1 challenge for most organizations.
  3. End-to-end visibility: Data lineage must provide a clear picture of the respective data paths from beginning to end. This is a fundamental element of quality data lineage analysis.
  4. Impact analysis and research capabilities: Leading solutions make it easy for users to obtain and understand impact analysis for any data path or data change, showing data relationships and dependencies. Further, the ability to research seamlessly across data entities builds confidence in that analysis.
  5. Tracking and monitoring: Data lineage is an ongoing activity, so data lineage tools must be able to keep up with ongoing data changes within an organization.
  6. Visualization features: Visualizations - such as logic graphs - should provide comprehensive data paths across the data life cycle.

Keep these factors in mind and make sure whatever tool you choose satisfies these basic requirements.

Implementation and User Experience Considerations

Along with specific features, you want to assess how easy the tool is to implement and to use.

Start with setup. Consider how well each data lineage software solution is designed to be implemented within, and configured for, your system. If your business built technology solutions before the 1980s, you may have critical business operations that run on mainframes. Make sure a data lineage tool can integrate easily into a complex system before signing off on it.

Consider the learning curve and usability too. Does the tool have an intuitive interface? Are there complex training requirements? Are the tool’s information and functions easy to access?

Cost Analysis and ROI

When considering the cost of a data lineage software solution, there are a few factors to keep in mind. Here are the top elements that can influence expenses when implementing and using a tool like this over time:

  • Direct cost: Take the time to compare the up-front cost of each option. Recognize and account for differences in capability, e.g., the ability to integrate with legacy technology, which is necessary to generate a comprehensive end-to-end data lineage. The lowest-cost option is unlikely to include this capability.
  • Benefit analysis: Consider both quantitative benefits (e.g., time savings from automated data lineage versus manual data lineage search) and qualitative benefits (e.g., cost avoidance for compliance and regulatory fines).
  • Total cost of ownership (TCO): Consider the big picture with your investment. Are there licensing and subscription fees? Implementation costs? Ongoing maintenance and support expenses? These should be clarified before making a decision.
  • Expected return on investment: Your ROI will depend on things like speed of implementation, reduced costs, and accelerated digital transformations. Automated AI/ML solutions, like those that power Zengines, can provide meaningful benefit to TCO over time and should be factored into the cost analysis.

Make sure to consider costs, benefits, TCO and ROI when assessing your options.
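If it helps to see the arithmetic, here is a small Python sketch of a TCO and ROI comparison. All figures are hypothetical placeholders, not vendor pricing, and the two helper functions are illustrative names chosen for this example.

```python
def total_cost_of_ownership(license_per_year, implementation, support_per_year, years):
    """One-time implementation cost plus recurring license and support fees."""
    return implementation + years * (license_per_year + support_per_year)


def roi(total_benefit, total_cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    return (total_benefit - total_cost) / total_cost


# Hypothetical three-year scenario.
tco = total_cost_of_ownership(
    license_per_year=50_000, implementation=30_000, support_per_year=10_000, years=3
)
benefit = 3 * 100_000  # assumed yearly value of analyst time saved

print(tco)                          # 210000
print(round(roi(benefit, tco), 3))  # 0.429
```

Running the same calculation for each shortlisted tool, with your own numbers, keeps the comparison consistent across vendors.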

The Zengines Advantage

If you’re looking for a comprehensive assessment of what makes the Zengines platform stand out from other data lineage solutions, here it is in a nutshell:

  • Zengines Mainframe Data Lineage offers a unique approach to data lineage with its robust ability to ingest and parse COBOL programs, so businesses can have data lineage that includes legacy technology. Zengines de-risks the mainframe “black box” and enables businesses to manage, modernize, or migrate mainframes better and more quickly.
  • Zengines comes backed by a wide range of software companies, businesses, consulting firms, and other enterprises that have successfully used our software solutions.
  • Our data lineage is part of our larger frictionless data conversion and integration solutions designed to speed up the notoriously slow process of proper data management. We use AI/ML to automate and accelerate the process, from implementation through ongoing use, helping your data work for you rather than get in the way of your core functions.
  • Our ZKG (Zengines Knowledge Graph) is a proprietary, industry-specific database that is always growing and providing more detailed information for our algorithms.

Our automated solutions reduce friction and risk, lower costs, and make data lineage faster and more accessible.

Making the Final Decision

As you assess your data lineage tool choices, keep the above factors in mind. What are your industry and organizational requirements? Focus on key features like automation and integration capabilities. Consider implementation, training, user experience, ROI, and comprehensive cost analyses. 

Use this framework to help create stakeholder buy-in for your strategy. Then, select your tool with confidence, knowing you are organizing your data’s past to improve your present and lay the groundwork for a more successful future.

If you have any follow-up questions about data lineage and what makes a software solution particularly effective and relevant in this field, our team at Zengines can help. Reach out for a consultation, and together, we can explore how to create a clean, transparent, and effective future for your data.

In today's increasingly regulated financial landscape, banks and financial institutions face mounting pressure to ensure complete visibility and traceability of their Critical Data Elements (CDEs). While regulatory frameworks like BCBS 239, CDD, and CIP establish clear requirements for data governance, many organizations struggle with implementation, particularly when critical information resides within decades-old mainframe systems.

These legacy environments have become the Achilles' heel of compliance efforts, with opaque data flows and hard-to-decipher COBOL code creating significant blind spots. The Zengines Mainframe Data Lineage product offers a revolutionary solution to this challenge, providing unparalleled visibility into "black box" systems and transforming regulatory compliance from a time-consuming burden into an efficient, streamlined process.

The Regulatory Challenge of Critical Data Elements

For banks and financial services firms, managing Critical Data Elements (CDEs) is no longer optional - it's a fundamental regulatory requirement with significant implications for compliance, risk management, and operational integrity. Regulations like BCBS 239, the Customer Due Diligence (CDD) Rule, and the Customer Identification Program (CIP) mandate that financial institutions not only identify their critical data but also understand its origins, transformations, and dependencies across all systems.

However, for institutions with legacy mainframe systems, this presents a unique challenge. These "black box" environments, often powered by decades-old COBOL code spread across thousands of modules, make tracing data lineage a time-consuming and error-prone process. Without the right tools, financial institutions face substantial risks, including regulatory penalties, audit failures, and compromised decision-making.

"Financial institutions today are trapped between regulatory demands for data transparency and legacy systems that were never designed with this level of visibility in mind. At Zengines, we've created Mainframe Data Lineage to bridge this gap, turning black box mainframes into transparent, auditable systems that satisfy even the most stringent CDE requirements." - Caitlyn Truong, CEO, Zengines

The Hidden Compliance Challenge in Legacy Systems

Many financial institutions operate with legacy mainframe technology that can contain up to 80,000 different COBOL modules, each potentially containing thousands of lines of code. This complexity creates several critical challenges for CDE compliance:

  1. Opacity of Data Origins: When regulators ask "Where did this value come from?", companies struggle to provide clear, documented answers from within mainframe systems.
  2. Calculation Verification: Understanding how critical values like interest accruals, risk assessments, or customer identification data are calculated becomes nearly impossible without specialized tools.
  3. Conditional Logic Tracing: Determining why specific data paths were followed or how specific business rules are implemented requires manually tracing through complex code branches.
  4. Resource Scarcity: Limited availability of mainframe or COBOL experts makes compliance activities dependent on a shrinking pool of specialized talent.
  5. Documentation Gaps: Years of system changes with inconsistent documentation practices have left critical knowledge gaps about data elements and their transformations.

"The challenge with mainframe environments isn't that the data isn't there—it's that it's buried in thousands of COBOL modules and complex code paths that would take months to manually trace. Zengines automates this process, reducing what would be weeks of research into minutes of interactive exploration." - Caitlyn Truong, CEO, Zengines

Introducing Zengines Mainframe Data Lineage

The Zengines Mainframe Data Lineage product is purpose-built to solve compliance challenges like these by bringing transparency to legacy systems. By automatically analyzing and visualizing mainframe data flows, it enables financial institutions to meet regulatory requirements without the traditional manual effort.

How Zengines Transforms CDE Compliance

1. Automated Data Traceability

Zengines ingests COBOL modules, JCL code, SQL, and other mainframe components to automatically map relationships between data elements across your entire mainframe environment. This comprehensive approach ensures that no critical data element remains untraced.
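As a rough illustration of the idea (not a description of how Zengines works internally, which this post does not cover), the Python sketch below pulls field-to-field relationships out of a few simple COBOL statements. The COBOL fragment, the regular expressions, and the `extract_edges` helper are all assumptions made for this example. Real programs need a full parser that handles copybooks, PERFORM flow, and JCL.

```python
import re

# Hypothetical COBOL fragment; only simple one-line MOVE and COMPUTE are handled.
COBOL_SOURCE = """
    MOVE CUST-BALANCE TO WS-BALANCE.
    COMPUTE WS-INTEREST = WS-BALANCE * WS-RATE / 100.
    MOVE WS-INTEREST TO RPT-INTEREST.
"""

NAME = r"[A-Z][A-Z0-9-]*"
MOVE_RE = re.compile(rf"MOVE\s+({NAME})\s+TO\s+({NAME})\s*\.")
COMPUTE_RE = re.compile(rf"COMPUTE\s+({NAME})\s*=\s*(.+?)\s*\.")
NAME_RE = re.compile(NAME)


def extract_edges(source):
    """Return (source_field, target_field) pairs in statement order."""
    edges = []
    for line in source.splitlines():
        statement = line.strip()
        move = MOVE_RE.fullmatch(statement)
        if move:
            edges.append((move.group(1), move.group(2)))
            continue
        compute = COMPUTE_RE.fullmatch(statement)
        if compute:
            target, expression = compute.groups()
            # Every named operand in the expression feeds the target field.
            for operand in NAME_RE.findall(expression):
                edges.append((operand, target))
    return edges


for src, tgt in extract_edges(COBOL_SOURCE):
    print(f"{src} -> {tgt}")
```

Even this toy version shows why automation matters: each statement yields one or more lineage edges, and a real estate of thousands of modules produces far too many to trace by hand.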

2. Visual Data Lineage

Instead of manually tracing through thousands of lines of code, Zengines provides interactive visualizations that instantly show:

  • Where data originates
  • How it transforms through calculations
  • Which conditions affect its processing
  • Where it ultimately flows

This visualization capability is particularly valuable during regulatory examinations, allowing institutions to demonstrate compliance with confidence and clarity.

3. Calculation Logic Transparency

For BCBS 239 compliance, institutions must understand and validate calculation methodologies for risk data aggregation. Zengines automatically extracts and presents calculation logic in human-readable format, making it simple to verify that risk metrics are computed correctly.

4. Branch Condition Analysis

When regulators question why certain customer records received specific treatment (critical for CDD and CIP compliance), Zengines can immediately identify the conditional logic that determined the data path, showing exactly which business rules were applied and why.
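To show what "identifying the conditional logic that determined the data path" can look like, here is a minimal Python sketch. It assumes business rules can be expressed as an ordered list of labeled conditions. The rules, thresholds, country codes, and the `explain_treatment` helper are hypothetical, and the sketch is not drawn from any regulation or from the Zengines product.

```python
# Hypothetical business rules, checked in order; the first match decides the treatment.
RULES = [
    ("missing government ID", lambda r: not r.get("gov_id"), "reject"),
    ("high-risk country", lambda r: r.get("country") in {"XX", "YY"}, "enhanced_review"),
    ("balance over 10,000", lambda r: r.get("balance", 0) > 10_000, "enhanced_review"),
]


def explain_treatment(record):
    """Return (treatment, reason) for the first matching rule, else the default path."""
    for reason, predicate, treatment in RULES:
        if predicate(record):
            return treatment, reason
    return "standard", "no rule matched"


print(explain_treatment({"gov_id": "A123", "country": "US", "balance": 25_000}))
# ('enhanced_review', 'balance over 10,000')
```

Returning the reason alongside the treatment is the design point: the answer to "why did this record go down this path?" is produced together with the decision instead of being reconstructed afterward.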

5. Comprehensive Module Statistics

Zengines provides detailed metrics about your mainframe environment, helping compliance teams understand the scope and complexity of systems containing critical data elements.

"When regulators ask where a critical value came from or how it was calculated, financial institutions shouldn't have to launch a massive investigation. With Zengines Mainframe Data Lineage, they can answer these questions confidently and immediately, transforming their compliance posture from reactive to proactive." - Caitlyn Truong, CEO, Zengines

Real-World Impact: Accelerating Compliance Activities

Financial institutions using Zengines Mainframe Data Lineage have experienced transformative results in their regulatory compliance activities:

  • 90% Reduction in Audit Response Time: Questions about data calculations that previously took weeks or months to research can now be answered in minutes.
  • Enhanced Confidence in Regulatory Reporting: With the ability to see, follow, and explain data origins and transformations, institutions can ensure the accuracy of regulatory reports.
  • Reduced Dependency on Specialized Resources: Business analysts can now answer many compliance questions without requiring mainframe expertise.
  • Improved Risk Management: Comprehensive visibility into how critical risk metrics are calculated enables better oversight and governance.
  • Future-Proofed Compliance: As regulations evolve, having comprehensive data lineage documentation ensures adaptability to new requirements.

Beyond Compliance: Strategic Benefits

While regulatory compliance drives initial adoption, financial institutions discover additional strategic benefits from implementing Zengines Mainframe Data Lineage:

  1. System Modernization Support: The detailed understanding of data flows facilitates safer, faster, and more accurate modernization of legacy systems, including requirements gathering, new development, data migration, data testing, and reconciliation.
  2. Operational Efficiency: Rapid identification of data dependencies reduces development time for system changes.
  3. Risk Reduction: Comprehensive visibility into mainframe operations reduces operational risk associated with mainframe management and changes.
  4. Knowledge Preservation: As mainframe experts retire, their implicit knowledge becomes explicitly documented through Zengines.

"What we've discovered working with financial services firms is that CDE compliance isn't just about satisfying regulators—it's about fundamentally understanding your own critical data. Our Mainframe Data Lineage solution doesn't just help banks pass audits; it gives them unprecedented insight into their own operations." - Caitlyn Truong, CEO, Zengines

Getting Started with Zengines

For financial institutions struggling with CDE compliance across legacy systems, Zengines offers a proven path forward. The implementation process is designed to be non-disruptive, with no modifications required to your existing mainframe environment.

The journey to compliance begins with a simple assessment of your current mainframe landscape, followed by automated ingestion of your code base. Within days, you'll have unprecedented visibility into your critical data elements – transforming your compliance posture from reactive to proactive.

In today's regulatory environment, financial institutions can no longer afford the uncertainty and risk associated with "black box" mainframe systems. Zengines Mainframe Data Lineage brings the transparency and traceability required not just to satisfy regulators, but to operate with confidence in an increasingly data-driven industry.

In today's rapidly evolving technology landscape, organizations with legacy mainframe systems face increasing pressure to modernize. Whether driven by cost concerns, skills shortages, or the need for greater agility, mainframe modernization has become a strategic imperative.

However, there's no one-size-fits-all approach. Let's explore the various paths to modernization and how platforms like Zengines can help you navigate this complex journey.

The Mainframe Modernization Spectrum

Organizations approach mainframe modernization in several ways:

1. Rehosting (Lift and Shift)

What it is: Moving mainframe applications to new hardware with minimal code changes, often to cloud infrastructure.

Pros:

  • Lowest risk approach with fastest implementation
  • Minimal disruption to business operations
  • Preserves existing business logic
  • Reduces hardware costs

Cons:

  • Maintains legacy code limitations
  • Doesn't address technical debt
  • Limited innovation opportunity
  • Skills gap remains for legacy languages

2. Replatforming

What it is: Migrating applications to a new platform while making moderate modifications to the code.

Pros:

  • Reduces hardware and licensing costs
  • Maintains most business logic
  • Less risky than complete rewrites
  • Can improve performance and scalability

Cons:

  • Requires code modifications
  • Higher risk than rehosting
  • Limited modernization benefits
  • May still rely on legacy languages

3. Code Translation/Automated Conversion

What it is: Automatically converting legacy code (like COBOL) to modern languages like Java or C#.

Pros:

  • Faster than manual rewrites
  • Preserves existing business logic
  • Reduces reliance on legacy skills
  • Can be implemented gradually

Cons:

  • Converted code may not be optimal
  • Quality issues with automated conversion
  • Often requires significant post-conversion cleanup
  • Not all code converts cleanly

4. Refactoring

What it is: Restructuring existing code without changing external behavior.

Pros:

  • Improves code quality and maintainability
  • Preserves tested business logic
  • Incremental approach reduces risk
  • Addresses specific pain points

Cons:

  • Labor-intensive process
  • Requires deep understanding of existing code
  • Limited modernization benefits
  • May still use legacy technologies
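For readers less familiar with refactoring, here is a small Python sketch of the idea. Python stands in for a legacy language here, and the fee rules and function names are invented for illustration. The point is that the structure changes while the outputs do not.

```python
def late_fee_before(days_late, balance):
    # Original shape: nested conditionals with repeated arithmetic.
    if days_late > 0:
        if days_late <= 30:
            return round(balance * 0.01, 2)
        else:
            if days_late <= 60:
                return round(balance * 0.02, 2)
            else:
                return round(balance * 0.05, 2)
    else:
        return 0.0


# Refactored shape: the same rules as a lookup table of (max_days, rate).
FEE_TIERS = [(30, 0.01), (60, 0.02), (float("inf"), 0.05)]


def late_fee_after(days_late, balance):
    if days_late <= 0:
        return 0.0
    rate = next(tier_rate for max_days, tier_rate in FEE_TIERS if days_late <= max_days)
    return round(balance * rate, 2)


# External behavior is unchanged across a range of inputs.
for days in range(-5, 120):
    assert late_fee_before(days, 1234.56) == late_fee_after(days, 1234.56)
```

The final loop is the safety net that makes refactoring low risk: the old and new versions are compared on the same inputs before the old one is retired.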

5. Complete Rewrite/Rebuilding

What it is: Redeveloping applications from scratch using modern languages and architectures.

Pros:

  • Modern architecture and technologies
  • Opportunity to improve functionality
  • Eliminates technical debt
  • Better long-term maintainability

Cons:

  • Highest risk approach
  • Time-consuming and expensive
  • Risk of losing critical business logic
  • Extensive testing required

6. Replacing with Commercial Software

What it is: Abandoning legacy applications for newer commercial off-the-shelf solutions.

Pros:

  • Faster implementation than custom rewrites
  • Vendor maintains and updates software
  • Modern features and interfaces
  • Reduced internal maintenance burden

Cons:

  • May require business process changes
  • Customization limitations
  • Vendor dependency
  • Potential functionality gaps

The Critical Role of Data in Modernization

Data migration remains "the highest risk during any systems change" according to industry experts. Organizations face numerous challenges including:

  • Unpredictable data values and formats
  • Incomplete documentation
  • Unknown data quality and sources
  • Legacy technology constraints
  • Scarce skilled resources

Before embarking on any modernization journey, organizations need to understand their current systems deeply. This process becomes particularly challenging with legacy mainframes that have been operating for decades with limited documentation and dwindling expertise.
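One simple way to start understanding unpredictable values and formats is to profile them. The Python sketch below reduces each value to a format pattern and counts how often each pattern appears. The sample rows, the `open_date` column, and the helper names are hypothetical.

```python
import re
from collections import Counter


def shape(value):
    """Reduce a value to a format pattern: digits become 9, letters become A."""
    if value is None or value == "":
        return "<empty>"
    digits_masked = re.sub(r"\d", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", digits_masked)


def profile(rows, column):
    """Count how often each format pattern occurs in one column."""
    return Counter(shape(row.get(column)) for row in rows)


# Hypothetical legacy extract with inconsistent date formats.
rows = [
    {"open_date": "2021-03-15"},
    {"open_date": "03/15/2021"},
    {"open_date": "2021-04-01"},
    {"open_date": ""},
    {"open_date": "15MAR21"},
]

for pattern, count in profile(rows, "open_date").most_common():
    print(pattern, count)
```

In this toy extract a single date column turns out to hold three different formats plus an empty value, which is exactly the kind of surprise that is cheaper to find before migration than during it.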

How Zengines Transforms the Modernization Process

Zengines tackles the two most critical aspects of mainframe modernization:

1. Mainframe Data Lineage Understanding

Zengines' Mainframe Data Lineage technology illuminates the "black box" of legacy systems by:

  • Providing interactive research into applications with graphical visualizations
  • Showing the relationships between modules, tables, fields, and variables
  • Analyzing calculation logic and conditional statements
  • Revealing data paths, sources, and transformations

This deep visibility allows organizations to understand how their current systems work before they attempt migration, preventing costly errors and unexpected outcomes.

2. AI-Powered Data Migration Tools

Zengines accelerates data migration through:

  • AI algorithms that analyze data scope, schemas, and inputs
  • Automated mapping predictions that save countless hours of manual work
  • Intelligent data transformation with natural language interaction
  • Comprehensive data quality identification and fixing
  • Streamlined testing and reconciliation

Organizations using Zengines can complete data migration tasks in minutes rather than months, dramatically reducing the time, cost, and risk associated with modernization projects.
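To give a feel for what a mapping prediction is, here is a deliberately simple Python sketch that proposes target fields based on name similarity, using the standard library's `difflib`. This is a stand-in technique chosen for illustration. It is not the Zengines algorithm, and the field names, the 0.5 threshold, and the `suggest_mappings` helper are assumptions.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and drop separators so CUST-NAME and cust_name compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_mappings(source_fields, target_fields, threshold=0.5):
    """For each source field, propose the most similar target field, or None."""
    suggestions = {}
    for src in source_fields:
        scored = [
            (SequenceMatcher(None, normalize(src), normalize(tgt)).ratio(), tgt)
            for tgt in target_fields
        ]
        score, best = max(scored)
        suggestions[src] = best if score >= threshold else None
    return suggestions


source = ["CUST-NAME", "ACCT-BAL", "OPEN-DT", "FILLER-X"]
target = ["customer_name", "account_balance", "open_date", "branch_code"]

for src, tgt in suggest_mappings(source, target).items():
    print(f"{src} -> {tgt}")
```

A field with no convincing match is left as `None` so that a person reviews it, which mirrors how predicted mappings are typically confirmed or corrected by an analyst before any data moves.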

The Bottom Line

Mainframe modernization is a complex journey with multiple potential paths. The right approach depends on your organization's specific goals, timeline, budget, and risk tolerance.

What's universal, however, is the need to understand your legacy systems and data thoroughly before making changes. With Zengines, organizations gain both the deep visibility into their current mainframe operations and the powerful tools to migrate data efficiently and accurately.

By reducing the highest-risk aspects of modernization, Zengines helps organizations avoid becoming another cautionary tale of failed transformations and instead realize the full benefits of their technology investments.

Ready to learn more?

Connect with our team to learn more about how we’re supporting some of the largest and most complex mainframe modernizations today.