Articles

Three Keys to Successful Mainframe Refactoring: A Practical Guide

August 14, 2025

While 96% of companies are moving mainframe workloads to the cloud, 74% of modernization projects fail, so organizations need a systematic approach to refactoring legacy systems. The difference between success and failure lies in addressing three critical challenges: dependency visibility, testing optimization, and knowledge democratization.

The Hidden Challenge

Mainframe systems built over decades contain intricate webs of dependencies that resist modernization, but the complexity runs deeper than most organizations realize. Unlike modern applications designed with clear interfaces, documentation standards and plentiful knowledge resources, legacy systems embed business logic within data relationships, file structures, and program interactions that create three critical failure points during mainframe refactoring:

Hidden Dependencies: Runtime data flows and dynamic relationships that static analysis cannot reveal, buried in millions of lines of code across interconnected systems.

Invisible Testing Gaps: Traditional validation approaches fail to catch the complex data transformations and business logic embedded in mainframe applications, leaving critical edge cases undiscovered until production.

Institutional Knowledge Scarcity: The deep understanding needed to navigate these invisible complexities exists only in the minds of departing veterans.

Any one of these challenges can derail a refactoring project. Combined, they create a perfect storm that explains why 74% of modernization efforts fail. Success requires ensuring this critical information is available throughout the refactoring effort, not left to chance or discovery during code transformation.

Key 1: Master Data Dependencies Before Code Conversion

The Problem: Complex runtime data flows and dynamic dependencies create invisible relationships that static analysis cannot reveal. These relationships span program execution flows, database navigation patterns, and runtime behaviors.

Implementation Checklist

□ Trace Data Element Journeys Across All Systems

Identify program actions that read, modify, or depend on specific data structures
Map cross-application data sharing through job control language (JCL) and program execution sequences

□ Understand Database and Program Execution Patterns

Analyze JCL/CL job flows to understand program dependencies and execution order
Map hierarchical (IMS) and network (IDMS) database structures and navigation paths
Identify data-driven business logic that changes based on content and processing context

□ Access Hidden Business Rules

Identify validation logic embedded in program execution sequences
Discover error handling routines that function as business rules
Uncover edge cases handled through decades of modifications

□ Generate Impact Analysis

Visualize effects of modifying specific programs or data structures
Understand downstream impacts from changing data formats or program execution flows
Access comprehensive decomposition analysis for monolithic applications
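
As a rough illustration of the checklist above, the sketch below reads a small JCL fragment, records which programs read and write which datasets, and then lists everything downstream of a chosen dataset. It is a minimal sketch, not how any particular product works: the job, program, and dataset names are invented, and treating DISP=NEW or MOD as a write and every other disposition as a read is a deliberate simplification of real JCL semantics.

```python
import re
from collections import defaultdict, deque

# Hypothetical JCL fragment; every program and dataset name is invented.
SAMPLE_JCL = """\
//STEP010  EXEC PGM=CUSTLOAD
//INFILE   DD DSN=PROD.CUST.RAW,DISP=SHR
//OUTFILE  DD DSN=PROD.CUST.CLEAN,DISP=(NEW,CATLG,DELETE)
//STEP020  EXEC PGM=CUSTRATE
//INFILE   DD DSN=PROD.CUST.CLEAN,DISP=SHR
//OUTFILE  DD DSN=PROD.CUST.RATED,DISP=(NEW,CATLG,DELETE)
//STEP030  EXEC PGM=RPTBUILD
//INFILE   DD DSN=PROD.CUST.RATED,DISP=SHR
"""

EXEC_RE = re.compile(r"^//\S+\s+EXEC\s+PGM=(\w+)")
DD_RE = re.compile(r"^//\S+\s+DD\s+DSN=([\w.]+),DISP=(\(?\w+)")


def build_graph(jcl):
    """Return a mapping of node -> set of nodes that directly depend on it."""
    edges = defaultdict(set)
    program = None
    for line in jcl.splitlines():
        if m := EXEC_RE.match(line):
            program = m.group(1)  # start of a new job step
        elif (m := DD_RE.match(line)) and program:
            dataset, disp = m.group(1), m.group(2).lstrip("(")
            if disp in ("NEW", "MOD"):
                edges[program].add(dataset)  # assumed: the step writes this dataset
            else:
                edges[dataset].add(program)  # assumed: the step reads this dataset
    return edges


def downstream(edges, start):
    """Breadth-first walk: every program or dataset affected by changing `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = build_graph(SAMPLE_JCL)
print(sorted(downstream(edges, "PROD.CUST.CLEAN")))
# prints ['CUSTRATE', 'PROD.CUST.RATED', 'RPTBUILD']
```

A static pass like this only sees what the job stream declares. Dynamically allocated datasets, called subprograms, and database navigation would need additional inputs, which is the gap the checklist is pointing at.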

What It Looks Like in Real Life

Manual Approach: Teams spend months interviewing SMEs, reading through millions of lines of undocumented code, and creating spreadsheets to track data flows and job dependencies. The scale and complexity make it impossible to find all relationships—critical dependencies exist in JCL execution sequences, database navigation patterns, and runtime behaviors that are buried in decades of modifications. Even after extensive documentation efforts, teams miss interconnected dependencies that cause production failures.

With Zengines: Complete data lineage mapping across all systems in days. Interactive visualization shows exactly how customer data flows from the 1985 COBOL program through job control sequences, database structures, and multiple processing steps, including execution patterns and database behaviors that documentation never captured.

Success Metrics

Complete visibility into data flows, program dependencies, and execution patterns
Real-time access to comprehensive refactoring complexity analysis
Zero surprises during code conversion phase

Key 2: Implement Data Lineage-Driven Testing

The Problem: Traditional testing approaches fail to validate the complex data transformations and business logic embedded in mainframe applications. While comprehensive testing includes performance, security, and integration aspects, the critical foundation is ensuring data accuracy and transformation correctness.

Implementation Checklist

□ Establish Validation Points at Every Data Transformation

Identify test checkpoints at each step where data changes hands between programs
Monitor intermediate calculations and business rule applications
Track data transformation throughout the process

□ Generate Comprehensive Data-Driven Test Scenarios

Create test cases covering all conditional logic branches based on data content
Build transaction sequences that replicate actual data flow patterns
Include edge cases and error conditions that exercise unusual data processing paths

□ Enable Data-Focused Shadow Testing

Process test data through refactored systems alongside legacy systems
Compare data transformation results at every lineage checkpoint
Monitor data accuracy and consistency during parallel data processing

□ Validate Data Integrity at Scale

Test with comprehensive datasets to identify data accuracy issues
Monitor for cumulative calculation errors in long-running data processes
Verify data transformations produce identical results to legacy systems
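
The shadow-testing idea in the checklist above can be sketched as a checkpoint-by-checkpoint comparison. This is a minimal sketch under stated assumptions: the checkpoint names, the field name, and the idea that each run is captured as an ordered list of records are all hypothetical, and a real harness would compare many records per checkpoint.

```python
from decimal import Decimal


def first_divergence(legacy_run, refactored_run):
    """Find the first checkpoint and field where two runs disagree.

    Each run is an ordered list of (checkpoint_name, {field: value}) pairs.
    Returns (checkpoint, field, legacy_value, refactored_value), or None
    when every checkpoint matches exactly.
    """
    if len(legacy_run) != len(refactored_run):
        raise ValueError("runs have a different number of checkpoints")
    for (name_a, rec_a), (name_b, rec_b) in zip(legacy_run, refactored_run):
        if name_a != name_b:
            raise ValueError(f"checkpoint order differs: {name_a} vs {name_b}")
        for field in sorted(rec_a.keys() | rec_b.keys()):
            if rec_a.get(field) != rec_b.get(field):
                return name_a, field, rec_a.get(field), rec_b.get(field)
    return None


# Hypothetical captured values from a legacy run and a refactored run.
legacy = [
    ("after_validation", {"balance": Decimal("1000.00")}),
    ("after_interest", {"balance": Decimal("1004.17")}),
    ("after_fees", {"balance": Decimal("1001.67")}),
]
refactored = [
    ("after_validation", {"balance": Decimal("1000.00")}),
    ("after_interest", {"balance": Decimal("1004.16")}),
    ("after_fees", {"balance": Decimal("1001.66")}),
]

print(first_divergence(legacy, refactored))
```

Here the final balances differ by a cent, and the comparison reports that the difference first appears at the after_interest checkpoint rather than at the end of the flow. Decimal values are used so that the comparison itself does not introduce floating-point noise.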

What It Looks Like in Real Life

Manual Approach: Testing teams manually create hundreds of test cases, then spend weeks comparing data outputs from old and new systems. The sheer volume of data transformation points makes comprehensive coverage impractical—when data discrepancies appear across thousands of calculation steps, teams have no way to trace where in the complex multi-program data flow the difference occurred. Manual comparison of data transformations across interconnected legacy systems becomes impossible at scale.

With Zengines: Automated test generation creates thousands of data scenarios based on actual processing patterns. Self-service validation at every data transformation checkpoint pinpoints exactly where refactored logic produces different data results, down to the specific calculation or business rule application.

Success Metrics

Test coverage across all critical data transformation points
Validation of data accuracy and business logic correctness
Confidence in refactored data processing before cutover

Key 3: Democratize Institutional Knowledge

The Problem: Critical system knowledge exists only in the minds of retiring experts, creating bottlenecks that severely delay modernization projects.

Implementation Checklist

□ Access Comprehensive Data Relationship Mapping

Obtain complete visualization of how data flows between systems and programs
Understand business logic and transformation rules embedded in legacy code
Enable team members to explore system dependencies without expert consultation

□ Extract Business Context from Legacy Systems

Capture business rules and validation requirements from existing code
Link technical implementations to business processes and requirements
Create accessible knowledge bases with complete rule extraction

□ Enable Independent Impact Analysis

Provide capabilities to show downstream effects of proposed changes
Allow developers to trace data origins and dependencies during refactoring
Support business analysts in validating modernized logic

□ Eliminate SME Consultation Bottlenecks

Provide role-based access to comprehensive system analysis
Enable real-time exploration of data flows and business rules
Deliver complete context for development and testing teams

What It Looks Like in Real Life

Manual Approach: Junior developers submit tickets asking "What happens if I change this customer validation routine?" and wait two weeks for Frank, the resident mainframe expert, to review the code and explain the downstream impacts. The interconnected nature of decades-old systems makes it impractical to document all relationships. Frank might remember 47 downstream systems but miss the obscure batch job that runs monthly. The breadth of institutional knowledge across millions of lines of code is impossible to capture manually, creating constant bottlenecks that slow project velocity to a crawl.

With Zengines: Any team member clicks on the validation routine and instantly sees its complete impact map—every consuming program, all data flows, and business rules. Questions get answered in seconds instead of weeks, keeping modernization projects on track.

Success Metrics

80% reduction in SME consultation requests
Independent access to system knowledge for all team members
Accelerated decision-making without knowledge transfer delays

Technology Enablers

Modern platforms like Zengines automate much of the dependency mapping, testing framework creation, and knowledge extraction.

Take Action

Successful mainframe refactoring demands more than code conversion expertise. Organizations that master data dependencies, implement lineage-driven testing, and democratize institutional knowledge create sustainable competitive advantages in their modernization efforts. The key is addressing these challenges systematically before beginning code transformation, not discovering them during production deployment.

Next Steps: Assess your current capabilities in each area and prioritize investments based on your specific modernization timeline and business requirements.

Key points from their discussion

Beyond pathway tracking: Traditional lineage tools show where data travels. Zengines Contextual Data Lineage ingests entire legacy codebases to reveal not just what happens to data, but why and how — the calculations, conditions, and business rules embedded in the code itself.
Answers in seconds, not months: Business analysts, data analysts, compliance teams, and technical staff get self-service answers to questions that previously required waiting on scarce subject matter experts.
Three use cases driving urgency: Meeting regulatory compliance requirements, de-risking modernization and transformation programs, and making legacy data AI-ready with the trust and traceability regulated institutions need.
The Finovate experience: Caitlyn shares how the Sherlock Holmes-themed demo brought "shining a light into the black box" to life on stage — and her advice for first-time demoers on using seven minutes to plant hooks that turn into real booth conversations.

Listen to the full episode

Watch the demo replay

Greg Shoup

June 25, 2026

Articles

An Open Letter to Bank Regulators on BCBS 239: The Right Rule, The Wrong Excuse

There is a rule that has been on the books for over a decade, and almost nobody outside of risk and compliance teams has ever heard of it: BCBS 239. It is not a catchy name. But the idea behind it is one of the more sensible things to come out of the post-2008 regulatory response: banks should be able to explain where their risk numbers come from.

Not approximate. Not eventually. Be able to trace a number back to its source, on demand, and show the path it took to get there.

That standard came into force for the world’s largest banks in January 2016. Almost ten years later, only a handful of the 31 global systemically important banks (G-SIBs) have reported full compliance. The ECB’s RDARR Guide, published in May 2024, named data lineage as one of seven priority areas still holding institutions back, and said it expects remediation work to continue through 2027.

I want to make the case that this isn’t a story about banks dragging their feet, or regulators failing to enforce something. It’s a story about a rule that was right, running into a technical wall that was real.

The wall was real

If you’ve spent time around a bank’s core systems, you already know what the wall looks like. Decades of COBOL or RPG, written and rewritten by people who retired years ago, running calculations that nobody currently on staff can fully explain. Ask a team to trace how a specific risk figure was derived, and the honest answer is often: we’d need a few months, and a few of our most senior mainframe engineers — who are also the people we can least afford to pull onto this.

That’s not a compliance excuse. It’s a real description of how these systems work. Logic gets buried inside modules that branch into other modules, which branch into more, written in a language most engineering schools stopped teaching in the 1990s.

So banks have been stuck between a standard they understand and largely agree with, and infrastructure that makes meeting it genuinely hard. Regulators have been patient about this — I think correctly — because the alternative, demanding visibility into systems that were close to a black box, wasn’t realistic.

What’s changed

I run a company called Zengines. We built technology specifically to deal with this wall: parsing legacy code at scale, tracing how data moves through mainframes and AS/400 applications, and surfacing the business logic that’s been buried inside them for decades — with the context needed to make it usable.

At one Fortune 100 financial institution, we’re currently working through hundreds of thousands of COBOL modules, some of them tens of thousands of lines deep, netting out to tens of millions of lines of code. Questions that used to take a mainframe specialist months to answer — tracing a variable by hand through branch after branch — can now be answered in seconds. An analyst can ask the system directly where a number came from, instead of opening a ticket and waiting. That same self-service access lets teams build their own understanding, and answer questions from regulators and transformation programs directly.

I’m not suggesting this solves everything BCBS 239 asks for. Governance, and the behavioral discipline of actually using data management tools once you have them — those still take sustained organizational effort, and always will.

But the specific claim that legacy mainframes are too opaque to document fully? That claim is no longer true, at least not in the way it used to be.

Why this matters beyond one regulation

I’d guess most people reading this don’t work in regulatory compliance.

If you’re a CDO, a CIO, or a risk leader at a bank with a mainframe at its core, BCBS 239 is probably one item on a long list. But the underlying question — can we actually explain how our own systems work? — isn’t a regulatory question. It’s a basic operational one. It’s the same question that determines whether you can trust the data going into a new AI initiative, whether you can defend a number in front of your own board, and whether the next system migration breaks something nobody saw coming.

Lineage has quietly become a prerequisite for almost everything banks are now trying to do with their data. Most executives don’t ask for it directly, because they don’t think to ask — they ask for the AI use case, or the modernization roadmap, or the faster reporting cycle, and lineage turns out to be the thing standing between them and any of it.

Where I land

I don’t think this is a story that needs villains. The standard was right. The barrier was real. What’s changed is narrower, and more hopeful: the wall that made the standard so hard to meet has a way through it now.

If you’re a regulator, I’d offer this as something worth knowing: the technical excuse has less weight than it used to. If you’re an executive at a bank still living with this problem, I’d offer something more direct — this is more solvable, and more quickly, than you’ve been told.

Either way, the goal was never the regulation itself. It was being able to look at your own systems and actually understand them. That’s now a lot closer than it’s been in years.

Sincerely,

Caitlyn Truong

CEO, Zengines

Caitlyn Truong

May 26, 2026

Articles

Data Access Isn’t Enough: Why Business Leaders Need Data Lineage They Aren’t Asking For

At industry conferences this year, I’ve spent dozens of hours inside conversations with CEOs, CDOs, CIOs and operating executives across financial services. When I ask what’s keeping them up at night when it comes to their data, the answer is remarkably consistent: data access. They want data more accessible, faster, in more usable form, in more places, with fewer gatekeepers.

What's notable is what they don't ask for. Not trustworthiness. Not auditability. Not the ability to defend a number to a regulator without calling three people first. Access is the ceiling of the conversation, and honestly, that makes sense. In large financial enterprises built on decades of legacy applications, murky integrations, and pipelines that nobody fully documented, just getting the data somewhere useful is still a meaningful achievement.

The problem is that "getting the data" is already more complicated than most leaders realize. The moment data leaves its source system, decisions are being made about it. Decisions that quietly change what it means. And if you don't know those decisions were made, you don't know what you're actually looking at.

That's where lineage comes in, and why it matters even before you get to the outcomes leaders should be asking for.

Below, I’ll walk through (1) what “access” really delivers, (2) the abstraction layer hidden inside every extraction, (3) the compounding problem of “data derivatives”, (4) a concrete example – encoding and precision – where this gets expensive, and (5) what business leaders should be asking for instead.

What “Data Access” Really Delivers

When a business team asks for access to data, they almost always receive something that has already been processed for their consumption. Someone – usually a data engineer or database administrator – sat down with the source system and made a series of decisions:

Which tables matter for this use case

Which fields to expose

How to filter, aggregate, or join the records

Which technical artifacts to strip out (temp tables, system metrics, audit fields that don’t translate to business meaning)

These decisions are reasonable. Business consumers don’t want raw operational data; they want something readable without extraneous noise. But every one of those decisions encodes logic and judgment that doesn’t travel with the data. The output looks complete – and to the business user, it looks like the source of truth – but it is already an abstraction.

The Extraction Event Is a Translation Event

I find it useful to think of an extraction as a translation. Someone translated the operational reality of a data storage system into a business-readable view. Like any translation, choices were made: what to keep, what to drop, how to render concepts that don’t map cleanly across contexts. And like any translation, those choices can quietly change the meaning.

When a business leader looks at the extracted view, the assumption is usually that the data was “moved and shifted” – that is, copied with fidelity. That outcome is possible. In my experience, it is also highly unlikely. Logic gets applied at the moment of extraction, and unless someone deliberately captured and shared that logic, it is invisible by the time the data reaches a dashboard.

Abstractions of Abstractions: How Data Derivatives Compound the Problem

Here is where it gets harder.

Once an extracted data set exists, other people start using it. And why wouldn't they? There is already a data access path. The alternative, forging a new data access path, is the full corporate red tape headache: hunting for a charge code, filling out a technical work request that Business can’t quite decipher, watching that ticket age in a queue, and depending on legacy data SMEs who left the company in 2019. The extracted data set skips all of that. It is already shaped for consumption, already lightly documented, already trusted by some peer team who vouched for it in a meeting six months ago. So the next team builds a report off it. Or creates a derivative data set for their own use case. Or both. What they don't realize is that the easy path and the right path may not be the same one.

That derivative is now an abstraction of an abstraction. The further you move from the originating system, the more layers of unrecorded judgment sit between the business decision and the operational event the data was supposed to describe. By the third or fourth hop, the question “where did this number come from?” can be genuinely difficult to answer – even for the team that produced the report.
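
One way to picture what it means for the judgment to travel with the data is a structure in which every derived data set keeps a pointer to its parent and a note of what was done to it. The sketch below is purely illustrative: the Dataset class, the data set names, and the step descriptions are all hypothetical, not a description of any specific tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Dataset:
    """A data set plus the decisions applied when it was derived from its parent."""
    name: str
    parent: Optional["Dataset"] = None
    steps: List[str] = field(default_factory=list)


def trace_to_source(ds: Optional[Dataset]) -> List[Tuple[str, List[str]]]:
    """Walk parent links and return (name, steps) hops ordered from source to report."""
    path = []
    while ds is not None:
        path.append((ds.name, ds.steps))
        ds = ds.parent
    return list(reversed(path))


# Hypothetical chain: source system -> extract -> derivative -> report.
core = Dataset("core_ledger")
extract = Dataset("finance_extract", core,
                  ["filter: active accounts only", "round: rate to 4 decimals"])
derived = Dataset("regional_view", extract, ["aggregate: sum by region"])
report = Dataset("quarterly_report", derived, ["join: budget targets"])

for name, steps in trace_to_source(report):
    print(name, steps)
```

With the steps recorded at each hop, "where did this number come from?" becomes a lookup rather than an investigation. Without them, each hop is one more layer that someone has to reconstruct from memory.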

A Concrete Example: How Encoding and Precision Quietly Rewrite Your Data

Let me make this concrete with an example I keep encountering.

When data is moved between systems, engineers make practical choices about how to package it. One of those choices is how to handle numeric precision. A value originally stored at six decimal places in the source might be packaged at four, or two, depending on what the receiving system supports – or simply what the engineer is most familiar with.

In some industries, that’s fine. In financial services, insurance, and healthcare, it is often not fine. A decimal place in an interest rate, a reserve calculation, or a pricing model can represent material variance. Once precision has been silently reduced, the data is no longer the real data – it is an approximation that looks identical to a casual reviewer. The business consumer assumes they’re working with the underlying record; in reality, they’re working with a rounded version of it that was reshaped during packaging.

This is exactly the kind of change that lineage is built to surface. Without lineage, you can’t tell that anything happened. With lineage, the precision change is documented, traceable, and reviewable.
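
To put numbers on the precision example, the sketch below repackages a six-decimal rate at four and two decimal places and applies each version to the same principal. The rate, the principal, and the choice of half-up rounding are assumptions made for illustration only.

```python
from decimal import Decimal, ROUND_HALF_UP


def package(value: Decimal, places: int) -> Decimal:
    """Round a value to a fixed number of decimal places, as a transfer step might."""
    quantum = Decimal(10) ** -places  # e.g. places=2 gives Decimal("0.01")
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


source_rate = Decimal("0.043750")   # hypothetical rate stored at six decimal places
principal = Decimal("10000000")     # hypothetical principal

rate_4dp = package(source_rate, 4)  # Decimal("0.0438")
rate_2dp = package(source_rate, 2)  # Decimal("0.04")

interest_source = principal * source_rate  # 437500.000000
interest_4dp = principal * rate_4dp        # 438000.0000
interest_2dp = principal * rate_2dp        # 400000.00

print("variance at 4 decimals:", interest_4dp - interest_source)
print("variance at 2 decimals:", interest_2dp - interest_source)
```

On these assumed figures, four-decimal packaging overstates the interest by 500 and two-decimal packaging understates it by 37,500, yet all three rates look plausible on a dashboard. A lineage record of the rounding step is what lets a reviewer tell them apart.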

Why Regulated Industries Can’t Afford to Skip Data Lineage

Regulatory frameworks have been ahead of business intuition on this point. BCBS-239 requires banks to demonstrate the accuracy, completeness, and timeliness of their risk data – which is impossible to defend without lineage. ORSA and Solvency II require insurers to substantiate the data flowing into solvency and capital calculations. None of these frameworks ask whether you have access to the data. They ask whether you can prove what the data is and how it got there.

For institutions operating under these regimes, lineage isn’t a nice-to-have analytics enhancement. It is the substrate that makes the rest of the data conversation defensible.

What Business Leaders Should Be Asking For Instead

If “give me access to the data” is the wrong ask on its own, what’s the right one? In my view, business leaders should be asking three questions every time a new data set lands on their desk:

Where did this data originate, and what happened to it between then and now? Not a verbal summary – a documented path that is understandable in Business terms.

What decisions were made during extraction or packaging that could have changed the meaning of the values I’m looking at? Especially around encoding, precision, filtering, and aggregation.

If a regulator or auditor asked me to defend this number tomorrow, do I have the evidence trail to do it? If the answer is “we’d have to go find the engineer who built this,” the answer is no.

These questions don’t replace the access conversation – they extend it. Access is the entry point. Lineage is what makes access trustworthy.

A Final Thought

The reason business teams don’t ask for lineage isn’t that lineage doesn’t matter. It’s that the absence of lineage rarely announces itself. The data looks fine. The dashboard renders. The report mostly ties out. The risk lives in the assumptions you didn’t know you were making about what the data went through to get to you.

If your business teams are only asking for access, you have a gap – and in legacy environments where decades of undocumented logic sit between the source and the report, that gap is widest. The fix is to start asking for lineage too.

See Contextual Data Lineage in Action

Zengines Contextual Data Lineage is built for the environments where the lineage gap is widest – large financial enterprises with critical business logic locked inside COBOL, RPG, PL/I, and AS/400 code. We extract that embedded logic, make the data path visible, and give your teams the evidence trail they need to defend their numbers to auditors, regulators, and themselves.

If you’re working through a BCBS-239, ORSA, or Solvency II mandate, a planned mainframe migration, or a growing trust gap between your business teams and the data they consume, we’d like to hear about it.

Caitlyn Truong
