Modernization at Portfolio Scale: Why Determinism + AI Outperforms Either Alone

Travis Shepherd
June 17, 2026

Updated: July 2, 2026

GenDD in Practice

A field guide for CTOs and architects modernizing at scale

The market for mass code modernization is loud, and most of what it sells will not survive contact with an enterprise portfolio. Pure-AI platforms promise to migrate codebases at the press of a button. Pure-deterministic tools promise auditable, repeatable transformations. Both deliver on parts of the problem and neither delivers on all of it.

The pattern that does deliver, at the volumes enterprises actually face, is a hybrid: deterministic refactoring engines doing the bulk of the work, with generative AI handling the residual reasoning that determinism cannot reach. This pattern is now visible in production at Duolingo, Slack, AWS, and inside the platforms shipped by Moderne, GitHub, and Amazon. It is the only model that simultaneously delivers speed, safety, and auditability at the scale we are seeing in 2026.

The case for modernizing is no longer interesting; the question is how. Legacy systems accumulate security vulnerabilities, inflate maintenance cost, obstruct cloud-native architecture, and slow the delivery of new capability. Modernizing them is a precondition for the next frontier of digital platforms, not a discretionary investment. What makes the work hard is the combination of three constraints that compound:

Volume. Thousands of repositories. Billions of lines of code. Highly variable dependency graphs.
Distance. The semantic gap between current state and target state often spans multiple language versions, framework overhauls, and runtime changes.
Risk. The business logic is frequently mission-critical and regression-intolerant. Subtle issues are still missed until they surface in production.

Manual and semi-automated efforts fail against this combination. So do agentic-AI-only programs, in ways that are easy to miss until late in the rollout. This article walks through what works, why, and where it breaks down. It is opinionated, and it draws on direct experience running modernization programs against portfolios with all three constraints firing at once.

In this article

Why any single approach fails at scale
What you have to inventory before you start
AST and LST: the technical foundation
The hybrid pattern: deterministic engines + AI residual work
Every major platform is converging on this pattern
Running a modernization program
How it fails when it fails
This is GenDD applied to legacy code
Working with HatchWorks AI
Sources and references

Why any single approach fails at scale

Before getting to the hybrid pattern, it is worth being specific about how each pure-play approach breaks down. The failure modes are different, and recognizing which one you are seeing in your own program is the first step toward a working strategy.

Pure-AI modernization stalls on consistency

General-purpose AI coding tools (Claude Code, Cursor, Copilot, and the platform-level coding agents now embedded in GitHub and elsewhere) are genuinely useful for low-volume, short-distance upgrades. A single codebase moving from Java 17 to Java 25, or a tactical port from .NET Framework 4 to .NET Core, is well within reach for a well-instructed agent equipped with appropriate skills and rules.

At enterprise scale, the same approach breaks down. The agents get stuck in loops of refactoring and attempted compilation. The success rate varies meaningfully across comparable workloads and teams, even with identical prompts. Code review burden inflates, because every diff has to be read against the possibility that something subtle was changed. Heroic developer effort gets absorbed into making the AI's output reviewable, and that effort goes unnoticed because the metric being reported is "PRs generated," not "PRs that landed without rework."

The deeper issue is the property the agents lack: determinism. Non-deterministic output means consistency cannot be guaranteed, and the ceiling on predictability and confidence is structurally lower than the bar an enterprise modernization program needs to clear. The models and the tooling are improving, and the ceiling is rising, but it has not yet reached the bar required for an audit trail that compliance teams will accept.

Pure AI is a useful tactical lever when the timeline is leisurely. As the engine of an urgent, portfolio-wide modernization program, it remains risky and unproven.

Pure-deterministic tools stop short of the finish line

The opposite failure pattern shows up in pure-deterministic programs. Assessment platforms, IDE refactoring tools, and rule-based migration engines are precise and auditable, but they can only act on the patterns they were programmed to recognize. At enterprise scale, every codebase eventually has the case the recipe author did not anticipate: a custom annotation, a generated class that the type system cannot resolve cleanly, a build configuration that drifts from the standard, a library version that's three releases behind what the recipe assumed.

When the deterministic engine cannot proceed, the work falls back to a human developer, who has to read the recipe's output, understand what the recipe was trying to do, and finish the transformation manually. Across a portfolio with thousands of codebases, the residual work absorbs the program's velocity. The recipes handle the "easy" eighty percent; the remaining twenty percent is where the schedule disappears.

The hybrid model is the only one that holds at portfolio scale

The pattern that delivers, repeatedly, is the combination. Deterministic engines (today, almost universally, this means OpenRewrite-based platforms) handle the bulk transformations: language version bumps, namespace migrations, dependency upgrades, security patches, anything that maps cleanly to a typed code representation. Generative AI agents handle the residual reasoning: diagnosing build failures the recipe could not resolve, generating targeted fixes for the cases the recipe author did not anticipate, planning the sequence of recipes for a specific codebase's circumstances.

The deterministic core gives the program auditability, reproducibility, and the kind of consistent behavior that compliance reviews can be built on. The AI layer gives the program the reach to finish, on the codebases where determinism alone would have left a long tail of manual cleanup. Independent engineering organizations have arrived at the same conclusion through their own experience: Duolingo published their sequence of OpenRewrite recipes plus AI controlled-loop cleanup for JVM golden-path upgrades; Slack used AST-based codemods plus LLM prompts enriched with rendered-DOM context to migrate tests from Enzyme to React Testing Library. Pure codemods stalled on semantic cases. Pure LLM prompts drifted. The combination worked.

What you have to inventory before you start

A modernization program that begins by picking tools is a program that has already chosen its failure mode. The work that precedes tool selection is portfolio inventory, and the unit of inventory matters: not applications, not repositories, not running instances, but codebases.

A codebase, for this purpose, is the collection of files that compile to an executable software program. A repository may contain many codebases. An application may be spread across many repositories. The codebase is the unit of work that gets modernized, and the codebase is what your inventory has to enumerate.
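As a rough illustration of enumerating codebases rather than repositories, the sketch below walks one repository and treats every directory that holds a build descriptor as a codebase root. The list of descriptor filenames and the one-descriptor-per-codebase rule are assumptions made for this example; multi-module builds would need a smarter rule.

```python
import os
import tempfile

# Assumption: these build descriptors mark a codebase root. Extend per portfolio.
BUILD_FILES = {"pom.xml", "build.gradle", "package.json"}

def find_codebases(repo_root: str) -> list[str]:
    """Return repo-relative directories that contain a build descriptor."""
    roots = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        if BUILD_FILES.intersection(filenames):
            roots.append(os.path.relpath(dirpath, repo_root))
    return sorted(roots)

# Build a throwaway repository with two codebases and one docs folder.
with tempfile.TemporaryDirectory() as repo:
    for rel in ("billing/pom.xml", "web/package.json", "docs/README.md"):
        path = os.path.join(repo, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    codebases = find_codebases(repo)

print(codebases)
```

One repository yields two inventory entries here, which is the distinction the definition above is drawing.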

Once you have the inventory, every codebase gets scored on the three constraints introduced at the top of this article:

Volume. Lines of code, number of files, dependency counts, contribution velocity.
Distance. The technical gap between current and target state. Java 8 to Java 21 plus Spring Boot 2 to 3 is long distance. Java 17 to 21 in-place is short.
Risk. Business criticality, regulatory exposure, the complexity of the design-time and run-time dependency graphs, the size and skill of the team that maintains the codebase.

High values across all three categories point to a solution that is scalable, auditable, and repeatable. Low values in one or two categories may permit a lighter approach. Pretending the scores are similar across the portfolio is what makes programs slip; sequencing the waves to the scores is what makes them ship.
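The scoring step can be sketched in a few lines. The 1-to-5 scale, the threshold of 4, and the lowest-score-first wave order are all assumptions chosen for illustration; the article does not prescribe them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Codebase:
    name: str
    volume: int    # 1 (low) to 5 (high); the scale is an assumption
    distance: int
    risk: int

def recommend(cb: Codebase, high: int = 4) -> str:
    """High on all three constraints calls for the heavier, auditable approach."""
    if all(score >= high for score in (cb.volume, cb.distance, cb.risk)):
        return "scalable, auditable, repeatable"
    return "lighter approach may be acceptable"

def sequence(portfolio: list[Codebase]) -> list[str]:
    """One possible wave order: lowest combined score first."""
    ranked = sorted(portfolio, key=lambda cb: cb.volume + cb.distance + cb.risk)
    return [cb.name for cb in ranked]

portfolio = [
    Codebase("payments-core", volume=5, distance=5, risk=5),
    Codebase("internal-wiki", volume=2, distance=1, risk=1),
    Codebase("reporting-api", volume=4, distance=2, risk=3),
]
for cb in portfolio:
    print(cb.name, "->", recommend(cb))
print(sequence(portfolio))
```

Starting with the lowest-scoring codebases is only one defensible ordering; the point is that the order comes from the scores rather than from convenience.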

Assessment platforms produce visibility, not execution

Two assessment platforms surface repeatedly in enterprise modernization conversations. Both are worth understanding for what they do and (importantly) what they do not do.

Dr. Migrate (Altra) delivers AI-assisted discovery, dependency mapping, complexity scoring, and wave planning under a configurable 6R or 7R framework. Its strengths are agentless collection, rapid report generation (often days rather than weeks), and actionable affinity analysis that groups related applications for wave planning. Its limitations are commercial licensing and a product history tilted toward Azure-centric environments, which can require supplementation in multi-cloud portfolios.

CAST (Highlight and Imaging) performs rapid static analysis across hundreds of applications and surfaces cloud readiness, technical debt, open-source risk, resiliency, and green-impact metrics. Its strengths are broad language coverage, quantified portfolio health scores, and strong governance reporting. The limitations are interpretability (the UX is dense and reading the results well usually requires an experienced practitioner) and weak code-level remediation relative to its assessment strength.

Both platforms are credible for prioritization and governance. Neither modifies code. The effective pattern is to use Dr. Migrate or CAST to rank the portfolio and produce the wave plan, then hand that plan off to rewrite-based execution rather than expecting the assessment vendor to deliver transformation. Reasonable assessment-platform expectations save real money during the build-versus-buy decision; unreasonable ones produce the program drift this article is about avoiding.

AST and LST: the technical foundation

Large-scale deterministic refactoring requires a code representation that is both type-aware and format-preserving. Two representations dominate the discussion, and the difference between them is the difference between IDE-style refactoring (which works on one project at a time) and portfolio-style modernization (which has to work across thousands of repositories at once).

Abstract Syntax Tree (AST). A hierarchical representation of source code as syntactic elements. ASTs enable parsing and basic structural transformation, but standard ASTs discard type information from external references and do not preserve original whitespace, comments, or formatting. Re-emitting code from an AST typically reformats the source. This is the representation that IDE refactoring tools work against.

Lossless Semantic Tree (LST). An extension of the AST model. Every node is type-attributed; resolved type information is attached even across files, modules, and transitive dependencies. Every node is format-preserving, retaining whitespace, comments, and local style. When an LST is printed back to source, the unchanged regions are byte-identical to the input.
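The formatting loss of a standard AST is easy to see with Python's built-in ast module, used here purely as a stand-in for "a standard AST" (it is not OpenRewrite and has no LST equivalent). Parsing and re-printing a two-line function drops the comment and normalizes the spacing.

```python
import ast

SOURCE = (
    "def total(prices):   # sum in cents\n"
    "    return sum( prices )\n"
)

def roundtrip(source: str) -> str:
    """Parse to a standard AST and print it back to source."""
    return ast.unparse(ast.parse(source))

printed = roundtrip(SOURCE)
print(printed)
print("comment survived:", "# sum in cents" in printed)
```

Nothing about the function's behavior changed, yet the re-emitted text differs from the input. Multiplied across a portfolio, that is the incidental diff noise a format-preserving tree is meant to avoid.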

Why the LST was created

The LST exists specifically to overcome AST shortcomings that block cross-repository refactoring. Without cross-file type resolution, a recipe cannot safely rename a method that is defined in one repository and called by others. Without format preservation, even a well-scoped change produces thousands of incidental diffs that reviewers reject on sight. Both of those failure modes are routine in IDE-based refactoring, and both are unacceptable at portfolio scale.

The LST solves both problems and is the foundation for any credible large-scale modernization program. The comparison below makes the trade-offs explicit.

The Technical Difference

AST vs LST: side by side

The LST exists because the AST cannot safely refactor across repositories. Below: the six properties where the difference shows up in practice.

Property	AST	LST
Type information	Absent or partial. External references typically unresolved.	Fully attributed, including cross-file and cross-project references.
Formatting fidelity	Lost on re-emit. Whitespace, comments, and style are reformatted by the printer.	Preserved byte-for-byte in unchanged regions. Re-emit only changes what the recipe touched.
Pattern matching	Syntactic only. Cannot reliably distinguish operations on different types with the same syntax.	Semantic and type-aware. Patterns operate against resolved meaning, not just structure.
Refactoring scale	Limited by context loss. Cross-repository refactors require manual stitching.	Millions of lines of code across thousands of repositories under a single recipe execution.
Auditability	Low. Incidental style drift obscures the substantive changes in any given diff.	High. Exact before-and-after traceability; reviewers see only semantic change.
Review burden	Inflated by incidental formatting changes that reviewers have to read past.	Minimized. Diffs contain semantic content only, which is what reviewers actually need to see.

Why this matters: The audit and review properties (rows 5 and 6) are what separate a refactoring tool you can run at portfolio scale from one you cannot. Format-preserving diffs are how a modernization program produces pull requests that engineering teams will actually merge. Without LSTs, even technically correct transformations get rejected because the diff is too large to read.
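The formatting-fidelity row of the table can be approximated in miniature. The sketch below uses Python's ast module only to locate call sites, then splices the new name into the original text so every untouched character is preserved. It illustrates the format-preserving half of the LST idea only: there is no type attribution here, the function names are hypothetical, and the column arithmetic assumes ASCII source.

```python
import ast

def rename_calls(source: str, old: str, new: str) -> str:
    """Rename calls to `old`, editing only the matched spans of the original text."""
    tree = ast.parse(source)
    spans = [
        (node.func.lineno, node.func.col_offset, node.func.end_col_offset)
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == old
    ]
    lines = source.splitlines(keepends=True)
    # Apply edits right to left so earlier offsets on a line stay valid.
    for lineno, start, end in sorted(spans, reverse=True):
        line = lines[lineno - 1]
        lines[lineno - 1] = line[:start] + new + line[end:]
    return "".join(lines)

SOURCE = (
    "x = fetch( url )   # legacy client\n"
    "fetch_all = [fetch(u) for u in urls]\n"
    'note = "fetch is deprecated"\n'
)
patched = rename_calls(SOURCE, "fetch", "fetch_v2")
print(patched)
```

The odd spacing, the comment, the variable named fetch_all, and the string literal all survive, so the resulting diff contains only the two renamed call sites.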

The hybrid pattern: deterministic engines + AI residual work

With the AST/LST distinction in hand, the hybrid pattern becomes legible. OpenRewrite is the open-source engine that operates on LSTs through declarative recipes: small, composable units of transformation. Moderne is the commercial platform that persists and distributes LSTs across thousands of repositories and orchestrates recipe execution at organizational scale. Together they form the deterministic core that anchors every credible portfolio-scale modernization program in 2026.

Recipes are deterministic. The same input produces the same output, every time. This property is non-negotiable at enterprise scale, and it is what separates the engine from anything an LLM can do alone. Determinism is what makes a rollout auditable (every change traceable to a named recipe and its version), reproducible (a failed wave can be replayed without ambiguity), and compatible with compliance review (auditors can reason about a known transformation rather than a probabilistic one). Pure-AI approaches, whatever their strengths, cannot match this property.
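A minimal sketch of what "auditable and reproducible" can mean in practice: record the recipe name, its version, and hashes of input and output, then confirm that a replay reproduces the record. The recipe here is a hypothetical string replacement standing in for a real LST-based recipe, and the record layout is an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class AuditRecord:
    recipe: str
    version: str
    input_sha256: str
    output_sha256: str

def javax_to_jakarta(source: str) -> str:
    """Hypothetical stand-in for a recipe: a pure function of its input."""
    return source.replace("import javax.servlet.", "import jakarta.servlet.")

def run(recipe: Callable[[str], str], name: str, version: str, source: str):
    """Apply the recipe and return the output with its audit record."""
    output = recipe(source)
    return output, AuditRecord(name, version, digest(source), digest(output))

def replay_matches(recipe: Callable[[str], str], record: AuditRecord, source: str) -> bool:
    """Re-run the recipe and confirm it reproduces the recorded output."""
    return (digest(source) == record.input_sha256
            and digest(recipe(source)) == record.output_sha256)

SOURCE = "import javax.servlet.http.HttpServlet;\n"
output, record = run(javax_to_jakarta, "javax-to-jakarta", "1.0.0", SOURCE)
print(record.recipe, record.version, replay_matches(javax_to_jakarta, record, SOURCE))
```

A probabilistic generator cannot offer the same replay check, because a second run is not guaranteed to produce the same bytes.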

Where AI residual work fits

Determinism is necessary, but it is not sufficient. Every portfolio has the cases that no recipe author anticipated: a custom annotation, a build configuration that diverged from the standard six years ago, a transitive dependency on a library that was forked and never re-merged. These cases break recipes deterministically. An LLM agent, by contrast, can read the error, reason about the context, propose a targeted fix, validate it against the build, and either succeed or report a structured blocker.

In a working hybrid system, the LLM does not write code directly. It plans, interprets natural-language intent, diagnoses failures, and calls deterministic recipes as tools through Model Context Protocol or equivalent tool-calling interfaces. The agent selects and delegates; the recipes execute. This division of labor preserves the auditability of the underlying transformations while giving the program the reach to finish on the codebases that pure determinism could not.
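The select-and-delegate split can be sketched as a registry of deterministic transformations plus an executor that accepts recipe names and nothing else. Everything named here (the recipe names, the registry, the stub planner) is hypothetical; a real system would put an LLM behind the planner and real recipes behind the registry.

```python
from typing import Callable

# Hypothetical registry: recipe name -> deterministic transformation.
RECIPES: dict[str, Callable[[str], str]] = {
    "upgrade-junit-import": lambda src: src.replace(
        "org.junit.Test", "org.junit.jupiter.api.Test"
    ),
    "strip-trailing-spaces": lambda src: "\n".join(
        line.rstrip() for line in src.split("\n")
    ),
}

def execute_plan(plan: list[str], source: str) -> tuple[str, list[str]]:
    """Apply the agent's plan. The agent contributes recipe names, never code."""
    unknown = [name for name in plan if name not in RECIPES]
    if unknown:
        raise ValueError(f"plan references unregistered recipes: {unknown}")
    applied = []
    for name in plan:
        source = RECIPES[name](source)
        applied.append(name)
    return source, applied

def stub_agent(build_error: str) -> list[str]:
    """Stand-in for the LLM planner: maps a diagnosis to recipe names."""
    if "org.junit.Test" in build_error:
        return ["upgrade-junit-import", "strip-trailing-spaces"]
    return []

plan = stub_agent("error: cannot find symbol org.junit.Test")
result, applied = execute_plan(plan, "import org.junit.Test;  \n")
print(applied)
print(result)
```

Because the executor rejects anything outside the registry, every change in the output traces back to a named transformation, whatever the planner proposed.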

The economics of this pattern follow the eighty-twenty distribution that practitioners have been reporting publicly. Independent analyses of AWS's Amazon Q Developer Transform consistently describe an eighty-twenty split: OpenRewrite recipes handle the deterministic eighty percent (JDK bumps, namespace changes, dependency migrations), while the LLM handles the exceptions and planning. AWS contributes recipes back to the open-source ecosystem as it builds against more cases, which moves the eighty-percent number up and pushes the LLM toward residual work that is increasingly narrow and increasingly hard.

Critiques worth acknowledging

The hybrid pattern is not a settled empirical claim. Industry-defining benchmarks do not yet exist in the public literature, and much of the reinforcing material is published by Moderne-affiliated authors. OpenRewrite's recipe ecosystem is strongest for Java and .NET, less mature for JavaScript, TypeScript, and Python, and sparse for COBOL, mainframe, and niche frameworks. A program that depends on the recipe ecosystem for an under-supported language is signing up for more recipe authoring than the marketing suggests.

The pattern nonetheless holds across the sources we do have, and the trend is unambiguous: deterministic engines plus agentic AI is not just sound architecturally; it is the direction every major platform vendor is heading. The next section walks through the evidence.

Every major platform is converging on this pattern

The strongest argument for the hybrid model is what the platform vendors are actually shipping. Independently, multiple teams have arrived at the same architectural conclusion: deterministic recipe execution at the core, LLM agent orchestration on the outside. Looking at four vendors in 2026 makes the convergence visible.

Moderne's Moddy agent

Moddy is Moderne's multi-repo AI agent. The LLM (frontier-model agnostic) does natural-language planning and intent interpretation, then calls over 5,000 deterministic OpenRewrite recipes as tools to perform the actual transformations on LSTs. The agent does not change code itself; it selects and delegates to recipes through tool-calling and Model Context Protocol. Moderne also ships a CLI command that installs skills for Claude Code, Cursor, GitHub Copilot, Windsurf, and Sourcegraph Amp; the skills teach those agents how to create new recipes, run existing ones at scale, and analyze impact with pre-built reports. Moderne's own documentation is explicit about why those skills are necessary: AI coding agents do not know out of the box how to do reliable LST-based refactoring.

Amazon Q Developer Transform

AWS integrated OpenRewrite into Amazon Q Developer Transform from the start in 2023, and it remains the core of the product through 2026. The architecture is the eighty-twenty split described above: OpenRewrite recipes handle the deterministic majority of the work, while the LLM handles exceptions and planning. AWS has been contributing recipes back to the open-source ecosystem as it builds against new cases, which is the strongest possible signal that the deterministic core is not a stopgap on the way to pure-AI; it is the foundation.

GitHub Copilot Modernization Agent

GitHub's Copilot Modernization Agent uses OpenRewrite behind the scenes for the actual code transformations to ensure predictability. The LLM generates the plan and selects recipes; the recipes execute the changes. The same pattern, shipped by the largest developer platform on the market. The fact that GitHub independently chose this architecture is meaningful evidence that the convergence is structural rather than coincidental.

The autonomous-SDLC frontier

A separate category of platform is worth naming because it is the credible alternative architecture: end-to-end agent-orchestrated modernization without the deterministic engine underneath. Blitzy is one example, a platform that ingests codebases, builds a proprietary knowledge graph of code structure and dependencies, then orchestrates specialized AI agents that can think and cooperate for hours to generate, validate, and refactor code, with output delivered as pull requests after compile-time and runtime validation.

Blitzy's knowledge graph gives agents structured context (cross-file dependencies, architecture) that is a real improvement over raw-text prompting. What it does not currently offer is lossless format preservation, full type attribution across projects, or deterministic tree visitors. Changes are probabilistic and agent-generated, and the same input can produce non-identical outputs across runs. Blitzy performs multi-agent QA plus compile and runtime checks, but the platform leaves the final twenty percent (security, last-mile validation, business-logic edge cases) to humans and opens pull requests for review. Because Blitzy is a web platform, source code has to be ingested into an external system, and how it integrates with automated pipelines is unclear.

The autonomous-SDLC architecture has its place, particularly for greenfield generation and tactical work where reproducibility is less critical than agent ingenuity. For portfolio-scale modernization against compliance and audit requirements, the hybrid pattern's deterministic foundation is the architecture that the convergence keeps pointing back to.

Running a modernization program

Modernization at scale is a program, not a project. Technology alone does not guarantee success; execution discipline does. The patterns below have shown up consistently across programs we have run with PE-backed mid-market companies and enterprise CTOs in regulated industries.

Clarity of scope and intent

What "modernization" means in any given program varies, and that variation is the single most common reason programs slip. A layered definition keeps the conversation honest:

Upgrading codebases to modern, stable, and approved versions of their language, SDK, runtime, and referenced libraries or packages, with the goal of deploying to the same host or a host that does not necessitate additional changes.
Updating the codebase's configuration mechanisms, coding styles, patterns, or build mechanism beyond what is strictly required for the upgrades above.
Updating the CI/CD workflows, pipelines, and actions to match newer patterns.
Adding containerization where it is absent, with a goal of portability or replatforming.

All four are worthy goals. Which of them a program includes depends on its capability and the time allotted to it. Reimagining the user experience and rearchitecting to microservices are beyond the scope of large-scale automation and should be sequenced as separate workstreams with their own owners and budgets.

Program structure

Establish a program-management structure with a named executive owner, a cross-functional steering group (engineering, security, compliance, finance, and representative business domains), and a dedicated delivery team. Governance should be cadenced: weekly delivery reviews, monthly steering, and a quarterly reset against portfolio-level metrics.

Success metrics

Define what success at scale looks like before execution begins. A useful starting set:

Auto-generated PRs per week. Throughput of the transformation engine.
Percentage of PRs auto-merged without human editing. Recipe and AI maturity.
Developer hours saved versus baseline. Business-case realization.
Post-deployment defect rate. Safety in the transformation pipeline.
Wave completion rate versus plan. Program execution discipline.

Publish these metrics on a cadence. Hiding them is a leading indicator of program drift.
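Two of the metrics above reduce to simple ratios over pull-request records. The record fields below are assumptions about what a program might track, not a schema from any platform.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PullRequest:
    merged: bool
    human_edited: bool     # a developer pushed changes before merge
    caused_defect: bool    # linked to a post-deployment defect

def weekly_metrics(prs: list[PullRequest]) -> dict[str, float]:
    """Compute throughput, clean-merge percentage, and defect rate for one week."""
    merged = [pr for pr in prs if pr.merged]
    clean = [pr for pr in merged if not pr.human_edited]
    defects = [pr for pr in merged if pr.caused_defect]
    return {
        "prs_generated": len(prs),
        "auto_merge_pct": 100 * len(clean) / len(prs) if prs else 0.0,
        "defect_rate_pct": 100 * len(defects) / len(merged) if merged else 0.0,
    }

week = [
    PullRequest(merged=True, human_edited=False, caused_defect=False),
    PullRequest(merged=True, human_edited=True, caused_defect=False),
    PullRequest(merged=True, human_edited=False, caused_defect=True),
    PullRequest(merged=False, human_edited=False, caused_defect=False),
]
metrics = weekly_metrics(week)
print(metrics)
```

Reporting the clean-merge percentage next to the generated count keeps "PRs generated" from standing in for "PRs that landed without rework."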

Upgrade in waves, aligned to business domains

Partition the portfolio into waves that follow business domains rather than arbitrary technical buckets. Domain-aligned waves contain blast radius (a failure in one domain does not stall the entire program), tighten feedback loops (domain teams own both the code and the service-level consequences), and align naturally with change-management calendars.

Release pattern

Select release patterns early and track them in the inventory. Canary and blue/green are useful: both support fast rollback and both scale to a portfolio rollout without disproportionate overhead. More elaborate patterns (dual-run, shadow traffic with full replay, phased multi-region cutovers) multiply program duration without commensurate risk reduction. Reserve them for the workloads that genuinely require them.
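A canary gate can be as small as one comparison. The sketch below assumes error rate is the metric being watched and uses illustrative values for the tolerance and the minimum traffic; real programs would tune both per workload.

```python
def should_roll_back(baseline_errors: int, baseline_requests: int,
                     canary_errors: int, canary_requests: int,
                     tolerance: float = 0.005, min_requests: int = 1000) -> bool:
    """Roll back when the canary error rate exceeds baseline by more than tolerance."""
    if canary_requests < min_requests or baseline_requests == 0:
        return False  # not enough traffic to judge yet; keep observing
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    return canary_rate - baseline_rate > tolerance

print(should_roll_back(50, 100_000, 30, 2_000))  # canary clearly worse
print(should_roll_back(50, 100_000, 2, 2_000))   # within tolerance
```

Keeping the rule this simple is part of what lets canary releases scale across a portfolio without disproportionate overhead.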

Verification strategy

Compilation and unit-test passage are necessary but insufficient. Verification has to be layered:

Golden tests that capture expected behavior at service boundaries, curated per domain.
Behavioral diffing of old and new runtimes against production-like traffic.
Automated rollback on metric regression.
AST- or LST-level diff review so reviewers see semantic change, not formatting noise.

Each layer catches a class of failure the previous layer missed. This is where pure-AI approaches most often fail silently and where the determinism in the hybrid model pays its dividend.
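Behavioral diffing can be sketched as running the old and new implementations over the same inputs and comparing what each one did, including whether it raised. The two parsing functions are hypothetical; they model a change in which the new version swallows an error the old version propagated.

```python
from typing import Any, Callable

def observe(fn: Callable[[Any], Any], arg: Any) -> tuple[str, Any]:
    """Record what a call did: its return value or the exception type it raised."""
    try:
        return ("returned", fn(arg))
    except Exception as exc:
        return ("raised", type(exc).__name__)

def behavioral_diff(old, new, inputs):
    """Return the inputs on which the old and new implementations behave differently."""
    drift = []
    for arg in inputs:
        before, after = observe(old, arg), observe(new, arg)
        if before != after:
            drift.append((arg, before, after))
    return drift

def parse_qty_old(text: str) -> int:
    return int(text)            # propagates ValueError on bad input

def parse_qty_new(text: str) -> int:
    try:
        return int(text)
    except ValueError:
        return 0                # swallows the error and continues

drift = behavioral_diff(parse_qty_old, parse_qty_new, ["3", "12", "n/a"])
print(drift)
```

A unit test that only asserts on return values for valid input passes against both versions; the difference shows up only when the inputs resemble production traffic and exceptions are treated as behavior.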

Integration and adoption costs

Standing up a reliable recipe library, hosting LSTs at portfolio scale, and operating trustworthy LLM-agent orchestration (with error handling, validation loops, and human-in-the-loop gates) requires meaningful upfront engineering investment and organizational maturity. The hybrid model is the leading strategy, but it is not free. Plan it as an explicit capability build, with the same rigor that a treasury function or a security-engineering function would receive at this scale.

How it fails when it fails

Even rigorous modernization programs fail. The hardest failure class is the one that survives compilation, tests, and staging, and then misbehaves in production. The two patterns below are the ones I have seen burn programs that were doing almost everything else right.

Undetected runtime behavioral drift

A documented pattern in Java 8 to Java 17 migrations: core library behavior changes silently. Internal exception handling and bean resolution in some framework versions catch and log exceptions that the previous version propagated, so a downstream code path that used to throw now quietly logs and continues. Unit tests that assert on return values still pass. Integration tests that do not observe logs still pass. A downstream caller that depended on the exception to short-circuit a workflow breaks in production.

Benchmark studies of Java migrations have recorded exactly this pattern. An LLM can produce a migration that compiles successfully and still leave behavioral drift in place. The compilation is successful. The tests pass. The behavior is wrong, and the program does not know it.

Output-format drift in serialization

A second class: output-format drift in serialization. The core-library XML and date/time serializers changed behavior between Java versions in ways that preserve semantic correctness for the producer but break consumers that parse by exact format. Producer tests pass. Consumer systems, which may not be in the modernization scope, fail.

Both classes share the same signature: compilation passes, tests pass, deployment succeeds, and the app runs, but it is unusable for the callers and consumers that depend on it. Only layered verification (golden tests at service boundaries, behavioral diffing, and canary with automated rollback) reliably catches them. Layered verification should be mandatory rather than optional.
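Output-format drift is easy to reproduce in miniature. The sketch below uses Python's datetime module as a stand-in (it does not model the Java serializers discussed above): two serializers emit the same instant with different spellings of UTC, the producer's own round-trip check passes, and a consumer that parses by exact format fails.

```python
from datetime import datetime, timezone

STAMP = datetime(2026, 1, 5, 9, 0, 0, tzinfo=timezone.utc)

def serialize_old(ts: datetime) -> str:
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")

def serialize_new(ts: datetime) -> str:
    return ts.isoformat()   # same instant, different spelling of UTC

def strict_consumer(text: str) -> datetime:
    """A consumer outside the migration scope that parses by exact format."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

old_text, new_text = serialize_old(STAMP), serialize_new(STAMP)

# Producer-side check: the new output still round-trips to the same instant.
producer_ok = datetime.fromisoformat(new_text) == STAMP

try:
    strict_consumer(new_text)
    consumer_ok = True
except ValueError:
    consumer_ok = False

# A golden test at the service boundary compares the exact bytes.
golden_ok = new_text == old_text
print(producer_ok, consumer_ok, golden_ok)
```

The golden comparison is the only one of the three checks that fails inside the producer's own pipeline, which is why it belongs at the service boundary.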

Human and process risks are at least as costly

Technology failures are rarely the whole story. Three process risks are as expensive as any technical one, and they show up across programs regardless of which platform was chosen:

Loss of developer trust in auto-generated PRs. Usually caused by diffs that are too large early in the program. Mitigation: start narrow, demonstrate clean diffs, expand scope only after trust is earned.
Insufficient golden-test investment before the first wave. Leaves verification leaning on unit tests that do not cover the failure modes above. Mitigation: treat golden-test authoring as a first-class wave-zero deliverable, not as something the team will get to in parallel with the migration work.
Program-management drift. Waves slip, metrics stop being published, the 7R classification is ignored under deadline pressure. Mitigation: executive cadence and a genuine willingness to pause waves rather than ship silent regressions.

All three of these are recoverable when caught early. None of them are recoverable when caught late. The programs that succeed are the ones that build a fast feedback loop into the structure of the program itself, so the drift surfaces in weeks rather than quarters.

This is GenDD applied to legacy code

The hybrid pattern this article walks through is not a coincidence. It is what happens when a discipline that has worked in greenfield AI-augmented development gets applied to the harder problem of modernizing legacy code. At HatchWorks, we call that discipline Generative-Driven Development (GenDD).

OpenRewrite recipes are like Agent Skills. Each recipe is a packaged, deterministic capability: input goes in, predictable output comes out, and the unit is composable and reusable across codebases. GenDD uses Agent Skills in a comparable way, to instruct LLMs on coding conventions, rules, and do's and don'ts, with one difference: the LLM's output is guided but still non-deterministic.
The LLM agent doing residual work is the Execution Loop. GenDD's Execution Loop is the agentic layer that plans, decomposes, reasons, and delegates to deterministic units below it. That is what Moddy, Amazon Q, and the Copilot Modernization Agent are doing: planning the migration, diagnosing build failures, generating targeted fixes, and calling recipes as tools through MCP. The architecture is identical.
GenDD Brownfield Analysis generates context for the residual AI work. HatchWorks' Brownfield Analysis Engine is a GenDD primitive specifically designed for understanding legacy systems before transformation begins. It produces dependency maps, a semantic understanding of business logic, and risk-scored portfolios. The hybrid pattern's quality is significantly elevated by having that context up front, and Brownfield Analysis is how we generate it.

The reason the hybrid pattern is showing up in every credible modernization platform is the same reason GenDD works in greenfield development: at scale, AI without methodology produces output that does not hold up to scrutiny, and methodology without AI cannot move fast enough to be worth doing. The combination is the only one that meets the bar.

For an enterprise CTO or an architect looking at a modernization program right now, the practical implication is that the platform choice (Moderne, Amazon Q, Copilot Modernization, or a self-built equivalent) matters less than whether the organization has the methodological discipline to run the program well. Choosing the right platform without the methodology is a way to spend money on better-looking failure. Choosing the methodology first is what makes the platform choice productive.

Working with HatchWorks AI

Build proprietary AI that holds up to buyer-side diligence

HatchWorks AI partners with private equity firms and their portfolio companies to build custom AI solutions that create defensible intellectual property and directly support exit value. We work portfolio-wide, with deal-team-ready timelines, and we produce the architectural posture and documentation that buyers' diligence teams will eventually review.

What we deliver

Proprietary AI solutions owned entirely by the portfolio company
Embedded AI in core business operations, products, and workflows
Production-grade systems built using our GenDD (Generative-Driven Development) methodology
Architecture, governance, and data-provenance documentation built for M&A readiness
Repeatable playbooks deployable across multiple portfolio companies in a fund

Sources and references

Duolingo Engineering Blog. Automating Golden Path Upgrades at Scale: A Journey from Manual Upgrades to an AI-Powered Workflow. blog.duolingo.com/automating-jvm-golden-path/
OpenRewrite Documentation. Lossless Semantic Trees. docs.openrewrite.org
Slack Engineering Blog. Balancing Old Tricks with New Feats: AI-Powered Conversion from Enzyme to React Testing Library at Slack. slack.engineering
Thoughtworks Insights. Claude Code Saved Us 97% of the Work on the First Try. Then It Failed Utterly. (CodeConcise experiment). thoughtworks.com
Microsoft Learn. GitHub Copilot Modernization Agent Overview. learn.microsoft.com
Moderne. News from re:Invent 2023 — AWS Announces Amazon Q Code Transformation Using OpenRewrite. moderne.ai
Altra. Dr. Migrate product overview. altra.cloud
CAST. CAST Highlight product documentation and independent practitioner reviews (2024–2026).
FreshBrew benchmark and related peer-reviewed work on Java migration behavioral-failure modes.
AWS. Amazon Q Developer Transform product documentation.
InfoQ. Java Trends Report 2025. Discussion of OpenRewrite as a dominant automation engine for legacy Java modernization.
HatchWorks AI. Generative-Driven Development methodology overview.