Technical Due Diligence Checklist Before a Software Rewrite
A founder reaches out. The message is some variation of: "Our system has a lot of problems, we think we need to rewrite it, can you give us an estimate?"
My answer is always the same: I'll give you a number after I've spent two hours in the codebase. Not before.
This is not a negotiating tactic. It's the only honest answer. A quote without an audit is a lottery - either I overprice to cover uncertainty I haven't measured, or I underestimate something serious and we both regret it four weeks in. Neither outcome is useful to you.
There's also a more fundamental issue: most rescue projects don't need a full rewrite. They need targeted stabilization of the two or three things that are actually broken, while everything the previous team built correctly - and there's usually something - stays in place. A stabilization is, in most cases I've seen, materially cheaper than a rewrite and delivers a working system faster. But I can only tell you which situation you're in after I've looked. Anyone who quotes a full rewrite before opening the repository is either guessing or has a business reason to prefer rewrites.
This is what those two hours look like.
One honest scope note before I describe the method
The two-hour audit is calibrated for the kind of systems I'm typically asked to look at: small-to-mid SaaS applications, backend services with a single team, web products under perhaps 100k lines of code.
For larger and more complex systems - distributed architectures across many services, high-load infrastructure, enterprise monoliths with twenty years of history, event-driven platforms, compliance-sensitive domains, multi-team organizations - two hours is initial triage, not full assessment. A proper audit of those systems takes days, involves reading production telemetry rather than just code, and requires structured conversations with the people who operate the system day to day. The framework below still applies; the time budget doesn't. I'd be misleading you to imply that two hours gives the same depth on a 30-service event-driven platform as on a single Next.js app.
What I need before I start
The audit is only as good as the access I have. Before I open anything, I ask for the following:
Pre-audit access checklist
- ☐ Repository access - read-only, SSH key or GitHub user
- ☐ Production URL and staging URL (if staging exists)
- ☐ Hosting and infra context - Vercel, Coolify, AWS, bare metal?
- ☐ Database schema dump or read-only staging credentials
- ☐ List of external integrations - Stripe, SendGrid, S3, etc.
- ☐ Incident log for the last 3 months
- ☐ Who built it - agency, in-house team, freelancer, AI-assisted? How many developers currently maintain it?
- ☐ What "broken" means to you - which specific behaviours are wrong or unreliable right now?
Each item has a reason. The repository is obvious. The production URL tells me whether the thing is actually deployed and live, or exists only on someone's laptop. The infra context matters because what's broken in a Docker container on a VPS is different from what's broken on Vercel serverless. The schema dump is often more useful than running credentials - I can read a schema offline without worrying about accidentally touching a production database.
The incident log is the most underestimated item on this list. What broke, when, and how often tells me more about the real risk surface than any amount of static code reading. If the same user-facing error has appeared fourteen times in three months, that's a clue that everything else can wait. If there have been no incidents at all, that's also information - either the system is genuinely stable, or nobody is monitoring it.
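Finding the error that appeared fourteen times can be done by grouping incidents on a normalised signature and counting them. Here is a minimal TypeScript sketch. It assumes the incident log has already been exported as a list of messages (from Sentry, PagerDuty, or a spreadsheet; the export step is omitted). The normalisation rule collapses numbers and long hex IDs so that "Order 1234 failed" and "Order 5678 failed" count as one recurring error. That rule is an illustrative heuristic, not a standard.

```typescript
// Normalise a message so that incidents differing only in IDs group together:
// long hex IDs, then any remaining digit runs, are replaced with "#".
function signature(message: string): string {
  return message
    .toLowerCase()
    .replace(/\b[0-9a-f]{8,}\b/g, "#")
    .replace(/\d+/g, "#")
    .replace(/\s+/g, " ")
    .trim();
}

// Count incidents per signature and return the most frequent first.
function rankRecurringIncidents(messages: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const message of messages) {
    const key = signature(message);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```

For example, rankRecurringIncidents(["Order 1234 failed to sync", "Order 5678 failed to sync", "Login timeout"]) puts "order # failed to sync" first, with a count of 2.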
The "who built it" question is not about blame. It tells me what category of problems to look for first, because failure patterns differ by origin. A codebase built by an agency over two years drifts in one way. A rushed MVP written by the founder over weekends drifts in another. Something post-acquisition, where two teams' code was merged, drifts in a third. A heavily AI-assisted project carries its own recurring patterns, which I documented in vibe-coded codebase patterns. None of these is universal - every codebase deserves to be read on its own terms - but the origin calibrates where I look first.
The "what broken means to you" question is important precisely because the answer is often wrong. Founders describe symptoms. The underlying cause is usually something different. But the symptoms tell me where the business pain is, which shapes how I prioritize what I find.
The first hour: landscape
The first sixty minutes are about understanding the shape of the thing. I'm not debugging yet. I'm not forming opinions about what should be rewritten. I'm building a map.
README and package.json first. What does the project claim to be? What runtime, what framework, what dependencies are declared? I read the README not to follow its setup instructions, but to understand what the team thought they were building. Then I check whether those claims match reality - whether the documented setup actually produces a running application, whether the dependency versions in the lockfile match what's declared, whether the scripts in package.json correspond to anything in the project structure.
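The lockfile part of that check can be automated. This sketch assumes npm's lockfile format version 2 or 3, where the root entry packages[""] records the dependency ranges package.json contained when the lockfile was written. Yarn and pnpm lockfiles are structured differently and would need their own parser. Reading and JSON-parsing the two files is left to the caller.

```typescript
// Only the fields this check reads from package.json and package-lock.json.
interface DependencyFields {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}
interface PackageLock {
  lockfileVersion: number;
  // Keys are "" for the root project and "node_modules/<name>" for installs.
  packages?: Record<string, DependencyFields & { version?: string }>;
}

function lockfileDrift(pkg: DependencyFields, lock: PackageLock): string[] {
  const root = lock.packages?.[""];
  if (lock.lockfileVersion < 2 || !root) {
    return ["lockfile has no root packages entry (v1 format?): compare manually"];
  }
  const problems: string[] = [];
  for (const field of ["dependencies", "devDependencies"] as const) {
    const declared = pkg[field] ?? {};
    const locked = root[field] ?? {};
    // A range that differs between the two files means the lockfile is stale.
    const names = new Set([...Object.keys(declared), ...Object.keys(locked)]);
    names.forEach((name) => {
      if (declared[name] !== locked[name]) {
        problems.push(`${field}.${name}: package.json=${declared[name] ?? "-"} lock=${locked[name] ?? "-"}`);
      }
    });
    // A declared package with no installed entry never actually resolved.
    for (const name of Object.keys(declared)) {
      if (!lock.packages?.[`node_modules/${name}`]) {
        problems.push(`${field}.${name}: declared but not resolved in lockfile`);
      }
    }
  }
  return problems;
}
```

Any finding here usually means npm ci would refuse to install, because it requires package.json and package-lock.json to agree.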
Folder structure. Is there structure at all, or is this flat chaos? Do I see folders named v2, old, _archive, new-approach? These are archaeological markers - remnants of previous attempts that were abandoned but never removed. The presence of multiple competing directories for the same concern tells me the codebase has accumulated history without ever being cleaned up.
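Here is a rough sketch of that scan. It takes a list of repository-relative paths, such as the output of git ls-files. The marker patterns come from the names above plus a few common variants (legacy, backup, deprecated), and they are assumptions, not a rule. A folder called v2 can be entirely legitimate, so each hit is a question for the team rather than a finding.

```typescript
// Directory names that often mark abandoned or parallel attempts.
const MARKERS: RegExp[] = [
  /^v\d+$/i,              // v2, v3
  /^_?old$/i,             // old, _old
  /^_?archive[sd]?$/i,    // archive, _archive, archived
  /^new([-_].*)?$/i,      // new, new-approach, new_ui (but not "news")
  /^(legacy|backup|deprecated)$/i,
];

// Return each flagged directory prefix once, e.g. "src/_archive".
function findArchaeologicalMarkers(paths: string[]): string[] {
  const hits = new Set<string>();
  for (const path of paths) {
    const segments = path.split("/");
    // Only directory segments are checked; the last segment is the file name.
    for (let i = 0; i < segments.length - 1; i++) {
      if (MARKERS.some((re) => re.test(segments[i]))) {
        hits.add(segments.slice(0, i + 1).join("/"));
      }
    }
  }
  return Array.from(hits).sort();
}
```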
Tests. Do they exist? Do they pass? Most importantly: what do they test? A healthy test suite takes a few minutes to assess. An unhealthy one is faster: I look at ten tests at random, and if most of them assert return types rather than return values, or if the coverage number is high but the tests are trivially satisfied, I've learned something significant. Green CI on incorrect behaviour is one of the more reliable signals that the codebase has problems its authors couldn't see.
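Here is a sketch of the random-sample check. It assumes Jest- or Vitest-style expect(...) assertions written one per line, so multi-line assertion chains are undercounted. The list of "weak" matchers is a judgement call. toBeDefined(), toBeTruthy(), and typeof checks pass for almost any non-crashing return value, which is the pattern described above.

```typescript
// Assertions that pass for almost any non-crashing return value.
const WEAK_MATCHERS: RegExp[] = [
  /\.toBeDefined\(\)/,
  /\.toBeTruthy\(\)/,
  /\.not\.toBeNull\(\)/,
  /\.not\.toBeUndefined\(\)/,
  /\.toBeInstanceOf\(/,
  /expect\(typeof\b/, // checks the type, not the value
];

// Line-based tally: lines containing expect( and how many use a weak matcher.
function weakAssertionRatio(testSource: string): { total: number; weak: number } {
  let total = 0;
  let weak = 0;
  for (const line of testSource.split("\n")) {
    if (!line.includes("expect(")) continue;
    total++;
    if (WEAK_MATCHERS.some((re) => re.test(line))) weak++;
  }
  return { total, weak };
}
```

Take a test that only checks that a user exists and that its id is a string, plus one real value check on the email. It reports { total: 3, weak: 2 }: two of its three assertions would pass on incorrect behaviour.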
CI/CD pipeline. Is there one? When did it last run green? A pipeline that hasn't passed in six weeks is a project that's been drifting for six weeks. No pipeline at all is a project being deployed by hand, which means every deployment is a manual operation that depends on whoever pressed the button remembering the steps correctly.
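Here is a small sketch of the "when did it last run green" question. The WorkflowRun fields mirror what GitHub Actions returns from its list-workflow-runs endpoint (conclusion, created_at, head_branch). Fetching and paginating those runs is omitted, and other CI systems would need their fields mapped into this shape first.

```typescript
// Subset of the per-run fields returned by GitHub Actions' list-workflow-runs endpoint.
interface WorkflowRun {
  conclusion: string | null; // "success", "failure", "cancelled", ... or null while running
  created_at: string;        // ISO 8601 timestamp
  head_branch: string | null;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days since the most recent successful run on `branch`,
// or null if none of the supplied runs on that branch succeeded.
function daysSinceLastGreen(runs: WorkflowRun[], branch: string, now: Date): number | null {
  const greens = runs
    .filter((r) => r.head_branch === branch && r.conclusion === "success")
    .map((r) => Date.parse(r.created_at));
  if (greens.length === 0) return null;
  const latest = Math.max(...greens);
  return Math.floor((now.getTime() - latest) / MS_PER_DAY);
}
```

If the last successful main-branch run was on 1 January and now is 12 February, it returns 42: the six-week drift described above.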
Git history. Who committed, how often, in what volumes. A first commit that adds 500 files is a signal worth investigating - it might be an AI generation, an import from a previous repository, a monorepo migration, a framework scaffold, or an internal code transfer. Each of these implies a different starting condition, and asking the team to explain it is more reliable than guessing. Commit messages dominated by "update" and "fix" can indicate weak engineering discipline, but they can also reflect a team that prioritized other communication channels (issue trackers, PR descriptions) - useful as a soft signal, not a verdict on its own. The patterns I find more reliable: cadence, who is committing where, whether changes are bundled or atomic, and whether the history shows the codebase being progressively cleaned or progressively accumulated.
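This sketch pulls the cadence and bundling signals out of the history. It assumes the output of git log --reverse --pretty=format:%x00%h%x09%an --name-only, which prints a NUL byte before each commit header, a tab between hash and author, and then the files each commit touched. The 100-file threshold for a "bulk" commit is an arbitrary illustration. The output tells you which commits to ask the team about, not what they were.

```typescript
interface CommitSummary {
  hash: string;
  author: string;
  files: number;
}

// Header lines start with a NUL byte (from %x00), which no path printed by
// --name-only begins with; other non-empty lines are files of the last header.
function parseLog(output: string): CommitSummary[] {
  const commits: CommitSummary[] = [];
  for (const line of output.split("\n")) {
    if (line.startsWith("\u0000")) {
      const [hash = "", author = ""] = line.slice(1).split("\t");
      commits.push({ hash, author, files: 0 });
    } else if (line.trim() !== "" && commits.length > 0) {
      commits[commits.length - 1].files++;
    }
  }
  return commits;
}

// Oldest-first input (from --reverse), so commits[0] is the first commit.
function summarize(commits: CommitSummary[], bulkThreshold = 100) {
  const commitsByAuthor = new Map<string, number>();
  for (const c of commits) {
    commitsByAuthor.set(c.author, (commitsByAuthor.get(c.author) ?? 0) + 1);
  }
  return {
    firstCommitFiles: commits[0]?.files ?? 0,
    bulkCommits: commits.filter((c) => c.files >= bulkThreshold).map((c) => c.hash),
    commitsByAuthor,
  };
}
```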
Dependency audit. I run npm audit or the equivalent, note the count of high-severity vulnerabilities, and scan for duplicate coverage: two HTTP clients, three date libraries, multiple utilities that do the same thing. Dependency proliferation is a reliable indicator of how much architectural coordination happened during development.
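Here is a sketch of the duplicate-coverage scan over the merged dependencies and devDependencies of package.json. The category table is a short, hypothetical starting list, not a canonical taxonomy. Which overlaps actually matter depends on the project.

```typescript
// Hypothetical catalogue of packages that cover the same concern.
const OVERLAPPING: Record<string, string[]> = {
  "http client": ["axios", "node-fetch", "got", "superagent", "ky"],
  "date handling": ["moment", "dayjs", "date-fns", "luxon"],
  "utility belt": ["lodash", "underscore", "ramda"],
  "validation": ["zod", "yup", "joi", "class-validator"],
};

// Return only the categories where more than one package is declared.
function duplicateCoverage(deps: Record<string, string>): Record<string, string[]> {
  const declared = new Set(Object.keys(deps));
  const result: Record<string, string[]> = {};
  for (const [category, pkgs] of Object.entries(OVERLAPPING)) {
    const present = pkgs.filter((p) => declared.has(p));
    if (present.length > 1) result[category] = present;
  }
  return result;
}
```

Typical usage is duplicateCoverage({ ...pkg.dependencies, ...pkg.devDependencies }). A project with axios, node-fetch, moment, dayjs, and date-fns would be flagged in two categories.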
Search passes for known red flags. TODO, FIXME, and HACK comments in production paths. Leftover console.log statements. The any type scattered through TypeScript files. SQL strings assembled with template literals. These don't tell me the system is broken, but they tell me how carefully it was built - and they tend to cluster in the same files as the actual bugs, reliably enough to be useful navigation.
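Here is a sketch of those search passes as line-based regular expressions. The input is a map of file paths to contents; reading the files, and deciding which paths count as production, is left to the caller. These are textual heuristics with known false positives, such as the word "update" inside an ordinary template string, so the output is a reading order, not a defect list.

```typescript
// Rough textual heuristics, not a parser: expect false positives
// (e.g. "update" in an ordinary template string) and misses (multi-line SQL).
const RED_FLAGS: Record<string, RegExp> = {
  marker: /\b(TODO|FIXME|HACK)\b/,
  consoleLog: /\bconsole\.log\(/,
  explicitAny: /:\s*any\b|\bas\s+any\b|<any>/,
  sqlTemplate: /`[^`]*\b(SELECT|INSERT|UPDATE|DELETE)\b[^`]*\$\{/i,
};

interface FileReport {
  path: string;
  hits: Record<string, number>;
}

const totalHits = (hits: Record<string, number>): number =>
  Object.values(hits).reduce((sum, n) => sum + n, 0);

// Tally red-flag lines per file and return the noisiest files first.
function scanRedFlags(files: Record<string, string>): FileReport[] {
  const report: FileReport[] = [];
  for (const [path, source] of Object.entries(files)) {
    const hits: Record<string, number> = {};
    for (const line of source.split("\n")) {
      for (const [name, re] of Object.entries(RED_FLAGS)) {
        if (re.test(line)) hits[name] = (hits[name] ?? 0) + 1;
      }
    }
    if (Object.keys(hits).length > 0) report.push({ path, hits });
  }
  return report.sort((a, b) => totalHits(b.hits) - totalHits(a.hits));
}
```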
At the end of the first hour, I have a landscape map. Not a verdict - a map. I know the scale of the problem, the zones of highest risk, and whether there are obvious immediate priorities. Now I go deeper.
The second hour: critical paths
The second hour is where I form the actual assessment. I pick five areas and read them carefully.
Authentication. Where does it live? Is it in one place or scattered? I trace a request from the browser through every layer that should be checking identity, and I verify those checks are actually present and consistent. Authentication is the highest-risk area in any application - not because it's the most likely thing to be completely broken, but because when it is broken, the consequences are irreversible. I'm looking for: tokens stored in the wrong place, session validation that can be bypassed, multiple partially-implemented auth approaches that might interact badly.
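For contrast, here is roughly what "in one place" looks like: a sketch of a single wrapper that every protected handler goes through. The request and response shapes, the Bearer-token header, and the lookup function are hypothetical stand-ins for whatever the framework and session store actually provide. Cookie handling, CSRF protection, and token expiry are deliberately out of scope.

```typescript
// Hypothetical minimal shapes; a real app would use its framework's types.
interface AppRequest {
  headers: Record<string, string | undefined>;
}
interface AppResponse {
  status: number;
  body: string;
}
interface Session {
  userId: string;
}
type Handler = (req: AppRequest, session: Session) => Promise<AppResponse>;

// Hypothetical lookup: in production this would verify a signed token or
// cookie server-side and check expiry and revocation.
type SessionLookup = (token: string) => Promise<Session | null>;

// The single choke point: handlers only receive a session from here.
function withAuth(lookup: SessionLookup, handler: Handler): (req: AppRequest) => Promise<AppResponse> {
  return async (req) => {
    const header = req.headers["authorization"] ?? "";
    const token = header.startsWith("Bearer ") ? header.slice("Bearer ".length) : "";
    if (!token) return { status: 401, body: "missing credentials" };
    const session = await lookup(token);
    if (!session) return { status: 401, body: "invalid session" };
    return handler(req, session);
  };
}
```

The design choice that matters is that a protected handler only receives a session from the wrapper, so the check is structural rather than something each handler has to remember. During an audit, the question is whether the codebase has an equivalent choke point, and whether every protected route actually goes through it.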
Database schema. Do the models match the migrations? Are foreign keys enforced at the database level, not just in the ORM? Are there indexes on the columns that actually appear in WHERE clauses, or were indexes added speculatively on columns that are never queried? Schema drift - where the migration history and the actual database schema have diverged - is one of the most expensive problems to discover late, because it means the database cannot be reliably reproduced from the repository alone.
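Here is a sketch of the drift comparison itself. It assumes both schemas have already been extracted into the same table-to-column-to-type shape. One way to do that is to replay the migrations into an empty scratch database and query information_schema.columns on both it and production. That extraction step, and any comparison of foreign keys and indexes, are omitted.

```typescript
// table name -> column name -> column type
type Schema = Record<string, Record<string, string>>;

// List every table, column, and type difference between the schema the
// migrations produce ("expected") and the live database ("actual").
function schemaDrift(expected: Schema, actual: Schema): string[] {
  const findings: string[] = [];
  const tables = new Set([...Object.keys(expected), ...Object.keys(actual)]);
  for (const table of Array.from(tables).sort()) {
    const exp = expected[table];
    const act = actual[table];
    if (!exp) { findings.push(`table ${table}: exists only in live database`); continue; }
    if (!act) { findings.push(`table ${table}: missing from live database`); continue; }
    const columns = new Set([...Object.keys(exp), ...Object.keys(act)]);
    for (const column of Array.from(columns).sort()) {
      if (!(column in act)) {
        findings.push(`${table}.${column}: missing from live database`);
      } else if (!(column in exp)) {
        findings.push(`${table}.${column}: exists only in live database`);
      } else if (exp[column] !== act[column]) {
        findings.push(`${table}.${column}: type ${exp[column]} in migrations, ${act[column]} live`);
      }
    }
  }
  return findings;
}
```

Every line of output is something the repository alone could not reproduce.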
Money paths. If there are payments, I read every line. Stripe webhook handling: is it idempotent? Can the same event be processed twice without duplicating its side effects - a second order, a second credit, a second charge? (I've written about Stripe webhook idempotency in production if you want the implementation detail.) VAT logic: is it configured per jurisdiction or hardcoded for one country? The combination of payment bugs and incorrect tax handling is the category most likely to have legal and financial consequences beyond the technical problem, so I give it disproportionate attention relative to its share of the codebase.
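Here is a minimal sketch of the idempotency check. It assumes the event has already been signature-verified (in stripe-node, via stripe.webhooks.constructEvent) and reduced to its id and type. The ProcessedEventStore interface and the in-memory implementation are hypothetical. A production version would back the store with a table that has a UNIQUE constraint on the event ID, so that two concurrent deliveries of the same event cannot both pass the check.

```typescript
// Minimal slice of a Stripe event: every event has a unique id (evt_...).
interface StripeEvent {
  id: string;
  type: string;
}

// Hypothetical store of processed event IDs.
interface ProcessedEventStore {
  markProcessed(eventId: string): Promise<boolean>; // false if already recorded
  release(eventId: string): Promise<void>;          // undo a mark after a failure
}

// Illustration only: a Set does not survive restarts or span instances.
class InMemoryStore implements ProcessedEventStore {
  private seen = new Set<string>();
  async markProcessed(eventId: string): Promise<boolean> {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
  async release(eventId: string): Promise<void> {
    this.seen.delete(eventId);
  }
}

// Stripe may deliver the same event more than once, so dedupe on event.id.
async function handleEvent(
  event: StripeEvent,
  store: ProcessedEventStore,
  fulfil: (event: StripeEvent) => Promise<void>,
): Promise<"processed" | "duplicate"> {
  if (!(await store.markProcessed(event.id))) return "duplicate";
  try {
    await fulfil(event);
  } catch (error) {
    // Un-mark so that a redelivery of this event is processed, not skipped.
    await store.release(event.id);
    throw error;
  }
  return "processed";
}
```

Releasing the ID when fulfilment throws is a deliberate choice. Assuming the endpoint turns the thrown error into a non-2xx response, Stripe retries the delivery, and the retry is processed instead of being discarded as a duplicate.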
The reported problem. Whatever the founder described as "broken" - I find the file that contains that functionality and read it. This is often the most revealing part of the audit, not because the bug is always immediately obvious, but because the state of the code around the reported problem is usually representative of the codebase at large. A well-maintained project has clean, readable code around its bugs. A project in serious trouble has code that's hard to even follow before you find the defect.
Operational failure handling. What happens when an external service goes down? What happens when the database is unreachable? What happens when a background job fails halfway through? Good operational handling is unglamorous to build and easy to skip - which means the quality of error handling, retry logic, and failure visibility is one of the most reliable proxies for how much production experience went into the codebase.
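When I read error handling, this is roughly the shape I hope to find around calls to external services: retry with exponential backoff and full jitter, then a loud failure. The option names and the onRetry hook are illustrative and not taken from any particular library. Retrying is only safe when the operation itself is idempotent, which is the same question as the webhook check above.

```typescript
interface RetryOptions {
  attempts: number;    // total tries, including the first
  baseDelayMs: number; // backoff bound before the first retry
  maxDelayMs: number;  // cap on any single delay
  onRetry?: (attempt: number, error: unknown) => void; // hook for logs or metrics
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(op: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await op();
    } catch (error) {
      lastError = error;
      if (attempt === opts.attempts) break;
      opts.onRetry?.(attempt, error);
      // Full jitter: a random delay between 0 and the capped exponential bound.
      const bound = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * bound);
    }
  }
  // Surface the final failure to the caller instead of swallowing it.
  throw lastError;
}
```

Full jitter spreads retries out so that many clients recovering from the same outage do not hit the dependency in lockstep.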
By the end of this pass, I can usually form an initial recommendation: what does this codebase look like it needs? The recommendation is provisional - production behaviour, real traffic, and the operational context I haven't yet seen can shift it. But it's grounded enough to start a serious conversation.
The decision framework I use
- If the business logic and data model are sound but the infrastructure and operational patterns are broken: stabilize. The foundation is there; it needs skilled work, not replacement.
- If two or three specific modules are broken and the rest is functional: targeted rescue. Fix the things that are broken, leave the rest alone.
- If the business model has changed significantly since the codebase was built, and the code no longer reflects what the product actually does: partial or full rewrite may be justified. The code is not wrong so much as it answers the wrong question.
Rewrite is also the honest answer in several other situations - when the original technology choice has become a permanent constraint on velocity and there is no migration path inside it; when the operational entropy is high enough that stabilization becomes an ongoing money pit rather than a finite engagement; when architectural decisions made early have become structurally irreversible (a single shared database under a monolith now serving multiple products, for example); or when the codebase carries enough accumulated risk that the cost of maintaining it exceeds the cost of replacing it.
I'm not against rewrites. I'm against rewriting reflexively before anyone has measured whether stabilization is feasible.
The part the code doesn't show
Reading code carefully will tell me a great deal. It won't tell me everything that matters, and pretending otherwise would be unprofessional.
A meaningful proportion of failing projects are not primarily failing because of the code. They're failing because of ownership ambiguity, missing operational knowledge, undocumented deployment rituals, unclear product direction, or accumulated tribal understanding that left the company when a key engineer did. The code in those projects reflects the dysfunction; it isn't the source. Rewriting that code without changing the conditions around it produces a new codebase that drifts toward the same problems.
I look for traces of this in the audit - incident retrospectives that point at the same coordination failure repeatedly, deployment instructions that exist only in someone's head, a critical service nobody on the current team can fully explain. When these signals are strong, I say so.