Three years ago, I approved a pull request that cost my company $47,000 in lost revenue over a single weekend. The code looked fine. The tests passed. The logic seemed sound. But I missed something subtle in how we handled timezone conversions for our subscription billing system, and customers in certain regions got charged twice.
That incident changed how I review code forever. I'm Sarah Chen, and I've been a senior engineering manager at three different SaaS companies over the past decade, reviewing an average of 15-20 pull requests per week. That's roughly 8,000 PRs in my career. I've seen brilliant code that shipped bugs, and messy code that ran flawlessly in production for years. I've learned that code review isn't about finding syntax errors—your linter does that. It's about catching the invisible problems that only experience can reveal.
This article is the checklist I wish I'd had when I started. It's not comprehensive—no checklist could be—but it represents the patterns I've learned to recognize, the questions I've trained myself to ask, and the mental models that have saved my teams from countless production incidents.
The First 30 Seconds: What the PR Description Tells You
Before I look at a single line of code, I spend 30 seconds reading the PR description. This might seem backwards, but I've found that 60% of problematic PRs reveal themselves immediately through poor communication. A good PR description answers three questions: what changed, why it changed, and what could go wrong.
When I see a description that says "fixed bug" or "updated component," I immediately know this review will take longer. The author hasn't thought through the implications of their changes, which means I need to do that work for them. Conversely, when I see a description that says "Refactored user authentication to use JWT tokens instead of session cookies. This reduces database load by 40% during peak hours but requires clients to handle token refresh. Risk: existing mobile app versions (< 2.3) will need to re-authenticate," I know the author has done their homework.
I look for specific metrics in the description. Not vague improvements, but actual numbers. "Improved performance" means nothing. "Reduced API response time from 340ms to 180ms for the user profile endpoint under 1000 concurrent requests" tells me the author measured their work and understands the impact. In my experience, developers who include metrics in their PR descriptions write more thoughtful code overall.
The description should also link to relevant context—the ticket, the design doc, the Slack thread where the approach was discussed. If I'm reviewing a PR in isolation, without understanding the broader context, I'm going to miss things. I once approved a "simple refactoring" that unknowingly broke a critical integration because I didn't know about a dependency that wasn't documented anywhere except in a three-month-old email thread.
Red flags in PR descriptions include:
- No description at all.
- A description that's longer than the code changes themselves (usually a sign of over-engineering).
- A description that says "WIP" or "draft" while the PR is marked as ready for review.
- Phrases like "not sure if this is the right approach" (then why are you asking me to review it?).
Size Matters: The 400-Line Rule
I have a hard rule: I don't thoroughly review PRs larger than 400 lines of actual code changes (excluding generated code, package-lock files, and test fixtures). This isn't arbitrary. SmartBear's study of code review at Cisco found that defect-detection effectiveness drops sharply beyond 400 lines, and my own experience confirms it. After reviewing about 200 lines of code, my brain starts to gloss over details. By 400 lines, I'm basically just skimming.
"Code review isn't about finding syntax errors—your linter does that. It's about catching the invisible problems that only experience can reveal."
When someone submits a 1,200-line PR, I ask them to break it up. I don't care if it's "all related." Large PRs are where bugs hide. I've seen critical security vulnerabilities slip through in massive refactoring PRs because reviewers got fatigued and stopped paying attention around line 600. The vulnerability was on line 847.
There are exceptions, of course. Sometimes you need to move files around, or update generated code, or make sweeping changes to meet a new linting standard. In those cases, I ask the author to separate the mechanical changes from the logical changes. Submit the file moves in one PR, the actual logic changes in another. This makes both PRs easier to review and easier to revert if something goes wrong.
I also pay attention to the ratio of code added versus code deleted. In my experience, the best PRs have a negative net line count—they accomplish something while making the codebase smaller. When I see a PR that adds 500 lines and deletes 50, I get suspicious. Are we adding complexity? Are we duplicating existing functionality? Is the author aware of what's already in the codebase?
The size issue extends to individual files too. If a PR touches 30 different files, even if the total line count is reasonable, that's a red flag. It suggests the change is either poorly scoped or the codebase has coupling problems. Either way, it deserves extra scrutiny.
The Business Logic Layer: Where Most Bugs Live
After a decade of code reviews, I can tell you where bugs hide: in the business logic layer, specifically in the edge cases and state transitions. Not in the UI components. Not in the database queries. In the code that implements your actual business rules.
| PR Description Quality | What It Says | Review Time | Risk Level |
|---|---|---|---|
| Poor | "fixed bug" or "updated component" | 2-3x longer | High |
| Adequate | Basic what and why, no risk analysis | Normal | Medium |
| Good | Clear what, why, and potential risks identified | Efficient | Low |
| Excellent | Comprehensive context, trade-offs discussed, edge cases noted | Fast | Very Low |
When I review business logic, I'm looking for assumptions. Every assumption is a potential bug. I once reviewed code for a discount calculation system that assumed all discounts were percentages. The code worked perfectly until marketing wanted to offer a "$10 off" promotion. The system broke because the calculation divided by the discount value, and dividing by 10 when the code expects a fraction between 0 and 1 produces wildly wrong results.
I ask myself: what happens at the boundaries? What if the input is zero? What if it's negative? What if it's null? What if it's an empty string versus a null string? What if the array is empty? What if the user has no permissions? What if they have all permissions? What if the database returns no results? What if it returns a million results?
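To make that concrete, here's the shape I like to see once those questions have answers. This is a minimal sketch; the `applyDiscount` signature and the `{ type, value }` discount shape are hypothetical, not from any real billing system:

```js
// Hypothetical discount helper that makes its boundary assumptions explicit.
function applyDiscount(price, discount) {
  if (typeof price !== 'number' || Number.isNaN(price) || price < 0) {
    throw new TypeError(`Invalid price: ${price}`);
  }
  if (discount == null) return price; // "no discount" is a valid case, not an error

  switch (discount.type) {
    case 'percentage':
      if (discount.value < 0 || discount.value > 1) {
        throw new RangeError(`Percentage out of range: ${discount.value}`);
      }
      return price * (1 - discount.value);
    case 'fixed':
      // A "$10 off" promotion must never drive the price below zero.
      return Math.max(0, price - discount.value);
    default:
      throw new TypeError(`Unknown discount type: ${discount.type}`);
  }
}
```

Notice that the "$10 off" case from the story above is now a first-class branch instead of a silent assumption.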
I look for magic numbers in business logic. When I see `if (user.loginAttempts > 5)`, I ask: why 5? Is that documented? Is it configurable? What happens if we want to change it to 3 for certain user types? Magic numbers in business logic are technical debt waiting to happen.
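The fix is usually mechanical. A sketch, assuming a hypothetical `config` object for per-deployment overrides:

```js
// Name the number, document it, and make it overridable without a code change.
const DEFAULT_MAX_LOGIN_ATTEMPTS = 5; // default lockout threshold

function isLockedOut(user, config = {}) {
  const maxAttempts = config.maxLoginAttempts ?? DEFAULT_MAX_LOGIN_ATTEMPTS;
  return user.loginAttempts > maxAttempts;
}
```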
State machines are another common source of bugs. When I see code that manages state transitions—order status, user lifecycle, workflow stages—I draw out the state diagram in my head. Are all transitions accounted for? Can you get into an invalid state? I once caught a bug where a user could be simultaneously "active" and "suspended" because the code that set one status didn't check the other.
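One pattern that makes these bugs reviewable is an explicit transition table, so "can you get into an invalid state?" becomes a question you can answer by reading a single object. The states and events below are illustrative, not from any real system:

```js
const TRANSITIONS = {
  pending:   { activate: 'active', cancel: 'cancelled' },
  active:    { suspend: 'suspended', cancel: 'cancelled' },
  suspended: { activate: 'active', cancel: 'cancelled' },
  cancelled: {}, // terminal state: no way out, by design
};

function transition(current, event) {
  const next = TRANSITIONS[current]?.[event];
  if (next === undefined) {
    throw new Error(`Invalid transition "${event}" from state "${current}"`);
  }
  return next;
}

// transition('active', 'suspend')     -> 'suspended'
// transition('cancelled', 'activate') -> throws, instead of silently producing
//                                        a user who is both active and suspended
```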
I also watch for business logic that's spread across multiple layers. If I see validation in the UI, more validation in the API, and still more validation in the database layer, I know we're going to have inconsistencies. Business rules should live in one place, and that place should be obvious.
Error Handling: The Difference Between Good and Great Code
You can tell the quality of a developer by how they handle errors. Junior developers ignore errors or catch them generically. Senior developers handle errors thoughtfully. After reviewing thousands of PRs, I've developed a simple test: I look at every error handling block and ask "what will the person debugging this at 2am understand from this error?"
"I've found that 60% of problematic PRs reveal themselves immediately through poor communication. A good PR description answers three questions: what changed, why it changed, and what could go wrong."
Generic error messages are useless. `catch (error) { console.log('Error occurred'); }` tells me nothing. When did it occur? What was the system trying to do? What was the input? What's the recommended action? A good error message might be: `Failed to process payment for order #12345 using Stripe token tok_abc123. Customer card was declined with code: insufficient_funds. User ID: usr_789. Timestamp: 2024-01-15T14:23:11Z.`
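One way to get there is an error type that carries structured context instead of a bare string. A sketch (the `PaymentError` class and its field names are mine, for illustration):

```js
class PaymentError extends Error {
  constructor(message, context) {
    super(`${message} | ${JSON.stringify(context)}`);
    this.name = 'PaymentError';
    this.context = context; // structured fields survive into log aggregation
  }
}

// Hypothetical call site: the error answers who, what, and when.
function handleDeclinedCard(order, declineCode) {
  throw new PaymentError('Failed to process payment: card declined', {
    orderId: order.id,
    declineCode,                 // e.g. 'insufficient_funds'
    userId: order.userId,
    timestamp: new Date().toISOString(),
  });
}
```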
I look for error handling that's too broad. `try { /* 200 lines of code */ } catch (error) { /* handle error */ }` is a code smell. What exactly are you catching? Different errors require different handling. A network timeout should be retried. A validation error should be shown to the user. A database constraint violation might indicate a race condition. Catching everything with one handler means you're not thinking about failure modes.
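Compare that with handling each failure mode on its own terms. The error classes and helpers here are hypothetical stand-ins for whatever your stack actually throws:

```js
class NetworkTimeoutError extends Error {}
class ValidationError extends Error {}

async function saveOrder(order, api, retryWithBackoff) {
  try {
    return await api.create(order);
  } catch (error) {
    if (error instanceof NetworkTimeoutError) {
      return retryWithBackoff(() => api.create(order)); // transient: retry
    }
    if (error instanceof ValidationError) {
      return { ok: false, message: error.message }; // user-fixable: surface it
    }
    throw error; // unknown failure mode: let it reach monitoring, don't swallow it
  }
}
```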
I also watch for silent failures. Code that catches an error, logs it, and continues as if nothing happened is dangerous. I once debugged a production issue where we were silently failing to send confirmation emails for 15% of orders. The error was caught and logged, but no one was monitoring those logs, and customers just thought we were unprofessional. The bug had been in production for six weeks.
Error recovery is another area I scrutinize. When something fails, does the code leave the system in a consistent state? I look for partial updates—code that updates the database but fails to update the cache, or updates one service but fails to update another. These are the bugs that cause data inconsistencies that take months to track down.
Finally, I check whether errors are being used for control flow. `try { user = getUser(); } catch { user = createUser(); }` is an anti-pattern. Exceptions should be exceptional. If you expect something to fail regularly, use explicit checks, not exception handling.
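The explicit-check version is usually shorter anyway. A sketch, assuming a hypothetical data-access layer whose lookup returns `null` when the user is absent:

```js
async function getOrCreateUser(email, db) {
  const existing = await db.findUserByEmail(email); // null if not found
  return existing ?? db.createUser(email);          // absence is expected, not exceptional
}
```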
Performance: The Questions Most Reviewers Skip
Most code reviewers don't think about performance until it becomes a problem. I learned to think about it upfront after watching a "simple feature" bring down our entire platform because it generated 10,000 database queries per page load.
When I review code, I look for N+1 queries. This is the most common performance bug I see, and it's invisible in development with small datasets. The pattern looks like: fetch a list of users, then loop through and fetch each user's profile. With 10 users in development, this is fine. With 10,000 users in production, your database melts. I've seen this pattern in at least 200 PRs over my career.
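Here's the pattern and its fix side by side. The query style follows node-postgres conventions (`db.query(text, params)` returning `{ rows }`), which is an assumption on my part; your driver will have an equivalent:

```js
// N+1: one query for the list, then one more query per user.
// const { rows: users } = await db.query('SELECT * FROM users');
// for (const user of users) {
//   user.profile = (await db.query(
//     'SELECT * FROM profiles WHERE user_id = $1', [user.id]
//   )).rows[0];
// }

// Batched: two queries total, no matter how many users there are.
async function getUsersWithProfiles(db) {
  const { rows: users } = await db.query('SELECT * FROM users');
  const { rows: profiles } = await db.query(
    'SELECT * FROM profiles WHERE user_id = ANY($1)',
    [users.map((u) => u.id)]
  );
  const byUserId = new Map(profiles.map((p) => [p.user_id, p]));
  return users.map((u) => ({ ...u, profile: byUserId.get(u.id) ?? null }));
}
```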
I look at loops and ask: what's the worst-case scenario? `for (let i = 0; i < items.length; i++)` looks innocent, but what if `items` has a million elements? What if each iteration makes an API call? What if it's nested inside another loop? I once reviewed code that had three nested loops over user data. The complexity was O(n³). With 100 users, it ran in 50ms. With 1,000 users, it took 45 seconds.
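The usual escape hatch for nested loops over related collections is to build an index first. A sketch with hypothetical `orders` and `users` collections:

```js
// O(n²): for every order, scan the whole users array.
// const enriched = orders.map((o) => ({
//   ...o, user: users.find((u) => u.id === o.userId),
// }));

// O(n): build the lookup once, then every lookup is constant time.
function attachUsers(orders, users) {
  const usersById = new Map(users.map((u) => [u.id, u]));
  return orders.map((o) => ({ ...o, user: usersById.get(o.userId) ?? null }));
}
```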
Caching is another area I scrutinize. When I see a cache being added, I ask: what's the invalidation strategy? How long do items live in the cache? What happens if the cache and database get out of sync? I've seen caches that improved performance by 10x but caused data consistency issues that took weeks to debug.
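When I ask "what's the invalidation strategy?", I want the PR to be able to point at something concrete. A deliberately minimal in-memory sketch; a real system would likely use Redis or similar, but the two questions are the same:

```js
class TtlCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt < Date.now()) {
      this.entries.delete(key); // expired entries never leak back out
      return undefined;
    }
    return entry.value;
  }
  set(key, value) {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
  invalidate(key) {
    this.entries.delete(key); // must be called on every write path, not just reads
  }
}
```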
I also look for unnecessary data fetching. Code that fetches entire objects when it only needs one field. Code that loads all records when it only needs the count. Code that fetches data "just in case" it might be needed. Every database query has a cost, and those costs add up.
Memory usage is harder to spot in code review, but I watch for obvious issues: loading large files entirely into memory, building large arrays or objects in loops, keeping references to objects that should be garbage collected. I once reviewed code that "fixed" a memory leak by increasing the server's RAM from 4GB to 16GB. That's not a fix, that's hiding the problem.
Security: The Checklist Within the Checklist
Security vulnerabilities are the bugs that make headlines. After seeing three different companies I've worked with deal with security incidents, I've developed a paranoid approach to security in code reviews. I assume every input is malicious until proven otherwise.
"I've seen brilliant code that shipped bugs, and messy code that ran flawlessly in production for years."
SQL injection is still shockingly common. When I see string concatenation in database queries—`"SELECT * FROM users WHERE id = " + userId`—I reject the PR immediately. Use parameterized queries. Always. No exceptions. I don't care if the input is "validated." I don't care if it's "only used internally." Use parameterized queries.
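The safe version costs nothing. Again in node-postgres style, as an assumption; every mainstream driver has the same facility:

```js
// Injectable: userId becomes part of the SQL text itself.
// const { rows } = await db.query("SELECT * FROM users WHERE id = " + userId);

// Parameterized: userId is sent separately and can never change the query's structure.
async function getUserById(db, userId) {
  const { rows } = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  return rows[0] ?? null;
}
```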
Cross-site scripting (XSS) is another common vulnerability. When I see user input being rendered in HTML, I check whether it's being escaped. `innerHTML = userInput` is dangerous. `textContent = userInput` is safe. The difference is one word, but the security implications are massive. I once caught an XSS vulnerability in a "simple" feature that displayed user names. The developer assumed names would only contain letters, but our system allowed special characters, including script tags.
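The one-word difference, assuming `element` is a DOM node:

```js
// Dangerous: the browser parses userInput as HTML, so "<script>" and
// "<img onerror=...>" payloads execute.
// element.innerHTML = userInput;

// Safe: userInput is rendered as literal text, whatever characters it contains.
element.textContent = userInput;
```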
Authentication and authorization logic gets special scrutiny. I look for checks that happen on the client side but not the server side. I look for authorization checks that happen before the operation but not during the operation (time-of-check to time-of-use vulnerabilities). I look for code that checks if a user is an admin but doesn't check if they're an admin of the specific resource they're trying to access.
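In code, the difference is often a single condition, which is exactly why it's easy to miss. A sketch with hypothetical names:

```js
function canDeleteProject(user, project) {
  // Not enough: any admin anywhere can delete any project.
  // return user.isAdmin;

  // Scoped: the user must administer this specific project.
  return project.adminIds.includes(user.id);
}
```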
I watch for secrets in code. API keys, passwords, tokens—these should never be hardcoded. I've seen developers commit AWS credentials to public repositories. I've seen production database passwords in configuration files. I've seen encryption keys in comments. If I see any secret in a PR, I reject it and ask the author to rotate the secret immediately.
Rate limiting and input validation are also on my security checklist. Can this endpoint be abused? Can someone submit a 10GB file? Can they make 10,000 requests per second? Can they submit negative numbers where only positive numbers make sense? Every input is an attack vector until you prove otherwise.
Testing: What the Tests Don't Tell You
I love tests. I require tests. But I've learned that passing tests don't mean working code. I've approved PRs with 100% test coverage that still had critical bugs in production. The issue isn't the tests themselves—it's what they test and what they don't test.
When I review tests, I look at what they're asserting. Tests that only check happy paths are worse than no tests because they give false confidence. I want to see tests for error cases, edge cases, boundary conditions. I want to see tests that verify the system behaves correctly when things go wrong, not just when they go right.
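Using the hypothetical `applyDiscount` helper from earlier, here's the spread I want to see, in Jest-style syntax (an assumption; any test runner works the same way):

```js
test('applies a 20% discount', () => {
  expect(applyDiscount(100, { type: 'percentage', value: 0.2 })).toBe(80);
});

test('a fixed discount never drives the price negative', () => {
  expect(applyDiscount(5, { type: 'fixed', value: 10 })).toBe(0);
});

test('rejects out-of-range percentages instead of miscalculating', () => {
  expect(() => applyDiscount(100, { type: 'percentage', value: 10 }))
    .toThrow(RangeError);
});
```

Only the first test is the happy path; the other two are the ones that would have caught the discount bug before production did.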
I look for tests that are too specific. Tests that assert exact string matches, exact array orders, exact timestamps. These tests are brittle—they break when you make unrelated changes. Good tests verify behavior, not implementation details. If I can refactor the code without changing the tests, the tests are probably good. If every code change requires updating tests, the tests are probably too coupled to the implementation.
I also watch for tests that don't actually test anything. I've seen tests that mock every dependency and then verify the mocks were called. That's not testing your code, that's testing your mocks. I've seen tests that catch exceptions and pass if an exception was thrown, without checking what exception or why. I've seen tests that make assertions inside try-catch blocks, so if the assertion fails, the test still passes.
Integration tests deserve special attention. When I see integration tests, I check: are they testing the actual integration, or are they just unit tests with extra steps? Are they using real dependencies or mocked ones? Are they testing the failure modes of the integration? I once reviewed integration tests for a payment system that only tested successful payments. They never tested what happened when the payment provider was down, or slow, or returned an error. Guess what failed in production?
Test data is another area I scrutinize. Are the tests using realistic data? I've seen tests that use "test@example.com" as an email address, "123" as a phone number, and "Test User" as a name. Then production breaks because real data has apostrophes, unicode characters, and edge cases the tests never considered.
The Human Element: Code Review as Communication
After 10 years, I've learned that code review is as much about communication as it is about code. The best technical review is useless if it's delivered in a way that makes the author defensive or discouraged.
I've developed a framework for my review comments. I categorize them as: blocking (must be fixed before merge), important (should be fixed but not blocking), suggestion (nice to have), and question (I'm genuinely asking, not criticizing). This helps the author prioritize and understand my intent.
I try to explain the "why" behind my comments. Instead of "This is wrong," I write "This could cause a race condition if two users update the same record simultaneously. Consider using optimistic locking or a database transaction." The author learns something, and they're more likely to apply that knowledge in future PRs.
I also try to find something positive in every PR. Even if the code needs significant changes, there's usually something good—a clever solution, a thorough test, a clear variable name. Acknowledging good work makes the criticism easier to accept and keeps the relationship collaborative rather than adversarial.
I've learned to pick my battles. Not every issue needs to be fixed in this PR. If I see a broader architectural problem, I might approve the PR but open a separate ticket to address the larger issue. Blocking a PR because of problems that existed before the PR is demoralizing and slows down the team.
Response time matters too. I try to review PRs within 4 hours during business hours. Developers lose context if they have to wait days for feedback. They also lose momentum. A quick review, even if it requests changes, is better than a perfect review that comes too late.
The Meta-Review: Reviewing Your Own Review Process
The final item on my checklist is the hardest: reviewing my own review process. Am I being consistent? Am I letting my biases affect my reviews? Am I reviewing code from senior developers less thoroughly than code from junior developers?
I track my review metrics. How many bugs make it to production from PRs I approved? How long do my reviews take? How often do I request changes? I've found that I approve about 40% of PRs without changes, request minor changes on 45%, and request major changes on 15%. If those numbers shift significantly, it tells me something—either the team's code quality is changing, or my standards are changing.
I also solicit feedback on my reviews. I ask developers: are my comments helpful? Are they clear? Am I being too nitpicky or not thorough enough? This feedback has changed how I review. I used to focus heavily on style and formatting. Now I focus on logic and architecture, and let automated tools handle style.
I've learned to recognize when I'm not the right reviewer. If I don't understand the domain, or the technology, or the context, I should either ask questions until I do understand, or bring in someone who already does. Rubber-stamping a PR because I don't understand it is worse than not reviewing it at all.
I also review my false positives—times I flagged something as a problem that turned out to be fine. This helps me calibrate my instincts and avoid wasting the author's time. I once spent 20 minutes writing a detailed comment about a potential race condition, only to realize the code was running in a single-threaded environment where race conditions were impossible.
The goal isn't perfection. The goal is continuous improvement. Every PR I review teaches me something—about the codebase, about the team, about software engineering, and about myself as a reviewer. That $47,000 billing bug I mentioned at the start? It taught me to always check timezone handling in financial code. I've caught three similar bugs since then, saving far more than $47,000.
Code review is a skill that compounds over time. The patterns you learn to recognize, the questions you learn to ask, the instincts you develop—these accumulate and make you better at spotting problems before they reach production. This checklist represents 10 years of accumulated pattern recognition. Your checklist will be different, shaped by your experiences and your domain. But the meta-skill—learning to review code thoughtfully and systematically—that's universal.