Three years ago, I watched a production API fail spectacularly at 2 AM because nobody tested what happens when you send a date field formatted as "32/13/2021." The cascade was beautiful in the worst way possible: 47,000 failed transactions, angry customers flooding support channels, and a CEO who wanted answers I didn't have. That night changed how I approach API testing forever.
I'm Sarah Chen, and I've been a QA automation engineer for eight years, the last five focused exclusively on API testing for fintech and healthcare platforms. I've tested everything from simple CRUD endpoints to complex payment processing APIs handling millions of dollars daily. What I've learned is this: most API failures aren't exotic edge cases—they're predictable problems that a systematic checklist would have caught.
The checklist I'm sharing today is the exact one I use for every single endpoint I test. It's saved my team from at least a dozen production incidents in the past year alone, and it's simple enough that junior engineers can follow it yet thorough enough to catch issues senior developers miss. This isn't theory—this is a battle-tested process refined through hundreds of API implementations.
Authentication and Authorization: The Foundation Layer
Before I test anything else, I verify the security perimeter. This isn't just about checking if authentication works—it's about systematically probing every authentication scenario and authorization boundary. I've seen too many APIs that work perfectly with valid credentials but fail catastrophically or leak data when credentials are missing, malformed, or belong to the wrong user.
First, I test with no authentication token at all. The endpoint should return a 401 Unauthorized status, not a 500 Internal Server Error, and definitely not actual data. I've encountered production APIs that returned full user records when no auth token was provided because the developer assumed the authentication middleware would always run. It didn't.
Next, I test with an expired token. This catches a surprising number of issues because token expiration logic often lives in a different part of the codebase than initial authentication. The response should be a clear 401 with a message indicating the token has expired, not a generic "unauthorized" that leaves the client guessing whether to refresh or re-authenticate.
Then I test with a malformed token—random strings, tokens with characters removed, tokens from other systems. The API should handle these gracefully without exposing stack traces or internal error details. I once found an API that would crash the entire service when given a token containing certain Unicode characters because the JWT parsing library wasn't properly handling encoding.
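To make these three token checks concrete, here is a minimal pytest sketch. The base URL, endpoint, and token values are placeholders, and in a real suite you would generate a genuinely expired JWT in a fixture:

```python
import pytest
import requests

BASE_URL = "https://api.example.com"   # placeholder
EXPIRED_TOKEN = "eyJ..."               # generate a genuinely expired JWT in a fixture

@pytest.mark.parametrize("token", [
    None,                       # no token at all
    EXPIRED_TOKEN,              # valid structure, expired exp claim
    "not.a.jwt",                # malformed
    "eyJhbGciOiJIUzI1NiJ9",     # header only, payload and signature stripped
])
def test_bad_tokens_return_401(token):
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    resp = requests.get(f"{BASE_URL}/users/123", headers=headers, timeout=10)

    assert resp.status_code == 401       # never 500, never 200 with data
    assert "Traceback" not in resp.text  # no stack traces in the error body
```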
Authorization testing is where things get interesting. I test with valid tokens belonging to users who shouldn't have access to the resource. For a GET /users/123 endpoint, I'll authenticate as user 456 and try to access 123's data. The response should be 403 Forbidden, or 404 Not Found if the API deliberately hides the existence of resources the caller can't access, and definitely not 200 with the data.
I also test role-based access control systematically. If your API has admin, manager, and user roles, I test each endpoint with each role. I maintain a matrix spreadsheet: rows are endpoints, columns are roles, cells contain expected status codes. This catches permission bugs before they reach production, where they become security vulnerabilities.
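That matrix translates directly into a data-driven test. A sketch, with hypothetical endpoints, roles, and expected codes standing in for your own:

```python
import pytest
import requests

BASE_URL = "https://api.example.com"                        # placeholder
TOKENS = {"admin": "...", "manager": "...", "user": "..."}  # real test tokens go here

# Rows are endpoints, columns are roles, cells are the expected status codes
ACCESS_MATRIX = [
    ("GET",    "/admin/settings", {"admin": 200, "manager": 403, "user": 403}),
    ("GET",    "/reports",        {"admin": 200, "manager": 200, "user": 403}),
    ("DELETE", "/users/123",      {"admin": 204, "manager": 403, "user": 403}),
]

@pytest.mark.parametrize("role", ["admin", "manager", "user"])
@pytest.mark.parametrize("method,path,expected", ACCESS_MATRIX)
def test_role_based_access(method, path, expected, role):
    resp = requests.request(method, f"{BASE_URL}{path}",
                            headers={"Authorization": f"Bearer {TOKENS[role]}"},
                            timeout=10)
    assert resp.status_code == expected[role]
```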
Request Validation: Input Boundary Testing
Input validation is where most APIs show their true quality. A well-designed API validates every input field thoroughly and returns clear, actionable error messages. A poorly designed one either accepts garbage data or crashes mysteriously.
"Most API failures aren't exotic edge cases—they're predictable problems that a systematic checklist would have caught."
I start with required field testing. For every required field, I send a request without it. The API should return 400 Bad Request with a message clearly identifying which field is missing. I've seen APIs that return "validation error" without specifying what failed, forcing developers to guess which of 15 fields caused the problem.
Then I test data type validation. If a field expects an integer, I send strings, floats, booleans, null, arrays, and objects. Each should return a 400 with a clear message like "age must be an integer" not "invalid request format." I once tested an e-commerce API where sending a string for quantity caused the system to create orders for zero items, which broke the entire fulfillment pipeline.
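In script form, this whole paragraph collapses into one parametrized test. A sketch against a hypothetical POST /users endpoint with an integer age field:

```python
import pytest
import requests

BASE_URL = "https://api.example.com"  # hypothetical endpoint and field

@pytest.mark.parametrize("bad_age", ["twenty", 3.5, True, None, [25], {"value": 25}])
def test_age_rejects_non_integers(bad_age):
    resp = requests.post(f"{BASE_URL}/users",
                         json={"name": "Ada", "age": bad_age}, timeout=10)

    assert resp.status_code == 400
    assert "age" in resp.text  # the error must name the offending field
```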
String length validation is critical and often overlooked. I test with empty strings, single characters, strings at the maximum length, strings one character over the maximum, and absurdly long strings (10,000+ characters). I've crashed production databases by sending megabyte-sized strings to fields that weren't properly validated, causing memory exhaustion.
For numeric fields, I test boundary values systematically: zero, negative numbers, decimals when integers are expected, numbers larger than the maximum integer value, and special values like Infinity or NaN. A payment API I tested once accepted negative payment amounts, which would have allowed users to credit their accounts arbitrarily.
Date and time validation deserves special attention because it's consistently problematic. I test with invalid dates (February 30th, month 13), various formats (ISO 8601, Unix timestamps, human-readable strings), dates far in the past or future, and timezone edge cases. The 2 AM incident I mentioned at the start? That was a date validation failure.
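Here's how I'd script the date cases, again with invented details (a POST /invoices endpoint with a due_date field):

```python
import pytest
import requests

BASE_URL = "https://api.example.com"  # hypothetical endpoint and field

@pytest.mark.parametrize("bad_date", [
    "32/13/2021",   # the 2 AM incident
    "2021-02-30",   # February 30th
    "2021-13-01",   # month 13
    "not a date",
    "0000-00-00",
])
def test_invalid_dates_are_rejected(bad_date):
    resp = requests.post(f"{BASE_URL}/invoices",
                         json={"due_date": bad_date}, timeout=10)

    assert resp.status_code == 400
    assert "due_date" in resp.text  # the error must name the field
```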
For enum fields, I test with valid values, invalid values, case variations, and null. If the API accepts "active" and "inactive" as status values, I'll try "ACTIVE", "Active", "pending", empty string, and null. Each should either be accepted (if case-insensitive) or rejected with a clear message listing valid options.
Response Validation: Ensuring Data Integrity
Testing what comes back is just as important as testing what goes in. I've seen APIs that accept requests perfectly but return inconsistent, incomplete, or incorrectly formatted data that breaks client applications.
| Test Scenario | Expected Response | Common Mistake | Risk Level |
|---|---|---|---|
| No Authentication Token | 401 Unauthorized | Returns 500 or actual data | Critical |
| Invalid Date Format | 400 Bad Request with clear error | Accepts "32/13/2021" and crashes | High |
| Wrong User Credentials | 403 Forbidden | Leaks other user's data | Critical |
| Malformed JSON Payload | 400 Bad Request | 500 Internal Server Error | Medium |
| Missing Required Fields | 400 with field-specific errors | Generic error or silent failure | High |
First, I verify response status codes match the HTTP specification and API documentation. A successful creation should return 201 Created, not 200 OK. A resource not found should return 404, not 500. Proper status codes allow clients to handle errors programmatically instead of parsing error messages.
I validate response schemas rigorously. Every field documented in the API specification should be present in the response, with the correct data type. I use JSON Schema validation tools to automate this, but I also manually inspect responses because automated tools miss semantic issues like a "created_at" timestamp that's always null.
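A minimal sketch of that automation with the Python jsonschema package; the schema here is illustrative, and in practice you'd derive it from your OpenAPI spec:

```python
import requests
from jsonschema import validate  # pip install jsonschema

USER_SCHEMA = {
    "type": "object",
    "required": ["id", "email", "created_at"],
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "created_at": {"type": "string"},  # still eyeball it: "present" is not "populated"
    },
}

def test_user_response_matches_schema():
    resp = requests.get("https://api.example.com/users/123", timeout=10)  # hypothetical
    assert resp.status_code == 200
    validate(instance=resp.json(), schema=USER_SCHEMA)  # raises ValidationError on mismatch
```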
Response consistency is crucial. If I create a resource and then retrieve it, the data should match exactly (except for server-generated fields like timestamps). I've found APIs where POST /users returns different field names than GET /users/123, forcing clients to handle the same data structure differently depending on the endpoint.
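A round-trip test catches this cheaply. A sketch, assuming hypothetical /users endpoints and a known set of server-generated fields:

```python
import requests

BASE_URL = "https://api.example.com"                # hypothetical
SERVER_FIELDS = {"id", "created_at", "updated_at"}  # generated server-side

def test_create_then_read_round_trip():
    payload = {"name": "Ada Lovelace", "email": "ada@example.com"}
    created = requests.post(f"{BASE_URL}/users", json=payload, timeout=10).json()
    fetched = requests.get(f"{BASE_URL}/users/{created['id']}", timeout=10).json()

    # Everything we sent should come back unchanged, under the same names
    for key, value in payload.items():
        assert fetched[key] == value
    # POST and GET should also agree on the overall field names
    assert set(created) - SERVER_FIELDS == set(fetched) - SERVER_FIELDS
```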
I test pagination thoroughly for list endpoints. I verify that page size limits work, that page numbers or cursors function correctly, and that the total count is accurate. I request pages beyond the available data to ensure the API handles this gracefully. I also test with page_size=0, negative page numbers, and absurdly large page sizes like 1000000.
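The pagination edge cases fit neatly into one parametrized test. A sketch; adjust the expected codes to whatever your documentation promises, since some APIs clamp oversized page sizes instead of rejecting them:

```python
import pytest
import requests

BASE_URL = "https://api.example.com"  # hypothetical list endpoint and params

@pytest.mark.parametrize("params,expected", [
    ({"page": 1, "page_size": 10}, 200),       # normal case
    ({"page": 999999, "page_size": 10}, 200),  # past the end: empty list, not an error
    ({"page": -1}, 400),                       # negative page number
    ({"page_size": 0}, 400),                   # zero page size
    ({"page_size": 1000000}, 400),             # absurd page size, if your docs reject it
])
def test_pagination_edges(params, expected):
    resp = requests.get(f"{BASE_URL}/orders", params=params, timeout=10)
    assert resp.status_code == expected
```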
For endpoints that return collections, I verify sorting and filtering work as documented. If the API supports sorting by multiple fields or complex filter expressions, I test combinations systematically. I once found an API where combining certain filters would return data from other users' accounts—a critical security vulnerability discovered through systematic testing.
Error Handling: The Difference Between Good and Great APIs
How an API handles errors reveals its quality more than how it handles success cases. Great APIs provide clear, actionable error messages that help developers fix problems quickly. Poor APIs return generic errors that require hours of debugging.
"I've seen too many APIs that work perfectly with valid credentials but fail catastrophically or leak data when credentials are missing, malformed, or belong to the wrong user."
I test error message quality by intentionally triggering every error condition I can think of. Each error response should include: a clear HTTP status code, a machine-readable error code, a human-readable message, and ideally, suggestions for fixing the problem. Compare "validation error" to "email field is required and must be a valid email address (example: user@example.com)"—the second saves developers significant time.
I verify that error responses follow a consistent structure across all endpoints. If one endpoint returns errors as {"error": "message"} and another returns {"errors": [{"field": "name", "message": "required"}]}, client applications need special handling for each format. Consistency reduces integration complexity.
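One way to enforce this is a shared assertion helper that every negative test funnels through. A sketch, assuming a team that has standardized on a field-level errors envelope:

```python
def assert_error_envelope(resp, expected_status):
    """Every error, on every endpoint, must use the same envelope."""
    assert resp.status_code == expected_status
    body = resp.json()
    assert "errors" in body, f"non-standard error body: {body}"
    for err in body["errors"]:
        # machine-readable code plus human-readable message, per field
        assert {"field", "code", "message"} <= set(err)
```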
Stack traces and internal error details should never appear in production API responses. I test error conditions that might cause exceptions—database connection failures, third-party service timeouts, unexpected data types—to ensure the API returns clean error messages, not raw stack traces that expose internal architecture and create security risks.
I test rate limiting error responses specifically. When a client exceeds rate limits, the API should return 429 Too Many Requests with headers indicating when they can retry (Retry-After) and what their current limit is. I've seen APIs that return 403 Forbidden for rate limiting, which is semantically incorrect and confuses clients about whether the issue is permissions or rate limits.
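A sketch of that check, assuming a test-tier rate limit low enough to trip within a couple hundred requests:

```python
import requests

BASE_URL = "https://api.example.com"  # hypothetical; assumes a low test-tier limit

def test_rate_limit_returns_429_with_retry_after():
    resp = None
    for _ in range(200):  # hammer the endpoint until the limit trips
        resp = requests.get(f"{BASE_URL}/orders", timeout=10)
        if resp.status_code == 429:
            break

    assert resp.status_code == 429, "rate limit never triggered"
    assert "Retry-After" in resp.headers  # the client must know when to retry
```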
Timeout handling is another critical area. I use tools to simulate slow network connections and verify the API returns appropriate timeout errors rather than hanging indefinitely. I also test what happens when the API calls external services that timeout—does it fail gracefully or does the entire request hang?
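On the client side, a read timeout turns "hangs indefinitely" into a concrete test failure. A sketch, with an invented heavy endpoint:

```python
import pytest
import requests

def test_endpoint_responds_within_budget():
    try:
        # (connect timeout, read timeout) in seconds; a hanging API fails fast here
        resp = requests.get("https://api.example.com/reports/heavy",  # hypothetical
                            timeout=(3.05, 10))
    except requests.exceptions.Timeout:
        pytest.fail("endpoint hung past the 10s read budget instead of erroring")
    assert resp.status_code in (200, 504)  # a clean gateway timeout is acceptable
```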
Performance and Load Testing: Beyond Functional Correctness
An API that works correctly under normal conditions but collapses under load is still a broken API. I include basic performance testing in my standard checklist, even though comprehensive load testing is usually a separate effort.
I measure response times for typical requests and establish baselines. A simple GET request should typically respond in under 200ms, a POST creating a resource in under 500ms. These aren't hard rules, but significant deviations warrant investigation. I once found an endpoint that took 8 seconds to respond because it was making 47 separate database queries instead of using joins.
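Establishing that baseline doesn't require a load-testing tool; a few sequential calls give a rough median. A sketch against a hypothetical endpoint:

```python
import statistics
import requests

def measure_latency(url, samples=20):
    """Median and worst-case latency over a few sequential calls."""
    timings = []
    for _ in range(samples):
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        # elapsed covers sending the request through parsing the response headers
        timings.append(resp.elapsed.total_seconds() * 1000)
    return statistics.median(timings), max(timings)

def test_median_latency_under_baseline():
    median_ms, _worst_ms = measure_latency("https://api.example.com/users/123")  # hypothetical
    assert median_ms < 200, f"median {median_ms:.0f}ms exceeds the 200ms baseline"
```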
I test with realistic payload sizes. If users will upload images, I test with actual image files of various sizes, not just tiny test files. If the API accepts bulk operations, I test with the maximum allowed batch size. I've discovered APIs that work fine with 10 items but timeout with 100, even though the documentation claims to support batches of 1000.
Concurrent request testing reveals race conditions and resource contention issues. I send multiple requests simultaneously to the same endpoint and verify they all complete successfully without data corruption. I've found shopping cart APIs where concurrent updates would lose items, and inventory systems where simultaneous purchases could oversell products.
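A thread pool makes these races easy to reproduce. A sketch against a hypothetical cart endpoint:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "https://api.example.com"  # hypothetical cart endpoints

def add_item(item_id):
    return requests.post(f"{BASE_URL}/cart/items",
                         json={"item_id": item_id}, timeout=10)

def test_concurrent_cart_updates_lose_nothing():
    with ThreadPoolExecutor(max_workers=10) as pool:
        responses = list(pool.map(add_item, range(10)))

    assert all(r.status_code == 201 for r in responses)
    cart = requests.get(f"{BASE_URL}/cart", timeout=10).json()
    assert len(cart["items"]) == 10  # no update was silently lost
```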
I test database connection pooling by sending bursts of requests and monitoring for connection exhaustion errors. APIs should handle connection pool limits gracefully, queuing requests if necessary rather than crashing or returning 500 errors.
Memory leak testing involves running extended test sessions and monitoring memory usage. If memory consumption grows continuously without leveling off, there's likely a leak. I once identified a leak where each request created event listeners that were never cleaned up, causing memory to grow by 2MB per request until the service crashed.
Idempotency and State Management: Ensuring Predictable Behavior
APIs should behave predictably when the same request is sent multiple times. This is especially critical for operations that modify data or trigger external actions like payments or notifications.
"The endpoint should return a 401 Unauthorized status, not a 500 Internal Server Error, and definitely not actual data."
For GET requests, I verify they're truly read-only and don't modify server state. I've encountered GET endpoints that incremented view counters, modified user preferences, or even deleted data—all violations of HTTP semantics that cause problems with caching and browser prefetching.
POST requests for creating resources should handle duplicate submissions gracefully. I test by sending the same creation request twice in rapid succession and verify that either only one resource is created or the second request returns an error indicating a duplicate. I've seen e-commerce APIs that would create duplicate orders when users double-clicked the submit button.
PUT and PATCH requests should be idempotent—sending the same update multiple times should produce the same result as sending it once. I test this by making an update, then making the identical update again, and verifying the resource state is the same. Non-idempotent updates cause problems when requests are retried due to network issues.
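The same double-send pattern covers both the duplicate POST above and PUT idempotency. A sketch of the PUT case, with a hypothetical user resource:

```python
import requests

BASE_URL = "https://api.example.com"  # hypothetical

def test_put_is_idempotent():
    update = {"name": "Ada Lovelace", "status": "active"}

    first = requests.put(f"{BASE_URL}/users/123", json=update, timeout=10)
    second = requests.put(f"{BASE_URL}/users/123", json=update, timeout=10)

    assert first.status_code in (200, 204)
    assert second.status_code == first.status_code
    # The resource must look the same after one update or two
    final = requests.get(f"{BASE_URL}/users/123", timeout=10).json()
    assert final["name"] == "Ada Lovelace" and final["status"] == "active"
```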
DELETE requests should handle already-deleted resources gracefully. Deleting a resource that doesn't exist should return 404 Not Found, not 500 Internal Server Error. Some APIs return 204 No Content for both successful deletion and deletion of non-existent resources, which is acceptable if documented.
I test state transitions systematically. If a resource has a lifecycle (draft → published → archived), I verify that only valid transitions are allowed and that invalid transitions return clear errors. I've found workflow APIs where you could transition from any state to any other state, bypassing business logic and creating data inconsistencies.
Integration and Dependency Testing: The Real-World Context
APIs rarely exist in isolation. They depend on databases, external services, message queues, and other systems. Testing how the API behaves when these dependencies fail is crucial for building resilient systems.
I test database failure scenarios by temporarily making the database unavailable and verifying the API returns appropriate errors (typically 503 Service Unavailable) rather than crashing. The error message should indicate a temporary issue and suggest retrying, not expose database connection strings or internal details.
For APIs that call external services, I test timeout scenarios, error responses, and partial failures. If the API calls a payment processor, what happens when the processor is down? Does the API return a clear error, or does it hang indefinitely? I use tools like WireMock to simulate external service failures without affecting actual services.
I test transaction rollback behavior for operations that modify multiple resources. If an API endpoint updates a user record and sends a notification email, what happens if the email fails? Is the user record update rolled back, or do you end up with inconsistent state? Proper transaction management is critical for data integrity.
Cache invalidation testing ensures that when data changes, cached responses are updated appropriately. I modify a resource through one endpoint and immediately retrieve it through another to verify the changes are visible. I've found APIs where cached data could be hours out of date, causing users to see stale information.
I test webhook and callback functionality by setting up test endpoints that receive notifications. I verify that webhooks are sent for the appropriate events, include the correct data, and handle delivery failures with retries. Webhook systems that don't retry failed deliveries lose data permanently.
Documentation and Contract Testing: Ensuring Accuracy
API documentation is only valuable if it accurately reflects the API's actual behavior. I treat documentation testing as a first-class concern, not an afterthought.
I verify that every endpoint documented in the API specification actually exists and behaves as described. I've found documentation for endpoints that were never implemented, endpoints that were removed but still documented, and endpoints whose behavior changed without documentation updates.
I test every example in the documentation to ensure it works exactly as shown. If the documentation shows a curl command, I copy it verbatim and run it. If it doesn't work, the documentation is wrong. I've seen documentation with examples that have syntax errors, wrong URLs, or outdated authentication methods.
Schema validation ensures that request and response formats match the documented schemas. I use tools like Dredd or Postman to automatically validate API behavior against OpenAPI specifications. This catches discrepancies between documentation and implementation before they reach users.
I verify that error codes and messages match the documentation. If the docs say a missing required field returns error code "FIELD_REQUIRED", the actual API should return exactly that code, not "VALIDATION_ERROR" or "MISSING_FIELD". Consistent error codes allow clients to handle errors programmatically.
Version compatibility testing ensures that API changes don't break existing clients. When a new API version is released, I test that old clients still work with the old version and that the new version behaves as documented. I've seen API updates that broke backward compatibility without warning, causing widespread client failures.
Security Testing: Beyond Authentication
Security testing goes far beyond checking authentication and authorization. I systematically probe for common vulnerabilities that could compromise the API or its users.
I test for SQL injection by sending SQL commands in input fields and verifying they're treated as data, not executed. I try variations like ' OR '1'='1, '; DROP TABLE users; --, and UNION SELECT statements. Modern frameworks usually prevent SQL injection, but I've still found vulnerable custom query builders.
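A sketch of those probes as a parametrized test against a hypothetical search endpoint; the key assertions are that injected SQL behaves as inert data and that no database error text leaks back:

```python
import pytest
import requests

BASE_URL = "https://api.example.com"  # hypothetical search endpoint

@pytest.mark.parametrize("payload", [
    "' OR '1'='1",
    "'; DROP TABLE users; --",
    "1 UNION SELECT username, password FROM users",
])
def test_sql_injection_is_treated_as_data(payload):
    resp = requests.get(f"{BASE_URL}/users", params={"name": payload}, timeout=10)

    # Injected SQL should behave as inert data: an empty 200 or a clean 400
    assert resp.status_code in (200, 400)
    # No database error text should leak back to the caller
    assert "SQL" not in resp.text and "syntax" not in resp.text
```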
Cross-site scripting (XSS) testing involves sending JavaScript code in input fields and verifying it's properly escaped in responses. I test with <script>alert(1)</script> payloads, event handlers like onerror=alert(1), and various encoding variations. APIs that return user input without sanitization create XSS vulnerabilities in client applications.
I test for mass assignment vulnerabilities by sending extra fields in requests that shouldn't be user-modifiable. If a user update endpoint accepts name and email, I'll also send is_admin: true and verify it's ignored. I've found APIs where you could make yourself an admin by including the right field in a profile update.
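A sketch of that probe, assuming a hypothetical PATCH /users/me endpoint and an is_admin flag (auth headers omitted for brevity):

```python
import requests

BASE_URL = "https://api.example.com"  # hypothetical; auth headers omitted for brevity

def test_mass_assignment_is_ignored():
    # Legitimate fields plus one the user must never be able to set
    resp = requests.patch(f"{BASE_URL}/users/me",
                          json={"name": "Ada", "is_admin": True}, timeout=10)
    assert resp.status_code in (200, 400)  # filtered silently or rejected outright

    profile = requests.get(f"{BASE_URL}/users/me", timeout=10).json()
    assert profile.get("is_admin") is not True  # the privileged field did not stick
```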
Rate limiting testing ensures the API can't be abused through excessive requests. I send rapid-fire requests and verify that rate limits are enforced consistently. I also test that rate limits are per-user, not global—otherwise one user can exhaust the limit for everyone.
I test for information disclosure by examining error messages, response headers, and timing differences. Error messages shouldn't reveal whether a username exists (use generic "invalid credentials" instead of "username not found"). Response timing shouldn't differ based on whether a resource exists, as this leaks information.
My Testing Workflow: Putting It All Together
Having a comprehensive checklist is valuable, but knowing how to apply it efficiently is what separates experienced testers from beginners. I've refined my workflow over years to maximize coverage while minimizing time spent.
I start with automated tests for the happy path—valid requests that should succeed. These run in seconds and catch obvious regressions. I use Postman collections with environment variables so I can run the same tests against development, staging, and production environments.
Next, I run automated negative tests—invalid inputs, missing authentication, wrong permissions. These catch the majority of bugs and take just a few minutes. I maintain a library of reusable test cases for common scenarios like "missing required field" that I can quickly adapt to new endpoints.
Manual exploratory testing comes next. This is where I try unusual combinations, edge cases, and scenarios that are hard to automate. I spend about 30 minutes per endpoint on exploratory testing, guided by my checklist but not rigidly following it. This is where I find the interesting bugs.
I document findings immediately in a structured format: endpoint, test case, expected result, actual result, severity. I've learned that bugs not documented immediately are bugs forgotten. I use severity levels (critical, high, medium, low) to help prioritize fixes.
For critical endpoints—anything involving payments, authentication, or sensitive data—I do a second pass with a colleague. Fresh eyes catch issues I've become blind to. Pair testing has caught critical security vulnerabilities that I missed in solo testing.
I maintain a regression test suite that grows with each bug found. When a bug is fixed, I add a test case to prevent it from recurring. My regression suite for a typical API has 200-300 test cases and runs in about 10 minutes. It's caught dozens of regressions before they reached production.
The checklist I've shared isn't meant to be followed rigidly for every endpoint—that would be inefficient. Simple CRUD endpoints need less scrutiny than complex business logic. But having the checklist ensures I don't forget critical test cases, especially when I'm tired or rushed. It's my safety net, refined through years of finding bugs the hard way. Use it, adapt it to your context, and add your own lessons learned. Your 2 AM self will thank you.