The systematic evaluation of prompts and outputs against expected behavior or quality criteria.
Testing is how prompt engineers determine whether a prompt actually works. It involves running the prompt against a range of typical inputs, edge cases, and known failure modes, then checking whether the outputs are accurate, consistent, and useful.
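A minimal sketch of what this can look like in Python is given here. It assumes a hypothetical `fake_model` function standing in for a real model call, and the prompt template, test cases, and helper names (`TestCase`, `run_suite`) are illustrative only, not the API of any particular testing framework. Each case pairs an input with a check on the output, and each case is run several times so that inconsistent outputs also count as failures (with the deterministic stub the runs are identical; with a real, sampled model they may not be).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    name: str
    user_input: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; replace with your own client."""
    text = prompt.split("Input:", 1)[-1].strip()
    if not text:
        return "ERROR: empty input"
    return text.upper()


# Illustrative prompt under test.
PROMPT_TEMPLATE = "Convert the input to upper case.\nInput: {text}"


def run_suite(template: str, cases: list[TestCase], runs: int = 3) -> dict[str, bool]:
    """Run each case several times; a case passes only if every run passes."""
    results = {}
    for case in cases:
        prompt = template.format(text=case.user_input)
        outputs = [fake_model(prompt) for _ in range(runs)]
        results[case.name] = all(case.check(out) for out in outputs)
    return results


cases = [
    TestCase("typical", "hello world", lambda out: out == "HELLO WORLD"),
    TestCase("edge_empty", "", lambda out: out.startswith("ERROR")),
    TestCase("edge_unicode", "straße", lambda out: out == "STRASSE"),
    # Edge case: surrounding whitespace should be preserved.
    TestCase("edge_padded", "  padded  ", lambda out: out == "  PADDED  "),
]

print(run_suite(PROMPT_TEMPLATE, cases))
# {'typical': True, 'edge_empty': True, 'edge_unicode': True, 'edge_padded': False}
```

The typical example passes, while the padded-input edge case fails because the stub strips whitespace: the kind of defect a single hand-checked example would not reveal.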
Without testing, prompt quality is often judged by anecdote. A prompt may produce a good result on one hand-picked example while still failing badly on the inputs it receives in realistic use.
Testing becomes even more important in production systems, where teams need repeatable results, a way to detect regressions when a prompt or model changes, and enough confidence in the prompt's behavior to operate it safely.
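Regression detection can be sketched as comparing per-case results from a stored baseline run against a run of the changed prompt. The Python sketch here is a minimal illustration under assumptions: the case names and pass/fail values are hypothetical, and in practice they would come from running the same test suite before and after the change.

```python
def compare_runs(
    baseline: dict[str, bool], current: dict[str, bool]
) -> tuple[list[str], list[str]]:
    """Return (regressions, fixes) between two runs of the same test suite.

    A regression is a case that passed in the baseline but fails (or is
    missing) now; a fix is a case that passes now but did not before.
    """
    regressions = sorted(
        name for name, ok in baseline.items() if ok and not current.get(name, False)
    )
    fixes = sorted(
        name for name, ok in current.items() if ok and not baseline.get(name, False)
    )
    return regressions, fixes


# Hypothetical results: baseline from the previous prompt version,
# current from the edited prompt.
baseline = {"greeting": True, "empty": True, "long": False}
current = {"greeting": True, "empty": False, "long": True}

regressions, fixes = compare_runs(baseline, current)
print("regressions:", regressions)  # regressions: ['empty']
print("fixes:", fixes)              # fixes: ['long']
```

Reporting regressions separately from fixes matters because an overall pass rate can stay flat (here two of three cases pass in both runs) while a previously working case quietly breaks.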