You’re scoping your first paid penetration test. The trigger is usually one of five: a customer security questionnaire that won’t go away; an ISO 27001, SOC 2, DORA, or NIS 2 evidence cycle; a board ask; a post-incident review; or a quiet decision to replace a vendor whose last engagement underdelivered. Whatever the trigger, the buyer-side problem is the same: most teams don’t know what a useful penetration test should produce.
The market is full of vendors who deliver long reports of low-severity scanner noise, miss the attack path that actually compromises the business, and don’t retest what they recommend you fix. This guide walks through the artefacts a useful pen test should produce, how to scope honestly, and how to tell the difference between vendors who give you proof of risk and vendors who sell you a PDF. It is written from the perspective of the team running the engagement, not the team selling it.
A useful penetration test produces three artefacts, not one report
Most pen test reports are written for one audience: the engineer who will fix the findings. The security leaders who have to explain the risk and the platform team that has to plan the remediation are left to extract what they need from a document not written for them. That extraction step is where engagement value leaks.
The Technical Report is the engineering artefact. It contains every finding, the reproduction steps, the affected scope, the CVSS score with the business-impact reasoning, and the remediation guidance. The audience is the engineer who will reproduce the issue, write the fix, and ship it. The question it answers: what exactly is broken and how do I close it?
The Executive Risk Brief is the leadership artefact. Six to ten pages, no jargon, no findings table. The audience is the CISO, the board sponsor, and any auditor or customer reviewer reading the engagement at arm’s length. The question it answers: what risk did this engagement surface, and what does the business need to decide?
The Action Plan is the operations artefact. It tracks every Critical and High finding through Open → In Progress → Remediated → Retested → Closed. The audience is the platform or SRE team that owns remediation scheduling. The question it answers: what work do we sequence next, and who owns it?
The methodology page walks through how the three artefacts are produced. One artefact for three audiences is the most common reporting failure in the market: each audience needs a different level of abstraction and a different question answered.
The engagement lifecycle — what each stage looks like from the buyer’s side
A useful engagement runs through six lifecycle stages, and each one produces something — a document, a decision, a deliverable. If a vendor compresses a stage or skips it, the gap shows up in the report.
1. Scoping. Before any quotation, you should be asked five questions: what asset are we testing, what’s the engagement trigger, what’s out of scope, what’s the timeline, and who will read each artefact. A vendor who quotes without those answers is pricing a guess. Output: a scoping document the buyer signs off.
2. Pre-engagement. The Rules of Engagement (ROE) is drafted — authorisation, contact tree, test windows, escalation contract, data-handling rules. Your legal team reviews it. Active testing does not start until the ROE is signed on both sides. Output: a signed ROE.
3. Active testing. The work happens. You should receive a daily standup or written status during the active testing window, including a “no critical findings observed today” note when that’s the case, so the absence of news isn’t ambiguous. Critical findings escalate within four hours, outside the regular cadence. Output: working evidence and draft findings.
4. Reporting. The three artefacts — Technical Report, Executive Risk Brief, Action Plan — are drafted, internally peer-reviewed, then walked through with you. The walkthrough is mandatory, not optional. Output: the three artefacts, plus a recorded walkthrough.
5. Action planning. Each Critical and High finding gets a remediation owner, a target date, and a verification approach agreed on with your platform or SRE team. This is a working session, not a handover email. Output: a sequenced, owner-assigned Action Plan.
6. Closure. Every Critical and High finding is retested after remediation. The Action Plan moves each finding from Open → In Progress → Remediated → Retested → Closed. Findings are not certified closed by the vendor unless they have been retested. Output: a Closure Memo that lists the final state of every finding.
Compressed Scoping or skipped Action Planning is the most common quality defect in the market — and the easiest to detect before signing the SOW. Ask for the scoping document, the ROE template, and the Action Plan template before the quotation lands. A vendor who can’t share all three is selling you a PDF.
Reproduction steps are not optional
For every finding, the report should contain five elements: the preconditions (state of the system, user role, or authentication context required), the exact request or payload used to trigger the issue, the expected response, the observed response, and the remediation hypothesis. That’s the reproducibility contract.
The reason it matters is not academic. Six months after the engagement closes, a third party — an auditor, a regulator under DORA or NIS 2, a prospective customer’s security review team — will read the report without the original tester present. If the finding cannot be reproduced from the report alone, it cannot be defended.
Run this test on any prior pen test report you have access to: open the top three Critical or High findings, hand them to a competent engineer on your team, and ask if they can reproduce the issue without contacting the testing firm. If the answer for any of the three is no, the report has a reproducibility defect. A finding without reproduction steps is an opinion. A finding with reproduction steps is evidence.
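The five-element reproducibility contract can be checked mechanically before a report is accepted. The sketch below is a minimal illustration, not a prescribed schema: the class name, field names, and the sample finding are all hypothetical, chosen only to mirror the five elements listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class FindingEvidence:
    """The five elements of the reproducibility contract (field names are illustrative)."""
    preconditions: str           # system state, user role, or auth context required
    request_or_payload: str      # exact request or payload that triggers the issue
    expected_response: str       # what a correctly behaving system would return
    observed_response: str       # what the system actually returned
    remediation_hypothesis: str  # the tester's proposed fix


def missing_elements(evidence: FindingEvidence) -> list[str]:
    """Return the names of any contract elements left blank."""
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name).strip()]


# Hypothetical finding, invented for this example.
example = FindingEvidence(
    preconditions="Authenticated as a standard user",
    request_or_payload="GET /api/orders/1002 using another user's order ID",
    expected_response="403 Forbidden",
    observed_response="",  # left blank: the report never recorded what happened
    remediation_hypothesis="Enforce object-level ownership checks",
)

print(missing_elements(example))  # ['observed_response']
```

A non-empty result means the finding cannot be reproduced from the report alone. A filled-in field is necessary but not sufficient: the engineer hand-off test described above is still the real check.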
CVSS plus business impact — why severity inflation is a quality defect
CVSS v3.1 is the technical baseline. It scores exploitability (attack vector, attack complexity, privileges required, user interaction), scope (whether the issue crosses a security boundary), and impact (confidentiality, integrity, availability). The output is a base score between 0.0 and 10.0, mapped to qualitative ratings: None (0.0), Low (0.1 to 3.9), Medium (4.0 to 6.9), High (7.0 to 8.9), and Critical (9.0 to 10.0).
CVSS alone is not enough. A CVSS-7.5 finding on a sandboxed test instance is not the same risk as a CVSS-7.5 finding on the production payment processor. Same number, different priority. A useful report applies a business-impact overlay — asset criticality, blast radius, exploitability in the buyer’s actual environment — and lets the overlay shift the priority away from what the CVSS tier alone would suggest.
Severity inflation is a sales tactic, not a security signal. Vendors who score every SQL injection finding as Critical regardless of context are signalling that the scoring is for invoice justification, not remediation prioritisation. The same pattern shows up in vendors who refuse to score anything Low — “if we found it, it must matter” — which inflates the report’s apparent value while breaking your engineering team’s ability to triage.
The scoring rubric we use is published. CVSS as the technical floor, business impact as the prioritisation overlay, the result documented per finding so the reasoning survives the report being read in isolation. If every finding in a report is Critical or High, the prioritisation is broken — and the report is shipping you alarm, not ranked work.
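To make the floor-plus-overlay idea concrete, here is a minimal sketch. The rating bands in `cvss_v31_rating` follow the CVSS v3.1 qualitative severity scale. Everything else is an assumption for illustration: the asset categories, the one-step shift, and the `priority` function are hypothetical and are not the published rubric referred to above.

```python
def cvss_v31_rating(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


# Hypothetical overlay: asset criticality shifts the priority by one step.
PRIORITY_ORDER = ["None", "Low", "Medium", "High", "Critical"]
ASSET_SHIFT = {"sandbox": -1, "internal": 0, "production-payments": 1}


def priority(score: float, asset: str) -> str:
    """Apply the illustrative business-impact shift to the CVSS rating."""
    base = PRIORITY_ORDER.index(cvss_v31_rating(score))
    if base == 0:
        return "None"  # a 0.0 score stays informational
    # Clamp so a real finding never drops below Low or rises above Critical.
    shifted = min(max(base + ASSET_SHIFT[asset], 1), len(PRIORITY_ORDER) - 1)
    return PRIORITY_ORDER[shifted]


print(priority(7.5, "sandbox"))              # Medium
print(priority(7.5, "production-payments"))  # Critical
```

The same 7.5 score lands at two different priorities depending on the asset. Whatever overlay a vendor uses, the reasoning for each shift should be written down per finding rather than hidden in a lookup table like this one.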
What happens when we find something that can’t wait until the report
Critical findings (active exploit paths into production, exposed credentials, live remote code execution, exposed sensitive data) are reported within four hours of discovery, outside the regular reporting cadence. That means a phone call or an encrypted message to the incident contact named in the ROE, with a written confirmation within the same window. The work continues in parallel with the buyer-side response.
The reason for the four-hour SLA is operational. A critical finding sitting unread in a draft report for three weeks is a liability the buyer is paying for. The discovery doesn’t make the issue safer; it makes the silence riskier.
Ask every vendor: “What’s your critical-finding SLA? Is it written into the SOW?” If the answer is vague, or the SLA only appears in the marketing material and not in the contract, the vendor is reserving the right to bury bad news. The SLA belongs in the SOW.
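An SLA written into the SOW is only useful if it can be checked against timestamps afterwards. The sketch below shows one way a buyer might verify the four-hour window from an engagement log. The function names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CRITICAL_SLA = timedelta(hours=4)  # the four-hour window described above


def notification_deadline(discovered_at: datetime) -> datetime:
    """Latest moment the buyer's incident contact must have been notified."""
    return discovered_at + CRITICAL_SLA


def sla_met(discovered_at: datetime, notified_at: datetime) -> bool:
    """True if the notification happened inside the SLA window."""
    return notified_at <= notification_deadline(discovered_at)


# Hypothetical timestamps, recorded in UTC to avoid timezone ambiguity.
discovered = datetime(2025, 3, 4, 14, 30, tzinfo=timezone.utc)
notified = datetime(2025, 3, 4, 16, 5, tzinfo=timezone.utc)

print(notification_deadline(discovered).isoformat())  # 2025-03-04T18:30:00+00:00
print(sla_met(discovered, notified))                  # True
```

The check depends on the vendor recording a discovery time for each critical finding, which is worth asking for alongside the SLA itself.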
Chained findings matter more than long lists of Mediums
A scanner-driven report flags individual findings. An information disclosure here, a misconfigured CORS header there, a stale dependency over there. Each one rated Low or Medium on CVSS, each one buried in a long table. The buyer reads the table, decides the report is mostly noise, and the engagement is over.
A useful report does something different: it tells you how the individual findings chain together to compromise a real business asset, written as a paragraph-format narrative with reproduction steps for each step in the chain.
Take a plausible chain. An information disclosure in the support portal leaks the internal staging environment’s hostname. Staging still has an authentication weakness that production was patched against four months ago. Staging also shares production secrets through its environment variables, a legacy convenience nobody removed. From staging, the production database read-replica is reachable on the internal network. Four findings, each individually CVSS-Medium or below; together, they form a Critical-business-impact path that exfiltrates customer data without touching the production application.
Scanners don’t do this. They flag the four findings individually and let the buyer connect the dots. Connecting the dots is the analyst’s job — the part of pen testing that cannot be automated, the part the buyer is actually paying for.
A long list of Mediums with no narrative is a scanner export. A short list of attack paths with reproduction steps is a pen test.
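One way to picture the difference is to treat each finding as a step from one attacker position to another and then search for a route to the asset that matters. The sketch below encodes the plausible chain described above plus one unrelated finding. The position labels, the individual ratings, and the extra finding are assumptions made for illustration, and a breadth-first search is only a stand-in for the analyst's judgement, not a replacement for it.

```python
from collections import deque

# Each finding: (name, illustrative rating, attacker position before, position after).
FINDINGS = [
    ("Support portal leaks staging hostname", "Low",
     "internet", "staging-hostname-known"),
    ("Staging has weak authentication", "Medium",
     "staging-hostname-known", "staging-access"),
    ("Staging env vars hold production secrets", "Medium",
     "staging-access", "production-secrets"),
    ("Read-replica reachable from staging network", "Medium",
     "production-secrets", "customer-data"),
    ("Stale dependency on marketing site", "Low",
     "internet", "marketing-site-info"),  # real finding, but leads nowhere
]


def find_attack_path(findings, start, target):
    """Breadth-first search for the shortest chain of findings from start to target."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        position, path = queue.popleft()
        if position == target:
            return path
        for name, _rating, src, dst in findings:
            if src == position and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    return None  # no chain reaches the target


path = find_attack_path(FINDINGS, "internet", "customer-data")
for step, name in enumerate(path, start=1):
    print(f"{step}. {name}")
```

The search returns the four chained findings and ignores the stale dependency. The hard part in a real engagement is discovering the edges in the first place, which is exactly what a scanner export leaves out.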
No closure without retest
The Action Plan moves every Critical and High finding through five states: Open → In Progress → Remediated → Retested → Closed. The vendor does not certify closure on its own authority; it certifies closure after retesting the remediation in the same environment and against the same reproduction steps that produced the original finding.
The reason this matters is that remediation is the part of the engagement most prone to silent failure. A fix that compiles and passes the existing test suite is not necessarily a fix that closes the attack path. Engineering teams under deadline pressure routinely close a finding with a partial mitigation — a WAF rule that blocks the specific payload from the report, a one-off input validation that doesn’t generalise. The retest is the only verification that the buyer was actually given the security outcome they paid for.
A vendor who issues a final report without a retest cycle, or who certifies closure based on a buyer-supplied screenshot of the new code, is shipping you compliance theatre rather than security. Ask before you sign: “What’s your retest process? Is the retest priced into the engagement, or is it a separate line item I have to authorise after the fact?” The answer determines whether the engagement closes with verified outcomes or with a PDF.
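The five-state lifecycle can be expressed as a small state machine in which Closed is reachable only from Retested. This is a sketch under stated assumptions: the `State` and `advance` names are hypothetical, "Retested" is taken to mean a retest that passed, and the path from Remediated back to In Progress (for a failed retest) is an added assumption, not something the lifecycle above spells out.

```python
from enum import Enum


class State(Enum):
    OPEN = "Open"
    IN_PROGRESS = "In Progress"
    REMEDIATED = "Remediated"
    RETESTED = "Retested"
    CLOSED = "Closed"


# Forward transitions, plus an assumed "retest failed" path back to In Progress.
ALLOWED = {
    State.OPEN: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.REMEDIATED},
    State.REMEDIATED: {State.RETESTED, State.IN_PROGRESS},
    State.RETESTED: {State.CLOSED},
    State.CLOSED: set(),
}


def advance(current: State, target: State) -> State:
    """Move a finding to the target state, rejecting any skipped step."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


state = State.OPEN
for step in (State.IN_PROGRESS, State.REMEDIATED, State.RETESTED, State.CLOSED):
    state = advance(state, step)
print(state.value)  # Closed

try:
    advance(State.REMEDIATED, State.CLOSED)  # closing without a retest
except ValueError as err:
    print(err)  # Cannot move from Remediated to Closed
```

If the tracker used for the Action Plan allows a finding to jump straight from Remediated to Closed, the retest rule exists only on paper.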
Out-of-scope by design — what a pen test isn’t
A useful penetration test has explicit exclusions, defined in writing before any active testing starts. The constraints exist because the alternative — “test everything, however you want” — carries legal exposure for both sides, breaks scope discipline, and gives the buyer no way to evaluate the outcome.
The standard exclusions are concrete. No denial-of-service or volumetric testing against production. No destructive payloads: no data deletion, no encryption, no actions that mutate state in ways the buyer cannot undo. No social engineering of staff outside a pre-agreed, written scope (and never without the buyer’s leadership having been informed). No testing of systems the buyer does not own, such as third-party SaaS, partner APIs, or infrastructure operated by a hosting provider, even when those systems are reachable from the in-scope environment.
The Rules of Engagement document captures every exclusion plus the test windows, the authorisation tree, the incident contact, the data-handling rules, and the escalation contract. It is signed by both sides before any active testing begins. Skipping it isn’t a faster engagement; it’s an engagement with no legal floor underneath it.
A vendor who promises to “test everything” without a signed ROE is offering to take on your legal liability and theirs, on a handshake. Do not sign that engagement. The exclusions in the ROE are what make the rest of the work defensible.
Five questions to answer before issuing the RFP or SOW
Treat this as the buyer-side scoping checklist. If you can’t answer all five, you’re not ready to issue the RFP — and any vendor who quotes anyway is pricing a guess.
1. What asset are you actually trying to test? Is it a single application or a portfolio of services? Are you testing the external attack surface, internal lateral movement, or both? Are you testing production, a production-equivalent staging environment, or a freshly-provisioned test instance? The asset shapes the scope and the cost; vague answers here mean vague answers everywhere else.
2. What’s the engagement trigger? Customer security questionnaire, audit, post-incident review, new feature launch, ISO 27001 or SOC 2 or DORA or NIS 2 evidence cycle. The trigger determines who reads the report, what artefacts they need, and what the engagement has to defend against six months later. Different triggers, different scopes.
3. What’s out of scope, and is that documented? Production database mutation. Denial-of-service. Social engineering outside a pre-agreed window. Third-party SaaS you don’t own. If the exclusions only live in the head of one person on your side, they don’t exist as a contract.
4. What’s the timeline? A compressed timeline produces a scanner-driven report. An honest engagement runs five to fifteen working days of active testing for a typical web application plus API; multi-service, cloud-infrastructure, or red-team scopes take longer. If the vendor agrees to a two-day pen test, what you’ll get is two days of scans.
5. Who will read each artefact? The engineering team reads the Technical Report. The CISO and board sponsor read the Executive Risk Brief. The platform or SRE team owns the Action Plan. If you can’t name the human who will receive each artefact, you’re scoping a PDF, not an engagement.
If you want help working through these five, book a scoping call. The conversation is shorter than the document above.
Close
Three artefacts, five scoping questions, one retest contract. That’s the shape of a useful engagement.
At HackingByte our engagements are senior-led, with reproduction-grade evidence, attack-path narratives instead of scanner exports, and a retest cycle that closes the loop on every Critical and High finding. The methodology page walks through how we work end-to-end. If you’re scoping your first paid penetration test or replacing a vendor whose last engagement underdelivered, the contact form is the shortest path to a conversation.
No fear marketing, no scare statistics, no certifications we don’t hold. Just the work, done seriously.