Penetration Testing Buyer Guide

You’re scoping your first paid penetration test. The trigger is usually one of five — a customer security questionnaire that won’t go away, an ISO 27001 or SOC 2 or DORA or NIS 2 evidence cycle, a board ask, a post-incident review, or a vendor whose last engagement underdelivered and a quiet decision to switch. Whatever the trigger, the buyer-side problem is the same: most teams don’t know what a useful penetration test should produce.

The market is full of vendors who deliver long reports of low-severity scanner noise, miss the attack path that actually compromises the business, and don’t retest what they recommend you fix. This guide walks through the artefacts a useful pen test should produce, how to scope honestly, and how to tell the difference between vendors who give you proof of risk and vendors who sell you a PDF. Written from the perspective of the team running the engagement, not the team selling it.

How to choose a penetration testing provider

In one line: demand proof, not a PDF. A useful penetration testing provider scopes the engagement honestly, delivers three distinct artefacts — a Technical Report, an Executive Risk Brief, and an Action Plan — proves every finding with reproduction steps, ranks the work by business impact rather than raw CVSS score, and retests what it tells you to fix. Choosing a penetration testing provider comes down to checking those signals before you sign, not comparing day-rate quotes.

Ask every candidate the same penetration testing questions, and demand the answers in writing in the RFP or SOW — not just in the sales deck. If you’re scoping a first paid penetration testing engagement or replacing a vendor whose last one underdelivered, the six quality signals below and the five scoping questions at the end give you the full grid.

A useful penetration test produces three artefacts, not one report

Three artefacts of a useful engagement — a Technical Report for the engineer who ships the fix, an Executive Risk Brief for the CISO and board, and an Action Plan for the platform/SRE team.

Most pen test reports are written for one audience — the engineer who will fix the findings. The security leadership who has to explain the risk and the platform team who has to plan the remediation are left to extract what they need from a document not written for them. That extraction step is where engagement value leaks.

The Technical Report is the engineering artefact. It contains every finding, the reproduction steps, the affected scope, the CVSS score with the business-impact reasoning, and the remediation guidance. The audience is the engineer who will reproduce the issue, write the fix, and ship it. The question it answers: what exactly is broken and how do I close it?

The Executive Risk Brief is the leadership artefact. Six to ten pages, no jargon, no findings table. The audience is the CISO, the board sponsor, and any auditor or customer reviewer reading the engagement at arm’s length. The question it answers: what risk did this engagement surface, and what does the business need to decide?

The Action Plan is the operations artefact. It tracks every Critical and High finding through Open → In Progress → Remediated → Retested → Closed. The audience is the platform or SRE team that owns remediation scheduling. The question it answers: what work do we sequence next, and who owns it?

We walk through how the three artefacts are produced on the methodology page. One artefact for three audiences is the most common reporting failure in the market — each audience needs a different level of abstraction and a different question answered.

The engagement lifecycle — what each stage looks like from the buyer’s side

The engagement lifecycle — six stages from Scoping to Closure, each producing an artefact the buyer can ask for: a scoping document, a signed ROE, working evidence, the three report artefacts, a sequenced Action Plan, and a Closure Memo.

A useful engagement runs through six lifecycle stages, and each one produces something — a document, a decision, a deliverable. If a vendor compresses a stage or skips it, the gap shows up in the report.

Scoping. Before any quotation, you should be asked five questions: what asset are we testing, what’s the engagement trigger, what’s out of scope, what’s the timeline, and who will read each artefact. A vendor who quotes without those answers is pricing a guess. Output: a scoping document the buyer signs off.
Pre-engagement. The Rules of Engagement (ROE) document is drafted. It defines the scope of work: authorisation, contact tree, test windows, escalation contract, and data-handling rules. Your legal team reviews it. Active testing does not start until the ROE is signed on both sides. Output: a signed ROE.
Active testing. The work happens. You should receive a daily standup or written status during the active testing window — including a “no critical findings observed today” note when that’s the case, so the absence of news isn’t ambiguous. Critical findings escalate within four hours, out-of-band of the regular cadence. Output: working evidence + draft findings.
Reporting. The three artefacts — Technical Report, Executive Risk Brief, Action Plan — are drafted, internally peer-reviewed, then walked through with you. The walkthrough is mandatory, not optional. Output: the three artefacts, plus a recorded walkthrough.
Action planning. Each Critical and High finding gets a remediation owner, a target date, and a verification approach agreed on with your platform or SRE team. This is a working session, not a handover email. Output: a sequenced, owner-assigned Action Plan.
Closure. Every Critical and High finding is retested after remediation. The Action Plan moves each finding from Open → In Progress → Remediated → Retested → Closed. Findings are not certified closed by the vendor unless they have been retested. Output: a Closure Memo that lists the final state of every finding.

Compressed Scoping or skipped Action Planning is the most common quality defect in the market — and the easiest to detect before signing the SOW. Ask for the scoping document, the ROE template, and the Action Plan template before the quotation lands. A vendor who can’t share all three is selling you a PDF.

Reproduction steps are not optional

For every finding, the report should contain five elements: the preconditions (state of the system, user role, or authentication context required), the exact request or payload used to trigger the issue, the expected response, the observed response, and the remediation hypothesis. That’s the reproducibility contract.
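As a rough illustration, the five elements can be treated as required fields on a finding record, so a gap is detectable before the report ships. The sketch below is an assumption about how a buyer or vendor might model this; the class name, field names, and the sample finding are all hypothetical and not taken from any real report template.

```python
from dataclasses import dataclass, fields


@dataclass
class Finding:
    """One finding's reproduction record (field names are illustrative)."""
    title: str
    preconditions: str = ""
    request_or_payload: str = ""
    expected_response: str = ""
    observed_response: str = ""
    remediation_hypothesis: str = ""


def missing_elements(finding: Finding) -> list[str]:
    """Return the names of the five required elements that are blank."""
    required = [f.name for f in fields(finding) if f.name != "title"]
    return [name for name in required if not getattr(finding, name).strip()]


# Hypothetical finding with one element deliberately left empty.
finding = Finding(
    title="Cross-tenant invoice download",
    preconditions="Authenticated as a standard user in tenant A",
    request_or_payload="GET /invoices/1042 (invoice owned by tenant B)",
    expected_response="403 Forbidden",
    observed_response="",
    remediation_hypothesis="Enforce a tenant ownership check on invoice lookup",
)
print(missing_elements(finding))
```

A finding that returns an empty list has all five elements present. Whether those elements are good enough to reproduce the issue still needs a human reader.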

The reason it matters is not academic. Six months after the engagement closes, a third party — an auditor, a regulator under DORA or NIS 2, a prospective customer’s security review team — will read the report without the original tester present. If the finding cannot be reproduced from the report alone, it cannot be defended.

Run this test on any prior pen test report you have access to: open the top three Critical or High findings, hand them to a competent engineer on your team, and ask if they can reproduce the issue without contacting the testing firm. If the answer for any of the three is no, the report has a reproducibility defect. A finding without reproduction steps is an opinion. A finding with reproduction steps is evidence.

CVSS plus business impact — why severity inflation is a quality defect

CVSS v3.1 is the technical baseline. It scores exploitability (attack vector, attack complexity, privileges required, user interaction), scope (does the issue cross a security boundary), and impact (confidentiality, integrity, availability). The output is a base score between 0.0 and 10.0, mapped to named severity ratings: None (0.0), Low (0.1 to 3.9), Medium (4.0 to 6.9), High (7.0 to 8.9), and Critical (9.0 to 10.0).

CVSS alone is not enough. A CVSS-7.5 finding on a sandboxed test instance is not the same risk as a CVSS-7.5 finding on the production payment processor. Same number, different priority. A useful report applies a business-impact overlay — asset criticality, blast radius, exploitability in the buyer’s actual environment — and lets the overlay shift the priority away from what the CVSS tier alone would suggest.
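One way to picture the overlay is as a tier shift applied on top of the CVSS rating. The sketch below is a simplified assumption, not the rubric any particular vendor uses: the score-to-rating bands follow the CVSS v3.1 qualitative scale, while the asset classes and shift values in `ASSET_SHIFT` are invented for illustration.

```python
TIERS = ["None", "Low", "Medium", "High", "Critical"]

# Hypothetical overlay: tiers to shift priority by, per asset class.
ASSET_SHIFT = {"sandbox": -1, "internal": 0, "production-payments": 1}


def cvss_tier(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS v3.1 scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def priority(score: float, asset_class: str) -> str:
    """Shift the CVSS tier by the asset overlay, clamped to Low..Critical."""
    base = TIERS.index(cvss_tier(score))
    if base == 0:
        return "None"  # a 0.0 score stays unrated regardless of asset
    shifted = base + ASSET_SHIFT[asset_class]
    return TIERS[max(1, min(shifted, len(TIERS) - 1))]


print(cvss_tier(7.5))                        # High
print(priority(7.5, "sandbox"))              # Medium
print(priority(7.5, "production-payments"))  # Critical
```

The same 7.5 lands in different priority tiers depending on where it was found. A real overlay would also weigh blast radius and exploitability in the buyer's environment, and would record the reasoning per finding rather than reduce it to a lookup table.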

Severity inflation is a sales tactic, not a security signal. Vendors who score every SQL injection finding as Critical regardless of context are signalling that the scoring is for invoice justification, not remediation prioritisation. The same pattern shows up in vendors who refuse to score anything Low — “if we found it, it must matter” — which inflates the report’s apparent value while breaking your engineering team’s ability to triage.

The scoring rubric we use is published on the methodology page. CVSS as the technical floor, business impact as the prioritisation overlay, the result documented per finding so the reasoning survives the report being read in isolation. If every finding in a report is Critical or High, the prioritisation is broken — and the report is shipping you alarm, not ranked work.

What happens when we find something that can’t wait until the report

Critical findings — active exploit paths into production, exposed credentials, live remote code execution, exposed sensitive data — are reported within four hours of discovery, out-of-band of the regular reporting cadence. A phone call, an encrypted message to the named incident contact in the ROE, a written confirmation within the same window. The work continues in parallel with the buyer-side response.

The reason for the four-hour SLA is operational. A critical finding sitting unread in a draft report for three weeks is a liability the buyer is paying for. The discovery doesn’t make the issue safer; it makes the silence riskier.

Ask every vendor: “What’s your critical-finding SLA? Is it written into the SOW?” If the answer is vague, or the SLA only appears in the marketing material and not in the contract, the vendor is reserving the right to bury bad news. The SLA belongs in the SOW.
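If the SLA is in the SOW, compliance can be checked from two timestamps the vendor should already be recording: when the finding was discovered and when it was reported. A minimal sketch, assuming both are captured as timezone-aware datetimes; the function names and sample times are illustrative.

```python
from datetime import datetime, timedelta, timezone

CRITICAL_SLA = timedelta(hours=4)  # the four-hour window described above


def escalation_deadline(discovered_at: datetime) -> datetime:
    """Latest time a critical finding may be reported and still meet the SLA."""
    return discovered_at + CRITICAL_SLA


def sla_met(discovered_at: datetime, reported_at: datetime) -> bool:
    """True if the report reached the buyer inside the SLA window."""
    return reported_at <= escalation_deadline(discovered_at)


# Hypothetical discovery time, in UTC.
discovered = datetime(2025, 3, 4, 14, 30, tzinfo=timezone.utc)
print(escalation_deadline(discovered).isoformat())
print(sla_met(discovered, discovered + timedelta(hours=3, minutes=45)))
print(sla_met(discovered, discovered + timedelta(days=21)))
```

The last call models the finding that sits in a draft report for three weeks. It fails the check by a wide margin.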

Chained findings matter more than long lists of Mediums

A scanner-driven report flags individual findings from automated scans. An information disclosure here, a misconfigured CORS header there, a stale dependency over there. Each one rated Low or Medium on CVSS, each one buried in a long table. The buyer reads the table, decides the report is mostly noise, and the engagement is over.

A useful report does something different: it tells you how the individual findings chain together to compromise a real business asset, written as a paragraph-format narrative with reproduction steps for each link in the real-world attack chain.

Take a plausible chain. An information disclosure in the support portal leaks the internal staging environment’s hostname. The staging environment has weak authentication that the production environment was patched against four months ago but staging never was. Staging shares production secrets through its environment variables — a legacy convenience nobody removed. From staging, the production database read-replica is reachable on the internal network. Four findings, each individually CVSS-Medium or below; together, a Critical-business-impact path that exfiltrates customer data without touching the production application.

Scanners don’t do this. They flag the four findings individually and let the buyer connect the dots. Connecting the dots is the analyst’s job — the part of pen testing that cannot be automated, the part the buyer is actually paying for.
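Finding the chain is analyst work, but a written chain can at least be checked for coherence: each step should need only what the attacker already holds. The sketch below models the example chain above under that assumption. The capability labels ("staging access" and so on) and the `Step` structure are illustrative inventions, and the code does not discover attack paths; it only verifies that a documented one hangs together.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    finding: str
    requires: frozenset  # capabilities the attacker must already hold
    grants: frozenset    # capabilities gained by exploiting the finding


def first_broken_step(chain: list, start: set) -> Optional[str]:
    """Walk the chain in order. Return the first finding whose
    preconditions are not yet met, or None if every step follows."""
    held = set(start)
    for step in chain:
        if not step.requires <= held:
            return step.finding
        held |= step.grants
    return None


chain = [
    Step("Information disclosure in support portal",
         frozenset({"internet access"}), frozenset({"staging hostname"})),
    Step("Weak authentication on staging",
         frozenset({"staging hostname"}), frozenset({"staging access"})),
    Step("Production secrets in staging environment variables",
         frozenset({"staging access"}), frozenset({"production secrets"})),
    Step("Read-replica reachable from staging",
         frozenset({"staging access", "production secrets"}),
         frozenset({"customer data"})),
]

print(first_broken_step(chain, {"internet access"}))      # None: chain holds
print(first_broken_step(chain[1:], {"internet access"}))  # first step missing
```

Dropping the first step breaks the chain at the staging authentication finding, which is the point: remove one link and the Critical path no longer exists, even though the remaining findings are unchanged.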

A long list of Mediums with no narrative is a scanner export. A short list of attack paths with reproduction steps is a pen test.

No closure without retest

The Action Plan moves every Critical and High finding through five states: Open → In Progress → Remediated → Retested → Closed. The vendor does not certify closure on its own authority; it certifies closure after retesting the remediation in the same environment and against the same reproduction steps that produced the original finding.
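The five states behave like a small state machine in which Closed is reachable only through Retested. A minimal sketch follows; the `State` and `advance` names are illustrative, and the edge that sends a finding back from Remediated to In Progress after a failed retest is an assumption the text above does not spell out.

```python
from enum import Enum


class State(Enum):
    OPEN = "Open"
    IN_PROGRESS = "In Progress"
    REMEDIATED = "Remediated"
    RETESTED = "Retested"
    CLOSED = "Closed"


# Allowed moves. Remediated -> In Progress (failed retest) is an assumption.
ALLOWED = {
    State.OPEN: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.REMEDIATED},
    State.REMEDIATED: {State.RETESTED, State.IN_PROGRESS},
    State.RETESTED: {State.CLOSED},
    State.CLOSED: set(),
}


def advance(current: State, target: State) -> State:
    """Move a finding to the target state, rejecting any skipped stage."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


state = State.OPEN
for nxt in (State.IN_PROGRESS, State.REMEDIATED, State.RETESTED, State.CLOSED):
    state = advance(state, nxt)
print(state.value)  # Closed

try:
    advance(State.REMEDIATED, State.CLOSED)  # closure without retest
except ValueError as err:
    print(err)
```

The rejected transition is the one that matters: a tracker that permits Remediated to Closed directly lets closure be certified without the retest.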

The reason this matters is that remediation is the part of the engagement most prone to silent failure. A fix that compiles and passes the existing test suite is not necessarily a fix that closes the attack path. Engineering teams under deadline pressure routinely close a finding with a partial mitigation — a WAF rule that blocks the specific payload from the report, a one-off input validation that doesn’t generalise. The retest is the only verification that the buyer was actually given the security outcome they paid for.

A vendor who issues a final report without a retest cycle, or who certifies closure based on a buyer-supplied screenshot of the new code, is shipping you compliance theatre rather than security. Ask before you sign: “What’s your retest process? Is the retest priced into the engagement, or is it a separate line item I have to authorise after the fact?” The answer determines whether the engagement closes with verified outcomes or with a PDF.

Out-of-scope by design — what a pen test isn’t

A useful penetration test has explicit exclusions, defined in writing before any active testing starts. The constraints exist because the alternative — “test everything, however you want” — carries legal exposure for both sides, breaks scope discipline, and gives the buyer no way to evaluate the outcome.

The standard exclusions are concrete. No denial-of-service or volumetric testing against production. No destructive payloads — no data deletion, no encryption, no actions that mutate state in ways the buyer cannot undo. No social engineering of staff outside a pre-agreed, written scope (and never against staff who haven’t been informed at the leadership level). No testing of systems the buyer does not own — third-party SaaS, partner APIs, infrastructure operated by a hosting provider — even when those systems are reachable from the in-scope environment.

The Rules of Engagement document captures every exclusion plus the test windows, the authorisation tree, the incident contact, the data-handling rules, and the escalation contract. It is signed by both sides before any active testing begins. Skipping it isn’t a faster engagement; it’s an engagement with no legal floor underneath it.

A vendor who promises to “test everything” without a signed ROE is offering to take on your legal liability and theirs, on a handshake. Do not sign that engagement. The exclusions in the ROE are what make the rest of the work defensible.

Five questions to answer before issuing the RFP or SOW

Treat this as the buyer-side scoping checklist — and the core of what to ask a penetration testing company once you start evaluating candidates. If you can’t answer all five, you’re not ready to issue the penetration testing RFP, and any vendor who quotes anyway is pricing a guess.

What asset are you actually trying to test? Is it a single application or a portfolio of services? Are you testing the external attack surface, internal lateral movement, or both? Are you testing production, a production-equivalent staging environment, or a freshly provisioned test instance? The asset shapes the scope and the cost; vague answers here mean vague answers everywhere else.
What’s the engagement trigger? Customer security questionnaire, audit, post-incident review, new feature launch, ISO 27001 or SOC 2 or DORA or NIS 2 evidence cycle. The trigger determines who reads the report, what artefacts they need, and what the engagement has to defend against six months later. Different triggers, different scopes.
What’s out of scope, and is that documented? Production database mutation. Denial-of-service. Social engineering outside a pre-agreed window. Third-party SaaS you don’t own. If the exclusions only live in the head of one person on your side, they don’t exist as a contract.
What’s the timeline? A compressed timeline produces a scanner-driven report. An honest engagement runs five to fifteen working days of active testing for a typical web application plus API; multi-service, cloud-infrastructure, or red-team scopes take longer. If the vendor agrees to a two-day pen test, what you’ll get is two days of scans.
Who will read each artefact? The engineering team reads the Technical Report. The CISO and board sponsor read the Executive Risk Brief. The platform or SRE team owns the Action Plan. If you can’t name the human who will receive each artefact, you’re scoping a PDF, not an engagement.

If you want help working through these five, request a scoping call. The conversation is shorter than the document above.

Close

Three artefacts, five scoping questions, one retest contract. That’s the shape of a useful engagement.

At HackingByte our engagements are senior-led, with reproduction-grade evidence, attack-path narratives instead of scanner exports, and a retest cycle that closes the loop on every Critical and High finding. The methodology page walks through how we work, stage by stage. If you’re scoping your first paid penetration test or replacing a vendor whose last engagement underdelivered, the contact form is the shortest path to a conversation.

No fear marketing, no scare statistics, no certifications we don’t hold. Just the work, done seriously.

Frequently asked questions

How do I choose a penetration testing provider?

Compare providers on proof, not on day rate: honest scoping, three distinct artefacts (Technical Report, Executive Risk Brief, Action Plan), reproduction steps for every finding, prioritisation by business impact rather than raw CVSS score, a written critical-finding escalation SLA, and retest included. Ask every candidate the same penetration testing questions, and demand the answers in writing in the RFP or SOW — not just in the sales deck.

What does a useful penetration test produce?

Three artefacts — a Technical Report (engineering audience), an Executive Risk Brief (CISO + board), and an Action Plan (platform/SRE team) — plus retest verification for every Critical and High finding. The artefacts are paragraph-format, paired with reproduction-grade evidence for every finding.

How long does a penetration test take?

For a typical web application + API, 5-15 working days of active testing plus 5-10 working days of analysis and reporting. Compressed timelines (a 2-day pen test) signal a scanner-driven engagement, not a manual one. Multi-service, cloud-infrastructure, or red-team scope takes longer.

What’s the difference between a pen test and a vulnerability scan?

A vulnerability scan finds individual issues a scanner can detect (known CVEs, misconfigurations, outdated software). A penetration test chains findings into attack paths that compromise real business assets, reproduces each finding manually, and ranks the work by business impact. A useful pen test report contains attack-path narratives; a scan report contains a list.

How should pen test findings be scored?

CVSS v3.1 as the technical baseline, plus a business-impact overlay that accounts for asset criticality, blast radius, and exploitability in the target environment. Severity inflation (every finding scored Critical) is a quality defect and a sales tactic, not a security signal: it breaks remediation prioritisation.

What is a critical-finding escalation SLA?

A written contract that critical findings (active exploit path, exposed credentials, live remote code execution on production) are reported within 4 hours of discovery, out-of-band of the regular reporting cadence. The SLA should be written into the SOW. If it isn’t, the vendor reserves the right to bury bad news until the final report.

Do penetration tests include retest verification?

A useful one does. Every Critical and High finding is retested after remediation; the Action Plan tracks each finding from Open → In Progress → Remediated → Retested → Closed. Vendors who certify closure without retest are shipping compliance theatre.

What should be excluded from a penetration test scope?

By design, denial-of-service testing on production, destructive payloads, social engineering of staff outside a pre-agreed scope, and testing of systems the buyer does not own. The Rules of Engagement document defines exclusions in writing, signed before any active testing starts.
