A standard third-party questionnaire was built for software that does one predictable thing. AI vendors break that assumption: the same contract can put a model, an autonomous agent, and a hidden chain of sub-processors inside your environment. As a result, "who is accountable" stops being obvious, and that is exactly the question a vendor risk assessment exists to answer.
This workflow keeps the answer concrete. For every vendor, you name one accountable party per responsibility and ask for evidence that the vendor upholds its side. Work through the steps in order. The companion vendor risk reference holds the per-category evidence lists you will pull from in steps 3 and 4.
Five-step workflow
Tier the vendor before you spend effort on it
Assessment depth should track risk, not contract value. Score the vendor on three axes and let the highest score set the tier. A cheap agent with write access to production outranks an expensive read-only analytics tool.
| Axis | Ask |
|---|---|
| Data sensitivity | Does the vendor touch regulated, confidential, or customer data, or only low-sensitivity data? |
| Autonomy | Does it take actions on your systems on its own, run with a human in the loop, or only return information? |
| SRF depth | How deep does it reach: app features at L3, the platform at L4, or the model itself at L5? |
Assign the tier from the highest score:
- Tier 1 (critical): any axis is high. Requires a full assessment and a contractual responsibility matrix.
- Tier 2 (elevated): business data or human-in-the-loop app features.
- Tier 3 (standard): low-sensitivity, bounded, non-autonomous tools.
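The tiering rule can be sketched in a few lines of Python. The three-level `Level` scale and the `tier_vendor` function are assumptions made for illustration; the workflow only states that the highest axis score sets the tier. How an SRF layer maps to a level is left to the caller.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point scale applied to each axis."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def tier_vendor(data_sensitivity: Level, autonomy: Level, srf_depth: Level) -> int:
    """Return 1 (critical), 2 (elevated), or 3 (standard).

    The highest score across the three axes decides the tier,
    so a single HIGH axis is enough for Tier 1.
    """
    highest = max(data_sensitivity, autonomy, srf_depth)
    return {Level.HIGH: 1, Level.MEDIUM: 2, Level.LOW: 3}[highest]


# A cheap agent with write access to production: autonomy alone is HIGH.
agent_tier = tier_vendor(Level.LOW, Level.HIGH, Level.MEDIUM)

# A read-only analytics tool over business data: nothing scores HIGH.
analytics_tier = tier_vendor(Level.MEDIUM, Level.LOW, Level.LOW)
```

Under these assumptions the agent lands in Tier 1 and the analytics tool in Tier 2, regardless of what either contract costs.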
Place the vendor on the SRF
Match the vendor to one of the seven categories and read off the SRF layers it touches and the operating model it implies. The layer tells you which responsibilities are even in play; many failed assessments come from asking platform questions of a model provider, or model questions of a reseller.
A single vendor can occupy more than one category. A SaaS product with an autonomous agent and its own hosted model spans L3, L4, and L5. Assess each layer it reaches rather than forcing one label.
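One way to avoid asking the wrong layer's questions is to tag each question with a layer and select only those the vendor actually reaches. The sketch below assumes a hypothetical `Question` record and `questions_for` helper, and the sample question texts are invented for illustration; only the L3, L4, and L5 labels come from the workflow.

```python
from dataclasses import dataclass

# Layer labels as used in the tiering table.
LAYER_NAMES = {"L3": "app features", "L4": "platform", "L5": "model"}


@dataclass(frozen=True)
class Question:
    layer: str  # one of the keys in LAYER_NAMES
    text: str


def questions_for(vendor_layers: set[str], bank: list[Question]) -> list[Question]:
    """Return only the questions that match a layer the vendor touches."""
    unknown = vendor_layers - set(LAYER_NAMES)
    if unknown:
        raise ValueError(f"unknown SRF layers: {sorted(unknown)}")
    return [q for q in bank if q.layer in vendor_layers]


# Illustrative question bank (hypothetical wording).
bank = [
    Question("L3", "Which app features call the model?"),
    Question("L4", "How is tenant isolation enforced on the platform?"),
    Question("L5", "Where do the model weights come from?"),
]

# A vendor that only provides a model gets model questions only.
model_only = questions_for({"L5"}, bank)

# A product spanning all three layers is assessed at each layer.
full_stack = questions_for({"L3", "L4", "L5"}, bank)
```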
Split the accountability, one owner per responsibility
For each responsibility in the category, decide who owns it: the vendor or you. "Shared" is not a valid final answer. Where a duty looks joint, break it into the part the vendor controls and the part you control, and assign each part to one owner. This split becomes a column in your workpaper and settles disputes before an incident, not during one.
- Write the vendor's duties as commitments you can verify, not as marketing claims.
- Write your own duties down too. Tenant data classification, identity, override configuration, and output monitoring almost always stay with you.
- Flag any responsibility neither side will own. That gap is the finding.
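The one-owner rule lends itself to a simple check over the responsibility matrix. The following is a minimal sketch, assuming a hypothetical `Responsibility` record and `review_matrix` function; the owner labels `"vendor"` and `"customer"` and the sample rows are illustrative, not prescribed by the workflow.

```python
from dataclasses import dataclass
from typing import Optional

VALID_OWNERS = {"vendor", "customer"}


@dataclass
class Responsibility:
    name: str
    owner: Optional[str]  # None means neither side has accepted the duty


def review_matrix(rows: list[Responsibility]) -> list[str]:
    """Return one finding per row that breaks the one-owner rule."""
    findings = []
    for row in rows:
        if row.owner is None:
            # Nobody owns it: the gap itself is the finding.
            findings.append(f"GAP: no owner for '{row.name}'")
        elif row.owner not in VALID_OWNERS:
            # "shared" or any other label must be split into owned parts.
            findings.append(f"SPLIT NEEDED: '{row.name}' is marked '{row.owner}'")
    return findings


matrix = [
    Responsibility("Tenant data classification", "customer"),
    Responsibility("Prompt-injection defenses", "vendor"),
    Responsibility("Output monitoring", "shared"),
    Responsibility("Sub-processor incident notice", None),
]
findings = review_matrix(matrix)
```

Here the "shared" row is sent back to be split into a vendor part and a customer part, and the unowned row is reported as a gap.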
Demand evidence, not assertions
Pull the evidence list for the category from the reference page and send it as the ask. Match each attestation to the risk it actually covers: a SOC 2 report attests to security controls, but says nothing about model behavior, bias, or autonomy. ISO/IEC 42001 certification is the signal that a vendor runs a governed AI program. For Tier 1 vendors, insist on evidence of controls operating over time rather than a single annual snapshot.
The AI-specific asks are the ones checklists miss: prompt-injection and indirect-injection defenses, model and weight provenance with an AI bill of materials (AIBOM), agent permission and override controls, training-data provenance and IP indemnity, and the full sub-processor chain with data residency.
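Matching attestations to risks amounts to a coverage check: list the risks in scope, subtract what the received attestations cover, and ask for evidence on whatever remains. The sketch below is an assumption-laden illustration; the risk labels, the `COVERS` mapping, and `uncovered_risks` are hypothetical names, and the mapping encodes only what the text above says about each attestation.

```python
# What each attestation is taken to cover, per the paragraph above.
# Risk labels are hypothetical identifiers chosen for this sketch.
COVERS = {
    "SOC 2": {"security_controls"},
    "ISO 42001": {"ai_governance"},
}


def uncovered_risks(required: set[str], attestations: list[str]) -> list[str]:
    """Return the required risks that no received attestation covers."""
    covered = set()
    for name in attestations:
        # An attestation missing from COVERS contributes no coverage.
        covered |= COVERS.get(name, set())
    return sorted(required - covered)


required = {"security_controls", "model_behavior", "bias", "autonomy"}

# A vendor that sends only a SOC 2 report still owes AI-specific evidence.
remaining = uncovered_risks(required, ["SOC 2"])
```

Each entry left in `remaining` becomes a specific evidence request rather than being assumed covered by a general security report.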
The workpaper below carries every category's asks with columns for the response, the evidence received, and a pass or fail decision.
Write the boundary into the contract
An assessment that lives in a spreadsheet expires the day the vendor ships a new model version. Move the conclusions into the agreement so they survive change. The responsibility matrix, the evidence cadence, and the AI-specific commitments all belong in contract terms, not just in the security review.
- Attach the responsibility matrix and require notice when the vendor changes models, sub-processors, or autonomy defaults.
- Bind data-use limits: tenant data is not used to train shared models, with deletion certificates on exit.
- Allocate liability for AI-generated harm and secure IP indemnity for model output.
- Set the evidence cadence (continuous or annual) and reserve the right to re-assess on material change.
Re-tier on every material change. A vendor that adds an autonomous agent or starts hosting its own model has moved up the stack and earns a fresh assessment, whatever its last score was.
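A re-tier trigger can be sketched as a comparison between two snapshots of the vendor. The `VendorSnapshot` fields below are assumptions drawn from the changes named in this step (models, sub-processors, autonomy defaults, hosting its own model); the field names, sample values, and the `material_changes` helper are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VendorSnapshot:
    model_version: str
    sub_processors: frozenset[str]
    autonomy_default: str  # e.g. "human-in-the-loop" or "autonomous"
    hosts_own_model: bool


def material_changes(previous: VendorSnapshot, current: VendorSnapshot) -> list[str]:
    """Return the names of fields that differ between the two snapshots."""
    return [
        f.name
        for f in fields(previous)
        if getattr(previous, f.name) != getattr(current, f.name)
    ]


last_review = VendorSnapshot(
    model_version="v1",
    sub_processors=frozenset({"cloud-host-a"}),
    autonomy_default="human-in-the-loop",
    hosts_own_model=False,
)
today = VendorSnapshot(
    model_version="v2",
    sub_processors=frozenset({"cloud-host-a"}),
    autonomy_default="autonomous",
    hosts_own_model=False,
)

changes = material_changes(last_review, today)
needs_retier = bool(changes)
```

Any non-empty result is a cue to run the tiering step again rather than rely on the previous score.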