How to Audit AI Privacy Claims: Incognito Isn’t Private

A repeatable AI privacy audit method for verifying incognito claims, retention, training reuse, telemetry, and third-party sharing.

The dispute over Perplexity’s “incognito” claims is a useful reminder for security teams: a privacy label is not a privacy guarantee. If an AI vendor says chats are private, not retained, not used for training, or not shared with third parties, the only defensible response is to verify those claims with a repeatable vendor assessment. That means checking product behavior, network telemetry, contractual terms, and operational controls, not just reading a marketing page. In practice, the same discipline you’d use for regulated software can and should be applied to consumer AI tools, internal copilots, and “private mode” interfaces.

This guide turns that mindset into a practical AI privacy audit workflow engineers and privacy officers can use to validate data retention, training reuse, telemetry, and third-party sharing. Along the way, I’ll show where teams commonly miss hidden data flows, why “incognito” is often a UI promise rather than a technical control, and how to build evidence that stands up in procurement, security review, and compliance audits. If you’ve ever had to evaluate a cloud service before rollout, the same rigor applies here—especially when the vendor’s product depends on collecting prompts, context, and usage patterns at scale.

One reason this issue keeps recurring is that AI products tend to blur the line between product functionality and model improvement. That creates a special risk profile for organizations already trying to manage shadow IT, sensitive prompts, and compliance obligations. If you’re modernizing internal tooling, it helps to think in terms of architecture and operational controls, similar to the decision-making in modernizing a legacy app without a big-bang rewrite. The goal is not to ban AI; it’s to evaluate it like any other vendor whose systems can ingest confidential data, route it to subprocessors, and keep records longer than users expect.

1) Start With the Claim: What Exactly Is the Vendor Promising?

Separate marketing language from testable assertions

The first step in any privacy validation exercise is to rewrite vague claims into testable statements. “Incognito” should be decomposed into specific assertions such as: prompts are not stored beyond session completion, conversation logs are excluded from training pipelines, telemetry is minimized, and third-party analytics do not receive raw content. This is the same discipline used in regulated vendor reviews, where every marketing statement must map to a control, a contract clause, or a technical test. If the vendor cannot tell you which subsystem enforces the promise, you should assume the promise is aspirational.

Define the data classes before you test the system

Not all AI data is equally sensitive, and you cannot audit privacy properly unless you classify what is being sent. At minimum, separate prompts, files/uploads, system prompts, metadata, account identifiers, billing data, and diagnostic logs. In many environments, the prompt itself is the sensitive record, but metadata can be just as revealing because it can expose project names, IP addresses, user behavior, and timing patterns. A good review process often resembles a data inventory exercise, similar to how teams centralize assets in a data-platform-style inventory before trying to secure them.

Write down the privacy claims in plain English

Before you start packet capture or contractual redlining, create a one-page claim register. Example rows might include: “private chats are not used to train foundation models,” “deleted chats are removed within 30 days,” or “telemetry is limited to diagnostics and fraud prevention.” Then assign each claim an evidence type: UI behavior, network logs, vendor documentation, DPA, subprocessor list, or technical configuration. This prevents the audit from drifting into vibes-based evaluation, which is especially common when procurement teams rely too heavily on an RFP response instead of verifying the actual service behavior.
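A claim register like the one described above can be kept as structured data so that untested claims are easy to spot. The following is a minimal sketch, not a prescribed format: the `Claim` class, the `status` values, and the evidence-type identifiers are illustrative names chosen for this example.

```python
from dataclasses import dataclass

# Evidence types taken from the paragraph above; the identifiers are our own.
EVIDENCE_TYPES = {
    "ui_behavior", "network_logs", "vendor_docs",
    "dpa", "subprocessor_list", "technical_config",
}

@dataclass
class Claim:
    claim_id: str
    statement: str
    evidence_types: list
    status: str = "untested"  # assumed values: untested | verified | contradicted

    def __post_init__(self):
        # Reject typos early so every claim maps to a known evidence type.
        unknown = set(self.evidence_types) - EVIDENCE_TYPES
        if unknown:
            raise ValueError(f"unknown evidence types: {sorted(unknown)}")

register = [
    Claim("C1", "Private chats are not used to train foundation models",
          ["dpa", "vendor_docs"]),
    Claim("C2", "Deleted chats are removed within 30 days",
          ["ui_behavior", "network_logs"]),
    Claim("C3", "Telemetry is limited to diagnostics and fraud prevention",
          ["network_logs"]),
]

def untested(claims):
    """Return the IDs of claims that still have no test result."""
    return [c.claim_id for c in claims if c.status == "untested"]
```

Calling `untested(register)` at the end of an audit gives a quick list of claims that were accepted without evidence.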

2) Build an Audit Methodology That Repeats

Use a four-layer model: UI, network, storage, contract

The most reliable AI privacy audit methodology works across four layers. First, verify the user interface and settings, because many vendors expose toggles that change retention or training behavior. Second, inspect network telemetry to see what leaves the client and where it goes. Third, test storage behavior by submitting known markers, then checking whether they persist after deletion or expiry windows. Fourth, compare all of that against the contract stack: MSA, DPA, privacy policy, subprocessors, and security addendum. This layered model is more robust than asking a sales engineer for assurances, and it makes your conclusions defensible if legal or compliance later asks how the vendor was approved.

Set up a test account and a clean environment

Auditing privacy claims requires discipline in the lab. Use a dedicated test tenant, a fresh browser profile, and a controlled workstation or VM so cookies, extensions, and corporate SSO artifacts do not contaminate results. If the AI product has mobile apps or desktop clients, test those separately because their telemetry patterns often differ from the browser version. For teams that already maintain structured test procedures for service onboarding, this should feel similar to a hosting scorecard or acceptance checklist: you are measuring real behavior, not assuming consistency across surfaces.

Establish pass/fail criteria before testing

One of the most common mistakes is to inspect traffic first and decide what it means later. That approach produces confirmation bias. Instead, define clear acceptance thresholds: for example, no raw prompt text may be sent to analytics endpoints; session tokens may be transmitted but not message contents; deleted conversations must disappear from search within a specified SLA; and training opt-out must be effective by default for enterprise tenants. If the vendor cannot meet a threshold, you can still document compensating controls, but the result should be explicit rather than ambiguous.
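One way to commit to thresholds before looking at traffic is to write them down as data and evaluate observations against them afterwards. This sketch assumes each criterion reduces to a yes/no observation; the criterion names are hypothetical labels for the examples in the paragraph above.

```python
# Expected value for each observation, fixed BEFORE any testing starts.
# Names are illustrative, not a standard vocabulary.
CRITERIA = {
    "prompt_text_sent_to_analytics": False,
    "message_contents_sent_with_session_token": False,
    "deleted_chat_searchable_after_sla": False,
    "training_opt_out_default_for_enterprise": True,
}

def evaluate(observations: dict) -> dict:
    """Compare observed values with the pre-registered expectations."""
    results = {}
    for name, expected in CRITERIA.items():
        if name not in observations:
            results[name] = "not_tested"  # explicit gap, not a silent pass
        elif observations[name] == expected:
            results[name] = "pass"
        else:
            results[name] = "fail"
    return results
```

Recording `not_tested` separately keeps an unexamined criterion from being mistaken for a passing one.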

3) Test Data Retention, Deletion, and Replay Behavior

Submit unique canary data

To validate data retention, send unique strings you can later search for in network logs, the UI, or exported data. Use canary phrases that are unlikely to appear elsewhere, such as random UUID-like markers or a short synthetic sentence with a date stamp. Then delete the conversation and wait for the vendor’s documented retention window, if any. The point is not just to confirm the delete button works; it’s to determine whether “delete” means immediate erasure, delayed purge, or only removal from the visible interface.
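Canary markers are easy to generate with the standard library. The sketch below is one possible format, assuming a UUID-based token plus a date stamp; the `CANARY-` prefix and the field names are arbitrary choices for illustration.

```python
import uuid
from datetime import datetime, timezone

def make_canary(label: str) -> dict:
    """Build a unique, searchable marker to paste into a test prompt."""
    token = uuid.uuid4().hex  # 32 hex characters, unlikely to occur elsewhere
    created = datetime.now(timezone.utc)
    text = f"CANARY-{label}-{token} sent {created:%Y-%m-%d}"
    return {
        "label": label,
        "token": token,
        "text": text,
        "created_utc": created.isoformat(),
    }

def contains_canary(haystack: str, canary: dict) -> bool:
    """Check whether exported data, a log line, or a page contains the token."""
    return canary["token"] in haystack
```

Keep the returned record with your audit notes so that a later search of exports or logs can be tied back to the exact prompt and date.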

Look for secondary copies and asynchronous processing

AI systems frequently create hidden copies through caching, indexing, abuse review pipelines, and observability tools. Even when a chat is removed from the user interface, the data may remain in operational logs, model safety queues, or support tooling. That’s why you must ask whether the vendor supports true deletion across all processing layers or merely hides the content from the user while retaining it elsewhere. This is where a disciplined review resembles other high-stakes software workflows, like approval workflows for signed documents, where the question is not whether a button exists but whether state changes propagate everywhere they must.

Check retention defaults and overrides

Some vendors offer enterprise controls that shorten or disable retention, but the defaults can still be dangerous if you do not explicitly configure them. Verify whether retention settings apply globally, per workspace, per user role, or only to future content. Also check whether support tickets, abuse investigations, and billing records are carved out of the retention promise, because that carve-out is often where sensitive context survives longest. If deletion only applies to the end-user UI while backend logs persist for 180 days, the “private” claim is materially weakened even if the product remains usable.
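The timing of a retention check matters: finding a canary inside the documented window proves nothing, while finding it after the window is a real finding. The sketch below encodes that logic under the assumption that the vendor documents retention in whole days; the function names and result labels are illustrative. It can only cover places the auditor can actually search (UI, search, exports), not the vendor’s backend.

```python
from datetime import datetime, timedelta, timezone

def purge_deadline(deleted_at: datetime, retention_days: int) -> datetime:
    """Earliest moment at which the vendor's documented window has elapsed."""
    return deleted_at + timedelta(days=retention_days)

def retention_finding(deleted_at, retention_days, checked_at, canary_found):
    """Classify a single recheck of a deleted canary conversation."""
    if checked_at < purge_deadline(deleted_at, retention_days):
        return "too_early"  # still inside the window: no conclusion possible
    return "fail" if canary_found else "pass"

# Example: a conversation deleted on 1 January with a 30-day documented window.
deleted = datetime(2024, 1, 1, tzinfo=timezone.utc)
early = retention_finding(deleted, 30, datetime(2024, 1, 15, tzinfo=timezone.utc), True)
late = retention_finding(deleted, 30, datetime(2024, 2, 5, tzinfo=timezone.utc), True)
```

A `pass` here only means the canary was not visible where you looked, so it should be paired with the vendor’s written answer about backend logs and backups.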

4) Determine Whether Prompts Are Used for Training Reuse

Differentiate model training, fine-tuning, and human review

The phrase “training reuse” is often overloaded. A vendor may say they do not train the core model on your data, yet still use prompts for human review, safety tuning, or evaluation datasets. Those are different pathways, but from a privacy and confidentiality standpoint they all matter. You need to know whether data is excluded from training entirely, whether it can be used in pseudonymized or aggregated form, and whether human reviewers can see raw content. If the answer depends on the service tier, record that explicitly because procurement frequently signs up for one tier while users adopt another.
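Because the answer can differ by pathway and by tier, it helps to record it as a small matrix instead of a single yes/no. This is a sketch with made-up tier names and answers; `None` stands for a question the vendor has not answered in writing, and the sketch treats that as exposure.

```python
# Pathways named in the paragraph above.
PATHWAYS = ("model_training", "safety_tuning", "human_review", "evaluation_datasets")

# Hypothetical answers for illustration only: True = data is used,
# False = data is excluded, None = vendor has not confirmed either way.
tier_matrix = {
    "consumer":   {"model_training": True,  "safety_tuning": True,
                   "human_review": True,    "evaluation_datasets": None},
    "enterprise": {"model_training": False, "safety_tuning": None,
                   "human_review": True,    "evaluation_datasets": False},
}

def exposed_pathways(tier: str) -> list:
    """List pathways where content is used or the answer is still unknown."""
    answers = tier_matrix[tier]
    return [p for p in PATHWAYS if answers.get(p) is not False]
```

Treating unanswered questions as exposure is a conservative design choice; a team with a higher risk tolerance might track them as a separate category instead.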

Ask for technical and contractual evidence

Do not accept a policy sentence by itself. Ask for the mechanism: is opt-out enforced at the ingestion layer, or does the vendor simply flag records after collection? Are data pipelines segmented so enterprise tenant content cannot enter consumer training corpora? Is there a documented suppression list, or only a broad promise in the privacy policy? This is the same sort of contract diligence you’d apply when evaluating whether an AI vendor is fit for sensitive deployments, a process closely related to vendor assessment for regulated environments.

Test for “training by proxy” via feedback buttons

Vendors sometimes avoid saying “we train on your chats,” but they still route thumbs-up/down feedback, annotations, or support escalations into improvement pipelines. During your audit, click feedback controls, submit a bug report, and inspect whether those artifacts are tagged as reviewable content. Ask whether support personnel can attach transcripts to internal cases and whether those cases are excluded from training or only from model fine-tuning. In regulated settings, this distinction matters because confidential details can leak into systems that are less controlled than the primary product.

5) Inspect Telemetry and Third-Party Sharing

Capture network traffic from the client

Telemetry is the invisible layer that often breaks privacy promises. Capture browser and app traffic with your preferred proxy, endpoint agent, or firewall logs, and identify every domain the client contacts during login, chat submission, deletion, export, and settings changes. Look not just for the main API endpoint but also analytics, error monitoring, content delivery, fraud detection, and session replay services. In a modern AI stack, telemetry may be spread across several vendors, so your audit should treat each destination as a distinct processing relationship rather than assuming the primary AI provider is the only party involved.
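If you export the capture as a HAR file (browser dev tools and most intercepting proxies can do this), listing every contacted host takes a few lines. The sketch below relies only on the standard HAR layout of `log.entries[].request.url`; the hostnames in the sample are placeholders, not real vendor endpoints.

```python
import json
from collections import Counter
from urllib.parse import urlsplit

def load_har(path: str) -> dict:
    """Read a HAR export from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def hosts_contacted(har: dict) -> Counter:
    """Count requests per destination host in a HAR capture."""
    counts = Counter()
    for entry in har.get("log", {}).get("entries", []):
        host = urlsplit(entry["request"]["url"]).hostname
        if host:
            counts[host] += 1
    return counts

# Tiny in-memory sample with placeholder hostnames.
sample_har = {"log": {"entries": [
    {"request": {"method": "POST", "url": "https://api.vendor.example/v1/chat"}},
    {"request": {"method": "POST", "url": "https://api.vendor.example/v1/delete"}},
    {"request": {"method": "POST", "url": "https://metrics.analytics.example/collect?e=1"}},
]}}
sample_counts = hosts_contacted(sample_har)
```

Running one capture per action (login, chat submission, deletion, export, settings change) keeps it clear which user action triggered which destination.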

Identify third parties by function, not just name

When you find a third-party destination, classify the purpose: product analytics, crash reporting, A/B testing, identity management, abuse detection, or customer support. Then determine whether any raw prompt text, identifiers, or device fingerprints are included. This matters because a vendor can truthfully claim they do not “share data” in a marketing sense while still transmitting prompt content to a subprocessor chain for operational tasks. For a broader perspective on hidden data dependencies, the investigative mindset described in company database analysis is a good model: you are trying to uncover relationships that are not obvious from the front-end experience.
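To check whether raw prompt text reaches a third party, you can search request URLs and bodies in a HAR capture for a known marker string and label each hit by purpose. This is a rough sketch: the hostnames, the `PURPOSES` map, and the marker are all hypothetical, and a plain substring search will miss payloads that are compressed, base64-encoded, or otherwise transformed.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration; fill these in from your own capture.
PRIMARY_HOSTS = {"api.vendor.example"}
PURPOSES = {
    "replay.thirdparty.example": "session replay",
    "metrics.thirdparty.example": "product analytics",
}

def marker_leaks(har: dict, marker: str) -> dict:
    """Map each non-primary host that received the marker to its purpose."""
    leaks = {}
    for entry in har.get("log", {}).get("entries", []):
        request = entry["request"]
        host = urlsplit(request["url"]).hostname
        body = request.get("postData", {}).get("text", "") or ""
        if host in PRIMARY_HOSTS:
            continue  # the main API is expected to receive the prompt
        if marker in body or marker in request["url"]:
            leaks[host] = PURPOSES.get(host, "unclassified")
    return leaks
```

Any host that comes back as `unclassified` is a destination you have not yet mapped to a function, which is a finding in its own right.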

Compare the privacy policy to observed traffic

Once you have a packet map, compare it line by line with the privacy policy and subprocessor page. If the policy says “we share only limited metadata,” but the client sends prompt snippets to a session replay vendor, the discrepancy is material. If the policy names one analytics provider but your logs show three additional endpoints, you may have an issue of stale disclosure or undisclosed subprocessing. Either way, the vendor should be able to explain the data flow, and if they cannot, that is itself a risk finding.
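The comparison itself is a set difference between what the vendor discloses and what the client actually contacts. The sketch below assumes you have reduced the disclosure to a list of domains, which is a manual step since subprocessor pages usually name companies, not hostnames; all domains shown are placeholders.

```python
def is_covered(host: str, disclosed_domains) -> bool:
    """True if host equals a disclosed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in disclosed_domains)

def compare_disclosure(disclosed_domains, observed_hosts) -> dict:
    """Split observed hosts into disclosed and undisclosed destinations."""
    disclosed = set(disclosed_domains)
    observed = set(observed_hosts)
    return {
        "undisclosed": sorted(h for h in observed if not is_covered(h, disclosed)),
        "matched": sorted(h for h in observed if is_covered(h, disclosed)),
        "disclosed_not_seen": sorted(
            d for d in disclosed if not any(is_covered(h, {d}) for h in observed)
        ),
    }

report = compare_disclosure(
    disclosed_domains=["vendor.example", "analytics.example"],
    observed_hosts=["api.vendor.example", "replay.thirdparty.example"],
)
```

An entry under `disclosed_not_seen` is not a problem by itself; it may simply mean your test did not exercise the feature that uses that subprocessor.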

Audit Area	What to Check	Evidence Source	Common Failure Mode	Risk Level
Retention	How long prompts, logs, and backups persist	Policy, admin console, deletion tests	UI deletes only the visible chat	High
Training reuse	Whether content enters model improvement pipelines	DPA, product docs, support responses	Enterprise opt-out is partial or delayed	High
Telemetry	Endpoints, payloads, identifiers, and replay tools	Proxy logs, firewall logs, SDK docs	Prompt fragments sent to analytics	High
Third-party sharing	Subprocessors, support vendors, cloud platforms	Subprocessor list, contract exhibits	Undocumented vendors in client traffic	Medium-High
Deletion integrity	Whether deletion reaches backups and derived stores	Deletion tests, legal terms, engineering response	Soft-delete with indefinite backup retention	High

6) Perform Contract Checks That Match the Technical Reality

Read the DPA like a systems engineer

A privacy assessment is incomplete until the legal documents align with observed behavior. Read the DPA, MSA, and privacy addendum as if they were architecture specs: what data is processed, for what purpose, by whom, and under what retention limits? Make sure the definitions of “customer data,” “service data,” and “usage data” are not broad enough to swallow everything you care about. If the vendor reserves the right to use service data for “improving products,” that language may be too broad unless it explicitly excludes prompt content or enterprise data.

Check subprocessors, transfer mechanisms, and breach notice terms

Third-party sharing is often concealed in contract appendices. Verify whether the vendor publishes a current subprocessor list, how changes are notified, and whether you have a right to object. For cross-border data transfers, confirm the legal mechanism and whether it actually applies to the AI workload in question, not just the parent company’s generic stack. You should also review breach notification timelines, because an AI product that stores sensitive prompts without clear incident response obligations can create a reporting problem long before any technical exploit appears.

Demand written answers to ambiguous terms

If the contract says the vendor may “retain logs for security and debugging,” ask which logs, for how long, and whether prompts can appear in them. If the policy says “we may share with trusted partners,” ask who the partners are and what data they receive. Ambiguous clauses are not inherently bad, but they are not approval-ready until they are narrowed or backed by operational evidence. A strong internal review process often mirrors the same discipline used in versioning document automation templates: precision matters because small language changes can create real downstream differences.
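A simple text scan can help build the list of clauses that need written clarification. This is a rough triage aid under obvious assumptions, not legal analysis: the phrase list is taken from the examples in this section and should be extended by counsel, and the sentence splitting is deliberately naive.

```python
import re

# Phrases drawn from the examples in this section; extend as needed.
AMBIGUOUS_PHRASES = [
    "improving products",
    "trusted partners",
    "security and debugging",
]

def flag_ambiguous(text: str) -> list:
    """Return (phrase, sentence) pairs for sentences that need follow-up."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    findings = []
    for sentence in sentences:
        lowered = sentence.lower()
        for phrase in AMBIGUOUS_PHRASES:
            if phrase in lowered:
                findings.append((phrase, sentence))
    return findings

excerpt = (
    "Vendor may retain logs for security and debugging. "
    "Customer data is encrypted at rest. "
    "We may share with trusted partners."
)
questions = flag_ambiguous(excerpt)
```

Each flagged sentence becomes a line item in the questions you send the vendor: which logs, for how long, which partners, and what data.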

7) Build a Repeatable Vendor Assessment Workflow

Turn the audit into a checklist and scorecard

Once you’ve validated one AI platform, convert the process into a reusable checklist. Include sections for product scope, data categories, retention, training reuse, telemetry, subprocessors, deletion behavior, and contract alignment. Assign weighted scores if your organization needs a quick go/no-go decision, but preserve narrative findings for nuance. If you need a model for structured evaluation, a practical AI vendor assessment checklist is a solid starting point, especially when you need to compare several tools under the same rubric.
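A weighted scorecard can be as simple as a dictionary of weights and a threshold. In the sketch below the weights, the 0 to 5 scale, the 3.5 threshold, and the per-area floor are all assumptions made for illustration; your organization should set its own.

```python
# Integer percentage weights (sum to 100) for the checklist areas; assumed values.
WEIGHTS = {
    "retention": 25,
    "training_reuse": 25,
    "telemetry": 20,
    "subprocessors": 10,
    "deletion_behavior": 10,
    "contract_alignment": 10,
}

def weighted_score(scores: dict) -> float:
    """Combine per-area scores (assumed 0-5 scale) into one weighted number."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[area] * scores[area] for area in WEIGHTS) / 100

def decision(scores: dict, threshold: float = 3.5, floor: int = 2) -> str:
    """Go/no-go: any single area below the floor blocks approval outright."""
    if min(scores[area] for area in WEIGHTS) < floor:
        return "no-go"
    return "go" if weighted_score(scores) >= threshold else "no-go"
```

The floor exists so that a strong average cannot hide one unacceptable area, such as a tool that scores well everywhere except retention.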

Document exceptions and compensating controls

Not every finding should trigger a rejection. Some tools may retain logs longer than ideal but allow enterprise-wide redaction, access restrictions, and data localization controls. Others may use telemetry extensively but only in pseudonymized form and only with subprocessors bound by strict contractual terms. The key is to document the exception clearly, assign an owner, and define compensating controls such as restricted prompt use, redaction training, or technical gateway policies. This keeps the review practical instead of dogmatic and helps security teams support business adoption without pretending risk disappears.

Feed results into procurement and policy enforcement

A vendor assessment is only useful if it changes behavior. Feed your findings into procurement, legal review, security architecture, and acceptable-use policy. If the tool fails the audit, block rollout or limit it to low-risk use cases. If it passes with caveats, record the caveats in the approved-tool register and set a re-review date. In larger programs, this governance rhythm should feel similar to how operations teams manage product changes in operate-vs-orchestrate decision frameworks: standardize the path, but keep room for exceptions when the risk profile changes.
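An approved-tool register with re-review dates can start as a short script before it becomes a GRC workflow. The sketch below assumes a fixed review interval of 180 days, which is an arbitrary default; the class, field names, and sample entry are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ApprovedTool:
    name: str
    approved_on: date
    caveats: list
    review_interval_days: int = 180  # assumed default, adjust per risk tier

    @property
    def next_review(self) -> date:
        return self.approved_on + timedelta(days=self.review_interval_days)

def due_for_review(tools, today: date) -> list:
    """Names of tools whose re-review date has arrived or passed."""
    return [t.name for t in tools if t.next_review <= today]

tool_register = [
    ApprovedTool("ExampleChat", date(2024, 1, 1),
                 ["backend logs kept longer than UI history"]),
]
```

A scheduled job that calls `due_for_review(tool_register, date.today())` is enough to stop caveated approvals from quietly becoming permanent.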

8) Practical Red Flags and What They Usually Mean

“Private mode” is only a UI label

Some products present incognito-like modes that stop chat history from showing in the account UI but do not change backend retention or operational processing. That can still be useful for local device hygiene, but it is not a privacy guarantee. If the vendor cannot show you the underlying state change in logs, settings, or contract terms, treat the label as a convenience feature, not a confidentiality control. The same caution applies whenever a product frames a feature as “protection” without specifying whether it affects storage, access, or reuse.

Telemetry is necessary for abuse defense—but should be minimized

It is reasonable for AI systems to collect some telemetry for anti-abuse, rate limiting, fraud detection, and service reliability. The red flag is when that telemetry includes content payloads, stable identifiers, or session replay data beyond what is necessary for defense. Teams should insist on minimization, short retention, and role-based access to logs. If you want a helpful analogy, think of a diagnostics channel as a narrowly scoped maintenance window, not a permanent recording system.

Policy language lags implementation

One of the most common findings in an AI privacy audit is stale documentation. The product has changed, new analytics vendors were added, or a training pipeline was modified, but the privacy policy still reflects an older architecture. That mismatch can create real compliance exposure and, just as importantly, it destroys trust. When policy and practice diverge, the safest assumption is that the observed behavior, not the documented one, is what actually governs data handling.

Pro Tip: If you can’t explain a vendor’s data lifecycle in one paragraph—where prompts enter, where they are stored, who can see them, when they are deleted, and whether they can be reused for training—you probably do not understand the risk well enough to approve the tool.

9) A Practical Playbook for Privacy Officers and Engineers

For privacy officers: focus on accountability

Privacy officers should aim to make the audit defensible, not just thorough. Keep a dated record of claims, evidence, exceptions, and approvals. Require written confirmation for ambiguous points, and ensure the DPA and subprocessor list reflect the actual service. For organizations with sensitive customer data, the threshold for approval should include not only policy compliance but operational transparency and termination rights if the vendor materially changes processing behavior.

For engineers: focus on observability

Engineers should make the invisible visible. Use endpoint captures, browser dev tools, proxy logs, and controlled canary inputs to trace the system’s behavior. If possible, automate repeat checks so that a new vendor build or browser update cannot silently change telemetry or retention characteristics. This kind of rigor is similar to the discipline used in AI wearables engineering, where battery, latency, and privacy all have to be measured rather than assumed.
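Repeat checks can be automated by saving the set of hosts seen during an approved audit and diffing each new capture against it. This is a minimal sketch that tracks hostnames only; the file format and function names are assumptions, and a fuller version would also track payload fields.

```python
import json

def save_baseline(path: str, hosts) -> None:
    """Persist the hosts observed during the approved audit."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sorted(set(hosts)), fh, indent=2)

def load_baseline(path: str) -> set:
    """Load a previously saved baseline."""
    with open(path, encoding="utf-8") as fh:
        return set(json.load(fh))

def detect_drift(baseline_hosts, current_hosts) -> dict:
    """Report destinations that appeared or disappeared since the baseline."""
    baseline, current = set(baseline_hosts), set(current_hosts)
    return {
        "added": sorted(current - baseline),
        "removed": sorted(baseline - current),
    }

def has_new_destinations(baseline_hosts, current_hosts) -> bool:
    """True when a new capture contacts hosts the approved audit never saw."""
    return bool(detect_drift(baseline_hosts, current_hosts)["added"])
```

A new entry under `added` after a vendor release is a reasonable trigger for a manual re-audit of telemetry.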

For procurement: make privacy part of selection criteria

Procurement should not treat privacy as a legal checkbox at the end of the process. Make it a scored requirement alongside security, supportability, and cost. Require vendors to answer specific questions about retention, training exclusion, telemetry, and third-party sharing before shortlisting. That changes the conversation from “Can we trust this promise?” to “Can you show us the control that makes the promise true?”

10) FAQ and Closing Guidance

FAQ: How do I know if an AI vendor’s incognito mode is real?

Check whether the mode changes backend retention, training reuse, and logging behavior—not just the UI history list. You need evidence from policy, admin settings, and network behavior to confirm that the mode does more than hide chats locally.

FAQ: What evidence should I collect during an AI privacy audit?

Collect screenshots of settings, export the privacy policy and DPA, capture network traffic, record retention tests with canary text, and document any vendor support responses. Save timestamps and version numbers so your evidence can be reproduced later.
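To make evidence reproducible, it helps to record a hash, a capture time, and the product version for every artifact. The sketch below uses SHA-256 from the standard library; the record layout and the sample file name are illustrative choices, not a required schema.

```python
import hashlib
from datetime import datetime, timezone

def evidence_record(name: str, content: bytes, product_version: str) -> dict:
    """Describe one evidence artifact so it can be verified later."""
    return {
        "name": name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "captured_utc": datetime.now(timezone.utc).isoformat(),
        "product_version": product_version,
    }

def verify(record: dict, content: bytes) -> bool:
    """Confirm that stored content still matches the recorded hash."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]

# Example with in-memory bytes; in practice, read the exported file in binary mode.
policy_bytes = b"Privacy policy text exported during the audit"
record = evidence_record("privacy_policy.txt", policy_bytes, "web client, build unknown")
```

Storing these records alongside the artifacts lets a later reviewer confirm that a policy export or traffic capture has not been altered since the audit.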

FAQ: Is telemetry always a privacy problem?

No. Telemetry can be legitimate for security, reliability, and abuse prevention. The issue is whether it is minimized, whether it contains content payloads, how long it is retained, and whether third parties receive more data than necessary.

FAQ: Can I rely on a vendor’s “no training” statement?

Only if it is backed by contract language, a documented technical control, and a process that prevents data from entering model-improvement pipelines. If the vendor cannot show how the exclusion works, treat the statement as incomplete.

FAQ: What’s the biggest mistake teams make in AI vendor assessment?

The biggest mistake is trusting policy text without testing the actual product. The second biggest mistake is failing to align legal terms with observed telemetry and retention behavior. A real assessment needs all three: contracts, technical evidence, and governance.

Ultimately, “incognito” is not a substitute for privacy engineering. If your organization plans to use AI tools with user data, customer data, source code, or internal documents, treat the vendor as you would any other high-impact platform: verify retention, confirm training boundaries, inspect telemetry, map every third party, and document the controls that make the product safe enough to use. If you build this into your standard intake process, you’ll avoid the common trap of discovering a privacy problem only after a lawsuit, incident, or employee leak forces the issue into the open.

For teams building a durable program, the best next step is to combine product review with governance and periodic revalidation. That is especially true when the vendor updates features or changes infrastructure, because privacy guarantees can decay faster than most procurement cycles. If you want to extend this work into a broader governance model, revisit a structured AI vendor assessment, compare the result against your internal approval workflows, and treat each major product update as a new audit trigger. In other words: assume nothing, verify everything, and keep the evidence.

AI in Wearables: A Developer Checklist for Battery, Latency, and Privacy - A practical framework for balancing performance and data protection in connected devices.
A Checklist for Evaluating AI and Automation Vendors in Regulated Environments - A strong companion guide for formal procurement and risk review.
Benchmarking Web Hosting Against Market Growth: A Practical Scorecard for IT Teams - Useful for learning how to compare technical offerings with a repeatable scorecard.
How to Build an Approval Workflow for Signed Documents Across Multiple Teams - A useful model for documenting approvals and exception handling.
How to Modernize a Legacy App Without a Big-Bang Cloud Rewrite - Helpful for teams thinking about secure migration and controlled change management.