Sep 28, 2025·8 min read

Test data rules for AI vendors in staging and demos

Test data rules for AI vendors help teams keep demos and staging useful while removing customer names, account data, logs, and screenshots.

Why this gets messy fast

It usually starts with a reasonable request: "Can you share a real example?" Vendors ask because real prompts, tickets, logs, and screenshots show how your team actually works. That context helps. It also pushes people toward the fastest option, which is often live customer material.

Most teams do not share sensitive data because they are careless. They do it because a demo is tomorrow, the vendor wants more context, and nobody wants to spend two hours rebuilding an example from scratch. So someone pastes a support thread into a prompt, exports a few logs, or drops a screenshot into a slide deck.

That is when things go wrong. The risk is not just the obvious customer name in a ticket. Small details leak too. A single log line can expose an email address, internal URL, order ID, IP address, or a token-shaped string that should never leave your system.

Screenshots are even trickier because people skim them. One image can reveal browser tabs, Slack names, account balances, timestamps, internal project names, and the exact page path in your app. Crop one area and you might still leave a customer email in the sidebar or a real hostname in the address bar.

Staging does not fix this by itself. Teams often treat staging as safe because it is "not production." In practice, staging often contains copied production records, old backups, or debug logs from real incidents. If someone seeded staging with live data six months ago, that same exposure can show up in today's demo.

The same few mistakes keep repeating. People paste customer text into prompts so the model can "see the tone." Engineers share raw logs because they want faster troubleshooting. Sales and product teams use screenshots from internal accounts that still show real names. And hidden details like timestamps, ticket numbers, or full URLs go unnoticed until the file is already shared.

The hard part is that useful examples often look a lot like unsafe ones. A real support ticket gives better context than a fake one. A real log shows the bug faster than a rewritten sample. Under time pressure, the line between helpful and risky gets very thin.

Once one real detail gets into a prompt, screenshot, or log bundle, it tends to spread. It shows up in follow-up emails, demo docs, recordings, and chat threads. The wider reuse is usually the real problem, not the first copy.

What to strip from shared material

Start with anything that points to a real person or company. If a vendor can tell who the customer is from a prompt, log, screenshot, or sample record, the material is not clean enough.

Names, email addresses, phone numbers, street addresses, company names, and profile photos should go first. Teams usually catch those. They miss the details around them, like a support note that says "Spoke with Jenna from Acme at 4:15" or a screenshot with a signed-in user menu in the corner.

IDs matter too. Account numbers, order IDs, invoice numbers, subscription references, shipment codes, ticket numbers, and internal customer IDs can all tie a sample back to a real case. Even if the vendor cannot access your systems, those values still map to real business activity. Replace them with fake but realistic values that keep the same format.
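
One way to keep the format while dropping the real value is to derive each fake ID from the real one with a keyed hash, then map the result back onto the same character classes. The sketch below is only an illustration: the function name `fake_id`, the sample order ID, and the key are all made up, and the key is assumed to stay outside the shared pack.

```python
import hashlib
import hmac
import string


def fake_id(real_id: str, key: bytes) -> str:
    """Return a made-up ID with the same length and character classes as real_id."""
    digest = hmac.new(key, real_id.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(real_id):
        b = digest[i % len(digest)]  # reuse digest bytes for very long IDs
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        else:
            out.append(ch)  # keep separators such as "-" or "_" in place
    return "".join(out)


# Hypothetical order ID and key, for illustration only.
sample = fake_id("ORD-2024-001234", key=b"hypothetical-key-kept-outside-the-pack")
```

Because the same input and key always give the same output, one real ID turns into one fake ID everywhere it appears. A generated value can still match some other real ID by chance, so it is worth checking the results against your real ID ranges before sharing.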

Secrets need even stricter handling. Remove API keys, access tokens, session cookies, private URLs, webhook secrets, database connection strings, and anything copied from headers or config files. One pasted log line can expose more than a whole customer profile. Teams often focus on personal data and forget machine credentials.
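
A rough scan can flag the most common machine credentials before a person reads the log. The patterns below are assumptions chosen for illustration, not a complete list, and the names `PATTERNS` and `scan_lines` are hypothetical. Treat a clean result as "nothing obvious found," not as approval.

```python
import re

# Illustrative patterns only; real logs will need patterns tuned to your systems.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "bearer_token": re.compile(r"\bbearer\s+[A-Za-z0-9._~+/=-]{16,}", re.IGNORECASE),
    "url_credentials": re.compile(
        r"\b[a-z][a-z0-9+.-]*://[^\s/:@]+:[^\s/@]+@", re.IGNORECASE
    ),
    "key_value_secret": re.compile(
        r"\b(?:api[_-]?key|secret|token|password|cookie)\b\s*[=:]\s*\S+",
        re.IGNORECASE,
    ),
}


def scan_lines(lines):
    """Return (line_number, label) for every pattern that matches a line."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

A line can trigger more than one label. A connection string with a password, for example, may also look like an email address to the first pattern. That overlap is harmless here, because every flagged line needs a human look anyway.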

Free text needs a rewrite, not a simple search and replace. A real support message might mention a child's name, a clinic, a refund amount, or a city block. Redacting only the name still leaves a clear trail. Keep the problem, but change the story. "My order for 12 monitors never reached our Denver office" can become "A shipment for office equipment did not arrive at the customer site."

Places teams forget

Some leaks hide in plain sight: file names like "Acme-Invoice-48392.pdf," image metadata with author names or GPS tags, browser tabs and bookmarks in screenshots, exported logs with cookies or raw request bodies, and sample CSV headers that still use real client labels.

Screenshots need the same care as raw data. Crop hard. Avoid light blur, because blurred text can still be guessed and sometimes recovered. If a screenshot teaches the same lesson with fake data, use the fake version.

A simple rule works well: if a stranger could connect the sample to a real customer, employee, or live system, remove it or rewrite it before the session starts.

What useful test data should keep

If you strip too much, the sample stops behaving like real work. The vendor gets a clean prompt, a neat screenshot, and a tiny log file, then builds around a version of your process that does not exist. Good test data keeps the shape of the problem after you remove customer details.

Start with structure. Keep field lengths close to reality and keep the same formats people actually enter. If account IDs usually have 12 characters, keep 12 characters. If phone numbers arrive in mixed formats, leave that mess in place. The goal is not pretty data. The goal is realistic data that still shows what your system deals with every day.

Mess matters too. Real users leave blanks, misspell names, paste extra spaces, and type notes that ignore every rule you hoped they would follow. Sanitized examples should keep some of that friction. A prompt tested only on perfect inputs can look great in a demo and fail in the first week of real use.

Keep the normal flow of the system visible as well. Preserve common status changes and believable timestamps so someone reviewing the sample can follow the order of events. A support ticket that moves from "new" to "waiting on customer" to "resolved" tells a fuller story than one frozen record. Time gaps matter too. A five-minute response and a three-day delay create different behavior in routing, escalation, and summaries.
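
One simple way to keep the gaps while breaking the link to a real event is to move every timestamp in a record by the same offset. The sketch below assumes events are stored as (timestamp, status) pairs. The function name `shift_events` and the sample dates are made up for illustration.

```python
from datetime import datetime, timedelta


def shift_events(events, weeks_back):
    """Move every (timestamp, status) pair back by the same whole number of weeks."""
    offset = timedelta(weeks=weeks_back)
    return [(ts - offset, status) for ts, status in events]


# Made-up history: a five-minute reply, then a three-day wait.
history = [
    (datetime(2025, 3, 3, 9, 12), "new"),
    (datetime(2025, 3, 3, 9, 17), "waiting on customer"),
    (datetime(2025, 3, 6, 9, 17), "resolved"),
]
shifted = shift_events(history, weeks_back=11)
```

Shifting by whole weeks keeps the weekday and time of day, which can matter if routing or escalation depends on business hours. Use a different offset for each sample pack so the shift is harder to undo.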

A small set of believable failures helps even more. Duplicate records, missing attachments, invalid dates, conflicting status updates, and partial error logs are all worth keeping if they match the problems your team actually sees. Do not invent strange edge cases just to look thorough. One or two honest failures are enough.

Screenshots follow the same rule. Replace names, emails, and IDs, but keep the flow intact. A generic screenshot still needs the real buttons, steps, and layout order, or the vendor cannot judge whether the workflow makes sense. If users usually open a ticket, add a note, upload a file, and change priority, show that path.

Teams often get this backwards. They hide the sensitive parts, then also hide the timing, sequence, and mess that make the work real. Keep those parts, and your staging review will be much closer to the truth.

How to build safe examples

Start with one real workflow, not a pile of random files. Pick a common case such as a support request, sales follow-up, or bug report. Then trace every place customer data appears in that workflow. Most teams think about the prompt first and forget the log output, browser tabs, error messages, CSV exports, and screenshot corners.

A simple process keeps this under control:

1. List everything you plan to share: prompts, chat transcripts, logs, screenshots, sample records, exported files, and model output.
2. Mark what must never leave your systems. That includes names, emails, phone numbers, account IDs, contract terms, internal URLs, API keys, and anything tied to a real person.
3. Swap real details for fake ones, but keep them consistent. If "Maria Chen" becomes "Nina Reed" in the prompt, use the same fake name in the log and the screenshot.
4. Review the whole set together. A clean prompt can still leak data through a log line or a browser sidebar.
5. Save the approved version in one shared folder or repository so people reuse it instead of making fresh copies each time.

Consistency matters more than perfect realism. Vendors need examples that preserve the shape of the work: field formats, order size, error types, writing style, and rough complexity. They do not need your actual customer list. If one fake company appears in the prompt, another in the screenshot, and a third in the logs, the sample gets harder to follow and easier to mistrust.
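
A single replacement table, applied to every text file in the pack, is one way to keep the fake names consistent. The sketch below is a minimal version under a few assumptions: the helper name `build_replacer` is hypothetical, matching ignores case, and the mapping reuses the example names from this article.

```python
import re


def build_replacer(mapping):
    """Return a function that swaps every real term for its agreed fake value."""
    if not mapping:
        raise ValueError("mapping must contain at least one real-to-fake pair")
    lowered = {real.lower(): fake for real, fake in mapping.items()}
    # Longest terms first, so a full name wins over a first name on its own.
    keys = sorted(lowered, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys), re.IGNORECASE)

    def replace(text):
        # Fall back to a visible marker if a match cannot be looked up.
        return pattern.sub(
            lambda m: lowered.get(m.group(0).lower(), "[REDACTED]"), text
        )

    return replace


replace = build_replacer({
    "Maria Chen": "Nina Reed",
    "Northwind Dental": "Pine Ridge Clinic",
})
prompt = replace("Ticket from Maria Chen at Northwind Dental.")
log_line = replace("user='MARIA CHEN' tenant='northwind dental'")
```

This only catches exact spellings of the terms in the table. Variants such as an email address built from the same name need their own entries, and free text still needs the rewrite described earlier.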

Make the fake data believable enough to test the task. Keep valid email formats, date formats, product codes, and ticket states. Keep the kind of mess people create in real life too: typos, short messages, duplicate entries, and vague requests. That gives the vendor something useful without exposing real details.

Screenshots deserve a stricter review than text files. People miss names in the browser profile, open tabs, sidebars, timestamps, and small notification popups. Crop hard, blur heavily enough that no letter shapes survive, or rebuild the screen with sample data if you are unsure.

Once your team approves a small library of sanitized examples, reuse it everywhere. That saves time, avoids last-minute scrambling before demos, and gives vendors cleaner material to test against.

A simple support ticket example

A vendor does not need your real ticket history to test a prompt. They need a believable case with enough detail to show the flow: who reported the issue, what broke, how urgent it feels, and what the agent should do next. Good test data keeps that shape while removing anything that points to a real customer.

Start with a real ticket, then rewrite it instead of lightly editing it. If the original message came from a named customer at a real company, replace both. "Marta from Northwind Dental" can become "Jordan from Pine Ridge Clinic." The problem can stay the same, but the people and company should be fully made up.

The email thread needs the same treatment. Do not keep the original replies and blank out a few lines. Rewrite the thread in plain sample text. Keep the tone, the issue, and the sequence of events, but remove signatures, phone numbers, order IDs, inbox footers, and anything copied from a live mailbox.

Subject: Duplicate charge after plan upgrade

Customer message:
Hi team, I upgraded our account yesterday and the system charged us twice.
The extra charge is still pending on our card. Can you confirm whether it will drop or if you need to issue a refund?

Agent reply:
Thanks for reporting this. I checked the upgrade event and sent the case to billing review.
We will update you within one business day.

The ticket should still look real when someone reviews it. Keep normal fields such as status, tags, created date, last reply, and priority. Those details matter because prompts often depend on them. An AI assistant may answer differently if a ticket is new, overdue, or already waiting on another team.

Screenshots need extra care. Teams often clean the ticket body and forget the browser itself. Hide open tabs, bookmarks, profile photos, chat sidebars, and any small account switcher in the corner. One visible tab name or one face can give away more than the ticket text.

Logs deserve the same caution. A single line can expose an email address, tenant name, internal hostname, or payment reference. Read the final prompt and every pasted log line one by one. If a detail does not help the vendor test the workflow, cut it or replace it with a sample value.

Light redaction is where teams usually slip. A fake customer name does not help much if the screenshot still shows the real support rep, the company favicon, and a log entry with the production domain. Clean examples work when every part tells the same safe story.

Where teams slip when cleaning data

Most leaks happen in the leftovers, not in the main example. A team edits the prompt, swaps out a customer name, and thinks the job is done. Then the pasted log output still shows an email address, account ID, internal hostname, or full error path.

Logs are risky because people treat them like background noise. They are not. A single stack trace can expose more than the prompt itself, especially if it includes user IDs, file paths, bucket names, or raw request payloads.

Teams also blur screenshots and stop there. The slide looks clean, but the original file name still tells the story. A file called "Acme-Bank-payroll-issue-48392.png" reveals a lot even if the screenshot itself is unreadable. The same problem appears in exported PDFs, presentation notes, and image metadata.

Changing names is another half fix. If you replace "Sarah Lee" with "Jane Doe" but keep order number 91827461, someone can still tie that record back to a real customer if that number appears anywhere else in the material. Unique IDs, rare timestamps, exact invoice totals, and unusual job titles can point to one person or company even after you rename them.

A few areas deserve extra attention: log output pasted under a prompt, file and attachment names, order numbers and ticket IDs, timestamps tied to a real event, and auto-recorded meeting files or transcripts.
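
A leftover check over the whole folder can catch some of these before a person reviews it. The sketch below assumes the pack lives in one directory and that you keep a private list of real terms (customer names, real order numbers) to search for. The function name `find_leftovers` is hypothetical.

```python
from pathlib import Path


def find_leftovers(folder, real_terms):
    """Return (path, where, term) for each real term found in a name or a text body."""
    terms = [t.lower() for t in real_terms]
    hits = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        hits += [(str(path), "file name", t) for t in terms if t in name]
        try:
            body = path.read_text(encoding="utf-8").lower()
        except UnicodeDecodeError:
            continue  # binary file such as an image: only the name was checked
        hits += [(str(path), "content", t) for t in terms if t in body]
    return hits
```

This does not read text inside images or inspect image metadata, so screenshots still need a manual look. Keep the list of real terms itself out of the shared pack.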

Bad fake data creates a different problem. Some teams scrub so hard that the demo stops making sense. They replace real examples with random text, broken dates, empty fields, and impossible values. Then the workflow fails and everyone learns the wrong lesson. Good demo data should keep the shape of the real thing: the same field types, similar lengths, normal edge cases, and believable relationships between records.

A simple example makes the point. If a support ticket mentions a refund, the masked data should still include a matching order, payment status, and timeline. If those details do not line up, the vendor cannot test the prompt properly.
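
That kind of cross-record check is easy to automate for a small pack. The sketch below uses made-up field names (`order_id`, `placed_at`, `charged_at`, `created_at`) and a hypothetical function `check_refund_ticket`. Your own schema will differ.

```python
from datetime import datetime


def check_refund_ticket(ticket, orders, payments):
    """Return a list of mismatches between a refund ticket and its related records."""
    order = orders.get(ticket["order_id"])
    if order is None:
        return ["ticket references an order that is not in the sample set"]
    problems = []
    payment = payments.get(ticket["order_id"])
    if payment is None:
        problems.append("order has no payment record")
    elif payment["charged_at"] < order["placed_at"]:
        problems.append("payment is dated before the order")
    if ticket["created_at"] < order["placed_at"]:
        problems.append("ticket is dated before the order")
    return problems


# Made-up sample records that line up with each other.
orders = {"ORD-1001": {"placed_at": datetime(2025, 1, 6, 10, 0)}}
payments = {"ORD-1001": {"status": "charged", "charged_at": datetime(2025, 1, 6, 10, 1)}}
ticket = {"order_id": "ORD-1001", "created_at": datetime(2025, 1, 7, 9, 30)}
problems = check_refund_ticket(ticket, orders, payments)
```

An empty list means the three records tell one story. Any message in the list points at a detail that masking broke and that someone should fix before the vendor sees it.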

Recording adds one more risk. Many teams let a vendor record the session before anyone checks the prompt, logs, screenshots, open tabs, and file names. That is backwards. Review first, record second. If nobody has approved the material, keep recording off and share only the cleaned copy.

That may sound strict. It is still much cheaper than cleaning up a preventable leak.

Quick checks before a vendor session

A ten-minute review catches most leaks before anyone shares a screen or pastes a prompt. Teams usually focus on the prompt and forget the screenshot in the deck, the log snippet in a doc, or the CSV someone drags into a demo at the last minute.

You do not need a huge policy to make this work. You need a short routine that everyone follows every time.

Before the session starts:

Read every prompt out loud once. If a sentence sounds too specific, it probably is.
Zoom in on screenshots and look for patterns, not just obvious names. Search for email addresses, phone numbers, company names in browser tabs, profile photos, calendar invites, and chat sidebars.
Open every log you plan to share and scan for secrets. API keys, session tokens, internal hostnames, account IDs, trace IDs, and file paths slip through because they look technical rather than personal.
Make sure everyone uses the same approved sample set.
Pick one person to handle surprise data requests. Vendors often ask for "one real example" when a mock case feels too clean. Someone on your side should own the answer and know when to say no.

The screenshot check deserves special attention. A clean mock dashboard can still leak private data through a browser bookmark, an open Slack channel, or a saved username in the corner. Search images for "@", full names, invoice numbers, and anything unique enough to trace back to a real person.

Logs need the same care. Even after you remove names, raw logs can still expose customer details through metadata. A support event with a user ID, region, timestamp, and device model may be enough to identify one account when the vendor also sees other meeting context.
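
A rough way to spot this is to count how many records share the same combination of metadata fields and flag the combinations that appear only once. The sketch below is an approximation with hypothetical names (`unique_combinations`, the `region` and `device` fields). It is most meaningful when run against the larger dataset the sample was drawn from, not just the handful of records you plan to share.

```python
from collections import Counter


def unique_combinations(records, fields):
    """Return the records whose combination of field values appears exactly once."""
    keys = [tuple(record.get(field) for field in fields) for record in records]
    counts = Counter(keys)
    return [record for record, key in zip(records, keys) if counts[key] == 1]


# Made-up events: two share a region and device, one stands alone.
events = [
    {"event": "login", "region": "eu-west", "device": "Pixel 8"},
    {"event": "refund", "region": "eu-west", "device": "Pixel 8"},
    {"event": "export", "region": "ap-south", "device": "ThinkPad X1"},
]
risky = unique_combinations(events, ["region", "device"])
```

Records that come back from this check are candidates for coarser values, such as a wider region or a generic device label, before they go into the pack.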

Consistency matters more than most teams expect. If product, support, and engineering all work from one approved sample pack, people stop improvising. That alone cuts a lot of risk.

One last rule helps when the meeting gets rushed: nobody shares fresh data on the fly. If the vendor needs another example, send it later after someone reviews it. A short delay is much cheaper than fixing an avoidable leak.

What to do next

Your policy for demos and staging should fit on one page. If people need a long handbook to follow it, they will skip steps when a vendor call starts in ten minutes.

Write down the few rules your team must follow every time. Keep them plain: use fake names, remove account numbers, clean screenshots, replace IDs in logs, and never paste live customer text into prompts.

One person should own the final check. That person reviews sample records, exported logs, screenshots, and prompt examples before anything leaves the team. They do not need to build every file themselves. They just need clear authority to say, "No, this still looks too real."

Right before the session, rehearse it once with fresh eyes. Teams often clean the dataset, then forget the screen recording, browser history, sidebar notifications, or a stray terminal window with real data. A quick dry run catches what slips through because everyone got used to seeing it.

It also helps to ask the vendor direct questions before you share anything: what they store during the demo, who can access prompts and uploads, how long they keep the data, whether they use it for training or product improvement, and whether they can delete session data on request. You want short, specific answers. If the reply sounds vague, tighten what you share.

This makes internal decisions easier too. Sales may want a more realistic example. Engineering may want raw logs because they are easier to debug. A short rule sheet and a named owner settle those arguments quickly. Realistic enough is usually enough.

If your team works with several vendors, keep a small approved pack ready: one sanitized dataset, one safe prompt set, one cleaned log sample, and a couple of reviewed screenshots. Reusing approved material saves time and lowers risk.

If you want a second review before the next vendor session, Oleg Sotnikov at oleg.is helps startups and smaller companies review AI vendor setups, staging workflows, and demo materials as a fractional CTO. A short consultation can catch the details your team has stopped noticing.

Frequently Asked Questions

Do I need to share real customer data with an AI vendor?

No. Start with fake examples that keep the same workflow, field formats, and normal errors. Share real customer data only if you have a clear legal and security reason. Most demo sessions do not need it.

Is staging data safe to use in a demo?

Not by itself. Teams often copy production records into staging, leave old backups there, or store debug logs from real incidents. Treat staging like any other environment and review every sample before you share it.

What should I strip first from prompts, logs, and tickets?

First remove anything that points to a real person, company, or live system. That includes names, emails, phone numbers, account IDs, order numbers, internal URLs, API keys, tokens, and session cookies.

Are screenshots riskier than text?

Yes. Screenshots hide a lot of detail in small places like browser tabs, sidebars, bookmarks, profile menus, timestamps, and file names. One image can leak more than a short prompt if nobody checks the whole screen.

Can I just blur sensitive parts of a screenshot?

Usually no. Light blur often leaves enough shape to guess the text, and cropping one area may leave private data somewhere else on the screen. If you are unsure, rebuild the screen with sample data instead.

How do I make fake data still useful for testing?

Keep the structure and the mess, not the real identities. Use the same field lengths, date formats, status changes, typos, blanks, and duplicate records your team sees every day, but swap in made-up names, IDs, and companies.

Should I redact a support message or rewrite it?

Rewrite it. A simple name swap leaves too much context behind, like locations, refund amounts, unusual timelines, or family details. Keep the problem and tone, then change the story so nobody can trace it back to a real case.

What do teams usually miss in logs?

Secrets and hidden identifiers, not just names. Logs often expose tokens, internal hostnames, request bodies, bucket names, file paths, trace IDs, and exact timestamps that tie back to a real event.

Can a vendor record our demo session?

Allow recording only after your team reviews the prompt, screenshots, logs, tabs, and file names. If nobody approved the material, keep recording off and send a cleaned version later.

What is the simplest safe process before a vendor session?

Pick one owner, use one approved sample pack, and ban fresh data during the call. A short review right before the session catches most leaks, and an outside reviewer can help if your team keeps rushing or arguing over what is safe enough to share.