Apr 06, 2026·8 min read

Import data quality before launch with 3 sample files

Stop guessing about import data quality before launch. Ask new customers for three real files, score gaps, duplicates and naming drift, and fix risks early.

Why import issues show up late

Most teams test imports with demo files that look tidy because someone on the team made them. The columns are named the way the product expects, dates use one format, and every row has the fields people planned for during development. That kind of file proves the happy path works. It does not prove real customers can get through onboarding.

Customer exports are usually messier. A sales team may send one CSV from an old CRM, another from Excel, and a third pulled from a billing tool. You start seeing blank cells in required fields, old column names like "Company Name" instead of "Account", mixed date formats, duplicate rows, and notes pasted into fields that were supposed to hold clean values.

One broken column can stop the whole process on day one. If the importer expects "email" but the file says "primary_email", or if a required ID column is half empty, the user often cannot move forward without help. What looked like a small mismatch in testing turns into a blocked setup call, a frustrated new customer, and a support ticket that lands with the team right away.

Late fixes are expensive because people make them under pressure. Product and engineering start adding one-off rules for a single customer file, then another rule for the next customer, and soon the import logic gets harder to trust. Support writes workarounds. Success teams ask customers to clean files by hand. Everyone loses time, and the first impression of the product gets worse.

This is why import data quality needs a reality check before launch. A clean demo file can hide the exact problems that matter most: missing columns, duplicate records, and naming drift across exports. Teams usually discover those problems only when a real customer tries to import their own data and expects it to work without a rescue call.

What three real sample files tell you

One clean demo file can fool you. Three real files give you the mess you actually need to see.

Ask for files from different people or different source systems. If all three exports come from the same template, you learn very little. A sales spreadsheet, a finance CSV, and an ops export often describe the same customer in slightly different ways, and those small differences are where import problems start.

The first thing three files reveal is naming drift. One file might say "Customer ID," another uses "customer_id," and a third says "Client Number." That sounds minor, but it changes how your mapper works. You may also find one file splits a full name into two columns while another keeps it in one field. That is a warning that your import rules depend on assumptions your customers do not share.

They also show how often required fields are empty. Count blanks in the columns your product needs to work, such as email, company name, order date, or status. If one of the three files has 20 percent of a required field missing, you do not have a rare edge case. You have a normal onboarding case.
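As a rough illustration of that count, the sketch below measures the share of blank cells in each required column using only the standard library. The function name `blank_rates`, the column names, and the inline sample data are assumptions made for this example, not part of any particular importer.

```python
import csv
import io

def blank_rates(csv_text, required):
    """Return the share of blank cells (0.0 to 1.0) for each required column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rates = {}
    for col in required:
        # A cell counts as blank if it is empty or whitespace only.
        # A column that is absent from the file counts as blank in every row.
        blanks = sum(1 for row in rows if not (row.get(col) or "").strip())
        rates[col] = blanks / len(rows) if rows else 0.0
    return rates

# Hypothetical five-row export with gaps in both required columns.
sample = "email,company\na@x.com,Acme\n,Beta\nc@x.com,\n ,Delta\nd@x.com,Echo\n"
print(blank_rates(sample, ["email", "company"]))
```

Running the same function over all three files and putting the rates side by side makes it easier to see whether a gap is a one-off or a habit.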

Duplicates show up better when you compare files side by side. One export may contain repeated rows because someone downloaded a report twice and merged it by hand. Another may have near duplicates, where the same record appears with a small spelling change, extra space, or different letter case. If your import data quality checks only catch exact matches, those records will slip through.
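One way to catch the simplest near duplicates is to compare a normalized key instead of the raw value. The sketch below is a minimal version that only handles letter case and extra whitespace; the helper names `match_key` and `near_duplicates` are made up for this example. Small spelling changes would need a fuzzier comparison, which this sketch does not attempt.

```python
import re
from collections import defaultdict

def match_key(value):
    """Collapse whitespace and case so trivially different spellings compare equal."""
    return re.sub(r"\s+", " ", value).strip().casefold()

def near_duplicates(values):
    """Group raw values that share the same normalized key."""
    groups = defaultdict(list)
    for value in values:
        groups[match_key(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical email column with case and spacing variants.
emails = ["Ann@Example.com", "ann@example.com ", "bob@example.com", "ANN@EXAMPLE.COM"]
print(near_duplicates(emails))
```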

Three files also expose format drift that a single sample hides. Dates may appear as "03/04/2024," "2024-04-03," or plain text. Currency might show "$1,200.00," "1200," or a value stored in cents. Status labels often drift too: "Active," "Live," "Enabled," and "Current" may all mean the same thing to the customer, but not to your importer.

Three sample files are enough to show the pattern and still few enough to review quickly. Before launch, they tell you whether your importer handles normal variation or only the nicest file a customer could send.

How to ask for the right files

Ask for files that came from real work last week or last month. Do not accept a file someone made just to help the project along. Fake examples are usually too neat, and they hide the mess that breaks an importer on launch day.

A good request is specific. Tell the customer you need three exports from normal use: one file they think is clean, one file that feels typical, and one file they already know causes trouble. That mix gives you a faster read on import data quality than ten polished examples.

Keep the request simple so people actually send the files. A short note often works better than a long form. Ask for:

three recent exports from the system they use today
one clean file, one average file, and one messy file
the file type you need to support, such as CSV or XLSX
a short note on who creates the file and how often they export it

That last point matters more than teams expect. If the sales manager exports the file once a month, you get one kind of data. If five account managers edit it by hand every day, you get another. Naming drift, extra columns, and duplicate rows often come from the process around the file, not the system itself.

Ask about privacy early. In many cases, the customer can remove names, emails, phone numbers, or order details before sending anything. They should keep the same column layout, data types, odd spellings, blank cells, and duplicate patterns. If they strip out every useful detail, you lose the very signals you need to test.

Confirm the format before anyone sends files back and forth. A team may say they use "Excel," then send CSVs from one department and XLSX files from another. Some exports also change separators, date formats, or header names depending on who runs them.

One small example: a customer says they upload a "simple spreadsheet" every Friday. When you ask who makes it, you learn one person exports it from the CRM, another renames columns by hand, and a third adds notes before upload. That one answer tells you where the importer is likely to fail.

How to score each file before launch

Put each sample file on one row in a small sheet, then score facts, not feelings. This takes the guesswork out of import data quality and gives the team one clear view of risk before launch.

Start with required columns. Make a short master list of fields the import must have to work, such as customer ID, email, order number, product code, or signup date. When a file misses one of those columns, mark it at once. If the column exists but many cells are blank, note that too. A column with 60 percent empty values is not really ready.

Next, count duplicates in the fields that should stay unique. Check IDs, emails, invoice numbers, or order numbers, depending on the import. Count both exact duplicate rows and repeated values in unique columns. Those are different problems, and both can break reporting, billing, or account setup.
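The two counts can be kept apart with a small helper like the one below. It is a sketch under simple assumptions: rows are already loaded as dictionaries, and `duplicate_counts` and the field names are invented for the example.

```python
from collections import Counter

def duplicate_counts(rows, unique_field):
    """Count exact duplicate rows and repeated values in a field that should be unique."""
    # Exact duplicates: every column matches. Each extra copy counts once.
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    exact = sum(n - 1 for n in row_counts.values() if n > 1)

    # Repeated values: the same non-blank value appears in more than one row.
    # This number also includes rows that are exact copies.
    value_counts = Counter(row[unique_field] for row in rows if row.get(unique_field))
    repeated = sum(n - 1 for n in value_counts.values() if n > 1)

    return {"exact_duplicate_rows": exact, "repeated_values": repeated}

# Hypothetical contact rows.
rows = [
    {"id": "1", "email": "a@x.com"},
    {"id": "1", "email": "a@x.com"},  # exact copy of the first row
    {"id": "2", "email": "a@x.com"},  # same email under a different id
    {"id": "3", "email": "b@x.com"},
]
print(duplicate_counts(rows, "email"))
```

In this sample the email column repeats twice while only one row is an exact copy, so the second repeat is a record conflict rather than a copy-paste accident.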

Then look for naming drift across files. One customer may send "email", "Email Address", and "primary_email" in three exports from three tools. The data may mean the same thing, but your import rules still need to map it. Mark every renamed column so you can see where the import needs aliases or manual review.
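A small alias table is one way to handle that mapping. The sketch below normalizes each header, looks it up, and reports anything it cannot place so a person can review it. The `ALIASES` table and function names are hypothetical; the alias entries are taken from the header examples used in this article.

```python
import re

# Hypothetical alias table: canonical field -> normalized header spellings.
ALIASES = {
    "email": {"email", "email address", "primary email", "work email"},
    "customer_id": {"customer id", "client number"},
}

def normalize_header(name):
    """Lowercase the header and treat spaces, underscores, and hyphens alike."""
    return re.sub(r"[\s_\-]+", " ", name).strip().casefold()

def map_headers(headers):
    """Return (mapped, unknown): known headers by canonical field, plus the rest."""
    mapped, unknown = {}, []
    for header in headers:
        key = normalize_header(header)
        target = next((canon for canon, names in ALIASES.items() if key in names), None)
        if target:
            mapped[header] = target
        else:
            unknown.append(header)
    return mapped, unknown

print(map_headers(["primary_email", "Customer ID", "Notes"]))
```

The `unknown` list is the useful part for scoring: every header that lands there is either a new alias to add or a column that needs manual review.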

Cleanup needs its own note. Dates in mixed formats, phone numbers with text, country names spelled three ways, and free text like "N/A" or "unknown" all slow the import down. Write down what needs cleanup and how much work it will take.

Use a simple traffic light score:

Green: all required columns exist, duplicate counts stay low, and cleanup looks minor.
Yellow: one issue needs attention, such as a renamed required column or a small duplicate rate.
Red: a required column is missing, duplicate identifiers are common, or several fields need heavy cleanup.

Keep the scoring plain. If one of the three sample files gets a red score, pause the launch plan and fix the import rules first. The full customer dataset usually looks messier, not cleaner.
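The traffic light rules can be written down as a short function so everyone applies them the same way. This is only a sketch: the article does not define what a "low" or "common" duplicate rate is, so the 1 percent and 5 percent cutoffs, the reading of "several" as two or more, and all parameter names are assumptions to replace with your own. It also returns yellow for any number of yellow-level issues, which is a simplification.

```python
def traffic_light(missing_required, renamed_required, duplicate_rate,
                  heavy_cleanup_fields, low_dup=0.01, high_dup=0.05):
    """Score one sample file as 'green', 'yellow', or 'red'.

    The thresholds are illustrative assumptions, not values from the article.
    """
    # Red: a required column is missing, duplicate identifiers are common,
    # or several fields need heavy cleanup.
    if missing_required > 0 or duplicate_rate > high_dup or heavy_cleanup_fields >= 2:
        return "red"
    # Yellow: something needs attention, such as a renamed required column
    # or a small duplicate rate.
    if renamed_required > 0 or duplicate_rate > low_dup or heavy_cleanup_fields == 1:
        return "yellow"
    return "green"

def should_pause(scores):
    """Pause the launch plan if any sample file scores red."""
    return "red" in scores

scores = [
    traffic_light(0, 0, 0.0, 0),   # clean file
    traffic_light(0, 1, 0.02, 0),  # renamed column, a few duplicates
    traffic_light(1, 0, 0.0, 0),   # missing required column
]
print(scores, should_pause(scores))
```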

Build a small score sheet

A score sheet keeps import reviews boring in a good way. When every sample file gets the same checks, the team stops arguing from memory and starts looking at the same facts. That alone improves import data quality more than another meeting.

Use one row for each sample file. Put the file name, customer name, date, and owner at the start so nobody loses context a week later.

Then add a few fixed columns:

missing required columns
duplicate rows
naming drift
severity score
short notes

That is enough for an early launch decision. If a file misses "email" and "company_id", mark the count and write the exact column names in the notes. If duplicates show up, note whether they are exact copies or near-matches. If naming drift appears, write what changed, such as "Phone", "phone_number", and "Primary Phone" across three files.

Keep notes short and specific

A good note answers one question: what will break if the customer imports this file today? "Header mismatch blocks mapping" is better than "naming issue found." "42 duplicate contacts by email" is better than "some duplicates."

You do not need a fancy formula. A plain 0 to 3 score works well for each check:

0 = no issue
1 = minor cleanup
2 = manual work needed
3 = launch blocker

Add the three scores together for each file. Then set a threshold before the team reviews anything. For example, you might allow launch only if every sample file scores 3 or below overall, with no single blocker in missing columns.

That rule matters because teams get soft when a deadline gets close. A written threshold removes the last-minute debate.
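Writing the threshold as code is one way to keep it from bending under deadline pressure. The sketch below uses the example rule from above: every file must total 3 or less, and no file may have a blocker in missing columns. The `FileScore` class and the file names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FileScore:
    name: str
    missing_columns: int  # each check is scored 0 to 3
    duplicates: int
    naming_drift: int

    @property
    def total(self):
        return self.missing_columns + self.duplicates + self.naming_drift

def launch_allowed(files, max_total=3, blocker=3):
    """Every file must stay at or under max_total with no blocker in missing columns."""
    return all(f.total <= max_total and f.missing_columns < blocker for f in files)

files = [
    FileScore("sales.csv", 0, 1, 1),
    FileScore("support.csv", 1, 1, 1),
    FileScore("manual.csv", 3, 0, 0),  # total is 3, but missing columns is a blocker
]
print(launch_allowed(files))
```

The third file shows why the second condition exists: its total passes the threshold, yet a single blocker in missing columns still stops the launch.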

Review the sheet with product, support, and engineering in the same short meeting. Product decides whether the import flow still fits the promise. Support sees what customers will ask about on day one. Engineering decides which fixes belong in the importer and which belong in onboarding instructions.

If one file scores badly and two score well, do not average the pain away. Launches fail on the ugly file, not the clean one.

A simple example from customer onboarding

A SaaS team that imports customer contact lists asks each new account for three real files they already use. They do not ask for a neat demo export. They want the messy spreadsheets people send on day one, because that is where import data quality problems show up fast.

One customer sends three contact sheets from three different people. The sales file looks close to what the product expects. It has first name, last name, company, email, and a signup date. The support file uses "Work email" instead of "Email", and it repeats the same company under slightly different names, like "Acme Ltd" and "ACME Limited". The third file comes from an office manager who keeps a manual list. Many email cells are blank, and the date column contains plain text such as "Nov 3 23" and "last Friday".

Three customer sample files from one company often reveal three different habits. One team uses formal labels. Another shortens them. A third leaves gaps because they track the missing details somewhere else.

The team scores the files before launch.

Check	What they found	Score
Missing columns	One file has many blank email cells	2/5
Duplicate row detection	Two files repeat the same company with different spellings	3/5
Column naming drift	Email and Work email should map to one field	1/5
Date parsing	One file stores dates as plain text	2/5

That score tells them the importer needs rules before launch, not after the first support ticket. If they test with only the clean sales file, they get false confidence. The messy files show the real work.

So they add a few rules. They map "Email" and "Work email" to the same field. They warn users when email is empty instead of importing half-finished rows. They normalize company names before duplicate checks. They accept a short list of common date formats and reject the rest with a clear message.
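The date rule might look something like the sketch below, which tries a short list of formats in order and raises a readable error for anything else. The two accepted formats are an assumption for illustration. Note that listing both month-first and day-first slash formats would make a value like "03/04/2024" ambiguous, so this sketch picks one.

```python
from datetime import datetime

# Assumed list of accepted formats; adjust it to what your customers actually send.
ACCEPTED_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]

def parse_date(text):
    """Parse a date using the accepted formats, or raise a clear error."""
    cleaned = text.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(
        f"Date '{cleaned}' is not in a supported format. Use YYYY-MM-DD or MM/DD/YYYY."
    )

for raw in ["2024-04-03", "03/04/2024", "Nov 3 23", "last Friday"]:
    try:
        print(raw, "->", parse_date(raw))
    except ValueError as err:
        print(err)
```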

The result is not a perfect importer. It is an importer that matches what customers actually upload.

Mistakes teams keep making

Most import data quality problems are ordinary, not mysterious. Teams miss them because they test with neat files that look nothing like customer data on a normal Tuesday.

The first mistake is trusting one polished sample. A customer often sends the file they cleaned up for a sales call, not the one their staff exports every day. That file has tidy headers, no odd date formats, and almost no blanks. It proves the happy path works. It does not prove your importer can survive real use.

Blank cells are another place teams fool themselves. People talk about missing values like they are edge cases, but they show up everywhere. One row misses a phone number, another has no company name, five more skip country or state. If your rules treat blanks as rare, users hit errors on the first live upload.

Duplicate handling gets messy fast. Many teams merge records by name alone because it feels simple. That works until you get two "Alex Lee" rows from different companies, or one customer exports "Acme Ltd" in one file and "ACME Limited" in another. A weak merge rule can combine different records or leave obvious duplicates untouched. Both outcomes create support pain.

Header names also drift more than teams expect. One customer uploads "Email", another uses "Email Address", and a third has "Work Email". Some teams assume every file follows the same column names because their own template does. Customers rarely care about your template as much as you do.

The timing mistake is the most expensive one. Teams wait until launch week to test imports with real files, then find five small issues that pile into one big delay. They patch around the worst case, promise support will help with the rest, and ship a fragile flow.

A better habit is boring but effective: test early with messy files, score what breaks, and fix the common failures before customers ever see the import screen.

Quick checks before you ship

If one sample file needs a hidden workaround, the importer is not ready. The last check is simple: take the three real files you collected and run them through the exact same mapping logic, with no manual rescue step from an engineer.

That single test catches more than most teams expect. It shows whether your import data quality rules work in the messy cases, not just in the clean demo file someone made in a spreadsheet five minutes earlier.

Use this short checklist with product, support, and engineering in the same room:

Run all three files with the same field mapping and validation rules. If one file needs a special case, write it down and decide whether you will support it or block it.
Ask support to explain each error message out loud in plain words. If they need developer help to translate it, users will get stuck too.
Check whether the product points to the exact failed row and names the reason. "Import failed" is useless. "Row 42: missing email" gives people something they can fix.
Try correcting one bad row and reimporting only that row or a small edited file. Users should not have to start from zero because of one typo.
Name one owner for import fixes after release. If nobody owns it, every broken file becomes a slow internal debate.
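For the row-level error check, a validator can collect one plain message per problem instead of stopping at the first failure. The sketch below assumes rows are already loaded as dictionaries and that the header sits on row 1 of the spreadsheet, so the first data row is reported as row 2. The function name and field names are hypothetical.

```python
def validate_rows(rows, required):
    """Return one plain-language message per missing required value."""
    errors = []
    # Start at 2 so the number matches what the user sees in a spreadsheet,
    # assuming the header occupies row 1.
    for number, row in enumerate(rows, start=2):
        for field in required:
            if not (row.get(field) or "").strip():
                errors.append(f"Row {number}: missing {field}")
    return errors

# Hypothetical rows from a contact import.
rows = [
    {"email": "a@x.com", "company": "Acme"},
    {"email": "", "company": "Beta"},
    {"email": "c@x.com", "company": "  "},
]
print(validate_rows(rows, ["email", "company"]))
```

Because every failure is reported in one pass, a user can fix all flagged rows and reimport a small edited file rather than discovering errors one upload at a time.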

A good test feels a little boring. You want the same behavior every time: same column matching, same duplicate handling, same error wording. If results change from file to file, users will not trust the importer.

Plain language matters more than teams admit. A user does not care whether the problem came from schema validation, parser rules, or duplicate row detection. They care about what to do next. "Column 'Company Name' does not match 'Customer Name'" is clear. "Invalid source schema" is not.

One more check saves a lot of support time later: ask who will review import complaints in the first week after release. Put a name on it. When real customer sample files start exposing edge cases, that person decides whether to fix the product, adjust the mapping, or update the guidance.

What to do next

Start by making the sample-file request part of every new customer onboarding flow. Do not treat it as an extra step for only the tricky accounts. If every customer sends three real files early, your team gets a much clearer view of import data quality before support tickets pile up.

Keep the request simple. Ask for recent files that people actually export and use, not cleaned-up examples made for demos. Real files show the mess that matters: blank fields, copied rows, old column names, and small formatting habits that break imports.

Then save the score for each file in one place. A shared sheet, CRM field, or onboarding tracker is enough at first. What matters is that the team can spot repeat patterns across customers, data sources, and industries.

After a month or two, the patterns usually get obvious. You may find that one source always misses two required columns, or that one naming style keeps causing mapping errors. That gives you a better order for product work than guessing.

A simple next step looks like this:

add a three-file request to every onboarding checklist
score missing columns, duplicate rows, and naming drift for each file
save the scores where product, support, and onboarding can all see them
fix the biggest blockers before you add new import formats or settings

That last point matters more than most teams want to admit. New import options feel like progress, but they often hide the same old problems under more settings and more edge cases. If one missing-column check would prevent half your failed imports, build that first.

Keep the review small and regular. A 15-minute look at recent file scores each week is often enough to catch repeat issues and decide what to fix next.

If imports still slow onboarding, bring in outside help before the problem spreads through the product. A Fractional CTO like Oleg Sotnikov can review the import flow, check where bad files create the most friction, and set a practical plan for product, support, and engineering. That kind of review is often cheaper than months of patching the wrong parts of the importer.