Data import checklist before a new customer sends CSVs
Use this data import checklist to set file limits, mapping rules, and cleanup ownership before a new customer arrives with messy CSVs.

Why imports go wrong before day one
Most import delays start in the sales call, not in the importer.
A rep hears "we have CSVs" and treats that as an easy yes, but nobody has opened the files yet. By the time engineering sees them, the promise is already part of the deal.
CSV files look tidy from a distance. Open them and the mess shows up fast. One file uses "Customer Name," another uses "client," and a third splits the same value across two columns. Dates change format from row to row. Phone numbers lose country codes. Whole columns sit empty because someone exported every field "just in case."
That puts pressure on both sides. The customer expects a quick start because they were told the import would be simple. Engineering has to slow things down, ask basic questions, and explain why the files need work before anything can go live.
The fix starts before the customer sends anything. You need a few plain rules up front: which columns are required, which date and number formats you accept, who removes duplicates and fixes blanks, and what happens to unknown values or extra fields.
A small gap in any one of those rules can waste days. If sales says, "send us your data and we'll handle it," the customer hears speed. Engineering hears risk, because twenty CSV files usually mean twenty slightly different versions of the truth.
Picture a new account sending exports from billing, CRM, support, and an old spreadsheet someone kept on the side. None of the headers match. A few files include test records. One sheet mixes active and archived customers. Nobody made a bad choice on purpose, but the team still loses a week sorting out ownership, definitions, and cleanup.
The import itself usually is not the hard part. The hard part is that nobody set limits before day one.
Decide what the import actually includes
When a new customer says they have twenty CSV files, the first job is not to load all twenty. The first job is to decide what belongs in the launch import and what can wait.
Teams get into trouble when they treat every file as urgent. That turns a simple onboarding task into a migration project.
Start with the record types people need on day one. In many cases, that means active contacts, companies, open deals or orders, and the current products or services tied to those records. That is usually enough to get the customer working.
Old invoices from 2017, closed opportunities, duplicate contacts, and half-used custom fields can wait unless someone has a real day-one reason for them.
History needs its own decision. Some history matters right away, like open support cases or unpaid invoices. Some history is only nice to have, and importing it too early slows everything down. If a record will not affect daily work in the first week, move it to a later phase.
Field choice matters just as much as record choice. Names, email addresses, company name, status, owner, and current plan often matter on day one. Old notes, legacy IDs, unused tags, and blank custom columns usually do not.
Write the out-of-scope items down before kickoff. Be direct. Say that launch includes active contacts and open deals, but excludes archived records, deleted users, historical logs, and note cleanup. That one step avoids a lot of arguments later.
A short scope note works better than a big promise. When both teams can point to the same launch list, the import stays smaller, faster, and easier to check.
Set file limits before sales says yes
Sales often promises an import before anyone has seen the files. That sounds harmless until a new customer arrives with exports from five tools, odd encodings, and tabs that mix three record types in one sheet. File limits stop that mess early.
Start with row count. Pick a maximum per file and put it in writing. Even a simple cap like 100,000 rows per file gives everyone a clear line. If the customer has more data, ask them to split it by table, date range, or account. Smaller files are easier to test, easier to rerun, and much easier to fix when one batch fails.
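A row cap is easy to check before a file ever reaches the importer. The sketch below counts data rows with Python's standard csv module and compares the total to a limit. The 100,000 figure is the example cap from above, and the function names are illustrative, not part of any existing tool.

```python
import csv
import io

MAX_ROWS_PER_FILE = 100_000  # example cap from the text; pick your own number


def count_data_rows(handle) -> int:
    """Count CSV records after the header row, ignoring fully empty lines."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


def within_row_limit(handle, limit: int = MAX_ROWS_PER_FILE) -> bool:
    """True when the file stays inside the agreed row cap."""
    return count_data_rows(handle) <= limit


# In-memory sample; a real file would be opened with open(path, newline="", encoding="utf-8")
sample = io.StringIO("id,email\n1,a@example.com\n2,b@example.com\n")
sample_ok = within_row_limit(sample)
```

Counting with a CSV reader instead of counting raw lines matters when a quoted field contains a line break, because one record can then span several physical lines.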
Keep the accepted formats narrow. If your team supports CSV, say CSV. If you need UTF-8, say UTF-8. Do not leave room for Excel workbooks with hidden tabs, copied tables from email, PDFs, or screenshots. Those are manual cleanup jobs, not normal imports.
One file should usually hold one table. Contacts go in one file. Companies go in another. Orders go in a third. When one sheet mixes contacts, notes, invoices, and status history, the import turns into guesswork.
Your parsing rules should be short and specific. Use one delimiter. Use one date format, such as YYYY-MM-DD. Use one decimal rule. Decide how blank values should appear. Require headers in the first row.
These rules sound small, but they prevent quiet errors. A date like 03/04/2024 can mean March 4 or April 3, depending on where the export came from. A number like 1,234 can mean one thousand two hundred thirty-four or one point two three four, depending on whether the source uses the comma as a thousands separator or a decimal mark.
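To see how quietly these errors slip in, here is a minimal Python sketch. It parses the same date string two ways and the same number under two decimal rules, then shows one way to enforce a single agreed format. The helper names parse_iso_date and parse_amount are made up for this example.

```python
import re
from datetime import date, datetime
from decimal import Decimal

raw = "03/04/2024"
# Both parses succeed, so nothing warns you that the meaning changed.
month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # March 4
day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # April 3

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_iso_date(value: str) -> date:
    """Accept only zero-padded YYYY-MM-DD; raise ValueError for anything else."""
    if not ISO_DATE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return datetime.strptime(value, "%Y-%m-%d").date()


def parse_amount(value: str, decimal_sep: str = ".") -> Decimal:
    """Parse a number under one agreed decimal rule ("." or ",")."""
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = value.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)  # raises decimal.InvalidOperation if not numeric


as_thousands = parse_amount("1,234", decimal_sep=".")  # Decimal("1234")
as_decimal = parse_amount("1,234", decimal_sep=",")    # Decimal("1.234")
```

The point is not the parsing code itself. The point is that the file cannot tell you which rule applies, so the rule has to be written down before the file arrives.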
Ask for sample files before the contract starts. Ten or twenty real rows from each file usually expose broken headers, strange characters, duplicate IDs, and columns that need mapping work. If the sample does not fit your rules, change the scope before the customer hears "yes."
Write mapping rules both teams can use
A mapping sheet should settle arguments before the files arrive. Sales can use it to explain what the customer needs to send. Engineering can use the same sheet to build the import without guessing. If the two teams use different rules, the customer pays for that confusion with delays.
Start with one simple rule: each destination field gets one source column. Do not map both "Company" and "Account Name" into the same target field unless you also write the tie-break rule. If a customer sends three similar columns, pick one official source and state what happens to the others.
A practical field mapping template needs six things:
- Source column name
- Destination field name
- Whether the field is required
- Whether blanks are allowed
- Accepted values for fixed-choice fields
- A default value, if one exists
Required fields need plain rules, not assumptions. If email is required, say whether an empty cell rejects the whole file or only skips that row. If blanks are allowed for phone, write that down too. That saves the support team from asking the same question over and over during onboarding.
Status and type fields need extra care because customers label them in messy ways. One file says "Active," another says "A," and a third says "Current." Your CSV import rules should list every accepted value and the exact result. If you need a fallback, define it. For example, a missing lifecycle stage might default to "lead," while a missing country stays empty and triggers review.
Extra columns need a rule too. You might ignore them, store them in notes, or reject the file until the customer explains them. Any of those approaches can work. Just pick one and write it down.
A small example helps. If the target field is "customer_status," the mapping should say: use only the source column "Status," accept "active," "inactive," and "trial," treat blanks as "trial," and ignore any separate "state" column unless it was approved in scope. That is clear enough for sales to promise and clear enough for engineering to build.
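The customer_status rule above is small enough to write as data. The following Python sketch is one possible shape, not a prescribed format: the FieldRule class and apply_rule function are hypothetical names, and folding "Active" to "active" is an assumption the mapping sheet would need to confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    destination: str
    source: str                    # the single approved source column
    required: bool
    accepted: frozenset            # allowed values for a fixed-choice field
    default: Optional[str] = None  # used when the source cell is blank


CUSTOMER_STATUS = FieldRule(
    destination="customer_status",
    source="Status",
    required=True,
    accepted=frozenset({"active", "inactive", "trial"}),
    default="trial",
)


def apply_rule(rule: FieldRule, row: dict) -> Optional[str]:
    """Return the destination value for one row, or raise ValueError."""
    # Only rule.source is read, so a stray "state" column is ignored.
    raw = (row.get(rule.source) or "").strip().lower()  # case folding is an assumption
    if not raw:
        if rule.default is not None:
            return rule.default
        if rule.required:
            raise ValueError(f"{rule.destination}: blank value in {rule.source!r}")
        return None
    if raw not in rule.accepted:
        raise ValueError(f"{rule.destination}: unexpected value {raw!r}")
    return raw
```

With this rule, a row with Status "Active" maps to "active", a blank Status maps to "trial", and a value like "Current" raises an error instead of being guessed at.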
Give cleanup work an owner
This is where a lot of imports go sideways. Bad dates, missing IDs, duplicate rows, and odd column names do not fix themselves. If sales promises "we can import it," your team needs a clear answer to one question: who fixes what?
Start with source data. If the customer sends broken exports, someone has to decide whether your team will clean them, the customer will resend them, or both sides will split the work. Put that in writing before kickoff. Otherwise engineering ends up doing unpaid spreadsheet repair while everyone else waits.
Sales should collect real sample files early, not screenshots and not a verbal summary. A sample from the actual system tells you more than call notes ever will. One messy CSV can expose date format problems, empty required fields, and columns that change names from file to file.
Engineering then needs to validate the importer against those samples. This is not only about whether the file uploads. The team should check field mapping, row limits, encoding, required columns, and what happens when data is wrong. If the importer rejects 12,000 rows because one value is malformed, you want to know that before the customer is on a deadline.
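A pre-import check can report every problem in a sample instead of stopping at the first one. The sketch below is a minimal version in Python: it checks encoding, required columns, and blank required values. The column names in REQUIRED_COLUMNS are placeholders, and a real importer will have its own rules.

```python
import csv
import io

REQUIRED_COLUMNS = frozenset({"email", "company"})  # placeholder names


def check_file(raw: bytes, required=REQUIRED_COLUMNS) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    try:
        # utf-8-sig also accepts a leading byte order mark
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 (byte offset {exc.start})"]

    reader = csv.DictReader(io.StringIO(text))
    headers = set(reader.fieldnames or [])
    missing = sorted(set(required) - headers)
    if missing:
        # Without the required columns, row checks would be meaningless.
        return [f"missing required columns: {', '.join(missing)}"]

    problems = []
    for row in reader:
        for col in sorted(required):
            if not (row[col] or "").strip():
                problems.append(f"line {reader.line_num}: blank {col}")
    return problems
```

Running this against the customer's samples gives sales and engineering the same list of failures to discuss, which is more useful than a single "upload failed" message.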
Split the work before kickoff
A simple role split usually works well:
- Sales collects real exports and confirms what the customer expects to import.
- Engineering tests the importer and documents failure cases.
- Customer success keeps a live list of cleanup items and due dates.
- One named approver decides on exceptions, such as skipping fields or importing partial data.
That last role matters more than most teams expect. When a customer sends twenty CSV files in six formats, edge cases pile up fast. Someone needs the authority to say, "We will import these four files now, reject these two, and ask the customer to fix the rest." If nobody owns that call, the issue drifts between teams for days.
Customer success should track every open cleanup item in one place. Keep it simple: file name, problem, owner, status, and deadline. That gives sales, engineering, and the customer the same view of what still blocks the import.
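The tracker does not need special software. A spreadsheet works, and so does a few lines of Python. The sketch below uses the five fields named above; the class name, the status values "open" and "done", and the sample entries are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CleanupItem:
    file_name: str
    problem: str
    owner: str
    status: str     # "open" or "done" in this sketch
    deadline: date


def open_items(items):
    """Items that still block the import, soonest deadline first."""
    return sorted(
        (item for item in items if item.status == "open"),
        key=lambda item: item.deadline,
    )


# Hypothetical entries
tracker = [
    CleanupItem("contacts.csv", "mixed date formats", "customer", "open", date(2024, 5, 10)),
    CleanupItem("deals.csv", "duplicate IDs", "engineering", "done", date(2024, 5, 1)),
    CleanupItem("companies.csv", "blank names", "customer", "open", date(2024, 5, 3)),
]
blocking = open_items(tracker)
```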
Run a simple intake process
A good intake process takes about 20 minutes to set up and can save days later. It turns "we have some CSVs" into a job with clear limits.
Ask the customer for two sample files before kickoff, not the full dump. One file should show normal data. The other should show the messy cases they already know about. Get expected row counts too. A file with 800 rows is one kind of job. A file with 4 million rows is another.
Then check the basics by hand:
- Do the headers use clear names, or do they change between files?
- Are required fields blank in a large share of rows?
- Do the same contacts, orders, or companies appear more than once?
- Do dates, currencies, and phone numbers follow one format?
- Do the samples match the scope sales already approved?
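Some of these checks can be partly automated once a sample is in hand. The Python sketch below reports the blank share per column and repeated values in one key column. It assumes the sample fits in memory, and the choice of key column (email in the usage example) is an assumption, not a recommendation.

```python
import csv
import io
from collections import Counter


def profile(text: str, key_column: str) -> dict:
    """Summarize a CSV sample: row count, blank share per column, repeated keys."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    rows = list(reader)
    total = len(rows)

    blank_share = {}
    for col in columns:
        blanks = sum(1 for row in rows if not (row[col] or "").strip())
        blank_share[col] = blanks / total if total else 0.0

    # Compare keys case-insensitively; blank keys are not counted as duplicates.
    keys = Counter((row.get(key_column) or "").strip().lower() for row in rows)
    duplicates = sorted(k for k, n in keys.items() if k and n > 1)

    return {"rows": total, "blank_share": blank_share, "duplicates": duplicates}


sample = "email,phone\na@example.com,\nA@example.com,555\nb@example.com,\n"
report = profile(sample, key_column="email")
```

For the sample above, the report shows three rows, a phone column that is blank in two of them, and one repeated email. A report like this does not replace reading the file by hand, but it makes the review faster to repeat for every file.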
Put field mapping in one shared template. One column should show the source header. Another should show the destination field. Add short notes for rules like "leave blank if unknown" or "match duplicates by email and company name." If sales, onboarding, and engineering keep separate notes, the first test import usually fails for boring reasons.
Flag cleanup items before kickoff starts. If the customer needs to fix blanks, broken dates, or duplicate records, say so early. If your team will do some cleanup, list exactly what you will fix and what you will ignore.
Finish with approval on scope, timing, and fallback rules. Decide what happens if a column has no match, if row counts differ from the estimate, or if one file arrives late. The intake should end with a clear decision: import as planned, import after cleanup, or pause until the customer fixes the data.
A realistic example with twenty CSV files
A new account sends twenty CSV files on Tuesday morning. They want contacts, deals, invoices, and every note from their old system loaded before launch.
At first glance, it looks manageable. Then the team opens the files and finds five different date formats. One file uses 03/04/2024, another uses 2024-04-03, another spells out Apr 3 2024, and two more mix day-first and month-first dates in the same column.
The company list is messy too. The same customer appears as "North Star LLC", "NorthStar", and "North Star, LLC". If nobody fixes that first, the import creates three accounts, three histories, and a lot of support tickets.
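One way to surface those variants before import is to group names by a normalized key. The Python sketch below is a rough heuristic, not a matching standard: the suffix list is a small made-up sample, and groups it finds should go to a person for review, not straight into an automatic merge.

```python
import re
from collections import defaultdict

LEGAL_SUFFIXES = {"llc", "inc", "ltd"}  # illustrative, not a complete list


def company_key(name: str) -> str:
    """Lowercase, drop punctuation and spaces, and drop one trailing legal suffix."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    if len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens = tokens[:-1]
    return "".join(tokens)


def group_variants(names):
    """Map each key to the distinct spellings that share it."""
    groups = defaultdict(list)
    for name in names:
        if name not in groups[company_key(name)]:
            groups[company_key(name)].append(name)
    # Keep only keys with more than one spelling.
    return {key: found for key, found in groups.items() if len(found) > 1}


variants = group_variants(["North Star LLC", "NorthStar", "North Star, LLC", "Zinc"])
```

All three North Star spellings end up under the key "northstar", so they show up as one group to check.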
Sales already promised that old notes would come over as well. That promise sounds small until engineering checks the export. The notes sit in long text fields with timestamps, user names, and tags jammed together. There is no mapping plan, no duplicate rule, and no agreement on where those notes should live in the new system.
A calm team cuts scope before the mess grows. They choose a smaller launch set: active contacts, open deals, unpaid invoices, one date standard for every file, and cleaned company names before import.
That decision usually saves the launch.
The old notes move to a second phase. The customer gets a simple rule: they clean company names and confirm date format, while the internal team builds one field mapping template and tests a sample import. If the notes still matter after launch, the team maps them later with real examples instead of guesses.
That is what a good import checklist is for. The goal is not to import everything people can export. The goal is to load the records the account needs on day one, keep them clean, and schedule the rest without drama.
Teams that do this well rarely look heroic. They just avoid the week of chaos that starts when twenty CSV files arrive and nobody has said "no" yet.
Common mistakes that slow the account down
The slowest imports often start with one harmless sentence: "We can import anything." Sales says it to keep momentum. Engineering inherits it later and has to explain why five date formats, blank IDs, and mismatched account names do not load cleanly. Clear limits stop that problem early.
Another common mistake is skipping the duplicate rule. If nobody decides how to match records, the team guesses under pressure. Do you match by email, external ID, company name, or a mix of fields? Pick the rule before the first file arrives. If you wait, you get duplicate contacts, merged accounts that should stay separate, and a cleanup job that keeps growing.
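Writing the matching rule as code forces the decision. The sketch below assumes one possible rule: match by external ID when present, fall back to email, and send anything with neither to review. The field names external_id and email are placeholders for whatever the mapping sheet defines.

```python
def match_key(record: dict):
    """Return the key used to detect duplicates, or None if no safe key exists."""
    external_id = (record.get("external_id") or "").strip()
    if external_id:
        return ("external_id", external_id)
    email = (record.get("email") or "").strip().lower()
    if email:
        return ("email", email)
    return None


def split_duplicates(records):
    """Sort records into first occurrences, repeats, and rows that need review."""
    seen = set()
    unique, duplicates, review = [], [], []
    for record in records:
        key = match_key(record)
        if key is None:
            review.append(record)      # no key: do not guess
        elif key in seen:
            duplicates.append(record)  # same key as an earlier record
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates, review
```

The review bucket is the design choice that matters here. Records without a usable key are set aside for a person to decide on, which avoids both silent duplicates and merges that should not happen.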
Cleanup work also gets mislabeled as "part of the import." That sounds small until engineering spends days fixing broken phone numbers, trimming spaces, splitting full names, and removing test rows. That work is not free, and it should not sit in a vague bucket.
Custom fields create a quieter mess. A customer asks for fields like "Tier," "Segment 2," or "Status New," and everyone nods. Then the names change three times, nobody agrees on allowed values, and the mapping breaks. Finalize field names and accepted formats before anyone builds the import.
The last mistake shows up in launch week. Teams test the happy path and ignore bad rows until the deadline gets close. Then a handful of ugly records stops the whole job. Test with messy samples early, not just clean examples. Ask for rows with missing values, duplicate IDs, invalid dates, and extra columns.
A simple rule helps: if a file can fail, decide now who fixes it, how you flag it, and whether the import continues or stops.
Quick checks before kickoff
A kickoff call goes better when the team answers a few plain questions before the customer sends anything.
Start with ownership. Each file needs one named owner on your side and one contact on the customer side. If nobody owns a file, nobody fixes it when dates break, headers change, or rows disappear.
Then check the mapping. Required fields should point to real columns in the sample files, not guessed column names from a sales call. If your field mapping template says "email" but the CSV says "primary_contact" and half the rows are blank, you do not have a mapping yet.
A short pre-kickoff review should confirm five things:
- Every incoming file has a person responsible for reviewing it.
- Required app fields match real columns in the sample data.
- The row counts and file sizes stay inside the agreed limits.
- The team wrote down how it will spot and merge duplicates.
- Everyone can see which requests fall outside the original import scope.
Limits matter more than teams think. A promise to "import everything" sounds harmless until one account sends twenty files with 800,000 rows, mixed encodings, and three years of junk records. If sales agreed to smaller limits, engineering can push back early and ask the customer to split the work.
Duplicate handling needs plain language. Decide whether the system matches by email, external ID, phone number, or a mix. One sentence in writing can save days of rework.
Scope needs daylight too. If the customer wants cleanup, deduping, enrichment, and historical backfills, put those items on the table before kickoff. Hidden work is what turns a simple import into a messy project.
What to do next
Put these rules into one shared onboarding document. Keep it short enough that sales will read it and clear enough that engineering will trust it. It should answer four questions without debate: what files you accept, how mapping works, who fixes bad data, and when the import starts.
Then give sales a script they can use in real calls. It does not need polished language. It needs plain answers about custom CSV formats, partial exports, missing columns, and cleanup work. If sales and engineering use different words for the same import, the handoff will break.
A simple package usually works best: one onboarding document with file limits, required columns, and import scope; one field mapping template that customers fill in before kickoff; and one short sales script with approved promises and red flags.
After that, review one recent import that went badly or took too long. Do not turn it into a blame exercise. Look for where the delay started. Maybe sales promised too much. Maybe engineering accepted files before mapping was approved. Maybe nobody owned cleanup, so the customer kept sending revised CSVs for a week.
Fix the weak spot first. Small changes matter. One required template can save days of back and forth. One clear cleanup owner can stop an import from drifting across three teams.
If the same onboarding problems keep repeating, outside help can be useful. Oleg Sotnikov at oleg.is works with startups and small teams on technical process, architecture, and AI-first operations, and this kind of sales-to-engineering handoff is exactly the sort of mess an experienced Fractional CTO can tighten up.
When the next customer sends twenty CSV files, your team should not need a rescue plan. You should already have a process that says yes, no, or not yet.
Frequently Asked Questions
Why do CSV imports take longer than expected?
Most delays start before anyone opens the files. Sales promises a simple import, then engineering finds mixed headers, bad dates, blanks, duplicates, and extra columns that nobody defined.
You avoid that by setting scope and file rules before the customer sends data.
What should we import first for a new customer?
Start with records people need to work on day one. Active contacts, companies, open deals or orders, and current products usually cover the first launch.
Leave old history, archived records, deleted users, and unused custom fields for a later phase unless the customer has a real first-week need.
Should we set a file limit before kickoff?
Set a hard cap in writing before sales says yes. A row limit per file keeps testing, reruns, and fixes manageable.
If the customer has more data, ask them to split files by table, date range, or account instead of sending one huge export.
What file rules should sales promise?
Keep the rules narrow. Accept CSV only if that is what your team supports, require UTF-8 if you need it, and ask for one table per file.
Pick one delimiter, one date format, one decimal rule, and headers in the first row. Tight rules stop quiet errors later.
What belongs in a field mapping template?
A useful mapping sheet names the source column, the destination field, whether the field is required, whether blanks are allowed, accepted values for fixed fields, and any default value.
That gives sales, onboarding, and engineering one shared rule set instead of three different guesses.
Who should clean bad data before the import?
Give cleanup work a named owner before kickoff. Decide whether the customer fixes broken exports, your team fixes a small agreed set of issues, or both sides split the work.
If nobody owns cleanup, engineering ends up repairing spreadsheets while the launch date slips.
How should we handle duplicates?
Pick one matching rule early and write it down. Most teams match by email, external ID, company name, or a clear combination.
If you wait until launch week, people guess under pressure and create duplicate contacts or merge records that should stay separate.
Should we import old notes and history on day one?
Usually no. Import the history that affects daily work right away, such as open support cases or unpaid invoices, and move the rest to phase two.
Old notes and long-tail history often need extra mapping and cleanup, so they slow the launch more than they help.
What should we ask for before kickoff starts?
Ask for two real sample files, not screenshots. One should show normal data and one should show known messy cases.
Check headers, required fields, date formats, duplicates, row counts, and whether the samples match the approved scope. That quick review catches most problems early.
When should we pause the import or ask for outside help?
Pause when the files break your written rules, the scope grows into a migration project, or nobody can answer who owns cleanup and approvals.
If the same handoff problems keep repeating, bring in an experienced CTO to tighten the process before the next account lands.