First operations data set for AI: start with tickets
Choosing a first operations data set for AI gets easier when you start with tickets, forms, and structured requests instead of email threads.

Why teams pick the wrong starting data
Teams often start with the noisiest pile of work, not the cleanest one. They look at a huge mailbox, years of chat logs, or a shared inbox and think, "We already have tons of data. Let's use that." It sounds practical. Most of the time, it slows the first AI pilot.
The mistake is simple. People confuse volume with usefulness. A giant archive of messy history feels safer than a small batch of clean tickets or forms. It looks complete. It feels real. But AI works better when the task arrives in a consistent shape, with the same fields, the same intent, and a clear result.
Email threads are the usual trap. One thread might include the original request, three replies, a side conversation, missing files, and a final answer that never clearly says what solved the problem. Another thread might cover the same issue in a completely different way. A person can often fill in the gaps from context. A model usually cannot, especially when the same task shows up in ten formats.
That is why first pilots often disappoint. The model is not just trying to solve the task. It is also trying to figure out what the task is. That confusion adds delay, increases review time, and makes early tests look worse than they should.
A smaller, cleaner workflow usually wins. An IT help desk is a good example. If tickets already have categories, priority, request type, and a resolution note, the model has a real chance to learn patterns that hold up in daily work. That matters more than having a bigger archive.
This fits what Oleg Sotnikov sees in AI-first operations work. Teams move faster when they cut noise early and start with inputs that already have some structure. The data does not need to be perfect. It just needs to describe one request clearly enough that both a person and a model can act on it the same way each time.
What good starter data looks like
Good starter data looks a little boring, and that is exactly the point. The best first data set already follows a pattern, so the model sees the same structure again and again instead of guessing what each record means.
Tickets, intake forms, and structured requests usually work well because each entry asks for the same details. One person may write more clearly than another, but the fields stay stable. That gives you cleaner inputs and more reliable outputs.
At this stage, consistency matters more than depth. If every request includes a type, short description, owner, priority, and status, you already have enough to test routing, summaries, tagging, or draft replies. If half the data lives in email threads and the other half sits in chat, the pilot gets messy fast.
A quick test helps. Open 20 recent items and see whether they all answer the same questions: what is being requested, who owns it, how urgent it is, what status it is in, and how you know it is done. If you can answer those without reading a long back-and-forth, the data is probably in decent shape.
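The quick test above can be roughed out in a few lines of Python. This is a minimal sketch, assuming each ticket is exported as a dictionary. The field names (`request`, `owner`, `priority`, `status`, `resolution`) are hypothetical stand-ins for whatever your ticket tool calls them.

```python
# Hypothetical field names; rename them to match your own export.
REQUIRED_FIELDS = ["request", "owner", "priority", "status", "resolution"]


def field_coverage(tickets, required=REQUIRED_FIELDS):
    """Return the share of tickets (0.0 to 1.0) with a non-empty value per field."""
    if not tickets:
        return {field: 0.0 for field in required}
    coverage = {}
    for field in required:
        # Treat None, empty strings, and whitespace-only values as missing.
        filled = sum(1 for t in tickets if str(t.get(field) or "").strip())
        coverage[field] = filled / len(tickets)
    return coverage


# Illustrative sample, not real data.
sample = [
    {"request": "Reset password", "owner": "ana", "priority": "low",
     "status": "closed", "resolution": "Password reset"},
    {"request": "New laptop", "owner": "", "priority": "high",
     "status": "open", "resolution": None},
]

for field, share in field_coverage(sample).items():
    print(f"{field}: {share:.0%}")
```

If one field is filled in far less often than the others, that is usually the question people end up answering by reading the back-and-forth.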
Clear endings matter as much as clean starts. A request should have an obvious finish line, such as "account created," "invoice approved," or "bug fixed and closed." If "done" means something different to each person, the model has nothing solid to learn from.
You also need enough volume to see repeat patterns. That does not mean thousands of records on day one. It means a steady flow of new work so the team can test, review, adjust, and see whether results improve.
An internal IT help desk is a solid example. Tickets come in through a form, staff assign an owner, priority gets set early, and closed tickets usually include a short resolution note. That is much easier to automate than an inbox full of forwarded emails with missing context.
The best starting data is not the most complex data your team has. It is the data that repeats, stays structured, and gives you a clear way to judge whether the output helped or created more work.
Good places to start
For a first pilot, pick work that already arrives in a fixed shape. You want rows, fields, labels, and a clear result. If people can sort it by category today, AI can usually learn from it much faster.
The best starting points have one thing in common: two people looking at the same record would describe it in almost the same way. That cuts review time and makes mistakes easier to spot.
Help desk tickets are a common starting point when agents already use categories like access, hardware, billing, or outage. In that setup, AI can route requests, summarize them, or suggest replies without guessing the issue type first.
Intake forms in HR, finance, and IT are another safe bet. New hire requests, expense approvals, and software access forms already collect facts in the same order every time. Sales handoff forms can work too when they include standard fields such as company size, deal stage, product interest, and promised delivery date. Approval queues are often even simpler because they usually end with one of a few outcomes: approve, reject, or ask for changes.
Bug reports can also work well if the team already uses steps to reproduce, severity labels, version numbers, and expected versus actual behavior. That structure helps AI group duplicates and draft triage notes.
A small IT team shows the difference clearly. If every password reset, laptop request, and access issue enters through the same ticket form, the team can test routing or response drafts in days. If those same requests arrive through a shared inbox, cleanup usually takes longer than the pilot itself.
Early on, structured requests beat messy communication for a simple reason: they reduce the number of judgment calls. Start where the work already has rules, names, and clear endings.
What to avoid at first
If you want a fast win, skip anything that needs heavy cleanup before a model can do useful work. The first data set should be consistent, labeled, and easy to compare.
Long email threads look attractive because there are so many of them. In practice, they are hard to use. Replies quote old text, people change the subject halfway through, and forwarded messages drag in extra context that has nothing to do with the actual request.
Shared inboxes cause a similar problem. Ten people answer in ten different styles, and they rarely fill in the same details every time. One person writes a full summary, another sends two lines, and a third forgets the order number or deadline. That makes training, tagging, and automation harder than it needs to be.
Chat logs are often worse. Teams use shorthand, inside jokes, and half-finished thoughts. Someone writes "done" or "check this," and everyone in the channel knows what it means that day. A model usually does not.
Old spreadsheets can slow things down too. One tab tracks requests, another tracks owners, and a third has notes from three years ago. Columns change names, dates use mixed formats, and blank cells mean different things depending on who filled them in.
Unstable processes are another bad starting point. If the approval steps, rules, or fields change every week, the pilot never gets stable input. The team ends up chasing process changes instead of learning whether the automation helps.
A simple rule works well here: avoid data that needs a person to explain it every time. If someone has to say, "This row means one thing in March and another thing in April," leave it for later.
Small teams fall into this trap often. They want to start with the noisiest source because it feels complete. A better first move is smaller and cleaner, like service tickets with fixed fields or intake forms with the same questions each time. Results usually come faster, and the next step becomes much easier.
How to choose your first data set
Start with work your team repeats every week. Pick three routine processes, not ten. Good candidates include access requests, support tickets with standard categories, and simple intake forms. Skip anything rare, political, or full of back-and-forth.
Then score each process with three plain questions:
- How often does it happen?
- Do people submit it in roughly the same way each time?
- Can you tell, without debate, whether the result was right?
A basic 1 to 5 score is enough. The winner is usually not the biggest process. It is the one with enough volume, steady inputs, and a clear finish.
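The three questions above can be kept in a tiny scoring sheet. The sketch below assumes equal weight for each question, which is a choice made here for illustration and not a rule. The process names and scores are made up.

```python
from dataclasses import dataclass


@dataclass
class ProcessScore:
    name: str
    frequency: int     # 1-5: how often does it happen?
    consistency: int   # 1-5: is it submitted roughly the same way each time?
    clear_result: int  # 1-5: can you tell without debate whether the result was right?

    def total(self) -> int:
        # Equal weighting is an assumption; adjust if one question matters more.
        return self.frequency + self.consistency + self.clear_result


def rank_processes(scores):
    """Sort candidate processes by total score, highest first."""
    return sorted(scores, key=lambda s: s.total(), reverse=True)


# Illustrative scores only.
candidates = [
    ProcessScore("shared inbox", frequency=5, consistency=1, clear_result=2),
    ProcessScore("access request form", frequency=4, consistency=5, clear_result=5),
    ProcessScore("laptop requests", frequency=3, consistency=5, clear_result=4),
]

for s in rank_processes(candidates):
    print(f"{s.name}: {s.total()}")
```

In this made-up example the shared inbox has the most volume and still lands last, because steady inputs and a clear finish count for as much as frequency.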
Choose the smallest option with the cleanest inputs. That often leads to the fastest pilot because you can test it without spending weeks cleaning data. Teams get stuck when they start with messy inboxes. Email looks useful, but it hides missing details, side conversations, and too many exceptions.
Before you test anything, pull 50 to 200 recent examples and review them by hand. Read them like a skeptical operator, not like a manager looking for good news. Check for duplicates, half-filled requests, mixed categories, and cases where different staff members solved the same issue in different ways.
That manual review saves time later. If people used one form for five unrelated problems, the data set is not ready. If most items follow the same pattern, you have something workable.
Strip out personal data you do not need before any trial. Names, phone numbers, email signatures, and long message histories often add risk without helping the model. Keep only the fields needed to classify, route, summarize, or draft a reply.
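One way to apply that rule is an allowlist: keep only the named fields and mask obvious contact details in free text. The sketch below is rough and assumes hypothetical field names. The two regular expressions catch only common email and phone formats, and they will not catch names written inside a description, so a manual spot check is still needed.

```python
import re

# Hypothetical allowlist: keep only what the pilot task needs.
KEEP_FIELDS = {"request_type", "description", "priority", "status"}

# Rough patterns for illustration; they miss many real-world formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def minimize(record):
    """Drop fields outside the allowlist and mask emails and phone numbers in text."""
    cleaned = {}
    for key, value in record.items():
        if key not in KEEP_FIELDS:
            continue
        if isinstance(value, str):
            value = EMAIL_RE.sub("[email]", value)
            value = PHONE_RE.sub("[phone]", value)
        cleaned[key] = value
    return cleaned


# Illustrative record, not real data.
raw = {
    "request_type": "access",
    "description": "Please call me at +1 555 010 9999 or mail jo@example.com",
    "priority": "high",
    "status": "open",
    "requester_name": "Jo Example",
}
print(minimize(raw))
```

An allowlist is safer than a blocklist here: a new field added to the form stays out of the trial until someone decides the model needs it.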
You have probably picked well if two people can review the same sample and mostly agree on what each item is and what should happen next. If they argue over half the sample, go narrower and pick a cleaner process.
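The two-reviewer check can be turned into a single number. This is a minimal sketch of plain percent agreement, assuming both reviewers labeled the same items in the same order. The labels are illustrative, and the measure does not correct for agreement that happens by chance.

```python
def agreement_rate(labels_a, labels_b):
    """Share of items where two reviewers chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both reviewers must label the same items")
    if not labels_a:
        return 0.0
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


# Illustrative labels from two reviewers for the same five tickets.
reviewer_1 = ["access", "hardware", "access", "billing", "outage"]
reviewer_2 = ["access", "hardware", "billing", "billing", "outage"]

rate = agreement_rate(reviewer_1, reviewer_2)
print(f"Agreement: {rate:.0%}")  # prints "Agreement: 80%"
```

A rate near one half matches the "argue over half the sample" case, which is the signal to go narrower.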
A simple example from an IT team
An IT team wants a quick pilot, so it starts with laptop requests instead of the shared mailbox. That sounds boring, but boring is useful here. The team already gets the same type of request every week, and each request comes through one form.
Employees fill in a few fields: laptop model, budget, manager name, and due date. Some teams add job role or office location, but they keep the form short. Because the fields stay consistent, the AI does not need to guess what the person meant.
The team gives the model two small jobs. First, it sorts requests into simple groups such as "ready for approval," "missing budget," or "needs manager review." Second, it drafts a short approval note for the IT lead or finance manager.
A request might look like this: a designer asks for a MacBook Pro, lists a budget of $2,300, names her manager, and says she needs it before her start date next Friday. The model checks whether the form is complete, flags the price as higher than usual, and drafts a note saying the request is complete, the deadline is close, and manager approval is still needed before purchase.
The team leaves the shared mailbox alone for now. Those email threads mix purchase questions, shipping updates, random follow-ups, and missing details. People reply out of order, forward old messages, and change the subject line. That kind of mess slows the pilot and makes mistakes more likely.
After a month, the team compares a few simple measures: average time from request to approval, the number of requests sent back for missing details, manual triage time per request, and drafting time for approval notes. If those numbers improve, the team knows why. The data was clean, the task was narrow, and staff could check the output without much effort.
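Two of those measures are easy to compute from an export. The sketch below assumes each request carries ISO-format timestamps and a flag for whether it was sent back. The field names `submitted_at`, `approved_at`, and `sent_back` are hypothetical, and the records are illustrative, not real results.

```python
from datetime import datetime
from statistics import mean


def hours_to_approval(request):
    """Elapsed hours between submission and approval for one request."""
    submitted = datetime.fromisoformat(request["submitted_at"])
    approved = datetime.fromisoformat(request["approved_at"])
    return (approved - submitted).total_seconds() / 3600


def summarize(requests):
    """Average approval time and the share of requests sent back for details."""
    return {
        "avg_hours_to_approval": mean(hours_to_approval(r) for r in requests),
        "sent_back_rate": sum(r["sent_back"] for r in requests) / len(requests),
    }


# Illustrative records only.
before = [
    {"submitted_at": "2024-03-04T09:00", "approved_at": "2024-03-06T09:00", "sent_back": True},
    {"submitted_at": "2024-03-05T10:00", "approved_at": "2024-03-06T10:00", "sent_back": False},
]
print(summarize(before))
```

Running the same function on a month before the pilot and a month after keeps the comparison honest, because both periods are scored the same way.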
Mistakes that slow the first pilot
A slow pilot usually begins with one bad assumption: the process is messy now, and AI will clean it up on the way. It rarely works like that. If requests arrive with vague titles, missing details, and five different ways to describe the same issue, the model learns noise before it learns patterns.
Scope creep is another common problem. A team starts with service desk tickets, then adds email requests, chat messages, spreadsheet logs, and a form that only half the team uses. Once that happens, results become hard to read. If the test goes badly, nobody knows which workflow caused the problem.
Smaller and cleaner usually wins. One request type is enough. One form is enough. One ticket queue with consistent fields can teach you more than three months of messy email chains.
The data itself can quietly ruin a decent test. Missing fields, duplicate records, and old tickets full of copied text all make the output less useful. If twenty tickets all say "please help" and nothing else, the model cannot sort them in a meaningful way. If the same request appears three times under different IDs, your measurements stop meaning much.
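Both problems can be spotted before the test starts. This is a rough sketch, assuming tickets are dictionaries with hypothetical `id`, `requester`, and `description` fields. It treats two tickets as duplicates only when the requester matches and the descriptions are identical after lowercasing and collapsing whitespace, so reworded duplicates will slip through.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not hide duplicates."""
    return " ".join(text.lower().split())


def find_duplicates(tickets):
    """Group ticket IDs that share the same requester and normalized description."""
    groups = defaultdict(list)
    for t in tickets:
        groups[(t["requester"], normalize(t["description"]))].append(t["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def too_vague(tickets, min_words=4):
    """IDs of tickets whose description is too short to sort meaningfully."""
    # The four-word cutoff is an arbitrary starting point, not a tested threshold.
    return [t["id"] for t in tickets if len(t["description"].split()) < min_words]


# Illustrative tickets, not real data.
tickets = [
    {"id": "T-101", "requester": "u1", "description": "VPN access for new contractor"},
    {"id": "T-102", "requester": "u1", "description": "vpn  access for new contractor"},
    {"id": "T-103", "requester": "u2", "description": "please help"},
]
print(find_duplicates(tickets))  # [['T-101', 'T-102']]
print(too_vague(tickets))        # ['T-103']
```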
Too little data causes a different issue. Teams sometimes test on 20 or 30 records and expect clear patterns. That is rarely enough, especially if requests vary a lot. You do not need a giant warehouse of data, but you do need enough examples to see repeated cases, edge cases, and common mistakes.
Trust matters too. Many teams ask for full automation on day one. They want the system to classify every ticket, write every reply, route every request, and close the easy ones. It sounds efficient, but it often alarms the people who have to use it.
A better first step is narrower. Let the model suggest categories before a human confirms them. Let it flag missing fields before intake starts. Let it recommend routing for common requests or draft a reply for simple cases. That kind of pilot gives the team something concrete to review. Once people see that the output is usually right, they stop treating the system like a risky experiment and start using it.
Quick checks before you start
A small pilot usually fails early when the team needs a workshop just to explain where requests come from. If one person cannot describe the request flow in about two minutes, the process is still too fuzzy.
Look for work that repeats with only small changes. A support queue, access request form, or internal IT ticket often follows the same path: a request arrives, someone checks a few fields, the work gets done, and the ticket closes. That consistency makes testing much easier than digging through long email chains.
You also need a clear way to judge results. Can the team mark each case as good, bad, late, sent to the wrong team, missing details, or solved on the first pass? If nobody can label outcomes, people will argue about whether the model helped. Simple labels are enough.
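A fixed label set is easy to enforce in code. The sketch below uses label names adapted from the outcomes listed above. The exact spellings are assumptions, so swap in whatever your team agrees on.

```python
from collections import Counter

# Label spellings are assumptions; agree on your own set before the pilot.
ALLOWED_LABELS = {"good", "bad", "late", "wrong_team", "missing_details", "first_pass"}


def tally_outcomes(labels):
    """Count outcome labels, rejecting anything outside the agreed set."""
    unknown = sorted(set(labels) - ALLOWED_LABELS)
    if unknown:
        raise ValueError(f"Unknown labels: {unknown}")
    return Counter(labels)


# Illustrative review results.
reviewed = ["good", "good", "wrong_team", "missing_details", "good", "late"]
counts = tally_outcomes(reviewed)
for label, count in counts.most_common():
    print(f"{label}: {count}")
```

Rejecting unknown labels is deliberate: a reviewer who invents a new label mid-pilot is usually pointing at a case the team has not defined yet.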
Data access matters more than most teams expect. If ticket data sits in one tool and comments live in three inboxes, cleanup can eat the whole month. Export a sample first and open it in a spreadsheet. If dates, owners, statuses, and request types already make sense, you probably have a workable starting point. If people spend weeks fixing names, merging duplicates, and guessing what happened, pick another source.
Ownership is the last check, and teams skip it all the time. One team should run the trial, review the output, and decide what success looks like. Shared ownership sounds nice, but it usually slows the work. A small group with one manager and one hands-on reviewer tends to move faster.
This is one reason experienced operators often start with forms instead of email. The work is easier to explain, easier to export, and easier to score. Even a modest time saving per request is enough to show whether the pilot is worth expanding.
What to do next
Pick one workflow today and keep it narrow. A simple ticket queue or form beats a shared inbox because the fields stay consistent and the requests follow the same path.
Write down the fields people use in that workflow. Start with the basics: request type, team, system, urgency, owner, and final outcome. If a field appears only in rare cases or depends on a long email explanation, leave it out for now. The goal is a clean first data set, not a complete map of every edge case.
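Writing the fields down can be as simple as a small record type. This is a minimal sketch using the six basics named above. The Python attribute names and the example values are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Request:
    request_type: str
    team: str
    system: str
    urgency: str
    owner: str
    final_outcome: Optional[str] = None  # stays empty until the request is closed


def missing_fields(request):
    """Names of fields that are empty or None on a request."""
    return [f.name for f in fields(request) if not getattr(request, f.name)]


# Illustrative request with no owner assigned yet.
r = Request(request_type="access", team="IT", system="VPN",
            urgency="normal", owner="")
print(missing_fields(r))  # ['owner', 'final_outcome']
```

Keeping the list this short makes it obvious when someone wants to add a field that only matters in rare cases.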
Next, save a small sample of recent requests in one place. Thirty to fifty examples is usually enough for a quick first read. Pull the larger sample of 50 to 200 before you test anything. Read through the small sample yourself and look for blanks, duplicate labels, missing outcomes, and requests that arrived by email even though they should have gone through a form. That quick pass will tell you whether the data is usable or still too messy.
Set one result you can measure quickly. You might want to route requests to the right team with fewer mistakes, cut triage time for new tickets, sort incoming requests by urgency, or flag incomplete submissions before a person reviews them. Keep the goal simple. If you try to classify, summarize, prioritize, and auto-reply at the same time, you will learn less and wait longer for a useful answer.
A small IT team shows the point well. If it starts with access requests submitted through one form, it can test whether AI sends each ticket to the right owner. If it starts with six months of mixed email chains, it will spend most of its time cleaning text instead of testing the pilot.
If you want a practical second opinion, Oleg Sotnikov at oleg.is works with startups and smaller companies as a Fractional CTO and advisor. He helps teams choose sensible AI workflows, clean up the process around them, and avoid wasting time on the wrong starting data.
Frequently Asked Questions
What should I use for my first AI operations pilot?
Use tickets, forms, or structured requests that follow the same pattern every time. A small clean queue usually works better than a huge archive of messy messages because the model can learn the task instead of guessing the format.
Why do email chains usually fail as starter data?
Email chains mix the real request with side talk, quoted text, missing files, and subject changes. People can fill in those gaps from context, but a model often cannot, so errors and review time go up.
Is a shared inbox any better than email threads?
Not really. A shared inbox still gives you mixed writing styles, missing details, and uneven replies from different people. That makes routing, tagging, and draft replies harder than they need to be.
How much historical data do I need?
You do not need thousands of records. Pull about 50 to 200 recent examples for a serious review, then make sure the workflow keeps producing new items so you can test changes over time.
What makes a ticket or form good enough for AI?
Look for stable fields like request type, owner, priority, status, and a clear final outcome. If two reviewers read the same item and mostly agree on what it is and what should happen next, the data is probably usable.
Should I pick the biggest workflow or the cleanest one?
Pick the cleanest workflow, not the biggest one. A narrow process with steady inputs and a clear finish line usually gives you a faster result than a large process full of exceptions.
What is a sensible first job for the model?
Start with one small job such as routing, tagging, missing-field checks, summaries, or draft replies. Those tasks stay easy to review, and the team can see quickly whether the output saves time or adds work.
Do I need to clean or remove personal data before testing?
Yes. Remove names, phone numbers, email signatures, and long message history you do not need. Keep only the fields the model needs to classify, route, summarize, or draft a response.
How should I measure whether the pilot worked?
Track one or two simple numbers the team already understands, such as triage time, wrong routing, send-backs for missing details, or time from request to approval. Keep the scoring plain so people do not argue about what success means.
What if my team cannot agree on labels or outcomes?
Go narrower and clean the process first. Choose one request type, fix the labels, and define what "done" means in plain language. If people disagree on half the sample, the pilot will stay noisy until you tighten the workflow.