Quickly build a CSV import feature for your MVP

So you've built out the MVP of your SaaS product, and you're ready to do a pilot for your first business customer.

There's one problem--the business has hundreds or thousands of existing customers they'd like to import into your application, and nobody is going to type them in manually.

You think "OK, we'll build a quick CSV import feature, and be on our way"--and then it turns into a nightmare. In fact, that's probably why you're reading this right now.

Since users are among the most common kinds of data to migrate into a SaaS product, we'll use them as the concrete example in this article.

Why do CSV import features always explode?

Usually, the first mental model we use to think about a CSV import is wrong. In fact, even labeling the feature "CSV Import" and speaking about it that way in your standups and Jira boards will lead you down the wrong path.

It's scary being in an early-stage startup, with limited funding and limited time, and an overwhelming heap of responsibilities on your plate--your brain wants the CSV import to be simple, so that you can move on to other features which are badly in need of your attention.

The truth is that this simplified, happy, and optimistic mental model is a trick of your brain. The cure is to swap your light and fluffy "CSV Import" mental model with a heavy and dense "CSV Migration" model--the reward is accurate time and cost expectations, a solid MVP architecture, and a great experience for your very first clients.

What's the difference in mindset between an Import and a Migration?

When we talk about a "file import", the mental model is that it's one-shot data:

importing a CSV into Excel, and then you forever work in Excel afterwards
importing an Excel spreadsheet into a tax app, and then you complete your taxes in the app
importing your pictures into a photo app, and then you work with them inside the photo app

A migration is different, because you are importing "living" entities from another database--entities that change, get added to, get removed, and get re-added.

If you focus on just the import part, you are setting yourself up for disaster with your first customer, the second they hand you a CSV file with a duplicate user, a missing user, or a user record that's changed.

How do I know if I should use a Migration mindset, versus an Import?

Look at the fields you need to import for your records. If they contain an identifier from the other business's system, then it's a migration:

a numeric database ID that represents the record in the other database
a unique email that identifies the user in the other system
a UUID string that uniquely identifies the user
a slug or specially-formatted word that identifies each record

These types of fields act like "foreign keys" that point into the other business's system (you'll also see them called external IDs), and if your imported records contain them, then buckle up--you need a migration, not an import.
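
If you're not sure which columns in a client's file could play that role, here is a minimal Python sketch of one way to check. It assumes the file has a header row, and the column names and sample data are made up for illustration. It only flags columns whose values are all filled in and all unique, which is a necessary trait of an identifier but not proof of one, so treat the result as a list of candidates to confirm with the client.

```python
import csv
import io


def identifier_columns(csv_text):
    """Return the columns whose values are non-empty and unique in every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        return []
    candidates = []
    for column in reader.fieldnames:
        values = [(row.get(column) or "").strip() for row in rows]
        # An identifier must be present on every row and never repeat.
        if all(values) and len(set(values)) == len(values):
            candidates.append(column)
    return candidates


# Hypothetical sample file; two different people share the same name.
sample = """id,email,name,department
101,ana.lee@example.com,Ana Lee,Sales
102,bo@example.com,Bo Chen,Sales
103,ana.lee2@example.com,Ana Lee,Support
"""

print(identifier_columns(sample))
```

With this sample, only `id` and `email` qualify, because `name` and `department` both repeat.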

What is the migration mental model?

With an Import, it's just one-shot data--all we focus on is getting the data into our system, and then we're done.

With a migration, we focus on keeping our system in sync with our clients' systems via the foreign keys.

In the migration mental model, the CSV file is simply the medium used to convey the current state of our client's system--your MVP needs to look at each CSV when it comes in, look at its own records, and then make an intelligent decision on what to do with each one.

Migrations are a much heavier feature than doing a simple import--particularly when you're importing users, which often play a central role inside your product.

OK, I need to build a CSV migration feature. What additional concerns do I need to worry about?

Think about the customers using the client's existing system--they're changing, being added, and being removed, and sometimes there are problems which require human intervention.

Think about your own system--the same thing happens.

When these two systems are running separately, a CSV file comes to bridge the gap--now you need to handle the following scenarios:

a row in the CSV is invalid for some reason--it's incomplete, the data is mangled, or there is a logical conflict with an existing record in your MVP
a user was created in the client's system, and they need to be provisioned in yours
a user's state changed in the client's system (think deactivated or reactivated), and this state change also needs to happen in your MVP, and fire additional logic if needed
a user was deleted from the client's system, and they need to be removed from yours
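
To make these scenarios concrete, here is a minimal Python sketch that sorts the rows of a CSV into those four buckets. Everything named here is an assumption for illustration: the `external_id`, `email`, and `status` columns, the shape of the `existing` records, and the sample data. It also assumes each CSV is a full snapshot of the client's users. If your client sends only changes, a missing row must not be treated as a deletion.

```python
import csv
import io

REQUIRED = ("external_id", "email", "status")  # hypothetical column names


def plan_sync(csv_text, existing):
    """Compare CSV rows against our records, keyed by the client's identifier.

    `existing` maps external_id -> {"email": ..., "status": ...}.
    Returns a plan of what to do; it does not change anything itself.
    """
    plan = {"invalid": [], "create": [], "update": [], "delete": []}
    seen = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        line_no = reader.line_num  # line of the file this row ended on
        record = {field: (row.get(field) or "").strip() for field in REQUIRED}
        ext_id = record["external_id"]
        if not all(record.values()):
            plan["invalid"].append((line_no, "missing required field"))
        elif ext_id in seen:
            plan["invalid"].append((line_no, "duplicate external_id"))
        elif ext_id not in existing:
            plan["create"].append(record)
        elif existing[ext_id] != {"email": record["email"], "status": record["status"]}:
            plan["update"].append(record)
        if ext_id:
            # Even an invalid row proves the user still exists on the client's side.
            seen.add(ext_id)
    plan["delete"] = sorted(set(existing) - seen)
    return plan


# Hypothetical data: what we already have, and what the client just sent.
existing = {
    "101": {"email": "ana@example.com", "status": "active"},
    "102": {"email": "bo@example.com", "status": "active"},
    "103": {"email": "cy@example.com", "status": "active"},
    "105": {"email": "eve@example.com", "status": "active"},
}
sample = "\n".join([
    "external_id,email,status",
    "101,ana@example.com,active",    # unchanged
    "102,bo@example.com,inactive",   # deactivated on the client's side
    "103,,active",                   # email is missing
    "104,dee@example.com,active",    # new user
    "104,dee@example.com,active",    # same user listed twice
])

plan = plan_sync(sample, existing)
for action, items in plan.items():
    print(action, items)
```

Building a plan first, instead of writing to your database row by row, gives you a chance to review what a file is about to do (especially the deletions) before any of it happens.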

In addition, if a business is migrating users to your platform, chances are those users already have a password in the client's system, and don't want to set up another one in yours--therefore, your clients will ask for Single Sign-On.

How to build robust CSV Migrations quickly

Now that we're in a CSV Migration mental model, we have a clear path forward to serve our first clients, and we can pick and choose the pieces we need in the MVP.
Here are some general technical features needed to complete a CSV Migration for users--run these by your software developer to decide which ones you can implement with the resources you have:

You'll need to transport and store the CSV files themselves--usually done through something called SFTP
You'll need to process the CSV file using a background job, to avoid timeouts and errors within the UI of your MVP
You'll need a provisioning system to create users "headless"--without the UI that's currently in your MVP, and without user intervention
The CSV Migration records need to be in their own table in your database, so that you can manually intervene to fix records with errors of any kind
(optional) If the import conflicts need to be resolved by a non-technical person, you might need to make a UI within your MVP to do so
(optional) The client will usually ask for Single Sign-On, which you can implement with something called SAML--but this can usually be delayed until after the pilot
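
As a rough illustration of the "own table" and "headless provisioning" items above, here is a minimal Python sketch using SQLite. The table name, column names, status values, and the `provision_user` function are all hypothetical stand-ins for whatever your developer builds. In a real system `process_pending` would run inside your background job, and the line numbers assume one row per line with a header on line 1.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS migration_rows (
    id       INTEGER PRIMARY KEY,
    batch    TEXT NOT NULL,                    -- which uploaded file the row came from
    line_no  INTEGER NOT NULL,                 -- position of the row in that file
    payload  TEXT NOT NULL,                    -- the row as JSON, as received
    status   TEXT NOT NULL DEFAULT 'pending',  -- pending | done | error
    error    TEXT                              -- why the row failed, if it did
)
"""


def stage_rows(conn, batch, rows):
    """Store every incoming row before trying to act on it."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO migration_rows (batch, line_no, payload) VALUES (?, ?, ?)",
        [(batch, n, json.dumps(row)) for n, row in enumerate(rows, start=2)],
    )
    conn.commit()


def process_pending(conn, provision):
    """Run `provision` on each pending row and record the outcome on the row."""
    pending = conn.execute(
        "SELECT id, payload FROM migration_rows WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for row_id, payload in pending:
        try:
            provision(json.loads(payload))
        except ValueError as exc:
            conn.execute(
                "UPDATE migration_rows SET status = 'error', error = ? WHERE id = ?",
                (str(exc), row_id),
            )
        else:
            conn.execute(
                "UPDATE migration_rows SET status = 'done', error = NULL WHERE id = ?",
                (row_id,),
            )
    conn.commit()


def provision_user(row):
    """Hypothetical stand-in for creating a user without any UI."""
    if "@" not in row.get("email", ""):
        raise ValueError("invalid email")


def statuses(conn):
    return [s for (s,) in conn.execute("SELECT status FROM migration_rows ORDER BY id")]


conn = sqlite3.connect(":memory:")
stage_rows(conn, "pilot.csv", [
    {"external_id": "101", "email": "ana@example.com"},
    {"external_id": "102", "email": "not-an-email"},
])
process_pending(conn, provision_user)
first_pass = statuses(conn)

# Manual intervention: correct the bad row, then put it back in the queue.
conn.execute(
    "UPDATE migration_rows SET payload = ?, status = 'pending' WHERE status = 'error'",
    (json.dumps({"external_id": "102", "email": "bo@example.com"}),),
)
process_pending(conn, provision_user)
second_pass = statuses(conn)
print(first_pass, second_pass)
```

Because a failed row stays in the table with its error message, one bad record never blocks the rest of the file, and fixing it is a small edit followed by a re-run.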

How does this make it quicker to build a CSV Import?

You thought you were building only one feature (just a CSV Import), but you actually need up to six. Indeed--building out this migration capability will take a lot more effort, money, and time than you originally expected.

If you are able to overcome that feeling of overwhelm, and scrape together the resources necessary to execute this, these are the rewards you'll reap:

when you mention things like SFTP and SAML (or their non-technical versions "Secure File Server" and "Single Sign-On") you'll be viewed as a subject matter expert
when you give your pilot clients a launch date, it'll be accurate, because you already know the major pieces to implement
you'll avoid that feeling of terror, when a client uploads their first CSV, and it errors--because you already have the architecture in place to handle all the concerns of the migration
you'll never have to spend time and money rewriting a CSV Import feature, because you built it correctly as a CSV Migration in the first place

Sometimes the long road is indeed faster.

Good luck!