Your boss has asked you to import a bunch of someone else's data into your perfect, precious, pristine database. Catch is, their data is hot garbage. What do you do?

The challenge

You need to validate this other joker's data so that we can tell them what's wrong with it and they can resubmit.

This challenge is very much rooted in the real world, so it's not really about puzzling out the solution. Solutions are readily available for those of you who understand or care to use existing libraries, and that's perfectly all right. The idea here is to demonstrate concepts surrounding data validation and require the author to create a workable solution for the entirety of this problem space.

...Consider this a "real job simulation" challenge. Because this one is kind of realistic, it has several parts.

First, deserialize import data

The first thing you'll have to do is deserialize the import data. I'm providing it as JSON. Not all fields are actually required. In particular, ignore the ID field in the import data. First off, we don't care about the foreign identifier. Second, not all records include an identifier. Third, some of them are duplicated. In short, the entire field is a clusterfuck and you should ignore it.

The other fields are all strings, which means that JSON deserializers will apply no validation to the rest of them. These include:

  • Name
  • Email
  • Company
  • Birth date

We want all of those.

Second, validate imported data

Each record should have a first and last name. This is primarily because we said so; I understand this doesn't always work out all that well. It doesn't matter. Our database schema is infallible!
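One way to read "a first and last name" is "at least two whitespace-separated words." That's my interpretation, not a rule from the schema, and `has_first_and_last_name` is a made-up helper name. A minimal sketch under that assumption:

```rust
/// Returns true when `name` contains at least two whitespace-separated parts.
/// Assumption: "first and last name" means "two or more words"; the challenge
/// does not define it more precisely than that.
fn has_first_and_last_name(name: &str) -> bool {
    name.split_whitespace().count() >= 2
}

fn main() {
    for name in ["Ada Lovelace", "Cher", "", "  Prince  "] {
        println!("{:?} -> {}", name, has_first_and_last_name(name));
    }
}
```

Using `split_whitespace` means leading, trailing, and repeated spaces don't count as extra name parts.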

Email addresses should look valid. We can make some strong assertions about them because we know they should all come from a given set of addresses. As such, we can at least assert that each one contains an @ sign and has a dot-something at the end.
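The "@ sign plus dot-something" rule can be sketched with nothing but the standard library. This is a deliberately loose check, nowhere near a full address parser, and `looks_like_email` is a hypothetical name. The extra requirement that the parts around the @ and the final dot be non-empty is my own tightening of the rule.

```rust
/// Loose email check: something before the '@', and a domain that ends in
/// ".something". Assumption: empty parts (e.g. "@b.com", "a@.com", "a@b.")
/// count as invalid, which goes slightly beyond the rule as stated.
fn looks_like_email(email: &str) -> bool {
    // Split at the last '@' so everything after it is treated as the domain.
    let (local, domain) = match email.rsplit_once('@') {
        Some(parts) => parts,
        None => return false,
    };
    if local.is_empty() {
        return false;
    }
    // The domain needs a dot with text on both sides of it.
    match domain.rsplit_once('.') {
        Some((host, suffix)) => !host.is_empty() && !suffix.is_empty(),
        None => false,
    }
}

fn main() {
    for email in ["a@b.com", "ab.com", "a@b", "a@b.", "@b.com"] {
        println!("{:?} -> {}", email, looks_like_email(email));
    }
}
```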

Each record must list a company. All of these fields are required, obviously, but that's the only requirement for this one. We have a list of companies we could check against, but that story didn't make the sprint.

Birth dates must be valid. How else will we know when to spam these people with coupons for cakes and party favors?
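What "valid" means depends on the date format, which isn't spelled out here. The sketch below assumes dates arrive as `YYYY-MM-DD` strings; that format is a guess on my part, so check the input data and adjust. It only checks that the date exists on the calendar (month range, days per month, leap years), not that it's a plausible birthday. The function names are hypothetical.

```rust
/// Gregorian leap-year rule.
fn is_leap_year(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Parses a fixed-width, digits-only field such as "1990" or "07".
fn parse_part(part: &str, width: usize) -> Option<u32> {
    if part.len() == width && part.bytes().all(|b| b.is_ascii_digit()) {
        part.parse().ok()
    } else {
        None
    }
}

/// Assumption: dates look like "YYYY-MM-DD". Anything else is rejected.
fn is_valid_birth_date(text: &str) -> bool {
    let parts: Vec<&str> = text.split('-').collect();
    if parts.len() != 3 {
        return false;
    }
    let (year, month, day) = match (
        parse_part(parts[0], 4),
        parse_part(parts[1], 2),
        parse_part(parts[2], 2),
    ) {
        (Some(y), Some(m), Some(d)) => (y, m, d),
        _ => return false,
    };
    let days_in_month = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => return false, // month 00 or 13 and up
    };
    day >= 1 && day <= days_in_month
}

fn main() {
    for date in ["1990-02-28", "1990-02-29", "2000-02-29", "1990-13-01"] {
        println!("{:?} -> {}", date, is_valid_birth_date(date));
    }
}
```

If you'd rather not hand-roll calendar math, a date library will do the same job with less code.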

Third, produce usable output

What is "usable" is entirely down to the circumstances, but for our purposes we will say that this is usable: <record number>: <bad field names>

13: name, email
14: birthDate
15: company

If you want to be able to check your results automatically (I use a diff tool for this stuff), be sure you print the names of bad fields in order of appearance: name, email, company, birthDate.
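Here's a rough sketch of building those report lines with the fields in the required order. `FIELD_ORDER` and `report_line` are names I made up, and I'm assuming that records with no bad fields produce no line at all, which is how I read the sample output.

```rust
/// Field names in the order the report expects them.
const FIELD_ORDER: [&str; 4] = ["name", "email", "company", "birthDate"];

/// Builds "<record number>: <bad field names>".
/// `valid[i]` says whether FIELD_ORDER[i] passed validation.
/// Assumption: a record with no bad fields yields None and is not printed.
fn report_line(record_number: usize, valid: [bool; 4]) -> Option<String> {
    let bad: Vec<&str> = FIELD_ORDER
        .iter()
        .zip(valid.iter())
        .filter(|(_, ok)| !**ok)
        .map(|(name, _)| *name)
        .collect();
    if bad.is_empty() {
        None
    } else {
        Some(format!("{}: {}", record_number, bad.join(", ")))
    }
}

fn main() {
    // Made-up validation results, chosen to mirror the sample lines above.
    let results = [
        (13, [false, false, true, true]),
        (14, [true, true, true, false]),
        (15, [true, true, false, true]),
        (16, [true, true, true, true]),
    ];
    for (number, valid) in results {
        if let Some(line) = report_line(number, valid) {
            println!("{}", line);
        }
    }
}
```

Keeping the field order in one constant means the output order can't drift from the order the checks are listed in.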

The input

You can find your input data here.

It was generated using this library and the following code:

extern crate person;
extern crate rand;
extern crate serde_json;

use person::*;
use rand::Rng;
use serde_json as json;

fn main() -> Result<(), json::Error> {
    // Build 100 entities, each wrapping a randomly generated Person.
    let entities: Vec<Entity<Person>> = (1..101)
        .map(|id| Entity::new(id, rand::thread_rng().gen()))
        .collect();

    println!("{}", json::to_string_pretty(&entities)?);
    Ok(())
}

Peeking might give you some hints as to how to proceed, so maybe don't do it if you worry about that kind of thing--but, really, it's just a struct with some strings in it.

And here's your answer, as usual. If your results differ from mine, hit me up! Maybe my code was wrong? :)