Ninety percent of what we do is putting data on a server. The other ninety percent is wondering where the hell it went.

"ETL" stands for "Extract, Transform, Load." Or something like that. It describes the process of getting useful, structured data out of an unstructured resource and storing it in some other form on a resource where it can be readily used. Did that make any sense? No? Ok...

Imagine you have two computers. One keeps a list of people who owe money and another keeps a list of payments. You might have an ETL process for having the payments machine tell the other machine who's paid each month. This is going to involve a lot of really big files and will wind up being pretty annoying, but whatever. That's life. Let's talk about the challenge.

Today's challenge

ETL systems often take on a particular form: a pipeline with many inputs and one output. You might get your data in the form of a CSV, a JSON file, an XML file, a fixed-length file, or what have you, but you really only want the data they store, and you don't care what form it takes. That's why today's challenge has two forms.

Feel free to pick just one or to do both separately. In a perfect world, you'll write a single ETL consumer that can accept data from both on the basis of some kind of IEtl interface or something. Here's your data:

  1. JSON format
  2. CSV format
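
If the "IEtl interface or something" idea sounds fuzzy, here is one way it might look in Rust. This is only a sketch: the `Etl` trait, the `Record` fields (`name`, `age`, `score`), and both source types are names made up for illustration, and the real files may well have different columns. The CSV parser is deliberately naive (no quoted fields), and `MemorySource` stands in for a JSON reader, which in a real program you would probably build on a proper JSON library.

```rust
/// One row of data. These field names are assumptions for illustration;
/// the real data sets may use different columns.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub name: String,
    pub age: u32,
    pub score: f64,
}

/// Hypothetical stand-in for the "IEtl" interface: anything that can
/// produce records, whatever format it reads them from.
pub trait Etl {
    fn extract(&self) -> Result<Vec<Record>, String>;
}

/// Naive CSV source: one header line, then `name,age,score` rows.
/// It does not handle quoted fields or embedded commas.
pub struct CsvSource {
    pub text: String,
}

impl Etl for CsvSource {
    fn extract(&self) -> Result<Vec<Record>, String> {
        let mut records = Vec::new();
        // Skip the header row; `i + 1` is the 1-based line number for errors.
        for (i, line) in self.text.lines().enumerate().skip(1) {
            if line.trim().is_empty() {
                continue;
            }
            let fields: Vec<&str> = line.split(',').map(str::trim).collect();
            if fields.len() != 3 {
                return Err(format!("line {}: expected 3 fields, got {}", i + 1, fields.len()));
            }
            let age = fields[1]
                .parse::<u32>()
                .map_err(|e| format!("line {}: bad age: {}", i + 1, e))?;
            let score = fields[2]
                .parse::<f64>()
                .map_err(|e| format!("line {}: bad score: {}", i + 1, e))?;
            records.push(Record { name: fields[0].to_string(), age, score });
        }
        Ok(records)
    }
}

/// Records that are already parsed. Here it stands in for a JSON source.
pub struct MemorySource {
    pub records: Vec<Record>,
}

impl Etl for MemorySource {
    fn extract(&self) -> Result<Vec<Record>, String> {
        Ok(self.records.clone())
    }
}

/// The single consumer: many inputs, one output.
pub fn load_all(sources: &[&dyn Etl]) -> Result<Vec<Record>, String> {
    let mut all = Vec::new();
    for source in sources {
        all.extend(source.extract()?);
    }
    Ok(all)
}
```

The point of the trait is that `load_all` never learns which format a record came from. Adding a third format would mean writing one more `impl Etl` and nothing else.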

Neither file is likely to self-destruct any time soon. Your mission, should you choose to accept it, is to answer the following questions on the basis of these data sets:

  1. Who is the most improved player in this year's Over 60 bracket?
  2. Who is the best rookie in the junior (Under 50) division?
  3. What is the average score of an octogenarian?
  4. Who has the lowest score overall?

Feel free to answer just one question, or all of them. Note that the data sets are not identical, so you'll get different answers from each data set.
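
Once the records are loaded, questions 3 and 4 boil down to a filter and a fold. The sketch below assumes a hypothetical `Player` struct with `name`, `age`, and `score` fields, and takes "octogenarian" to mean ages 80 through 89. Questions 1 and 2 are left out on purpose: they depend on how the data records a player's history, which this sketch does not try to guess.

```rust
/// Hypothetical record shape; the real data sets may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Player {
    pub name: String,
    pub age: u32,
    pub score: f64,
}

/// Question 3: mean score of players aged 80 through 89.
/// Returns `None` when nobody qualifies, so we never divide by zero.
pub fn octogenarian_average(players: &[Player]) -> Option<f64> {
    let scores: Vec<f64> = players
        .iter()
        .filter(|p| (80..=89).contains(&p.age))
        .map(|p| p.score)
        .collect();
    if scores.is_empty() {
        None
    } else {
        Some(scores.iter().sum::<f64>() / scores.len() as f64)
    }
}

/// Question 4: the player with the lowest score, or `None` for empty input.
/// `f64` is not `Ord`, so we compare with `total_cmp` instead of `min()`.
pub fn lowest_score(players: &[Player]) -> Option<&Player> {
    players.iter().min_by(|a, b| a.score.total_cmp(&b.score))
}
```

Returning `Option` is a design choice: a bracket with no eligible players is a normal outcome for real data, not an error, and the caller gets to decide how to report it.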

Answers

etl-statifier [master●] base64 <(cargo run --release -- data.*)
    Finished release [optimized] target(s) in 0.0 secs
     Running `target/release/etl-statifier data.csv data.json`
UGF0aDogZGF0YS5jc3YKMS4gTm8gZWxpZ2libGUgcGxheWVycy4KMi4gQ2xlbWVudCBMYXdsb3IKMy4gNTEuMjgKNC4gRnJhbmtsaW4gUHJpdmV0dGUKClBhdGg6IGRhdGEuanNvbgoxLiBKYXJyZWQgQXJ0aHVyCjIuIEphcXVlbHluIENhc2hpb24KMy4gNTAuNDYKNC4gRXN0ZWxhIFNpZ3VlbnphCgpPdmVyYWxsOgoxLiBKYXJyZWQgQXJ0aHVyCjIuIENsZW1lbnQgTGF3bG9yCjMuIDUwLjk0CjQuIEZyYW5rbGluIFByaXZldHRlCg==

Lemme know if you decide I'm wrong. :)