Task #6: Position 1, Score 26,922, Best Score 26,922, Points 1.000

Problem Statement

You have an XML file on STDIN with person information like

<?xml version="1.0" encoding="UTF-8"?>
<persons>
    <person id="1512376">
        <age>30</age>
	<height>169.1</height>
        <married>true</married>
        <phone code="+6"><number>1283603279</number></phone>
        <phone code="+6"><number>1659964668</number></phone>
    </person>
    ...
</persons>

Your task is to convert, as fast as possible, the persons from the XML into JSON format like

{
    "id": 1512376,
    "age": 30,
    "height": 169.1,
    "married": true,
    "phones": [
        {
            "code": "+6",
            "number": 1283603279
        },
        {
            "code": "+6",
            "number": 1659964668
        }
    ]
}
{...}
...

The order of the persons must be preserved. Don't print the phones field if the array of phones is empty. The number of persons is 1_000_000. The maximum number of phones is 3.
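Before reaching for SIMD, a scalar reference converter for a single record helps pin down the exact output shape. This is only a sketch, not the contest solution: the `convert_person` helper is hypothetical, and `strstr`-based scanning is assumed adequate for this fixed schema (no real XML parsing).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scalar reference: convert one <person> record to the JSON
   shape above. Assumes the fixed schema shown in the problem statement. */
static int convert_person(const char *rec, char *out) {
    char *o = out;
    const char *p;

    p = strstr(rec, "id=\"");
    o += sprintf(o, "{\n    \"id\": %ld", strtol(p + 4, NULL, 10));

    if ((p = strstr(rec, "<age>")))          /* age is optional */
        o += sprintf(o, ",\n    \"age\": %ld", strtol(p + 5, NULL, 10));

    p = strstr(rec, "<height>") + 8;
    o += sprintf(o, ",\n    \"height\": %.*s", (int)strcspn(p, "<"), p);

    p = strstr(rec, "<married>") + 9;
    o += sprintf(o, ",\n    \"married\": %.*s", (int)strcspn(p, "<"), p);

    int n = 0;                               /* omit "phones" when empty */
    for (p = strstr(rec, "code=\""); p; p = strstr(p + 1, "code=\""), n++) {
        const char *num = strstr(p, "<number>") + 8;
        o += sprintf(o, "%s\n        {\n            \"code\": \"%.*s\",\n"
                        "            \"number\": %ld\n        }",
                     n ? "," : ",\n    \"phones\": [",
                     (int)strcspn(p + 6, "\""), p + 6, strtol(num, NULL, 10));
    }
    if (n) o += sprintf(o, "\n    ]");
    o += sprintf(o, "\n}\n");
    return (int)(o - out);
}
```

A real submission would of course stream all million records from STDIN instead of converting one buffer at a time.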

Input Generator

Reverse engineering the distributions for parameters in the XML gives the following ranges:

records: 1000000
  with age: 800225
  with height: 1000000
  with married: 1000000

Ranges:
  age: [18,67]
  height: [150,200]
  id: [0,999999]
  code_len: [2,2]
  phone_len: [10,10]

Roughly 80% of the records have an age. All the other parameters are present in 100% of the records. I'm not sure what the distribution of the number of phones is, so I modeled it as uniform random in {0, 1, 2, 3}. The distribution of the phone numbers themselves does not matter.
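For testing locally, the reverse-engineered parameters above can be turned into a record generator. The `emit_person` helper is hypothetical, and the uniform phone-count model is my assumption as noted:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical generator matching the observed ranges: age in [18,67]
   present ~80% of the time, height in [150,200], 2-char phone codes,
   10-digit numbers, and 0..3 phones chosen uniformly (assumed). */
static void emit_person(FILE *out, int id) {
    fprintf(out, "    <person id=\"%d\">\n", id);
    if (rand() % 100 < 80)
        fprintf(out, "        <age>%d</age>\n", 18 + rand() % 50);
    fprintf(out, "        <height>%.1f</height>\n",
            150.0 + (rand() % 501) / 10.0);
    fprintf(out, "        <married>%s</married>\n",
            rand() % 2 ? "true" : "false");
    int phones = rand() % 4;
    for (int i = 0; i < phones; i++)
        fprintf(out, "        <phone code=\"+%d\">"
                     "<number>%010ld</number></phone>\n",
                rand() % 10,
                (long)(rand() % 1000000000) + 1000000000L);
    fprintf(out, "    </person>\n");
}
```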

Solution Sketch

Like the Parse JSON task, this was a big state machine with some clever shuffling. At a high level, it does load, shuffle, blend, store, where the state machine determines the shuffle/blend masks to use.

The reason we can get away with this is that it's OK for the JSON to contain whitespace. This gives us fixed-length targets that we can shuffle and blend into. For the optimal solution, you have to come up with those targets and work through some casework on the shuffles/blends required, depending on which part of the record you're reading.
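As a concrete illustration of one such fixed-length target, here is a single load/shuffle/blend/store step with SSE intrinsics: splicing the two age digits out of a 16-byte XML chunk into a padded JSON template. The masks are hard-coded for this one hypothetical case, whereas the real state machine selects them per state; the code requires SSSE3/SSE4.1.

```c
#if defined(__GNUC__)
#pragma GCC target("ssse3,sse4.1")
#endif
#include <immintrin.h>
#include <string.h>

/* One load/shuffle/blend/store step: copy the two age digits from a
   16-byte XML chunk into a fixed-width JSON template (sketch only;
   the masks below handle exactly the "<age>NN</age>" case). */
static void splice_age(const char *xml16, char *out16) {
    __m128i src = _mm_loadu_si128((const __m128i *)xml16);  /* load    */
    __m128i dst = _mm_loadu_si128((const __m128i *)out16);

    /* pshufb mask: source bytes 5..6 ("30" in "<age>30</age>") land in
       output lanes 11..12; -1 lanes produce zero. */
    __m128i shuf = _mm_setr_epi8(-1,-1,-1,-1,-1,-1,-1,-1,
                                 -1,-1,-1, 5, 6,-1,-1,-1);
    __m128i moved = _mm_shuffle_epi8(src, shuf);            /* shuffle */

    /* keep the shuffled digits where the mask is 0xFF, the template
       bytes everywhere else */
    __m128i keep = _mm_setr_epi8(0,0,0,0,0,0,0,0,0,0,0,-1,-1,0,0,0);
    __m128i res  = _mm_blendv_epi8(dst, moved, keep);       /* blend   */

    _mm_storeu_si128((__m128i *)out16, res);                /* store   */
}
```

The template here is the 16-byte string `    "age": __,` (with trailing newline and space as padding); the whitespace slack is what makes the fixed width possible.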

Thoughts

Most parsing problems are a pain in the ass and extremely grind-y. I don't think that changes here. The state machine in this case had umpteen edge cases, all of which need to be tested, otherwise it breaks in weird ways. The fact that not all records have an age is something I only found out after submitting my mega state machine and poking around the failing output. That required a major rewrite of the state machine. Not my idea of a fun weekend project.