| Task | #6 |
|---|---|
| Position | 1 |
| Score | 26,922 |
| Best Score | 26,922 |
| Points | 1.000 |
Problem Statement
You have a XML file in STDIN with persons information like
<?xml version="1.0" encoding="UTF-8"?>
<persons>
<person id="1512376">
<age>30</age>
<height>169.1</height>
<married>true</married>
<phone code="+6"><number>1283603279</number></phone>
<phone code="+6"><number>1659964668</number></phone>
</person>
...
</persons>
Your task is to convert for the shortest time persons from the XML to persons in JSON format like
{
"id": 1512376,
"age": 30,
"height": 169.1,
"married": true,
"phones": [
{
"code": "+6",
"number": 1283603279
},
{
"code": "+6",
"number": 1659964668
}
]
}
{...}
...
The order of the persons must be preserved. Don’t print phones field If array of phones is empty. The number of persons is 1_000_000. The maximum numbers of phones is 3.
Input Generator
Reverse engineering the distributions for parameters in the XML gives the following ranges:
records: 1000000
with age: 800225
with height: 1000000
with married: 1000000
Ranges:
age: [18,67]
height: [150,200]
id: [0,999999]
code_len: [2,2]
phone_len: [10,10]
| Param | Min | Max |
|---|---|---|
| age | 18 | 67 |
| height | 150 | 200 |
| code_len | 2 | 2 |
| phone_len | 10 | 10 |
Roughly 80% of the records have an age. All the other parameters are present in 100% of the records. I’m not sure what the distribution of number of phone numbers is, I modeled it as uniform random in {0,1,2,3}. The distribution of phone numbers themselves does not matter.
Solution Sketch
Like parse JSON, this was a big state machine with some clever shuffling. At a high level, It does load → shuffle → blend → store, where the state machine determines the shuffle/blend masks to use.
The reason we can get away with this is because it’s ok for the JSON to contain whitespace. This gives us fixed-length targets that we can shuffle and blend into. You have to come up with the targets, and do some casework for the shuffles/blends required based on what part of the record you’re reading for the optimal solution
Thoughts
Most parsing problems are a pain in the ass and extremely grind-y. I don’t think that changes with this. The state machine in this case had umpteen edge cases, all of which need to be tested out otherwise it’ll break in weird ways. The fact that not all records have an age is something I found out after submitting my mega state machine and poking around the failing output. This required a major rewrite of the state machine. Not my idea of a fun weekend project.