Task #6: Position 1, Score 26,922, Best Score 26,922, Points 1.000

Problem Statement

You have an XML file on STDIN with person information like

<?xml version="1.0" encoding="UTF-8"?>
<persons>
    <person id="1512376">
        <age>30</age>
	<height>169.1</height>
        <married>true</married>
        <phone code="+6"><number>1283603279</number></phone>
        <phone code="+6"><number>1659964668</number></phone>
    </person>
    ...
</persons>

Your task is to convert, as fast as possible, the persons from the XML into JSON format like

{
    "id": 1512376,
    "age": 30,
    "height": 169.1,
    "married": true,
    "phones": [
        {
            "code": "+6",
            "number": 1283603279
        },
        {
            "code": "+6",
            "number": 1659964668
        }
    ]
}
{...}
...

The order of the persons must be preserved. Don't print the phones field if the array of phones is empty. The number of persons is 1_000_000. The maximum number of phones is 3.
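Before reaching for SIMD, a scalar reference converter for a single record helps pin down the exact output shape. This is only a sketch, not the contest solution: the `convert_person` helper is hypothetical, and `strstr`-based scanning is assumed adequate for this fixed schema (no real XML parsing).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scalar reference: convert one <person> record to the JSON
   shape above. Assumes the fixed schema shown in the problem statement. */
static int convert_person(const char *rec, char *out) {
    char *o = out;
    const char *p;

    p = strstr(rec, "id=\"");
    o += sprintf(o, "{\n    \"id\": %ld", strtol(p + 4, NULL, 10));

    if ((p = strstr(rec, "<age>")))          /* age is optional */
        o += sprintf(o, ",\n    \"age\": %ld", strtol(p + 5, NULL, 10));

    p = strstr(rec, "<height>") + 8;
    o += sprintf(o, ",\n    \"height\": %.*s", (int)strcspn(p, "<"), p);

    p = strstr(rec, "<married>") + 9;
    o += sprintf(o, ",\n    \"married\": %.*s", (int)strcspn(p, "<"), p);

    int n = 0;                               /* omit "phones" when empty */
    for (p = strstr(rec, "code=\""); p; p = strstr(p + 1, "code=\""), n++) {
        const char *num = strstr(p, "<number>") + 8;
        o += sprintf(o, "%s\n        {\n            \"code\": \"%.*s\",\n"
                        "            \"number\": %ld\n        }",
                     n ? "," : ",\n    \"phones\": [",
                     (int)strcspn(p + 6, "\""), p + 6, strtol(num, NULL, 10));
    }
    if (n) o += sprintf(o, "\n    ]");
    o += sprintf(o, "\n}\n");
    return (int)(o - out);
}
```

A real submission would of course stream all million records from STDIN instead of converting one buffer at a time.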

Input Generator

Reverse engineering the distributions for parameters in the XML gives the following ranges:

records: 1000000
  with age: 800225
  with height: 1000000
  with married: 1000000

Ranges:
  age: [18,67]
  height: [150,200]
  id: [0,999999]
  code_len: [2,2]
  phone_len: [10,10]

Roughly 80% of the records have an age. All the other parameters are present in 100% of the records. I'm not sure what the distribution of the number of phones is, so I modeled it as uniform random in {0, 1, 2, 3}. The distribution of the phone numbers themselves does not matter.
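For testing locally, the reverse-engineered parameters above can be turned into a record generator. The `emit_person` helper is hypothetical, and the uniform phone-count model is my assumption as noted:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical generator matching the observed ranges: age in [18,67]
   present ~80% of the time, height in [150,200], 2-char phone codes,
   10-digit numbers, and 0..3 phones chosen uniformly (assumed). */
static void emit_person(FILE *out, int id) {
    fprintf(out, "    <person id=\"%d\">\n", id);
    if (rand() % 100 < 80)
        fprintf(out, "        <age>%d</age>\n", 18 + rand() % 50);
    fprintf(out, "        <height>%.1f</height>\n",
            150.0 + (rand() % 501) / 10.0);
    fprintf(out, "        <married>%s</married>\n",
            rand() % 2 ? "true" : "false");
    int phones = rand() % 4;
    for (int i = 0; i < phones; i++)
        fprintf(out, "        <phone code=\"+%d\">"
                     "<number>%010ld</number></phone>\n",
                rand() % 10,
                (long)(rand() % 1000000000) + 1000000000L);
    fprintf(out, "    </person>\n");
}
```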

Solution Sketch

Like the Parse JSON task, this was a big state machine with some clever shuffling. At a high level, it does load, shuffle, blend, store, where the state machine determines the shuffle/blend masks to use.

The reason we can get away with this is that it's OK for the JSON to contain whitespace. This gives us fixed-length targets that we can shuffle and blend into. For the optimal solution, you have to come up with those targets and work through some casework on the shuffles/blends required, depending on which part of the record you're reading.
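As a concrete illustration of one such fixed-length target, here is a single load/shuffle/blend/store step with SSE intrinsics: splicing the two age digits out of a 16-byte XML chunk into a padded JSON template. The masks are hard-coded for this one hypothetical case, whereas the real state machine selects them per state; the code requires SSSE3/SSE4.1.

```c
#if defined(__GNUC__)
#pragma GCC target("ssse3,sse4.1")
#endif
#include <immintrin.h>
#include <string.h>

/* One load/shuffle/blend/store step: copy the two age digits from a
   16-byte XML chunk into a fixed-width JSON template (sketch only;
   the masks below handle exactly the "<age>NN</age>" case). */
static void splice_age(const char *xml16, char *out16) {
    __m128i src = _mm_loadu_si128((const __m128i *)xml16);  /* load    */
    __m128i dst = _mm_loadu_si128((const __m128i *)out16);

    /* pshufb mask: source bytes 5..6 ("30" in "<age>30</age>") land in
       output lanes 11..12; -1 lanes produce zero. */
    __m128i shuf = _mm_setr_epi8(-1,-1,-1,-1,-1,-1,-1,-1,
                                 -1,-1,-1, 5, 6,-1,-1,-1);
    __m128i moved = _mm_shuffle_epi8(src, shuf);            /* shuffle */

    /* keep the shuffled digits where the mask is 0xFF, the template
       bytes everywhere else */
    __m128i keep = _mm_setr_epi8(0,0,0,0,0,0,0,0,0,0,0,-1,-1,0,0,0);
    __m128i res  = _mm_blendv_epi8(dst, moved, keep);       /* blend   */

    _mm_storeu_si128((__m128i *)out16, res);                /* store   */
}
```

The template here is the 16-byte string `    "age": __,` (with trailing newline and space as padding); the whitespace slack is what makes the fixed width possible.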

Thoughts

Most parsing problems are a pain in the ass and extremely grind-y. I don't think that changes here. The state machine in this case had umpteen edge cases, all of which need to be tested, otherwise it breaks in weird ways. The fact that not all records have an age is something I only found out after submitting my mega state machine and poking around the failing output. That required a major rewrite of the state machine. Not my idea of a fun weekend project.