Task #14
Position 3
Score 18,105
Best Score 13,449
Points 0.743

Problem Statement

You have a data stream of datetimes in RFC3339 format arriving on STDIN, one datetime per line. For example:

2017-05-04T14:31:30-03:00
2046-06-23T11:51:56-06:00
2031-08-14T13:18:38+06:00
2048-04-14T05:55:06-09:00
1980-08-28T00:43:03+02:00

Find the sum (as an int64) of the UTC timestamps of these datetimes in the shortest time. The number of lines is 5 million.

Datetimes are between 1950-01-01T00:00:00 and 2050-12-31T23:59:59.

Input Generator

I did not test the datetimes for uniformity, because my solution does not depend on it. Timezones are generated uniformly at random with 15-minute offset granularity.

The range of datetimes is slightly larger than stated in the problem: I’ve seen dates in the range [01-01-1949, 31-12-2052].

Solution Sketch

Use SIMD to shuffle each datetime around and extract the year, month, day, minutes, seconds and timezone offset (as minutes). You can use shuffles and maddubs to coalesce the digits into numbers. Once this is done:

  1. Accumulate the days, minutes, seconds and offsets into an accumulator. These are linear in time: 19 days into March add up the same as 19 days into any other month, and likewise for minutes/seconds/offsets. You'll have to take care of overflow if this is done in SIMD with u32's; I flush the counter once I'm done with ~half the dates.
  2. Histogram the year and month together. You'll need to allocate an array of size 103 * 12 for this, taking care to offset the year from 1949 and the month from January. You'll also have to take care of alignment, because an offset of 0 is written as Z and has nothing after it, which shortens the line; this requires some casework with the shuffles.

Once the accumulation is done, compute the (signed) number of days since the epoch for each of the 103 * 12 months in the range, multiply by that (year, month) pair's count in the histogram, and add to the answer as seconds. Then add the accumulated minutes, seconds and offset minutes, converted to seconds. You should get the right answer.

The reason this is faster than a naive solution is that the year/month contribution is the expensive part: it needs a couple of multiplies/divides in a loop, and those kill performance. Doing them at the end for 1236 cases is far cheaper than doing them for 5M lines. It's essentially memoization.

Implement the usual tricks for fast memory access, prefetching, etc. I couldn't get around unaligned access because of the odd line lengths; maybe more casework could make all my loads aligned, but the state machine for something like that would be extremely complicated.

Thoughts

Debugging the logic was painful: you have one big number that doesn't match another big number. A trick that helps is to debug-print the accumulators of your code and of a reference solution to find out which part of the logic is going wrong.

Since Z's are rare, one optimization is to do aligned loads until you encounter a Z and only then switch to unaligned ones. I tried this out, but couldn't see much gain, possibly because Z's are denser than I assumed.