Optimizations

Constexpr LUT

Using constexpr to create a LUT is clean and incurs no runtime cost. Here’s a small example of fast unsigned int→string conversion using constexpr LUTs:

using lut_t = std::array<uint64_t, 100'000>;
 
static constexpr lut_t construct_lut(char pad) {
 
    lut_t lut {};
    for (int i = 0; i < 100'000; i++) {
        int j = i;
        int n_digits = 0;
        uint64_t lut_val = 0;
        while (j > 0) {
            lut_val = (lut_val << 8) | uint64_t((j % 10) + '0');
            j /= 10;
            n_digits++;
        }
        for (; n_digits < 5; n_digits++) {
            lut_val = (lut_val << 8) | uint64_t(pad);
        }
        lut[i] = lut_val;
    }
    return lut;
}
 
static constexpr lut_t lo_lut = construct_lut('0');
static constexpr lut_t hi_lut = construct_lut(' ');
 
void to_fixed_len_string(char* buffer, uint32_t value) {
	uint32_t div = value / 100'000;
	uint32_t mod = value % 100'000;
	uint64_t n1, n2;
	if (div == 0) [[unlikely]] {
		n1 = &hi_lut[mod];
		n2 = 0x2020202020ULL;
	}
	else {
		n1 = lo_lut[mod];
		n2 = hi_lut[div];	
	}
	memcpy(buffer, &n2, 5);
	memcpy(buffer+5, &n1, 5);
}

while writing this, I came across an interesting constexpr quirk: array dereferencing contributes to constexpr operation count. If instead of lut[i] = lut_val at the end, you directly assign to lut[i], you will hit your constexpr ops limit. You could get around this by increasing your constexpr ops limit (I do), but it’s an interesting quirk.

Median Hacking

Since HighLoad uses 9 runs and gets the median runtime, you could make a fast solution with an abysmal pass rate pass by ‘mixing’ it with a slow solution with high pass rate. Toss a slightly head-biased coin (you can generate a random byte and check if it’s greater than 100, for example). If it turns heads, run the fast solution and if it turns tails, run the slow solution.

This was used in both Order Book and Largest Square Submatrix, and in an older version of Unique Strings (v2). It’s not the cleanest trick in the book, but I can’t see how you can be competitive in Largest Square Submatrix without it.

Integer to/from String conversions

These are all varying degrees of useful

DateTime parsing

My old DateTime solution (before I did bucketing) used a vectorized version of the counting the days algorithm.

Extracting Data

HighLoad’s stderr output is limited, so the best way of extracting a large amount of data is to base64 encode raw binary data and print it. This is what the distribution checker does when the check fails. Reading the data back and plotting it is trivial:

import numpy as np
import matplotlib.pyplot as plt
import base64
 
data = base64.b64decode("<data here>")
numbers = np.frombuffer(data, dtype=np.int32)
plt.plot(numbers)

If you have a lot of data, you could compress it. There’s a header-only compression library somewhere in openload, but I recall it being a bit finnnicky.

Other misc stuff

https://users.ece.utexas.edu/~patt/22s.382N/handouts/dicksites0.pdf - optimizing memory bandwidth https://codeforces.com/blog/entry/48798 - FFT intro

Aniruddha Deb

Explorer