Task#18
Position8
Score589,911
Best Score438,832
Points0.744

Problem Statement

You have a data stream of uint32 numbers in binary representation sending to STDIN. The byte order is little endian. Encoded in the stream are two square N by N matrices in row-major order. They are encoded back-to-back.

The size of N is fixed to 2000.

Peform matrix multiplication on these two matrices and output the resulting square matrix.

Input Generator

distcheck results:

p = 0.644444 (256 bins)
p = 0.531287 (512 bins)
p = 0.450448 (1024 bins)
p = 0.0497906 (2048 bins)

Didn’t analyze further, but like Large Integer Multiplication I think there’s some small skew for overflow reasons. For testing, I used a uniform random number generator.

Solution Sketch

Blocking + a SIMD kernel for block multiplication. I used a 5x16 block.

I did not spend as much time optimizing this as I should have, and most of my code was copy-pasted or generated.

Thoughts

Maybe I should have spent more time on this? Seemed boring, nobody does integer matrix multiplication and there’s more benefit to optimizing GEMM on an accelerator (GPU/TPU) instead of squeezing out performance on the CPU.