| Task | #18 |
|---|---|
| Position | 8 |
| Score | 589,911 |
| Best Score | 438,832 |
| Points | 0.744 |
Problem Statement
You have a data stream of uint32 numbers in binary representation sending to
STDIN. The byte order is little endian. Encoded in the stream are two square
N by N matrices in row-major order. They are encoded back-to-back.
The size of N is fixed to 2000.
Peform matrix multiplication on these two matrices and output the resulting square matrix.
Input Generator
distcheck results:
p = 0.644444 (256 bins)
p = 0.531287 (512 bins)
p = 0.450448 (1024 bins)
p = 0.0497906 (2048 bins)
Didn’t analyze further, but like Large Integer Multiplication I think there’s some small skew for overflow reasons. For testing, I used a uniform random number generator.
Solution Sketch
Blocking + a SIMD kernel for block multiplication. I used a 5x16 block.
I did not spend as much time optimizing this as I should have, and most of my code was copy-pasted or generated.
Thoughts
Maybe I should have spent more time on this? Seemed boring, nobody does integer matrix multiplication and there’s more benefit to optimizing GEMM on an accelerator (GPU/TPU) instead of squeezing out performance on the CPU.