# Publications - Marcin Copik

### Journal Article

**Work-stealing prefix scan: Addressing load imbalance in large-scale image registration**

IEEE Transactions on Parallel and Distributed Systems, Volume 33(3), pp. 523-535, 2022.

@article{Copik2022:900,
  author = "Marcin Copik and {Tobias } Grosser and {Torsten } Hoefler and Paolo Bientinesi and Benjamin Berkels",
  title = "Work-stealing prefix scan: Addressing load imbalance in large-scale image registration",
  journal = "IEEE Transactions on Parallel and Distributed Systems",
  year = 2022,
  volume = 33,
  number = 3,
  pages = "523-535",
  url = "https://arxiv.org/pdf/2010.12478.pdf"
}

Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this paper, we study the recursive registration of a series of electron microscopy images, a time-consuming and imbalanced computation necessary for nano-scale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over a thousand cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce time-to-solution on a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes (using 1,024 Intel Haswell cores), enabling derivation of material properties at the nanoscale for long microscopy image series.
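As a rough illustration of the pattern the abstract refers to, the sketch below shows a generic two-level (blocked) inclusive prefix scan over an associative operator. It is not the work-stealing procedure from the paper: the function name `blocked_scan`, the `block_size` parameter, and the use of string concatenation as a stand-in for the expensive registration operator are all assumptions made for this example.

```python
from itertools import accumulate


def blocked_scan(items, op, block_size):
    """Inclusive prefix scan computed in three phases over fixed-size blocks.

    `op` must be associative; it does not need to be commutative.
    """
    blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]

    # Phase 1: scan each block independently (these scans could run in parallel).
    local = [list(accumulate(block, op)) for block in blocks]

    # Phase 2: scan the per-block totals (the last element of each local scan).
    totals = list(accumulate((scan[-1] for scan in local), op))

    # Phase 3: prepend the total of all preceding blocks to each later block.
    result = list(local[0]) if local else []
    for prefix, scan in zip(totals, local[1:]):
        result.extend(op(prefix, x) for x in scan)
    return result


def concat(a, b):
    # Hypothetical stand-in operator: associative but not commutative.
    return a + b


letters = list("abcdefg")
print(blocked_scan(letters, concat, block_size=3))
```

Because the operator is only required to be associative, the blocked result matches a plain sequential scan even though the blocks are processed independently in the first phase.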

### Peer Reviewed Conference Publication

**The Generalized Matrix Chain Algorithm**

Proceedings of 2018 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 11, 24 February 2018.

@inproceedings{Barthels2018:130,
  author = "Henrik Barthels and Marcin Copik and Paolo Bientinesi",
  title = "The Generalized Matrix Chain Algorithm",
  booktitle = "Proceedings of 2018 IEEE/ACM International Symposium on Code Generation and Optimization",
  year = 2018,
  pages = 11,
  address = "Vienna, Austria",
  month = feb,
  url = "https://arxiv.org/pdf/1804.04021.pdf"
}

In this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task for which human experts often invest days or even weeks of work. The standard matrix chain problem consists in finding the parenthesization of a matrix product $M := A_1 A_2 \cdots A_n$ that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and matrix properties. Indeed, the computation of such expressions relies on a set of computational kernels that offer functionality well beyond the simple matrix product. The challenge then shifts from finding an optimal parenthesization to finding an optimal mapping of the input expression to the available kernels. Furthermore, it is often the case that a solution based on the minimization of scalar operations does not result in the optimal solution in terms of execution time. In our experiments, the generated code outperforms other libraries and languages on average by a factor of about 5. The motivation for this work comes from the fact that, despite great advances in the development of compilers, the task of mapping linear algebra problems to optimized kernels is still to be done manually. In order to relieve the user from this complex task, new techniques for the compilation of linear algebra expressions have to be developed.
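For readers unfamiliar with the standard matrix chain problem mentioned in the abstract, the following is a minimal sketch of the textbook dynamic-programming solution. It covers only the classic problem, not the generalized algorithm of the paper, and it assumes the usual cost model in which multiplying a $p \times q$ matrix by a $q \times r$ matrix costs $p \cdot q \cdot r$ scalar multiplications. The function name `matrix_chain` and the output format are choices made for this example.

```python
def matrix_chain(dims):
    """Return (minimal cost, parenthesization) for the product A1 A2 ... An.

    Matrix A_i has shape dims[i-1] x dims[i], so len(dims) == n + 1.
    """
    n = len(dims) - 1
    # cost[i][j]: minimal scalar multiplications for the sub-chain i..j (0-based).
    cost = [[0] * n for _ in range(n)]
    # split[i][j]: index k at which the optimal solution splits the sub-chain.
    split = [[0] * n for _ in range(n)]

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                candidate = (cost[i][k] + cost[k + 1][j]
                             + dims[i] * dims[k + 1] * dims[j + 1])
                if candidate < cost[i][j]:
                    cost[i][j] = candidate
                    split[i][j] = k

    def parenthesize(i, j):
        if i == j:
            return f"A{i + 1}"
        k = split[i][j]
        return f"({parenthesize(i, k)} {parenthesize(k + 1, j)})"

    return cost[0][n - 1], parenthesize(0, n - 1)


# A1 is 10x30, A2 is 30x5, A3 is 5x60.
best_cost, best_order = matrix_chain([10, 30, 5, 60])
print(best_cost, best_order)  # prints: 4500 ((A1 A2) A3)
```

In this example the alternative order `(A1 (A2 A3))` would cost 27,000 scalar multiplications, six times more than the optimum, which is why the choice of parenthesization matters even for short chains.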