Scientific Publications

This page lists the NLAFET scientific journal articles and peer-reviewed conference papers.

Each item gives the authors, the title of the publication (in italics), its current publication status, and the DOI or web address, if available. Items are listed in alphabetical order by the family name of the first author:

  • Björn Adlerborn, Lars Karlsson, and Bo Kågström. Distributed One-Stage Hessenberg-Triangular Reduction with Wavefront Scheduling. SIAM J. Sci. Comput., 40(2):C157-C180, 2018. https://doi.org/10.1137/16M1103890
  • Alan Ayala, Xavier Claeys, and Laura Grigori. Affine low-rank approximations. Journal of Scientific Computing, 79:1135-1160, 2019. https://doi.org/10.1007/s10915-018-0885-5
  • Zvonimir Bujanović, Lars Karlsson, and Daniel Kressner. A Householder-Based Algorithm for Hessenberg-Triangular Reduction. SIAM Journal on Matrix Analysis and Applications, 39(3):1270-1294, 2018. https://doi.org/10.1137/17M1153637
  • Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Panruo Wu, Ichitaro Yamazaki, Asim YarKhan, Maksims Abalenkovs, Negin Bagherpour, Sven Hammarling, Jakub Sistek, David Stevens, Mawussi Zounon, and Samuel Relton. PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP. ACM Transactions on Mathematical Software. Accepted July 24, 2018.
  • J. Dongarra, S. Hammarling, N. J. Higham, S. D. Relton, and M. Zounon. Optimized Batched Linear Algebra for Modern Architectures. In F. Rivera, T. Pena, and J. Cabaleiro (eds), Euro-Par 2017: Parallel Processing, LNCS 10417, pp. 511-522. Springer, Cham, 2017. https://doi.org/10.1007/978-3-319-64203-1_37
  • J. Dongarra, S. Hammarling, N. J. Higham, S. D. Relton, P. Valero-Lara, and M. Zounon. The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems. Procedia Computer Science, 108:495-504, 2017. https://doi.org/10.1016/j.procs.2017.05.138
  • Jack Dongarra, Sven Hammarling, Nicholas J. Higham, Samuel D. Relton, and Mawussi Zounon. Creating a Standardised Set of Batched BLAS Routines. In Gabrielle Allen, Jeffrey Carver et al. (eds), Proceedings of the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4, 2016), CEUR Workshop Proceedings, vol. 1686.
  • Iain S. Duff, Florent Lopez, and Stojce Nakov. Sparse Direct Solution on Parallel Computers. In M. Al-Baali et al. (eds), Numerical Analysis and Optimization: NAO-IV 2017, Springer Proceedings in Mathematics & Statistics 235, 2018. https://doi.org/10.1007/978-3-319-90026-1_4
  • Iain S. Duff and Florent Lopez. Experiments with Sparse Cholesky Using a Parametrized Task Graph Implementation. In R. Wyrzykowski et al. (eds), Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 197–206. Springer, Cham, 2018. https://doi.org/10.1007/978-3-319-78024-5_18
  • Iain S. Duff, Florent Lopez, and Jonathan Hogg. Experiments with Sparse Cholesky Using a Sequential Task-Flow Implementation. Numerical Algebra, Control and Optimization (NACO), 8(2):235-258, June 2018. https://doi.org/10.3934/naco.2018014
  • Mahmoud Eljammaly, Lars Karlsson, and Bo Kågström. On the Tunability of a New Hessenberg Reduction Algorithm Using Parallel Cache Assignment. In R. Wyrzykowski et al. (eds), Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 579–589. Springer, Cham, 2018. https://doi.org/10.1007/978-3-319-78024-5
  • Mahmoud Eljammaly, Lars Karlsson, and Bo Kågström. An Auto-Tuning Framework for a NUMA-Aware Hessenberg Reduction Algorithm. In Proc. International Conference on Performance Engineering (ICPE '18). Association for Computing Machinery, 2018. https://doi.org/10.1145/3185768.3186304
  • Robert Granat, Bo Kågström, Daniel Kressner, and Meiyue Shao. ALGORITHM 953: Parallel Library Software for the Multishift QR Algorithm with Aggressive Early Deflation. ACM Trans. Math. Software, 41(4): Article 29:1–23, 2015. https://doi.org/10.1145/2699471
  • Laura Grigori, Sébastien Cayrols, and James W. Demmel. Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting. SIAM J. Sci. Comput., 40(2):C181-C209, 2018. https://doi.org/10.1137/16M1074527
  • A. Haidar, A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra. A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations. IEEE Transactions on Parallel and Distributed Systems, 29(5):973-984, May 2018. https://doi.org/10.1109/TPDS.2017.2783929
  • Azzam Haidar, Ahmad Abdelfattah, Mawussi Zounon, Srikara Pranesh, Panruo Wu, Stanimire Tomov, and Jack Dongarra. The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half Precision Arithmetic and Iterative Refinement Techniques. In Computational Science – ICCS 2018 – 18th International Conference. https://doi.org/10.1007/978-3-319-93698-7
  • Azzam Haidar, Stanimire Tomov, Jack Dongarra, and Nick Higham. Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers. In The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), 2018.
  • W. Liu (RAL and Univ. of Copenhagen), A. Li (Eindhoven), J. Hogg, I. Duff (RAL), and B. Vinter (Univ. Copenhagen). A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves. In Proceedings of Euro-Par 2016, LNCS 9833, pp. 617-630. Springer International Publishing, 2016. https://doi.org/10.1007/978-3-319-43659-3_45
  • Weifeng Liu, Ang Li, Jonathan Hogg, Iain S. Duff, and Brian Vinter. Fast synchronization-free algorithms for parallel sparse triangular solves. Concurrency and Computation: Practice and Experience, 29(21), 2017. John Wiley & Sons. https://doi.org/10.1002/cpe.4244
  • Carl Christian Kjelgaard Mikkelsen and Lars Karlsson. Blocked Algorithms for Robust Solution of Triangular Systems. In R. Wyrzykowski et al. (eds), Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 68–78. Springer, Cham, 2018. https://doi.org/10.1007/978-3-319-78024-5_7
  • Carl Christian Kjelgaard Mikkelsen, Angelika Schwarz, and Lars Karlsson. Parallel robust solution of triangular linear systems. Accepted October 5, 2018, for publication in Concurrency and Computation: Practice and Experience, special issue dedicated to PPAM 2017. https://doi.org/10.1002/cpe.5064
  • Mirko Myllykoski. A Task-Based Algorithm for Reordering the Eigenvalues of a Matrix in Real Schur Form. In R. Wyrzykowski et al. (eds), Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 207–216. Springer, Cham, 2018. https://doi.org/10.1007/978-3-319-78024-5
  • Mirko Myllykoski, Tuomo Rossi, and Jari Toivanen. On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. Journal of Parallel and Distributed Computing, 115:56-66, 2018. Elsevier. https://doi.org/10.1016/j.jpdc.2018.01.004
  • J. Papež, L. Grigori, and R. Stompor. Solving linear equations with messenger-field and conjugate gradient techniques: An application to CMB data analysis. Astronomy & Astrophysics, 620, Article A59, 2018. https://doi.org/10.1051/0004-6361/201832987
  • Angelika Schwarz and Lars Karlsson. Scalable eigenvector computation for the nonsymmetric eigenvalue problem. Parallel Computing, 85:131-140, July 2019. Elsevier. https://doi.org/10.1016/j.parco.2019.04.001
  • I. Yamazaki, J. Kurzak, P. Wu, M. Zounon, and J. Dongarra. Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architecture. IEEE Transactions on Parallel and Distributed Systems. https://doi.org/10.1109/TPDS.2018.2808964