Scientific Publications

This page lists NLAFET scientific journal articles and peer-reviewed conference papers.

Each item gives the authors, the title of the publication (in italics), the current publication status, and the DoI code or http address, if available. Items are listed in alphabetical order by the family name of the first author:

  • Björn Adlerborn, Lars Karlsson, and Bo Kågström. Distributed One-Stage Hessenberg-Triangular Reduction with Wavefront Scheduling. SIAM J. Sci. Comput., 40 (2):C157-C180, 2018. https://doi.org/10.1137/16M1103890
  • J. Dongarra, S. Hammarling, N.J. Higham, S.D. Relton, and M. Zounon (2017) Optimized Batched Linear Algebra for Modern Architectures. In Rivera F., Pena T., Cabaleiro J. (eds) Euro-Par 2017: Parallel Processing. Euro-Par 2017. LNCS 10417, pp 511-522, Springer, Cham.
    https://doi.org/10.1007/978-3-319-64203-1_37
  • J. Dongarra, S. Hammarling, N. J. Higham, S. D. Relton, P. Valero-Lara, and M. Zounon. The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems. Procedia Computer Science, 108, pp.495-504, 2017.
    https://doi.org/10.1016/j.procs.2017.05.138
  • Iain S. Duff, Florent Lopez and Stojce Nakov (2018) Sparse Direct Solution on Parallel Computers. In M. Al-Baali et al (eds) Numerical Analysis and Optimization: NAOIV 2017, Springer Proceedings in Mathematics & Statistics 235.
    https://doi.org/10.1007/978-3-319-90026-1_4
  • Iain S. Duff and Florent Lopez (2018) Experiments with Sparse Cholesky Using a Parametrized Task Graph Implementation. In Wyrzykowski, R. et al (eds) Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 197–206. Springer, Cham.
    https://doi.org/10.1007/978-3-319-78024-5_18
  • Iain S. Duff, Florent Lopez and Jonathan Hogg. Experiments with Sparse Cholesky Using a Sequential Task-Flow Implementation. Numerical Algebra, Control and Optimization (NACO), 8 (2): pp 235-258, June 2018. http://dx.doi.org/10.3934/naco.2018014
  • Mahmoud Eljammaly, Lars Karlsson, and Bo Kågström (2018) On the Tunability of a New Hessenberg Reduction Algorithm using Parallel Cache Assignment. In Wyrzykowski, R. et al (eds) Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 579–589. Springer, Cham. DoI 10.1007/978-3-319-78024-5
  • Mahmoud Eljammaly, Lars Karlsson, and Bo Kågström. An Auto-Tuning Framework for a NUMA-Aware Hessenberg Reduction Algorithm. In Proc. International Conference on Performance Engineering, ICPE’18. Assoc. Computing Machinery, 2018 (To appear).
  • Robert Granat, Bo Kågström, Daniel Kressner, and Meiyue Shao. ALGORITHM 953: Parallel Library Software for the Multishift QR Algorithm with Aggressive Early Deflation. ACM Trans. Math. Software, 41(4): Article 29:1–23, 2015. https://doi.org/10.1145/2699471
  • Laura Grigori, Sebastien Cayrols, and James W. Demmel. Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting. SIAM J. Sci. Comput., 40 (2):C181-C209, 2018. https://doi.org/10.1137/16M1074527
  • A. Haidar, A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations. In IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 5, pp. 973-984, May 1 2018. doi: 10.1109/TPDS.2017.2783929
  • W. Liu (RAL and Univ. of Copenhagen), A. Li (Eindhoven), J. Hogg, I. Duff (RAL), B. Vinter (Univ. Copenhagen), A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves. Proceedings of Euro-Par 2016, LNCS 9833, pp 617-630. Springer International Publishing, 2016. DoI 10.1007/978-3-319-43659-3_45
  • Carl Christian Kjelgaard Mikkelsen and Lars Karlsson (2018) Blocked Algorithms for Robust Solution of Triangular Systems. In Wyrzykowski, R. et al (eds) Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 68–78. Springer, Cham.
    https://doi.org/10.1007/978-3-319-78024-5_7
  • Carl Christian Kjelgaard Mikkelsen, Angelika Schwarz, and Lars Karlsson, Parallel robust solution of triangular linear systems. Accepted October 5th 2018 for publication in Concurrency and Computation: Practice and Experience, a special issue dedicated to PPAM 2017.
  • Mirko Myllykoski (2018) A Task-Based Algorithm for Reordering the Eigenvalues of a Matrix in Real Schur Form. In Wyrzykowski, R. et al (eds) Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 207–216. Springer, Cham. DoI 10.1007/978-3-319-78024-5
  • J. Papež, L. Grigori and R. Stompor, Solving linear equations with messenger-field and conjugate gradient techniques: An application to CMB data analysis. Astronomy & Astrophysics, Volume 620, 2018, Article number A59. https://doi.org/10.1051/0004-6361/201832987
  • I. Yamazaki, J. Kurzak, P. Wu, M. Zounon, and J. Dongarra, Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architecture. In IEEE Transactions on Parallel and Distributed Systems. doi: 10.1109/TPDS.2018.2808964 (To appear)
