Scientific Publications

This page lists NLAFET scientific journal articles and peer-reviewed conference papers.

Each item includes the authors, the title of the publication (in italics), the current publication status, and the DOI or web address, if available. Items are listed in alphabetical order by the family name of the first author:

Alan Ayala, Xavier Claeys, Laura Grigori: Affine low-rank approximations. Journal of Scientific Computing, 79:1135-1160, 2019. https://doi.org/10.1007/s10915-018-0885-5
Björn Adlerborn, Lars Karlsson, and Bo Kågström. Distributed One-Stage Hessenberg-Triangular Reduction with Wavefront Scheduling. SIAM J. Sci. Comput., 40 (2):C157-C180, 2018. https://doi.org/10.1137/16M1103890
Zvonimir Bujanović, Lars Karlsson, Daniel Kressner: A Householder-Based Algorithm for Hessenberg-Triangular Reduction. SIAM Journal on Matrix Analysis and Applications, SIAM Publications 2018, Vol. 39, (3) : 1270-1294. https://doi.org/10.1137/17M1153637
Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Panruo Wu, Ichitaro Yamazaki, Asim YarKhan, Maksims Abalenkovs, Neigin Bagherpour, Sven Hammarling, Jakub Sistek, David Stevens, Mawussi Zounon, Samuel Relton: PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP. ACM Transactions on Mathematical Software. Accepted July 24, 2018
J. Dongarra, S. Hammarling, N.J. Higham, S.D. Relton, and M. Zounon (2017) Optimized Batched Linear Algebra for Modern Architectures. In Rivera F., Pena T., Cabaleiro J. (eds) Euro-Par 2017: Parallel Processing. Euro-Par 2017. LNCS 10417, pp 511-522, Springer, Cham. https://doi.org/10.1007/978-3-319-64203-1_37
J. Dongarra, S. Hammarling, N. J. Higham, S. D. Relton, P. Valero-Lara, and M. Zounon. The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems. Procedia Computer Science, 108, pp.495-504, 2017. https://doi.org/10.1016/j.procs.2017.05.138
Jack Dongarra, Sven Hammarling, Nicholas J. Higham, Samuel D. Relton, and Mawussi Zounon, Creating a Standardised Set of Batched BLAS Routines. Proceedings of the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4, 2016), Gabrielle Allen, Jeffrey Carver et al, volume 1686, CEUR Workshop Proceedings.
Iain S. Duff, Florent Lopez and Stojce Nakov (2018) Sparse Direct Solution on Parallel Computers. In M. Al-Baali et al (eds) Numerical Analysis and Optimization: NAOIV 2017, Springer Proceedings in Mathematics & Statistics 235. https://doi.org/10.1007/978-3-319-90026-1_4
Iain S. Duff and Florent Lopez (2018) Experiments with Sparse Cholesky Using a Parametrized Task Graph Implementation. In Wyrzykowski, R. et al (eds) Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 197–206. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_18
Iain S. Duff, Florent Lopez and Jonathan Hogg. Experiments with Sparse Cholesky Using a Sequential Task-Flow Implementation. Numerical Algebra, Control and Optimization (NACO), 8 (2): pp 235-258, June 2018. http://dx.doi.org/10.3934/naco.2018014
Mahmoud Eljammaly, Lars Karlsson, and Bo Kågström (2018) On the Tunability of a New Hessenberg Reduction Algorithm using Parallel Cache Assignment. In Wyrzykowski, R. et al (eds) Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 579–589. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5
Mahmoud Eljammaly, Lars Karlsson, and Bo Kågström. An Auto-Tuning Framework for a NUMA-Aware Hessenberg Reduction Algorithm. In Proc. International Conference on Performance Engineering, ICPE’18. Assoc. Computing Machinery, 2018. https://doi.org/10.1145/3185768.3186304
Robert Granat, Bo Kågström, Daniel Kressner, and Meiyue Shao. ALGORITHM 953: Parallel Library Software for the Multishift QR Algorithm with Aggressive Early Deflation. ACM Trans. Math. Software, 41(4): Article 29:1–23, 2015. https://doi.org/10.1145/2699471
Laura Grigori, Sebastien Cayrols, and James W. Demmel. Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting. SIAM J. Sci. Comput., 40 (2):C181-C209, 2018. https://doi.org/10.1137/16M1074527
A. Haidar, A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations. In IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 5, pp. 973-984, May 1 2018. https://doi.org/10.1109/TPDS.2017.2783929
Azzam Haidar, Ahmad Abdelfattah, Mawussi Zounon, Srikara Pranesh, Panruo Wu, Stanimire Tomov, and Jack Dongarra, The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half Precision Arithmetic and Iterative Refinement Techniques. Computational Science – ICCS 2018 – 18th International Conference, https://doi.org/10.1007/978-3-319-93698-7
Azzam Haidar, Stanimire Tomov, Jack Dongarra, Nick Higham, Harnessing GPU’s Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers. The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18)
W. Liu (RAL and Univ. of Copenhagen), A. Li (Eindhoven), J. Hogg, I. Duff (RAL), B. Vinter (Univ. Copenhagen), A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves. Proceedings of Euro-Par 2016, LNCS 9833, pp 617-630. Springer International Publishing, 2016. https://doi.org/10.1007/978-3-319-43659-3_45
Weifeng Liu, Ang Li, Jonathan Hogg, Iain S. Duff, Brian Vinter, Fast synchronization-free algorithms for parallel sparse triangular solves. Concurrency and Computation: Practice and Experience, 2017, vol 29, no 21. John Wiley & Sons. https://doi.org/10.1002/cpe.4244
Carl Christian Kjelgaard Mikkelsen and Lars Karlsson (2018) Blocked Algorithms for Robust Solution of Triangular Systems. In Wyrzykowski, R. et al (eds) Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 68–78. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_7
Carl Christian Kjelgaard Mikkelsen, Angelika Schwarz, and Lars Karlsson, Parallel robust solution of triangular linear systems. Accepted October 5th 2018 for publication in Concurrency and Computation: Practice and Experience, a special issue dedicated to PPAM 2017. https://doi.org/10.1002/cpe.5064
Mirko Myllykoski (2018) A Task-Based Algorithm for Reordering the Eigenvalues of a Matrix in Real Schur Form. In Wyrzykowski, R. et al (eds) Parallel Processing and Applied Mathematics, PPAM 2017, LNCS 10777, pp. 207–216. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5
Mirko Myllykoski, Tuomo Rossi, Jari Toivanen, On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. Journal of Parallel and Distributed Computing, Elsevier 2018, Vol. 115: 56-66. https://doi.org/10.1016/j.jpdc.2018.01.004
J. Papež, L. Grigori and R. Stompor, Solving linear equations with messenger-field and conjugate gradient techniques: An application to CMB data analysis. Astronomy & Astrophysics, Volume 620, 2018, Article number A59. https://doi.org/10.1051/0004-6361/201832987
Angelika Schwarz, Lars Karlsson, Scalable eigenvector computation for the nonsymmetric eigenvalue problem. Parallel Computing, Vol 85, pp. 131-140. Elsevier July 2019. https://doi.org/10.1016/j.parco.2019.04.001
I. Yamazaki, J. Kurzak, P. Wu, M. Zounon, and J. Dongarra, Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architecture. In IEEE Transactions on Parallel and Distributed Systems. https://doi.org/10.1109/TPDS.2018.2808964
