Efficient execution of numerical algorithms requires adapting the code both to the problem and the underlying execution platform. In this talk we will show the process of fine tuning our sparse Hypermatrix Cholesky factorization in order to exploit efficiently two important machine resources: processor and memory.
It is a basic tenet of numerical analysis that structure should be exploited whenever solving a problem. Algorithms for general matrix problems can be streamlined in the presence of such properties as symmetry, definiteness and sparsity. A very important operation is the Sparse Cholesky factorization of a symmetric positive definite matrix. This operation is used frequently in applications of Finite Element Methods as well as in Interior Point Methods of linear programming, taking a substantial proportion of their total computing time.
A lot of research has been conducted on the efficient parallelization of numerical algorithms. However, the efficiency of a parallel algorithm depends ultimately on the performance obtained from the computations performed on each node. Currently, our work focuses on the efficient sequential execution on a single processor.
There exists a number of data structures for sparse computations which can be used in order to avoid the storage of and computation on zero elements. We work with a hierarchical data structure known as hypermatrix. A matrix is subdivided recursively an arbitrary number of times. Several pointer matrices are used to store the location of submatrices at each level. The last level consists of data submatrices which are dealt with as dense submatrices. Our goal is that of reducing the overhead introduced by the unnecessary operation on zeros when a hypermatrix data structure is used to produce a sparse Cholesky factorization.
In this talk we will provide some details on the work we have done in order to produce a rather efficient implementation of a sparse Cholesky factorization using a hypermatrix data structure. We have developed several overhead reduction techniques trying to reduce the operations on zeros which can be stored within data submatrices. An important contribution to the performance improvement obtained by our sparse hypermatrix Cholesky comes from the usage of routines from our Small Matrix Library (SML): a set of routines specialized in the operation on matrices of a given and small size. The use of such routines, allows us to reduce data block size while keeping BLAS3 performance. In fact, the best results are obtained when rectangular data submatrices are used since they usually fit better the structure of the matrix. We have tried to reduce the number of non productive operations performed on parts of submatrices which are full of zeros. We will present the results obtained by using "bit vectors" and "windows" of non-zeros within data submatrices. Both techniques improve the performance of our sparse hypermatrix Cholesky factorization. However, the former becomes unnecessary when the latter is used. We have also investigated two forms of "amalgamation" for use in a sparse hypermatrix Cholesky: in both cases we allow for the possibility of storing zeros and performing computations on them. First, using "Intra-block amalgamation" we extend windows within data submatrices under certain circumstances. Second, we have developed algorithms which implement a "hypermatrix oriented supernode amalgamation". Only some matrices benefit from this technique in a sequential code. However, the resulting hypermatrix is better partitioned for parallel factorization.
Current 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1998 1997 1996