Composyx Tutorial
1. A minimal example
1.1. Requirements
For this tutorial we shall use Composyx (previously Maphys++ and Compose) as a header-only C++ library, so the required tools are minimal. You only need git, cmake and a C++ compiler (g++ or clang++ for instance) supporting the C++20 standard.
Later on, when performance becomes important, optimized libraries such as BLAS, LAPACK, direct sparse solvers… must be used. Interfaces for those are already present in Composyx and you will be able to take advantage of them with little effort. For small prototypes such as this tutorial, or when performance is not important, the header-only version is sufficient.
Now let's get the Composyx C++ headers. The following commands will clone the source code (with git) and set the variables COMPOSYX_INCLUDES and TLAPACK_INCLUDES to the paths of those headers.
git clone --recursive https://gitlab.inria.fr/composyx/composyx.git
COMPOSYX_INCLUDES=`realpath composyx/include`
TLAPACK_INCLUDES=`realpath composyx/tlapack/include`
The --recursive parameter is important: we do need the submodules. If you forgot it, you can use the following commands to get and update them:
# In the composyx repository
git submodule init
git submodule update
1.2. A simple example
Let's start with the following C++ example in the file minimal.cpp:

#include <composyx.hpp>

int main(){
  using namespace composyx;
  DenseMatrix<double> A(3, 3);
  A(0, 0) = 4;
  A(1, 0) = -1;
  A(0, 1) = A(1, 0);
  A(1, 1) = 3;
  A(2, 1) = -0.5;
  A(1, 2) = A(2, 1);
  A(2, 2) = A(0, 0);
  A.display("A");
  return 0;
}
A
m: 3 n: 3
Symmetry: general; Storage: full
 4.000e+00 -1.000e+00  0.000e+00
-1.000e+00  3.000e+00 -5.000e-01
 0.000e+00 -5.000e-01  4.000e+00
1.3. Compiling and running
Composyx is written in C++20. In this tutorial we use the GNU compiler g++. The beginning of the command line will be:

g++ -std=c++20
Since we are using header-only C++ libraries, all we need to do is include the correct headers to compile. We need to indicate where the headers of Composyx and <T>LAPACK are:

-I${COMPOSYX_INCLUDES} -I${TLAPACK_INCLUDES}
We need an extra flag to indicate to Composyx that we want to use the header-only implementation, forgoing the dependencies that require compilation and linking. In particular, we can't use MPI or a compiled BLAS implementation with this flag.

-DCOMPOSYX_HEADER_ONLY
Remark: if it's more convenient for you, it is also possible to add the definition in the code before including composyx.hpp rather than using the flag on the command line:

#include <iostream>
#include <vector>

#define COMPOSYX_HEADER_ONLY
#include <composyx.hpp>
//...
So the command line to compile our example minimal.cpp is:

g++ -std=c++20 -I${COMPOSYX_INCLUDES} -I${TLAPACK_INCLUDES} -DCOMPOSYX_HEADER_ONLY minimal.cpp -o minimal
And we can run the code with ./minimal. We obtain the output described in the previous section.
1.4. Explanation of the program
The only feature used here is the class DenseMatrix implemented in Composyx. It corresponds to a matrix where all coefficients are stored in a 2D array. In this example we created a 3 by 3 matrix \(A\) with the constructor DenseMatrix<double> A(3, 3), filled with zeros. The scalar type of the coefficients of the matrix is given as a template parameter; here we use double precision floating-point arithmetic for real numbers, which corresponds to the type double. Then, we can access and assign the coefficients \(a_{i, j}\) of the matrix with A(i, j). Here we filled \(A\) with some values. Finally, we displayed the matrix on the standard output using the display method of the class. Optionally we can pass the name of the matrix we display as a parameter of the method, so that we can easily identify it in the output (here we passed "A").
On the output, we can see the name of the matrix "A", followed by its description. \(m\) is the number of rows, \(n\) the number of columns. We can see that no assumption is made on the properties of the matrix: although the matrix is clearly symmetric, the program doesn't have this piece of information unless specified by the user. For this example it is not necessary to make this effort.
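If you do know that the matrix is symmetric, you can declare it explicitly. A minimal sketch, using the set_property method and the property enums presented later in this tutorial (section 2.2):

#include <composyx.hpp>

int main(){
  using namespace composyx;
  DenseMatrix<double> A(3, 3);
  A(0, 0) = 4; A(1, 0) = -1; A(0, 1) = A(1, 0);
  A(1, 1) = 3; A(2, 1) = -0.5; A(1, 2) = A(2, 1);
  A(2, 2) = A(0, 0);
  // Declare the symmetry so that operations can exploit it
  A.set_property(MatrixSymmetry::symmetric);
  A.display("A"); // the description line now reports the symmetry
  return 0;
}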
1.5. Another example with a vector
Let's write another simple example to perform a matrix-vector product.
#include <composyx.hpp>

int main(){
  using namespace composyx;
  DenseMatrix<double> A({ 4,   -1,    0,
                         -1,    3, -0.5,
                          0, -0.5,    4}, 3, 3);
  Vector<double> u{1, 2, 3};
  Vector<double> b = A * u;
  u.display("u");
  b.display("b");
  return 0;
}
u
m: 3 (vector)
1
2
3
b
m: 3 (vector)
2
3.5
11
First we introduced a new constructor for the DenseMatrix. The first parameter is a braced list of the matrix coefficients, then its number of rows \(m\) and number of columns \(n\). The coefficients are given in column-major format, meaning the first \(m\) values describe the first column, the next \(m\) the second column, etc. Here the matrix is symmetric so it does not make a difference, but you can try to modify a non-diagonal coefficient and display the matrix to see the effect, as in the sketch below.
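For instance, here is a sketch where an asymmetric list of values makes the column-major ordering visible (the values are illustrative):

#include <composyx.hpp>

int main(){
  using namespace composyx;
  // Column-major: {1, 2, 3} is the first column, {4, 5, 6} the second,
  // {7, 8, 9} the third; hence B(0, 1) is 4, not 2.
  DenseMatrix<double> B({1, 2, 3,
                         4, 5, 6,
                         7, 8, 9}, 3, 3);
  B.display("B");
  return 0;
}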
Secondly, we declared \(u\), a Vector of double. Actually, Vector is a special case of DenseMatrix where the number of columns is fixed to 1. The constructor is therefore very similar: you only need to give the list of the coefficients of the vector, and the size of the vector is deduced from the size of the list.
Then we perform a matrix-vector product of the matrix \(A\) and \(u\) and store the result in a new Vector \(b\). We display \(u\) and \(b\) in the same way we displayed the matrix \(A\) before, using the display method.
1.6. Solving a linear system
1.6.1. With a Jacobi iterative solver
Now let's try to solve a linear system. The matrix \(A\) is diagonally dominant, so we can try to solve the system with a Jacobi iterative solver. In the following example, we use this iterative method to find \(x\) such that \(Ax = b\). We can display \(x\) and make sure it has the same values as \(u = (1, 2, 3)^T\).
#include <composyx.hpp>
#include <composyx/solver/Jacobi.hpp>

int main(){
  using namespace composyx;
  DenseMatrix<double> A({ 4,   -1,    0,
                         -1,    3, -0.5,
                          0, -0.5,    4}, 3, 3);
  Vector<double> u{1, 2, 3};
  Vector<double> b = A * u;

  Jacobi<DenseMatrix<double>, Vector<double>> solver;
  solver.setup(parameters::A{A},
               parameters::verbose{true},
               parameters::max_iter{20},
               parameters::tolerance{1e-8});
  Vector<double> x = solver * b;
  x.display("x");
  return 0;
}
Starting Jacobi.
Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08 with ||b|| = 11.7154
---
 0 - ||b-Ax||/||b|| 1
 1 - ||b-Ax||/||b|| 0.194964
 2 - ||b-Ax||/||b|| 0.067276
 3 - ||b-Ax||/||b|| 0.0203088
 4 - ||b-Ax||/||b|| 0.00700792
 5 - ||b-Ax||/||b|| 0.0021155
 6 - ||b-Ax||/||b|| 0.000729992
 7 - ||b-Ax||/||b|| 0.000220364
 8 - ||b-Ax||/||b|| 7.60408e-05
 9 - ||b-Ax||/||b|| 2.29546e-05
10 - ||b-Ax||/||b|| 7.92092e-06
11 - ||b-Ax||/||b|| 2.39111e-06
12 - ||b-Ax||/||b|| 8.25095e-07
13 - ||b-Ax||/||b|| 2.49073e-07
14 - ||b-Ax||/||b|| 8.59474e-08
15 - ||b-Ax||/||b|| 2.59452e-08
16 - ||b-Ax||/||b|| 8.95286e-09
x
m: 3 (vector)
1
2
3
We include the header composyx/solver/Jacobi.hpp to get access to the solver. We need to indicate the types of matrix and vector used by the solver as template parameters: DenseMatrix<double> for the matrix, and Vector<double> for the vector.
Many parameters can be given to the solver, under the namespace parameters. Here we are giving the most useful ones.

- parameters::A, which is mandatory, is the input square matrix, or the operator of the system;
- parameters::tolerance is the targeted precision of the solution; more specifically, the iterative solver considers convergence achieved when the backward error \(||b - Ax||_2 / ||b||_2\) is lower than this value;
- parameters::max_iter is the maximum number of iterations. When reached, the solver stops even if it has not converged to the required tolerance;
- parameters::verbose is a boolean value, false by default, printing information on the standard output when true.

Those parameters can be passed to the setup method in any order and have default values if not given (although they may not be relevant values in your context). Once set up, solving the linear system consists in a simple multiplication: solver * rhs, returning the solution vector. One may consider the solver instance as the \(A^{-1}\) operator, even if no explicit inversion of a matrix is ever performed, so it can be applied to several right-hand sides, as in the sketch below.
1.6.2. With a conjugate gradient
The matrix \(A\) is actually not only diagonally dominant, it's also symmetric positive definite. Therefore we can use a conjugate gradient to solve the system. Since iterative solvers share the same set of basic parameters, we can take the same code and just replace Jacobi by ConjugateGradient to solve the system.
#include <composyx.hpp>
#include <composyx/solver/ConjugateGradient.hpp>

int main(){
  using namespace composyx;
  DenseMatrix<double> A({ 4,   -1,    0,
                         -1,    3, -0.5,
                          0, -0.5,    4}, 3, 3);
  Vector<double> u{1, 2, 3};
  Vector<double> b = A * u;

  ConjugateGradient<DenseMatrix<double>, Vector<double>> solver;
  solver.setup(parameters::A{A},
               parameters::verbose{true},
               parameters::max_iter{20},
               parameters::tolerance{1e-8});
  Vector<double> x = solver * b;
  x.display("x");
  return 0;
}
Starting Conjugate Gradient.
Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08 with ||b|| = 11.7154
---
0 - ||r||/||b|| 1
1 - ||r||/||b|| 0.248805
2 - ||r||/||b|| 0.048057
3 - ||r||/||b|| 2.6488e-18
||b-Ax||/||b|| 0
x
m: 3 (vector)
1
2
3
As expected, we only need 3 (the size of the linear system) iterations to converge with the conjugate gradient.
2. Sparse matrices
We propose 3 types of sparse matrices in Composyx:

- SparseMatrixCOO: a matrix in coordinate format (also known as IJV format);
- SparseMatrixCSC: a matrix in Compressed Sparse Column format;
- SparseMatrixLIL: a matrix in list of lists format, a special format for coefficient insertion.
2.1. Create a sparse matrix with SparseMatrixLIL
The easiest way to create a sparse matrix is to insert the coefficients of the matrix one by one. The only format that allows this kind of insertion is SparseMatrixLIL, which is implemented with an array of \(n\) (the number of columns) lists. Thus insertion can be performed in linear time.
#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixLIL.hpp>

int main(){
  using namespace composyx;
  SparseMatrixLIL<double> lilmat(3, 3);
  lilmat.insert(0, 0, 4);
  lilmat.insert(1, 0, -1);
  lilmat.insert(0, 1, -1);
  lilmat.insert(1, 1, 3);
  lilmat.insert(1, 2, -0.5);
  lilmat.insert(2, 1, -0.5);
  lilmat.insert(2, 2, 4);
  lilmat.display("lilmat");
  lilmat.to_dense().display("lilmat converted to dense");
  return 0;
}
lilmat
m: 3 n: 3 nnz: 7
Symmetry: general; Storage: full
i v
--- Col 0
0 4
1 -1
--- Col 1
0 -1
1 3
2 -0.5
--- Col 2
1 -0.5
2 4
lilmat converted to dense
m: 3 n: 3
Symmetry: general; Storage: full
 4.000e+00 -1.000e+00  0.000e+00
-1.000e+00  3.000e+00 -5.000e-01
 0.000e+00 -5.000e-01  4.000e+00
We included the header for the LIL matrix: composyx/loc_data/SparseMatrixLIL.hpp. First we construct the sparse matrix with its dimensions \((m, n)\), and then we add the coefficients \((i, j, v)\) (index of row, index of column and value of the coefficient respectively) one by one. This matrix type is very well suited when you want to construct a matrix but you don't have information about the final number of nonzero elements it contains.
2.2. Using half storage for symmetric matrices
As we fill this symmetric sparse matrix, we can see an obvious optimization: half storage. Indeed, if we could indicate that the matrix is symmetric, we would not need to add the same value \(v\) twice, in \((i, j)\) and \((j, i)\).
#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixLIL.hpp>

int main(){
  using namespace composyx;
  SparseMatrixLIL<double> lilmat(3, 3);
  lilmat.insert(0, 0, 4);
  lilmat.insert(1, 0, -1);
  lilmat.insert(1, 1, 3);
  lilmat.insert(2, 1, -0.5);
  lilmat.insert(2, 2, 4);
  lilmat.set_property(MatrixSymmetry::symmetric, MatrixStorage::lower);
  lilmat.display("lilmat");
  lilmat.to_dense().display("lilmat converted to dense");
  return 0;
}
lilmat
m: 3 n: 3 nnz: 5
Symmetry: symmetric; Storage: lower
i v
--- Col 0
0 4
1 -1
--- Col 1
1 3
2 -0.5
--- Col 2
2 4
lilmat converted to dense
m: 3 n: 3
Symmetry: symmetric; Storage: lower
 4.000e+00 - -
-1.000e+00  3.000e+00 -
 0.000e+00 -5.000e-01  4.000e+00
Here we filled only the lower triangular part of the matrix. We can indicate with set_property(MatrixSymmetry::symmetric) that the matrix is symmetric, and then with lilmat.set_property(MatrixStorage::lower) that only the lower part of the matrix is stored. These methods can be called on any type of matrix in Composyx, with any kind of matrix property. As shown in the example, it's possible to set several properties at once when calling set_property.
Notice that for the DenseMatrix constructed at the end, the conversion propagates the properties, and only the lower triangular part of the matrix is displayed. The other coefficients, whatever their values, won't be used in any operation performed with the matrix.
Remark: some shortcuts exist to fill the properties of a matrix. Here in particular, we have a symmetric positive definite matrix, so we could (should?) have done lilmat.set_spd(MatrixStorage::lower) to set all properties at the same time.
For more information on properties, see MatrixProperties.
2.3. Dealing with duplicates
What happens when we add two values at the same location \((i, j)\)? Well, it's entirely up to you. The default behavior is to sum the two values given, but a lambda function can be passed to override this. The function takes two parameters: the first one is a reference to the value stored in the matrix, and the second is a const reference to the value you are inserting.
Let's see an example:
#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixLIL.hpp>

int main(){
  using namespace composyx;
  SparseMatrixLIL<double> lilmat(2, 2);

  std::cout << "Default behavior: add coefficients\n";
  std::cout << "Insert (0, 0) -> 1\n";
  lilmat.insert(0, 0, 1);
  std::cout << "Insert (0, 0) -> 0.5\n";
  lilmat.insert(0, 0, 0.5);
  lilmat.to_dense().display("lilmat converted to dense");

  auto replace_coeff = [](double& present_v, const double& insert_v){
    present_v = insert_v;
  };
  std::cout << "\nNew behavior: replace coefficients\n";
  std::cout << "Insert (0, 1) -> 1\n";
  lilmat.insert(0, 1, 1, replace_coeff);
  std::cout << "Insert (0, 1) -> 0.5\n";
  lilmat.insert(0, 1, 0.5, replace_coeff);
  lilmat.to_dense().display("lilmat updated 1");

  auto mean_coeff = [](double& present_v, const double& insert_v){
    present_v = (present_v + insert_v) / 2;
  };
  std::cout << "\nNew behavior: compute the mean of the coefficients\n";
  std::cout << "Insert (1, 1) -> 1\n";
  lilmat.insert(1, 1, 1, mean_coeff);
  std::cout << "Insert (1, 1) -> 0.5\n";
  lilmat.insert(1, 1, 0.5, mean_coeff);
  lilmat.to_dense().display("lilmat updated 2");

  return 0;
}
Default behavior: add coefficients
Insert (0, 0) -> 1
Insert (0, 0) -> 0.5
lilmat converted to dense
m: 2 n: 2
Symmetry: general; Storage: full
1.500e+00 0.000e+00
0.000e+00 0.000e+00

New behavior: replace coefficients
Insert (0, 1) -> 1
Insert (0, 1) -> 0.5
lilmat updated 1
m: 2 n: 2
Symmetry: general; Storage: full
1.500e+00 5.000e-01
0.000e+00 0.000e+00

New behavior: compute the mean of the coefficients
Insert (1, 1) -> 1
Insert (1, 1) -> 0.5
lilmat updated 2
m: 2 n: 2
Symmetry: general; Storage: full
1.500e+00 5.000e-01
0.000e+00 7.500e-01
Remark: after the construction of your matrix, you may have zero or almost-zero values stored in it. You can remove them with the .drop() method. Optionally, this method takes a real value tolerance, and all values lower than this tolerance in absolute value will be removed. For example, you can write: lilmat.drop(1e-12).
2.4. Conversion to other matrix types
At present, the LIL format cannot be used for any operation other than insertion. Once you are satisfied with the LIL matrix you have built, you can convert it to the format you want to use: DenseMatrix, SparseMatrixCOO or SparseMatrixCSC.
#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixLIL.hpp>

int main(){
  using namespace composyx;
  SparseMatrixLIL<double> lilmat(2, 2);
  lilmat.insert(0, 0, 4);
  lilmat.insert(1, 0, -1);
  lilmat.insert(1, 1, 3);
  lilmat.display("lilmat");

  const SparseMatrixCOO<double> coomat(lilmat.to_coo());
  coomat.display("coomat");

  const SparseMatrixCSC<double> cscmat(lilmat.to_csc());
  cscmat.display("cscmat");

  const DenseMatrix<double> densemat(lilmat.to_dense());
  densemat.display("densemat");

  return 0;
}
lilmat
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full
i v
--- Col 0
0 4
1 -1
--- Col 1
1 3
coomat
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full
i j v
0 0 4
1 0 -1
1 1 3
cscmat
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full
i v
--- Col 0
0 4
1 -1
--- Col 1
1 3
densemat
m: 2 n: 2
Symmetry: general; Storage: full
 4.000e+00  0.000e+00
-1.000e+00  3.000e+00
Remark: notice that the header file SparseMatrixLIL includes the headers of the other types of matrices (DenseMatrix, SparseMatrixCOO and SparseMatrixCSC).
2.5. Other sparse formats
While constructing a SparseMatrixLIL by adding the coefficients one by one can be convenient, it is not optimal if you already know the number of non-zero values beforehand. In that case it is simpler to just pass the \(i\), \(j\) and \(v\) arrays to build the COO, CSC or CSR matrix directly. Several constructors are available to construct the matrix:
The matrix built in the following example is:

\[\begin{pmatrix} 0.5 & 1.5 \\ 0 & 2 \end{pmatrix}\]

#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixCOO.hpp>

int main(){
  using namespace composyx;
  const int M = 2;   // Number of rows
  const int N = 2;   // Number of columns
  const int NNZ = 3; // Number of non-zero values

  // Pass 3 initializer lists i, j and v, then dimensions M and N
  SparseMatrixCOO<double> coomat_1({0, 0, 1}, {0, 1, 1}, {0.5, 1.5, 2}, M, N);
  coomat_1.display("coomat build with initializer lists");

  // Pass dimensions M, N then NNZ and 3 pointers to the arrays for i, j and v
  // The arrays are not destroyed after the operation: values are copied
  int v_i[] = {0, 0, 1};
  int v_j[] = {0, 1, 1};
  double v_v[] = {0.5, 1.5, 2};
  SparseMatrixCOO<double> coomat_2(M, N, NNZ, v_i, v_j, v_v);
  coomat_2.display("coomat build from pointers");

  // Use the composyx std::vector class (an extended std::vector). We move
  // the values into the SparseMatrixCOO class: the std::vector are empty
  // after the operation but no copy is performed
  std::vector<int> ia_i{0, 0, 1};
  std::vector<int> ia_j{0, 1, 1};
  std::vector<double> ia_v{0.5, 1.5, 2};
  SparseMatrixCOO<double> coomat_3(M, N, NNZ, std::move(ia_i),
                                   std::move(ia_j), std::move(ia_v));
  coomat_3.display("coomat build with std::vector");

  return 0;
}
coomat build with initializer lists
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full
i j v
0 0 0.5
0 1 1.5
1 1 2
coomat build from pointers
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full
i j v
0 0 0.5
0 1 1.5
1 1 2
coomat build with std::vector
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full
i j v
0 0 0.5
0 1 1.5
1 1 2
The same example can be written with SparseMatrixCSC instead of SparseMatrixCOO. We do not describe here how the CSC format works, but in our case the arrays for \(i\) and \(v\) are the same as the ones for the coordinate format; only the \(j\) array (of size \(N + 1\)) is different.
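To illustrate where this array comes from, here is a sketch of a hypothetical helper (not part of Composyx) computing the CSC column pointers from the per-column entry counts; for our matrix, column 0 holds 1 value and column 1 holds 2, giving {0, 1, 3}:

#include <cstddef>
#include <vector>

// Hypothetical helper: column k of the CSC matrix owns the entries
// [colptr[k], colptr[k+1]) in the i and v arrays.
std::vector<int> column_pointers(const std::vector<int>& counts){
  std::vector<int> colptr(counts.size() + 1, 0);
  for(std::size_t k = 0; k < counts.size(); ++k)
    colptr[k + 1] = colptr[k] + counts[k];
  return colptr; // for counts {1, 2}: {0, 1, 3}
}

With that, the CSC version of the example reads: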
#include <iostream>
#include <vector>
#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixCSC.hpp>

int main(){
  using namespace composyx;
  const int M = 2;   // Number of rows
  const int N = 2;   // Number of columns
  const int NNZ = 3; // Number of non-zero values

  std::vector<int> ia_i{0, 0, 1};
  std::vector<int> ia_j{0, 1, 3};
  std::vector<double> ia_v{0.5, 1.5, 2};
  SparseMatrixCSC<double> cscmat(M, N, NNZ, std::move(ia_i),
                                 std::move(ia_j), std::move(ia_v));
  cscmat.display("csc build with std::vector");

  return 0;
}
csc build with std::vector
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full
i v
--- Col 0
0 0.5
--- Col 1
0 1.5
1 2
Same thing with SparseMatrixCSR:
#include <iostream>
#include <vector>
#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixCSR.hpp>

int main(){
  using namespace composyx;
  const int M = 2;   // Number of rows
  const int N = 2;   // Number of columns
  const int NNZ = 3; // Number of non-zero values

  std::vector<int> ia_i{0, 2, 3};
  std::vector<int> ia_j{0, 1, 1};
  std::vector<double> ia_v{0.5, 1.5, 2};
  SparseMatrixCSR<double> csrmat(M, N, NNZ, std::move(ia_i),
                                 std::move(ia_j), std::move(ia_v));
  csrmat.display("csr build with std::vector");

  return 0;
}
csr build with std::vector
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full
j v
--- Row 0
0 1
5.000e-01 1.500e+00
--- Row 1
1
2.000e+00
Remark: these formats also support conversion, as for the LIL matrices, with the methods .to_coo(), .to_csc(), .to_csr() and .to_dense().
2.6. Solving the system
Now let's take a look at the system we wanted to solve: \(A\) is a symmetric positive definite matrix with 7 non-zero values (5 if we use half storage). Let's build this matrix as a COO sparse matrix in lower storage and use the conjugate gradient to solve the system:
#include <composyx.hpp>
#include <composyx/solver/ConjugateGradient.hpp>
#include <composyx/loc_data/SparseMatrixCOO.hpp>

int main(){
  using namespace composyx;
  SparseMatrixCOO<double> A({0, 1, 1, 2, 2}, {0, 0, 1, 1, 2},
                            {4, -1, 3, -0.5, 4}, 3, 3);
  A.set_spd(MatrixStorage::lower);
  Vector<double> u{1, 2, 3};
  Vector<double> b = A * u;

  ConjugateGradient<SparseMatrixCOO<double>, Vector<double>> solver;
  solver.setup(parameters::A{A},
               parameters::verbose{true},
               parameters::max_iter{20},
               parameters::tolerance{1e-8});
  Vector<double> x = solver * b;
  x.display("x");
  return 0;
}
Starting Conjugate Gradient.
Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08 with ||b|| = 11.7154
---
0 - ||r||/||b|| 1
1 - ||r||/||b|| 0.248805
2 - ||r||/||b|| 0.048057
3 - ||r||/||b|| 2.6488e-18
||b-Ax||/||b|| 0
x
m: 3 (vector)
1
2
3
If we look at the modifications compared to the dense version, we only built a SparseMatrixCOO instead of a DenseMatrix, and we indicated to the ConjugateGradient solver that our operator \(A\) is given in the matrix format SparseMatrixCOO. The rest of the code did not change.
One could write very generic code using the C++ tools: templates, decltype, auto…
#include <composyx.hpp>
#include <composyx/solver/ConjugateGradient.hpp>
#include <composyx/loc_data/SparseMatrixCOO.hpp>

using namespace composyx;

template<typename Mat, typename Vect>
void solve_with_cg(const Mat& A, const Vect& b){
  ConjugateGradient<Mat, Vect> solver;
  solver.setup(parameters::A{A},
               parameters::verbose{true},
               parameters::max_iter{20},
               parameters::tolerance{1e-8});
  auto x = solver * b;
  x.display("x");
}

int main(){
  SparseMatrixCOO<double> A_coo({0, 1, 1, 2, 2}, {0, 0, 1, 1, 2},
                                {4, -1, 3, -0.5, 4}, 3, 3);
  A_coo.set_spd(MatrixStorage::lower);
  const Vector<double> u{1, 2, 3};
  const Vector<double> b = A_coo * u;

  std::cout << "Solve CG with COO matrix\n";
  solve_with_cg(A_coo, b);

  const DenseMatrix<double> A_dense = A_coo.to_dense();
  std::cout << "Solve CG with dense matrix\n";
  solve_with_cg(A_dense, b);

  return 0;
}
Solve CG with COO matrix
Starting Conjugate Gradient.
Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08 with ||b|| = 11.7154
---
0 - ||r||/||b|| 1
1 - ||r||/||b|| 0.248805
2 - ||r||/||b|| 0.048057
3 - ||r||/||b|| 2.6488e-18
||b-Ax||/||b|| 0
x
m: 3 (vector)
1
2
3
Solve CG with dense matrix
Starting Conjugate Gradient.
Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08 with ||b|| = 11.7154
---
0 - ||r||/||b|| 1
1 - ||r||/||b|| 0.248805
2 - ||r||/||b|| 0.048057
3 - ||r||/||b|| 2.6488e-18
||b-Ax||/||b|| 0
x
m: 3 (vector)
1
2
3
3. A domain decomposition example
Introduction
In this tutorial we shall compile and run a small example of Composyx on the supercomputer plafrim or on a simple Unix laptop.
3.1. Environment setup
3.1.1. guix setup
The simplest way to install Composyx is to use Guix. This piece of software is available on plafrim and can be easily installed on a laptop. The Composyx package is distributed in the guix-hpc repository. You may also be interested in the guix-hpc-non-free repository, which gives access to additional non-open-source packages such as the Intel MKL.
To check that your installation is correct, the following command should give you an output:
guix search composyx
and you should be able to find the Composyx package. If it is not the case, refer to the documentation of the repositories linked above.
You can then check that the Composyx package compiles:
guix build composyx
3.1.2. guix environment
To obtain a clean environment with Composyx, we launch a bash in a pure guix shell:
guix shell --pure composyx
In this environment, only Composyx is available. Furthermore, environment variables are set so that every library can be easily found by a compilation tool such as CMake (you can check the values of C_INCLUDE_PATH, LIBRARY_PATH, PKG_CONFIG_PATH…).
In particular, you will find the C++ headers of Composyx:

# NB: add coreutils to use "ls"
guix shell --pure composyx coreutils
ls $GUIX_ENVIRONMENT/include/composyx
To get an environment containing the dependencies of Composyx (but not Composyx itself), you can use the flag -D to obtain the development environment of the package.
guix shell --pure -D composyx
Now if you want Composyx and all its dependencies, you can combine both commands with:
guix shell --pure -D composyx composyx
- Add other packages to the environment

  You can add other packages available in guix to your environment; for instance you may need your favorite text editor.

  guix shell --pure -D composyx composyx emacs
- Using MKL

  If you have guix-hpc-non-free set up and you want to use the Intel MKL, you can try to use the package recipe composyx-mkl.

  guix shell --pure -D composyx-mkl composyx-mkl

  Under the hood, the recipe will replace openblas with mkl.
3.2. Compile an example
3.2.1. Example description
- Domain decomposition
In the example presented, the adjacency graph of the global matrix consists of 7 vertices which are assigned to the following subdomains:

Node | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
Subdomain(s) | 0, 2 | 0 | 0, 1 | 0, 1, 2 | 1 | 1, 2 | 2 |

The MPI ranks are numbered from 0 to \(N\) - 1, consistently with the default MPI numbering. However, subdomains are numbered from 1 to \(N\) to stay consistent with previous chapters, where index 0 is reserved for the coarse space. In the code, the subdomain \(\Omega_i\) is therefore identified by its unique ID \(i\). A local numbering of the unknowns is used in each subdomain: in the example, the local indices [0, 1, 2, 3] in \(\Omega_0\) correspond to the same global indices [0, 1, 2, 3]. The global indices corresponding to local indices [0, 1, 2, 3] of \(\Omega_1\) and \(\Omega_2\) are [3, 2, 4, 5] and [5, 6, 0, 3], respectively. Each subdomain has to know which unknowns it shares with which other subdomains (dotted lines in figure (c)). In our example, for instance, in local ordering:
- \(\Omega_0\) shares [2, 3] with \(\Omega_1\) and [0, 3] with \(\Omega_2\).
- \(\Omega_1\) shares [1, 0] with \(\Omega_0\) and [0, 3] with \(\Omega_2\).
- \(\Omega_2\) shares [2, 3] with \(\Omega_0\) and [3, 0] with \(\Omega_1\).
The global domain in (a) is partitioned into three subdomains \(\Omega_0\), \(\Omega_1\) and \(\Omega_2\) in (b). Each subdomain can use a local ordering of its unknowns as in (c); duplicate unknowns in different subdomains are linked by a dotted line.
The interior unknowns can be eliminated to keep only the interface unknowns that are shared by two subdomains or more (a). A global interface numbering is introduced (b). The global interface can be expressed as the union of the three local interfaces \(\Gamma_0\), \(\Gamma_1\) and \(\Gamma_2\) in (c).
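For reference, the local-to-global index maps described above can be written down as plain arrays (an illustration only; the example code below encodes the topology through neighbor maps instead):

#include <vector>

// Local index k of subdomain i corresponds to global index loc2glob[i][k]
const std::vector<std::vector<int>> loc2glob{
  {0, 1, 2, 3}, // Omega_0
  {3, 2, 4, 5}, // Omega_1
  {5, 6, 0, 3}  // Omega_2
};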
- Input matrix
The system we are solving is \(K u = f\), where \(K\) is a partitioned sparse SPD matrix, \(u\) the unknown vector and \(f\) the right-hand side.
For the sake of simplicity, we put the same local matrix on each subdomain. Notice that the sparse structure of the matrix corresponds to the domain decomposition described above.
Local matrices: \[\begin{pmatrix} 3 &-1 & 0 &-1 \\ -1 & 4 &-1 &-1 \\ 0 &-1 & 3 &-1 \\ -1 &-1 &-1 & 4 \end{pmatrix}\]
Which results in this global matrix:
Global matrix: \[K = \begin{pmatrix} 6 & -1 & 0 & -2 & 0 & 0 & -1 \\ -1 & 4 & -1 & -1 & 0 & 0 & 0 \\ 0 & -1 & 7 & -2 & -1 & -1 & 0 \\ -2 & -1 & -2 & 11 & 0 & -2 & -1 \\ 0 & 0 & -1 & 0 & 3 & -1 & 0 \\ 0 & 0 & -1 & -2 & -1 & 7 & -1 \\ -1 & 0 & 0 & -1 & 0 & -1 & 4 \end{pmatrix}\]
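As a sanity check of the assembly, consider the global diagonal entry of node 3, which is shared by all three subdomains (local index 3 in \(\Omega_0\), 0 in \(\Omega_1\) and 3 in \(\Omega_2\)):

\[K_{3,3} = \underbrace{4}_{\Omega_0:\,(3,3)} + \underbrace{3}_{\Omega_1:\,(0,0)} + \underbrace{4}_{\Omega_2:\,(3,3)} = 11\]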
The solution vector we expect to find is simply: \[u = \begin{pmatrix} 0 \\ 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \end{pmatrix}\]
Thus, we shall use \(f = K u\) for the right-hand side vector:
Global vector: \[f = \begin{pmatrix} -13 \\ -1 \\ -2 \\ 12 \\ 5 \\ 17 \\ 16 \end{pmatrix}\]
The vectors are always assembled, so the local right-hand side vectors are respectively:
\[\begin{pmatrix} -13 \\ -1 \\ -2 \\12 \end{pmatrix}, \begin{pmatrix} 12 \\ -2 \\ 5 \\ 17 \end{pmatrix} \text{, and } \begin{pmatrix} 17 \\ 16 \\ -13 \\ 12 \end{pmatrix}\]
3.2.2. Example code
#include <iostream>
#include <vector>
#include <mpi.h>

#include <composyx.hpp>
#include <composyx/part_data/PartMatrix.hpp>
#include <composyx/solver/Pastix.hpp>
#include <composyx/solver/ConjugateGradient.hpp>
#include <composyx/solver/PartSchurSolver.hpp>

int main(){
  using namespace composyx;
  using Scalar = double;

  MMPI::init();
  {
    const int rank = MMPI::rank();
    const int M = 4;
    const int NNZ = 9;
    const int N_DOFS = 4;
    const int N_SUBDOMAINS = 3;

    // Bind subdomains to MPI process
    std::shared_ptr<Process> p = bind_subdomains(N_SUBDOMAINS);

    // Define topology for each subdomain
    std::vector<Subdomain> sd;
    if(p->owns_subdomain(0)){
      std::map<int, std::vector<int>> nei_map_0 {{1, {2, 3}}, {2, {0, 3}}};
      sd.emplace_back(0, N_DOFS, std::move(nei_map_0));
    }
    if(p->owns_subdomain(1)){
      std::map<int, std::vector<int>> nei_map_1 {{0, {1, 0}}, {2, {0, 3}}};
      sd.emplace_back(1, N_DOFS, std::move(nei_map_1));
    }
    if(p->owns_subdomain(2)){
      std::map<int, std::vector<int>> nei_map_2 {{0, {2, 3}}, {1, {3, 0}}};
      sd.emplace_back(2, N_DOFS, std::move(nei_map_2));
    }
    p->load_subdomains(sd);

    // Local matrices (here the same matrix on every subdomain)
    std::vector<int> a_i{0, 0, 0, 1, 1, 1, 2, 2, 3};
    std::vector<int> a_j{0, 1, 3, 1, 2, 3, 2, 3, 3};
    std::vector<Scalar> a_v{3, -1, -1, 4, -1, -1, 3, -1, 4};
    SparseMatrixCOO<Scalar, int> A_loc(M, M, NNZ, a_i.data(), a_j.data(), a_v.data());
    A_loc.set_spd(MatrixStorage::upper);

    // Create the partitioned matrix
    std::map<int, SparseMatrixCOO<Scalar, int>> A_map;
    for(int k = 0; k < 3; ++k)
      if(p->owns_subdomain(k)) A_map[k] = A_loc;
    const PartMatrix<SparseMatrixCOO<Scalar>> A(p, std::move(A_map));

    // Local rhs vectors
    std::map<int, Vector<Scalar>> b_map;
    if(p->owns_subdomain(0)) b_map[0] = Vector<Scalar>{-13, -1, -2, 12};
    if(p->owns_subdomain(1)) b_map[1] = Vector<Scalar>{12, -2, 5, 17};
    if(p->owns_subdomain(2)) b_map[2] = Vector<Scalar>{17, 16, -13, 12};

    // Create the partitioned vector
    const PartVector<Vector<Scalar>> b(p, std::move(b_map));

    // Solver definitions
    using Solver_K = Pastix<SparseMatrixCOO<Scalar>, Vector<Scalar>>;
    using Solver_S = ConjugateGradient<PartMatrix<DenseMatrix<Scalar>>,
                                       PartVector<Vector<Scalar>>>;
    using Solver = PartSchurSolver<PartMatrix<SparseMatrixCOO<Scalar>>,
                                   PartVector<Vector<Scalar>>, Solver_K, Solver_S>;

    // Setup
    Solver schur_solver;
    schur_solver.setup(A);

    // Set parameters for the solver S (ConjugateGradient)
    Solver_S& iter_solver = schur_solver.get_solver_S();
    iter_solver.setup(parameters::max_iter{50},
                      parameters::tolerance{1e-8},
                      parameters::verbose{(rank == 0)});

    // Solve
    auto X = schur_solver * b;
    X.display_centralized("X found");

    if(rank == 0) std::cout << "Niter: " << iter_solver.get_n_iter() << '\n';
  }
  MMPI::finalize();

  return 0;
}
This example comes from src/examples/simple_part.cpp in the Composyx repository. It solves a distributed sparse linear system on 3 subdomains, using Pastix to compute the Schur complement and a conjugate gradient to solve the resulting interface system.
3.2.3. Compilation
Now let's create a directory tutorial and copy this file simple_part.cpp inside:

mkdir tutorial && cd tutorial
# create or copy simple_part.cpp
In order to get Composyx and all its dependencies, we use the following CMakeLists.txt file:

cmake_minimum_required(VERSION 3.12)
project(COMPOSYX_EXAMPLE CXX C Fortran)

find_package(composyx REQUIRED)

# Need c++20
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# compile your example
add_executable(simple_part simple_part.cpp)

# link to composyx
target_link_libraries(simple_part PRIVATE COMPOSYX::composyx)
Now we can compile our example using the standard CMake way:
mkdir build && cd build
cmake ..
make
3.2.4. Running the example
Let's run the example. It is written so that it can be run with any number of MPI processes. However, the optimal number of processes is one per subdomain, i.e. 3 in this example:
mpirun -np 3 ./simple_part
The output should look like:
Starting Conjugate Gradient.
Max iterations: 50
Stopping criterion ||r||/||b|| < 1e-08 with ||b|| = 29.116
---
0 - ||r||/||b|| 0.97707
1 - ||r||/||b|| 0.441184
2 - ||r||/||b|| 0.220872
3 - ||r||/||b|| 0.0646697
4 - ||r||/||b|| 2.80055e-17
||b-Ax||/||b|| 1.49027e-16
X found
m: 7 (vector)
2.22045e-16
1
2
3
4
5
6
Niter: 4
4. Dense matrices
Introduction
In this tutorial we show an example of dense matrix operations on a laptop and on the supercomputer plafrim, using either Blas/Lapack (e.g. Intel MKL, OpenBLAS) or Chameleon. The Composyx class used is DenseMatrix, for which there are associated kernels (scal, axpy, norm2, gemm, symm, eig, svd, etc.) and solvers (getrf/s, potrf/s, geqrf/s, gelqf/s, trsm). Note that operations on dense matrices are limited to one single machine node for now (no distributed MPI), but with Chameleon there is the possibility to exploit GPUs in addition to CPUs.
4.1. Source code
The source code is a C++ file demo_dense.cpp which relies on Composyx to

- perform a dense matrix multiplication \(C = A * B\),
- solve a linear system \(A * x = b\) with \(A\) symmetric positive definite (SPD) using a Cholesky factorization \(A = LL^T\),
- solve an overdetermined system \(A * x = b\) with \(A\) rectangular using least squares and a factorization \(A = QR\),
- compute the randomized SVD of a rank-R general matrix,
- compute the randomized EVD of a rank-R symmetric matrix.
The Chameleon C library is also used to generate random matrices, cf. CHAMELEON_dplrnt (general matrix) and CHAMELEON_dplgsy (SPD matrix). Chameleon must be initialized with Composyx, i.e. composyx::initialize. The numbers of CPUs and GPUs can be set there, i.e. composyx::initialize(nc, ng). The size of the sub-blocks (tiles) to be used by Chameleon can be set by calling composyx::setChameleonParameters(nb). If not set, those parameters are chosen by default internally: all CPUs, no GPUs and a tile size equal to 320. Note that composyx::initialize, when the executable is compiled with COMPOSYX_USE_CHAMELEON, will set the Blas/Lapack number of threads to 1. Use setNumThreads(getMaxThreads()); before calls to Blas/Lapack kernels and setNumThreads(1); before calls to Chameleon kernels, as sketched below.
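A sketch of this switching pattern (assuming the matrices and kernel namespaces of the example below; setNumThreads and getMaxThreads are the helpers mentioned above):

// Chameleon manages its own workers: keep Blas/Lapack sequential
setNumThreads(1);
chameleon_kernels::gemm(A, B, C);

// Restore multithreading before calling Blas/Lapack kernels directly
setNumThreads(getMaxThreads());
blas_kernels::gemm(A, B, C);

The full example follows: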
#include <composyx.hpp>

using namespace composyx;

int main(int argc, char *argv[]) {
  using Scalar = double;
  using Real = typename arithmetic_real<Scalar>::type;
  using Matrix = DenseMatrix<Scalar>;

  /* Initialize Chameleon context */
  initialize(4, 0);
  setChameleonParameters(640);

  /* Matrix product GEMM: C=AB */
  // Fill the matrix with random values
  const int M=10;
  Matrix A(M, M);
  Matrix B(M, M);
  Matrix C(M, M);
  const int seedA=10;
  CHAMELEON_dplrnt( M, M, get_ptr(A), M, seedA );
  const int seedB=20;
  CHAMELEON_dplrnt( M, M, get_ptr(B), M, seedB );

  // Compute the product (GEMM)
  chameleon_kernels::gemm(A, B, C);

  /* Linear system solve (Cholesky): Ax=b */
  // Fill the matrix A with random values so that it is square and SPD
  CHAMELEON_dplgsy( (Scalar)M, ChamLower, M, get_ptr(A), M, seedA );
  A.set_spd(MatrixStorage::lower);
  Vector<Scalar> b(M);
  CHAMELEON_dplrnt( M, 1, get_ptr(b), M, seedB );

  // Declare the Solver
  ChameleonSolver<Matrix, Vector<Scalar>> SCcham;
  // if A is given it is copied in the solver class
  // giving A as rvalue allows to avoid copy
  SCcham.setup(A);

  // Solve the linear system
  Vector<Scalar> x = SCcham * b;
  // SCcham.factorize(); can be called to perform factorization before solving

  // check
  Vector<Scalar> diff = A*x-b;
  Real normc = diff.norm();
  std::cout << "Cholesky - ||Ax-b|| = " << normc << "\n" << std::endl;

  /* Overdetermined system solve (QR): Ax=b */
  // clear previous A and realloc
  const int N=5;
  A = Matrix(M, N);
  // Fill the rectangular matrix A with random values
  CHAMELEON_dplrnt( M, N, get_ptr(A), M, seedA );

  // Declare the solver
  ChameleonSolver<Matrix, Vector<Scalar>> SLcham;
  SLcham.setup(A);

  // Solve the least squares problem
  x = SLcham * b;

  // check
  diff = A*x-b;
  Real normls = diff.norm();
  std::cout << "Least squares - Sum r_i² = " << normls*normls << "\n" << std::endl;

  /* Randomized SVD to compute SVD on rank-R (R<<N) matrices */
  // generate a general matrix of rank R
  const int R=2;
  Matrix Al = chameleon_kernels::random_matrix_normal<Matrix>(M, R);
  Matrix Ar = chameleon_kernels::random_matrix_normal<Matrix>(R, N);
  A = Al*Ar;

  // Compute the RSVD of A
  auto [U, sigmas, Vt] = chameleon_kernels::rsvd(A, R, 0, false);

  // check SVD
  Matrix Sigma(n_cols(U), n_rows(Vt));
  auto k = std::min(n_cols(U), n_rows(Vt));
  for(Size i = 0; i < k; ++i) Sigma(i, i) = Scalar{sigmas[i]};
  auto Asvd = U * Sigma * Vt;
  diff = A-Asvd;
  Real normsvd = diff.norm();
  std::cout << "RSVD - ||A-(U S V^T)|| = " << normsvd << "\n" << std::endl;

  /* Randomized EVD to compute EVD on rank-R (R<<N) symmetric/hermitian matrices */
  // generate a symmetric/hermitian matrix of rank R
  A = Al*Al.t();
  A.set_property(MatrixSymmetry::symmetric);

  // Compute the REVD of A
  auto [lambdas, V] = chameleon_kernels::revd(A, R, 0);

  // check EVD
  for(Size i = 0; i < n_cols(V); ++i){
    Vector<Scalar> AV = A * V.get_vect_view(i);
    Vector<Scalar> lamV;
    lamV = lambdas[i] * V.get_vect_view(i);
    Vector<Scalar> DiffA = AV - lamV;
    std::cout << "REVD - Norm " << i << " = " << DiffA.norm() << std::endl;
  }

  finalize();
  return 0;
}
Another, more complete version of this example can be read in src/examples/demo_dense.cpp of the Composyx repository, with a CPU time comparison between Blas/Lapack and Chameleon. See also src/examples/demo_dense_rsvd.cpp for the RSVD. There are of course other routines available to perform dense matrix operations, but there is not always a Chameleon implementation of a given Blas/Lapack function.
We summarize the Blas/Lapack operations available in Composyx through the kernel calls in the following table. All routines can be called from blas_kernels:: or chameleon_kernels::, but if there is no Chameleon implementation the call will actually fall back to the Blas/Lapack kernel. The X symbol in the table means there is a Chameleon implementation which can accelerate this operation (see the sketch after the table).
Operation | Chameleon |
---|---|
\(\text{scal}\) | |
\(\text{axpy}\) | |
\(\text{dot}\) | |
\(\text{norm2}\) | |
\(\text{gemv}\) | |
\(\text{gemm}\) | X |
\(\text{symm}\) | X |
\(\text{hemm}\) | X |
\(\text{trmm}\) | X |
\(\text{eig}\) | |
\(\text{eigvals}\) | |
\(\text{eigh}\) | |
\(\text{eigvalsh}\) | |
\(\text{eigh_select}\) | |
\(\text{eigh_smallest}\) | |
\(\text{eigh_largest}\) | |
\(\text{gen_eigh_select}\) | |
\(\text{gen_eigh_smallest}\) | |
\(\text{gen_eigh_largest}\) | |
\(\text{svd}\) | |
\(\text{svd_left_sv}\) | |
\(\text{svd_right_sv}\) | |
\(\text{svdvals}\) | |
\(\text{rsvd}\) | X |
\(\text{rsvd_left_sv}\) | X |
\(\text{rsvd_right_sv}\) | X |
\(\text{rsvdvals}\) | X |
\(\text{revd}\) | X |
\(\text{revdvals}\) | X |
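For instance, both namespaces expose the same gemm interface; a sketch reusing the matrices of the example above:

// Same interface in both namespaces: C = A * B
blas_kernels::gemm(A, B, C);      // plain Blas/Lapack
chameleon_kernels::gemm(A, B, C); // Chameleon implementation (X in the table)
// For operations without an X (e.g. eig), the chameleon_kernels:: call
// falls back to the Blas/Lapack kernel.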
We summarize the solver routines available in Composyx through the Solver classes (BlasSolver, ChameleonSolver) in the following table.
Operation | Chameleon |
---|---|
LU (\(\text{getrf/s}\)) | |
Cholesky (\(\text{potrf/s}\)) | X |
Symmetric complex (\(\text{sytrf/s}\)) | X |
QR/LQ (\(\text{geqrf/s}\)/\(\text{gelqf/s}\)) | X |
Least Squares (\(\text{geqrf/s}\)) | X |
Minimum norm (\(\text{gelqf/s}\)) | X |
Triangular Solve (\(\text{trsm}\)) | X |
4.2. Execution on a Linux Ubuntu laptop
4.2.1. Install Composyx with CMake
Install system package requirements

sudo apt-get update -y
sudo apt-get install -y git cmake build-essential gfortran python wget tar curl pkg-config libmkl-dev
Notice one can replace libmkl-dev by libopenblas-dev.
Install specific packages with CMake
# install blaspp, lapackpp and arpack-ng
git clone https://github.com/icl-utk-edu/blaspp.git
git clone https://github.com/icl-utk-edu/lapackpp.git
git clone https://github.com/opencollab/arpack-ng.git
cd blaspp
cmake -B build -DCMAKE_INSTALL_PREFIX=/usr/local
cmake --build build -j5
sudo cmake --install build
cd ../
cd lapackpp
cmake -B build -DCMAKE_INSTALL_PREFIX=/usr/local
cmake --build build -j5
sudo cmake --install build
cd ../
cd arpack-ng
cmake -B build -DICB=ON
cmake --build build -j5
sudo cmake --install build
cd ../

# Install Starpu and Chameleon
git clone https://gitlab.inria.fr/starpu/starpu.git
cd starpu
./configure --prefix=/usr/local --disable-build-doc --disable-build-test --disable-opencl --disable-fortran --disable-socl --disable-starpufft
make -j5
sudo make install
cd ../
git clone --recursive https://gitlab.inria.fr/solverstack/chameleon.git
cd chameleon
cmake -B build -DCHAMELEON_USE_MPI=OFF -DCMAKE_INSTALL_PREFIX=/usr/local -DBUILD_SHARED_LIBS=ON
# if starpu is installed in a specific path, add the following to the
# configuration: -DCMAKE_PREFIX_PATH="path to starpu"
cmake --build build -j5
sudo cmake --install build
cd ../
Install Composyx
git clone --recursive https://gitlab.inria.fr/composyx/composyx.git
cd composyx
cmake -B build -DCMAKE_INSTALL_PREFIX=/usr/local -DCOMPOSYX_USE_CHAMELEON=ON -DCOMPOSYX_USE_MUMPS=OFF -DCOMPOSYX_USE_PASTIX=OFF
cmake --build build -j5
sudo cmake --install build
4.2.2. Compile and run the example
To link a C++ executable with Composyx from a file demo_dense.cpp, the most straightforward way is to use CMake; example of CMakeLists.txt file:
cmake_minimum_required(VERSION 3.5)
project(DEMO_DENSE CXX C Fortran)

find_package(composyx REQUIRED)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

add_executable(demo_dense demo_dense.cpp)
target_link_libraries(demo_dense PRIVATE COMPOSYX::composyx)
target_compile_definitions(demo_dense PRIVATE COMPOSYX_USE_CHAMELEON)
target_compile_definitions(demo_dense PRIVATE CHAMELEON_COMPLEX_CPP)
Then, in the same directory where the demo_dense.cpp and CMakeLists.txt files are located, use the cmake command as follows:

cmake -B build
# if dependencies are installed in specific directories, add them to the
# configuration: -DCMAKE_PREFIX_PATH="path to chameleon;path to starpu"
cmake --build build

# Run the example
./build/demo_dense
4.3. Execution using GNU Guix
We assume Guix is up-to-date and that guix-hpc-non-free is set up. We consider using the complete example here, located in src/examples/demo_dense.cpp. Let's compare CPU times given by Lapack (Intel MKL here) and Chameleon using four threads, i.e. four CPUs.
Compile the example using Guix
git clone --recursive https://gitlab.inria.fr/composyx/composyx.git
cd composyx
guix shell --pure -D composyx-mkl -- /bin/bash --norc
cmake -B build-guix -DCOMPOSYX_COMPILE_EXAMPLES=ON -DCOMPOSYX_COMPILE_TESTS=OFF -DCOMPOSYX_DRIVERS=OFF -DCOMPOSYX_C_DRIVER=OFF -DCOMPOSYX_Fortran_DRIVER=OFF -DCOMPOSYX_USE_CHAMELEON=ON -DCOMPOSYX_USE_MUMPS=OFF -DCOMPOSYX_USE_PASTIX=OFF
cmake --build build-guix --target composyx_demo_dense composyx_demo_dense_rsvd
Arguments which can be used with the executable:
- 0 for Blas/Lapack (blaspp/lapackpp+Intel MKL MT), 1 for Chameleon (+Intel MKL sequential)
- Size M (size of the GEMM and linear system, number of rows and columns)
- Size N (number of columns of the rectangular matrix during QR and least squares problem)
- Chameleon block/tile size
- Chameleon number of CPU workers
- Chameleon number of GPU workers
Execution with Blas/Lapack from the MKL
./build-guix/src/examples/composyx_demo_dense 0 10000 5000
Example results
Least squares system time, 1, 3.06074
Linear system (Cholesky) time, 1, 2.03154
Gemm time, 1, 10.5614
Notice that we can set MKL_NUM_THREADS to choose the number of threads used by MKL.
Execution with Chameleon using a block size of 500 and 4 threads
./build-guix/src/examples/composyx_demo_dense 1 10000 5000 500 4 0
Example results
Least squares system time, 1, 3.36202
Linear system (Cholesky) time, 1, 2.17157
Gemm time, 1, 11.3575
For the composyx_demo_dense_rsvd executable, the rank R of the matrix can be given as fourth argument.
Execution with Blas/Lapack from the MKL
./build-guix/src/examples/composyx_demo_dense_rsvd 0 16000 8000 995
Example results
Check time, 1, 3.13929
RSVD time, 1, 4.54151
Matrix building time, 1, 2.14337
Execution with Chameleon using a block size of 500 and 4 threads
./build-guix/src/examples/composyx_demo_dense_rsvd 1 16000 8000 995 500 4 0
Example results
Check time, 1, 3.23624
RSVD time, 1, 4.7384
Matrix building time, 1, 1.8179
Chameleon delivers performance close to MKL's, knowing that Intel MKL is a Blas/Lapack reference for a single machine with Intel CPUs.
4.4. Execution on the Plafrim supercomputer using GPUs
Let's compare CPU times given by Lapack (Intel MKL here) and Chameleon on a plafrim sirocco node (2x16 cores + 2 P100 GPUs).
Compile the example using Guix
ssh plafrim
git clone --recursive https://gitlab.inria.fr/composyx/composyx.git
cd composyx
guix shell --pure -D composyx-mkl --with-input=chameleon-mkl-mt-nompi=chameleon-cuda-mkl-mt-nompi --with-input=starpu=starpu-cuda --with-input=openmpi=openmpi-cuda slurm -- /bin/bash --norc
cmake -B build-guix -DCOMPOSYX_COMPILE_EXAMPLES=ON -DCOMPOSYX_COMPILE_TESTS=OFF -DCOMPOSYX_DRIVERS=OFF -DCOMPOSYX_C_DRIVER=OFF -DCOMPOSYX_Fortran_DRIVER=OFF -DCOMPOSYX_USE_CHAMELEON=ON -DCOMPOSYX_USE_MUMPS=OFF -DCOMPOSYX_USE_PASTIX=OFF
cmake --build build-guix --target composyx_demo_dense composyx_demo_dense_rsvd
Notice we replace the default chameleon package by a version built with CUDA to be able to use GPUs.
Reserve one p100 node and expose the location of the CUDA driver in the Guix environment (can be different depending on the supercomputer)
salloc --nodes=1 --time=01:00:00 --constraint p100 --exclusive --ntasks-per-node=1 --threads-per-core=1
export LD_PRELOAD="/usr/lib64/libcuda.so"
#export STARPU_DISABLE_PINNING=1
export STARPU_NWORKER_PER_CUDA=2
Execution with Blas/Lapack from the MKL
srun -n 1 -c 30 ./build-guix/src/examples/composyx_demo_dense 0 24800 12400 1240 30 0
Example results
Least squares system time, 1, 10.9655
Linear system (Cholesky) time, 1, 9.42283
Gemm time, 1, 40.1059
Execution with Chameleon (set the ncpus parameter to -1 to let Chameleon decide the best number of CPUs to use) and call setChameleonParameters(NB, ChamOutOfPlace); to enable conversion of the Lapack matrix into a Chameleon tiled matrix whose storage is more efficient for GPUs.
srun -n 1 -c 30 ./build-guix/src/examples/composyx_demo_dense 1 24800 12400 1240 -1 2
Example results
Least squares system time, 1, 6.36323
Linear system (Cholesky) time, 1, 1.78136
Gemm time, 1, 11.7963
Execution of RSVD with Blas/Lapack from the MKL
srun -n 1 -c 30 ./build-guix/src/examples/composyx_demo_dense_rsvd 0 49600 24800 1235
Result
RSVD time, 1, 21.528
Execution of RSVD with Chameleon
srun -n 1 -c 30 ./build-guix/src/examples/composyx_demo_dense_rsvd 1 49600 24800 1235 1240 -1 2
Example results
RSVD time, 1, 11.3395
Chameleon outperforms MKL here thanks to the GPU kernels (cuBLAS) that Chameleon is able to exploit.
5. Composability examples
5.1. Conjugate gradient solver
In order to demonstrate the feasibility of utilising objects from different libraries with a single solver, we will consider the example of the conjugate gradient presented earlier, with a matrix in the Eigen format and a vector in the Composyx format. This represents an initial step in the development of a composable and generic library.
5.1.1. Eigen Matrix and a composyx vector
- Source code
You can find the source code in src/examples/demo_cg_eigen_cpx.cpp.

#include <iostream>

// the Eigen wrappers for matrix and vector
#include "composyx/wrappers/Eigen/EigenMatrix.hpp"
// The composyx matrix/vector include
#include "composyx/loc_data/DenseMatrix.hpp"

/// Type definition
using value_type = double;
// Composyx vector type
using my_vector_type = composyx::Vector<value_type>;
// Eigen matrix type (we use the composyx Eigen wrapper)
using my_matrix_type = composyx::E_DenseMatrix<value_type>;

/// Composability
namespace composyx {
  // For concepts
  // We specify that the vector type for an Eigen matrix is a composyx vector type
  template <CPX_Scalar Scalar>
  struct vector_type<E_DenseMatrix<Scalar>> : public std::true_type {
    using type = my_vector_type;
  };

  // The Eigen matrix * composyx vector operator
  auto operator*(my_matrix_type const& A, my_vector_type const& x) -> my_vector_type {
    using value_type = my_matrix_type::value_type;
    using C_matrix_type = composyx::DenseMatrix<value_type>;
    // The eigen matrix is encapsulated (without copying) in a composyx matrix
    C_matrix_type AA(A.rows(), A.cols(), const_cast<value_type*>(A.data()), true);
    // return the result in a CPX vector
    return AA * x;
  }
} // namespace composyx

#include <composyx/solver/ConjugateGradient.hpp>

int main() {
  my_matrix_type A({{4, -1, 0}, {-1, 3, -0.5}, {0, -0.5, 4}});
  my_vector_type u({1, 2, 3}, 3);
  my_vector_type b = A * u;
  //
  composyx::ConjugateGradient<my_matrix_type, my_vector_type> solver;
  solver.setup(composyx::parameters::A{A},
               composyx::parameters::verbose{true},
               composyx::parameters::max_iter{20},
               composyx::parameters::tolerance{1e-8});
  my_vector_type x = solver * b;
  composyx::display(x, "x:\n", std::cout);
  return 0;
}
- Explanation
In order to utilise the composyx solvers, it is necessary to declare the return type of the matrix-vector product and to have a matrix-vector product defined within the composyx namespace.
template <CPX_Scalar Scalar>
struct vector_type<E_DenseMatrix<Scalar>> : public std::true_type {
  using type = my_vector_type;
};
In this approach, the Eigen matrix (column-major storage) is encapsulated within a Composyx matrix. To achieve this, the following constructor is employed:
explicit DenseMatrix(const size_t m, const size_t n, Scalar* ptr, bool fixed_ptr = false);
where m represents the number of rows, n the number of columns, ptr the pointer to the memory area, and fixed_ptr specifies whether the memory space allocated by Eigen is used directly (true) or copied (false).
auto operator*(my_matrix_type const& A, my_vector_type const& x) -> my_vector_type {
  using value_type = my_matrix_type::value_type;
  using C_matrix_type = composyx::DenseMatrix<value_type>;
  // The eigen matrix is encapsulated (without copying) in a composyx matrix
  C_matrix_type AA(A.rows(), A.cols(), const_cast<value_type*>(A.data()), true);
  // return the result in a CPX vector
  return AA * x;
}
5.1.2. Eigen matrix and an Eigen vector
You can find the source code in src/examples/demo_cg_eigen.cpp.
- Source code
#include <iostream>

// the full Eigen wrappers for matrix and vector
#include <composyx/wrappers/Eigen/Eigen.hpp>

/// Type definition
using value_type = double;
using my_vector_type = composyx::E_Vector<value_type>;
using my_matrix_type = composyx::E_DenseMatrix<value_type>;

// Constraints for using Eigen vector in CPX iterative solver
namespace Eigen {
  auto dot(my_vector_type const &v1, my_vector_type const &v2)
      -> my_vector_type::value_type {
    return v1.dot(v2);
  }
  [[nodiscard]] size_t size(my_vector_type const &v) {
    return static_cast<size_t>(v.size());
  }
} // namespace Eigen

#include "composyx.hpp" // needed for compilation !
#include <composyx/solver/ConjugateGradient.hpp>

int main() {
  my_matrix_type A(3, 3);
  A << 4, -1, 0,
      -1, 3, -0.5,
       0, -0.5, 4;
  my_vector_type u(3);
  u << 1, 2, 3;
  my_vector_type b = A * u;
  //
  composyx::ConjugateGradient<my_matrix_type, my_vector_type> solver;
  solver.setup(composyx::parameters::A{A},
               composyx::parameters::verbose{true},
               composyx::parameters::max_iter{20},
               composyx::parameters::tolerance{1e-8});
  my_vector_type x = solver * b;
  composyx::display(x, "x:\n", std::cout);
  return 0;
}
- Explanation
In this example, we are working within the Eigen framework, where the constraints on the matrix from the previous example have already been defined in the Eigen.hpp file. However, we still need to satisfy the solver's constraints on the Eigen vectors. To achieve this, we must define two functions, dot and size.
namespace Eigen {
  auto dot(my_vector_type const &v1, my_vector_type const &v2)
      -> my_vector_type::value_type {
    return v1.dot(v2);
  }
  [[nodiscard]] auto size(my_vector_type const &v) -> size_t {
    return static_cast<size_t>(v.size());
  }
} // namespace Eigen
So that argument-dependent lookup (ADL) can find them, these functions must be defined within the Eigen namespace.
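A minimal, self-contained illustration of the mechanism (a toy namespace, independent of the real Eigen library):

#include <cstddef>

namespace Toy {
  struct Vec { };
  // Found by ADL because Vec lives in namespace Toy
  std::size_t size(const Vec&) { return 3; }
}

template<typename V>
std::size_t solver_internals(const V& v){
  // Unqualified call: lookup inspects the namespace of V (here Toy)
  return size(v);
}

int main(){
  Toy::Vec v;
  return static_cast<int>(solver_internals(v)); // returns 3
}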