\( \newcommand{\blu}[1]{{\color{blue}#1}} \newcommand{\red}[1]{{\color{red}#1}} \newcommand{\grn}[1]{{\color{green!50!black}#1}} \newcommand{\local}{_i} \newcommand{\inv}{^{-1}} % Index for interface and interior \newcommand{\G}{\Gamma} \newcommand{\Gi}{\Gamma_i} \newcommand{\I}{{\cal I}} \newcommand{\Ii}{\I_i} % Matrix A \newcommand{\A}{{\cal A}} \newcommand{\Ai}{\A\local} \newcommand{\Aj}{\A_j} \newcommand{\Aib}{\bar{\A}\local} \newcommand{\AII}{\A_{\I\I}} \newcommand{\AIG}{\A_{\I\G}} \newcommand{\AGI}{\A_{\G\I}} \newcommand{\AGG}{\A_{\G\G}} \newcommand{\AIiIi}{\A_{\Ii\Ii}} \newcommand{\AIiGi}{\A_{\Ii\Gi}} \newcommand{\AGiIi}{\A_{\Gi\Ii}} \newcommand{\AGiGi}{\A_{\Gi\Gi}} \newcommand{\AGiGiw} {{\ensuremath{\A_{\Gi\Gi}^w}}} \newcommand{\AIIi}{\AII\local} \newcommand{\AIGi}{\AIG\local} \newcommand{\AGIi}{\AGI\local} \newcommand{\AGGi}{\AGG\local} \newcommand{\Ab}{\bar{\A}} \newcommand{\Ah}{{\widehat{\A}}} \newcommand{\Aih}{{\Ah\local}} \newcommand{\At}{{\widetilde{\A}}} \newcommand{\Ait}{{\At\local}} \newcommand{\Ao}{\A_0} \newcommand{\Aot}{\At_0} \newcommand{\Aob}{\Ab_0} \newcommand{\AiNN}{\Ai^{(NN)}{}} \newcommand{\AitNN}{\Ait^{(NN)}{}} \newcommand{\AiAS}{\Ai^{(AS)}{}} \newcommand{\AitAS}{\Ait^{(AS)}{}} % Matrix S \renewcommand{\S}{{\cal S}} \newcommand{\Si}{\S\local} \newcommand{\Sb}{\bar{\S}} \newcommand{\Sib}{\Sb\local} \newcommand{\Sh}{{\widehat{\S}}} \newcommand{\Sih}{{\Sh\local}} \newcommand{\St}{{\widetilde{\S}}} \newcommand{\Sit}{{\St\local}} \newcommand{\So}{\S_0} \newcommand{\Soi}{\S_0^i} \newcommand{\Sot}{\St_0} \newcommand{\Sob}{\Sb_0} \newcommand{\SiNN}{\Si^{(NN)}{}} \newcommand{\SitNN}{\Sit^{(NN)}{}} \newcommand{\SiAS}{\Si^{(AS)}{}} \newcommand{\SitAS}{\Sit^{(AS)}{}} % Matrix K \newcommand{\K}{{\cal K}} \newcommand{\Ki}{\K\local} \newcommand{\Kb}{\bar{\K}} \newcommand{\Kib}{\Kb\local} \newcommand{\Kh}{{\widehat{\K}}} \newcommand{\Kih}{{\Kh\local}} \newcommand{\Kt}{{\widetilde{\K}}} \newcommand{\Kit}{{\Kt\local}} \newcommand{\Ko}{\K_0} \newcommand{\Kot}{\Kt_0} \newcommand{\Kob}{\Kb_0} \newcommand{\KiNN}{\Ki^{(NN)}{}} \newcommand{\KitNN}{\Kit^{(NN)}{}} \newcommand{\KiAS}{\Ki^{(AS)}{}} \newcommand{\KitAS}{\Kit^{(AS)}{}} \newcommand{\KII}{\K_{\I\I}} \newcommand{\KIG}{\K_{\I\G}} \newcommand{\KGI}{\K_{\G\I}} \newcommand{\KGG}{\K_{\G\G}} \newcommand{\KIiIi}{\K_{\Ii\Ii}} \newcommand{\KIiGi}{\K_{\Ii\Gi}} \newcommand{\KGiIi}{\K_{\Gi\Ii}} \newcommand{\KGiGi}{\K_{\Gi\Gi}} \newcommand{\KIIi}{\KII\local} \newcommand{\KIGi}{\KIG\local} \newcommand{\KGIi}{\KGI\local} \newcommand{\KGGi}{\KGG\local} \newcommand{\KGiGiw} {{\ensuremath{\K_{\Gi\Gi}^w}}} % Matrix B \newcommand{\B}{{\cal B}} \newcommand{\Bi}{\B\local} \newcommand{\Bib}{\widehat{\B}\local} \newcommand{\Bob}{\widehat{\B}_0} % Matrix C \newcommand{\C}{{\cal C}} % Matrix T \newcommand{\T}{{\cal T}} \newcommand{\Ti}{{\T\local}} % Vectors \newcommand{\uI}{u_\I} \newcommand{\uG}{u_\G} \newcommand{\xI}{x_\I} \newcommand{\xG}{x_\G} \newcommand{\bI}{b_\I} \newcommand{\bG}{b_\G} \newcommand{\fI}{f_\I} \newcommand{\fG}{f_\G} \newcommand{\ftG}{\widetilde f_\G} \newcommand{\ftGi}{\widetilde f_\Gi^{(i)}} \newcommand{\ftGj}{\widetilde f_\Gj^{(j)}} \)

Composyx
Tutorial

Table of Contents

Back to index.

1. A minimal example

1.1. Requirements

For this tutorial we shall use Composyx (previously Maphys++ and Compose) as a header-only C++ library, so requirement tools are limited. You only need git, cmake and a C++ compiler (g++ or clang++ for instance) supporting C++20 standard.

Later on, when performance becomes important, optimized libraries such as BLAS, LAPACK, direct sparse solvers… must be used. Interfaces for those are already present in Composyx and you will be able to take advantage of them will little effort. For small prototypes such as this tutorial, or when performance is not important, the header-only version is sufficient.

Now let's get the Composyx C++ headers. The following command will clone the source code (with git) and set the variable COMPOSYX_INCLUDES to the path of those headers.

git clone --recursive https://gitlab.inria.fr/composyx/composyx.git
COMPOSYX_INCLUDES=`realpath composyx/include`
TLAPACK_INCLUDES=`realpath composyx/tlapack/include`

The --recursive parameter is important: we do need the submodules. If you forgot it, you can use the following commands to get and update them:

# In the repository of compose
git submodule init
git submodule update

1.2. A simple example

Let's start with the following C++ example in the file minimal.cpp:

#include <composyx.hpp>

int main(){

  using namespace composyx;

  DenseMatrix<double> A(3, 3);

  A(0, 0) = 4;
  A(1, 0) = -1;

  A(0, 1) = A(1, 0);
  A(1, 1) = 3;
  A(2, 1) = -0.5;

  A(1, 2) = A(2, 1);
  A(2, 2) = A(0, 0);

  A.display("A");

  return 0;
}
A
m: 3 n: 3 
Symmetry: general; Storage: full

 4.000e+00 -1.000e+00  0.000e+00 
-1.000e+00  3.000e+00 -5.000e-01 
 0.000e+00 -5.000e-01  4.000e+00 


1.3. Compiling and running

Composyx is written in C++20. In this tutorial we use the GNU compiler g++. The beginning of the command line will be:

g++ -std=c++20

Since we are using header-only C++ libraries, all we need to do is to include the correct headers to compile. We need to indicate where the headers of Composyx and <T>LAPACK are:

-I${COMPOSYX_INCLUDES} -I${TLAPACK_INCLUDES}

We an extra flag to indicate to Composyx that we want to use the header-only implementation, renouncing to the dependencies that require compilation and linking. In particular, we can't use MPI or a compiled BLAS implementation with this flag.

-DCOMPOSYX_HEADER_ONLY

Remark: if it's more convenient for you, it is also possible to add the definition in the code before including composyx.hpp rather that using the flag in the command line:

#include <iostream>
#include <vector>
#define COMPOSYX_HEADER_ONLY
#include <composyx.hpp>

//...

So the command line to compile our example minimal.cpp is:

g++ -std=c++20 -I${COMPOSYX_INCLUDES} -I ${TLAPACK_INCLUDES} -DCOMPOSYX_HEADER_ONLY minimal.cpp -o minimal

And we can run the code with ./minimal. We obtain the output described in the previous section.

1.4. Explanation of the program

The only feature used here is the class DenseMatrix implemented in Composyx. It corresponds to a matrix where all coefficient are stored in a 2D-array. In this example we created a 3 by 3 matrix \(A\) with the constructor DenseMatrix<double> A(3, 3), filled with zeros. The scalar type of the coefficients of the matrix is given as a template parameter, here we use double precision floating-point arithmetic for real numbers, which corresponds to the type double. Then, we can access and assign the coefficients \(a_{i, j}\) of the matrix with A(i, j). Here we filled \(A\) with some values. Finally, we displayed the matrix on the standard output using the display method of the class. Optionally we can pass the name of the matrix we display as a parameter of the method, so that we can easily identify it on the output (here we passed "A").

On the output, we can see the name of the matrix "A", followed by its description. \(m\) is the number of rows, \(n\) the number of columns. We can see that no assumption is made on the properties of the matrix: although the matrix is clearly symmetric, the program doesn't have this piece of information unless specified by the user. For this example it is not necessary make this effort.

1.5. Another example with a vector

Let's write another simple example to perform a matrix-vector product.

#include <composyx.hpp>

int main(){

  using namespace composyx;

  DenseMatrix<double> A({ 4,   -1,    0,
                         -1,    3, -0.5,
                          0, -0.5,    4}, 3, 3);

  Vector<double> u{1, 2, 3};
  Vector<double> b = A * u;

  u.display("u");
  b.display("b");

  return 0;
}
u
m: 3 (vector)

1
2
3

b
m: 3 (vector)

2
3.5
11

First we introduced a new constructor for the DenseMatrix. The first parameter is a bracket list of the matrix coefficients, then its number of rows \(m\) and number of columns \(n\). The coefficients are given in column-major format, meaning the \(m\) first values describe the first column, the \(m\) next the second column, etc. Here the matrix is symmetric so it does not make a difference, but you can try to modify a non diagonal coefficient and display the matrix to see the effect.

Secondly, we declared \(u\), a Vector of double. Actually Vector is special case of DenseMatrix where the number of columns is fixed to 1. The constructor is therefore very similar, you only need to give the list of the coefficients of the vectors, and the size of the vector is deduced from the size of the list.

Then we perform a matrix vector product of the matrix \(A\) and \(u\) and store the result in a new Vector \(b\). We display \(u\) and \(b\) in the same way we displayed the matrix \(A\) before, using the display method.

1.6. Solving a linear system

1.6.1. With a Jacobi iterative solver

Now let's try to solve a linear system. The matrix \(A\) is diagonally dominant matrix, so we can try so solve it with a Jacobi iterative solver. On the following example, we use this iterative method to find \(x\) such that \(Ax = b\). We can display \(x\) and make sure it has the same values as \(u = (1, 2, 3)^T\).

#include <composyx.hpp>
#include <composyx/solver/Jacobi.hpp>

int main(){

  using namespace composyx;

  DenseMatrix<double> A({ 4,  -1,    0,
                          -1,   3, -0.5,
                          0, -0.5,   4}, 3, 3);

  Vector<double> u{1, 2, 3};
  Vector<double> b = A * u;

  Jacobi<DenseMatrix<double>, Vector<double>> solver;
  solver.setup(parameters::A{A},
               parameters::verbose{true},
               parameters::max_iter{20},
               parameters::tolerance{1e-8});

  Vector<double> x = solver * b;

  x.display("x");

  return 0;
}
Starting Jacobi. Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08
with ||b|| = 11.7154
---
0 -  ||b-Ax||/||b||	1
1 -	||b-Ax||/||b||	0.194964
2 -	||b-Ax||/||b||	0.067276
3 -	||b-Ax||/||b||	0.0203088
4 -	||b-Ax||/||b||	0.00700792
5 -	||b-Ax||/||b||	0.0021155
6 -	||b-Ax||/||b||	0.000729992
7 -	||b-Ax||/||b||	0.000220364
8 -	||b-Ax||/||b||	7.60408e-05
9 -	||b-Ax||/||b||	2.29546e-05
10 -	||b-Ax||/||b||	7.92092e-06
11 -	||b-Ax||/||b||	2.39111e-06
12 -	||b-Ax||/||b||	8.25095e-07
13 -	||b-Ax||/||b||	2.49073e-07
14 -	||b-Ax||/||b||	8.59474e-08
15 -	||b-Ax||/||b||	2.59452e-08
16 -	||b-Ax||/||b||	8.95286e-09
x
m: 3 (vector)

1
2
3

We include the header with the conjugate gradient in composyx/solver/Jacobi.hpp to have access to the solver. We need to indicate the type of matrix and vector used to the solver in template parameters: DenseMatrix<double> for the matrix, and Vector<double> for the vector.

Many parameters can be given to the solver, under the namespace parameters. Here we are giving the most useful ones.

  • parameters::A, which is mandatory, is the input square matrix, or the operator of the system;
  • parameters::tolerance is the targeted precision of the solution, more specifically, the iterative solver considers the convergence achieved when the backward error \(||b - Ax||_2 / ||b||_2\) is lower than this value;
  • parameters::max_iter is the maximum number of iterations. When reached, the solver stops even if it has not converged to the required tolerance;
  • parameters::verbose is a boolean value, false by default, printing information on the standard output when true.

    Those parameters can be passed to the setup method in any order and have default values if not given (although they may not be relevant values in your context).

    Once setup, solving the linear system consists in a simple multiplication: solver * rhs, returning the solution vector. One may consider the solver instance as the \(A^{-1}\) operator, even if no explicit inversion of a matrix is ever performed.

1.6.2. With a conjugate gradient

The matrix \(A\) is actually not only diagonally dominant, it's also symmetric positive definite. Therefore we can use a conjugate gradient to solve the system. Since iterative solvers share the same set of basic parameters, we can take the same code and just replace Jacobi by ConjugateGradient to solve the system.

#include <composyx.hpp>
#include <composyx/solver/ConjugateGradient.hpp>

int main(){

  using namespace composyx;

  DenseMatrix<double> A({ 4,  -1,    0,
                          -1,   3, -0.5,
                          0, -0.5,   4}, 3, 3);

  Vector<double> u{1, 2, 3};
  Vector<double> b = A * u;

  ConjugateGradient<DenseMatrix<double>, Vector<double>> solver;
  solver.setup(parameters::A{A},
               parameters::verbose{true},
               parameters::max_iter{20},
               parameters::tolerance{1e-8});

  Vector<double> x = solver * b;

  x.display("x");

  return 0;
}
Starting Conjugate Gradient. Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08
with ||b|| = 11.7154
---
0 -  ||r||/||b||  1
1 -	||r||/||b||  0.248805
2 -	||r||/||b||  0.048057
3 -	||r||/||b||  2.6488e-18
  ||b-Ax||/||b||  0
x
m: 3 (vector)

1
2
3

As expected, we only need 3 (the size of the linear system) iterations to converge with the conjugate gradient.

2. Sparse matrices

We propose 3 types of sparse matrices in Composyx:

  • SparseMatrixCOO: a matrix in coordinate format (also known as IJV format);
  • SparseMatrixCSC: a matrix in Compressed Sparse Column format;
  • SparseMatrixLIL: a matrix in list of lists format, a special format for coefficient insertion.

2.1. Create a sparse matrix with SparseMatrixLIL

The easiest way to create a sparse matrix is to insert the coefficients of the matrix one by one. The only format that allows this kind of insertion is SparseMatrixLIL, which is implemented with an array of \(n\) (the number of columns) lists. Thus insertion can be performed in linear time.

#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixLIL.hpp>

int main(){
  using namespace composyx;

  SparseMatrixLIL<double> lilmat(3, 3);

  lilmat.insert(0, 0, 4);

  lilmat.insert(1, 0, -1);
  lilmat.insert(0, 1, -1);

  lilmat.insert(1, 1, 3);

  lilmat.insert(1, 2, -0.5);
  lilmat.insert(2, 1, -0.5);

  lilmat.insert(2, 2, 4);

  lilmat.display("lilmat");

  lilmat.to_dense().display("lilmat converted to dense");

  return 0;
}
lilmat
m: 3 n: 3 nnz: 7
Symmetry: general; Storage: full

i	v
--- Col 0
0	4
1	-1
--- Col 1
0	-1
1	3
2	-0.5
--- Col 2
1	-0.5
2	4

lilmat converted to dense
m: 3 n: 3 
Symmetry: general; Storage: full

 4.000e+00 -1.000e+00  0.000e+00 
-1.000e+00  3.000e+00 -5.000e-01 
 0.000e+00 -5.000e-01  4.000e+00 


We included the header for the lil matrix: composyx/loc_data/SparseMatrixLIL.hpp. First we construct the sparse matrix its dimensions \((m, n)\), and then we add the coefficients \((i, j, v)\) (index of row, index of column and value of the coefficient respectively) one by one. The matrix type is very well suited when you want to construct a matrix but you don't have information about the final numbers of nonzero elements it contains.

2.2. Using half storage for symmetric matrices

As we fill this symmetric sparse matrix, we can see an obvious optimization: half storage. Indeed, if we could indicate that the matrix is symmetric, we would not need to add twice the same value \(v\) in \((i, j)\) and \((j, i)\).

#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixLIL.hpp>

int main(){
  using namespace composyx;

  SparseMatrixLIL<double> lilmat(3, 3);

  lilmat.insert(0, 0, 4);
  lilmat.insert(1, 0, -1);
  lilmat.insert(1, 1, 3);
  lilmat.insert(2, 1, -0.5);
  lilmat.insert(2, 2, 4);

  lilmat.set_property(MatrixSymmetry::symmetric, MatrixStorage::lower);

  lilmat.display("lilmat");

  lilmat.to_dense().display("lilmat converted to dense");

  return 0;
}
lilmat
m: 3 n: 3 nnz: 5
Symmetry: symmetric; Storage: lower

i	v
--- Col 0
0	4
1	-1
--- Col 1
1	3
2	-0.5
--- Col 2
2	4

lilmat converted to dense
m: 3 n: 3 
Symmetry: symmetric; Storage: lower

 4.000e+00          -          - 
-1.000e+00  3.000e+00          - 
 0.000e+00 -5.000e-01  4.000e+00 


Here we filled only the lower triangular part of the matrix. We can indicate with set_property(MatrixSymmetry::symmetric) that the matrix is symmetric, and then with lilmat.set_property(MatrixStorage::lower) that only the lower part of the matrix is stored. These methods can be called on any type of matrix in Composyx, and any kind of matrix property. As shown on the example, it's possible to set several properties at once when calling set_property.

Notice that for the DenseMatrix constructed at the end, the conversion propagates the properties, and only the lower triangular part of the matrix is displayed. The other coefficients, whatever their values, won't be used in any operation performed with the matrix.

Remark: some shortcuts exist to fill the properties of a matrix. Here in particular, we have an symmetric positive definite matrix, so we could (should?) have done lilmat.set_spd(MatrixStorage::lower) to set all properties at the same time.

For more information on properties, see MatrixProperties.

2.3. Dealing with duplicates

What happens when we add twice a value in the same locate \((i, j)\) ? Well it's entirely up to you. The default behavior is to sum the two values given, but a lambda function can be passed to override this. The function takes two parameters: the first one is a reference to the value stored in the matrix, and the second is a const reference to the value you are inserting.

Let's see on a example:

#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixLIL.hpp>

int main(){
  using namespace composyx;

  SparseMatrixLIL<double> lilmat(2, 2);

  std::cout << "Default behavior: add coefficients\n";
  std::cout << "Insert (0, 0) -> 1\n";
  lilmat.insert(0, 0, 1);
  std::cout << "Insert (0, 0) -> 0.5\n";
  lilmat.insert(0, 0, 0.5);
  lilmat.to_dense().display("lilmat converted to dense");

  auto replace_coeff = [](double& present_v, const double& insert_v){ present_v = insert_v; };

  std::cout << "\nNew behavior: replace coefficients\n";
  std::cout << "Insert (0, 1) -> 1\n";
  lilmat.insert(0, 1, 1, replace_coeff);
  std::cout << "Insert (0, 1) -> 0.5\n";
  lilmat.insert(0, 1, 0.5, replace_coeff);
  lilmat.to_dense().display("lilmat updated 1");

  auto mean_coeff = [](double& present_v, const double& insert_v){ present_v = (present_v + insert_v) / 2; };

  std::cout << "\nNew behavior: compute the mean of the coefficients\n";
  std::cout << "Insert (1, 1) -> 1\n";
  lilmat.insert(1, 1, 1, mean_coeff);
  std::cout << "Insert (1, 1) -> 0.5\n";
  lilmat.insert(1, 1, 0.5, mean_coeff);
  lilmat.to_dense().display("lilmat updated 2");

  return 0;
}
Default behavior: add coefficients
Insert (0, 0) -> 1
Insert (0, 0) -> 0.5
lilmat converted to dense
m: 2 n: 2 
Symmetry: general; Storage: full

 1.500e+00  0.000e+00 
 0.000e+00  0.000e+00 



New behavior: replace coefficients
Insert (0, 1) -> 1
Insert (0, 1) -> 0.5
lilmat updated 1
m: 2 n: 2 
Symmetry: general; Storage: full

 1.500e+00  5.000e-01 
 0.000e+00  0.000e+00 



New behavior: compute the mean of the coefficients
Insert (1, 1) -> 1
Insert (1, 1) -> 0.5
lilmat updated 2
m: 2 n: 2 
Symmetry: general; Storage: full

 1.500e+00  5.000e-01 
 0.000e+00  7.500e-01 


Remark: after the construction of your matrix, you may have zeros or almost zeros values stored in your matrix. You can remove them with the .drop() method. Optionally, this method takes a real value tolerance, and all values lower than this tolerance in absolute value will be removed. For example, you can write: lilmat.drop(1e-12).

2.4. Conversion to other matrix types

The lil format cannot be used to do any other operation than inserting at present. Once you are satisfied with the lil matrix you have built, you can convert it to the format you want to use: DenseMatrix, SparseMatrixCOO or SparseMatrixCSC.

#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixLIL.hpp>

int main(){
  using namespace composyx;

  SparseMatrixLIL<double> lilmat(2, 2);

  lilmat.insert(0, 0, 4);
  lilmat.insert(1, 0, -1);
  lilmat.insert(1, 1, 3);
  lilmat.display("lilmat");

  const SparseMatrixCOO<double> coomat(lilmat.to_coo());
  coomat.display("coomat");

  const SparseMatrixCSC<double> cscmat(lilmat.to_csc());
  cscmat.display("cscmat");

  const DenseMatrix<double> densemat(lilmat.to_dense());
  densemat.display("densemat");

  return 0;
}
lilmat
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full

i	v
--- Col 0
0	4
1	-1
--- Col 1
1	3

coomat
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full

i	j	v

0	0	4
1	0	-1
1	1	3

cscmat
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full

i	v
--- Col 0
0	4
1	-1
--- Col 1
1	3

densemat
m: 2 n: 2 
Symmetry: general; Storage: full

 4.000e+00  0.000e+00 
-1.000e+00  3.000e+00 


Remark: notice that the header file SparseMatrixLIL includes the headers of the other types of matrices (DenseMatrix, SparseMatrixCOO and SparseMatrixCSC).

2.5. Other sparse

While constructing a SparseMatrixLIL adding the coefficients one by one can be convenient, it is not optimal if you already know the number of non-zero values before hand. In that case it is simpler to just pass the \(i\), \(j\) and \(v\) arrays to build the coo, csc or csr matrix directly.

Several constructors are available to construct the matrix :

\begin{pmatrix} 0.5 & 1.5 \\ 0 & 1.0 \end{pmatrix}
#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixCOO.hpp>

int main(){
  using namespace composyx;

  const int M = 2; // Number of rows
  const int N = 2; // Number of columns
  const int NNZ = 3; // Number of non-zero values

  // Pass 3 initalizer lists i, j and v, then dimensions M and N
  SparseMatrixCOO<double> coomat_1({0, 0, 1}, {0, 1, 1}, {0.5, 1.5, 2}, M, N);
  coomat_1.display("coomat build with initializer lists");

  // Pass dimensions M, N then NNZ and 3 pointers to the arrays for i, j and v
  // The arrays are not destroyed after the operation: values are copied
  int v_i[] = {0, 0, 1};
  int v_j[] = {0, 1, 1};
  double v_v[] = {0.5, 1.5, 2};
  SparseMatrixCOO<double> coomat_2(M, N, NNZ, v_i, v_j, v_v);

  coomat_2.display("coomat build from pointers");

  // Use composyx std::vector class (an extended std::vector) We move the values
  // into the SparseMatrixCOO class, the std::vector are empty after the
  // operation but no copy is performed
  std::vector<int> ia_i{0, 0, 1};
  std::vector<int> ia_j{0, 1, 1};
  std::vector<double> ia_v{0.5, 1.5, 2};
  SparseMatrixCOO<double> coomat_3(M, N, NNZ, std::move(ia_i), std::move(ia_j), std::move(ia_v));

  coomat_3.display("coomat build with std::vector");

  return 0;
}
coomat build with initializer lists
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full

i	j	v

0	0	0.5
0	1	1.5
1	1	2

coomat build from pointers
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full

i	j	v

0	0	0.5
0	1	1.5
1	1	2

coomat build with std::vector
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full

i	j	v

0	0	0.5
0	1	1.5
1	1	2

The same example can be written with SparseMatrixCSC instead of SparseMatrixCOO. We do not describe here how the csc format works, but in our case the arrays for \(i\) and \(v\) are the same as the ones for the coordinate format, only the \(j\) array (of size \(N + 1\)) is different.

#include <iostream>
#include <vector>
#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixCSC.hpp>

int main(){
  using namespace composyx;

  const int M = 2; // Number of rows
  const int N = 2; // Number of columns
  const int NNZ = 3; // Number of non-zero values

  std::vector<int> ia_i{0, 0, 1};
  std::vector<int> ia_j{0, 1, 3};
  std::vector<double> ia_v{0.5, 1.5, 2};
  SparseMatrixCSC<double> cscmat(M, N, NNZ, std::move(ia_i), std::move(ia_j), std::move(ia_v));

  cscmat.display("csc build with std::vector");

  return 0;
}
csc build with std::vector
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full

i	v
--- Col 0
0	0.5
--- Col 1
0	1.5
1	2

Same thing with SparseMatrxCSR:

#include <iostream>
#include <vector>
#include <composyx.hpp>
#include <composyx/loc_data/SparseMatrixCSC.hpp>

int main(){
  using namespace composyx;

  const int M = 2; // Number of rows
  const int N = 2; // Number of columns
  const int NNZ = 3; // Number of non-zero values

  std::vector<int> ia_i{0, 2, 3};
  std::vector<int> ia_j{0, 1, 1};
  std::vector<double> ia_v{0.5, 1.5, 2};
  SparseMatrixCSR<double> csrmat(M, N, NNZ, std::move(ia_i), std::move(ia_j), std::move(ia_v));

  csrmat.display("csr build with std::vector");

  return 0;
}
csr build with std::vector
m: 2 n: 2 nnz: 3
Symmetry: general; Storage: full

j
v
--- Row 0
	    0          1 
 5.000e-01  1.500e+00 
--- Row 1
	    1 
 2.000e+00 

Remark: these format also support conversion, as for the lil matrices, with the methods .to_coo(), .to_csc(), to_csr() and to_dense().

2.6. Solving the system

Now let's take a look at the system we wanted to solve: \(A\) is a symmetric definite positive matrix with 7 non-zero values (5 if we use half storage). Let's build this matrix as a coo sparse matrix in lower storage and use the conjugate gradient to solve the system:

#include <composyx.hpp>
#include <composyx/solver/ConjugateGradient.hpp>
#include <composyx/loc_data/SparseMatrixCOO.hpp>

int main(){

  using namespace composyx;

  SparseMatrixCOO<double> A({0, 1, 1, 2, 2},
                            {0, 0, 1, 1, 2},
                            {4, -1, 3, -0.5, 4},
                            3, 3);
  A.set_spd(MatrixStorage::lower);

  Vector<double> u{1, 2, 3};
  Vector<double> b = A * u;

  ConjugateGradient<SparseMatrixCOO<double>, Vector<double>> solver;
  solver.setup(parameters::A{A},
               parameters::verbose{true},
               parameters::max_iter{20},
               parameters::tolerance{1e-8});

  Vector<double> x = solver * b;

  x.display("x");

  return 0;
}
Starting Conjugate Gradient. Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08
with ||b|| = 11.7154
---
0 -  ||r||/||b||  1
1 -	||r||/||b||  0.248805
2 -	||r||/||b||  0.048057
3 -	||r||/||b||  2.6488e-18
  ||b-Ax||/||b||  0
x
m: 3 (vector)

1
2
3

If we look at the modification compared to the dense version, we only build a SparseMatrixCOO instead of a DenseMatrix, and we indicated to the ConjugateGradient solver that our operator \(A\) is given by the matrix format SparseMatrixCOO. The rest of the code did not change.

One could make a very generic code using the C++ tools: templates, decltype, auto

#include <composyx.hpp>
#include <composyx/solver/ConjugateGradient.hpp>
#include <composyx/loc_data/SparseMatrixCOO.hpp>

using namespace composyx;

template<typename Mat, typename Vect>
void solve_with_cg(const Mat& A, const Vect& b){
  ConjugateGradient<Mat, Vect> solver;
  solver.setup(parameters::A{A},
               parameters::verbose{true},
               parameters::max_iter{20},
               parameters::tolerance{1e-8});

  auto x = solver * b;
  x.display("x");
}

int main(){
  SparseMatrixCOO<double> A_coo({0, 1, 1, 2, 2},
                                {0, 0, 1, 1, 2},
                                {4, -1, 3, -0.5, 4},
                                3, 3);
  A_coo.set_spd(MatrixStorage::lower);

  const Vector<double> u{1, 2, 3};
  const Vector<double> b = A_coo * u;

  std::cout << "Solve CG with COO matrix\n";
  solve_with_cg(A_coo, b);

  const DenseMatrix<double> A_dense = A_coo.to_dense();
  std::cout << "Solve CG with dense matrix\n";
  solve_with_cg(A_dense, b);

  return 0;
}
Solve CG with COO matrix
Starting Conjugate Gradient. Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08
with ||b|| = 11.7154
---
0 -  ||r||/||b||  1
1 -	||r||/||b||  0.248805
2 -	||r||/||b||  0.048057
3 -	||r||/||b||  2.6488e-18
  ||b-Ax||/||b||  0
x
m: 3 (vector)

1
2
3

Solve CG with dense matrix
Starting Conjugate Gradient. Max iterations: 20
Stopping criterion ||r||/||b|| < 1e-08
with ||b|| = 11.7154
---
0 -  ||r||/||b||  1
1 -	||r||/||b||  0.248805
2 -	||r||/||b||  0.048057
3 -	||r||/||b||  2.6488e-18
  ||b-Ax||/||b||  0
x
m: 3 (vector)

1
2
3

3. A domain decomposition example

Introduction

In this tutorial we shall compile and run a small example of Composyx on the supercomputer plafrim or on a simple unix laptop.

3.1. Environment setup

3.1.1. guix setup

The simplest way to install Composyx is to use guix. This piece of software is available on plafrim and can be easily installed on a laptop. The Composyx package is distributed in the guix-hpc repository. You may also be interested by the guix-hpc-non-free package which gives access to additional non open-source packages such as the intel MKL.

To check that you installation is correct, the following command should give you an output:

guix search composyx

and you should be able to find the Composyx package. If it is not the case, refer to the documentations on repositories linked above.

You can then check that the Composyx package compiles:

guix build composyx

3.1.2. guix environment

To obtain a clean environment with Composyx, we launch a bash in a pure guix shell:

guix shell --pure composyx

In this environment, only Composyx is available. Furthermore, environment variables are set so that every library can be easily found by a compilation tool such as CMake (you can check the values of C_INCLUDE_PATH, LIBRARY_PATH, PKG_CONFIG_PATH…)

In particular, you will find the c++ headers of Composyx:

#NB: add coreutils to use "ls"
guix shell --pure composyx coreutils
ls $GUIX_ENVIRONMENT/include/composyx

To get an environment containing the dependencies of Composyx (but not Composyx itself), you can use the flag -D to obtain the development environment of the package.

guix shell --pure -D composyx

Now if you want Composyx and all its dependencies you can combine both commands with:

guix shell --pure -D composyx composyx
  1. Add other packages in the environment

    You can add other packages available in guix in your environment, for instance you may need your favorite text editor.

    guix shell --pure -D composyx composyx emacs
    
  2. Using MKL

    If you have guix-hpc-non-free setup and you want to use intel MKL, you can try to use the package recipe mapyhys++-mkl.

    guix shell --pure -D composyx-mkl composyx-mkl
    

    Under the hood, the recipe will replace openblas with mkl.

3.2. Compile an example

3.2.1. Example description

  1. Domain decomposition

    In the example presented, the adjacency graph of the global matrix consists in 7 vertices which are assigned to the following subdomains:

    Node 0 1 2 3 4 5 6
    Subdomain(s) 0, 2 0 0, 1 0, 1, 2 1 1, 2 2

    The MPI ranks are numbered from 0 to \(N\) - 1 consistently with the default MPI numbering. However, subdomains are numbered from 1 to \(N\) to stay consistent with previous chapters where index 0 is reserved for the coarse space. In the code, the subdomain \(\Omega_i\) is therefore identified its unique ID \(i\). A local numbering of the unknowns is used in each subdomain: in the example, the local indices [0, 1, 2, 3] in \(\Omega_0\) correspond to the same global indices [0, 1, 2, 3]. The global indices corresponding to local indices [0, 1, 2, 3] of \(\Omega_1\) and \(\Omega_2\) are [3, 2, 4, 5] and [5, 6, 0, 3], respectively. Each subdomain has to know which unknowns it shares with which other subdomains (dotted lines in the figure (c)). In our example, for instance, in local ordering:

    • \(\Omega_0\) shares [2, 3] with \(\Omega_1\) and [0, 3] with \(\Omega_2\).
    • \(\Omega_1\) shares [1, 0] with \(\Omega_0\) and [0, 3] with \(\Omega_2\).
    • \(\Omega_2\) shares [2, 3] with \(\Omega_0\) and [3, 0] with \(\Omega_1\).

      example_subdomains.svg

      The global domain in (a) is partitioned in three subdomains \(\Omega_0\), \(\Omega_1\) and \(\Omega_2\) in (b). Each subdomain can use a local ordering of its unknowns as in (c); duplicate unknowns in different subdomains are linked by a dotted line.}

      example_sd_intrf.svg

      The interior unknowns can be eliminated to keep only interface unknown that are shared by two subdomains or more (a). A global interface numbering is introduced (b). The global interface can be expressed as the union of the three local interfaces \(\G_0\), \(\G_1\) and \(\G_2\) in (c).

  2. Input matrix

    The system we are solving is \(K . u = f\), where \(K\) is a partitioned sparse SPD matrix, \(u\) the unknown vector and \(f\) the right hand side.

    For the sake of simplicity, we put the same local matrix on each subdomain. Notice that the sparse structure of the matrix corresponds to the domain decomposition describe above.

    Local matrices: \[\begin{pmatrix} 3 &-1 & 0 &-1 \\ -1 & 4 &-1 &-1 \\ 0 &-1 & 3 &-1 \\ -1 &-1 &-1 & 4 \end{pmatrix}\]

    Which results in this global matrix:

    Global matrix: \[\K = \begin{pmatrix} 6 & -1 & 0 & -2 & 0 & 0 & -1 \\ -1 & 4 & -1 & -1 & 0 & 0 & 0 \\ 0 & -1 & 7 & -2 & -1 & -1 & 0 \\ -2 & -1 & -2 & 11 & 0 & -2 & -1 \\ 0 & 0 & -1 & 0 & 3 & -1 & 0 \\ 0 & 0 & -1 & -2 & -1 & 7 & -1 \\ -1 & 0 & 0 & -1 & 0 & -1 & 4 \end{pmatrix}\]

    The solution vector we expect to find is simply: \[u = \begin{pmatrix} 0 \\ 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \end{pmatrix}\]

    Thus, we shall use \(f = \K u\) for the right hand side vector:

    Global vector: \[f = \begin{pmatrix} -13 \\ -1 \\ -2 \\ 12 \\ 5 \\ 17 \\ 16 \end{pmatrix}\]

    The vectors are always assembled. So the local right hand side vectors are respectively:

    \[\begin{pmatrix} -13 \\ -1 \\ -2 \\12 \end{pmatrix}, \begin{pmatrix} 12 \\ -2 \\ 5 \\ 17 \end{pmatrix} \text{, and } \begin{pmatrix} 17 \\ 16 \\ -13 \\ 12 \end{pmatrix}\]

3.2.2. Example code

#include <iostream>
#include <vector>
#include <mpi.h>
#include <composyx.hpp>
#include <composyx/part_data/PartMatrix.hpp>
#include <composyx/solver/Pastix.hpp>
#include <composyx/solver/ConjugateGradient.hpp>
#include <composyx/solver/PartSchurSolver.hpp>

int main(){

  using namespace composyx;
  using Scalar = double;

  MMPI::init();
  {
    const int rank = MMPI::rank();

    const int M = 4;
    const int NNZ = 9;
    const int N_DOFS = 4;
    const int N_SUBDOMAINS = 3;

    // Bind subdomains to MPI process
    std::shared_ptr<Process> p = bind_subdomains(N_SUBDOMAINS);

    // Define topology for each subdomain
    std::vector<Subdomain> sd;
    if(p->owns_subdomain(0)){
      std::map<int, std::vector<int>> nei_map_0 {{1, {2, 3}},
                                                {2, {0, 3}}};
      sd.emplace_back(0, N_DOFS, std::move(nei_map_0));
    }
    if(p->owns_subdomain(1)){
      std::map<int, std::vector<int>> nei_map_1 {{0, {1, 0}},
                                                {2, {0, 3}}};
      sd.emplace_back(1, N_DOFS, std::move(nei_map_1));
    }
    if(p->owns_subdomain(2)){
      std::map<int, std::vector<int>> nei_map_2 {{0, {2, 3}},
                                                {1, {3, 0}}};
      sd.emplace_back(2, N_DOFS, std::move(nei_map_2));
    }
    p->load_subdomains(sd);

    // Local matrices (here the same matrix on every subdomain)
    std::vector<int> a_i{0, 0, 0, 1, 1, 1, 2, 2, 3};
    std::vector<int> a_j{0, 1, 3, 1, 2, 3, 2, 3, 3};
    std::vector<Scalar> a_v{3, -1, -1, 4, -1, -1, 3, -1, 4};
    SparseMatrixCOO<Scalar, int> A_loc(M, M, NNZ, a_i.data(), a_j.data(), a_v.data());
    A_loc.set_spd(MatrixStorage::upper);

    // Create the partitioned matrix
    std::map<int, SparseMatrixCOO<Scalar, int>> A_map;
    for(int k = 0; k < 3; ++k) if(p->owns_subdomain(k)) A_map[k] = A_loc;
    const PartMatrix<SparseMatrixCOO<Scalar>> A(p, std::move(A_map));

    // Local rhs vectors
    std::map<int, Vector<Scalar>> b_map;
    if(p->owns_subdomain(0)) b_map[0] = Vector<Scalar>{-13, -1, -2, 12};
    if(p->owns_subdomain(1)) b_map[1] = Vector<Scalar>{12, -2, 5, 17};
    if(p->owns_subdomain(2)) b_map[2] = Vector<Scalar>{17, 16, -13, 12};

    // Create the partitioned vector
    const PartVector<Vector<Scalar>> b(p, std::move(b_map));

    // Solver definitions
    using Solver_K = Pastix<SparseMatrixCOO<Scalar>, Vector<Scalar>>;
    using Solver_S = ConjugateGradient<PartMatrix<DenseMatrix<Scalar>>, PartVector<Vector<Scalar>>>;
    using Solver = PartSchurSolver<PartMatrix<SparseMatrixCOO<Scalar>>, PartVector<Vector<Scalar>>, Solver_K, Solver_S>;

    // Setup
    Solver schur_solver;
    schur_solver.setup(A);

    // Set parameters for the solver S (ConjugateGradient)
    Solver_S& iter_solver = schur_solver.get_solver_S();
    iter_solver.setup(parameters::max_iter{50},
                      parameters::tolerance{1e-8},
                      parameters::verbose{(rank == 0)});

    // Solve
    auto X = schur_solver * b;

    X.display_centralized("X found");
    if(rank == 0) std::cout << "Niter: " << iter_solver.get_n_iter() << '\n';

  }
  MMPI::finalize();

  return 0;
}

First we need an example of Composyx usage. Let's take the example in src/examples/simple_part.cpp of the Composyx repository:

It is an example that solves a distributed sparse linear system on 3 subdomains, using Pastix to compute the Schur complement and a conjugate gradient.

3.2.3. Compilation

Now let's create a directory tutorial and copy this file simple_part.cpp inside:

mkdir tutorial && cd tutorial
#create or copy simple_part.cpp

In order to get Composyx and all its dependencies, we use the following CMakeLists.txt file:

cmake_minimum_required(VERSION 3.12)

project(COMPOSYX_EXAMPLE CXX C Fortran)

find_package(composyx REQUIRED)

# Need c++20
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# compile your example
add_executable(simple_part simple_part.cpp)

# link to composyx
target_link_libraries(simple_part PRIVATE COMPOSYX::composyx)

Now all we can compile our example using the standard CMake way:

mkdir build && cd build
cmake ..
make

3.2.4. Running the example

Let's run the example. It is written so that it can be run with any number of MPI processes. However, the optimal number of processes is one per subdomain, which is 3 in this examples:

mpirun -np 3 ./simple_part

The output should look like:

Starting Conjugate Gradient. Max iterations: 50
Stopping criterion ||r||/||b|| < 1e-08
with ||b|| = 29.116
---
0 -  ||r||/||b||  0.97707
1 -	||r||/||b||  0.441184
2 -	||r||/||b||  0.220872
3 -	||r||/||b||  0.0646697
4 -	||r||/||b||  2.80055e-17
  ||b-Ax||/||b||  1.49027e-16
X found
m: 7 (vector)

2.22045e-16
1
2
3
4
5
6

Niter: 4

4. Dense matrices

Introduction

In this tutorial we show an examples of dense matrices operations on a laptop and on the supercomputer plafrim, using either Blas/Lapack (e.g. Intel MKL, OpenBLAS) or Chameleon. The Composyx class used is DenseMatrix for which there are associated kernels (scal, axpy, norm2, gemm, symm, eig, svd, etc) and solvers (getrf/s, potrf/s, geqrf/s, gelqf/s, trsm). Note that operations on dense matrices are limited to one single machine node for now (no distributed MPI) but with Chameleon there is the possibility to exploit GPUs in addition to CPUs.

4.1. Source code

The source code is a C++ file demo_dense.cpp which relies on Composyx to

  1. perform a dense matrix multiplication \(C = A * B\),
  2. solve a linear system \(A * x = b\) with \(A\) symmetric positive definite (SPD) using Cholesky factorization \(A=LL^T\),
  3. solve an overdetermined system \(A * x = b\) with \(A\) rectangular using least squares and a factorization \(A=QR\).
  4. compute the Randomized SVD of a rank-R general matrix.
  5. compute the Randomized EVD of a rank-R symmetric matrix.

The Chameleon C library is also used to generate random matrices cf. CHAMELEON_dplrnt (general matrix) and CHAMELEON_dplgsy (SPD matrix). Chameleon must be initialized with Composyx i.e. composyx::initialize. The number of CPUs, GPUs can be set here i.e. composyx::initialize(nc, ng). The size of sub-blocks (tiles) to be used by chameleon can be set by calling composyx::setChameleonParameters(nb). If not set, those parameters will be chosen by default internally and will try to use all CPUs, no GPUs and tile size equal 320. Note that the composyx::initialize, when the executable is compiled with COMPOSYX_USE_CHAMELEON, will set Blas/Lapack number of threads to 1. Use setNumThreads(getMaxThreads()); before calls to Blas/Lapack kernels and setNumThreads(1); before calls to Chameleon kernels.

#include <composyx.hpp>

using namespace composyx;

int main(int argc, char *argv[]) {

  using Scalar = double;
  using Real = typename arithmetic_real<Scalar>::type;
  using Matrix = DenseMatrix<Scalar>;

  /* Initialize Chameleon context */
  initialize(4, 0);
  setChameleonParameters(640);

  /* Matrix product GEMM: C=AB */

  // Fill the matrix with random values
  const int M=10;
  Matrix A(M, M);
  Matrix B(M, M);
  Matrix C(M, M);

  const int seedA=10;
  CHAMELEON_dplrnt( M, M, get_ptr(A), M, seedA );
  const int seedB=20;
  CHAMELEON_dplrnt( M, M, get_ptr(B), M, seedB );

  // Compute the product (GEMM)
  chameleon_kernels::gemm(A, B, C);


  /* Linear system solve (Cholesky): Ax=b */

  // Fill the matrix A with random values so that it is square and SPD
  CHAMELEON_dplgsy( (Scalar)M, ChamLower, M, get_ptr(A), M, seedA );
  A.set_spd(MatrixStorage::lower);
  Vector<Scalar> b(M);
  CHAMELEON_dplrnt( M, 1, get_ptr(b), M, seedB );

  // Declare the Solver
  ChameleonSolver<Matrix, Vector<Scalar>> SCcham;
  // if A is given it is copied in the solver class
  // giving A as rvalue allows to avoid copy
  SCcham.setup(A);

  // Solve the linear system
  Vector<Scalar> x = SCcham * b;
  // SCcham.factorize(); can be called to perform factorization before solving

  // check
  Vector<Scalar> diff = A*x-b;
  Real normc = diff.norm();
  std::cout << "Cholesky - ||Ax-b|| = " << normc << "\n" << std::endl;


  /* Overdetermined system solve (QR): Ax=b */

  // clear previous A and realloc
  const int N=5;
  A = Matrix(M, N);

  // Fill the rectangular matrix A with random values
  CHAMELEON_dplrnt( M, N, get_ptr(A), M, seedA );

  // Declare the solver
  ChameleonSolver<Matrix, Vector<Scalar>> SLcham;
  SLcham.setup(A);

  // Solve the least squares problem
  x = SLcham * b;

  // check
  diff = A*x-b;
  Real normls = diff.norm();
  std::cout << "Least squares - Sum r_i² = " << normls*normls << "\n" << std::endl;


  /* Randomized SVD to compute SVD on rank-R (R<<N) matrices */

  // generate a general matrix of rank R
  const int R=2;
  Matrix Al = chameleon_kernels::random_matrix_normal<Matrix>(M, R);
  Matrix Ar = chameleon_kernels::random_matrix_normal<Matrix>(R, N);
  A = Al*Ar;

  // Compute the RSVD of A
  auto [U, sigmas, Vt] = chameleon_kernels::rsvd(A, R, 0, false);

  // check SVD
  Matrix Sigma(n_cols(U), n_rows(Vt));
  auto k = std::min(n_cols(U), n_rows(Vt));
  for(Size i = 0; i < k; ++i) Sigma(i, i) = Scalar{sigmas[i]};
  auto Asvd = U * Sigma * Vt;
  diff = A-Asvd;
  Real normsvd = diff.norm();
  std::cout << "RSVD - ||A-(U S V^T)|| = " << normsvd << "\n" << std::endl;

  /* Randomized EVD to compute EVD on rank-R (R<<N) symmetric/hermitian matrices */

  // generate a symmetric/hermitian matrix of rank R
  A = Al*Al.t();
  A.set_property(MatrixSymmetry::symmetric);

  // Compute the REVD of A
  auto [lambdas, V] = chameleon_kernels::revd(A, R, 0);

  // check EVD
  for(Size i = 0; i < n_cols(V); ++i){
    Vector<Scalar> AV = A * V.get_vect_view(i);
    Vector<Scalar> lamV;
    lamV = lambdas[i] * V.get_vect_view(i);
    Vector<Scalar> DiffA = AV - lamV;
    std::cout << "REVD - Norm " << i << " = " << DiffA.norm() << std::endl;
  }

  finalize();
  return 0;
}

Another version of this example, more complete, can be read in src/examples/demo_dense.cpp of the Composyx repository, with cpu time comparison between Blas/Lapack and Chameleon. See also src/examples/demo_dense_rsvd.cpp for the rsvd.

There are of course other routines available to perform dense matrices operations but there is not always a Chameleon implementation of Blas/Lapack function.

We summary the Blas/Lapack operations available in Composyx through the kernels calls in the following table. All routines can be called from blas_kernels:: or chameleon_kernels:: but if there is no Chameleon implementation it will actually call the blas/lapack kernel. The X symbol in the table means there is a Chameleon implementation which can accelerate this operation.

Operation Chameleon
\(\text{scal}\)  
\(\text{axpy}\)  
\(\text{dot}\)  
\(\text{norm2}\)  
\(\text{gemv}\)  
\(\text{gemm}\) X
\(\text{symm}\) X
\(\text{hemm}\) X
\(\text{trmm}\) X
\(\text{eig}\)  
\(\text{eigvals}\)  
\(\text{eigh}\)  
\(\text{eigvalsh}\)  
\(\text{eigh_select}\)  
\(\text{eigh_smallest}\)  
\(\text{eigh_largest}\)  
\(\text{gen_eigh_select}\)  
\(\text{gen_eigh_smallest}\)  
\(\text{gen_eigh_largest}\)  
\(\text{svd}\)  
\(\text{svd_left_sv}\)  
\(\text{svd_right_sv}\)  
\(\text{svdvals}\)  
\(\text{rsvd}\) X
\(\text{rsvd_left_sv}\) X
\(\text{rsvd_right_sv}\) X
\(\text{rsvdvals}\) X
\(\text{revd}\) X
\(\text{revdvals}\) X

We summary the solver routines available in Composyx through the Solver class (BlasSolver, ChameleonSolver) in the following table.

Operation Chameleon
LU (\(\text{getrf/s}\))  
Cholesky (\(\text{potrf/s}\)) X
Symmetric complex (\(\text{sytrf/s}\)) X
QR/LQ ($\text{geqrf/s}$/\(\text{gelqf/s}\)) X
Least Squares (\(\text{geqrf/s}\)) X
Minimum norm (\(\text{gelqf/s}\)) X
Triangular Solve (\(\text{trsm}\)) X

4.2. Execution on a Linux Ubuntu laptop

4.2.1. Install Composyx with CMake

Install system packages requirements

sudo apt-get update -y
sudo apt-get install -y git cmake build-essential gfortran python wget tar curl pkg-config libmkl-dev

Notice one can replace libmkl-dev by libopenblas-dev.

Install specific packages with CMake

# install blaspp, lapackpp and arpack-ng
git clone https://github.com/icl-utk-edu/blaspp.git
git clone https://github.com/icl-utk-edu/lapackpp.git
git clone https://github.com/opencollab/arpack-ng.git

cd blaspp
cmake -B build -DCMAKE_INSTALL_PREFIX=/usr/local
cmake --build build -j5
sudo cmake --install build
cd ../

cd lapackpp
cmake -B build -DCMAKE_INSTALL_PREFIX=/usr/local
cmake --build build -j5
sudo cmake --install build
cd ../

cd arpack-ng
cmake -B build -DICB=ON
cmake --build build -j5
sudo cmake --install build
cd ../

# Install Starpu and Chameleon
git clone https://gitlab.inria.fr/starpu/starpu.git
cd starpu
./configure --prefix=/usr/local --disable-build-doc --disable-build-test --disable-opencl --disable-fortran --disable-socl --disable-starpufft
make -j5
sudo make install
cd ../

git clone --recursive https://gitlab.inria.fr/solverstack/chameleon.git
cd chameleon
cmake -B build -DCHAMELEON_USE_MPI=OFF -DCMAKE_INSTALL_PREFIX=/usr/local -DBUILD_SHARED_LIBS=ON
# if starpu is installed in a specific path add the following to the configuration: -DCMAKE_PREFIX_PATH="path to starpu"
cmake --build build -j5
sudo cmake --install build
cd ../

Install Composyx

git clone --recursive https://gitlab.inria.fr/composyx/composyx.git
cd composyx
cmake -B build -DCMAKE_INSTALL_PREFIX=/usr/local -DCOMPOSYX_USE_CHAMELEON=ON -DCOMPOSYX_USE_MUMPS=OFF -DCOMPOSYX_USE_PASTIX=OFF
cmake --build build -j5
sudo cmake --install build

4.2.2. Compile and run the example

To link a C++ executable with Composyx from a file demo_dense.cpp the straightforward way is to use CMake, example of CMakeLists.txt file:

cmake_minimum_required(VERSION 3.5)
project(DEMO_DENSE CXX C Fortran)

find_package(composyx REQUIRED)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

add_executable(demo_dense demo_dense.cpp)
target_link_libraries(demo_dense PRIVATE COMPOSYX::composyx)
target_compile_definitions(demo_dense PRIVATE COMPOSYX_USE_CHAMELEON)
target_compile_definitions(demo_dense PRIVATE CHAMELEON_COMPLEX_CPP)

Then, in the same directory where demo_dense.cpp and CMakeLists.txt files are located use cmake command as follows:

cmake -B build
# if dependencies are installed in specific directories , add them to the configure: -DCMAKE_PREFIX_PATH="path to chameleon;path to starpu
cmake --build build
# Run the example
./build/demo_dense

4.3. Execution using GNU Guix

We consider Guix is up-to-date and that the guix-hpc-non-free is setup. We consider using the complete example here, located in src/examples/demo_dense.cpp. Lets compare CPU times given by Lapack (Intel MKL here) and Chameleon using four threads i.e. four CPUs.

Compile the example using Guix

git clone --recursive https://gitlab.inria.fr/composyx/composyx.git
cd composyx
guix shell --pure -D composyx-mkl -- /bin/bash --norc
cmake -B build-guix  -DCOMPOSYX_COMPILE_EXAMPLES=ON -DCOMPOSYX_COMPILE_TESTS=OFF -DCOMPOSYX_DRIVERS=OFF -DCOMPOSYX_C_DRIVER=OFF -DCOMPOSYX_Fortran_DRIVER=OFF -DCOMPOSYX_USE_CHAMELEON=ON -DCOMPOSYX_USE_MUMPS=OFF -DCOMPOSYX_USE_PASTIX=OFF
cmake --build build-guix --target composyx_demo_dense composyx_demo_dense_rsvd

Arguments which can be used with the executable:

  1. 0 for Blas/Lapack (blaspp/lapackpp+Intel MKL MT), 1 for Chameleon (+Intel MKL sequential)
  2. Size M (size of the GEMM and linear system, number of rows and columns)
  3. Size N (number of columns of the rectangular matrix during QR and least squares problem)
  4. Chameleon block/tile size
  5. Chameleon number of CPU workers
  6. Chameleon number of GPU workers

Execution with Blas/Lapack from the MKL

./build-guix/src/examples/composyx_demo_dense 0 10000 5000

Example results

Least squares system time, 1, 3.06074
Linear system (Cholesky) time, 1, 2.03154
Gemm time, 1, 10.5614

Notice that we can set MKL_NUM_THREADS to set the number of threads used by MKL.

Execution with Chameleon using a bloc size of 500 and 4 threads

./build-guix/src/examples/composyx_demo_dense 1 10000 5000 500 4 0

Example results

Least squares system time, 1, 3.36202
Linear system (Cholesky) time, 1, 2.17157
Gemm time, 1, 11.3575

For the composyx_demo_dense_rsvd executable the rank R of the matrix can be given as fourth argument.

Execution with Blas/Lapack from the MKL

./build-guix/src/examples/composyx_demo_dense_rsvd 0 16000 8000 995

Example results

Check time, 1, 3.13929
RSVD time, 1, 4.54151
Matrix building time, 1, 2.14337

Execution with Chameleon a bloc size of 500 and 4 threads

./build-guix/src/examples/composyx_demo_dense_rsvd 1 16000 8000 995 500 4 0

Example results

Check time, 1, 3.23624
RSVD time, 1, 4.7384
Matrix building time, 1, 1.8179

Chameleon delivers performances close to MKL ones, knowing that Intel MKL is a Blas/Lapack reference for using a single machine with Intel CPUs.

4.4. Execution on the Plafrim supercomputer using GPUs

Lets compare CPU times given by Lapack (Intel MKL here) and Chameleon on a plafrim sirocco (2x16 cores + 2x gpus p100) node.

Compile the example using Guix

ssh plafrim
git clone --recursive https://gitlab.inria.fr/composyx/composyx.git
cd composyx
guix shell --pure -D composyx-mkl --with-input=chameleon-mkl-mt-nompi=chameleon-cuda-mkl-mt-nompi --with-input=starpu=starpu-cuda --with-input=openmpi=openmpi-cuda slurm -- /bin/bash --norc
cmake -B build-guix -DCOMPOSYX_COMPILE_EXAMPLES=ON -DCOMPOSYX_COMPILE_TESTS=OFF -DCOMPOSYX_DRIVERS=OFF -DCOMPOSYX_C_DRIVER=OFF -DCOMPOSYX_Fortran_DRIVER=OFF -DCOMPOSYX_USE_CHAMELEON=ON -DCOMPOSYX_USE_MUMPS=OFF -DCOMPOSYX_USE_PASTIX=OFF
cmake --build build-guix --target composyx_demo_dense composyx_demo_dense_rsvd

Notice we replace the default chameleon package by a version built with CUDA to be able to use GPUs.

Reserve one p100 node and expose the location of the CUDA driver in the Guix environment (can be different depending on the supercomputer)

salloc --nodes=1 --time=01:00:00 --constraint p100 --exclusive --ntasks-per-node=1 --threads-per-core=1
export LD_PRELOAD="/usr/lib64/libcuda.so"
#export STARPU_DISABLE_PINNING=1
export STARPU_NWORKER_PER_CUDA=2

Execution with Blas/Lapack from the MKL

srun -n 1 -c 30 ./build-guix/src/examples/composyx_demo_dense 0 24800 12400 1240 30 0

Example results

Least squares system time, 1, 10.9655
Linear system (Cholesky) time, 1, 9.42283
Gemm time, 1, 40.1059

Execution with Chameleon (set ncpus parameter to -1 to let Chameleon decide the best number of CPUs to use) and call setChameleonParameters(NB, ChamOutOfPlace); to enable conversion of the Lapack matrix into a Chameleon tiled matrix whose storage is more efficient for GPUs.

srun -n 1 -c 30 ./build-guix/src/examples/composyx_demo_dense 1 24800 12400 1240 -1 2

Example results

Least squares system time, 1, 6.36323
Linear system (Cholesky) time, 1, 1.78136
Gemm time, 1, 11.7963

Execution of RSVD with Blas/Lapack from the MKL

srun -n 1 -c 30 ./build-guix/src/examples/composyx_demo_dense_rsvd 0 49600 24800 1235

Result

RSVD time, 1, 21.528

Execution of RSVD with Chameleon

srun -n 1 -c 30 ./build-guix/src/examples/composyx_demo_dense_rsvd 1 49600 24800 1235 1240 -1 2

Example results

RSVD time, 1, 11.3395

Performances with Chameleon outperform MKL ones thanks to the GPUs kernels (cuBLAS here) that Chameleon is able to exploit.

5. Composability examples

5.1. Conjugate gradient solver

In order to demonstrate the feasibility of utilising different objects from different libraries with a solver, we will consider the example of the conjugate gradient presented earlier. We will consider a matrix in the Eigen format and a vector in the composyx format. This represents an initial step in the development of a composable and generic library.

5.1.1. Eigen Matrix and a composyx vector

  1. Source code

    You can find the source code in src/examples/demo_cg_eigen_cpx.cpp.

    #include <iostream>
    // the Eigen wrappers for matrix and vector
    #include "composyx/wrappers/Eigen/EigenMatrix.hpp"
    
    // The composyx matrix/vector include
    #include "composyx/loc_data/DenseMatrix.hpp"
    
    ///  Type definition
    using value_type = double;
    // Composyx vector type
    using my_vector_type = composyx::Vector<value_type>;
    // Eigen matrix type (we use the composyx Eigen wrapper)
    using my_matrix_type = composyx::E_DenseMatrix<value_type>;
    
    /// Composability
    namespace composyx
    {
    // For concepts
    // We specify that the vector type for an Eigen matrix is a composyx vector type
    template <CPX_Scalar Scalar>
    struct vector_type<E_DenseMatrix<Scalar>> : public std::true_type {
      using type = my_vector_type;
        };
        // The Eigen matrix * composyx vector operator
        auto operator*(my_matrix_type const& A, my_vector_type const& x) -> my_vector_type
        {
            using value_type = my_matrix_type::value_type;
            using C_matrix_type = composyx::DenseMatrix<value_type>;
                  // The eigen matrix is encapsulated (without copying) in a composyx matrix
            C_matrix_type AA(A.rows(), A.cols(), const_cast<value_type*>(A.data()), true);
            // return the resuls in a CPX vector
            return AA * x;
        }
        } // namespace composyx
    
    #include <composyx/solver/ConjugateGradient.hpp>
    
    int main()
    {
        my_matrix_type A({{4, -1, 0}, {-1, 3, -0.5}, {0, -0.5, 4}});
    
        my_vector_type u({1, 2, 3}, 3);
        my_vector_type b = A * u;
        //
        composyx::ConjugateGradient<my_matrix_type, my_vector_type> solver;
        solver.setup(composyx::parameters::A{A},
                     composyx::parameters::verbose{true},
                     composyx::parameters::max_iter{20},
                     composyx::parameters::tolerance{1e-8});
    
        my_vector_type x = solver * b;
    
        composyx::display(x, "x:\n", std::cout);
    
        return 0;
    }
    
  2. Explanation

    In order to utilise the composyx solvers, it is necessary to be aware of the return type of the matrix-vector product and to have a matrix-vector product within the composyx namespace.

    template <CPX_Scalar Scalar>
    struct vector_type<E_DenseMatrix<Scalar>> : public std::true_type {using type = my_vector_type;};
    

    In this approach, the Eigen matrix (column-major storage) is encapsulated within a Composer matrix. To achieve this, the following constructor is employed:

    explicit DenseMatrix(const size_t m, const size_t n, Scalar * ptr, bool fixed_ptr = false) ;
    

    where m represents the number of rows, n the number of columns, ptr the ptr pointer to the memory area, and fixedptr specifies whether the memory space allocated by Eigen is utilised (true) or whether this space is copied (false).

    auto operator*(my_matrix_type const& A, my_vector_type const& x) -> my_vector_type
      {
          using value_type = my_matrix_type::value_type;
          using C_matrix_type = composyx::DenseMatrix<value_type>;
                // The eigen matrix is encapsulated (without copying) in a composyx matrix
          C_matrix_type AA(A.rows(), A.cols(), const_cast<value_type*>(A.data()), true);
          // return the resuls in a CPX vector
          return AA * x;
      }
    

5.1.2. Eigen matrix and an eigen vector

You can find the source code in src/examples/demo_cg_eigen.cpp.

  1. Source code
    #include <iostream>
    // the full Eigen wrappers for matrix and vector
    #include <composyx/wrappers/Eigen/Eigen.hpp>
    
    ///  Type definition
    using value_type = double;
    
    using my_vector_type = composyx::E_Vector<value_type>;
    using my_matrix_type = composyx::E_DenseMatrix<value_type>;
    
    // Constraints for using Eigen vector in CPX iterative solver
    namespace Eigen {
    auto dot(my_vector_type const &v1,
             my_vector_type const &v2) -> my_vector_type::value_type {
      return v1.dot(v2);
    }
    [[nodiscard]] size_t size(my_vector_type const &v) {
      return static_cast<size_t>(v.size());
    }
    } // namespace Eigen
    
    #include "composyx.hpp" // needed for compilation !
    #include <composyx/solver/ConjugateGradient.hpp>
    
    int main() {
      my_matrix_type A(3, 3);
      A << 4, -1, 0, -1, 3, -0.5, 0, -0.5, 4;
      my_vector_type u(3);
      u << 1, 2, 3;
    
      my_vector_type b = A * u;
      //
      composyx::ConjugateGradient<my_matrix_type, my_vector_type> solver;
      solver.setup(composyx::parameters::A{A}, composyx::parameters::verbose{true},
                   composyx::parameters::max_iter{20},
                   composyx::parameters::tolerance{1e-8});
    
      my_vector_type x = solver * b;
    
      composyx::display(x, "x:\n", std::cout);
    
      return 0;
    }
    
  2. Explanation

    In this example, we are working within the Eigen framework, where the constraints on the matrix from the previous example have already been defined in the Eigen.hpp file. However, we still need to satisfy the solver's constraints on the Eigen vectors. To achieve this, we must define two functions, dot and size.

    namespace Eigen {
      auto dot(my_vector_type const &v1, my_vector_type const &v2) -> my_vector_type::value_type {
      return v1.dot(v2);
    }
      [[nodiscard]] auto size(my_vector_type const &v) -> size_t {return static_cast<size_t>(v.size()); }
    } // namespace Eigen
    

    In order to solve Argument-dependent lookup, these functions must be defined within the Eigen namespace.

Date: 04/02/2025 at 13:09:31 04/02/2025 at 13:09:31

Author: Concace Gilles Marait

Created: 2025-02-04 Tue 13:09

Validate