exaFMM is designed to simultaneously maximize performance, scalability, portability, modularity, simplicity, and extensibility. Its specific features with respect to each of these six properties are as follows.

  • Performance: SSE, AVX, and MIC intrinsics for inner kernels. Low-precision reciprocal square root with Newton-Raphson correction. Template metaprogramming for high-order recursion in translation operators. Optimized CUDA kernels with a warp-unit execution model. Kahan summation for extended-precision arithmetic.
  • Scalability: Various parallel programming models {OpenMP, pthreads, qthreads, MassiveThreads, Intel Threading Building Blocks, Cilk, QUARK, OmpSs, Charm++}. Various partitioning schemes {orthogonal recursive multisection, Morton/Hilbert curves}. Load balancing using the workload and communication from the previous step. MPI communicator splitting. Hypercube all-to-all communication. Local multithreaded radix sort combined with a global sample-based histogram sort.
  • Portability: No dependence on external libraries. No use of C++11 features. Auto-detection of SIMD vector length for inner kernels. Auto-tuning of parameters such as {bodies per leaf, multipole acceptance criteria, minimum task size}. Support for both autotools and CMake. Tested on major supercomputers {Titan, Mira, Stampede, K computer, Shaheen, Piz Daint, TSUBAME}.
  • Modularity: Complete modularization of inner kernels, tree construction, partitioning, local essential tree, up/down sweep, and tree traversal (evaluation). The modules are joined by a stable interface and can be developed independently by different people, even using different languages such as {Fortran, C, C++, CUDA}.
  • Simplicity: Operator overloading for all SIMD intrinsics, with auto-detection of SSE, AVX, and MIC compatibility. Operator overloading for Kahan summation types, which can be combined with the SIMD intrinsic overloading. Minimal use of C++ features such as {templates, functors, polymorphism, class inheritance, virtual classes}. A continuous effort to minimize lines of code at every step of development. Every line of code carries a useful comment to its right.
  • Extensibility: We considered the following topics when designing the data structures and interface. Extension of mathematical kernels to Helmholtz, Stokes, Maxwell, Yukawa, Navier, 1/R−ν, Biharmonic, continuous (Gaussian), etc. Algebraic variants of the FMM using low-rank approximation such as H-matrices, H2-matrices, HSS, HBS, IFMM, etc. Periodic boundary conditions and Ewald methods. Application to boundary integral equations. High dimensional problems.
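The low-precision reciprocal square root with Newton-Raphson correction can be sketched in scalar C++ as follows. This is not exaFMM's kernel code: the hardware estimate (for example SSE `_mm_rsqrt_ps`) is replaced by the classic bit-level estimate so the sketch stays portable, and the function names `rsqrtApprox`, `newtonStep`, and `rsqrtRefined` are hypothetical. A Newton-Raphson step for f(y) = 1/y² − x is y ← y(1.5 − 0.5xy²), which roughly doubles the number of correct bits.

```cpp
#include <stdint.h>
#include <cstring>
#include <cassert>

// Low-precision estimate of 1/sqrt(x) for x > 0 (relative error of a few percent).
// Hypothetical portable stand-in for a hardware estimate such as SSE rsqrtps.
inline float rsqrtApprox(float x) {
  assert(x > 0.0f);
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float's bits without aliasing issues
  bits = 0x5f3759dfu - (bits >> 1);      // classic magic-constant estimate
  float y;
  std::memcpy(&y, &bits, sizeof y);
  return y;
}

// One Newton-Raphson step for f(y) = 1/y^2 - x.
inline float newtonStep(float x, float y) {
  return y * (1.5f - 0.5f * x * y * y);
}

// Estimate 1/sqrt(x), then refine it with `steps` Newton-Raphson iterations.
inline float rsqrtRefined(float x, int steps) {
  float y = rsqrtApprox(x);
  for (int i = 0; i < steps; ++i) y = newtonStep(x, y);
  return y;
}
```

Starting from the bit-level estimate, two steps bring the result to within about 10⁻⁵ relative error. A hardware estimate with roughly 12 correct bits typically needs only one step to approach single precision, which is why the estimate-plus-correction pair is cheaper than a full-precision square root and division.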
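Kahan summation wrapped in an operator-overloaded type might look like the following minimal sketch. The struct name `KahanSum` is hypothetical, exaFMM's actual type may differ, and its combination with SIMD overloading is not shown. The compensation term carries the low-order bits that are lost when a small addend is rounded into a large running sum, and re-injects them on the next addition.

```cpp
#include <cassert>

// A running sum with a compensation term for rounding error (Kahan summation).
template <typename T>
struct KahanSum {
  T sum;   // running total
  T comp;  // negated low-order bits lost in the previous addition
  KahanSum() : sum(T(0)), comp(T(0)) {}
  explicit KahanSum(T v) : sum(v), comp(T(0)) {}
  KahanSum& operator+=(T v) {
    T y = v - comp;        // re-inject the bits lost previously
    T t = sum + y;         // low-order bits of y may be rounded away here
    comp = (t - sum) - y;  // recover what was lost (negated) for the next addition
    sum = t;
    return *this;
  }
  T value() const { return sum; }
};
```

As a usage example, adding 5×10⁻⁸ to 1.0f a thousand times leaves a plain `float` accumulator at exactly 1.0f, because each addend is smaller than half a unit in the last place of 1.0f, whereas `KahanSum<float>` ends close to 1.00005. Compile without `-ffast-math` or similar flags, since they allow the compiler to simplify `(t - sum) - y` to zero and silently remove the compensation.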
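The idea behind using template metaprogramming for high-order recursion is that the recursion depth is a compile-time constant, so the compiler can unroll it completely. The toy example below shows the mechanism only; it is far simpler than exaFMM's translation operators. `NumTerms` counts the 3D Cartesian expansion terms of total degree at most P, which is (P+1)(P+2)(P+3)/6, and `ScaledPowers` fills c[n] = xⁿ/n! for n = 0..P. Both names are hypothetical.

```cpp
#include <cassert>

// Number of 3D Cartesian terms of total degree <= P, built by compile-time recursion.
template <int P> struct NumTerms {
  static const int value = NumTerms<P - 1>::value + (P + 1) * (P + 2) / 2;  // add the degree-P terms
};
template <> struct NumTerms<0> { static const int value = 1; };

// Fills c[n] = x^n / n! for n = 0..N; each level calls the next lower one,
// so the whole loop is unrolled when the templates are instantiated.
template <int N> struct ScaledPowers {
  static void fill(double x, double* c) {
    ScaledPowers<N - 1>::fill(x, c);
    c[N] = c[N - 1] * x / N;
  }
};
template <> struct ScaledPowers<0> {
  static void fill(double, double* c) { c[0] = 1.0; }
};
```

For example, `NumTerms<4>::value` is 35, and `ScaledPowers<4>::fill(2.0, c)` writes 1, 2, 2, 4/3, 2/3 into `c`. Because the expansion order is a template parameter, array sizes and loop bounds are known to the compiler, at the cost of recompiling for each order.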
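Partitioning along a Morton curve with cost-weighted load balancing can be sketched as follows, under simplifying assumptions. Integer cell coordinates are taken as already computed, each body's cost (for instance, an interaction count recorded in the previous step) is given, and communication cost is ignored. `mortonKey` interleaves coordinate bits so that nearby cells tend to get nearby keys, and `partitionByCost` sorts bodies by key and cuts the curve into contiguous pieces of roughly equal total cost. Both functions are hypothetical illustrations, not exaFMM's interface.

```cpp
#include <stdint.h>
#include <cstddef>
#include <vector>
#include <algorithm>
#include <cassert>

// Interleaves the low `levels` bits of the cell coordinates: x to bit 3l, y to 3l+1, z to 3l+2.
inline uint64_t mortonKey(uint32_t ix, uint32_t iy, uint32_t iz, int levels) {
  assert(levels >= 0 && levels <= 21);  // 3 * 21 = 63 bits fit in a 64-bit key
  uint64_t key = 0;
  for (int l = 0; l < levels; ++l) {
    key |= uint64_t((ix >> l) & 1u) << (3 * l);
    key |= uint64_t((iy >> l) & 1u) << (3 * l + 1);
    key |= uint64_t((iz >> l) & 1u) << (3 * l + 2);
  }
  return key;
}

// Orders body indices by Morton key (a functor, since lambdas are C++11).
struct ByKey {
  const std::vector<uint64_t>* keys;
  bool operator()(std::size_t a, std::size_t b) const { return (*keys)[a] < (*keys)[b]; }
};

// Returns a rank in [0, nranks) for each body so that every rank owns a
// contiguous stretch of the curve with about total_cost / nranks of the cost.
std::vector<int> partitionByCost(const std::vector<uint64_t>& keys,
                                 const std::vector<double>& cost, int nranks) {
  const std::size_t n = keys.size();
  std::vector<std::size_t> order(n);
  for (std::size_t i = 0; i < n; ++i) order[i] = i;
  ByKey cmp;
  cmp.keys = &keys;
  std::sort(order.begin(), order.end(), cmp);
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) total += cost[i];
  std::vector<int> rank(n, 0);
  double prefix = 0.0;  // cost of all bodies earlier on the curve
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t b = order[i];
    double mid = prefix + 0.5 * cost[b];  // midpoint of this body's cost interval
    int r = total > 0.0 ? int(mid / total * nranks) : 0;
    rank[b] = r < nranks ? r : nranks - 1;
    prefix += cost[b];
  }
  return rank;
}
```

Weighting by cost rather than by count means a rank may receive fewer bodies when those bodies are expensive. For example, four bodies with costs 3, 1, 1, 1 in curve order are split between two ranks as one body on the first rank and three on the second, giving each a cost of 3.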
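The local sorting step can be illustrated with a least-significant-digit radix sort on 64-bit keys such as Morton keys. This is a single-threaded sketch with 8-bit digits and a hypothetical function name, `radixSort`. A multithreaded version would typically split the histogram and scatter passes across threads, and the global sample-based histogram sort across MPI ranks is not shown.

```cpp
#include <stdint.h>
#include <cstddef>
#include <vector>
#include <cassert>

// Sorts 64-bit keys in ascending order with eight stable counting-sort
// passes over 8-bit digits, from least to most significant.
void radixSort(std::vector<uint64_t>& keys) {
  const std::size_t n = keys.size();
  std::vector<uint64_t> buffer(n);
  for (int shift = 0; shift < 64; shift += 8) {
    std::vector<std::size_t> offset(257, 0);
    for (std::size_t i = 0; i < n; ++i)
      ++offset[((keys[i] >> shift) & 0xff) + 1];  // histogram of this digit, shifted by one slot
    for (int d = 0; d < 256; ++d)
      offset[d + 1] += offset[d];                 // prefix sum: offset[d] = first slot for digit d
    for (std::size_t i = 0; i < n; ++i)
      buffer[offset[(keys[i] >> shift) & 0xff]++] = keys[i];  // stable scatter
    keys.swap(buffer);                            // keys now holds the output of this pass
  }
}
```

Because each pass is stable, ordering by lower digits is preserved while higher digits are sorted, so after eight passes the keys are fully ordered. The work per pass is linear in the number of keys, which suits the large numbers of fixed-width keys produced by space-filling curves.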
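Hypercube all-to-all communication replaces the P − 1 pairwise messages per rank of a direct exchange with log₂P exchange steps. The sketch below simulates the generic pattern in a single process for P = 2^d ranks, with no MPI calls; it is not exaFMM's implementation, and `Msg` and `hypercubeAlltoall` are hypothetical names. In step k, rank r pairs with partner r XOR 2^k and forwards every message whose destination differs from r in bit k.

```cpp
#include <cstddef>
#include <vector>
#include <cassert>

// A message tagged with its original sender and final destination rank.
struct Msg {
  int src;
  int dst;
};

// Simulates a hypercube all-to-all among P = 2^d ranks; returns what each rank holds at the end.
std::vector<std::vector<Msg> > hypercubeAlltoall(int d) {
  assert(d >= 0 && d < 16);
  const int P = 1 << d;
  std::vector<std::vector<Msg> > held(P);
  for (int r = 0; r < P; ++r)
    for (int dst = 0; dst < P; ++dst) {
      Msg m = {r, dst};  // rank r starts with one message for every destination
      held[r].push_back(m);
    }
  for (int k = 0; k < d; ++k) {
    const int bit = 1 << k;
    std::vector<std::vector<Msg> > next(P);
    for (int r = 0; r < P; ++r) {
      const int partner = r ^ bit;  // neighbour across dimension k of the hypercube
      for (std::size_t i = 0; i < held[r].size(); ++i) {
        const Msg& m = held[r][i];
        if ((m.dst & bit) == (r & bit)) next[r].push_back(m);  // already on the correct side: keep
        else next[partner].push_back(m);                       // otherwise forward to the partner
      }
    }
    held.swap(next);
  }
  return held;
}
```

After step k, every message held by rank r agrees with its destination in bits 0 through k, so after d steps each message has arrived. The trade-off is that every step moves about half of a rank's data, so total traffic grows by roughly a factor of log₂P/2, in exchange for only log₂P latency-bound steps with larger messages.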