Interface Layers of the PFFT Library

We give a quick overview of the PFFT interface layers in order of increasing flexibility, using c2c-FFTs as the example. Analogous interface layers exist for r2c-, c2r-, and r2r-FFTs. A full reference list of all PFFT functions is given in Chapter [chap:ref].

Basic Interface

The _3d interface is the simplest interface layer. It is suitable for the planning of three-dimensional FFTs.

ptrdiff_t pfft_local_size_dft_3d(
    const ptrdiff_t *n, MPI_Comm comm_cart, unsigned pfft_flags,
    ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
    ptrdiff_t *local_no, ptrdiff_t *local_o_start);
void pfft_local_block_dft_3d(
    const ptrdiff_t *n, MPI_Comm comm_cart,
    int pid, unsigned pfft_flags,
    ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
    ptrdiff_t *local_no, ptrdiff_t *local_o_start);
pfft_plan pfft_plan_dft_3d(
    const ptrdiff_t *n,
    pfft_complex *in, pfft_complex *out, MPI_Comm comm_cart,
    int sign, unsigned pfft_flags);

Here, n, local_ni, local_i_start, local_no, and local_o_start are ptrdiff_t arrays of length 3.

The basic interface generalizes the _3d interface to FFTs of arbitrary dimension rnk_n.

ptrdiff_t pfft_local_size_dft(
    int rnk_n, const ptrdiff_t *n,
    MPI_Comm comm_cart, unsigned pfft_flags,
    ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
    ptrdiff_t *local_no, ptrdiff_t *local_o_start);
void pfft_local_block_dft(
    int rnk_n, const ptrdiff_t *n,
    MPI_Comm comm_cart, int pid, unsigned pfft_flags,
    ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
    ptrdiff_t *local_no, ptrdiff_t *local_o_start);
pfft_plan pfft_plan_dft(
    int rnk_n, const ptrdiff_t *n,
    pfft_complex *in, pfft_complex *out, MPI_Comm comm_cart,
    int sign, unsigned pfft_flags);

Accordingly, n, local_ni, local_i_start, local_no, and local_o_start become arrays of length rnk_n.
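The role of the length-rnk_n arrays can be illustrated without calling the library. The following is a minimal sketch, assuming the local block is stored in row-major order with extents local_ni (non-transposed layout, one transform) and that local_i_start holds the global offset of that block in every dimension. The helper names row_major_offset and local_to_global are hypothetical and not part of PFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the local multi-index l inside a row-major array with extents
 * local_n[0..rnk_n-1]; the last dimension varies fastest.
 * ASSUMPTION: row-major, non-transposed local layout. */
static ptrdiff_t row_major_offset(int rnk_n, const ptrdiff_t *local_n,
                                  const ptrdiff_t *l)
{
  ptrdiff_t m = 0;
  for (int t = 0; t < rnk_n; t++) {
    assert(0 <= l[t] && l[t] < local_n[t]);
    m = m * local_n[t] + l[t];
  }
  return m;
}

/* Global index in every dimension: the local index shifted by the start
 * of the local block (hypothetical helper, not a PFFT function). */
static void local_to_global(int rnk_n, const ptrdiff_t *local_start,
                            const ptrdiff_t *l, ptrdiff_t *k)
{
  for (int t = 0; t < rnk_n; t++)
    k[t] = local_start[t] + l[t];
}
```

With the values returned in local_ni and local_i_start, a loop over the local block could use row_major_offset to address the input array and local_to_global to decide which global data point belongs there.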

Advanced Interface

The advanced interface introduces the arrays ni and no of length rnk_n that give the pruned FFT input and output sizes. Furthermore, the arrays iblock and oblock of length rnk_pm (rnk_pm being the dimension of the process mesh) adjust the block sizes of the input and output block decompositions. The additional parameter howmany gives the number of transforms that are computed simultaneously.

ptrdiff_t pfft_local_size_many_dft(
    int rnk_n, const ptrdiff_t *n,
    const ptrdiff_t *ni, const ptrdiff_t *no, ptrdiff_t howmany,
    const ptrdiff_t *iblock, const ptrdiff_t *oblock,
    MPI_Comm comm_cart, unsigned pfft_flags,
    ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
    ptrdiff_t *local_no, ptrdiff_t *local_o_start);
void pfft_local_block_many_dft(
    int rnk_n, const ptrdiff_t *ni, const ptrdiff_t *no,
    const ptrdiff_t *iblock, const ptrdiff_t *oblock,
    MPI_Comm comm_cart, int pid, unsigned pfft_flags,
    ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
    ptrdiff_t *local_no, ptrdiff_t *local_o_start);
pfft_plan pfft_plan_many_dft(
    int rnk_n, const ptrdiff_t *n,
    const ptrdiff_t *ni, const ptrdiff_t *no, ptrdiff_t howmany,
    const ptrdiff_t *iblock, const ptrdiff_t *oblock,
    pfft_complex *in, pfft_complex *out, MPI_Comm comm_cart,
    int sign, unsigned pfft_flags);
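To make the effect of a block size such as iblock[t] concrete, the sketch below computes which index range one process owns along a single distributed dimension. It assumes the common convention that consecutive blocks of equal size are assigned to consecutive mesh coordinates and that the default block size is the ceiling of n divided by the number of processes. Both are assumptions made for illustration, and default_block and block_extent are hypothetical helpers, not PFFT functions.

```c
#include <assert.h>
#include <stddef.h>

/* Block size used when the caller requests no particular one: ceil(n / np).
 * ASSUMPTION: illustrates a common default, not taken from the PFFT sources. */
static ptrdiff_t default_block(ptrdiff_t n, int np)
{
  assert(n > 0 && np > 0);
  return (n + np - 1) / np;
}

/* Extent and start of the block owned by the process at mesh coordinate
 * coord when n points are cut into consecutive blocks of size block.
 * Processes beyond the end of the data get an empty block; its start is
 * set to 0 here purely as a convention of this sketch. */
static void block_extent(ptrdiff_t n, ptrdiff_t block, int coord,
                         ptrdiff_t *local_n, ptrdiff_t *local_start)
{
  assert(n > 0 && block > 0 && coord >= 0);
  ptrdiff_t first = (ptrdiff_t)coord * block;
  if (first >= n) {
    *local_n = 0;
    *local_start = 0;
    return;
  }
  *local_start = first;
  *local_n = (n - first < block) ? n - first : block;
}
```

For example, 10 points on 4 processes with the assumed default block size 3 give the extents 3, 3, 3, 1, whereas a block size of 5 leaves the last two processes empty.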

Preliminary: Skip Serial Transformations

The _skipped interface extends the _many interface by adding the possibility to skip some of the serial FFTs.

pfft_plan pfft_plan_many_dft_skipped(
    int rnk_n, const ptrdiff_t *n,
    const ptrdiff_t *ni, const ptrdiff_t *no, ptrdiff_t howmany,
    const ptrdiff_t *iblock, const ptrdiff_t *oblock,
    const int *skip_trafos,
    pfft_complex *in, pfft_complex *out, MPI_Comm comm_cart,
    int sign, unsigned pfft_flags);

Here, skip_trafos is an int array of length rnk_pm+1 (rnk_pm being the mesh dimension of the communicator comm_cart). For t=0,...,rnk_pm set skip_trafos[t]=1 if the t-th serial transformation should be computed, otherwise set skip_trafos[t]=0. Note that the local transpositions are always performed, since they are a prerequisite for the global communication to work. At the moment, the serial transforms along the last rnk_n-rnk_pm-1 dimensions can only be skipped as a whole. Selective skipping along these dimensions can be realized by calling a (rnk_pm+1)-dimensional PFFT with

for(int t=rnk_pm+1; t<rnk_n; t++)
  howmany *= n[t];

and manual computation of the desired serial transforms along the last rnk_n-rnk_pm-1 dimensions.
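The folding of the trailing dimensions into howmany can be written as a small helper. This is a sketch only: fold_trailing_dims is a hypothetical name, and it assumes that the trailing dimensions are the fastest-varying ones in memory, so that treating them as a batch of howmany transforms does not change the data layout.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the leading rnk_pm+1 sizes of n into n_red and multiplies howmany
 * by the sizes of the remaining dimensions rnk_pm+1, ..., rnk_n-1.
 * Returns the enlarged howmany for the (rnk_pm+1)-dimensional transform.
 * Hypothetical helper, not part of PFFT. */
static ptrdiff_t fold_trailing_dims(int rnk_n, int rnk_pm, const ptrdiff_t *n,
                                    ptrdiff_t howmany, ptrdiff_t *n_red)
{
  assert(rnk_pm >= 1 && rnk_n > rnk_pm && howmany >= 1);
  for (int t = 0; t <= rnk_pm; t++)
    n_red[t] = n[t];
  for (int t = rnk_pm + 1; t < rnk_n; t++)
    howmany *= n[t];
  return howmany;
}
```

The returned value and n_red would then be passed as howmany and n (with rnk_n replaced by rnk_pm+1) to the planner, after which the chosen serial transforms along the folded dimensions are computed by hand.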