[2]ifpackageloaded#1#2 [2]ifpackageloaded#1#2 [3]ifpackageloaded#1#2#3
#1
Advanced Features¶
How to Deal with FFT Index Shifts in Parallel¶
Let \(n\in2{\ensuremath{\mathbb{N}}}\). A common problem is that the index of the FFT input and/or output array runs between \(-\nicefrac n2,\dots,\nicefrac n2-1\), but the FFT library requires them to run between \(0,\dots,n-1\). With serial program execution one can easily remap the input data \(\hat g_k\) in a way that is suitable for the library, i.e.,
Similarly, one could remap the outputs of the library \(f_l\), \(l=0,\cdots,n-1\) in the opposite direction in order to get the required outputs, i.e.,
These shifts are also known as fftshift
in Matlab.
However, with distributed memory these fftshift
operations require
more complex data movements and result in a global communication. For
example, the first index of the array moves to the middle and,
therefore, the corresponding data move to another MPI process.
Fortunately, this communication can be avoided at the cost of little
extra computation. At the end of the section we present two PFFT library
functions that perform the necessary pre- and postprocessing for shifted
input and output index sets.
Shift with half the FFT size¶
The special case of input shift \(k_s=-\nicefrac n2\) and/or output
shift \(l_s=-\nicefrac n2\) is supported by PFFT. User can choose to
shift the input (PFFT_SHIFTED_IN
) and/or to shift the output
(PFFT_SHIFTED_OUT
).
Here, we are interested in the computation of
with \(n, n_i, n_o \in 2{\ensuremath{\mathbb{N}}}\) and \(n>n_i\), \(n>n_o\).
With an index shift of \(\nicefrac n2\) both in \(k\) and \(l\) this equivalent to the computation of
for :math:` l=nicefrac n2-nicefrac{n_o}{2},dots,nicefrac n2 +nicefrac{n_o}{2}-1`. Therefore, we get the following algorithm
The special case \(k_s=-\frac{n_i}{2}, l_s=-\frac{n_o}{2}\) corresponds to the shifts the arrays ()
[1] =1.1ex For \(k=0,\dots,n-1\) set \(\hat f_k = 0\). For \(k=-\nicefrac{n_i}{2},\dots,\nicefrac{n_i}{2}-1\) compute \(\hat f_{(k+\nicefrac{n}{2})} = (-1)^{(k+\nicefrac{n}{2})} \hat g_{k}\). For \(l=0,\dots,n-1\) compute \(f_l = \sum_{k=0}^{n} \hat f_k {\ensuremath{\mathrm{e}^{-2\pi{{\ensuremath{\text{\scriptsize{i}}}}} kl/n}}}\) using PFFT. For \(l=-\nicefrac{n_o}{2},\dots,\nicefrac{n_o}{2}-1\) compute :math:`g_l = (-1)^l f_{(l+n/2)} `.
Note, that this shift implies that the library deals with pruned FFTs in a special way, i.e., half of the zeros are added at the beginning of the inputs and the other half is added at the end.
Arbitrary shifts¶
More general shifts must be done by the user.
In a more general setting, we are interested in the computation of FFTs with shifted index sets, i.e., assume \(k_s,l_s\in{\ensuremath{\mathbb{Z}}}\) and compute
Because of the periodicity of the FFT this can be easily performed by Alg. [alg:fftshift:sub:translation].
[alg:fftshift:sub:translation]
[1] =1.1ex For \(k=0,\dots,n_i-1\) assign \(\hat f_k = \hat g_{(k+k_s\bmod n_i)}\). For \(l=0,\dots,n_o-1\) compute \(f_l = \sum_{k=0}^{n_i} \hat f_k {\ensuremath{\mathrm{e}^{-2\pi{{\ensuremath{\text{\scriptsize{i}}}}} kl/n}}}\) using PFFT. For \(l=0,\dots,n_o-1\) assign \(g_l = f_{(l-l_s\bmod n_o)}\).
However, this involves explicit data movement since the sequence of data changes. For a our parallel data decomposition the change of data layout requires data communication. A simple index shift results in the computation of
for all \(l=0,\dots,n_o-1\). The resulting Alg. [alg:fftshift:sub:modulation] preserves the sequence of data at the price of some extra computation.
[alg:fftshift:sub:modulation]
[1] =1.1ex For \(k=0,\dots,n_i-1\) compute \(\hat f_k = \hat g_{(k+k_s)} {\ensuremath{\mathrm{e}^{-2\pi{{\ensuremath{\text{\scriptsize{i}}}}} (k+k_s)l_s/n}}}\). For \(l=0,\dots,n_o-1\) compute \(f_l = \sum_{k=0}^{n_i} \hat f_k {\ensuremath{\mathrm{e}^{-2\pi{{\ensuremath{\text{\scriptsize{i}}}}} kl/n}}}\) using PFFT. For \(l=0,\dots,n_o-1\) compute \(g_{(l+l_s)} = f_l {\ensuremath{\mathrm{e}^{-2\pi{{\ensuremath{\text{\scriptsize{i}}}}} k_sl/n}}}\).
The special case \(k_s=-\frac{n_i}{2}, l_s=-\frac{n_o}{2}\) corresponds to the shifts the arrays ()
[1] =1.1ex For \(k=0,\dots,n_i-1\) compute \(\hat f_k = \hat g_{(k-\nicefrac{n_i}{2})} {{\ensuremath{\mathrm{e}}}}^{+\pi{\ensuremath{\text{\scriptsize{i}}}}(k-\nicefrac{n_i}{2})n_o/n}\). For \(l=0,\dots,n_o-1\) compute \(f_l = \sum_{k=0}^{n_i} \hat f_k {\ensuremath{\mathrm{e}^{-2\pi{{\ensuremath{\text{\scriptsize{i}}}}} kl/n}}}\) using PFFT. For \(l=0,\dots,n_o-1\) compute \(g_{(l-\nicefrac{n_o}{2})} = f_l {{\ensuremath{\mathrm{e}}}}^{+\pi{\ensuremath{\text{\scriptsize{i}}}}n_i l/n}\).
Parallel pruned FFT¶
Within PFFT we define a pruned FFT as
Formally, this is equivallent to the following regular size \(n\) FFT
with
and \(f_l := g_l\), \(k=0,\dots,n_o-1\). I.e., we add \(n-n_i\) zeros at the end of the input array and throw away \(n-n_o\) entries at the end of the output array.
The definition of pruned FFT changes for PFFT_SHIFTED_IN
and
PFFT_SHIFTED_OUT
.