kdcount.correlate module

Correlation function (pair counting) with KDTree.

Pair counting is the basic algorithm for calculating correlation functions. The correlation function is a commonly used metric in cosmology to measure the clustering of matter, or the growth of large-scale structure in the universe.
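To make the idea concrete, here is a minimal brute-force sketch of pair counting in radial bins. It is not kdcount's implementation (which uses a KD-tree to avoid visiting every pair); the function name brute_force_paircount and the half-open bin convention are assumptions made for illustration.

```python
import math
from bisect import bisect_right

def brute_force_paircount(pos1, pos2, edges):
    """Count pairs (one point from each set) per separation bin.

    Hypothetical helper: O(N1 * N2) reference version of pair counting.
    """
    counts = [0] * (len(edges) - 1)
    for p in pos1:
        for q in pos2:
            r = math.dist(p, q)
            # Bin k covers edges[k] <= r < edges[k + 1] (assumed convention).
            k = bisect_right(edges, r) - 1
            if 0 <= k < len(counts):
                counts[k] += 1
    return counts

pos1 = [(0.0, 0.0), (1.0, 0.0)]
pos2 = [(0.0, 0.5), (0.0, 2.0)]
counts = brute_force_paircount(pos1, pos2, edges=[0.0, 1.0, 2.0, 3.0])
```

A tree-based counter produces the same per-bin totals while pruning node pairs that cannot fall inside the largest bin.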

We implement paircount for pair counting. Since this is a discrete estimator, the binning is modeled by subclasses of Binning, for example RBinning, RmuBinning, and XYBinning.

kdcount takes two types of input data: ‘point’ and ‘field’.

kdcount.models.points describes data with positions and weights; galaxies and quasars are examples of point data. point.pos is a row array of the positions of the points; other fields are used internally. point.extra holds extra properties that can be used in the Binning. One use is to exclude pairs of Lyman-alpha pixels and quasars that lie on the same sightline.

kdcount.models.field describes a continuous field sampled at given positions, each sample with a weight. A well-known example is the over-flux field in the Lyman-alpha forest, which is a proxy for the over-density field sampled along quasar sightlines.

In the Python interface, to count pairs one first defines the ‘binning’ scheme by subclassing Binning. Binning describes a multi-dimensional binning scheme. The dimensions can be derived quantities: for example, the norm of the spatial separation can be a dimension in the same way as the ‘x’ separation. See RmuBinning for an example.

class kdcount.correlate.Binning(dims, edges, compute_mean_coords=False)[source]

Bases: object

Binning of the correlation function. Pairs whose separation falls within a bin are counted towards that bin.

Attributes

dims (array_like) internal; descriptors of binning dimensions.
edges (array_like) edges of bins per dimension
centers (array_like) centers of bins per dimension; currently it is the midpoint of the edges.
compute_mean_coords (bool, optional) If True, store and compute the mean coordinate values in the __call__ function. Default is False

digitize(r, i, j, data1, data2)[source]

Calculate the bin number of pairs separated by distances r. Use linear() to convert from a multi-dimensional bin index to a linear index.

Parameters:

r : array_like

separation

i, j : array_like

index (i, j) of pairs.

data1, data2 :

The position of first point is data1.pos[i], the position of second point is data2.pos[j].
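As an illustration of what a one-dimensional digitize step might look like, the sketch below assigns each separation to a bin number with the standard library. It assumes a numpy.digitize-like convention (0 for separations below the first edge, len(edges) for separations at or beyond the last edge); the helper name digitize_radial is hypothetical and the convention kdcount uses internally is not stated in this page.

```python
from bisect import bisect_right

def digitize_radial(r, edges):
    """Bin number for each separation in r (hypothetical helper).

    Returns 0 for r < edges[0] and len(edges) for r >= edges[-1];
    otherwise bin k means edges[k - 1] <= r < edges[k].
    """
    return [bisect_right(edges, ri) for ri in r]

edges = [1.0, 2.0, 4.0]
bins = digitize_radial([0.5, 1.0, 3.9, 4.0, 10.0], edges)
```

With this convention the underflow and overflow bins can be kept during accumulation and dropped when results are reported.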

linear(**tobin)[source]

Linearize bin indices.

This function is called by subclasses. Refer to the source code of RBinning for an example.

Parameters:

args : list

a list of bin index, (xi, yi, zi, ..)

Returns:

linearized bin index
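The sketch below shows one common way to linearize a multi-dimensional bin index, using row-major (C-order) flattening. The ordering is an assumption for illustration, and the name linearize is hypothetical; refer to the RBinning source for how linear() is actually called.

```python
def linearize(indices, shape):
    """Flatten a multi-dimensional bin index in row-major order.

    Hypothetical helper: `indices` is (xi, yi, zi, ...) and `shape` is the
    number of bins per dimension.
    """
    linear = 0
    for idx, n in zip(indices, shape):
        if not 0 <= idx < n:
            raise IndexError("bin index %d out of range for size %d" % (idx, n))
        # Shift what has been accumulated so far by the size of this dimension.
        linear = linear * n + idx
    return linear

shape = (4, 3)
flat = linearize((2, 1), shape)
```

Each multi-dimensional index maps to a unique integer in range(4 * 3), so per-bin sums can be accumulated in a flat array and reshaped afterwards.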

sum_shapes(data1, data2)[source]

Return the shapes of the summation arrays, given the input data and shape of the bins

update_mean_coords(dig, **tobin)[source]

Update the mean coordinate sums

class kdcount.correlate.FlatSkyBinning(rbins, Nmu, los, **kwargs)[source]

Bases: kdcount.correlate.Binning

Binning in R and mu, in the flat sky approximation, such that all pairs have the same line-of-sight, which is taken to be the axis specified by the los parameter (default is the last dimension)

Parameters:

rbins : array_like

the bin edges in the r direction

Nmu : int

the number of bins in mu direction.

los : int, {0, 1, 2}

the axis to treat as the line-of-sight
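For a single pair, the (r, mu) coordinates in the flat-sky approximation can be sketched as follows. The helper name flat_sky_r_mu is hypothetical, and taking the absolute value of mu (so that mu lies in [0, 1]) is an assumption; the class may use a different range.

```python
import math

def flat_sky_r_mu(p, q, los=-1):
    """Return (r, mu) for one pair with a fixed line-of-sight axis.

    Hypothetical helper: mu is the cosine of the angle between the
    separation vector and the axis `los`.
    """
    d = [b - a for a, b in zip(p, q)]
    r = math.sqrt(sum(x * x for x in d))
    if r == 0.0:
        return 0.0, 0.0  # mu is undefined for coincident points; use 0 here
    mu = abs(d[los]) / r
    return r, mu

r, mu = flat_sky_r_mu((0.0, 0.0, 0.0), (3.0, 0.0, 4.0), los=2)
```

Because every pair shares the same line of sight, no observer position is needed, unlike RmuBinning.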

digitize(r, i, j, data1, data2)[source]

class kdcount.correlate.FlatSkyMultipoleBinning(rbins, ells, los, **kwargs)[source]

Bases: kdcount.correlate.Binning

Binning in R and ell, the multipole number, in the flat sky approximation, such that all pairs have the same line-of-sight, which is taken to be the axis specified by the los parameter (default is the last dimension)

Parameters:

rbins : array_like

the bin edges in the r direction

ells : list of int

the multipole numbers to compute

los : int, {0, 1, 2}

the axis to treat as the line-of-sight
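Multipoles are conventionally obtained by weighting each pair with a Legendre polynomial in mu. The sketch below evaluates P_ell(mu) with Bonnet's recurrence and applies the usual (2 ell + 1) factor. Both helper names are hypothetical, and whether this class applies the (2 ell + 1) normalization at this stage is an assumption.

```python
def legendre(ell, mu):
    """Legendre polynomial P_ell(mu) via Bonnet's recurrence."""
    if ell == 0:
        return 1.0
    p_prev, p = 1.0, mu
    for n in range(1, ell):
        # (n + 1) P_{n+1} = (2n + 1) mu P_n - n P_{n-1}
        p_prev, p = p, ((2 * n + 1) * mu * p - n * p_prev) / (n + 1)
    return p

def multipole_weights(mu, ells):
    """Per-pair weight (2 ell + 1) * P_ell(mu) for each requested ell."""
    return [(2 * ell + 1) * legendre(ell, mu) for ell in ells]

weights = multipole_weights(0.5, ells=[0, 2, 4])
```

Summing these weights over all pairs in an r bin gives one accumulated value per ell, which is consistent with sum_shapes prepending the shape of ells to the summation arrays.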

digitize(r, i, j, data1, data2)[source]

sum_shapes(data1, data2)[source]

Prepend the shape of ells to the summation arrays

class kdcount.correlate.RBinning(rbins, **kwargs)[source]

Bases: kdcount.correlate.Binning

Binning along radial direction.

Parameters:

rbins : array_like

the bin edges in the R direction

digitize(r, i, j, data1, data2)[source]

class kdcount.correlate.RmuBinning(rbins, Nmu, observer, **kwargs)[source]

Bases: kdcount.correlate.Binning

Binning in R and mu, where mu = cos(theta) is the angular coordinate of the pair separation relative to the line of sight from a given observer.

Parameters:

rbins : array_like

the bin edges in the R direction

Nmu : int

number of bins in mu direction.

observer : array_like (Ndim)

location of the observer (for line of sight)
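When the line of sight depends on the observer, mu has to be computed per pair. The sketch below assumes the line of sight is the direction from the observer to the midpoint of the pair, and takes the absolute value of the cosine; both choices are assumptions, and r_mu_from_observer is a hypothetical name.

```python
import math

def r_mu_from_observer(p, q, observer):
    """Return (r, mu) for one pair, with the line of sight through the pair midpoint."""
    d = [b - a for a, b in zip(p, q)]
    # Vector from the observer to the midpoint of the pair (assumed line of sight).
    los = [0.5 * (a + b) - o for a, b, o in zip(p, q, observer)]
    r = math.sqrt(sum(x * x for x in d))
    norm_los = math.sqrt(sum(x * x for x in los))
    if r == 0.0 or norm_los == 0.0:
        return r, 0.0  # mu is undefined in these degenerate cases; use 0 here
    mu = abs(sum(x * y for x, y in zip(d, los))) / (r * norm_los)
    return r, mu

origin = (0.0, 0.0, 0.0)
r_par, mu_par = r_mu_from_observer((0.0, 0.0, 10.0), (0.0, 0.0, 12.0), origin)
r_perp, mu_perp = r_mu_from_observer((-1.0, 0.0, 10.0), (1.0, 0.0, 10.0), origin)
```

A pair aligned with the sightline gives mu = 1, and a pair transverse to it gives mu = 0.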

digitize(r, i, j, data1, data2)[source]

class kdcount.correlate.XYBinning(Rmax, Nbins, observer, **kwargs)[source]

Bases: kdcount.correlate.Binning

Binning along Sky-Lineofsight directions.

The bins are (sky, los).

Parameters:

Rmax : float

max radius to go to

Nbins : int

number of bins in each direction.

observer : array_like (Ndim)

location of the observer (for line of sight)

Notes

With matplotlib's imshow, the second axis (los) will be vertical.
With imshow(..T), the sky axis will be vertical.
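The decomposition of a pair separation into (sky, los) components can be sketched as below. As an assumption for illustration, the line of sight is taken to run from the observer through the pair midpoint, and the name sky_los_separation is hypothetical.

```python
import math

def sky_los_separation(p, q, observer):
    """Split one pair separation into (sky, los) components."""
    d = [b - a for a, b in zip(p, q)]
    # Vector from the observer to the pair midpoint (assumed line of sight).
    mid = [0.5 * (a + b) - o for a, b, o in zip(p, q, observer)]
    r2 = sum(x * x for x in d)
    norm_mid = math.sqrt(sum(x * x for x in mid))
    if norm_mid == 0.0:
        return math.sqrt(r2), 0.0  # no preferred direction; treat as transverse
    los = abs(sum(x * y for x, y in zip(d, mid))) / norm_mid
    # Guard against tiny negative values from floating-point round-off.
    sky = math.sqrt(max(r2 - los * los, 0.0))
    return sky, los

sky, los = sky_los_separation((-1.5, 0.0, 10.0), (1.5, 0.0, 14.0), (0.0, 0.0, 0.0))
```

The two components satisfy sky^2 + los^2 = r^2, so this binning carries the same information as (R, mu) in a different parametrization.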
digitize(r, i, j, data1, data2)[source]

kdcount.correlate.compute_sum_values(i, j, data1, data2)[source]

Return the sum1_ij and sum2_ij values given the input indices and data instances.

Parameters:

i,j : array_like

the bin indices for these pairs

data1, data2 : points, field instances

the two points or field objects

Returns:

sum1_ij, sum2_ij : float, array_like (N,...)

contributions to sum1, sum2 – either a float or array of shape (N, ...) where N is the length of i, j

Notes

This is called in Binning.__call__ to compute the sum1 and sum2 contributions for indices (i,j)

class kdcount.correlate.paircount(data1, data2, binning, usefast=True, np=None)[source]

Bases: object

Paircounting via a KD-tree, on two data sets.

Notes

The values of sum1 and sum2 depend on the types of the inputs.

For kdcount.models.points and kdcount.models.points:
  • sum1 is the per bin sum of products of weights
  • sum2 is always 1.0
For kdcount.models.field and kdcount.models.points:
  • sum1 is the per bin sum of products of weights and the field value
  • sum2 is the per bin sum of products of weights
For kdcount.models.field and kdcount.models.field:
  • sum1 is the per bin sum of products of weights and the field value (one value per field)
  • sum2 is the per bin sum of products of weights

With this convention, the usual form of the Landy-Szalay estimator (for points x points) is:

(DD.sum1 - 2 r DR.sum1 + r^2 RR.sum1) / (r^2 RR.sum1)

with r = sum(wD) / sum(wR)
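The estimator above can be evaluated per bin as in the following sketch. The inputs stand in for the sum1 arrays of three pair counts (DD, DR, RR) and the total weights of the data and random catalogs; the function name landy_szalay and the example numbers are made up for illustration.

```python
def landy_szalay(dd, dr, rr, sum_wd, sum_wr):
    """Evaluate (DD - 2 r DR + r^2 RR) / (r^2 RR) in each bin.

    Hypothetical helper: dd, dr, rr are per-bin sum1 values;
    sum_wd and sum_wr are the total data and random weights.
    """
    r = sum_wd / sum_wr
    xi = []
    for d, x, s in zip(dd, dr, rr):
        denom = r * r * s
        # Bins with no random pairs have no defined estimate.
        xi.append((d - 2.0 * r * x + denom) / denom if denom else float("nan"))
    return xi

xi = landy_szalay(dd=[10.0, 4.0], dr=[16.0, 8.0], rr=[32.0, 16.0],
                  sum_wd=10.0, sum_wr=20.0)
```

Here r = 0.5 rescales the random counts to the data normalization, so a bin whose data pairs match the random expectation evaluates to zero.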

Attributes

sum1 ( array_like) the numerator in the correlator
sum2 ( array_like) the denominator in the correlator
centers (list) the centers of the corresponding corr bin, one item per binning direction.
edges ( list) the edges of the corresponding corr bin, one item per binning direction.
binning (Binning) binning object of this paircount
data1 (dataset) input data set1. It can be either field for discrete sampling of a continuous field, or kdcount.models.points for a point set.
data2 (dataset) input data set2, see above.
np (int) number of parallel processes; set to 0 to disable parallelism

class kdcount.correlate.paircount_worker(pc, binning, data, np=None, usefast=True)[source]

Bases: object

Context that runs the actual pair counting, attaching the appropriate attributes to the parent paircount