NNNCorrelation: Count-count-count correlations

class treecorr.NNNCorrelation(config=None, *, logger=None, **kwargs)[source]

Bases: Corr3

This class handles the calculation and storage of a 3-point count-count-count correlation function. i.e. the regular density correlation function.

See the doc string of Corr3 for a description of how the triangles can be binned.

Objects of this class hold the following attributes:

Attributes:
  • nbins – The number of bins in logr where r = d2

  • bin_size – The size of the bins in logr

  • min_sep – The minimum separation being considered

  • max_sep – The maximum separation being considered

  • logr1d – The nominal centers of the nbins bins in log(r).

  • tot – The total number of triangles processed, which is used to normalize the randoms if they have a different number of triangles.

If the bin_type is LogRUV, then it will have these attributes:

Attributes:
  • nubins – The number of bins in u where u = d3/d2

  • ubin_size – The size of the bins in u

  • min_u – The minimum u being considered

  • max_u – The maximum u being considered

  • nvbins – The number of bins in v where v = +-(d1-d2)/d3

  • vbin_size – The size of the bins in v

  • min_v – The minimum v being considered

  • max_v – The maximum v being considered

  • u1d – The nominal centers of the nubins bins in u.

  • v1d – The nominal centers of the nvbins bins in v.
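
The (r, u, v) parametrization above can be illustrated with a short sketch. The helper name `ruv_from_sides` is hypothetical (not part of the TreeCorr API), and the orientation sign that TreeCorr applies to v is omitted:

```python
# Illustration of the LogRUV parametrization (not part of the TreeCorr API).
# Sort the triangle's side lengths so that d1 >= d2 >= d3, then
# r = d2, u = d3/d2, and v = (d1 - d2)/d3.  TreeCorr additionally signs v
# by the triangle's orientation; that sign is omitted in this sketch.

def ruv_from_sides(a, b, c):
    """Hypothetical helper: map unsorted side lengths to (r, u, v)."""
    d1, d2, d3 = sorted((a, b, c), reverse=True)
    r = d2
    u = d3 / d2          # in [0, 1]
    v = (d1 - d2) / d3   # in [0, 1]; TreeCorr uses +/- for orientation
    return r, u, v

# Example: a 3-4-5 right triangle
r, u, v = ruv_from_sides(3.0, 4.0, 5.0)
print(r, u, v)  # 4.0, 0.75, (5-4)/3 = 1/3
```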

If the bin_type is LogSAS, then it will have these attributes:

Attributes:
  • nphi_bins – The number of bins in phi, the opening angle between d2 and d3.

  • phi_bin_size – The size of the bins in phi.

  • min_phi – The minimum phi being considered.

  • max_phi – The maximum phi being considered.

  • phi1d – The nominal centers of the nphi_bins bins in phi.

If the bin_type is LogMultipole, then it will have these attributes:

Attributes:
  • max_n – The maximum multipole index n being stored.

  • n1d – The multipole index n in the 2*max_n+1 bins of the third bin direction.
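
Since the third bin direction holds 2*max_n+1 multipole bins, the index axis presumably runs n = -max_n, ..., +max_n. A minimal sketch (the variable names mirror the attributes above, but this is not TreeCorr code):

```python
import numpy as np

# Sketch of the multipole index axis: for bin_type=LogMultipole the third
# bin direction holds the 2*max_n + 1 indices n = -max_n, ..., 0, ..., +max_n.
max_n = 3
n1d = np.arange(-max_n, max_n + 1)
print(n1d)        # [-3 -2 -1  0  1  2  3]
print(len(n1d))   # 7, i.e. 2*max_n + 1
```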

In addition, the following attributes are numpy arrays whose shape is:

  • (nbins, nubins, nvbins) if bin_type is LogRUV

  • (nbins, nbins, nphi_bins) if bin_type is LogSAS

  • (nbins, nbins, 2*max_n+1) if bin_type is LogMultipole
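
For concreteness, the three shapes can be sketched with hypothetical bin counts (example values, not defaults). Each attribute such as meand1, weight, or ntri is a 3-d array with one of these shapes depending on bin_type:

```python
import numpy as np

# Sketch of the per-bin array shapes listed above, for example bin counts.
nbins, nubins, nvbins = 10, 5, 8   # example values, not defaults
nphi_bins, max_n = 12, 6

shapes = {
    'LogRUV':       (nbins, nubins, nvbins),
    'LogSAS':       (nbins, nbins, nphi_bins),
    'LogMultipole': (nbins, nbins, 2 * max_n + 1),
}
for bt, shape in shapes.items():
    ntri = np.zeros(shape)   # e.g. the ntri attribute has this shape
    print(bt, ntri.shape)
```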

If bin_type is LogRUV:

Attributes:
  • logr – The nominal center of each bin in log(r).

  • rnom – The nominal center of each bin converted to regular distance. i.e. r = exp(logr).

  • u – The nominal center of each bin in u.

  • v – The nominal center of each bin in v.

  • meanu – The mean value of u for the triangles in each bin.

  • meanv – The mean value of v for the triangles in each bin.

  • weight – The total weight in each bin.

  • ntri – The number of triangles going into each bin (including those where one or more objects have w=0).

If bin_type is LogSAS:

Attributes:
  • logd2 – The nominal center of each bin in log(d2).

  • d2nom – The nominal center of each bin converted to regular d2 distance. i.e. d2 = exp(logd2).

  • logd3 – The nominal center of each bin in log(d3).

  • d3nom – The nominal center of each bin converted to regular d3 distance. i.e. d3 = exp(logd3).

  • phi – The nominal center of each angular bin.

  • meanphi – The (weighted) mean value of phi for the triangles in each bin.

If bin_type is LogMultipole:

Attributes:
  • logd2 – The nominal center of each bin in log(d2).

  • d2nom – The nominal center of each bin converted to regular d2 distance. i.e. d2 = exp(logd2).

  • logd3 – The nominal center of each bin in log(d3).

  • d3nom – The nominal center of each bin converted to regular d3 distance. i.e. d3 = exp(logd3).

  • n – The multipole index n for each bin.

For any bin_type:

Attributes:
  • meand1 – The (weighted) mean value of d1 for the triangles in each bin.

  • meanlogd1 – The mean value of log(d1) for the triangles in each bin.

  • meand2 – The (weighted) mean value of d2 for the triangles in each bin.

  • meanlogd2 – The mean value of log(d2) for the triangles in each bin.

  • meand3 – The (weighted) mean value of d3 for the triangles in each bin.

  • meanlogd3 – The mean value of log(d3) for the triangles in each bin.

  • weight – The total weight in each bin.

  • ntri – The number of triangles going into each bin (including those where one or more objects have w=0).

If sep_units are given (either in the config dict or as a named kwarg) then the distances will all be in these units.

Note

If you separate out the steps of the Corr3.process command and use process_auto and/or Corr3.process_cross, then the units will not be applied to meand1, meanlogd1, etc. until the finalize function is called.

The typical usage pattern is as follows:

>>> nnn = treecorr.NNNCorrelation(config)
>>> nnn.process(cat)         # For auto-correlation.
>>> rrr.process(rand)        # Likewise for random-random correlations
>>> drr.process(cat,rand)    # If desired, also do data-random correlations
>>> rdd.process(rand,cat)    # Also with two data and one random
>>> nnn.write(file_name,rrr=rrr,drr=drr,...)  # Write out to a file.
>>> zeta,varzeta = nnn.calculateZeta(rrr=rrr,drr=drr,rdd=rdd)  # Or get zeta directly.
Parameters:
  • config (dict) – A configuration dict that can be used to pass in kwargs if desired. This dict is allowed to have additional entries besides those listed in Corr3, which are ignored here. (default: None)

  • logger – If desired, a logger object for logging. (default: None, in which case one will be built according to the config dict’s verbose level.)

Keyword Arguments:

**kwargs – See the documentation for Corr3 for the list of allowed keyword arguments, which may be passed either directly or in the config dict.

__iadd__(other)[source]

Add a second Correlation object’s data to this one.

Note

For this to make sense, both objects should not have had finalize called yet. Then, after adding them together, you should call finalize on the sum.

__init__(config=None, *, logger=None, **kwargs)[source]

Initialize NNNCorrelation. See class doc for details.

calculateZeta(*, rrr, drr=None, rdd=None)[source]

Calculate the 3pt function given another 3pt function of random points using the same mask, and possibly cross correlations of the data and random.

There are two possible formulae that are currently supported.

  1. The simplest formula to use is \(\zeta^\prime = (DDD-RRR)/RRR\). In this case, only rrr, the NNNCorrelation of a random field, needs to be given. However, note that in this case, the return value is not what is normally called \(\zeta\). Rather, this is an estimator of

    \[\zeta^\prime(d_1,d_2,d_3) = \zeta(d_1,d_2,d_3) + \xi(d_1) + \xi(d_2) + \xi(d_3)\]

    where \(\xi\) is the two-point correlation function for each leg of the triangle. You would typically want to calculate that separately and subtract off the two-point contributions.

  2. For auto-correlations, a better formula is \(\zeta = (DDD-RDD+DRR-RRR)/RRR\). In this case, RDD is the number of triangles where 1 point comes from the randoms and 2 points are from the data. Similarly, DRR has 1 point from the data and 2 from the randoms. These are what are calculated from calling:

    >>> drr.process(data_cat, rand_cat)
    >>> rdd.process(rand_cat, data_cat)
    

    Note

    One might think the formula should be \(\zeta = (DDD-3RDD+3DRR-RRR)/RRR\) by analogy with the 2pt Landy-Szalay formula. However, the way these are calculated, the object we are calling RDD already includes triangles where R is in each of the 3 locations. So it is really more like RDD + DRD + DDR. These are not computed separately. Rather, the single computation of rdd described above accumulates all three permutations together, so that one object includes everything for the second term. Likewise, drr has all the permutations that are relevant for the third term.

  • If only rrr is provided, the first formula will be used.

  • If all of rrr, drr, rdd are provided then the second will be used.

Note

This method is not valid for bin_type=’LogMultipole’. I don’t think there is a straightforward way to go directly from the multipole expansion of DDD and RRR to Zeta. Normally one would instead convert both to LogSAS binning (cf. toSAS) and then call calculateZeta with those.

Parameters:
  • rrr (NNNCorrelation) – The auto-correlation of the random field (RRR)

  • drr (NNNCorrelation) – DRR, if desired. (default: None)

  • rdd (NNNCorrelation) – RDD, if desired. (default: None)

Returns:

Tuple containing

  • zeta = array of \(\zeta(d_1,d_2,d_3)\)

  • varzeta = array of variance estimates of \(\zeta(d_1,d_2,d_3)\)
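
The arithmetic of the two estimators can be sketched with synthetic weight arrays. The variable names and the rescaling by the ratio of `tot` values (suggested by the description of the tot attribute above) are illustrative; TreeCorr's internals may differ:

```python
import numpy as np

# Sketch of the two zeta estimators on synthetic per-bin weight arrays.
rng = np.random.default_rng(0)
shape = (4, 4, 6)
ddd = rng.uniform(1.0, 2.0, shape)     # DDD weights
rrr = rng.uniform(1.0, 2.0, shape)     # RRR weights
drr = rng.uniform(1.0, 2.0, shape)     # DRR weights
rdd = rng.uniform(1.0, 2.0, shape)     # RDD weights
tot_ddd, tot_rrr, tot_drr, tot_rdd = 100.0, 400.0, 200.0, 200.0

# Rescale each random term to the data normalization via the tot ratios.
rrr_s = rrr * (tot_ddd / tot_rrr)
drr_s = drr * (tot_ddd / tot_drr)
rdd_s = rdd * (tot_ddd / tot_rdd)

zeta_simple = (ddd - rrr_s) / rrr_s                 # (DDD-RRR)/RRR
zeta_comp = (ddd - rdd_s + drr_s - rrr_s) / rrr_s   # (DDD-RDD+DRR-RRR)/RRR
print(zeta_simple.shape, zeta_comp.shape)
```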

copy()[source]

Make a copy

finalize()[source]

Finalize the calculation of meand1, meanlogd1, etc.

The process_auto, process_cross12 and process_cross commands accumulate values in each bin, so they can be called multiple times if appropriate. Afterwards, this command finishes the calculation of meand1, meand2, etc. by dividing by the total weight.
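
The accumulate-then-finalize pattern can be sketched as follows. This mirrors the description (weighted sums per bin, divided by total weight at the end); the `accumulate` helper is a hypothetical stand-in for a process_* call, not TreeCorr code:

```python
import numpy as np

# Sketch of accumulate-then-finalize: each process_* call adds weighted
# sums per bin; "finalize" divides by the total weight to produce means.
nbins_total = 5
sum_wd1 = np.zeros(nbins_total)   # running sum of w * d1 per bin
sum_w = np.zeros(nbins_total)     # running sum of w per bin

def accumulate(bins, d1, w):
    """Hypothetical stand-in for one process_auto/process_cross call."""
    np.add.at(sum_wd1, bins, w * d1)   # unbuffered in-place accumulation
    np.add.at(sum_w, bins, w)

accumulate(np.array([0, 0, 2]), np.array([1.0, 3.0, 5.0]), np.array([1.0, 1.0, 2.0]))
accumulate(np.array([2, 4]), np.array([7.0, 9.0]), np.array([2.0, 1.0]))

# "finalize": divide the weighted sums by the total weight where it is nonzero
meand1 = np.divide(sum_wd1, sum_w, out=np.zeros_like(sum_wd1), where=sum_w > 0)
print(meand1)  # bin 0: (1+3)/2 = 2.0; bin 2: (10+14)/4 = 6.0; bin 4: 9.0
```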

classmethod from_file(file_name, *, file_type=None, logger=None, rng=None)[source]

Create an NNNCorrelation instance from an output file.

This should be a file that was written by TreeCorr.

Parameters:
  • file_name (str) – The name of the file to read in.

  • file_type (str) – The type of file (‘ASCII’, ‘FITS’, or ‘HDF’). (default: determine the type automatically from the extension of file_name.)

  • logger (Logger) – If desired, a logger object to use for logging. (default: None)

  • rng (RandomState) – If desired, a numpy.random.RandomState instance to use for bootstrap random number generation. (default: None)

Returns:

An NNNCorrelation object, constructed from the information in the file.

getStat()[source]

The standard statistic for the current correlation object as a 1-d array.

This raises a RuntimeError if calculateZeta has not been run yet.

getWeight()[source]

The weight array for the current correlation object as a 1-d array.

This is the weight array corresponding to getStat. In this case, it is the denominator RRR from the calculation done by calculateZeta().

process_auto(cat, *, metric=None, num_threads=None)[source]

Process a single catalog, accumulating the auto-correlation.

This accumulates the auto-correlation for the given catalog. After calling this function as often as desired, the finalize command will finish the calculation of meand1, meanlogd1, etc.

Parameters:
  • cat (Catalog) – The catalog to process

  • metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)

  • num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)

process_cross(cat1, cat2, cat3, *, metric=None, ordered=True, num_threads=None)[source]

Process a set of three catalogs, accumulating the 3pt cross-correlation.

This accumulates the cross-correlation for the given catalogs as part of a larger auto- or cross-correlation calculation. E.g. when splitting up a large catalog into patches, this is appropriate to use for the cross correlation between different patches as part of the complete auto-correlation of the full catalog.

Parameters:
  • cat1 (Catalog) – The first catalog to process

  • cat2 (Catalog) – The second catalog to process

  • cat3 (Catalog) – The third catalog to process

  • metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)

  • ordered (bool) – Whether to fix the order of the triangle vertices to match the catalogs. (default: True)

  • num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)

process_cross12(cat1, cat2, *, metric=None, ordered=True, num_threads=None)[source]

Process two catalogs, accumulating the 3pt cross-correlation, where one of the points in each triangle comes from the first catalog, and two come from the second.

This accumulates the cross-correlation for the given catalogs as part of a larger auto- or cross-correlation calculation. E.g. when splitting up a large catalog into patches, this is appropriate to use for the cross correlation between different patches as part of the complete auto-correlation of the full catalog.

Parameters:
  • cat1 (Catalog) – The first catalog to process. (1 point in each triangle will come from this catalog.)

  • cat2 (Catalog) – The second catalog to process. (2 points in each triangle will come from this catalog.)

  • metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)

  • ordered (bool) – Whether to fix the order of the triangle vertices to match the catalogs. (default: True)

  • num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)

read(file_name, *, file_type=None)[source]

Read in values from a file.

This should be a file that was written by TreeCorr, preferably a FITS or HDF5 file, so there is no loss of information.

Warning

The NNNCorrelation object should be constructed with the same configuration parameters as the one being read. e.g. the same min_sep, max_sep, etc. This is not checked by the read function.

Parameters:
  • file_name (str) – The name of the file to read in.

  • file_type (str) – The type of file (‘ASCII’ or ‘FITS’). (default: determine the type automatically from the extension of file_name.)

toSAS(*, target=None, **kwargs)[source]

Convert a multipole-binned correlation to the corresponding SAS binning.

This is only valid for bin_type == LogMultipole.

Keyword Arguments:
  • target – A target NNNCorrelation object with LogSAS binning to write to. If this is not given, a new object will be created based on the configuration parameters of the current object. (default: None)

  • **kwargs – Any kwargs that you want to use to configure the returned object. Typically, this might include min_phi, max_phi, nphi_bins, phi_bin_size. The default phi binning is [0,pi] with nphi_bins = self.max_n.

Returns:

An NNNCorrelation object with bin_type=LogSAS containing the same information as this object, but with the SAS binning.
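
The multipole-to-angle conversion is, per (d2, d3) bin, essentially a Fourier sum over the stored coefficients. A conceptual sketch with a known test function (TreeCorr's exact normalization and sign conventions may differ; this only illustrates the idea):

```python
import numpy as np

# Conceptual sketch of multipole -> angular reconstruction, analogous to
# what toSAS does for each (d2, d3) bin: given coefficients Z_n for
# n = -max_n..max_n, evaluate the Fourier sum at the phi bin centers.
max_n = 8
n = np.arange(-max_n, max_n + 1)

# Coefficients of a known test function f(phi) = 1 + cos(2*phi):
# Z_0 = 1, Z_{+-2} = 0.5, all others zero.
Z = np.zeros(2 * max_n + 1, dtype=complex)
Z[max_n] = 1.0
Z[max_n + 2] = 0.5
Z[max_n - 2] = 0.5

# Default-style phi binning: [0, pi] with nphi_bins = max_n
nphi_bins = max_n
phi = (np.arange(nphi_bins) + 0.5) * np.pi / nphi_bins
f = np.real(np.sum(Z[:, None] * np.exp(1j * n[:, None] * phi[None, :]), axis=0))
print(np.allclose(f, 1 + np.cos(2 * phi)))  # True
```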

write(file_name, *, rrr=None, drr=None, rdd=None, file_type=None, precision=None, write_patch_results=False, write_cov=False)[source]

Write the correlation function to the file, file_name.

Normally, at least rrr should be provided, but if this is None, then only the basic accumulated triangle counts are output (along with the columns parametrizing the size and shape of the triangles).

If at least rrr is given, then it will output an estimate of the final 3pt correlation function, \(\zeta\). There are two possible formulae that are currently supported.

  1. The simplest formula to use is \(\zeta^\prime = (DDD-RRR)/RRR\). In this case, only rrr needs to be given, the NNNCorrelation of a random field. However, note that in this case, the return value is not what is normally called \(\zeta\). Rather, this is an estimator of

    \[\zeta^\prime(d_1,d_2,d_3) = \zeta(d_1,d_2,d_3) + \xi(d_1) + \xi(d_2) + \xi(d_3)\]

    where \(\xi\) is the two-point correlation function for each leg of the triangle. You would typically want to calculate that separately and subtract off the two-point contributions.

  2. For auto-correlations, a better formula is \(\zeta = (DDD-RDD+DRR-RRR)/RRR\). In this case, RDD is the number of triangles where 1 point comes from the randoms and 2 points are from the data. Similarly, DRR has 1 point from the data and 2 from the randoms. For this case, all combinations rrr, drr, and rdd must be provided.

For bin_type = LogRUV, the output file will include the following columns:

  • r_nom – The nominal center of the bin in r = d2, where d1 > d2 > d3

  • u_nom – The nominal center of the bin in u = d3/d2

  • v_nom – The nominal center of the bin in v = +-(d1-d2)/d3

  • meanu – The mean value \(\langle u\rangle\) of triangles that fell into each bin

  • meanv – The mean value \(\langle v\rangle\) of triangles that fell into each bin

For bin_type = LogSAS, the output file will include the following columns:

  • d2_nom – The nominal center of the bin in d2

  • d3_nom – The nominal center of the bin in d3

  • phi_nom – The nominal center of the bin in phi, the opening angle between d2 and d3 in the counter-clockwise direction

  • meanphi – The mean value \(\langle \phi\rangle\) of triangles that fell into each bin

For bin_type = LogMultipole, the output file will include the following columns:

  • d2_nom – The nominal center of the bin in d2

  • d3_nom – The nominal center of the bin in d3

  • n – The multipole index n

  • weight_re – The real part of the complex weight.

  • weight_im – The imaginary part of the complex weight.

In addition, all bin types include the following columns:

  • meand1 – The mean value \(\langle d1\rangle\) of triangles that fell into each bin

  • meanlogd1 – The mean value \(\langle \log(d1)\rangle\) of triangles that fell into each bin

  • meand2 – The mean value \(\langle d2\rangle\) of triangles that fell into each bin

  • meanlogd2 – The mean value \(\langle \log(d2)\rangle\) of triangles that fell into each bin

  • meand3 – The mean value \(\langle d3\rangle\) of triangles that fell into each bin

  • meanlogd3 – The mean value \(\langle \log(d3)\rangle\) of triangles that fell into each bin

  • zeta – The estimator \(\zeta\) (if rrr is given, or zeta was already computed)

  • sigma_zeta – The sqrt of the variance estimate of \(\zeta\) (if rrr is given)

  • DDD – The total weight of DDD triangles in each bin

  • RRR – The total weight of RRR triangles in each bin (if rrr is given)

  • DRR – The total weight of DRR triangles in each bin (if drr is given)

  • RDD – The total weight of RDD triangles in each bin (if rdd is given)

  • ntri – The number of triangles contributing to each bin

If sep_units was given at construction, then the distances will all be in these units. Otherwise, they will be in either the same units as x,y,z (for flat or 3d coordinates) or radians (for spherical coordinates).

Parameters:
  • file_name (str) – The name of the file to write to.

  • rrr (NNNCorrelation) – The auto-correlation of the random field (RRR)

  • drr (NNNCorrelation) – DRR if desired. (default: None)

  • rdd (NNNCorrelation) – RDD if desired. (default: None)

  • file_type (str) – The type of file to write (‘ASCII’ or ‘FITS’). (default: determine the type automatically from the extension of file_name.)

  • precision (int) – For ASCII output catalogs, the desired precision. (default: 4; this value can also be given in the constructor in the config dict.)

  • write_patch_results (bool) – Whether to write the patch-based results as well. (default: False)

  • write_cov (bool) – Whether to write the covariance matrix as well. (default: False)