Three-point Correlation Functions
TreeCorr can compute three-point correlations for several different kinds of fields. For notational brevity in the various classes involved, we use a single letter to represent each kind of field as follows:
N represents simple counting statistics. The underlying field is a density of objects, which is manifested by objects appearing at specific locations. The assumption here is that the probability of an object occurring at a specific location is proportional to the underlying density field at that spot.
K represents a real, scalar field. Nominally, K is short for “kappa”, since TreeCorr was originally written for weak lensing applications, and kappa is the name of the weak lensing convergence, a measure of the projected matter density along the line of sight.
G represents a complex, spin-2 shear field. Spin-2 means that the complex value changes by \(\exp(2i \phi)\) when the orientation is rotated by an angle \(\phi\). The letter g is commonly used for reduced shear in the weak lensing context (and \(\gamma\) is the unreduced shear), which is a spin-2 field, hence our use of G for spin-2 fields in TreeCorr.
We have not yet implemented complex fields with spin 0, 1, 3 or 4 (called Z, V, T, and Q respectively) as we have for two-point functions. If you have a use case that requires any of these, please open an issue requesting this feature.
The following classes are used for computing the three-point functions according to which field is on each vertex of the triangle.
- NNNCorrelation: Count-count-count correlations
- KKKCorrelation: Scalar-scalar-scalar correlations
- GGGCorrelation: Shear-shear-shear correlations
- NNKCorrelation: Count-count-scalar correlations
- NKNCorrelation: Count-scalar-count correlations
- KNNCorrelation: Scalar-count-count correlations
- NNGCorrelation: Count-count-shear correlations
- NGNCorrelation: Count-shear-count correlations
- GNNCorrelation: Shear-count-count correlations
- NKKCorrelation: Count-scalar-scalar correlations
- KNKCorrelation: Scalar-count-scalar correlations
- KKNCorrelation: Scalar-scalar-count correlations
- NGGCorrelation: Count-shear-shear correlations
- GNGCorrelation: Shear-count-shear correlations
- GGNCorrelation: Shear-shear-count correlations
- KKGCorrelation: Scalar-scalar-shear correlations
- KGKCorrelation: Scalar-shear-scalar correlations
- GKKCorrelation: Shear-scalar-scalar correlations
- KGGCorrelation: Scalar-shear-shear correlations
- GKGCorrelation: Shear-scalar-shear correlations
- GGKCorrelation: Shear-shear-scalar correlations
Each of the above classes is a sub-class of the base class Corr3, so they share a number of features related to how they are constructed. The common features are documented here.
- class treecorr.Corr3(config=None, *, logger=None, rng=None, **kwargs)[source]
This class stores the results of a 3-point correlation calculation, along with some ancillary data.
This is a base class that is not intended to be constructed directly. But it has a few helper functions that derived classes can use to help perform their calculations. See the derived classes for more details:
NNNCorrelation – handles count-count-count correlation functions
KKKCorrelation – handles scalar-scalar-scalar correlation functions
GGGCorrelation – handles shear-shear-shear correlation functions
Three-point correlations are a bit more complicated than two-point, since the data need to be binned by triangle configuration, not just by the separation between two points.
There are currently three different ways to quantify the triangle shapes.
The triangle can be defined by its three side lengths (i.e. SSS congruence). In this case, we characterize the triangles according to the following three parameters based on the three side lengths with d1 >= d2 >= d3.
\[\begin{split}r &= d2 \\ u &= \frac{d3}{d2} \\ v &= \pm \frac{(d1 - d2)}{d3} \\\end{split}\]

The orientation of the triangle is specified by the sign of v. Positive v triangles have the three sides d1, d2, d3 in counter-clockwise orientation. Negative v triangles have the three sides d1, d2, d3 in clockwise orientation.
Note
We always bin the same way for positive and negative v values, and the binning specification for v should just be for the positive values. E.g. if you specify min_v=0.2, max_v=0.6, then TreeCorr will also accumulate triangles with -0.6 < v < -0.2 in addition to those with 0.2 < v < 0.6.
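To make the parameterization concrete, here is a small illustrative snippet (not part of the TreeCorr API) computing r, u, v for a hypothetical 3-4-5 triangle:

>>> # Hypothetical side lengths, sorted so that d1 >= d2 >= d3:
>>> d1, d2, d3 = 5.0, 4.0, 3.0
>>> r = d2                       # 4.0
>>> u = d3 / d2                  # 0.75
>>> v = (d1 - d2) / d3           # +1/3 if counter-clockwise, -1/3 if clockwise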
The triangle can be defined by two of the sides and the angle between them (i.e. SAS congruence). The vertex point between the two sides is considered point “1” (P1), so the two sides (opposite points 2 and 3) are called d2 and d3. The angle between them is called phi, and it is measured in radians.
The orientation is defined such that 0 <= phi <= pi is the angle sweeping from d2 to d3 counter-clockwise.
Unlike the SSS definition where every triangle is uniquely placed in a single bin, this definition forms a triangle with each object at the central vertex, P1, so for auto-correlations, each triangle is placed in bins three times. For cross-correlations, the order of the points is such that objects in the first catalog are at the central vertex, P1, objects in the second catalog are at P2, which is opposite d2 (i.e. at the end of line segment d3 from P1), and objects in the third catalog are at P3, opposite d3 (i.e. at the end of d2 from P1).
The third option is a multipole expansion of the SAS description. This idea was initially developed by Chen & Szapudi (2005, ApJ, 635, 743) and then further refined by Slepian & Eisenstein (2015, MNRAS, 454, 4142), Philcox et al (2022, MNRAS, 509, 2457), and Porth et al (2024, A&A, 689, 227). The latter in particular showed how to use this method for non-spin-0 correlations (GGG in particular).
The basic idea is to do a Fourier transform of the phi binning to convert the phi bins into n bins.
\[\zeta(d_2, d_3, \phi) = \frac{1}{2\pi} \sum_n \mathcal{Z}_n(d_2,d_3) e^{i n \phi}\]

Formally, this is exact if the sum goes from \(-\infty\) to \(\infty\). Truncating the sum at \(\pm n_\mathrm{max}\) is similar to using that many bins for \(\phi\) over the range \(0 \le \phi \le \pi\).
The above papers show that this multipole expansion allows for a much more efficient calculation, since it can be done with a kind of 2-point calculation. We provide methods to convert the multipole output into the SAS binning if desired, since that is often more convenient in practice.
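As a rough illustration of the truncation, the following numpy sketch evaluates the truncated sum for a single (d2, d3) bin. The Zn array is a hypothetical placeholder for the multipole coefficients; the toSAS method described below performs the real conversion, including proper handling of the binning.

>>> import numpy as np
>>> max_n = 30
>>> n = np.arange(-max_n, max_n+1)                 # multipole indices
>>> Zn = np.zeros(2*max_n+1, dtype=complex)        # placeholder Z_n(d2,d3) values
>>> phi = np.linspace(0., np.pi, 100)
>>> # zeta(phi) ~ (1/2pi) sum_n Z_n exp(i n phi), truncated at +-max_n:
>>> zeta_phi = (Zn[:,None] * np.exp(1j * n[:,None] * phi)).sum(axis=0) / (2*np.pi)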
The constructors for all derived classes take a config dict as the first argument, since this is often a convenient way to keep track of parameters. If you don’t want to use one, or if you want to override some of its entries, you can pass normal kwargs, which take precedence over anything in the config dict.
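For example (the parameter values here are arbitrary):

>>> import treecorr
>>> config = {'min_sep': 1., 'max_sep': 100., 'nbins': 10, 'sep_units': 'arcmin'}
>>> kkk = treecorr.KKKCorrelation(config)             # use the config dict as-is
>>> kkk2 = treecorr.KKKCorrelation(config, nbins=20)  # kwarg overrides the config value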
There are a number of possible definitions for the distance between two points, which are appropriate for different use cases. These are specified by the metric parameter. The possible options are:

‘Euclidean’ = straight line Euclidean distance between two points. For spherical coordinates (ra,dec without r), this is the chord distance between points on the unit sphere.
‘FisherRperp’ = the perpendicular component of the distance, following the definitions in Fisher et al, 1994 (MNRAS, 267, 927).
‘OldRperp’ = the perpendicular component of the distance using the definition of Rperp from TreeCorr v3.x.
‘Rperp’ = an alias for FisherRperp. You can change it to be an alias for OldRperp instead by setting treecorr.Rperp_alias = 'OldRperp' before using it.

‘Rlens’ = the distance from the first object (taken to be a lens) to the line connecting Earth and each of the other two objects (taken to be lensed sources).
‘Arc’ = the true great circle distance for spherical coordinates.
‘Periodic’ = Like Euclidean, but with periodic boundaries.
Note
The triangles for three-point correlations can become ambiguous if a triangle side length d > period/2, which means for the SSS triangle definition, max_sep (the maximum d2) should be less than period/4, and for SAS, max_sep should be less than period/2. This is not enforced.
See Metrics for more information about these various metric options.
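For instance, here is a sketch of a ‘Periodic’ setup for a hypothetical simulation box, keeping max_sep safely below the limits described in the note above:

>>> import treecorr
>>> box = 100.    # hypothetical box size, in the same units as the x,y,z columns
>>> kkk = treecorr.KKKCorrelation(min_sep=0.5, max_sep=box/4, nbins=10,
...                               metric='Periodic', period=box)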
There are three allowed values for the bin_type for three-point correlations.

‘LogRUV’ uses the SSS description given above converted to r,u,v. The bin steps will be uniform in log(r) from log(min_sep) .. log(max_sep). The u and v values are binned linearly from min_u .. max_u and min_v .. max_v.
‘LogSAS’ uses the SAS description given above. The bin steps will be uniform in log(d) for both d2 and d3 from log(min_sep) .. log(max_sep). The phi values are binned linearly from min_phi .. max_phi. This is the default.
‘LogMultipole’ uses the multipole description given above. The bin steps will be uniform in log(d) for both d2 and d3 from log(min_sep) .. log(max_sep), and the n value range from -max_n .. max_n, inclusive.
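For example, the same GGGCorrelation could be constructed with any of the three bin_types (parameter values are arbitrary):

>>> import numpy as np
>>> import treecorr
>>> # LogSAS (the default): log-spaced d2, d3 bins and linear phi bins.
>>> ggg_sas = treecorr.GGGCorrelation(min_sep=1., max_sep=50., nbins=8,
...                                   min_phi=0.1, max_phi=np.pi-0.1, nphi_bins=20,
...                                   sep_units='arcmin', bin_type='LogSAS')
>>> # LogRUV: log-spaced r bins and linear u, v bins.
>>> ggg_ruv = treecorr.GGGCorrelation(min_sep=1., max_sep=50., nbins=8,
...                                   min_u=0., max_u=1., nubins=10,
...                                   min_v=0., max_v=1., nvbins=10,
...                                   sep_units='arcmin', bin_type='LogRUV')
>>> # LogMultipole: log-spaced d2, d3 bins and multipole index n = -max_n .. max_n.
>>> ggg_mp = treecorr.GGGCorrelation(min_sep=1., max_sep=50., nbins=8, max_n=30,
...                                  sep_units='arcmin', bin_type='LogMultipole')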
Objects of any Corr3 subclass hold the following attributes:

- Attributes:
nbins – The number of bins in logr where r = d2.
bin_size – The size of the bins in logr.
min_sep – The minimum separation being considered.
max_sep – The maximum separation being considered.
logr1d – The nominal centers of the nbins bins in log(r).
If the bin_type is LogRUV, then it will have these attributes:
- Attributes:
nubins – The number of bins in u where u = d3/d2.
ubin_size – The size of the bins in u.
min_u – The minimum u being considered.
max_u – The maximum u being considered.
nvbins – The number of bins in v where v = +-(d1-d2)/d3.
vbin_size – The size of the bins in v.
min_v – The minimum v being considered.
max_v – The maximum v being considered.
u1d – The nominal centers of the nubins bins in u.
v1d – The nominal centers of the nvbins bins in v.
If the bin_type is LogSAS, then it will have these attributes:
- Attributes:
nphi_bins – The number of bins in phi.
phi_bin_size – The size of the bins in phi.
min_phi – The minimum phi being considered.
max_phi – The maximum phi being considered.
phi1d – The nominal centers of the nphi_bins bins in phi.
If the bin_type is LogMultipole, then it will have these attributes:
- Attributes:
max_n – The maximum multipole index n being stored.
n1d – The multipole index n in the 2*max_n+1 bins of the third bin direction.
In addition, the following attributes are numpy arrays whose shape is:
(nbins, nubins, nvbins) if bin_type is LogRUV
(nbins, nbins, nphi_bins) if bin_type is LogSAS
(nbins, nbins, 2*max_n+1) if bin_type is LogMultipole
If bin_type is LogRUV:
- Attributes:
logr – The nominal center of each bin in log(r).
rnom – The nominal center of each bin converted to regular distance. i.e. r = exp(logr).
u – The nominal center of each bin in u.
v – The nominal center of each bin in v.
meanu – The (weighted) mean value of u for the triangles in each bin.
meanv – The (weighted) mean value of v for the triangles in each bin.
If bin_type is LogSAS:
- Attributes:
logd2 – The nominal center of each bin in log(d2).
d2nom – The nominal center of each bin converted to regular d2 distance. i.e. d2 = exp(logd2).
logd3 – The nominal center of each bin in log(d3).
d3nom – The nominal center of each bin converted to regular d3 distance. i.e. d3 = exp(logd3).
phi – The nominal center of each angular bin.
meanphi – The (weighted) mean value of phi for the triangles in each bin.
If bin_type is LogMultipole:
- Attributes:
logd2 – The nominal center of each bin in log(d2).
d2nom – The nominal center of each bin converted to regular d2 distance. i.e. d2 = exp(logd2).
logd3 – The nominal center of each bin in log(d3).
d3nom – The nominal center of each bin converted to regular d3 distance. i.e. d3 = exp(logd3).
n – The multipole index n for each bin.
For any bin_type:
- Attributes:
meand1 – The (weighted) mean value of d1 for the triangles in each bin.
meanlogd1 – The (weighted) mean value of log(d1) for the triangles in each bin.
meand2 – The (weighted) mean value of d2 (aka r) for the triangles in each bin.
meanlogd2 – The (weighted) mean value of log(d2) for the triangles in each bin.
meand3 – The (weighted) mean value of d3 for the triangles in each bin.
meanlogd3 – The (weighted) mean value of log(d3) for the triangles in each bin.
weight – The total weight in each bin.
ntri – The number of triangles going into each bin (including those where one or more objects have w=0).
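As a concrete illustration of these shapes (assuming, as for the two-point classes, that the nominal bin arrays are set up at construction; the mean quantities are only filled in by process):

>>> import treecorr
>>> kkk = treecorr.KKKCorrelation(min_sep=1., max_sep=50., nbins=8,
...                               nphi_bins=20, sep_units='arcmin')
>>> kkk.phi.shape      # (8, 8, 20) = (nbins, nbins, nphi_bins)
>>> kkk_m = treecorr.KKKCorrelation(min_sep=1., max_sep=50., nbins=8, max_n=30,
...                                 bin_type='LogMultipole', sep_units='arcmin')
>>> kkk_m.n.shape      # (8, 8, 61) = (nbins, nbins, 2*max_n+1)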
If sep_units is given (either in the config dict or as a named kwarg), then the distances will all be in these units.

Note
If you separate out the steps of the process command and use process_auto and/or process_cross, then the units will not be applied to meand1, meanlogd1, etc. until the finalize function is called.

- Parameters:
config (dict) – A configuration dict that can be used to pass in the below kwargs if desired. This dict is allowed to have additional entries besides those listed below, which are ignored here. (default: None)
logger – If desired, a logger object for logging. (default: None, in which case one will be built according to the config dict’s verbose level.)
- Keyword Arguments:
nbins (int) – How many bins to use. (Exactly three of nbins, bin_size, min_sep, max_sep are required. If nbins is not given or set to None, it will be calculated from the values of the other three, rounding up to the next highest integer. In this case, bin_size will be readjusted to account for this rounding up.)
bin_size (float) – The width of the bins in log(separation). (Exactly three of nbins, bin_size, min_sep, max_sep are required. If bin_size is not given or set to None, it will be calculated from the values of the other three.)
min_sep (float) – The minimum separation in units of sep_units, if relevant. (Exactly three of nbins, bin_size, min_sep, max_sep are required. If min_sep is not given or set to None, it will be calculated from the values of the other three.)
max_sep (float) – The maximum separation in units of sep_units, if relevant. (Exactly three of nbins, bin_size, min_sep, max_sep are required. If max_sep is not given or set to None, it will be calculated from the values of the other three.)
sep_units (str) – The units to use for the separation values, given as a string. This includes both min_sep and max_sep above, as well as the units of the output distance values. Valid options are arcsec, arcmin, degrees, hours, radians. (default: radians if angular units make sense, but for 3-d or flat 2-d positions, the default will just match the units of x,y[,z] coordinates)
bin_slop (float) – How much slop to allow in the placement of triangles in the bins. If bin_slop = 1, then the bin into which a particular pair is placed may be incorrect by at most 1.0 bin widths. (default: None, which means to use a bin_slop that gives a maximum error of 10% on any bin, which has been found to yield good results for most applications.)
angle_slop (float) – How much slop to allow in the angular direction. This works very similarly to bin_slop, but applies to the projection angle of a pair of cells. The projection angle for any two objects in a pair of cells will differ by no more than angle_slop radians from the projection angle defined by the centers of the cells. (default: 0.1)
brute (bool) –
Whether to use the “brute force” algorithm. (default: False) Options are:
False (the default): Stop at non-leaf cells whenever the error in the separation is compatible with the given bin_slop and angle_slop.
True: Go to the leaves for both catalogs.
1: Always go to the leaves for cat1, but stop at non-leaf cells of cat2 when the error is compatible with the given slop values.
2: Always go to the leaves for cat2, but stop at non-leaf cells of cat1 when the error is compatible with the given slop values.
nphi_bins (int) – Analogous to nbins for the phi values when bin_type=LogSAS. (The default is to calculate from phi_bin_size = bin_size, min_phi = 0, max_phi = np.pi, but this can be overridden by specifying up to 3 of these four parameters.)
phi_bin_size (float) – Analogous to bin_size for the phi values. (default: bin_size)
min_phi (float) – Analogous to min_sep for the phi values. (default: 0)
max_phi (float) – Analogous to max_sep for the phi values. (default: np.pi)
phi_units (str) – The units to use for the phi values, given as a string. This includes both min_phi and max_phi above, as well as the units of the output meanphi values. Valid options are arcsec, arcmin, degrees, hours, radians. (default: radians)
max_n (int) – The maximum value of n to store for the multipole binning. (required if bin_type=LogMultipole)
nubins (int) – Analogous to nbins for the u values when bin_type=LogRUV. (The default is to calculate from ubin_size = bin_size, min_u = 0, max_u = 1, but this can be overridden by specifying up to 3 of these four parameters.)
ubin_size (float) – Analogous to bin_size for the u values. (default: bin_size)
min_u (float) – Analogous to min_sep for the u values. (default: 0)
max_u (float) – Analogous to max_sep for the u values. (default: 1)
nvbins (int) – Analogous to nbins for the positive v values when bin_type=LogRUV. (The default is to calculate from vbin_size = bin_size, min_v = 0, max_v = 1, but this can be overridden by specifying up to 3 of these four parameters.)
vbin_size (float) – Analogous to bin_size for the v values. (default: bin_size)
min_v (float) – Analogous to min_sep for the positive v values. (default: 0)
max_v (float) – Analogous to max_sep for the positive v values. (default: 1)
verbose (int) –
If no logger is provided, this will optionally specify a logging level to use:
0 means no logging output
1 means to output warnings only (default)
2 means to output various progress information
3 means to output extensive debugging information
log_file (str) – If no logger is provided, this will specify a file to write the logging output. (default: None; i.e. output to standard output)
output_dots (bool) – Whether to output progress dots during the calculation of the correlation function. (default: False unless verbose is given and >= 2, in which case True)
split_method (str) –
How to split the cells in the tree when building the tree structure. Options are:
mean = Use the arithmetic mean of the coordinate being split. (default)
median = Use the median of the coordinate being split.
middle = Use the middle of the range; i.e. the average of the minimum and maximum value.
random = Use a random point somewhere in the middle two quartiles of the range.
min_top (int) – The minimum number of top layers to use when setting up the field. (default: \(\max(3, \log_2(N_{\rm cpu}))\))
max_top (int) – The maximum number of top layers to use when setting up the field. The top-level cells are where each calculation job starts. There will typically be of order \(2^{\rm max\_top}\) top-level cells. (default: 10)
precision (int) – The precision to use for the output values. This specifies how many digits to write. (default: 4)
metric (str) – Which metric to use for distance measurements. Options are listed above. (default: ‘Euclidean’)
bin_type (str) – What type of binning should be used. Options are listed above. (default: ‘LogSAS’)
min_rpar (float) – The minimum difference in Rparallel to allow for pairs being included in the correlation function. (default: None)
max_rpar (float) – The maximum difference in Rparallel to allow for pairs being included in the correlation function. (default: None)
period (float) – For the ‘Periodic’ metric, the period to use in all directions. (default: None)
xperiod (float) – For the ‘Periodic’ metric, the period to use in the x direction. (default: period)
yperiod (float) – For the ‘Periodic’ metric, the period to use in the y direction. (default: period)
zperiod (float) – For the ‘Periodic’ metric, the period to use in the z direction. (default: period)
var_method (str) – Which method to use for estimating the variance. Options are: ‘shot’, ‘jackknife’, ‘sample’, ‘bootstrap’, ‘marked_bootstrap’. (default: ‘shot’)
num_bootstrap (int) – How many bootstrap samples to use for the ‘bootstrap’ and ‘marked_bootstrap’ var_methods. (default: 500)
cross_patch_weight (str) – How to weight pairs that cross between two patches when one patch is deselected (e.g. in a jackknife sense) and the other is selected. (default None)
rng (RandomState) – If desired, a numpy.random.RandomState instance to use for bootstrap random number generation. (default: None)
num_threads (int) –
How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)
Note
This won’t work if the system’s C compiler cannot use OpenMP (e.g. clang prior to version 3.7.)
- build_cov_design_matrix(method, *, func=None, comm=None, num_bootstrap=None, cross_patch_weight=None)[source]
Build the design matrix that is used for estimating the covariance matrix.
The design matrix for patch-based covariance estimates is a matrix where each row corresponds to a different estimate of the data vector, \(\zeta_i\) (or \(f(\zeta_i)\) if using the optional func parameter). The rows in the matrix for each valid method are:

‘shot’: This method is not valid here.
‘jackknife’: The data vector when excluding a single patch.
‘sample’: The data vector using only a single patch for the first catalog.
‘bootstrap’: The data vector for a random resampling of the patches keeping the same total number, but allowing some to repeat. Cross terms from repeated patches are excluded (since they are really auto terms).
‘marked_bootstrap’: The data vector for a random resampling of patches in the first catalog, using all patches for the second catalog. Based on the algorithm in Loh(2008).
See estimate_cov for more details.

The return value includes both the design matrix and a vector of weights (the total weight array in the computed correlation functions). The weights are used for the sample method when estimating the covariance matrix. The other methods ignore them, but they are provided here in case they are useful.
- Parameters:
method (str) – Which method to use to estimate the covariance matrix.
func (function) – A unary function that takes the list corrs and returns the desired full data vector. (default: None, which is equivalent to lambda corrs: np.concatenate([c.getStat() for c in corrs]))
comm (mpi4py.Comm) – If using MPI, an mpi4py Comm object to communicate between processes. (default: None)
num_bootstrap (int) – How many bootstrap samples to use for the ‘bootstrap’ and ‘marked_bootstrap’ var_methods. (default: 500; this value can also be given in the constructor.)
cross_patch_weight (str) – How to weight pairs that cross between two patches when one patch is deselected (e.g. in a jackknife sense) and the other is selected. (default ‘simple’; this value can also be given in the constructor.)
- Returns:
numpy arrays with the design matrix and weights respectively.
- Return type:
A, w
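A sketch of typical usage, with hypothetical file and column names, and a catalog built with patches so that the patch-based methods apply:

>>> import treecorr
>>> cat = treecorr.Catalog('data.fits', ra_col='RA', dec_col='DEC', k_col='KAPPA',
...                        ra_units='deg', dec_units='deg', npatch=20)
>>> kkk = treecorr.KKKCorrelation(min_sep=1., max_sep=50., nbins=8, sep_units='arcmin')
>>> kkk.process(cat)
>>> # Each row of A is the data vector with one patch excluded; w holds the weights.
>>> A, w = kkk.build_cov_design_matrix('jackknife')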
- property cov
The estimated covariance matrix
- property cov_diag
A possibly more efficient way to access just the diagonal of the covariance matrix.
If var_method == ‘shot’, this avoids building the full covariance matrix only to pull out the diagonal.
- estimate_cov(method, *, func=None, comm=None, num_bootstrap=None, cross_patch_weight=None)[source]
Estimate the covariance matrix based on the data
This function will calculate an estimate of the covariance matrix according to the given method.
Options for method include:

‘shot’ = The variance based on “shot noise” only. This includes the Poisson counts of points for N statistics, shape noise for G statistics, and the observed scatter in the values for K statistics. In this case, the returned value will only be the diagonal. Use np.diag(cov) if you actually want a full matrix from this.
‘jackknife’ = A jackknife estimate of the covariance matrix based on the scatter in the measurement when excluding one patch at a time.
‘sample’ = An estimate based on the sample covariance of a set of samples, taken as the patches of the input catalog.
‘bootstrap’ = A bootstrap covariance estimate. It selects patches at random with replacement and then generates the statistic using all the auto-correlations at their selected repetition plus all the cross terms that aren’t actually auto terms.
‘marked_bootstrap’ = An estimate based on a marked-point bootstrap resampling of the patches. Similar to bootstrap, but only samples the patches of the first catalog and uses all patches from the second catalog that correspond to each patch selection of the first catalog. cf. https://ui.adsabs.harvard.edu/abs/2008ApJ...681..726L/
Both ‘bootstrap’ and ‘marked_bootstrap’ use the num_bootstrap parameter, which can be set on construction.
Another relevant parameter is ‘cross_patch_weight’. This parameter controls how triangles that cross between two or three patches are weighted when some, but not all, of the patches are selected. See Mohammad and Percival (2021) (https://arxiv.org/abs/2109.07071) for an in-depth discussion of these options for two-point statistics. We use similar definitions for three-point statistics. Briefly, the options are: (TODO! This is aspirational so far.)
‘simple’ = Don’t use any triangles where any object is in a deselected patch. This is currently the default for all methods.
‘mean’ = Use a weight of 1/3 for any triangle with one object in a selected patch and the other two in deselected patches, and 2/3 for any triangle with two objects in selected patches.
‘geom’ = Use the geometric mean of the three patch weights for each triangle.
‘match’ = Use the “optimal” weight that matches the effect of auto- and cross-pairs for two-point jackknife covariances derived by Mohammad and Percival (w = n_patch / (2 + sqrt(2) (n_patch-1))). There is a similar formula for triangles that span three different patches (w = sqrt(2) n_patch / 3 (n_patch - 1 + sqrt(2))), which we use for those triples.
Note
For most classes, there is only a single statistic, zeta, so this calculates a covariance matrix for that vector. GGGCorrelation has four: gam0, gam1, gam2, and gam3, so in this case the full data vector is gam0 followed by gam1, then gam2, then gam3, and this calculates the covariance matrix for that full vector including all four statistics. The helper function getStat returns the relevant statistic in all cases.

In all cases, the relevant processing needs to already have been completed and finalized. And for all methods other than ‘shot’, the processing should have involved an appropriate number of patches – preferably more patches than the length of the vector for your statistic, although this is not checked.
The default data vector to use for the covariance matrix is given by the method getStat. As noted above, this is usually just self.zeta. However, there is an option to compute the covariance of some other function of the correlation object by providing an arbitrary function, func, which should act on the current correlation object and return the data vector of interest.

For instance, for a GGGCorrelation, you might want to compute the covariance of just gam0 and ignore the others. In this case you could use

>>> func = lambda ggg: ggg.gam0
The return value from this func should be a single numpy array. (This is not directly checked, but you’ll probably get some kind of exception if it doesn’t behave as expected.)
Note
The optional func parameter is not valid in conjunction with method='shot'. It only works for the methods that are based on patch combinations.

This function can be parallelized by passing the comm argument as an mpi4py communicator. For MPI, all processes should have the same inputs. If method == 'shot', then parallelization has no effect.
- Parameters:
method (str) – Which method to use to estimate the covariance matrix.
func (function) – A unary function that acts on the current correlation object and returns the desired data vector. (default: None, which is equivalent to lambda corr: corr.getStat())
comm (mpi4py.Comm) – If using MPI, an mpi4py Comm object to communicate between processes. (default: None)
num_bootstrap (int) – How many bootstrap samples to use for the ‘bootstrap’ and ‘marked_bootstrap’ var_methods. (default: 500; this value can also be given in the constructor.)
cross_patch_weight (str) – How to weight pairs that cross between two patches when one patch is deselected (e.g. in a jackknife sense) and the other is selected. (default ‘simple’; this value can also be given in the constructor.)
- Returns:
A numpy array with the estimated covariance matrix.
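A sketch of typical usage (hypothetical file and column names; the catalog needs patches for any method other than ‘shot’):

>>> import treecorr
>>> cat = treecorr.Catalog('shear.fits', ra_col='RA', dec_col='DEC',
...                        g1_col='G1', g2_col='G2',
...                        ra_units='deg', dec_units='deg', npatch=30)
>>> ggg = treecorr.GGGCorrelation(min_sep=1., max_sep=30., nbins=6, sep_units='arcmin')
>>> ggg.process(cat)
>>> cov = ggg.estimate_cov('jackknife')     # covariance of the full gam0..gam3 vector
>>> cov0 = ggg.estimate_cov('jackknife', func=lambda g: g.gam0.ravel())  # just gam0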
- classmethod from_file(file_name, *, file_type=None, logger=None, rng=None)[source]
Create a new instance from an output file.
This should be a file that was written by TreeCorr.
Note
This classmethod may be called either using the base class or the class type that wrote the file. E.g. if the file was written by GGGCorrelation, then either of the following would work and be equivalent:

>>> ggg = treecorr.GGGCorrelation.from_file(file_name)
>>> ggg = treecorr.Corr3.from_file(file_name)
- Parameters:
file_name (str) – The name of the file to read in.
file_type (str) – The type of file (‘ASCII’, ‘FITS’, or ‘HDF’). (default: determine the type automatically from the extension of file_name.)
logger (Logger) – If desired, a logger object to use for logging. (default: None)
rng (RandomState) – If desired, a numpy.random.RandomState instance to use for bootstrap random number generation. (default: None)
- Returns:
A Correlation object, constructed from the information in the file.
- getStat()[source]
The standard statistic for the current correlation object as a 1-d array.
Usually, this is just self.zeta. But in case we have a multi-dimensional array at some point (like TwoD for 2pt), use self.zeta.ravel().
And for GGGCorrelation, it is the concatenation of the four different correlations: [gam0.ravel(), gam1.ravel(), gam2.ravel(), gam3.ravel()].
- getWeight()[source]
The weight array for the current correlation object as a 1-d array.
This is the weight array corresponding to getStat. Usually this is just self.weight.ravel(), but it is duplicated for GGGCorrelation to match what getStat does in that case.
- property nonzero
Return whether there are any values accumulated yet. (i.e. ntri > 0)
- process(cat1, cat2=None, cat3=None, *, metric=None, ordered=True, num_threads=None, comm=None, low_mem=False, initialize=True, finalize=True, patch_method=None, algo=None, max_n=None, corr_only=False)[source]
Compute the 3pt correlation function.
If only 1 argument is given, then compute an auto-correlation function.
If 2 arguments are given, then compute a cross-correlation function with the first catalog taking one corner of the triangles, and the second taking two corners.
If 3 arguments are given, then compute a three-way cross-correlation function.
Note
For cross correlations where the third field type is different from the other two (e.g. KKG, NNG, etc.) then the 2 argument version will use the first catalog for first two vertices and the second for the third vertex, since that’s the only valid combination for those correlation types.
E.g. kkg.process(cat1, cat2) is equivalent to kkg.process(cat1, cat1, cat2), except it will be slightly more efficient, since it knows the first two vertices are from a single field.

For cross correlations, the default behavior is to use cat1 for the first vertex (P1), cat2 for the second vertex (P2), and cat3 for the third vertex (P3). If only two catalogs are given, vertices P2 and P3 both come from cat2. The sides d1, d2, d3, used to define the binning, are taken to be opposite P1, P2, P3 respectively.

However, if you want to accumulate triangles where objects from each catalog can take any position in the triangles, you can set ordered=False. In this case, triangles will be formed where P1, P2, and P3 can come from any input catalog, so long as there is one from cat1, one from cat2, and one from cat3 (or two from cat2 if cat3 is None).

All catalog arguments may be lists, in which case all items in the list are used for that element of the correlation. (See the example following the parameter list below for the basic calling patterns.)
Note
In addition to ordered = True or False, you may also set ordered to 1, 2 or 3 which means that the catalog in that position is fixed, but the other two vertices are unordered. E.g. if ordered=3, then P3 will always come from cat3, but P1 and P2 will each come from one of cat1 or cat2 in either order. This option is only valid when all three catalogs (cat1, cat2, cat3) are given.
In addition to computing the correlation function, this function also computes a number of ancillary quantities that are useful for interpreting the resulting correlation function, including the attributes meand1, meanlogd1, etc. These almost never impart significant extra computation time for three-point correlations, but we provide the option corr_only=True in analogy to the two-point version, which skips these computations. In this case the resulting meand? and meanlogd? attributes are the values for the nominal centers of the bins, not the actual mean values. And ntri is estimated from the total computed weight and the mean weight in the catalogs.

- Parameters:
cat1 (Catalog) – A catalog or list of catalogs for the first field.
cat2 (Catalog) – A catalog or list of catalogs for the second field. (default: None)
cat3 (Catalog) – A catalog or list of catalogs for the third field. (default: None)
metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)
ordered (bool) – Whether to fix the order of the triangle vertices to match the catalogs. (see above; default: True)
num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)
comm (mpi4py.Comm) – If running MPI, an mpi4py Comm object to communicate between processes. If used, the rank=0 process will have the final computation. This only works if using patches. (default: None)
low_mem (bool) – Whether to sacrifice a little speed to try to reduce memory usage. This only works if using patches. (default: False)
initialize (bool) – Whether to begin the calculation with a call to Corr3.clear. (default: True)
finalize (bool) – Whether to complete the calculation with a call to finalize. (default: True)
patch_method (str) – Which patch method to use. (default is to use ‘local’ if bin_type=LogMultipole, and ‘global’ otherwise)
algo (str) – Which accumulation algorithm to use. (options are ‘triangle’ or ‘multipole’; default is ‘multipole’ unless bin_type is ‘LogRUV’, which can only use ‘triangle’) cf. Three-point Algorithm.
max_n (int) – If using the multipole algorithm, and this is not directly using bin_type=’LogMultipole’, then this is the value of max_n to use for the multipole part of the calculation. (default is to use 2pi/phi_bin_size; this value can also be given in the constructor in the config dict.)
corr_only (bool) – Whether to skip summing quantities that are not essential for computing the correlation function. (default: False)
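A sketch of the basic calling patterns described above, with hypothetical file and column names:

>>> import treecorr
>>> cat_n = treecorr.Catalog('lens.fits', ra_col='RA', dec_col='DEC',
...                          ra_units='deg', dec_units='deg')
>>> cat_k = treecorr.Catalog('kappa.fits', ra_col='RA', dec_col='DEC', k_col='KAPPA',
...                          ra_units='deg', dec_units='deg')
>>> kkk = treecorr.KKKCorrelation(min_sep=1., max_sep=50., nbins=8, sep_units='arcmin')
>>> kkk.process(cat_k)                    # auto-correlation
>>> nnk = treecorr.NNKCorrelation(min_sep=1., max_sep=50., nbins=8, sep_units='arcmin')
>>> nnk.process(cat_n, cat_k)             # same as nnk.process(cat_n, cat_n, cat_k)
>>> nkk = treecorr.NKKCorrelation(min_sep=1., max_sep=50., nbins=8, sep_units='arcmin')
>>> nkk.process(cat_n, cat_k, cat_k)      # explicit three-argument form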
- process_auto(cat, *, metric=None, num_threads=None, corr_only=False)[source]
Process a single catalog, accumulating the auto-correlation.
This accumulates the auto-correlation for the given catalog. After calling this function as often as desired, the finalize command will finish the calculation of meand1, meanlogd1, etc.

This method is only valid for classes that have the same type of value in all three triangle vertices. (E.g. NNN, GGG, KKK)
- Parameters:
cat (Catalog) – The catalog to process
metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)
num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)
corr_only (bool) – Whether to skip summing quantities that are not essential for computing the correlation function. (default: False)
- process_cross(cat1, cat2, cat3, *, metric=None, ordered=True, num_threads=None, corr_only=False)[source]
Process a set of three catalogs, accumulating the 3pt cross-correlation.
This accumulates the cross-correlation for the given catalogs as part of a larger auto- or cross-correlation calculation. E.g. when splitting up a large catalog into patches, this is appropriate to use for the cross correlation between different patches as part of the complete auto-correlation of the full catalog.
- Parameters:
cat1 (Catalog) – The first catalog to process
cat2 (Catalog) – The second catalog to process
cat3 (Catalog) – The third catalog to process
metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)
ordered (bool) – Whether to fix the order of the triangle vertices to match the catalogs. (default: True)
num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)
corr_only (bool) – Whether to skip summing quantities that are not essential for computing the correlation function. (default: False)
- process_cross12(cat1, cat2, *, metric=None, ordered=True, num_threads=None, corr_only=False)[source]
Process two catalogs, accumulating the 3pt cross-correlation, where one of the points in each triangle comes from the first catalog, and two come from the second.
This accumulates the cross-correlation for the given catalogs as part of a larger auto- or cross-correlation calculation. E.g. when splitting up a large catalog into patches, this is appropriate to use for the cross correlation between different patches as part of the complete auto-correlation of the full catalog.
This method is only valid for classes that have the same type of value in vertices 2 and 3. (E.g. KKK, KGG, NKK)
- Parameters:
cat1 (Catalog) – The first catalog to process. (1 point in each triangle will come from this catalog.)
cat2 (Catalog) – The second catalog to process. (2 points in each triangle will come from this catalog.)
metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)
ordered (bool) – Whether to fix the order of the triangle vertices to match the catalogs. (default: True)
num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)
corr_only (bool) – Whether to skip summing quantities that are not essential for computing the correlation function. (default: False)
- process_cross21(cat1, cat2, *, metric=None, ordered=True, num_threads=None, corr_only=False)[source]
Process two catalogs, accumulating the 3pt cross-correlation, where two of the points in each triangle come from the first catalog, and one comes from the second.
This accumulates the cross-correlation for the given catalogs as part of a larger auto- or cross-correlation calculation. E.g. when splitting up a large catalog into patches, this is appropriate to use for the cross correlation between different patches as part of the complete auto-correlation of the full catalog.
This method is only valid for classes that have the same type of value in vertices 1 and 2. (E.g. KKK, KKG, NNK)
- Parameters:
cat1 (Catalog) – The first catalog to process. (2 points in each triangle will come from this catalog.)
cat2 (Catalog) – The second catalog to process. (1 point in each triangle will come from this catalog.)
metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)
ordered (bool) – Whether to fix the order of the triangle vertices to match the catalogs. (default: True)
num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)
corr_only (bool) – Whether to skip summing quantities that are not essential for computing the correlation function. (default: False)
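To illustrate how these lower-level methods fit together, here is a schematic sketch for an NNN calculation split over two hypothetical sub-catalogs. It only shows which method applies to which combination of inputs; process normally orchestrates the complete set of auto and cross calls (and the final finalize) for you.

>>> import treecorr
>>> nnn = treecorr.NNNCorrelation(min_sep=1., max_sep=50., nbins=8, sep_units='arcmin')
>>> cat_a = treecorr.Catalog('part_a.fits', ra_col='RA', dec_col='DEC',
...                          ra_units='deg', dec_units='deg')
>>> cat_b = treecorr.Catalog('part_b.fits', ra_col='RA', dec_col='DEC',
...                          ra_units='deg', dec_units='deg')
>>> nnn.process_auto(cat_a)               # triangles entirely within cat_a
>>> nnn.process_auto(cat_b)               # triangles entirely within cat_b
>>> nnn.process_cross12(cat_a, cat_b)     # one point from cat_a, two from cat_b
>>> nnn.process_cross21(cat_a, cat_b)     # two points from cat_a, one from cat_b
>>> nnn.finalize()                        # apply units and finish the mean quantities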
- read(file_name, *, file_type=None)[source]
Read in values from a file.
This should be a file that was written by TreeCorr, preferably a FITS or HDF5 file, so there is no loss of information.
Warning
The current object should be constructed with the same configuration parameters as the one being read. e.g. the same min_sep, max_sep, etc. This is not checked by the read function. For most use cases, you should prefer from_file, which will automatically construct the object with the correct configuration parameters given the information in the file.

- Parameters:
file_name (str) – The name of the file to read in.
file_type (str) – The type of file (‘ASCII’ or ‘FITS’). (default: determine the type automatically from the extension of file_name.)
- toSAS(*, target=None, **kwargs)[source]
Convert a multipole-binned correlation to the corresponding SAS binning.
This is only valid for bin_type == LogMultipole.
- Keyword Arguments:
target – A target Correlation object with LogSAS binning to write to. If this is not given, a new object will be created based on the configuration parameters of the current object. (default: None)
**kwargs – Any kwargs that you want to use to configure the returned object. Typically, might include min_phi, max_phi, nphi_bins, phi_bin_size. The default phi binning is [0,pi] with nphi_bins = self.max_n.
- Returns:
An object with bin_type=LogSAS containing the same information as this object, but with the SAS binning.
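A sketch of typical usage, computing with the multipole binning and then converting to SAS binning (hypothetical file and column names):

>>> import numpy as np
>>> import treecorr
>>> cat = treecorr.Catalog('kappa.fits', ra_col='RA', dec_col='DEC', k_col='KAPPA',
...                        ra_units='deg', dec_units='deg')
>>> kkk_m = treecorr.KKKCorrelation(min_sep=1., max_sep=50., nbins=8, max_n=40,
...                                 sep_units='arcmin', bin_type='LogMultipole')
>>> kkk_m.process(cat)
>>> kkk_sas = kkk_m.toSAS(min_phi=0.1, max_phi=np.pi-0.1, nphi_bins=20)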
- write(file_name, *, file_type=None, precision=None, write_patch_results=False, write_cov=False)[source]
Write the correlation function to the file, file_name.
For bin_type = LogRUV, the output file will include the following columns:
r_nom – The nominal center of the bin in r = d2, where d1 > d2 > d3
u_nom – The nominal center of the bin in u = d3/d2
v_nom – The nominal center of the bin in v = +-(d1-d2)/d3
meanu – The mean value \(\langle u\rangle\) of triangles that fell into each bin
meanv – The mean value \(\langle v\rangle\) of triangles that fell into each bin
For bin_type = LogSAS, the output file will include the following columns:
d2_nom – The nominal center of the bin in d2
d3_nom – The nominal center of the bin in d3
phi_nom – The nominal center of the bin in phi, the opening angle between d2 and d3 in the counter-clockwise direction
meanphi – The mean value \(\langle \phi\rangle\) of triangles that fell into each bin
For bin_type = LogMultipole, the output file will include the following columns:
d2_nom – The nominal center of the bin in d2
d3_nom – The nominal center of the bin in d3
n – The multipole index n
In addition, all bin types include the following columns:
meand1 – The mean value \(\langle d1\rangle\) of triangles that fell into each bin
meanlogd1 – The mean value \(\langle \log(d1)\rangle\) of triangles that fell into each bin
meand2 – The mean value \(\langle d2\rangle\) of triangles that fell into each bin
meanlogd2 – The mean value \(\langle \log(d2)\rangle\) of triangles that fell into each bin
meand3 – The mean value \(\langle d3\rangle\) of triangles that fell into each bin
meanlogd3 – The mean value \(\langle \log(d3)\rangle\) of triangles that fell into each bin
weight – The total weight of triangles contributing to each bin. (For LogMultipole, this is split into real and imaginary parts, weightr and weighti.)
ntri – The number of triangles contributing to each bin
If sep_units was given at construction, then the distances will all be in these units. Otherwise, they will be in either the same units as x,y,z (for flat or 3-d coordinates) or radians (for spherical coordinates).

- Parameters:
file_name (str) – The name of the file to write to.
file_type (str) – The type of file to write (‘ASCII’ or ‘FITS’). (default: determine the type automatically from the extension of file_name.)
precision (int) – For ASCII output catalogs, the desired precision. (default: 4; this value can also be given in the constructor in the config dict.)
write_patch_results (bool) – Whether to write the patch-based results as well. (default: False)
write_cov (bool) – Whether to write the covariance matrix as well. (default: False)
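A sketch of writing results to a file and reading them back later (hypothetical file and column names):

>>> import treecorr
>>> cat = treecorr.Catalog('kappa.fits', ra_col='RA', dec_col='DEC', k_col='KAPPA',
...                        ra_units='deg', dec_units='deg')
>>> kkk = treecorr.KKKCorrelation(min_sep=1., max_sep=50., nbins=8, sep_units='arcmin')
>>> kkk.process(cat)
>>> kkk.write('kkk.fits')                                  # FITS inferred from the extension
>>> kkk2 = treecorr.KKKCorrelation.from_file('kkk.fits')   # later: reconstruct from the file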