Two-point Correlation Functions

TreeCorr can compute two-point correlations for several different kinds of fields. For notational brevity in the various classes involved, we use a single letter to represent each kind of field as follows:

N represents simple counting statistics. The underlying field is a density of objects, which is manifested by objects appearing at specific locations. The assumption here is that the probability of an object occurring at a specific location is proportional to the underlying density field at that spot.
K represents a real, scalar field. Nominally, K is short for “kappa”, since TreeCorr was originally written for weak lensing applications, and kappa is the name of the weak lensing convergence, a measure of the projected matter density along the line of sight.
Z represents a complex, spin-0 scalar field. This is mostly for API consistency, since we have several other complex fields with different spin properties. Spin-0 fields don’t change their complex value when the orientation changes.
V represents a complex, spin-1 vector field. Spin-1 means that the complex value changes by \(\exp(i \phi)\) when the orientation is rotated by an angle \(\phi\). This kind of field is appropriate for normal vectors with direction, like velocity fields.
G represents a complex, spin-2 shear field. Spin-2 means that the complex value changes by \(\exp(2i \phi)\) when the orientation is rotated by an angle \(\phi\). The letter g is commonly used for reduced shear in the weak lensing context (and \(\gamma\) is the unreduced shear), which is a spin-2 field, hence our use of G for spin-2 fields in Treecorr.
T represents a complex, spin-3 field. Spin-3 means that the complex value changes by \(\exp(3i \phi)\) when the orientation is rotated by an angle \(\phi\). The letter T is short for trefoil, a shape with spin-3 rotational properties.
Q represents a complex, spin-4 field. Spin-4 means that the complex value changes by \(\exp(4i \phi)\) when the orientation is rotated by an angle \(\phi\). The letter Q is short for quatrefoil, a shape with spin-4 rotational properties.

Not all possible pairings of two fields are currently implemented. The following lists all the currently implemented classes for computing two-point correlations. If you have need of a pairing not listed here, please file in issue asking for it. It’s not hard to add more, but I didn’t want to implement a bunch of classes that no one will use.

Each of the above classes is a sub-class of the base class Corr2, so they have a number of features in common about how they are constructed. The common features are documented here.

class treecorr.Corr2(config=None, *, logger=None, rng=None, **kwargs)[source]

This class stores the results of a 2-point correlation calculation, along with some ancillary data.

This is a base class that is not intended to be constructed directly. But it has a few helper functions that derived classes can use to help perform their calculations. See the derived classes for more details:

NNCorrelation handles count-count correlation functions
NKCorrelation handles count-scalar correlation functions
KKCorrelation handles scalar-scalar correlation functions
NZCorrelation handles count-complex correlation functions
KZCorrelation handles scalar-complex correlation functions
ZZCorrelation handles complex-complex correlation functions
NVCorrelation handles count-vector correlation functions
KVCorrelation handles scalar-vector correlation functions
VVCorrelation handles vector-vector correlation functions
NGCorrelation handles count-shear correlation functions
KGCorrelation handles scalar-shear correlation functions
GGCorrelation handles shear-shear correlation functions
NTCorrelation handles count-trefoil correlation functions
KTCorrelation handles scalar-trefoil correlation functions
TTCorrelation handles trefoil-trefoil correlation functions
NQCorrelation handles count-quatrefoil correlation functions
KQCorrelation handles scalar-quatrefoil correlation functions
QQCorrelation handles quatrefoil-quatrefoil correlation functions

Note

TreeCorr was originally designed for weak lensing applications, so the K letter for scalar quantities nominally refers to the weak lesing kappa field. But in fact any scalar quantity may be used here. (CMB temperature fluctuations for example.)

The constructor for all derived classes take a config dict as the first argument, since this is often how we keep track of parameters, but if you don’t want to use one or if you want to change some parameters from what are in a config dict, then you can use normal kwargs, which take precedence over anything in the config dict.

There are a number of possible definitions for the distance between two points, which are appropriate for different use cases. These are specified by the metric parameter. The possible options are:

‘Euclidean’ = straight line Euclidean distance between two points. For spherical coordinates (ra,dec without r), this is the chord distance between points on the unit sphere.

‘FisherRperp’ = the perpendicular component of the distance, following the definitions in Fisher et al, 1994 (MNRAS, 267, 927).

‘OldRperp’ = the perpendicular component of the distance using the definition of Rperp from TreeCorr v3.x.

‘Rperp’ = an alias for FisherRperp. You can change it to be an alias for OldRperp if you want by setting treecorr.Rperp_alias = 'OldRperp' before using it.

‘Rlens’ = the distance from the first object (taken to be a lens) to the line connecting Earth and the second object (taken to be a lensed source).

‘Arc’ = the true great circle distance for spherical coordinates.

‘Periodic’ = Like Euclidean, but with periodic boundaries.

See Metrics for more information about these various metric options.

There are also a few different possibile binning prescriptions to define the range of distances, which should be placed into each bin.

‘Log’ - logarithmic binning in the distance. The bin steps will be uniform in log(r) from log(min_sep) .. log(max_sep).

‘Linear’ - linear binning in the distance. The bin steps will be uniform in r from min_sep .. max_sep.

‘TwoD’ = 2-dimensional binning from x = (-max_sep .. max_sep) and y = (-max_sep .. max_sep). The bin steps will be uniform in both x and y. (i.e. linear in x,y)

See Binning for more information about the different binning options.

Objects of any Corr2 subclass hold the following attributes:

Attributes:

nbins – The number of bins in logr
bin_size – The size of the bins in logr
min_sep – The minimum separation being considered
max_sep – The maximum separation being considered

In addition, the following attributes are numpy arrays of length (nbins):

Attributes:

logr – The nominal center of the bin in log(r) (the natural logarithm of r).
rnom – The nominal center of the bin converted to regular distance. i.e. r = exp(logr).
meanr – The (weighted) mean value of r for the pairs in each bin. If there are no pairs in a bin, then exp(logr) will be used instead.
meanlogr – The mean value of log(r) for the pairs in each bin. If there are no pairs in a bin, then logr will be used instead.
weight – The total weight in each bin.
npairs – The number of pairs going into each bin (including pairs where one or both objects have w=0).

If sep_units are given (either in the config dict or as a named kwarg) then the distances will all be in these units.

Note

If you separate out the steps of the process command and use process_auto and/or process_cross, then the units will not be applied to meanr or meanlogr until the finalize function is called.

Also, if you use the corr_only=True option to Corr2.process, then meanr and meanlogr are not computed, so they will be set to the nominal values for the bins. Similarly, npairs will be set to the weight for each bin divided by the mean weight of points in the catalog, rather than the actual number of pairs for each bin.

Parameters:

config (dict) – A configuration dict that can be used to pass in the below kwargs if desired. This dict is allowed to have addition entries in addition to those listed below, which are ignored here. (default: None)
logger – If desired, a logger object for logging. (default: None, in which case one will be built according to the config dict’s verbose level.)

Keyword Arguments:

nbins (int) – How many bins to use. (Exactly three of nbins, bin_size, min_sep, max_sep are required. If nbins is not given or set to None, it will be calculated from the values of the other three, rounding up to the next highest integer. In this case, bin_size will be readjusted to account for this rounding up.)
bin_size (float) – The width of the bins in log(separation). (Exactly three of nbins, bin_size, min_sep, max_sep are required. If bin_size is not given or set to None, it will be calculated from the values of the other three.)
min_sep (float) – The minimum separation in units of sep_units, if relevant. (Exactly three of nbins, bin_size, min_sep, max_sep are required. If min_sep is not given or set to None, it will be calculated from the values of the other three.)
max_sep (float) – The maximum separation in units of sep_units, if relevant. (Exactly three of nbins, bin_size, min_sep, max_sep are required. If max_sep is not given or set to None, it will be calculated from the values of the other three.)
sep_units (str) – The units to use for the separation values, given as a string. This includes both min_sep and max_sep above, as well as the units of the output distance values. Valid options are arcsec, arcmin, degrees, hours, radians. (default: radians if angular units make sense, but for 3-d or flat 2-d positions, the default will just match the units of x,y[,z] coordinates)
bin_slop (float) – How much slop to allow in the placement of pairs in the bins. If bin_slop = 1, then the bin into which a particular pair is placed may be incorrect by at most 1.0 bin widths. (default: None, which means to use a bin_slop that gives a maximum error of 10% on any bin, which has been found to yield good results for most application.
angle_slop (float) – How much slop to allow in the angular direction. This works very similarly to bin_slop, but applies to the projection angle of a pair of cells. The projection angle for any two objects in a pair of cells will differ by no more than angle_slop radians from the projection angle defined by the centers of the cells. (default: 0.1)
brute (bool) –
Whether to use the “brute force” algorithm. (default: False) Options are:
- False (the default): Stop at non-leaf cells whenever the error in the separation is compatible with the given bin_slop and angle_slop.
- True: Go to the leaves for both catalogs.
- 1: Always go to the leaves for cat1, but stop at non-leaf cells of cat2 when the error is compatible with the given slop values.
- 2: Always go to the leaves for cat2, but stop at non-leaf cells of cat1 when the error is compatible with the given slop values.
verbose (int) –
If no logger is provided, this will optionally specify a logging level to use:
- 0 means no logging output
- 1 means to output warnings only (default)
- 2 means to output various progress information
- 3 means to output extensive debugging information
log_file (str) – If no logger is provided, this will specify a file to write the logging output. (default: None; i.e. output to standard output)
output_dots (bool) – Whether to output progress dots during the calcualtion of the correlation function. (default: False unless verbose is given and >= 2, in which case True)
split_method (str) –
How to split the cells in the tree when building the tree structure. Options are:
- mean = Use the arithmetic mean of the coordinate being split. (default)
- median = Use the median of the coordinate being split.
- middle = Use the middle of the range; i.e. the average of the minimum and maximum value.
- random: Use a random point somewhere in the middle two quartiles of the range.
min_top (int) – The minimum number of top layers to use when setting up the field. (default: \(\max(3, \log_2(N_{\rm cpu}))\))
max_top (int) – The maximum number of top layers to use when setting up the field. The top-level cells are where each calculation job starts. There will typically be of order \(2^{\rm max\_top}\) top-level cells. (default: 10)
precision (int) – The precision to use for the output values. This specifies how many digits to write. (default: 4)
m2_uform (str) – The default functional form to use for aperture mass calculations. see calculateMapSq for more details. (default: ‘Crittenden’)
metric (str) – Which metric to use for distance measurements. Options are listed above. (default: ‘Euclidean’)
bin_type (str) – What type of binning should be used. Options are listed above. (default: ‘Log’)
min_rpar (float) – The minimum difference in Rparallel to allow for pairs being included in the correlation function. (default: None)
max_rpar (float) – The maximum difference in Rparallel to allow for pairs being included in the correlation function. (default: None)
period (float) – For the ‘Periodic’ metric, the period to use in all directions. (default: None)
xperiod (float) – For the ‘Periodic’ metric, the period to use in the x direction. (default: period)
yperiod (float) – For the ‘Periodic’ metric, the period to use in the y direction. (default: period)
zperiod (float) – For the ‘Periodic’ metric, the period to use in the z direction. (default: period)
var_method (str) – Which method to use for estimating the variance. Options are: ‘shot’, ‘jackknife’, ‘sample’, ‘bootstrap’, ‘marked_bootstrap’. (default: ‘shot’)
num_bootstrap (int) – How many bootstrap samples to use for the ‘bootstrap’ and ‘marked_bootstrap’ var_methods. (default: 500)
cross_patch_weight (str) – How to weight pairs that cross between two patches when one patch is deselected (e.g. in a jackknife sense) and the other is selected. (default None)
rng (RandomState) – If desired, a numpy.random.RandomState instance to use for bootstrap random number generation. (default: None)
num_threads (int) –
How many OpenMP threads to use during the calculation. (default: use the number of cpu cores)

Note

This won’t work if the system’s C compiler cannot use OpenMP (e.g. clang prior to version 3.7.)

build_cov_design_matrix(method, *, func=None, comm=None, num_bootstrap=None, cross_patch_weight=None)[source]

Build the design matrix that is used for estimating the covariance matrix.

The design matrix for patch-based covariance estimates is a matrix where each row corresponds to a different estimate of the data vector, \(\xi_i\) (or \(f(\xi_i)\) if using the optional func parameter).

The different of rows in the matrix for each valid method are:

‘shot’: This method is not valid here.

‘jackknife’: The data vector when excluding a single patch.

‘sample’: The data vector using only a single patch for the first catalog.

‘bootstrap’: The data vector for a random resampling of the patches keeping the sample total number, but allowing some to repeat. Cross terms from repeated patches are excluded (since they are really auto terms).

‘marked_bootstrap’: The data vector for a random resampling of patches in the first catalog, using all patches for the second catalog. Based on the algorithm in Loh(2008).

See estimate_cov for more details.

The return value includes both the design matrix and a vector of weights (the total weight array in the computed correlation functions). The weights are used for the sample method when estimating the covariance matrix. The other methods ignore them, but they are provided here in case they are useful.

Parameters:

method (str) – Which method to use to estimate the covariance matrix.
func (function) – A unary function that takes the list corrs and returns the desired full data vector. (default: None, which is equivalent to lambda corrs: np.concatenate([c.getStat() for c in corrs]))
comm (mpi4py.Comm) – If using MPI, an mpi4py Comm object to communicate between processes. (default: None)
num_bootstrap (int) – How many bootstrap samples to use for the ‘bootstrap’ and ‘marked_bootstrap’ var_methods. (default: 500; this value can also be given in the constructor.)
cross_patch_weight (str) – How to weight pairs that cross between two patches when one patch is deselected (e.g. in a jackknife sense) and the other is selected. (default ‘simple’; this value can also be given in the constructor.)

Returns:

(A, w), numpy arrays with the design matrix and weights respectively.

clear()[source]: Clear all data vectors, the results dict, and any related values.

copy()[source]: Make a copy

property cov: The estimated covariance matrix

property cov_diag

A possibly more efficient way to access just the diagonal of the covariance matrix.

If var_method == ‘shot’, then this won’t make the full covariance matrix, just to then pull out the diagonal.

estimate_cov(method, *, func=None, comm=None, num_bootstrap=None, cross_patch_weight=None)[source]

Estimate the covariance matrix based on the data

This function will calculate an estimate of the covariance matrix according to the given method.

Options for method include:

‘shot’ = The variance based on “shot noise” only. This includes the Poisson counts of points for N statistics, shape noise for G statistics, and the observed scatter in the values for K statistics. In this case, the returned value will only be the diagonal. Use np.diagonal(cov) if you actually want a full matrix from this.

‘jackknife’ = A jackknife estimate of the covariance matrix based on the scatter in the measurement when excluding one patch at a time.

‘sample’ = An estimate based on the sample covariance of a set of samples, taken as the patches of the input catalog.

‘bootstrap’ = A bootstrap covariance estimate. It selects patches at random with replacement and then generates the statistic using all the auto-correlations at their selected repetition plus all the cross terms that aren’t actually auto terms.

‘marked_bootstrap’ = An estimate based on a marked-point bootstrap resampling of the patches. Similar to bootstrap, but only samples the patches of the first catalog and uses all patches from the second catalog that correspond to each patch selection of the first catalog. Based on the algorithm presented in Loh (2008). cf. https://ui.adsabs.harvard.edu/abs/2008ApJ…681..726L/

Both ‘bootstrap’ and ‘marked_bootstrap’ use the num_bootstrap parameter, which can be set on construction.

Another relevant parameter is ‘cross_patch_weight’. This parameter controls how pairs that cross between two patches are weighted when one of the patches is selected and the other is deselected. See Mohammad and Percival (2022) (MP22 hereafter) for an in-depth discussion of these options. The parameter options mostly correspond to the notation used in that paper. Briefly the options are:

‘simple’ = Don’t use any pairs where either object is in a deselected patch. This is the simplest implementation of the methods and is what TreeCorr always did prior to version 5.1. In the language of MP22, this is called mult, since deselected patches have weight=0, and selected patches have weight=1, so the product of the weights is 0 if either patch is deselected. This option is valid for all patch-based methods, and it is currently still the default. This may change in a future version.

‘mean’ = Use a weight of 0.5 for any pair with one object in a deselected patch and the other in a selected patch. This is the mean of the two patch weights (0,1). This option is valid for all patch-based methods.

‘geom’ = Use the geometric mean of the two patch weights for each pair. For jackknife, this would be equivalent to ‘simple’, since patch weights are either 0 or 1. However, for bootstrap, individual patches can have weights greater than 1, so this leads to a different weight for pairs that cross between two selected patches. This option is only valid for bootstrap.

‘match’ = Use the “optimal” weight that matches the effect of auto- and cross-pairs in the estimate of the jackknife covariance. MP22 calculate this for the to be w = 1 - n_patch / (2 + sqrt(2) (n_patch-1)). This option is only valid for jackknife.

Note

For most classes, there is only a single statistic, so this calculates a covariance matrix for that vector. GGCorrelation other complex auto-correaltions have two: xip and xim, so in this case the full data vector is xip followed by xim, and this calculates the covariance matrix for that full vector including both statistics. The helper function getStat returns the relevant statistic in all cases.

In all cases, the relevant processing needs to already have been completed and finalized. And for all methods other than ‘shot’, the processing should have involved an appropriate number of patches – preferably more patches than the length of the vector for your statistic, although this is not checked.

The default data vector to use for the covariance matrix is given by the method getStat. As noted above, this is usually just self.xi. However, there is an option to compute the covariance of some other function of the correlation object by providing an arbitrary function, func, which should act on the current correlation object and return the data vector of interest.

For instance, for an NGCorrelation, you might want to compute the covariance of the imaginary part, ng.xi_im, rather than the real part. In this case you could use

>>> func = lambda ng: ng.xi_im

The return value from this func should be a single numpy array. (This is not directly checked, but you’ll probably get some kind of exception if it doesn’t behave as expected.)

Note

The optional func parameter is not valid in conjunction with method='shot'. It only works for the methods that are based on patch combinations.

This function can be parallelized by passing the comm argument as an mpi4py communicator to parallelize using that. For MPI, all processes should have the same inputs. If method == “shot” then parallelization has no effect.

Parameters:

method (str) – Which method to use to estimate the covariance matrix.
func (function) – A unary function that acts on the current correlation object and returns the desired data vector. (default: None, which is equivalent to lambda corr: corr.getStat().)
comm (mpi4py.Comm) – If using MPI, an mpi4py Comm object to communicate between processes. (default: None)
num_bootstrap (int) – How many bootstrap samples to use for the ‘bootstrap’ and ‘marked_bootstrap’ var_methods. (default: 500; this value can also be given in the constructor.)
cross_patch_weight (str) – How to weight pairs that cross between two patches when one patch is deselected (e.g. in a jackknife sense) and the other is selected. (default ‘simple’; this value can also be given in the constructor.)

Returns:

A numpy array with the estimated covariance matrix.

classmethod from_file(file_name, *, file_type=None, logger=None, rng=None)[source]

Create a new instance from an output file.

This should be a file that was written by TreeCorr.

Note

This classmethod may be called either using the base class or the class type that wrote the file. E.g. if the file was written by GGCorrelation, then either of the following would work and be equivalent:

>>> gg = treecorr.GGCorrelation.from_file(file_name)
>>> gg = treecorr.Corr2.from_file(file_name)

Parameters:

file_name (str) – The name of the file to read in.
file_type (str) – The type of file (‘ASCII’, ‘FITS’, or ‘HDF’). (default: determine the type automatically from the extension of file_name.)
logger (Logger) – If desired, a logger object to use for logging. (default: None)
rng (RandomState) – If desired, a numpy.random.RandomState instance to use for bootstrap random number generation. (default: None)

Returns:

A Correlation object, constructed from the information in the file.

getStat()[source]

The standard statistic for the current correlation object as a 1-d array.

Usually, this is just self.xi. But if the metric is TwoD, this becomes self.xi.ravel(). And for GGCorrelation (and other complex auto-correlations), it is the concatenation of self.xip and self.xim.

getWeight()[source]

The weight array for the current correlation object as a 1-d array.

This is the weight array corresponding to getStat. Usually just self.weight, but raveled for TwoD and duplicated for GGCorrelation, etc. to match what getStat does in those cases.

property nonzero: Return if there are any values accumulated yet. (i.e. npairs > 0)

process(cat1, cat2=None, metric=None, num_threads=None, comm=None, low_mem=False, initialize=True, finalize=True, patch_method='global', corr_only=False)[source]

Compute the correlation function.

If only 1 argument is given, then compute an auto-correlation function.
If 2 arguments are given, then compute a cross-correlation function.

Both catalog arguments may be lists, in which case all items in the list are used for that element of the correlation.

In addition to computing the correlation function, this function also computes a few ancillary quantities that are useful for interpreting the resulting correlation function, including the attributes meanr, meanlogr and npairs. For most use cases these calculation impart negligible overhead to the calculation time. The exception is NNCorrelation, where they can result in somthing like 20-30% overhead in the compute time. So if you want to optimize the efficiency of the calculation, we provide the option corr_only=True, which skips these computations. In this case the resulting meanr and meanlogr attributes are the nominal centers of the bins, not the actual mean values. And npairs is estimated from the total computed weight and the mean weight in the catalogs.

Parameters:

cat1 (Catalog) – A catalog or list of catalogs for the first field.
cat2 (Catalog) – A catalog or list of catalogs for the second field, if any. (default: None)
metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)
num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)
comm (mpi4py.Comm) – If running MPI, an mpi4py Comm object to communicate between processes. If used, the rank=0 process will have the final computation. This only works if using patches. (default: None)
low_mem (bool) – Whether to sacrifice a little speed to try to reduce memory usage. This only works if using patches. (default: False)
initialize (bool) – Whether to begin the calculation with a call to Corr2.clear. (default: True)
finalize (bool) – Whether to complete the calculation with a call to finalize. (default: True)
patch_method (str) – Which patch method to use. (default: ‘global’)
corr_only (bool) – Whether to skip summing quantities that are not essential for computing the correlation function. (default: False; see above)

process_auto(cat, *, metric=None, num_threads=None, corr_only=False)[source]

Process a single catalog, accumulating the auto-correlation.

This accumulates the weighted sums into the bins, but does not finalize the calculation by dividing by the total weight at the end. After calling this function as often as desired, the finalize command will finish the calculation.

Parameters:

cat (Catalog) – The catalog to process
metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)
num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)
corr_only (bool) – Whether to skip summing quantities that are not essential for computing the correlation function. (default: False)

process_cross(cat1, cat2, *, metric=None, num_threads=None, corr_only=False)[source]

Process a single pair of catalogs, accumulating the cross-correlation.

This accumulates the weighted sums into the bins, but does not finalize the calculation by dividing by the total weight at the end. After calling this function as often as desired, the finalize command will finish the calculation.

Parameters:

cat1 (Catalog) – The first catalog to process
cat2 (Catalog) – The second catalog to process
metric (str) – Which metric to use. See Metrics for details. (default: ‘Euclidean’; this value can also be given in the constructor in the config dict.)
num_threads (int) – How many OpenMP threads to use during the calculation. (default: use the number of cpu cores; this value can also be given in the constructor in the config dict.)
corr_only (bool) – Whether to skip summing quantities that are not essential for computing the correlation function. (default: False)

read(file_name, *, file_type=None)[source]

Read in values from a file.

This should be a file that was written by TreeCorr, preferably a FITS or HDF5 file, so there is no loss of information.

Warning

The current object should be constructed with the same configuration parameters as the one being read. e.g. the same min_sep, max_sep, etc. This is not checked by the read function. For most use cases, you should prefer from_file, which will automatically construct the object with the correct configuration parameters given the information in the file.

Parameters:

file_name (str) – The name of the file to read in.
file_type (str) – The type of file (‘ASCII’ or ‘FITS’). (default: determine the type automatically from the extension of file_name.)

sample_pairs(n, cat1, cat2, *, min_sep, max_sep, metric=None)[source]

Return a random sample of n pairs whose separations fall between min_sep and max_sep.

This would typically be used to get some random subset of the indices of pairs that fell into a particular bin of the correlation. E.g. to get 100 pairs from the third bin of a Corr2 instance, corr, you could write:

>>> min_sep = corr.left_edges[2]   # third bin has i=2
>>> max_sep = corr.right_edges[2]
>>> i1, i2, sep = corr.sample_pairs(100, cat1, cat2, min_sep, max_sep)

The min_sep and max_sep should use the same units as were defined when constructing the corr instance.

The selection process will also use the same bin_slop as specified (either explicitly or implicitly) when constructing the corr instance. This means that some of the pairs may have actual separations slightly outside of the specified range. If you want a selection using an exact range without any slop, you should construct a new Correlation instance with bin_slop=0, and call sample_pairs with that.

The returned separations will likewise correspond to the separation of the cells in the tree that TreeCorr used to place the pairs into the given bin. Therefore, if these cells were not leaf cells, then they will not typically be equal to the real separations for the given metric. If you care about the exact separations for each pair, you should either call sample_pairs from a Correlation instance with brute=True or recalculate the distances yourself from the original data.

Also, note that min_sep and max_sep may be arbitrary. There is no requirement that they be edges of one of the standard bins for this correlation function. There is also no requirement that this correlation instance has already accumulated pairs via a call to process with these catalogs.

Parameters:

n (int) – How many samples to return.
cat1 (Catalog) – The catalog from which to sample the first object of each pair.
cat2 (Catalog) – The catalog from which to sample the second object of each pair. (This may be the same as cat1.)
min_sep (float) – The minimum separation for the returned pairs (modulo some slop allowed by the bin_slop parameter). (Note: keyword name is required for this parameter: min_sep=min_sep)
max_sep (float) – The maximum separation for the returned pairs (modulo some slop allowed by the bin_slop parameter). (Note: keyword name is required for this parameter: max_sep=max_sep)
metric (str) – Which metric to use. See Metrics for details. (default: self.metric, or ‘Euclidean’ if not set yet)

Returns:

Tuple containing

i1 (array): indices of objects from cat1

i2 (array): indices of objects from cat2

sep (array): separations of the pairs of objects (i1,i2)

treecorr.estimate_multi_cov(corrs, method, *, func=None, comm=None, num_bootstrap=None, cross_patch_weight=None)[source]

Estimate the covariance matrix of multiple statistics.

This is like the method Corr2.estimate_cov, except that it will acoommodate multiple statistics from a list corrs of Corr2 objects.

Options for method include:

‘shot’ = The variance based on “shot noise” only. This includes the Poisson counts of points for N statistics, shape noise for G statistics, and the observed scatter in the values for K statistics. In this case, the returned value will only be the diagonal. Use np.diagonal(cov) if you actually want a full matrix from this.

‘jackknife’ = A jackknife estimate of the covariance matrix based on the scatter in the measurement when excluding one patch at a time.

‘sample’ = An estimate based on the sample covariance of a set of samples, taken as the patches of the input catalog.

‘bootstrap’ = A bootstrap covariance estimate. It selects patches at random with replacement and then generates the statistic using all the auto-correlations at their selected repetition plus all the cross terms that aren’t actually auto terms.

‘marked_bootstrap’ = An estimate based on a marked-point bootstrap resampling of the patches. Similar to bootstrap, but only samples the patches of the first catalog and uses all patches from the second catalog that correspond to each patch selection of the first catalog. Based on the algorithm presented in Loh (2008). cf. https://ui.adsabs.harvard.edu/abs/2008ApJ…681..726L/

Both ‘bootstrap’ and ‘marked_bootstrap’ use the num_bootstrap parameter to set the number of bootstrap realizations to be used.

Another relevant parameter is ‘cross_patch_weight’. This parameter controls how pairs that cross between two patches are weighted when one of the patches is selected and the other is deselected. See Mohammad and Percival (2022) (MP22 hereafter) for an in-depth discussion of these options. The parameter options mostly correspond to the notation used in that paper. Briefly the options are:

‘simple’ = Don’t use any pairs where either object is in a deselected patch. This is the simplest implementation of the methods and is what TreeCorr always did prior to version 5.1. In the language of MP22, this is called mult, since deselected patches have weight=0, and selected patches have weight=1, so the product of the weights is 0 if either patch is deselected. This option is valid for all patch-based methods, and it is currently still the default. This may change in a future version.

‘mean’ = Use a weight of 0.5 for any pair with one object in a deselected patch and the other in a selected patch. This is the mean of the two patch weights (0,1). This option is valid for all patch-based methods.

‘geom’ = Use the geometric mean of the two patch weights for each pair. For jackknife, this would be equivalent to ‘simple’, since patch weights are either 0 or 1. However, for bootstrap, individual patches can have weights greater than 1, so this leads to a different weight for pairs that cross between two selected patches. This option is only valid for bootstrap.

‘match’ = Use the “optimal” weight that matches the effect of auto- and cross-pairs in the estimate of the jackknife covariance. MP22 calculate this for the to be w = 1 - n_patch / (2 + sqrt(2) (n_patch-1)). This option is only valid for jackknife.

For example, to find the combined covariance matrix for an NG tangential shear statistc, along with the GG xi+ and xi- from the same area, using jackknife covariance estimation, you would write:

>>> cov = treecorr.estimate_multi_cov([ng,gg], method='jackknife')

In all cases, the relevant processing needs to already have been completed and finalized. And for all methods other than ‘shot’, the processing should have involved an appropriate number of patches – preferably more patches than the length of the vector for your statistic, although this is not checked.

The default order of the covariance matrix is to simply concatenate the data vectors for each corr in the list corrs. However, if you want to do something more complicated, you may provide an arbitrary function, func, which should act on the list of correlations. For instance, if you have several GGCorrelation objects and would like to order the covariance such that all xi+ results come first, and then all xi- results, you could use

>>> func = lambda corrs: np.concatenate([c.xip for c in corrs] + [c.xim for c in corrs])

Or if you want to compute the covariance matrix of some derived quantity like the ratio of two correlations, you could use

>>> func = lambda corrs: corrs[0].xi / corrs[1].xi

This function can be parallelized by passing the comm argument as an mpi4py communicator to parallelize using that. For MPI, all processes should have the same inputs. If method == “shot” then parallelization has no effect.

The return value from this func should be a single numpy array. (This is not directly checked, but you’ll probably get some kind of exception if it doesn’t behave as expected.)

Note

The optional func parameter is not valid in conjunction with method='shot'. It only works for the methods that are based on patch combinations.

Parameters:

corrs (list) – A list of Corr2 instances.
method (str) – Which method to use to estimate the covariance matrix.
func (function) – A unary function that takes the list corrs and returns the desired full data vector. (default: None, which is equivalent to lambda corrs: np.concatenate([c.getStat() for c in corrs]))
comm (mpi4py.Comm) – If using MPI, an mpi4py Comm object to communicate between processes. (default: None)
num_bootstrap (int) – How many bootstrap samples to use for the ‘bootstrap’ and ‘marked_bootstrap’ var_methods. (default: 500)
cross_patch_weight (str) – How to weight pairs that cross between two patches when one patch is deselected (e.g. in a jackknife sense) and the other is selected. (default ‘simple’)

Returns:

A numpy array with the estimated covariance matrix.

treecorr.build_multi_cov_design_matrix(corrs, method, *, func=None, comm=None, num_bootstrap=None, cross_patch_weight=None)[source]

Build the design matrix that is used for estimating the covariance matrix.

The design matrix for patch-based covariance estimates is a matrix where each row corresponds to a different estimate of the data vector, \(\xi_i\) (or \(f(\xi_i)\) if using the optional func parameter).

The different of rows in the matrix for each valid method are:

‘shot’: This method is not valid here.

‘jackknife’: The data vector when excluding a single patch.

‘sample’: The data vector using only a single patch for the first catalog.

‘bootstrap’: The data vector for a random resampling of the patches keeping the sample total number, but allowing some to repeat. Cross terms from repeated patches are excluded (since they are really auto terms).

‘marked_bootstrap’: The data vector for a random resampling of patches in the first catalog, using all patches for the second catalog. Based on the algorithm in Loh(2008).

See estimate_multi_cov for more details.

The return value includes both the design matrix and a vector of weights (the total weight array in the computed correlation functions). The weights are used for the sample method when estimating the covariance matrix. The other methods ignore them, but they are provided here in case they are useful.

Parameters:

corrs (list) – A list of Corr2 instances.
method (str) – Which method to use to estimate the covariance matrix.
func (function) – A unary function that takes the list corrs and returns the desired full data vector. (default: None, which is equivalent to lambda corrs: np.concatenate([c.getStat() for c in corrs]))
comm (mpi4py.Comm) – If using MPI, an mpi4py Comm object to communicate between processes. (default: None)
num_bootstrap (int) – How many bootstrap samples to use for the ‘bootstrap’ and ‘marked_bootstrap’ var_methods. (default: 500)
cross_patch_weight (str) – How to weight pairs that cross between two patches when one patch is deselected (e.g. in a jackknife sense) and the other is selected. (default ‘simple’)

Returns:

(A, w), numpy arrays with the design matrix and weights respectively.

treecorr.set_max_omp_threads(num_threads, logger=None)[source]

Set the maximum allowed number of OpenMP threads to use in the C++ layer in any further TreeCorr functions

Parameters:

num_threads – The target maximum number of threads to allow. None means no limit.
logger – If desired, a logger object for logging any warnings here. (default: None)

treecorr.set_omp_threads(num_threads, logger=None)[source]

Set the number of OpenMP threads to use in the C++ layer.

Parameters:

num_threads – The target number of threads to use
logger – If desired, a logger object for logging any warnings here. (default: None)

Returns:

The number of threads OpenMP reports that it will use. Typically this matches the input, but OpenMP reserves the right not to comply with the requested number of threads.

treecorr.get_omp_threads()[source]

Get the number of OpenMP threads currently set to be used in the C++ layer.

Returns:: The number of threads OpenMP reports that it will use.