Input Data

The Catalog class

class treecorr.Catalog(file_name=None, config=None, *, num=0, logger=None, is_rand=False, x=None, y=None, z=None, ra=None, dec=None, r=None, w=None, wpos=None, flag=None, k=None, z1=None, z2=None, v1=None, v2=None, g1=None, g2=None, t1=None, t2=None, q1=None, q2=None, patch=None, patch_centers=None, rng=None, **kwargs)[source]

A set of input data (positions and other quantities) to be correlated.

A Catalog object keeps track of the relevant information for a number of objects to be correlated. The objects each have some kind of position (for instance (x,y), (ra,dec), (x,y,z), etc.), and possibly some extra information such as weights (w), shear values (g1,g2), scalar values (k), or vector values (v1,v2).

Note

See Shear Conventions for some discussion of the conventions used in TreeCorr for the orientation of the shear values.

The simplest way to build a Catalog is to pass numpy arrays directly for each piece of information you want included. For instance:

>>> cat = treecorr.Catalog(x=x, y=y, k=k, w=w)

Each of these input parameters should be a numpy array, where each corresponding element is the value for that object. Of course, all the arrays should be the same size.

In some cases, there are additional required parameters. For instance, with RA and Dec positions, you need to declare what units the given input values use:

>>> cat = treecorr.Catalog(ra=ra, dec=dec, g1=g1, g2=g2,
...                        ra_units='hour', dec_units='deg')

For (ra,dec) positions, these unit parameters are required to specify the units of the angular values. For (x,y) positions, the units are optional (and usually unnecessary).
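
All angular units are converted to radians internally (cf. the ra and dec attributes below). As a rough sketch of what these conversions amount to (the to_rad table here is illustrative, not part of the TreeCorr API; TreeCorr itself uses the coord package for unit handling):

```python
import numpy as np

# Conversion factors to radians for the angular units named above.
# (Illustrative table only, not a TreeCorr object.)
to_rad = {
    'rad': 1.0,
    'deg': np.pi / 180.,
    'hour': np.pi / 12.,               # 1 hour of RA = 15 degrees
    'arcmin': np.pi / (180. * 60.),
    'arcsec': np.pi / (180. * 3600.),
}

ra_hours = np.array([0., 6., 12.])     # as with ra_units='hour'
dec_deg = np.array([-90., 0., 90.])    # as with dec_units='deg'

ra = ra_hours * to_rad['hour']         # stored in radians
dec = dec_deg * to_rad['deg']
```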

You can also initialize a Catalog by reading in columns from a file. For instance:

>>> cat = treecorr.Catalog('data.fits', ra_col='ALPHA2000', dec_col='DELTA2000',
...                        g1_col='E1', g2_col='E2', ra_units='deg', dec_units='deg')

This reads the given columns from the input file. The input file may be a FITS file, an HDF5 file, a Parquet file, or an ASCII file. Normally the file type is determined according to the file’s extension (e.g. ‘.fits’ here), but it can also be set explicitly with file_type.

For FITS, HDF5, and Parquet files, the column names should be strings as shown above. For ASCII files, they may be strings if the input file has column names. But you may also use integer values giving the index of which column to use. We use a 1-based convention for these, so x_col=1 would mean to use the first column as the x value. (0 means don’t read that column.)
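
To make the 1-based convention concrete, here is how it maps onto reading an ASCII file with plain numpy (a hypothetical sketch, not TreeCorr internals; numpy's usecols is 0-based, so x_col=1 corresponds to usecols=0):

```python
import io
import numpy as np

# A small whitespace-delimited ASCII catalog with columns x, y, k.
ascii_data = io.StringIO("""\
# x    y    k
1.0  2.0  0.10
3.0  4.0  0.20
5.0  6.0  0.30
""")

# TreeCorr's x_col=1, y_col=2 (1-based) select the first two columns.
x_col, y_col = 1, 2
x, y = np.loadtxt(ascii_data, usecols=(x_col - 1, y_col - 1), unpack=True)
```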

Sometimes the columns in the input file aren’t quite what you want. Rather, you need to do some simple calculation based on the input columns. For instance, PSF rho statistics generally entail taking the difference of the model and data g1,g2 columns. To deal with this, you can use e.g. g1_eval and g2_eval, which use the Python eval function to evaluate a string. The string can use the names of columns in the input file, so long as these columns are specified in the extra_cols parameter. For instance:

>>> cat = treecorr.Catalog('data.fits', ra_col='ALPHA2000', dec_col='DELTA2000',
...                        ra_units='deg', dec_units='deg',
...                        g1_eval='G1_MODEL - G1_DATA', g2_eval='G2_MODEL - G2_DATA',
...                        extra_cols=['G1_MODEL', 'G1_DATA', 'G2_MODEL', 'G2_DATA'])

The eval strings are allowed to use numpy, math or coord functions if desired. If you need additional modules, you can update the list treecorr.Catalog.eval_modules to add the module(s) you need.
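
Conceptually, an eval string is evaluated with the extra_cols arrays available by name, along the lines of this simplified sketch (the real implementation differs in detail, including what modules the string may reference):

```python
import numpy as np

# Stand-ins for columns that would be read via extra_cols.
cols = {
    'G1_MODEL': np.array([0.05, 0.10]),
    'G1_DATA':  np.array([0.04, 0.08]),
}

# The eval string sees the column arrays as local variables.
g1_eval = 'G1_MODEL - G1_DATA'
g1 = eval(g1_eval, {'np': np}, cols)
```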

Finally, you may store all the various parameters in a configuration dict and just pass the dict as an argument after the file name:

>>> config = { 'ra_col' : 'ALPHA2000',
...            'dec_col' : 'DELTA2000',
...            'g1_col' : 'E1',
...            'g2_col' : 'E2',
...            'ra_units' : 'deg',
...            'dec_units' : 'deg' }
>>> cat = treecorr.Catalog(file_name, config)

This can be useful for encapsulating all the TreeCorr options in a single place in your code, which might be used multiple times. Notably, this syntax ignores any dict keys that are not relevant to the Catalog construction, so you can use the same config dict for the Catalog and your correlation objects, which can be convenient.

See also Configuration Parameters for complete descriptions of all of the relevant configuration parameters, particularly the first section Parameters about the input file(s).

You may also override any configuration parameters or add additional parameters as kwargs after the config dict. For instance, to flip the sign of the g1 values after reading from the input file, you could write:

>>> cat1 = treecorr.Catalog(file_name, config, flip_g1=True)

After construction, a Catalog object will have the following attributes:

Attributes:
  • x – The x positions, if defined, as a numpy array (converted to radians if x_units was given). (None otherwise)

  • y – The y positions, if defined, as a numpy array (converted to radians if y_units was given). (None otherwise)

  • z – The z positions, if defined, as a numpy array. (None otherwise)

  • ra – The right ascension, if defined, as a numpy array (in radians). (None otherwise)

  • dec – The declination, if defined, as a numpy array (in radians). (None otherwise)

  • r – The distance, if defined, as a numpy array. (None otherwise)

  • w – The weights, as a numpy array. (All 1’s if no weight column provided.)

  • wpos – The weights for position centroiding, as a numpy array, if given. (None otherwise, which means that implicitly wpos = w.)

  • k – The scalar field, kappa, if defined, as a numpy array. (None otherwise)

  • z1 – The z1 component of a complex scalar, if defined, as a numpy array. (None otherwise)

  • z2 – The z2 component of a complex scalar, if defined, as a numpy array. (None otherwise)

  • v1 – The v1 component of a vector, if defined, as a numpy array. (None otherwise)

  • v2 – The v2 component of a vector, if defined, as a numpy array. (None otherwise)

  • g1 – The g1 component of a shear, if defined, as a numpy array. (None otherwise)

  • g2 – The g2 component of a shear, if defined, as a numpy array. (None otherwise)

  • t1 – The 1st component of a trefoil field, if defined, as a numpy array. (None otherwise)

  • t2 – The 2nd component of a trefoil field, if defined, as a numpy array. (None otherwise)

  • q1 – The 1st component of a quatrefoil field, if defined, as a numpy array. (None otherwise)

  • q2 – The 2nd component of a quatrefoil field, if defined, as a numpy array. (None otherwise)

  • patch – The patch number of each object, if patches are being used. (None otherwise) If the entire catalog is a single patch, then patch may be an int.

  • ntot – The total number of objects (including those with zero weight if keep_zero_weight is set to True)

  • nobj – The number of objects with non-zero weight

  • sumw – The sum of the weights

  • vark – The variance of the scalar field (0 if k is not defined)

    Note

    If there are weights, this is really \(\sum(w^2 (\kappa-\langle \kappa \rangle)^2)/\sum(w)\), which is more like \(\langle w \rangle \mathrm{Var}(\kappa)\). It is only used for var_method='shot', where the noise estimate is this value divided by the total weight per bin, so this is the right quantity to use for that.

  • varz – The variance per component of the complex scalar field (0 if z1,z2 are not defined)

    Note

    If there are weights, this is really \(\sum(w^2 |z - \langle z \rangle|^2)/\sum(w)\), which is more like \(\langle w \rangle \mathrm{Var}(z)\). As for vark, this is the right quantity to use for the 'shot' noise estimate.

  • varv – The variance per component of the vector field (0 if v1,v2 are not defined)

    Note

    If there are weights, this is really \(\sum(w^2 |v - \langle v \rangle|^2)/\sum(w)\), which is more like \(\langle w \rangle \mathrm{Var}(v)\). As for vark, this is the right quantity to use for the 'shot' noise estimate.

  • varg – The variance per component of the shear field (aka shape noise) (0 if g1,g2 are not defined)

    Note

    If there are weights, this is really \(\sum(w^2 |g-\langle g \rangle|^2)/\sum(w)\), which is more like \(\langle w \rangle \mathrm{Var}(g)\). As for vark, this is the right quantity to use for the 'shot' noise estimate.

  • vart – The variance per component of the trefoil field (0 if t1,t2 are not defined)

    Note

    If there are weights, this is really \(\sum(w^2 |t-\langle t \rangle|^2)/\sum(w)\), which is more like \(\langle w \rangle \mathrm{Var}(t)\). As for vark, this is the right quantity to use for the 'shot' noise estimate.

  • varq – The variance per component of the quatrefoil field (0 if q1,q2 are not defined)

    Note

    If there are weights, this is really \(\sum(w^2 |q-\langle q \rangle|^2)/\sum(w)\), which is more like \(\langle w \rangle \mathrm{Var}(q)\). As for vark, this is the right quantity to use for the 'shot' noise estimate.

  • name – When constructed from a file, this will be the file_name. It is only used as a reference name in logging output after construction, so if you construct it from data vectors directly, it will be ''. You may assign to it if you want to give this catalog a specific name.

  • coords – Which kind of coordinate system is defined for this catalog. The possibilities for this attribute are:

    • ‘flat’ = 2-dimensional flat coordinates. Set when x,y are given.

    • ‘spherical’ = spherical coordinates. Set when ra,dec are given.

    • ‘3d’ = 3-dimensional coordinates. Set when x,y,z or ra,dec,r are given.

  • field – If any of the get?Field methods have been called to construct a field from this catalog (either explicitly or implicitly via a corr.process() command), then this attribute will hold the most recent field to have been constructed.

    Note

    It holds this field as a weakref, so if caching is turned off with resize_cache(0), and the field has been garbage collected, then this attribute will be None.
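
The weighted variance convention described in the notes above can be written out directly. This numpy sketch computes the vark quantity from its formula (illustrative only, not TreeCorr's internal code):

```python
import numpy as np

def weighted_vark(k, w):
    """sum(w^2 (k - <k>)^2) / sum(w), with <k> the weighted mean of k,
    per the vark definition above."""
    kmean = np.sum(w * k) / np.sum(w)
    return np.sum(w**2 * (k - kmean)**2) / np.sum(w)

k = np.array([0.1, 0.3, 0.2, 0.4])
w = np.ones(4)

# With unit weights this reduces to the ordinary population variance.
vark = weighted_vark(k, w)
```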

Parameters:
  • file_name (str) – The name of the catalog file to be read in. (default: None, in which case the columns need to be entered directly with x, y, etc.)

  • config (dict) – A configuration dict which defines attributes about how to read the file. Any optional kwargs may be given here in the config dict if desired. Invalid keys in the config dict are ignored. (default: None)

Keyword Arguments:
  • num (int) – Which number catalog are we reading. e.g. for NG correlations the catalog for the N has num=0, the one for G has num=1. This is only necessary if you are using a config dict where things like x_col have multiple values. (default: 0)

  • logger – If desired, a Logger object for logging. (default: None, in which case one will be built according to the config dict’s verbose level.)

  • is_rand (bool) – If this is a random file, then setting is_rand to True lets you skip k_col, g1_col, and g2_col if they were set for the main catalog. (default: False)

  • x (array) – The x values. (default: None; When providing values directly, either x,y are required or ra,dec are required.)

  • y (array) – The y values. (default: None; When providing values directly, either x,y are required or ra,dec are required.)

  • z (array) – The z values, if doing 3d positions. (default: None; invalid in conjunction with ra, dec.)

  • ra (array) – The RA values. (default: None; When providing values directly, either x,y are required or ra,dec are required.)

  • dec (array) – The Dec values. (default: None; When providing values directly, either x,y are required or ra,dec are required.)

  • r (array) – The r values (the distances of each source from Earth). (default: None; invalid in conjunction with x, y.)

  • w (array) – The weights to apply when computing the correlations. (default: None)

  • wpos (array) – The weights to use for position centroiding. (default: None, which means to use the value weights, w, to weight the positions as well.)

  • flag (array) – An optional array of flags, indicating objects to skip. Rows with flag != 0 (or technically flag & ~ok_flag != 0) will be given a weight of 0. (default: None)

  • k (array) – The kappa values to use for scalar correlations. (This may represent any scalar field.) (default: None)

  • z1 (array) – The z1 values to use for complex scalar correlations. (default: None)

  • z2 (array) – The z2 values to use for complex scalar correlations. (default: None)

  • v1 (array) – The v1 values to use for vector correlations. (default: None)

  • v2 (array) – The v2 values to use for vector correlations. (default: None)

  • g1 (array) – The g1 values to use for shear correlations. (g1,g2 may represent any spin-2 field.) (default: None)

  • g2 (array) – The g2 values to use for shear correlations. (g1,g2 may represent any spin-2 field.) (default: None)

  • t1 (array) – The t1 values to use for trefoil (spin-3) correlations. (default: None)

  • t2 (array) – The t2 values to use for trefoil (spin-3) correlations. (default: None)

  • q1 (array) – The q1 values to use for quatrefoil (spin-4) correlations. (default: None)

  • q2 (array) – The q2 values to use for quatrefoil (spin-4) correlations. (default: None)

  • patch (array or int) –

    Optionally, patch numbers to use for each object. (default: None)

    Note

    This may also be an int if the entire catalog represents a single patch. If patch_centers is given this will select those items from the full input that correspond to the given patch number. Similarly if patch_col is given.

    If neither of these is given, then all items are set to have the given patch number, and npatch is required to set the total number of patches, of which this catalog is a part.

  • patch_centers (array or str) – Alternative to setting patch by hand or using kmeans, you may instead give patch_centers either as a file name or an array from which the patches will be determined. (default: None)

  • file_type (str) – What kind of file the input file is. Valid options are ‘ASCII’, ‘FITS’, ‘HDF’, or ‘Parquet’. (default: if the file_name extension starts with .fit, then use ‘FITS’; if it starts with .hdf, then use ‘HDF’; if it starts with .par, then use ‘Parquet’; else ‘ASCII’)

  • delimiter (str) – For ASCII files, what delimiter to use between values. (default: None, which means any whitespace)

  • comment_marker (str) – For ASCII files, what token indicates a comment line. (default: ‘#’)

  • first_row (int) – Which row to take as the first row to be used. (default: 1)

  • last_row (int) – Which row to take as the last row to be used. (default: -1, which means the last row in the file)

  • every_nth (int) – Only use every nth row of the input catalog. (default: 1)

  • npatch (int) –

    How many patches to split the catalog into (using kmeans if no other patch information is provided) for the purpose of jackknife variance or other options that involve running via patches. (default: 1)

    Note

    If the catalog has ra,dec,r positions, the patches will be made using just ra,dec.

    If patch is given, then this sets the total number of patches that are relevant for the area that was split into patches, which may include more catalogs than just this one.

  • kmeans_init (str) – If using kmeans to make patches, which init method to use. cf. Field.run_kmeans (default: ‘tree’)

  • kmeans_alt (bool) – If using kmeans to make patches, whether to use the alternate kmeans algorithm. cf. Field.run_kmeans (default: False)

  • x_col (str or int) – The column to use for the x values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column. When reading from a file, either x_col and y_col are required or ra_col and dec_col are required.)

  • y_col (str or int) – The column to use for the y values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column. When reading from a file, either x_col and y_col are required or ra_col and dec_col are required.)

  • z_col (str or int) – The column to use for the z values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column; invalid in conjunction with ra_col, dec_col.)

  • ra_col (str or int) – The column to use for the ra values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column. When reading from a file, either x_col and y_col are required or ra_col and dec_col are required.)

  • dec_col (str or int) – The column to use for the dec values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column. When reading from a file, either x_col and y_col are required or ra_col and dec_col are required.)

  • r_col (str or int) – The column to use for the r values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column; invalid in conjunction with x_col, y_col.)

  • x_units (str) – The units to use for the x values, given as a string. Valid options are arcsec, arcmin, degrees, hours, radians. (default: radians, although with (x,y) positions, you can often just ignore the units, and the output separations will be in whatever units x and y are in.)

  • y_units (str) – The units to use for the y values, given as a string. Valid options are arcsec, arcmin, degrees, hours, radians. (default: radians, although with (x,y) positions, you can often just ignore the units, and the output separations will be in whatever units x and y are in.)

  • ra_units (str) – The units to use for the ra values, given as a string. Valid options are arcsec, arcmin, degrees, hours, radians. (required when using ra_col or providing ra directly)

  • dec_units (str) – The units to use for the dec values, given as a string. Valid options are arcsec, arcmin, degrees, hours, radians. (required when using dec_col or providing dec directly)

  • k_col (str or int) – The column to use for the kappa values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • z1_col (str or int) – The column to use for the z1 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • z2_col (str or int) – The column to use for the z2 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • v1_col (str or int) – The column to use for the v1 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • v2_col (str or int) – The column to use for the v2 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • g1_col (str or int) – The column to use for the g1 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • g2_col (str or int) – The column to use for the g2 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • t1_col (str or int) – The column to use for the t1 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • t2_col (str or int) – The column to use for the t2 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • q1_col (str or int) – The column to use for the q1 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • q2_col (str or int) – The column to use for the q2 values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • patch_col (str or int) – The column to use for the patch numbers. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • w_col (str or int) – The column to use for the weight values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column.)

  • wpos_col (str or int) – The column to use for the position weight values. An integer is only allowed for ASCII files. (default: ‘0’, which means not to read in this column, in which case wpos=w.)

  • flag_col (str or int) – The column to use for the flag values. An integer is only allowed for ASCII files. Any row with flag != 0 (or technically flag & ~ok_flag != 0) will be given a weight of 0. (default: ‘0’, which means not to read in this column.)

  • ignore_flag (int) – Which flags should be ignored. (default: all non-zero flags are ignored. Equivalent to ignore_flag = ~0.)

  • ok_flag (int) – Which flags should be considered ok. (default: 0. i.e. all non-zero flags are ignored.)

  • allow_xyz (bool) – Whether to allow x,y,z values in conjunction with ra,dec. Normally, it is an error to have both kinds of positions, but if you know that the x,y,z, values are consistent with the given ra,dec values, it can save time to input them, rather than calculate them using trig functions. (default: False)

  • flip_z1 (bool) – Whether to flip the sign of the input z1 values. (default: False)

  • flip_z2 (bool) – Whether to flip the sign of the input z2 values. (default: False)

  • flip_v1 (bool) – Whether to flip the sign of the input v1 values. (default: False)

  • flip_v2 (bool) – Whether to flip the sign of the input v2 values. (default: False)

  • flip_g1 (bool) – Whether to flip the sign of the input g1 values. (default: False)

  • flip_g2 (bool) – Whether to flip the sign of the input g2 values. (default: False)

  • flip_t1 (bool) – Whether to flip the sign of the input t1 values. (default: False)

  • flip_t2 (bool) – Whether to flip the sign of the input t2 values. (default: False)

  • flip_q1 (bool) – Whether to flip the sign of the input q1 values. (default: False)

  • flip_q2 (bool) – Whether to flip the sign of the input q2 values. (default: False)

  • keep_zero_weight (bool) – Whether to keep objects with wpos=0 in the catalog (including any objects that indirectly get wpos=0 due to NaN or flags), so they are included in ntot and in npairs calculations that use this Catalog, although of course they do not contribute to the accumulated weight of pairs. (default: False)

  • save_patch_dir (str) – If desired, when building patches from this Catalog, save them as FITS files in the given directory for more efficient loading when doing cross-patch correlations with the low_mem option. (default: None)

  • ext (int/str) – For FITS/HDF files, which extension to read. (default: 1 for FITS, root for HDF)

  • x_ext (int/str) – Which extension to use for the x values. (default: ext)

  • y_ext (int/str) – Which extension to use for the y values. (default: ext)

  • z_ext (int/str) – Which extension to use for the z values. (default: ext)

  • ra_ext (int/str) – Which extension to use for the ra values. (default: ext)

  • dec_ext (int/str) – Which extension to use for the dec values. (default: ext)

  • r_ext (int/str) – Which extension to use for the r values. (default: ext)

  • k_ext (int/str) – Which extension to use for the k values. (default: ext)

  • z1_ext (int/str) – Which extension to use for the z1 values. (default: ext)

  • z2_ext (int/str) – Which extension to use for the z2 values. (default: ext)

  • v1_ext (int/str) – Which extension to use for the v1 values. (default: ext)

  • v2_ext (int/str) – Which extension to use for the v2 values. (default: ext)

  • g1_ext (int/str) – Which extension to use for the g1 values. (default: ext)

  • g2_ext (int/str) – Which extension to use for the g2 values. (default: ext)

  • t1_ext (int/str) – Which extension to use for the t1 values. (default: ext)

  • t2_ext (int/str) – Which extension to use for the t2 values. (default: ext)

  • q1_ext (int/str) – Which extension to use for the q1 values. (default: ext)

  • q2_ext (int/str) – Which extension to use for the q2 values. (default: ext)

  • patch_ext (int/str) – Which extension to use for the patch numbers. (default: ext)

  • w_ext (int/str) – Which extension to use for the w values. (default: ext)

  • wpos_ext (int/str) – Which extension to use for the wpos values. (default: ext)

  • flag_ext (int/str) – Which extension to use for the flag values. (default: ext)

  • x_eval (str) – An eval string to use for the x values. (default: None)

  • y_eval (str) – An eval string to use for the y values. (default: None)

  • z_eval (str) – An eval string to use for the z values. (default: None)

  • ra_eval (str) – An eval string to use for the ra values. (default: None)

  • dec_eval (str) – An eval string to use for the dec values. (default: None)

  • r_eval (str) – An eval string to use for the r values. (default: None)

  • k_eval (str) – An eval string to use for the k values. (default: None)

  • z1_eval (str) – An eval string to use for the z1 values. (default: None)

  • z2_eval (str) – An eval string to use for the z2 values. (default: None)

  • v1_eval (str) – An eval string to use for the v1 values. (default: None)

  • v2_eval (str) – An eval string to use for the v2 values. (default: None)

  • g1_eval (str) – An eval string to use for the g1 values. (default: None)

  • g2_eval (str) – An eval string to use for the g2 values. (default: None)

  • t1_eval (str) – An eval string to use for the t1 values. (default: None)

  • t2_eval (str) – An eval string to use for the t2 values. (default: None)

  • q1_eval (str) – An eval string to use for the q1 values. (default: None)

  • q2_eval (str) – An eval string to use for the q2 values. (default: None)

  • patch_eval (str) – An eval string to use for the patch numbers. (default: None)

  • w_eval (str) – An eval string to use for the weight values. (default: None)

  • wpos_eval (str) – An eval string to use for the position weight values. (default: None)

  • flag_eval (str) – An eval string to use for the flag values. (default: None)

  • extra_cols (list) – A list of column names to read to be used for the quantities that are calculated with eval. (default: None)

  • verbose (int) –

    If no logger is provided, this will optionally specify a logging level to use.

    • 0 means no logging output

    • 1 means to output warnings only (default)

    • 2 means to output various progress information

    • 3 means to output extensive debugging information

  • log_file (str) – If no logger is provided, this will specify a file to write the logging output. (default: None; i.e. output to standard output)

  • split_method (str) –

    How to split the cells in the tree when building the tree structure. Options are:

    • mean: Use the arithmetic mean of the coordinate being split. (default)

    • median: Use the median of the coordinate being split.

    • middle: Use the middle of the range; i.e. the average of the minimum and maximum value.

    • random: Use a random point somewhere in the middle two quartiles of the range.

  • cat_precision (int) – The precision to use when writing a Catalog to an ASCII file. This should be an integer, which specifies how many digits to write. (default: 16)

  • rng (np.Generator) – If desired, a numpy.random.Generator or numpy.random.RandomState instance to use for any random number generation (e.g. kmeans patches). (default: None)

  • num_threads (int) –

    How many OpenMP threads to use during the catalog load steps. (default: use the number of cpu cores)

    Note

    This won’t work if the system’s C compiler cannot use OpenMP (e.g. clang prior to version 3.7.)
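
The interaction of flag, ok_flag, and ignore_flag described above is a bitwise test; this sketch shows the selection logic on a toy flag column (illustrative, not TreeCorr source):

```python
import numpy as np

# Toy flag column: bit 0 marks a benign condition, higher bits mark problems.
flag = np.array([0, 1, 2, 3, 4])

# With ok_flag=1, bit 0 is acceptable; any other set bit zeroes the weight,
# i.e. rows with (flag & ~ok_flag) != 0 get w=0.
ok_flag = 1
w = np.ones(len(flag))
w[(flag & ~ok_flag) != 0] = 0.
```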

checkForNaN(col, col_str)[source]

Check if the column has any NaNs. If so, set those rows to have w[k]=0.

Parameters:
  • col (array) – The input column to check.

  • col_str (str) – The name of the column. Used only as information in logging output.

clear_cache()[source]

Clear all field caches.

The various kinds of fields built from this catalog are cached. This may or may not be an optimization for your use case. Normally only a single field is built for a given catalog, and it is usually efficient to cache it, so it can be reused multiple times. E.g. for the usual Landy-Szalay NN calculation:

>>> dd.process(data_cat)
>>> rr.process(rand_cat)
>>> dr.process(data_cat, rand_cat)

the third line will be able to reuse the same fields built for the data and randoms in the first two lines.

However, this also means that the memory used for the field will persist as long as the catalog object does. If you need to recover this memory and don’t want to delete the catalog yet, this method lets you clear the cache.

There are separate caches for each kind of field. If you want to clear just one or some of them, you can call clear separately for the different caches:

>>> cat.nfields.clear()
>>> cat.kfields.clear()
>>> cat.zfields.clear()
>>> cat.vfields.clear()
>>> cat.gfields.clear()
>>> cat.tfields.clear()
>>> cat.qfields.clear()

classmethod combine(cat_list, *, mask_list=None, low_mem=False)[source]

Combine several Catalogs into a single larger Catalog.

If desired, one can also specify a mask for each of the input catalogs, which will select just a portion of the rows in that catalog.

All the Catalogs must have the same columns defined (e.g. ra, dec, x, y, k, g1, g2, etc.)

Parameters:
  • cat_list – A list of Catalog instances to combine.

  • mask_list – (optional) Which objects to take from each Catalog. If given, it must be a list of the same length as cat_list. (default: None)

  • low_mem (bool) – Whether to try to leave the catalogs in cat_list unloaded if they started out that way to keep total memory down. (default: False)

Returns:

combined_cat
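
The role of mask_list can be pictured with plain arrays: each mask selects rows from its catalog before the concatenation (a sketch of the row selection only, using numpy arrays in place of real Catalogs):

```python
import numpy as np

# Positions from two hypothetical catalogs.
x1 = np.array([1., 2., 3.])
x2 = np.array([4., 5.])

# Boolean masks playing the role of mask_list entries.
mask1 = np.array([True, False, True])
mask2 = np.array([True, True])

# Rows surviving each mask are stacked into the combined catalog.
combined_x = np.concatenate([x1[mask1], x2[mask2]])
```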

copy()[source]

Make a copy

getGField(*, min_size=0, max_size=None, split_method=None, brute=False, min_top=None, max_top=10, coords=None, logger=None)[source]

Return a GField based on the g1,g2 values in this catalog.

The GField object is cached, so this is efficient to call multiple times. cf. resize_cache and clear_cache.

Parameters:
  • min_size (float) – The minimum radius cell required (usually min_sep). (default: 0)

  • max_size (float) – The maximum radius cell required (usually max_sep). (default: None)

  • split_method (str) – Which split method to use (‘mean’, ‘median’, ‘middle’, or ‘random’) (default: ‘mean’; this value can also be given in the Catalog constructor in the config dict.)

  • brute (bool) – Whether to force traversal to the leaves. (default: False)

  • min_top (int) – The minimum number of top layers to use when setting up the field. (default: \(\max(3, \log_2(N_{\rm cpu}))\))

  • max_top (int) – The maximum number of top layers to use when setting up the field. (default: 10)

  • coords (str) – The kind of coordinate system to use. (default: self.coords)

  • logger – A Logger object if desired (default: self.logger)

Returns:

A GField object

getKField(*, min_size=0, max_size=None, split_method=None, brute=False, min_top=None, max_top=10, coords=None, logger=None)[source]

Return a KField based on the k values in this catalog.

The KField object is cached, so this is efficient to call multiple times. cf. resize_cache and clear_cache

Parameters:
  • min_size (float) – The minimum radius cell required (usually min_sep). (default: 0)

  • max_size (float) – The maximum radius cell required (usually max_sep). (default: None)

  • split_method (str) – Which split method to use (‘mean’, ‘median’, ‘middle’, or ‘random’) (default: ‘mean’; this value can also be given in the Catalog constructor in the config dict.)

  • brute (bool) – Whether to force traversal to the leaves. (default: False)

  • min_top (int) – The minimum number of top layers to use when setting up the field. (default: \(\max(3, \log_2(N_{\rm cpu}))\))

  • max_top (int) – The maximum number of top layers to use when setting up the field. (default: 10)

  • coords (str) – The kind of coordinate system to use. (default: self.coords)

  • logger – A Logger object if desired (default: self.logger)

Returns:

A KField object

getNField(*, min_size=0, max_size=None, split_method=None, brute=False, min_top=None, max_top=10, coords=None, logger=None)[source]

Return an NField based on the positions in this catalog.

The NField object is cached, so this is efficient to call multiple times. cf. resize_cache and clear_cache.

Parameters:
  • min_size (float) – The minimum radius cell required (usually min_sep). (default: 0)

  • max_size (float) – The maximum radius cell required (usually max_sep). (default: None)

  • split_method (str) – Which split method to use (‘mean’, ‘median’, ‘middle’, or ‘random’) (default: ‘mean’; this value can also be given in the Catalog constructor in the config dict.)

  • brute (bool) – Whether to force traversal to the leaves. (default: False)

  • min_top (int) – The minimum number of top layers to use when setting up the field. (default: \(\max(3, \log_2(N_{\rm cpu}))\))

  • max_top (int) – The maximum number of top layers to use when setting up the field. (default: 10)

  • coords (str) – The kind of coordinate system to use. (default: self.coords)

  • logger – A Logger object if desired (default: self.logger)

Returns:

An NField object

getQField(*, min_size=0, max_size=None, split_method=None, brute=False, min_top=None, max_top=10, coords=None, logger=None)[source]

Return a QField based on the q1,q2 values in this catalog.

The QField object is cached, so this is efficient to call multiple times. cf. resize_cache and clear_cache.

Parameters:
  • min_size (float) – The minimum radius cell required (usually min_sep). (default: 0)

  • max_size (float) – The maximum radius cell required (usually max_sep). (default: None)

  • split_method (str) – Which split method to use (‘mean’, ‘median’, ‘middle’, or ‘random’) (default: ‘mean’; this value can also be given in the Catalog constructor in the config dict.)

  • brute (bool) – Whether to force traversal to the leaves. (default: False)

  • min_top (int) – The minimum number of top layers to use when setting up the field. (default: \(\max(3, \log_2(N_{\rm cpu}))\))

  • max_top (int) – The maximum number of top layers to use when setting up the field. (default: 10)

  • coords (str) – The kind of coordinate system to use. (default: self.coords)

  • logger – A Logger object if desired (default: self.logger)

Returns:

A QField object

getTField(*, min_size=0, max_size=None, split_method=None, brute=False, min_top=None, max_top=10, coords=None, logger=None)[source]

Return a TField based on the t1,t2 values in this catalog.

The TField object is cached, so this is efficient to call multiple times. cf. resize_cache and clear_cache.

Parameters:
  • min_size (float) – The minimum radius cell required (usually min_sep). (default: 0)

  • max_size (float) – The maximum radius cell required (usually max_sep). (default: None)

  • split_method (str) – Which split method to use (‘mean’, ‘median’, ‘middle’, or ‘random’) (default: ‘mean’; this value can also be given in the Catalog constructor in the config dict.)

  • brute (bool) – Whether to force traversal to the leaves. (default: False)

  • min_top (int) – The minimum number of top layers to use when setting up the field. (default: \(\max(3, \log_2(N_{\rm cpu}))\))

  • max_top (int) – The maximum number of top layers to use when setting up the field. (default: 10)

  • coords (str) – The kind of coordinate system to use. (default: self.coords)

  • logger – A Logger object if desired (default: self.logger)

Returns:

A TField object

getVField(*, min_size=0, max_size=None, split_method=None, brute=False, min_top=None, max_top=10, coords=None, logger=None)[source]

Return a VField based on the v1,v2 values in this catalog.

The VField object is cached, so this is efficient to call multiple times. cf. resize_cache and clear_cache.

Parameters:
  • min_size (float) – The minimum radius cell required (usually min_sep). (default: 0)

  • max_size (float) – The maximum radius cell required (usually max_sep). (default: None)

  • split_method (str) – Which split method to use (‘mean’, ‘median’, ‘middle’, or ‘random’) (default: ‘mean’; this value can also be given in the Catalog constructor in the config dict.)

  • brute (bool) – Whether to force traversal to the leaves. (default: False)

  • min_top (int) – The minimum number of top layers to use when setting up the field. (default: \(\max(3, \log_2(N_{\rm cpu}))\))

  • max_top (int) – The maximum number of top layers to use when setting up the field. (default: 10)

  • coords (str) – The kind of coordinate system to use. (default: self.coords)

  • logger – A Logger object if desired (default: self.logger)

Returns:

A VField object

getZField(*, min_size=0, max_size=None, split_method=None, brute=False, min_top=None, max_top=10, coords=None, logger=None)[source]

Return a ZField based on the z1,z2 values in this catalog.

The ZField object is cached, so this is efficient to call multiple times. cf. resize_cache and clear_cache.

Parameters:
  • min_size (float) – The minimum radius cell required (usually min_sep). (default: 0)

  • max_size (float) – The maximum radius cell required (usually max_sep). (default: None)

  • split_method (str) – Which split method to use (‘mean’, ‘median’, ‘middle’, or ‘random’) (default: ‘mean’; this value can also be given in the Catalog constructor in the config dict.)

  • brute (bool) – Whether to force traversal to the leaves. (default: False)

  • min_top (int) – The minimum number of top layers to use when setting up the field. (default: \(\max(3, \log_2(N_{\rm cpu}))\))

  • max_top (int) – The maximum number of top layers to use when setting up the field. (default: 10)

  • coords (str) – The kind of coordinate system to use. (default: self.coords)

  • logger – A Logger object if desired (default: self.logger)

Returns:

A ZField object

get_patch_centers()[source]

Return an array of patch centers corresponding to the patches in this catalog.

If the patches were set either using K-Means or by giving the centers, then this will just return that same center array. Otherwise, it will be calculated from the positions of the objects with each patch number.

This function is automatically called when accessing the property patch_centers. So you should not normally need to call it directly.

Returns:

An array of center coordinates used to make the patches. Shape is (npatch, 2) for flat geometries or (npatch, 3) for 3d or spherical geometries. In the latter case, the centers represent (x,y,z) coordinates on the unit sphere.
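For a flat geometry, calculating the centers from the objects' positions amounts to averaging the (x,y) values of the objects in each patch. A minimal numpy sketch of that idea (the helper name is illustrative, not TreeCorr's internals):

```python
import numpy as np

def mean_patch_centers(x, y, patch, npatch):
    """Illustrative sketch: mean (x, y) position of the objects in each patch.

    Mimics what get_patch_centers does for flat geometries when the centers
    were not set explicitly.
    """
    centers = np.empty((npatch, 2))
    for p in range(npatch):
        sel = (patch == p)          # objects assigned to patch p
        centers[p] = [x[sel].mean(), y[sel].mean()]
    return centers

x = np.array([0.0, 2.0, 10.0, 12.0])
y = np.array([0.0, 0.0, 4.0, 4.0])
patch = np.array([0, 0, 1, 1])
centers = mean_patch_centers(x, y, patch, 2)
```

For spherical or 3d geometries the same averaging happens in (x,y,z) coordinates, which is why the returned array then has shape (npatch, 3).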

get_patch_file_names(save_patch_dir)[source]

Get the names of the files to use for reading/writing patches in save_patch_dir

get_patches(*, low_mem=False)[source]

Return a list of Catalog instances each representing a single patch from this Catalog

After calling this function once, the patches may be repeatedly accessed by the patches attribute, without triggering a rebuild of the patches. Furthermore, if patches is accessed before calling this function, it will be called automatically (with the default low_mem parameter).

Parameters:

low_mem (bool) – Whether to try to leave the returned patch catalogs in an “unloaded” state, wherein they will not load the data from a file until they are used. This only works if the current catalog was loaded from a file or the patches were saved (using save_patch_dir). (default: False)

load()[source]

Load the data from a file, if it isn’t yet loaded.

When a Catalog is read in from a file, it tries to delay the loading of the data from disk until it is actually needed. This is especially important when running over a set of patches, since you may not be able to fit all the patches in memory at once.

One does not normally need to call this method explicitly. It will run automatically whenever the data is needed. However, if you want to directly control when the disk access happens, you can use this function.
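The delayed-loading behavior can be pictured as a proxy that only runs its loader on first access and can drop the cached data again afterwards. This is an illustrative analogy (the class name is invented), not TreeCorr's implementation:

```python
import numpy as np

class LazyColumn:
    """Illustrative sketch of delayed loading, mirroring Catalog.load/unload."""

    def __init__(self, loader):
        self._loader = loader   # callable that reads the data from disk
        self._data = None

    @property
    def data(self):
        if self._data is None:
            self._data = self._loader()   # load on first use
        return self._data

    def unload(self):
        self._data = None                 # recover the memory

col = LazyColumn(lambda: np.arange(5))
total = col.data.sum()   # first access triggers the load
col.unload()             # back to the unloaded state
```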

makeArray(col, col_str, dtype=<class 'float'>)[source]

Turn the input column into a numpy array if it wasn’t already. Also make sure the input is 1-d.

Parameters:
  • col (array-like) – The input column to be converted into a numpy array.

  • col_str (str) – The name of the column. Used only as information in logging output.

  • dtype (type) – The dtype for the returned array. (default: float)

Returns:

The column converted to a 1-d numpy array.

read_patch_centers(file_name)[source]

Read patch centers from a file.

This function is typically called automatically when patch_centers is given as a string, which is taken to be a file name. The patch centers are read from that file and returned.

Parameters:

file_name (str) – The name of the file to read from.

Returns:

The centers, as an array, which can be used to determine the patches.

read_patches(save_patch_dir=None)[source]

Read the patches from files on disk.

This function assumes that the patches were written using write_patches. In particular, the file names are not arbitrary, but must match what TreeCorr uses in that method.

Note

The patches that are read in will be in an “unloaded” state. They will load as needed when some functionality requires it. So this is compatible with using the low_mem option in various places.

Parameters:

save_patch_dir (str) – The directory to read from. [default: None, in which case self.save_patch_dir will be used. If that is None, a ValueError will be raised.]

resize_cache(maxsize)[source]

Resize all field caches.

The various kinds of fields built from this catalog are cached. This may or may not be an optimization for your use case. Normally only a single field is built for a given catalog, and it is usually efficient to cache it, so it can be reused multiple times. E.g. for the usual Landy-Szalay NN calculation:

>>> dd.process(data_cat)
>>> rr.process(rand_cat)
>>> dr.process(data_cat, rand_cat)

the third line will be able to reuse the same fields built for the data and randoms in the first two lines.

However, if you are making many different fields from the same catalog – for instance because you keep changing the min_sep and max_sep for different calls – then saving them all will tend to blow up the memory.

Therefore, the default number of fields (of each type) to cache is 1. This lets the first use case be efficient, but not use too much memory for the latter case.

If you prefer a different behavior, this method lets you change the number of fields to cache. The cache is an LRU (Least Recently Used) cache, which means only the n most recently used fields are saved. I.e. when it is full, the least recently used field is removed from the cache.

If you call this with maxsize=0, then caching will be turned off. A new field will be built each time you call a process function with this catalog.

If you call this with maxsize>1, then multiple fields will be saved according to whatever number you set. This will use more memory, but may be an optimization for you depending on what you are doing.

Finally, if you want to set different sizes for the different kinds of fields, then you can call resize separately for the different caches:

>>> cat.nfields.resize(maxsize)
>>> cat.kfields.resize(maxsize)
>>> cat.zfields.resize(maxsize)
>>> cat.vfields.resize(maxsize)
>>> cat.gfields.resize(maxsize)
>>> cat.tfields.resize(maxsize)
>>> cat.qfields.resize(maxsize)
Parameters:

maxsize (int) – The new maximum number of fields of each type to cache.
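The LRU behavior described above can be sketched with an OrderedDict. This is an illustrative analogy to the per-type field caches (names invented), not TreeCorr's code:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch, analogous to the per-type field caches.

    With maxsize=1 (the default behavior described above) only the most
    recently used field survives; maxsize=0 disables caching entirely.
    """
    def __init__(self, maxsize=1):
        self.maxsize = maxsize
        self._d = OrderedDict()

    def get(self, key, build):
        if self.maxsize == 0:
            return build()                  # caching turned off
        if key in self._d:
            self._d.move_to_end(key)        # mark as most recently used
            return self._d[key]
        value = build()
        self._d[key] = value
        while len(self._d) > self.maxsize:
            self._d.popitem(last=False)     # evict least recently used
        return value

cache = LRUCache(maxsize=1)
field_a = cache.get('params_a', lambda: object())
field_b = cache.get('params_b', lambda: object())  # evicts field_a
```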

select(indx)[source]

Trim the catalog to only include those objects with the given indices.

Parameters:

indx – A numpy array of index values to keep.
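Conceptually, this is equivalent to fancy-indexing each column array with the given indices, as in this small numpy sketch:

```python
import numpy as np

# Illustrative: trimming a catalog keeps only the objects at the given
# indices, applied consistently to every column array.
x = np.array([0.1, 0.2, 0.3, 0.4])
w = np.array([1.0, 2.0, 3.0, 4.0])
indx = np.array([0, 2])
x_sel, w_sel = x[indx], w[indx]
```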

unload()[source]

Bring the Catalog back to an “unloaded” state, if possible.

When a Catalog is read in from a file, it tries to delay the loading of the data from disk until it is actually needed. After loading, this method will return the Catalog back to the unloaded state to recover the memory in the data arrays. If the Catalog is needed again during further processing, it will re-load the data from disk at that time.

This will also call clear_cache to recover any memory from fields that have been constructed as well.

If the Catalog was not read in from a file, then this function will only do the clear_cache step.

write(file_name, *, file_type=None, precision=None)[source]

Write the catalog to a file.

The position columns are output using the same units as were used when building the Catalog. If you want to use a different unit, you can set the catalog’s units directly before writing. e.g.:

>>> cat = treecorr.Catalog('cat.dat', ra=ra, dec=dec,
...                        ra_units='hours', dec_units='degrees')
>>> cat.ra_units = coord.degrees
>>> cat.write('new_cat.dat')

The output file will include some of the following columns (those for which the corresponding attribute is not None):

Column   Description
ra       self.ra if not None
dec      self.dec if not None
r        self.r if not None
x        self.x if not None
y        self.y if not None
z        self.z if not None
w        self.w if not None and self.nontrivial_w
wpos     self.wpos if not None
k        self.k if not None
z1       self.z1 if not None
z2       self.z2 if not None
v1       self.v1 if not None
v2       self.v2 if not None
g1       self.g1 if not None
g2       self.g2 if not None
t1       self.t1 if not None
t2       self.t2 if not None
q1       self.q1 if not None
q2       self.q2 if not None
patch    self.patch if not None

Parameters:
  • file_name (str) – The name of the file to write to.

  • file_type (str) – The type of file to write (‘ASCII’ or ‘FITS’). (default: determine the type automatically from the extension of file_name.)

  • precision (int) – For ASCII output catalogs, the desired precision. (default: 16; this value can also be given in the Catalog constructor in the config dict as cat_precision.)

Returns:

The column names that were written to the file as a list.

write_patch_centers(file_name)[source]

Write the patch centers to a file.

The output file will include the following columns:

Column   Description
patch    patch number (0..npatch-1)
x        mean x values
y        mean y values
z        mean z values (only for spherical or 3d coordinates)

It will write a FITS file if the file name ends with ‘.fits’, otherwise an ASCII file.

Parameters:

file_name (str) – The name of the file to write to.

write_patches(save_patch_dir=None)[source]

Write the patches to disk as separate files.

This can be used in conjunction with the low_mem=True option of get_patches (and implicitly by the various process methods) to only keep at most two patches in memory at a time.

Parameters:

save_patch_dir (str) – The directory to write the patches to. [default: None, in which case self.save_patch_dir will be used. If that is None, a ValueError will be raised.]
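The write_patches/read_patches round trip can be sketched with plain numpy files: one file per patch in a directory, re-read afterwards. The directory layout and file names here are illustrative, not TreeCorr's actual naming scheme (see get_patch_file_names for that):

```python
import os
import tempfile
import numpy as np

# Hypothetical sketch of writing one file per patch, then reading them back.
x = np.array([0.0, 1.0, 10.0, 11.0])
patch = np.array([0, 0, 1, 1])

save_dir = tempfile.mkdtemp()
for p in range(2):
    # Save the x values of the objects in patch p to its own file.
    np.save(os.path.join(save_dir, 'patch%d.npy' % p), x[patch == p])

# Read the patches back from disk, one file per patch.
loaded = [np.load(os.path.join(save_dir, 'patch%d.npy' % p)) for p in range(2)]
```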

File Readers

class treecorr.reader.FitsReader(file_name, *, logger=None)[source]

Reader interface for FITS files. Uses fitsio to read columns, etc.

check_valid_ext(ext)[source]

Check if an extension is valid for reading, and raise ValueError if not.

The ext must both exist and be a table (not an image)

Parameters:

ext (str/int) – The extension to check.

names(*, ext=None)[source]

Return a list of the names of all the columns in an extension

Parameters:

ext (str/int) – The extension to search for columns. (default: 1)

Returns:

A list of string column names

read(cols, s=slice(None, None, None), *, ext=None)[source]

Read a slice of a column or list of columns from a specified extension

Parameters:
  • cols (str/list) – The name(s) of column(s) to read.

  • s (slice/array) – A slice object or selection of integers to read. (default: all)

  • ext (str/int)) – The FITS extension to use. (default: 1)

Returns:

The data as a recarray or simple numpy array as appropriate

read_array(shape, *, ext=None)[source]

Read an array from the file.

Parameters:
  • shape (tuple) – The expected shape of the array.

ext (str/int) – The FITS extension to use. (default: 1)

Returns:

array

read_data(*, ext=None, max_rows=None)[source]

Read all data in the file, and the parameters in the header, if any.

Parameters:
  • ext (str/int) – The FITS extension to use. (default: 1)

  • max_rows (int) – The max number of rows to read. (ignored)

Returns:

data

read_params(*, ext=None)[source]

Read the params in the given extension, if any.

Parameters:

ext (str/int) – The FITS extension to use. (default: 1)

Returns:

params

row_count(col=None, *, ext=None)[source]

Count the number of rows in the named extension

For compatibility with the HDF interface, which can have columns of different lengths, we allow a second argument, col, but it is ignored here.

Parameters:
  • col (str) – The column to use. (ignored)

  • ext (str/int) – The FITS extension to use. (default: 1)

Returns:

The number of rows

class treecorr.reader.HdfReader(file_name, *, logger=None)[source]

Reader interface for HDF5 files. Uses h5py to read columns, etc.

check_valid_ext(ext)[source]

Check if an extension is valid for reading, and raise ValueError if not.

The ext must exist - there is no other requirement for HDF files.

Parameters:

ext (str) – The extension to check.

names(*, ext=None)[source]

Return a list of the names of all the columns in an extension

Parameters:

ext (str) – The extension to search for columns. (default: ‘/’)

Returns:

A list of string column names

read(cols, s=slice(None, None, None), *, ext=None)[source]

Read a slice of a column or list of columns from a specified extension.

Slices should always be used when reading HDF files - using a sequence of integers is painfully slow.

Parameters:
  • cols (str/list) – The name(s) of column(s) to read.

  • s (slice/array) – A slice object or selection of integers to read. (default: all)

  • ext (str) – The HDF (sub-)group to use. (default: ‘/’)

Returns:

The data as a dict or single numpy array as appropriate
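The advice above about preferring slices can be illustrated with numpy: both forms below select the same rows, but on an h5py dataset the slice form reads one contiguous block while the integer list triggers many small point reads:

```python
import numpy as np

# Both select rows 10..19.  On an in-memory numpy array the difference is
# negligible; on an h5py dataset, the slice is much faster.
data = np.arange(100)
by_slice = data[slice(10, 20)]
by_ints = data[np.arange(10, 20)]
```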

read_array(shape, *, ext=None)[source]

Read an array from the file.

Parameters:
  • shape (tuple) – The expected shape of the array.

ext (str) – The HDF (sub-)group to use. (default: ‘/’)

Returns:

array

read_data(*, ext=None, max_rows=None)[source]

Read all data in the file, and the parameters in the attributes, if any.

Parameters:
  • ext (str) – The HDF (sub-)group to use. (default: ‘/’)

  • max_rows (int) – The max number of rows to read. (ignored)

Returns:

data

read_params(*, ext=None)[source]

Read the params in the given extension, if any.

Parameters:

ext (str) – The HDF (sub-)group to use. (default: ‘/’)

Returns:

params

row_count(col, *, ext=None)[source]

Count the number of rows in the named extension and column

Unlike in FitsReader, col is required.

Parameters:
  • col (str) – The column to use.

  • ext (str) – The HDF group name to use. (default: ‘/’)

Returns:

The number of rows

class treecorr.reader.AsciiReader(file_name, *, delimiter=None, comment_marker='#', logger=None)[source]

Reader interface for ASCII files using numpy.

check_valid_ext(ext)[source]

Check if an extension is valid for reading, and raise ValueError if not.

None is the only valid extension for ASCII files.

Parameters:

ext (str) – The extension to check.

names(*, ext=None)[source]

Return a list of the names of all the columns in an extension

Parameters:

ext (str) – The extension. (ignored)

Returns:

A list of string column names

read(cols, s=slice(None, None, None), *, ext=None)[source]

Read a slice of a column or list of columns from a specified extension.

Parameters:
  • cols (str/list) – The name(s) of column(s) to read.

  • s (slice/array) – A slice object or selection of integers to read. (default: all)

  • ext (str) – The extension. (ignored)

Returns:

The data as a dict or single numpy array as appropriate

read_array(shape, *, ext=None)[source]

Read an array from the file.

Parameters:
  • shape (tuple) – The expected shape of the array.

  • ext (str) – The extension. (ignored – Ascii always reads the next group)

Returns:

array

read_data(*, ext=None, max_rows=None)[source]

Read all data in the file, and the parameters in the header, if any.

Parameters:
  • ext (str) – The extension. (ignored – Ascii always reads the next group)

  • max_rows (int) – The max number of rows to read. (default: None)

Returns:

data

read_params(*, ext=None)[source]

Read the params in the given extension, if any.

Parameters:

ext (str) – The extension. (ignored – Ascii always reads the next group)

Returns:

params

row_count(col=None, *, ext=None)[source]

Count the number of rows in the file.

Parameters:
  • col (str) – The column to use. (ignored)

  • ext (str) – The extension. (ignored)

Returns:

The number of rows
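The AsciiReader idea can be approximated with a small helper that parses a ‘#’-prefixed header line for column names and then reads the whitespace-delimited columns with numpy.genfromtxt. This is a hypothetical sketch, not TreeCorr's implementation:

```python
import io
import numpy as np

def read_ascii(f):
    """Illustrative sketch: read named columns from a simple ASCII table."""
    header = f.readline()
    names = header.lstrip('#').split()      # column names from the header
    data = np.atleast_2d(np.genfromtxt(f))  # remaining rows as a 2-d array
    return {name: data[:, i] for i, name in enumerate(names)}

text = "# x y w\n1.0 2.0 0.5\n3.0 4.0 1.5\n"
cols = read_ascii(io.StringIO(text))
```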

class treecorr.reader.PandasReader(file_name, *, delimiter=None, comment_marker='#', logger=None)[source]

Reader interface for ASCII files using pandas.

read(cols, s=slice(None, None, None), *, ext=None)[source]

Read a slice of a column or list of columns from a specified extension.

Parameters:
  • cols (str/list) – The name(s) of column(s) to read.

  • s (slice/array) – A slice object or selection of integers to read. (default: all)

  • ext (str) – The extension. (ignored)

Returns:

The data as a dict or single numpy array as appropriate

class treecorr.reader.ParquetReader(file_name, *, delimiter=None, comment_marker='#', logger=None)[source]

Reader interface for Parquet files using pandas.

check_valid_ext(ext)[source]

Check if an extension is valid for reading, and raise ValueError if not.

None is the only valid extension for Parquet files.

Parameters:

ext (str) – The extension to check.

names(*, ext=None)[source]

Return a list of the names of all the columns in an extension

Parameters:

ext (str) – The extension to search for columns. (ignored)

Returns:

A list of string column names

read(cols, s=slice(None, None, None), *, ext=None)[source]

Read a slice of a column or list of columns from a specified extension.

Parameters:
  • cols (str/list) – The name(s) of column(s) to read.

  • s (slice/array) – A slice object or selection of integers to read. (default: all)

  • ext (str) – The extension. (ignored)

Returns:

The data as a recarray or simple numpy array as appropriate

row_count(col=None, *, ext=None)[source]

Count the number of rows in the file.

For compatibility with the other readers, the col and ext arguments are accepted, but both are ignored here.

Parameters:
  • col (str) – The column to use. (ignored)

  • ext (str) – The extension. (ignored)

Returns:

The number of rows