pycoalescence package

pycoalescence provides the facilities for running spatially explicit neutral coalescence ecological simulations and performing basic analysis of the simulation outputs. The program requires necsim to function properly.

Key submodules

These are the most important modules for running and analysing spatially explicit neutral models, and are most likely to be used directly.

simulation module

Run spatially explicit neutral simulations on provided landscapes with support for a wide range of scenarios and parameters.

The main class is Simulation, which contains routines for setting up and running simulations, plus basic tree generation after simulations have been completed.

input:
  • Simulation parameters (such as dispersal kernel, speciation rate)
  • Map files representing density over space
  • [optional] map files representing relative reproductive ability
  • [optional] map files representing dispersal potential
  • [optional] historical density map files
output:
  • Database containing generated coalescence tree, simulation parameters and basic biodiversity metrics.
  • If the simulation does not complete, will instead dump data to a Dump_main_*_*.csv file for resuming simulations.
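
A minimal sketch of a typical workflow, using only methods documented for the Simulation class below. The helper name run_example, the file paths and all parameter values are illustrative assumptions, and the import is deferred so that the snippet can be loaded without pycoalescence (and necsim) installed.

```python
def run_example(sample_map, fine_map, output_directory):
    """Set up and run one simulation (paths and values are placeholders)."""
    # Deferred import: calling this function requires pycoalescence and necsim.
    from pycoalescence import Simulation

    sim = Simulation(logging_level=30)
    sim.set_simulation_parameters(
        seed=1,
        task=1,
        output_directory=output_directory,
        min_speciation_rate=0.0001,
        sigma=2.0,
        deme=1,
        sample_size=1.0,
        max_time=3600,
    )
    # Only the sample and fine maps are provided; other maps keep their defaults.
    sim.set_map_files(sample_file=sample_map, fine_file=fine_map)
    # Optional: speciation rates applied to the tree once the simulation ends.
    sim.set_speciation_rates([0.0001, 0.001])
    # run() completes setup, runs the simulation and generates the tree.
    sim.run()
    return sim
```

Calling run_example("sample.tif", "fine.tif", "output") would, under these assumptions, leave a database in the output directory for later analysis.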
class Simulation(logging_level=30, log_output=None, **kwargs)[source]

Bases: pycoalescence.landscape.Landscape

A class containing routines to set up and run simulations, including detecting map dimensions from tif files.

add_death_map(death_map)[source]

Adds a death map to the simulation.

Parameters:death_map (str/pycoalescence.Map) – the death map to import
Return type:None
add_dispersal_map(dispersal_map)[source]

Adds a dispersal map to the simulation.

Parameters:dispersal_map (str/pycoalescence.Map) – the dispersal map to import
Return type:None
add_gillespie(generations=0.0)[source]

Uses the Gillespie algorithm from the given number of generations into the simulation.

Parameters:generations – the number of generations at which to use Gillespie

Return type:None
Returns:None
add_reproduction_map(reproduction_map)[source]

Adds a reproduction map to the simulation.

Parameters:reproduction_map (str/pycoalescence.Map) – the reproduction map to import
Return type:None
add_sample_time(time)[source]

Adds an extra sample time to the list of times.

This allows for multiple temporal sample points from within the same simulation.

Parameters:time – the sample time to add
apply_speciation_rates(speciation_rates=None)[source]

Applies the speciation rates to the coalescence tree and outputs to the database.

Parameters:speciation_rates – a list of speciation rates to apply
Return type:None
calculate_sql_database()[source]

Saves the output database location to self.output_database.

check_can_use_gillespie()[source]

Checks if the simulation can use the Gillespie algorithm.

Returns:true if the simulation can use Gillespie
Return type:bool
check_death_map()[source]

Checks that the death map dimensions match the fine map dimensions.

check_dimensions_match_fine(map_to_check, name='')[source]

Checks that the dimensions of the provided map match the fine map.

Parameters:
  • map_to_check (Map) – map to check the dimensions of against the fine map
  • name (str) – name to write out in error message
Returns:

true if the dimensions match

check_dispersal_map()[source]

Checks that the dispersal map dimensions match the fine map dimensions.

check_file_parameters(expected=False)[source]

Check that all the required files exist for the simulation and the output doesn’t already exist.

Parameters:expected – set to true if we expect the output file to already exist
Raises:RuntimeError – if previous set-up routines are not complete
check_maps()[source]

Checks that the maps all exist and that the file structure makes sense.

Raises:ValueError – if the grid is not within the sample map.
Returns:None
check_reproduction_map()[source]

Checks that the reproduction map dimensions match the fine map dimensions.

check_sample_map_equals_sample_grid()[source]

Checks if the grid and sample map are the same size and offset (in which case, future operations can be simplified).

Returns:true if the grid and sample map dimensions and offsets are equal
Return type:bool
check_simulation_parameters(expected=False, ignore_errors=False)[source]

Checks that simulation parameters have been correctly set and the program is ready for running.

Parameters:
  • ignore_errors – if true, any FileNotFoundError and FileExistsError raised by checking the output database are ignored
  • expected – set to true if we expect the output file to exist
check_sql_database(expected=False)[source]

Checks whether the output database exists. If the existence does not match the expected variable, raises an error.

Raises:
  • FileExistsError – if the file already exists when it’s not expected to
  • FileNotFoundError – if the file does not exist when we expect it to
Parameters:

expected – boolean for expected existence of the output file

Return type:

None
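
The check described for check_sql_database() can be sketched in plain Python as follows. This is not the library's implementation: the function name check_output_exists and the file name data_1_1.db are hypothetical, and the standard FileExistsError and FileNotFoundError exceptions are assumed.

```python
import os
import tempfile


def check_output_exists(path, expected=False):
    """Raise if the existence of `path` does not match `expected`."""
    exists = os.path.exists(path)
    if exists and not expected:
        raise FileExistsError("Output database already exists: {}".format(path))
    if not exists and expected:
        raise FileNotFoundError("Output database not found: {}".format(path))


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "data_1_1.db")  # hypothetical file name
    # No file is present and none is expected, so this passes silently.
    check_output_exists(missing, expected=False)
    try:
        check_output_exists(missing, expected=True)
        raised = False
    except FileNotFoundError:
        raised = True
```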

count_individuals()[source]

Estimates the number of individuals to be simulated. This may be inaccurate if using multiple time points and historical maps.

Returns:a count of the number of individuals to be simulated
Return type:float
create_config(output_file=None)[source]

Generates the configuration. This will be written out either by providing an output file here, or by calling write_config().

Parameters:output_file (str) – the file to write the config options to. Must be a path to a .txt file.
create_map_config(output_file=None)[source]

Generates the map config file from reading the spatial structure of each of the provided files.

Parameters:output_file (str) – the file to output configuration data to (the map config file)
create_temporal_sampling_config()[source]

Creates the time-sampling config file.

Function is called automatically when creating a config file, and should not be manually called.

detect_map_dimensions()[source]

Detects all the map dimensions for the provided files (where possible) and sets the respective values. This is intended to be run after set_map_files().

Returns:None
finalise_setup(expected=False, ignore_errors=False)[source]

Runs all setup routines to provide a complete simulation. Should be called immediately before run_coalescence() to ensure the simulation setup is complete.

Parameters:
  • ignore_errors – if true, any FileNotFoundError and FileExistsError raised by checking the output database are ignored
  • expected – set to true if we expect the output file to exist
get_average_density()[source]

Gets the average density across the fine map, subsetted for the sample grid.

get_optimised_solution()[source]

Gets the optimised solution as a dictionary containing the important optimised variables. This can be read back in with set_optimised_solution().

Returns:dict containing the important optimised variables
Return type:dict
get_protracted()[source]

Gets whether the simulation pointed to by this object is a protracted simulation or not.

get_species_richness(reference=1)[source]

Calls coal_analyse.get_species_richness() with the supplied variables.

Requires successful import of coal_analyse and sqlite3.

Parameters:reference (int) – the community reference to obtain the metrics for.
Returns:the species richness.
grid_density_actual(x_off, y_off, x_dim, y_dim)[source]

Counts the density total for a subset of the grid by sampling from the fine map.

Note that for large maps this can take a very long time.

Parameters:
  • x_off – the x offset of the grid map subset
  • y_off – the y offset of the grid map subset
  • x_dim – the x dimension of the grid map subset
  • y_dim – the y dimension of the grid map subset
Returns:

the total individuals that exist in the subset.

Return type:

int

grid_density_estimate(x_off, y_off, x_dim, y_dim)[source]

Counts the density total for a subset of the grid by sampling from the fine map.

Note:

This is an approximation (based on the average density of the fine map) and does not produce a perfect value. This is done for performance reasons. The actual value can be obtained with grid_density_actual().

Parameters:
  • x_off – the x offset of the grid map subset
  • y_off – the y offset of the grid map subset
  • x_dim – the x dimension of the grid map subset
  • y_dim – the y dimension of the grid map subset
Returns:

an estimate of the total individuals that exist in the subset.

Return type:

int
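
The difference between grid_density_actual() and grid_density_estimate() can be illustrated with a small sketch. It assumes the estimate is the map-wide average density multiplied by the subset area, truncated to an integer; the function names and the example map are hypothetical, and the library's own calculation may differ in detail.

```python
def density_actual(fine_map, x_off, y_off, x_dim, y_dim):
    """Sum the density over the requested subset, cell by cell."""
    return sum(
        fine_map[y][x]
        for y in range(y_off, y_off + y_dim)
        for x in range(x_off, x_off + x_dim)
    )


def density_estimate(fine_map, x_dim, y_dim):
    """Approximate the subset total from the map-wide average density."""
    cells = [value for row in fine_map for value in row]
    average = sum(cells) / len(cells)
    return int(average * x_dim * y_dim)


# Hypothetical 4x4 fine map with all individuals in the left half.
fine_map = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
actual = density_actual(fine_map, 0, 0, 2, 2)
estimate = density_estimate(fine_map, 2, 2)
```

On an uneven map like this one the two values disagree, which is the trade-off the note describes: the estimate avoids visiting every cell of the subset.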

import_fine_map_array()[source]

Imports the fine map array to the in-memory object, subsetted to the same size as the sample grid.

Return type:None
import_sample_map_array()[source]

Imports the sample map array to the in-memory object.

Return type:None
load_config(config_file)[source]

Loads the config file by reading the lines in order.

Parameters:config_file (str) – the config file to read in.
optimise_ram(ram_limit)[source]

Optimises the maps for a specific RAM usage.

If ram_limit is None, this function does nothing.

Note:Assumes that the C++ compiler has sizeof(long) = 8 bytes for calculating space usage.
Note:Only optimises RAM for a square area of the map. For rectangular shapes, will use the shortest length as a maximum size.
Parameters:ram_limit – the desired amount of RAM to limit to, in GB
Raises:MemoryError – if the desired simulation cannot be compressed into available RAM
persistent_ram_usage()[source]

Gets the persistent RAM usage, which cannot be optimised by the program for a particular set of maps.

Returns:the total persistent RAM usage in bytes
resume_coalescence(pause_directory, seed, task, max_time, out_directory=None, protracted=None, spatial=None)[source]

Resumes the simulation from the specified directory, looking for the simulation with the specified seed and task referencing.

Parameters:
  • pause_directory – the directory to search for the paused simulation
  • seed – the seed of the paused simulation
  • task – the task of the paused simulation
  • max_time – the maximum time to run simulations for
  • out_directory – optionally provide an alternative output location. Defaults to same location as pause_directory
  • protracted (bool) – protractedness of the simulation
  • spatial (bool) – if the simulation is to be run with spatial complexity

Returns:None
run()[source]

Convenience function which completes setup, runs the simulation and calculates the coalescence tree for the set speciation rates in one step.

Return type:None
run_coalescence()[source]

Attempt to run the simulation with the given simulation set-up. This is the main routine performing the actual simulation which will take a considerable amount of time.

Returns:True if the simulation completes successfully, False if the simulation pauses.
Return type:bool
run_simple(seed, task, output, speciation_rate, sigma, size)[source]

Runs a simple coalescence simulation on a square infinite landscape with the provided parameters. This requires a separate compilation of the inf_land version of the coalescence simulator.

Note that this function returns richness=0 for failure to read from the file. It is assumed that there will be at least one species in the simulation.

Note that the maximum time for this function is set as 10 hours (36000 seconds) and an exception will be raised if the simulation does not complete in this time.

Raises:

RuntimeError – if the simulation didn’t complete in time.

Parameters:
  • seed – the simulation seed
  • task – the task (for file naming)
  • output – the output directory
  • speciation_rate – the probability of speciation
  • sigma – the normal distribution sigma value for dispersal
  • size – the size of the world (so there will be size^2 individuals simulated)
Returns:

the species richness in the simulation

set_config_file(output_file=None)[source]

Sets the config file to the output, over-writing any existing config file that has been stored.

Parameters:output_file – path to config file to output to
set_map_files(sample_file, fine_file=None, coarse_file=None, historical_fine_file=None, historical_coarse_file=None, dispersal_map=None, death_map=None, reproduction_map=None)[source]

Sets the map files (or to null, if none specified). It then calls detect_map_dimensions() to correctly read in the specified dimensions.

If sample_file is “null”, dimension values will remain at 0. If coarse_file is “null”, it will default to the size of fine_file with zero offset. If the coarse file is “none”, it will not be used. If the historical fine or coarse files are “none”, they will not be used.

Note

The dispersal map should be of dimensions xy by xy, where x and y are the fine map dimensions. Each entry gives the dispersal rate from the cell at the row index to the cell at the column index, where a cell's index is x + (y * xdim), with x, y the coordinates of the cell and xdim the x dimension of the fine map. See the PatchedLandscape class for routines for generating these landscapes.
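
The index convention for the dispersal map can be sketched as below. The helper names and the uniform example map are hypothetical; only the formula index = x + (y * xdim) comes from the note.

```python
def cell_index(x, y, x_dim):
    """Flatten cell coordinates (x, y) into a dispersal-map index."""
    return x + y * x_dim


def dispersal_rate(dispersal_map, source, destination, x_dim):
    """Look up dispersal from the source cell (row) to the destination cell (column)."""
    row = cell_index(source[0], source[1], x_dim)
    column = cell_index(destination[0], destination[1], x_dim)
    return dispersal_map[row][column]


x_dim, y_dim = 3, 2
n_cells = x_dim * y_dim
# Hypothetical uniform dispersal map with n_cells rows and n_cells columns.
dispersal_map = [[1.0 / n_cells] * n_cells for _ in range(n_cells)]
index = cell_index(2, 1, x_dim)
rate = dispersal_rate(dispersal_map, (0, 0), (2, 1), x_dim)
```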

Parameters:
  • sample_file (str) – the sample map file. Provide “null” if no sample mask is required
  • fine_file (str) – the fine map file. Defaults to “null” if none provided
  • coarse_file (str) – the coarse map file. Defaults to “none” if none provided
  • historical_fine_file (str) – the historical fine map file. Defaults to “none” if none provided
  • historical_coarse_file (str) – the historical coarse map file. Defaults to “none” if none provided
  • dispersal_map (str) – the dispersal map for reading dispersal values. Default to “none” if none provided
  • death_map (str) – a map of relative death probabilities, at the scale of the fine map
  • reproduction_map (str) – a map of relative reproduction probabilities, at the scale of the fine map
Return type:

None

Returns:

None

set_optimised_solution(dict_in)[source]

Sets the optimised RAM solution from the variables in the provided dictionary. This should contain the grid_x_size, grid_y_size, grid_file_name, sample_x_offset and sample_y_offset.

Parameters:dict_in (dict) – the dictionary containing the optimised RAM solution variables
Return type:None
set_seed(seed)[source]

Sets the seed for the simulation.

A seed < 1 should not be set for necsim, as equivalent behaviour is produced for seed and abs(seed), and for seed = 1 and seed = 0. Consequently, for any value less than 1, we take a very large number plus the seed instead. An error is therefore raised if the seed exceeds this very large number (this is an acceptable decrease in usability, as a seed that large is unlikely to ever be used).

Parameters:seed (int) – the random number seed
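
The seed handling described for set_seed() might look like the following sketch. The offset value, the function name and the choice of ValueError are all assumptions, since the actual constant and exception type are not stated here.

```python
LARGE_OFFSET = 10 ** 9  # hypothetical value; the library's constant is not given here


def normalise_seed(seed):
    """Map seeds below 1 onto large positive seeds; reject seeds above the offset."""
    if seed > LARGE_OFFSET:
        raise ValueError("Seed {} exceeds the offset {}".format(seed, LARGE_OFFSET))
    if seed < 1:
        # 0 and negative seeds would otherwise duplicate 1 and abs(seed).
        return LARGE_OFFSET + seed
    return seed
```

For example, under these assumptions normalise_seed(-3) gives a seed just below the offset rather than one that behaves like seed 3.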
set_simulation_parameters(seed, task, output_directory, min_speciation_rate, sigma=1.0, tau=1.0, deme=1.0, sample_size=1.0, max_time=3600, dispersal_method=None, m_prob=0.0, cutoff=0, dispersal_relative_cost=1, min_num_species=1, restrict_self=False, landscape_type=False, protracted=False, min_speciation_gen=None, max_speciation_gen=None, spatial=True, uses_spatial_sampling=False, times=None)[source]

Set all the simulation parameters apart from the map objects.

Parameters:
  • seed (int) – the unique job number for this simulation set
  • task (int) – the task reference number (used for easy file identification after simulations are complete)
  • output_directory (str) – the output directory to store the SQL database
  • min_speciation_rate (float) – the minimum speciation rate to simulate
  • sigma (float) – the dispersal sigma value
  • tau (float) – the fat-tailed dispersal tau value
  • deme (float) – the deme size (in individuals per cell)
  • sample_size (float) – the sample size of the deme (decimal 0-1)
  • max_time (float) – the maximum allowed simulation time (in seconds)
  • dispersal_method (str) – the dispersal kernel method. Should be one of [normal, fat-tail, norm-uniform]
  • m_prob (float) – the probability of drawing from the uniform dispersal. Only relevant for uniform dispersals
  • cutoff (float) – the maximum value for the uniform dispersal. Only relevant for uniform dispersals.
  • dispersal_relative_cost (float) – the relative cost of travelling through non-habitat (defaults to 1)
  • min_num_species (int) – the minimum number of species known to exist (defaults to 1)
  • restrict_self (bool) – if true, restricts dispersal from own cell
  • landscape_type (bool/str) – if false or “closed”, restricts dispersal to the provided maps, otherwise can be “infinite”, or a tiled landscape using “tiled_coarse” or “tiled_fine”, or a clamped landscape using “clamped_coarse” or “clamped_fine”.
  • protracted (bool) – if true, uses protracted speciation application
  • min_speciation_gen (float) – the minimum amount of time a lineage must exist before speciation occurs.
  • max_speciation_gen (float) – the maximum amount of time a lineage can exist before speciating.
  • spatial (bool) – if true, means that the simulation is spatial
  • uses_spatial_sampling (bool) – if true, the sample mask is interpreted as a proportional sampling mask, where the number of individuals sampled in the cell is equal to the density * deme_sample * cell sampling proportion
  • times (list) – list of temporal sampling points to apply (in generations)
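
The rule given for uses_spatial_sampling (density * deme_sample * cell sampling proportion) can be written as a one-line calculation. The function name is hypothetical, and rounding down to a whole number of individuals is an assumption, since the rounding behaviour is not stated.

```python
import math


def cell_sample_count(density, deme_sample, cell_proportion):
    """Number of individuals sampled in one cell (rounding down is assumed)."""
    return math.floor(density * deme_sample * cell_proportion)


# A cell with density 20, a deme sample of 0.5 and a mask value of 0.25.
sampled = cell_sample_count(20, 0.5, 0.25)
```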
set_speciation_rates(speciation_rates)[source]

Adds speciation rates for analysis at the end of the simulation. This is optional.

Parameters:speciation_rates (list) – a list of speciation rates to apply at the end of the simulation
setup_necsim()[source]

Calculates the type of the simulation (spatial/non-spatial, protracted/non-protracted) and sets the c object appropriately.

Return type:None
write_config(config_file)[source]

Writes the config to the config file provided, overwriting any existing config files.

Parameters:config_file – the config file to write out to
Return type:None

coalescence_tree module

Generate the coalescence tree and acquire a number of biodiversity metrics for different parameter sets. Can also be used to compare against a comparison simulation object.

input:
  • Completed simulation database from Simulation
  • Parameters and operations to apply
output:
  • A variety of biodiversity metrics, including species richness and abundance distributions, locations of each species, alpha and beta diversity, plus equivalent fragment biodiversity metrics.
  • Modifies the simulation database in place.
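
A minimal sketch of inspecting a completed simulation database, using only methods documented for the CoalescenceTree class below. The helper name summarise_example is hypothetical, the database is assumed to already contain at least one applied community parameter set, and the import is deferred so that the snippet can be loaded without pycoalescence installed.

```python
def summarise_example(database_path):
    """Calculate richness and return the stored metrics (path is a placeholder)."""
    # Deferred import: calling this function requires pycoalescence.
    from pycoalescence import CoalescenceTree

    tree = CoalescenceTree(database=database_path)
    # Community references identify each applied set of speciation parameters.
    references = tree.get_community_references()
    # Writes to SPECIES_RICHNESS and, by default, to BIODIVERSITY_METRICS.
    tree.calculate_richness()
    return references, tree.get_biodiversity_metrics()
```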
class CoalescenceTree(database=None, logging_level=30, log_output=None)[source]

Bases: object

Contains the coalescence tree and performs various calculations of different biodiversity metrics, which are then stored in the SQLite database.

The general process is

add_metacommunity_parameters(metacommunity_size=None, metacommunity_speciation_rate=None, metacommunity_option=None, metacommunity_reference=0)[source]

Adds the metacommunity parameters to the object.

Parameters:
  • metacommunity_size (float) – the number of individuals in the metacommunity
  • metacommunity_speciation_rate (float) – the speciation rate within the metacommunity
  • metacommunity_option (str) – either “simulated”, “analytical”, or a path to a database to read SADs from
  • metacommunity_reference (int) – the metacommunity reference if using a database to provide the metacommunity
Return type:

None

add_multiple_protracted_parameters(min_speciation_gens=None, max_speciation_gens=None, speciation_gens=None)[source]

Adds the protracted parameter set, taking an iterable as an input.

Note

Using the keyword arguments, one can supply either a list of tuples for pairs of speciation generations, or two lists of generations for the min and max, matching in order.

Parameters:
  • min_speciation_gens – the minimum number of generations required before speciation is permitted. Order should match that of max_speciation_gens
  • max_speciation_gens – the maximum number of generations required before speciation is permitted. Order should match that of min_speciation_gens
  • speciation_gens – a list of tuples of min/max speciation generations.
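
The two equivalent input styles described in the note for add_multiple_protracted_parameters() can be illustrated with a sketch that normalises both into (min, max) pairs. The function name and the error handling are hypothetical and do not reproduce the library's code.

```python
def protracted_pairs(min_speciation_gens=None, max_speciation_gens=None,
                     speciation_gens=None):
    """Return a list of (min, max) speciation-generation pairs."""
    if speciation_gens is not None:
        return [tuple(pair) for pair in speciation_gens]
    if min_speciation_gens is None or max_speciation_gens is None:
        raise ValueError("Provide speciation_gens, or both min and max lists")
    if len(min_speciation_gens) != len(max_speciation_gens):
        raise ValueError("min and max lists must have the same length")
    # Pair the lists element by element, so their order must match.
    return list(zip(min_speciation_gens, max_speciation_gens))


from_lists = protracted_pairs([10, 20], [100, 200])
from_tuples = protracted_pairs(speciation_gens=[(10, 100), (20, 200)])
```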
add_protracted_parameters(min_speciation_gen, max_speciation_gen)[source]

Adds the protracted parameter set.

Note

Wipes (0.0, 0.0) from protracted parameters, if it is there alone.

Parameters:
  • min_speciation_gen – the minimum number of generations required before speciation is permitted
  • max_speciation_gen – the maximum number of generations required before speciation is permitted
add_time(time)[source]

Adds the time to the list to be applied.

Parameters:time – the time to be applied
add_times(times)[source]

Adds the list of times to those to be applied.

Parameters:times – list of times to be applied
adjust_data()[source]

Ensures that the numbers of individuals are equalised between the comparison and simulated datasets, and modifies the relevant tables with the new data.

apply()[source]

Generates the coalescence tree for the set of speciation parameters. This must be run after the main coalescence simulations are complete. It will create additional fields and tables in the SQLite database which contain the requested data.

apply_incremental()[source]

Generates the coalescence tree for the set of speciation parameters. Does not write changes to the database, just holds the changes internally.

apply_non_spatial_remaining(database)[source]

Applies the non-spatial neutral model to the remaining lineages. This approximation is reasonable on a closed landscape once the lineages themselves are close to randomly distributed.

Parameters:database – the database file to open
Returns:None
Return type:None
calculate_alpha_diversity(output_metrics=True)[source]

Calculates the system alpha diversity for each set of parameters stored in COMMUNITY_PARAMETERS. Stores the output in ALPHA_DIVERSITY table.

Parameters:output_metrics (bool) – output to the BIODIVERSITY_METRICS table
calculate_beta_diversity(output_metrics=True)[source]

Calculates the beta diversity for the system for each speciation parameter set and stores the output in BETA_DIVERSITY. Will calculate alpha diversity and species richness tables if they have not already been performed.

Parameters:output_metrics (bool) – output to the BIODIVERSITY_METRICS table
calculate_comparison_octaves(store=False)[source]

Calculates the octave classes for the comparison data and for fragments (if required). If the octaves exist in the FRAGMENT_OCTAVES table in the comparison database, the data will be imported instead of being re-calculated.

Note

If store is True, will store an EDITED version of the comparison octaves, such that the number of individuals is equal between the comparison and simulated data.

Parameters:store – if True, stores within the comparison database.
calculate_fragment_abundances()[source]

Calculates the fragment abundances, including equalising with the comparison database, if it has already been set.

Sets fragment_abundances object.

calculate_fragment_octaves()[source]

Calculates the octave classes for each fragment. Outputs the calculated richness into the SQL database within a FRAGMENT_OCTAVES table.

calculate_fragment_richness(output_metrics=True)[source]

Calculates the fragment richness and stores it in a new table called FRAGMENT_RICHNESS. Also adds the record to BIODIVERSITY_METRICS. If the table already exists, it will simply be returned. Each time point and speciation rate combination will be recorded as a new variable.

Parameters:output_metrics (bool) – output to the BIODIVERSITY_METRICS table
calculate_goodness_of_fit()[source]

Calculates the goodness-of-fit measure based on the calculated biodiversity metrics, scaling each metric by the number of individuals involved in the metric.

This requires that import_comparison_data() has already been successfully run.

Note

This doesn’t calculate anything for values which have not yet been written to the BIODIVERSITY_METRICS table. All in-built functions (e.g. calculate_alpha_diversity, calculate_fragment_richness) write to the BIODIVERSITY_METRICS table automatically, so this is only relevant for custom functions.

The resulting value will then be written to the BIODIVERSITY_METRICS table in the SQL database.

calculate_octaves()[source]

Calculates the octave classes for the landscape. Outputs the calculated richness into the SQL database within a FRAGMENT_OCTAVES table.
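
Octave classes are not defined in this section. The sketch below assumes the common convention in which octave class k holds species with abundance n such that 2^k <= n < 2^(k+1); the function names and the example abundances are hypothetical.

```python
from collections import Counter


def octave_class(abundance):
    """Return floor(log2(abundance)) for a positive integer abundance."""
    if abundance < 1:
        raise ValueError("Abundance must be at least 1")
    # bit_length() gives an exact integer result, avoiding floating-point log2.
    return abundance.bit_length() - 1


def octaves(abundances):
    """Count the number of species falling in each octave class."""
    counts = Counter(octave_class(n) for n in abundances)
    return sorted(counts.items())


# Hypothetical abundances, one entry per species.
species_abundances = [1, 1, 2, 3, 4, 7, 8, 20]
result = octaves(species_abundances)
```

Each (octave class, number of species) pair corresponds to the last two fields of a FRAGMENT_OCTAVES row as described for get_fragment_octaves().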

calculate_octaves_error()[source]

Calculates the error in octaves classes between the simulated data and the comparison data. Stores each error value as a new entry in BIODIVERSITY_METRICS under fragment_octaves. Calculates the error by comparing each octave class and summing the relative difference. Octaves are then averaged for each fragment.

calculate_richness(output_metrics=True)[source]

Calculates the landscape richness from across all fragments and stores the result in the SPECIES_RICHNESS table. Stores a separate result for each community reference.

Parameters:output_metrics (bool) – output to the BIODIVERSITY_METRICS table
calculate_species_distance_similarity(output_metrics=True)[source]

Calculates the probability two individuals are of the same species as a function of distance.

Stores the mean distance between individuals of the same species in the BIODIVERSITY_METRICS table, and stores the full data in new table (SPECIES_DISTANCE_SIMILARITY). Distances are binned to the nearest integer.

Parameters:output_metrics – if true, outputs to the BIODIVERSITY_METRICS table as well, for metric comparison

Note

Extremely slow for large landscape sizes.
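
A sketch of the calculation described for calculate_species_distance_similarity(), assuming individuals are given as (x, y, species_id) tuples and that every pair of individuals is compared. The function name and data are hypothetical. The pairwise loop also shows why the cost grows quickly with landscape size.

```python
import math
from collections import defaultdict
from itertools import combinations


def species_distance_similarity(individuals):
    """Return {distance: probability that a pair at that distance is conspecific}."""
    same = defaultdict(int)
    total = defaultdict(int)
    # Every unordered pair is visited, so the cost is quadratic in the sample size.
    for (x1, y1, s1), (x2, y2, s2) in combinations(individuals, 2):
        # Bin the Euclidean distance to the nearest integer.
        distance = int(round(math.hypot(x2 - x1, y2 - y1)))
        total[distance] += 1
        if s1 == s2:
            same[distance] += 1
    return {d: same[d] / total[d] for d in sorted(total)}


# Four individuals on a line: two of species 1 followed by two of species 2.
individuals = [(0, 0, 1), (1, 0, 1), (2, 0, 2), (3, 0, 2)]
similarity = species_distance_similarity(individuals)
```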

check_biodiversity_table_exists()[source]

Checks whether the biodiversity table exists and creates the table if required.

Returns:the max reference value currently existing
clear_calculations()[source]

Removes the BIODIVERSITY_METRICS and FRAGMENT_OCTAVES tables completely.

Note

This cannot be undone (other than by re-running the calculations).

dispersal_parameters()[source]

Reads the dispersal parameters from the database and returns them.

Returns:a dict of the dispersal parameters (dispersal method, sigma, tau, m_probability and cutoff)
downsample(sample_proportion)[source]

Down-samples the individuals by a given proportion globally, and at each location.

The original SPECIES_LIST is stored in a new table called SPECIES_LIST_ORIGINAL and a new SPECIES_LIST object is created containing the down-sampled coalescence tree.

Parameters:sample_proportion (float) – the proportion of individuals to sample at each location
Returns:None
Return type:None
downsample_at_locations(fragment_csv, ignore_errors=False)[source]

Downsamples the SPECIES_LIST object using a fragment csv.

Each row in the csv file should contain the fragment name, x min, y min, x max, y max and the number of individuals per cell in that fragment.

Parameters:
  • fragment_csv – a csv file to use for downsampling individuals
  • ignore_errors – ignore the errors from mismatches in numbers of individuals
Returns:

None

Return type:

None
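
The fragment csv layout described for downsample_at_locations() can be read with the standard csv module, as in this sketch. It assumes there is no header row and that coordinates and counts are integers; the function name, dictionary keys and example rows are hypothetical.

```python
import csv
import io


def read_fragments(handle):
    """Parse rows of: name, x min, y min, x max, y max, individuals per cell."""
    fragments = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        name, x_min, y_min, x_max, y_max, per_cell = [v.strip() for v in row]
        fragments.append({
            "name": name,
            "x_min": int(x_min),
            "y_min": int(y_min),
            "x_max": int(x_max),
            "y_max": int(y_max),
            "individuals_per_cell": int(per_cell),
        })
    return fragments


# In-memory stand-in for a fragment csv file.
example_csv = io.StringIO("fragment1,0,0,4,4,10\nfragment2,5,5,9,9,3\n")
fragments = read_fragments(example_csv)
```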

get_all_fragment_abundances()[source]

Returns the whole table of fragment abundances from the database.

Returns:a list of reference, fragment, species_id, no_individuals
get_alpha_diversity(reference=1)[source]

Gets the system alpha diversity for the provided community reference parameters. Alpha diversity is the mean number of species per fragment.

Parameters:reference – the community reference for speciation parameters
Returns:the alpha diversity of the system

get_alpha_diversity_pd()[source]

Gets the alpha diversity for each set of community parameters.

Returns:all alpha diversity values
Return type:pandas.DataFrame
get_beta_diversity(reference=1)[source]

Gets the system beta diversity for the provided community reference parameters. Beta diversity is the true beta diversity (gamma / alpha).

Parameters:reference – the community reference for speciation parameters
Returns:the beta diversity of the system
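
The definitions given for get_alpha_diversity() and get_beta_diversity() can be worked through on a small example. The sketch assumes gamma diversity is the total number of species across all fragments; the function name and the fragment data are hypothetical.

```python
def diversity(fragment_abundances):
    """Return (alpha, gamma, beta) for {fragment: {species_id: count}}."""
    # Richness of each fragment: species with at least one individual.
    richness = [
        sum(1 for count in species.values() if count > 0)
        for species in fragment_abundances.values()
    ]
    alpha = sum(richness) / len(richness)
    # Gamma: distinct species present anywhere in the system (assumed definition).
    gamma = len({
        species_id
        for species in fragment_abundances.values()
        for species_id, count in species.items()
        if count > 0
    })
    return alpha, gamma, gamma / alpha


fragments = {
    "north": {1: 5, 2: 3},
    "south": {2: 4, 3: 1},
}
alpha, gamma, beta = diversity(fragments)
```

Here each fragment holds two species and three species exist overall, so alpha is 2.0, gamma is 3 and beta is 1.5.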

get_beta_diversity_pd()[source]

Gets the beta diversity for each set of community parameters.

Returns:all beta diversity values
Return type:pandas.DataFrame
get_biodiversity_metrics()[source]

Get calculated biodiversity metrics.

Returns:all biodiversity metrics
Return type:pandas.DataFrame
get_community_parameters(reference=1)[source]

Returns a dictionary containing the parameters for the calculated community.

Parameters:reference – the reference key for the calculated parameters (default is 1)
Returns:dictionary containing the speciation_rate, time, fragments, metacommunity_reference and min/max speciation generation for protracted sims
Return type:dict
get_community_parameters_pd()[source]

Gets all the calculated community parameter sets from the database.

Returns:the community parameters
Return type:pandas.DataFrame
get_community_reference(speciation_rate, time, fragments, metacommunity_size=0, metacommunity_speciation_rate=0.0, metacommunity_option=None, external_reference=0, min_speciation_gen=0.0, max_speciation_gen=0.0)[source]

Gets the community reference associated with the supplied community parameters.

Raises:

KeyError – if COMMUNITY_PARAMETERS (or METACOMMUNITY_PARAMETERS) does not exist in database or no reference exists for the supplied parameters

Parameters:
  • speciation_rate (float) – the speciation rate of the community
  • time (float) – the time in generations of the community
  • fragments (bool/int) – whether fragments were determined for the community
  • metacommunity_size (int/float) – the metacommunity size
  • metacommunity_speciation_rate (float) – the metacommunity speciation rate
  • metacommunity_option (str) – option used for metacommunity creation
  • external_reference (int) – the metacommunity reference for external metacommunity databases
  • min_speciation_gen (float) – the minimum number of generations required before speciation
  • max_speciation_gen (float) – the maximum number of generations required before speciation
Returns:

the reference associated with this set of simulation parameters

get_community_references()[source]

Gets a list of all the community references already calculated for the simulation.

Returns:list of all calculated community references
Return type:list
get_fragment_abundances(fragment, reference)[source]

Gets the species abundances for the supplied fragment and community reference.

Parameters:
  • fragment – the name of the fragment to obtain
  • reference – the reference for speciation parameters to obtain for
Returns:

a list of species ids and abundances

get_fragment_abundances_pd()[source]

Gets the fragment abundances for each set of community parameters.

Returns:the fragment abundances for each associated community reference
Return type:pandas.DataFrame
get_fragment_list(community_reference=1)[source]

Returns a list of all fragments that exist in FRAGMENT_ABUNDANCES.

Parameters:community_reference – community reference to obtain for (default 1)
Returns:list of all fragment names
get_fragment_octaves(fragment=None, reference=None)[source]

Gets the pre-calculated octave data for the specified fragment and community reference. If fragment and reference are None, returns the entire FRAGMENT_OCTAVES object. This requires self.calculate_fragment_octaves() to have been run successfully at some point previously.

Returns are of form [id, fragment, community_reference, octave class, number of species]

Parameters:
  • fragment – the desired fragment (defaults to None)
  • reference – the reference key for the calculated community parameters
Returns:

output from FRAGMENT_OCTAVES for the selected variables

get_fragment_octaves_pd()[source]

Gets the octave classes for each fragment and community parameter set.

Returns:all fragment octave classes
Return type:pandas.DataFrame
get_fragment_richness(fragment=None, reference=None)[source]

Gets the fragment richness for each speciation rate and time for the specified simulation.

If the fragment richness has not yet been calculated, it tries to calculate the fragment richness.

Parameters:
  • fragment (str) – the desired fragment (defaults to None)
  • reference (int) – the reference key for the calculated community parameters
Raises:
  • sqlite3.Error – if no table FRAGMENT_ABUNDANCES exists
  • RuntimeError – if no data for the specified fragment, speciation rate and time exists.

Returns:

A list containing the fragment richness, or a value of the fragment richness

Return type:

list

get_fragment_richness_pd()[source]

Gets the fragment richness for each set of community parameters.

Returns:the fragment richness for each associated community reference
Return type:pandas.DataFrame
get_goodness_of_fit(reference=1)[source]

Returns the goodness of fit from the file.

Parameters:reference – the community reference to get from
Returns:the full output from the SQL query
Return type:float
get_goodness_of_fit_fragment_octaves(reference=1)[source]

Returns the goodness of fit for fragment octaves from the file.

Note

If more than one metric matches the specified criteria, only the first will be returned.

Raises:ValueError – if BIODIVERSITY_METRICS table does not exist.
Parameters:reference – the community reference number
Returns:the full output from the SQL query
Return type:double
get_goodness_of_fit_fragment_richness(reference=1)[source]

Returns the goodness of fit for fragment richness from the file.

Raises:ValueError – if BIODIVERSITY_METRICS table does not exist.
Parameters:reference – the community reference number
Returns:the full output from the SQL query
Return type:float
get_goodness_of_fit_metric(metric, reference=1)[source]

Gets the goodness-of-fit measure for the specified metric and community reference.

Parameters:
  • metric – the metric for which to obtain the goodness of fit
  • reference – the community reference to fetch fits for
Returns:

the goodness of fit value

Return type:

float

get_job()[source]

Gets the job number (the seed) and the task identifier.

Returns:list containing [seed, task]
get_metacommunity_parameters(reference=1)[source]

Returns a dictionary containing the parameters for the calculated community.

Parameters:

reference – the reference key for the calculated parameters. (default is 1)

Raises:
  • sqlite3.Error – if the METACOMMUNITY_PARAMETERS table does not exist, or some other sqlite error occurs
  • KeyError – if the supplied reference does not exist in the METACOMMUNITY_PARAMETERS table
Returns:

dictionary containing the speciation_rate, metacommunity_size, metacommunity option and metacommunity reference.

Return type:

dict

get_metacommunity_parameters_pd()[source]

Gets all the calculated metacommunity parameter sets from the database.

Returns:the metacommunity parameters
Return type:pd.DataFrame
get_metacommunity_references()[source]

Gets a list of all the metacommunity references already calculated for the simulation.

Note

Returns an empty list and logs an error message if the METACOMMUNITY_PARAMETERS table does not exist.

Returns:list of all calculated metacommunity references
Return type:list
get_number_individuals(fragment=None, community_reference=None)[source]

Gets the number of individuals that exist, either in the provided fragment, or on the whole landscape in one time slice. Counts individuals from FRAGMENT_ABUNDANCES or SPECIES_ABUNDANCES, respectively.

If a community reference is provided, only individuals for that time slice will be counted, otherwise a mean is taken across time slices.

Parameters:
  • fragment – the name of the fragment to get a count of individuals from
  • community_reference – the reference to the community parameters
Returns:

the number of individuals that exist in the desired location

get_octaves(reference)[source]

Get the pre-calculated octave data for the parameters associated with the supplied reference. This will call self.calculate_octaves() if it hasn’t been called previously.

Returns are of form [id, ‘whole’, time, speciation rate, octave class, number of species]

Parameters:reference – community reference which contains the parameters of interest
Returns:output from FRAGMENT_OCTAVES on the whole landscape for the selected variables
get_octaves_pd()[source]

Gets the species octaves for all calculated community parameters

Returns:all octave classes for the whole landscape
Return type:pandas.DataFrame
get_parameter_description(key=None)[source]

Gets the description of the parameter matching the key from those contained in SIMULATION_PARAMETERS

Simply accesses the _parameter_descriptions data stored in parameter_descriptions.json

Returns:string containing the parameter description or a dict containing all values if no key is supplied
Return type:str
get_simulation_parameters(guild=None)[source]

Reads the simulation parameters from the database and returns them.

Returns:a dictionary mapping names to values for seed, task, output_dir, speciation_rate, sigma, L_value, deme, sample_size, maxtime, dispersal_relative_cost, min_spec, habitat_change_rate, gen_since_historical, time_config, coarse_map vars, fine map vars, sample_file, gridx, gridy, historical coarse map, historical fine map, sim_complete, dispersal_method, m_probability, cutoff, landscape_type, protracted, min_speciation_gen, max_speciation_gen, dispersal_map

get_species_abundances(fragment=None, reference=None)[source]

Gets the species abundance for a particular fragment, speciation rate and time. If fragment is None, returns the whole landscape species abundances.

Parameters:
  • fragment (str) – the fragment to obtain the species abundance of. If None, returns landscape abundances.
  • reference (int) – the community reference to obtain metrics for
Returns:

list of species abundances [reference, species ID, speciation rate, number of individuals, generation]

get_species_abundances_pd()[source]

Gets the species abundances for all community parameter sets.

Returns:all species abundances
Return type:pandas.DataFrame
get_species_distance_similarity(community_reference=1)[source]

Gets the species distance similarity table for the provided community reference.

Returns:list containing the distance, number of similar species with that distance
get_species_list()[source]

Gets the entirety of the SPECIES_LIST table, returning a tuple with an entry for each row. This can be used to construct custom analyses of the coalescence tree.

Note

The species list will be produced in an unprocessed format

Returns:a list of each coalescence and speciation event, with locations, performed in the simulation
Return type:tuple
get_species_locations(community_reference=None)[source]

Gets the list of species locations after coalescence.

If a community reference is provided, will return just the species for that community reference, otherwise returns the whole table

Parameters:community_reference (int) – community reference number
Returns:a list of lists containing each row of the SPECIES_LOCATIONS table
get_species_richness(reference=1)[source]

Get the system richness for the parameters associated with the supplied community reference.

Note

Richness of 0 is returned if there has been some problem; it is assumed that species richness will be above 0 for any simulation.

Note

if species richness has previously been calculated and stored in SPECIES_RICHNESS table, it gets the species richness value from there, otherwise it calculates the species richness

Parameters:reference – community reference which contains the parameters of interest
Returns:either a list containing the community references and respective species richness values OR (if community_reference is provided), the species richness for that community reference.
Return type:int, list
get_species_richness_pd()[source]

Gets the species richness for all calculated parameters from the database.

Returns:all species richness values with their associated community reference
Return type:pandas.DataFrame
get_total_number_individuals()[source]

Gets the total number of individuals that exist in the simulation.

Returns:the total number of individuals simulated across time slices

import_comparison_data(filename, ignore_mismatch=False)[source]

Imports the SQL database that contains the biodiversity metrics that we want to compare against.

This can either be real data (for comparing simulated data) or other simulated data (for comparing between models).

If the SQL database does not contain the relevant biodiversity metrics, they will be calculated (if possible) or skipped.

The expected form of the database is the same as the BIODIVERSITY_METRICS table, except without any speciation rates or time references, and a new column containing the number of individuals involved in each metric.

Note

This also equalises the comparison data if ignore_mismatch is not True, so that the number of individuals is equal between the simulated and comparison datasets.

Parameters:
  • filename (str) – the file containing the comparison biodiversity metrics.
  • ignore_mismatch (bool) – set to true to ignore abundance mismatches between the comparison and simulated data.

is_protracted()[source]

Indicates whether the simulation is a protracted simulation or not. This is read from the completed database file.

Returns:boolean, true if the simulation was performed with protracted speciation.
output()[source]

Outputs the coalescence trees to the same simulation database object.

revert_downsample()[source]

Reverts the downsample process by restoring the original SPECIES_LIST table.

Returns:None
Return type:None
sample_fragment_richness(fragment, number_of_individuals, community_reference=1, n=1)[source]

Samples from the database from FRAGMENT_ABUNDANCES, the desired number of individuals.

Randomly selects the desired number of individuals from the database n times and returns the mean richness for the random samples.

Raises:

IOError – if the FRAGMENT_ABUNDANCES table does not exist in the database.

Parameters:
  • fragment – the reference of the fragment to acquire the richness for
  • number_of_individuals – the number of individuals to sample
  • community_reference – the reference for the community parameters
  • n – number of times to repeatedly sample
Returns:

the mean of the richness from the repeats

Return type:

float
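
The repeated sampling described above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: it assumes sampling is done without replacement and that abundances are available as a dictionary mapping species ids to counts. The function name mean_sampled_richness and its seed argument are hypothetical.

```python
import random


def mean_sampled_richness(abundances, number_of_individuals, n=1, seed=None):
    """Mean species richness over n random samples of individuals.

    abundances maps a species id to its number of individuals.
    Sampling without replacement is an assumption made for this sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Expand the abundances into one entry per individual.
    individuals = [sp for sp, count in abundances.items() for _ in range(count)]
    if number_of_individuals > len(individuals):
        raise ValueError("cannot sample more individuals than exist")
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        sample = rng.sample(individuals, number_of_individuals)
        # Richness of a sample is the number of distinct species in it.
        total += len(set(sample))
    return total / n
```

Sampling every individual always recovers the full richness, which gives a simple sanity check on any implementation of this procedure.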

sample_landscape_richness(number_of_individuals, n=1, community_reference=1)[source]

Samples from the landscape the required number of individuals, returning the mean of the species richnesses produced.

If number_of_individuals is a dictionary mapping fragment names to numbers sampled, will sample the respective number from each fragment and return the whole landscape richness.

Raises:

KeyError – if the dictionary supplied contains more sampled individuals than exist in a fragment, or if the fragment is not contained within the dictionary.

Parameters:
  • number_of_individuals (int/dict) – either an int containing the number of individuals to be sampled, or a dictionary mapping fragment names to numbers of individuals to be sampled
  • n – the number of repeats to average over
  • community_reference – the community reference to fetch abundances for
Returns:

the mean of the richness from the repeats for the whole landscape

Return type:

float

set_database(filename)[source]

Sets the database to the specified file and opens the sqlite connection.

This must be done before any other operations can be performed and the file must exist.

Raises:IOError – if the simulation is not complete, as analysis can only be performed on complete simulations. However, the database WILL be set before the error is thrown, allowing for analysis of incomplete simulations if the error is handled correctly.
Parameters:filename (pycoalescence.simulation.Simulation/str) – the SQLite database file to import
set_speciation_parameters(speciation_rates, record_spatial=False, record_fragments=False, sample_file=None, times=None, protracted_speciation_min=None, protracted_speciation_max=None, metacommunity_size=None, metacommunity_speciation_rate=None, metacommunity_option=None, metacommunity_reference=None)[source]

Set the parameters for the application of speciation rates. If no config files or time_config files are provided, they will be taken from the main coalescence simulation.

Parameters:
  • speciation_rates (float/list) – a single float, or list of speciation rates to apply
  • record_spatial (bool/str) – a boolean of whether to record spatial data (default=False)
  • record_fragments (bool/str) – either a csv file containing fragment data, or T/F for whether fragments should be calculated from squares of continuous habitat (default=False)
  • sample_file (str) – a sample tif or csv specifying the sampling mask
  • times (list) – a list of times to apply (should have been run with the original simulation)
  • protracted_speciation_min (float) – the minimum number of generations required for speciation to occur
  • protracted_speciation_max (float) – the maximum number of generations before speciation occurs
  • metacommunity_size (float) – the size of the metacommunity to apply
  • metacommunity_speciation_rate (float) – speciation rate for the metacommunity
  • metacommunity_option (str) – either “simulated”, “analytical”, or a path to a database to read SADs from
  • metacommunity_reference (int) – the metacommunity reference if using a database to provide the metacommunity
Return type:

None

speciate_remaining(database)[source]

Speciates the remaining lineages in a paused database.

Parameters:database (str/pycoalescence.simulation.Simulation) – the paused database to open
Return type:None
wipe_data()[source]

Wipes all calculated data apart from the original, unformatted coalescence tree. The Speciation_Counter program will have to be re-run to perform any analyses.

write_all_to_csvs(output_location, file_naming)[source]

Outputs all tables from the database to csvs contained in the provided directory and following the naming structure of the supplied file naming.

Parameters:
  • output_location (str) – the folder to generate files in
  • file_naming – the naming for the output csvs - will be appended with _{table_name}.csv
Returns:

Note

dots and “.csv” extensions are removed from the file_naming output
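
For readers who want the same kind of export outside the package, the sketch below dumps every table of a SQLite file to csv files using only the standard library. It follows the naming rule described above (a trailing ".csv" and any dots are stripped from file_naming, then _{table_name}.csv is appended). The function name dump_tables_to_csvs is hypothetical, and writing a header row is an assumption rather than documented behaviour.

```python
import csv
import os
import sqlite3


def dump_tables_to_csvs(database, output_location, file_naming):
    """Write every table in a SQLite file to <file_naming>_<table>.csv."""
    # Drop a trailing ".csv" and any remaining dots from the naming stem.
    if file_naming.endswith(".csv"):
        file_naming = file_naming[:-4]
    file_naming = file_naming.replace(".", "")
    os.makedirs(output_location, exist_ok=True)
    written = []
    connection = sqlite3.connect(database)
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
        tables = [row[0] for row in cursor.fetchall()]
        for table in tables:
            # Table names come from sqlite_master; quote them regardless.
            cursor.execute('SELECT * FROM "{}"'.format(table))
            header = [column[0] for column in cursor.description]
            path = os.path.join(
                output_location, "{}_{}.csv".format(file_naming, table)
            )
            with open(path, "w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow(header)  # header row is an assumption
                writer.writerows(cursor.fetchall())
            written.append(path)
    finally:
        connection.close()
    return written
```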

write_to_csv(output_csv, table_name)[source]

Writes a specified table from the database to the output csv.

Parameters:
  • output_csv (str) – path to the csv output location
  • table_name (str) – name of the output table
Returns:

None

Return type:

None

get_parameter_description(key=None)[source]

Gets the parameter descriptions for the supplied key. If the key is None, returns all keys.

Parameters:key – the simulation parameter
Returns:string containing the parameter description or a dict containing all values if no key is supplied
scale_simulation_fit(simulated_value, actual_value, number_individuals, total_individuals)[source]

Calculates goodness of fit for the provided values, and scales based on the total number of individuals that exist. The calculation is 1 - (abs(x - y)/max(x, y)) * n/n_tot for x, y simulated and actual values, n, n_tot for metric and total number of individuals.

Parameters:
  • simulated_value – the simulated value of the metric
  • actual_value – the actual value of the metric
  • number_individuals – the number of individuals this metric relates to
  • total_individuals – the total number of individuals across all sites for this metric
Returns:

the scaled fit value
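
The calculation quoted above can be grouped in two ways, and the description does not say which one the package uses. The sketch below shows both readings side by side so the difference is visible; the function names fit_literal and fit_weighted are hypothetical, and neither is claimed to match the package's implementation. Check the source before relying on either.

```python
def _relative_error(simulated_value, actual_value):
    """abs(x - y) / max(x, y) for simulated x and actual y."""
    x, y = float(simulated_value), float(actual_value)
    largest = max(x, y)
    if largest == 0:
        raise ValueError("at least one value must be non-zero")
    return abs(x - y) / largest


def fit_literal(simulated_value, actual_value, number_individuals, total_individuals):
    """Reading 1: 1 - (relative error * n / n_tot), standard precedence."""
    weight = number_individuals / total_individuals
    return 1.0 - _relative_error(simulated_value, actual_value) * weight


def fit_weighted(simulated_value, actual_value, number_individuals, total_individuals):
    """Reading 2: (1 - relative error) * n / n_tot."""
    weight = number_individuals / total_individuals
    return (1.0 - _relative_error(simulated_value, actual_value)) * weight
```

Under the second reading, the values for all sites sum to at most 1 when every site fits perfectly, because the weights n/n_tot sum to 1. Under the first reading each site's value is at most 1 on its own.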

Additional submodules

All additional modules which are required for package functionality, but are unlikely to be used directly.

dispersal_simulation module

Simulate dispersal kernels on landscapes. Detailed here.

input:
  • Map file to simulate on
  • Set of dispersal parameters, including the dispersal kernel, number of repetitions and landscape properties
output:
  • Database containing each distance travelled so that metrics can be calculated.
  • A table is created for mean dispersal distance over a single step or for mean distance travelled.
class DispersalSimulation(dispersal_db=None, file=None, logging_level=30)[source]

Bases: pycoalescence.landscape.Landscape

Simulates a dispersal kernel upon a tif file to calculate landscape-level dispersal metrics.

check_base_parameters(number_repeats=None, seed=None, sequential=None, number_workers=None, dispersal=False)[source]

Checks that the parameters have been set properly.

Parameters:
  • number_repeats (int) – the number of times to iterate on the map
  • seed (int) – the random seed
  • sequential (bool) – if true, runs repeats in the dispersal simulation sequentially
  • number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
  • dispersal (bool) – true if and only if a dispersal simulation, rather than a distance simulation, is to be run

Return type:

None

complete_setup()[source]

Completes the setup for the dispersal simulation, including importing the map files and setting the historical maps.

get_all_dispersal(database=None, parameter_reference=1)[source]

Gets all mean dispersal values from the database if run_mean_dispersal has already been run.

Raises:
  • ValueError – if dispersal_database is None and so run_mean_dispersal() has not been run
  • IOError – if the output database does not exist

Parameters:
  • database (str) – the database to open
  • parameter_reference (int) – the parameter reference to use (default 1)
Returns:

the dispersal values from the database

get_all_distances(database=None, parameter_reference=1)[source]

Gets all total distances travelled from the database if run_mean_distance_travelled or run_all_distance_travelled or run_sample_distance_travelled has already been run.

Raises:
  • ValueError – if dispersal_database is None and so run_mean_dispersal() has not been run
  • IOError – if the output database does not exist

Parameters:
  • database (str) – the database to open
  • parameter_reference (int) – the parameter reference to use (default 1)
Returns:

the dispersal values from the database

get_database_parameters(reference=None)[source]

Gets the dispersal simulation parameters from the dispersal_db

Parameters:reference – the reference to obtain parameters for
Returns:the dispersal simulation parameters
Return type:dict
get_database_references()[source]

Gets the references from the database.

Returns:a list of references from the database
Return type:list
get_distances_map(shape, database=None, parameter_reference=1)[source]

Gets all total distances travelled from the database if run_mean_distance_travelled or run_all_distance_travelled or run_sample_distance_travelled has already been run and puts them inside a numpy matrix

Raises:
  • ValueError – if dispersal_database is None and so run_mean_dispersal() has not been run
  • IOError – if the output database does not exist
  • IndexError – if the output database contains coordinates outside a matrix with shape=shape

Parameters:
  • shape ((int, int)) – shape of the numpy matrix to return which will contain the distances
  • database (str) – the database to open
  • parameter_reference (int) – the parameter reference to use (default 1)
Returns:

the dispersal values from the database

get_mean_dispersal(database=None, parameter_reference=1)[source]

Gets the mean dispersal for the map if run_mean_dispersal has already been run.

Raises:
  • ValueError – if dispersal_database is None and so run_mean_dispersal() has not been run
  • IOError – if the output database does not exist

Parameters:
  • database (str) – the database to open
  • parameter_reference (int) – the parameter reference to use (default 1).
Returns:

mean dispersal from the database

get_mean_distance_travelled(database=None, parameter_reference=1)[source]

Gets the mean distance travelled for the map if run_mean_distance_travelled, run_all_distance_travelled or run_sample_distance_travelled has already been run.

Raises:
  • ValueError – if dispersal_database is None and so test_average_dispersal() has not been run
  • IOError – if the output database does not exist

Parameters:
  • database (str) – the database to open
  • parameter_reference (int) – the parameter reference to use (or 1 for default parameter reference).
Returns:

mean of dispersal from the database

get_stdev_dispersal(database=None, parameter_reference=1)[source]

Gets the standard deviation of dispersal for the map if run_mean_dispersal has already been run.

Raises:
  • ValueError – if dispersal_database is None and so test_average_dispersal() has not been run
  • IOError – if the output database does not exist

Parameters:
  • database (str) – the database to open
  • parameter_reference (int) – the parameter reference to use (or 1 for default parameter reference).
Returns:

standard deviation of dispersal from the database

get_stdev_distance_travelled(database=None, parameter_reference=1)[source]

Gets the standard deviation of the distance travelled for the map if run_mean_distance_travelled or run_all_distance_travelled or run_sample_distance_travelled has already been run.

Raises:
  • ValueError – if dispersal_database is None and so test_average_dispersal() has not been run
  • IOError – if the output database does not exist

Parameters:
  • database (str) – the database to open
  • parameter_reference (int) – the parameter reference to use (or 1 for default parameter reference).
Returns:

standard deviation of dispersal from the database

Return type:

float

run_all_distance_travelled(number_repeats=None, number_steps=None, seed=None, number_workers=None)[source]

Tests the dispersal kernel on all cells on the provided map, producing a database containing the average distance travelled after number_steps have been moved.

Parameters:
  • number_repeats (int) – the number of times to average over for each cell
  • number_steps (int/list) – the number of steps to take each time before recording the distance travelled
  • seed (int) – the random seed
  • number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
Return type:

None

run_mean_dispersal(number_repeats=None, seed=None, sequential=None)[source]

Tests the dispersal kernel on the provided map, producing a database containing each dispersal distance for analysis purposes.

Note

should be equivalent to run_mean_distance_travelled() with number_steps = 1

Parameters:
  • number_repeats (int) – the number of times to iterate on the map
  • seed (int) – the random seed
  • sequential (bool) – if true, runs repeats sequentially
run_mean_distance_travelled(number_repeats=None, number_steps=None, seed=None, number_workers=None)[source]

Tests the dispersal kernel on the provided map, producing a database containing the average distance travelled after number_steps have been moved.

Note

mean distance travelled with number_steps=1 should be equivalent to running run_mean_dispersal()

Parameters:
  • number_repeats (int) – the number of times to iterate on the map
  • number_steps (int/list) – the number of steps to take each time before recording the distance travelled
  • seed (int) – the random seed
  • number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
Return type:

None
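
To make "average distance travelled after number_steps" concrete, the sketch below simulates it on an idealised landscape. It assumes an infinite, uniform landscape and a kernel in which each step adds an independent normal displacement with standard deviation sigma on each axis. Both are simplifying assumptions, and the package's own kernel definition and map handling are not reproduced here. The function name mean_distance_travelled_sketch is hypothetical.

```python
import math
import random


def mean_distance_travelled_sketch(number_repeats, number_steps, sigma=1.0, seed=1):
    """Mean straight-line distance from the start after number_steps moves.

    Assumes an unbounded, uniform landscape and a 2D normal step kernel.
    """
    if number_repeats < 1:
        raise ValueError("number_repeats must be at least 1")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(number_repeats):
        x = y = 0.0
        for _ in range(number_steps):
            # One dispersal event: independent normal offsets on each axis.
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
        # Record the distance between the start and end positions.
        total += math.hypot(x, y)
    return total / number_repeats


one_step = mean_distance_travelled_sketch(5000, 1)
four_steps = mean_distance_travelled_sketch(5000, 4)
```

Under these assumptions the single-step case is the mean dispersal distance, matching the note above that the two measures coincide when number_steps is 1. On a real map with habitat boundaries the relationship between steps and distance will generally differ.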

run_sample_distance_travelled(samples_X, samples_Y, number_repeats=None, number_steps=None, seed=None, number_workers=None)[source]

Tests the dispersal kernel on the sampled cells on the provided map, producing a database containing the average distance travelled after number_steps have been moved.

Parameters:
  • samples_X (list) – list of the integer x coordinates of the sampled cells
  • samples_Y (list) – list of the integer y coordinates of the sampled cells
  • number_repeats (int) – the number of times to average over for each cell
  • number_steps (int/list) – the number of steps to take each time before recording the distance travelled
  • seed (int) – the random seed
  • number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
Return type:

None

set_dispersal_parameters(dispersal_method='normal', dispersal_file='none', sigma=1, tau=1, m_prob=1, cutoff=100, dispersal_relative_cost=1, restrict_self=False)[source]

Sets the dispersal parameters.

Parameters:
  • dispersal_method (str) – the dispersal method to use (“normal”, “fat-tailed” or “norm-uniform”)
  • dispersal_file (str) – path to the dispersal map file, or none.
  • sigma (float) – the sigma value to use for normal and norm-uniform dispersal
  • tau (float) – the tau value to use for fat-tailed dispersal
  • m_prob (float) – the m_prob to use for norm-uniform dispersal
  • cutoff (float) – the cutoff value to use for norm-uniform dispersal
  • dispersal_relative_cost (float) – relative dispersal ability through non-habitat
  • restrict_self (bool) – if true, self-dispersal is prohibited

set_map_files(fine_file, sample_file='null', coarse_file=None, historical_fine_file=None, historical_coarse_file=None, deme=1)[source]

Sets the map files.

Uses a null sampling regime, as the sample file should have no effect.

Parameters:
  • fine_file (str) – the fine map file. Defaults to “null” if none provided
  • coarse_file (str) – the coarse map file. Defaults to “none” if none provided
  • historical_fine_file (str) – the historical fine map file. Defaults to “none” if none provided
  • historical_coarse_file (str) – the historical coarse map file. Defaults to “none” if none provided
  • deme (int) – the number of individuals per cell
Return type:

None

set_simulation_parameters(number_repeats=None, output_database='output.db', seed=1, number_workers=1, dispersal_method='normal', landscape_type='closed', sigma=1, tau=1, m_prob=1, cutoff=100, sequential=False, dispersal_relative_cost=1, restrict_self=False, number_steps=1, dispersal_file='none')[source]

Sets the simulation parameters for the dispersal simulations.

Parameters:
  • number_repeats (int) – the number of times to iterate on the map
  • output_database (str) – the path to the output database
  • seed (int) – the random seed
  • number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
  • dispersal_method (str) – the dispersal method to use (“normal”, “fat-tailed” or “norm-uniform”)
  • landscape_type (str) – the landscape type to use (“infinite”, “tiled_coarse”, “tiled_fine”, “clamped_coarse”, “clamped_fine” or “closed”)
  • sigma (float) – the sigma value to use for normal and norm-uniform dispersal
  • tau (float) – the tau value to use for fat-tailed dispersal
  • m_prob (float) – the m_prob to use for norm-uniform dispersal
  • cutoff (float) – the cutoff value to use for norm-uniform dispersal
  • sequential (bool) – if true, end locations of one dispersal event are used as the start for the next. Otherwise, a new random cell is chosen
  • dispersal_relative_cost (float) – relative dispersal ability through non-habitat
  • restrict_self (bool) – if true, self-dispersal is prohibited
  • number_steps (list/int) – the number of steps to calculate for mean distance travelled, provided as an int or a list of ints
  • dispersal_file (str) – path to the dispersal map file, or none.
update_parameters(number_repeats=None, number_steps=None, seed=None, number_workers=None, dispersal_method=None, dispersal_file=None, sigma=None, tau=None, m_prob=None, cutoff=None, dispersal_relative_cost=None, restrict_self=None)[source]

Provides a convenience function for updating all parameters which can be updated.

Parameters:
  • number_repeats (int) – the number of repeats to perform the dispersal simulation for
  • number_steps (list/int) – the number of steps to iterate for in calculating the mean distance travelled
  • seed (int) – the random number seed
  • number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
  • dispersal_method (str) – the method of dispersal
  • dispersal_file (str) – the dispersal file (alternative to dispersal_method)
  • sigma (float) – the sigma dispersal value
  • tau (float) – the tau dispersal value
  • m_prob (float) – the probability of drawing from a uniform distribution
  • cutoff (float) – the maximum value for the uniform distribution
  • dispersal_relative_cost (float) – the relative cost of moving through non-habitat
  • restrict_self (bool) – if true, prohibits dispersal from the same cell
Return type:

None

fragments module

Generate fragmented landscapes with specific properties. Detailed here.

Contains FragmentedLandscape for creating a fragmented landscape using hexagonal packing and an even spread of individuals between fragments. Requires scipy and matplotlib.

class Fragment(x=None, y=None)[source]

Bases: object

Simple class containing the centres of fragments for a fragmented landscape

place_on_grid()[source]

Changes the x and y positions to integers (always rounds down).

Return type:None
setup(x, y)[source]

Sets up the fragment from the x and y position.

Parameters:
  • x – the x position of the fragment centre
  • y – the y position of the fragment centre
Return type:

None

class FragmentedLandscape(number_fragments=None, size=None, total=None, output_file=None)[source]

Bases: object

Contains hexagonal packing algorithms for spacing clumps evenly on the landscape. Includes a Lloyd’s smoothing algorithm for better spacing of fragments.

Note

Fragments will not be distinct units for largely unfragmented landscapes (those with more than around 50% habitat cover).

create(override_smoothing=None, n=10)[source]

Creates the landscape, including running the hexagonal packing and smoothing algorithms (if required).

Note

Smoothing is recommended for any landscape that doesn’t contain a square number of fragments.

Parameters:
  • override_smoothing – if true, overrides the default smoothing settings (enabled for landscapes with fewer than 100000 fragments).
  • n – the number of iterations to run Lloyd’s algorithm for
Return type:

None

fill_grid()[source]

Distributes the sizes evenly between the fragments, generating the actual landscape.

Return type:None
generate(override_smoothing=None, n=10)[source]

Convenience function for creating fragments in one function. Generates the landscape and writes out to the output file.

If smoothing is true, will run Lloyd’s algorithm after the hexagonal packing algorithm to increase the equality of the spacing.

Note

Smoothing is recommended for any landscape that doesn’t contain a square number of fragments.

Parameters:
  • override_smoothing – if true, overrides the default smoothing settings (enabled for landscapes with fewer than 100000 fragments).
  • n – the number of iterations to run Lloyd’s algorithm for
Return type:

None

place_fragments(smoothing=True, n=10)[source]

Places the fragments evenly on the landscape. If smoothing is true, will run Lloyd’s algorithm after the hexagonal packing algorithm to increase the equality of the spacing.

Note

Smoothing is recommended for any landscape that doesn’t contain a square number of fragments.

Parameters:
  • smoothing – if true, runs Lloyd’s algorithm after the hexagonal packing
  • n – the number of iterations to run Lloyd’s algorithm for
Return type:

None
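
Lloyd’s algorithm itself is simple to state: assign every cell to its nearest fragment centre, then move each centre to the centroid of the cells assigned to it, and repeat. The sketch below shows this on a square grid in plain Python. It is a generic illustration of the technique, not the package's code; the function name lloyd_relaxation is hypothetical, and the initial centres would come from the hexagonal packing step, which is not shown.

```python
def lloyd_relaxation(centres, size, n=10):
    """Run n iterations of Lloyd's algorithm on a size x size grid.

    centres is a list of (x, y) positions; a new list is returned.
    """
    centres = [tuple(c) for c in centres]
    for _ in range(n):
        # Per centre: running sum of x, sum of y, and number of cells owned.
        sums = [[0.0, 0.0, 0] for _ in centres]
        for cx in range(size):
            for cy in range(size):
                # Use the midpoint of the cell as its position.
                px, py = cx + 0.5, cy + 0.5
                nearest = min(
                    range(len(centres)),
                    key=lambda i: (centres[i][0] - px) ** 2
                    + (centres[i][1] - py) ** 2,
                )
                sums[nearest][0] += px
                sums[nearest][1] += py
                sums[nearest][2] += 1
        # Move each centre to the centroid of its cells; a centre that
        # owns no cells is left where it is.
        centres = [
            (sx / count, sy / count) if count else centre
            for (sx, sy, count), centre in zip(sums, centres)
        ]
    return centres
```

Each iteration costs on the order of size² × number of centres distance evaluations in this naive form, which is one plausible reason to disable smoothing for very large numbers of fragments.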

plot()[source]

Returns a matplotlib.pyplot.figure object containing an image of the fragmented landscape (with axes removed).

Requires that the fragmented landscape has been created already using create().

Returns:figure object containing the fragmented landscape.
Return type:matplotlib.pyplot.figure
setup(number_fragments, size, total, output_file)[source]

Sets up the landscape by checking parameters and setting object sizes.

Parameters:
  • number_fragments – the number of individual fragments to exist on the landscape
  • size – the size of the x and y dimensions of the landscape
  • total – the total number of individuals to place on the landscape
  • output_file – the output tif file to write the output to
Return type:

None

write_to_raster()[source]

Writes the landscape to a tif file.

Raises:FileExistsError – if the output file already exists
Parameters:output_file – the path to the tif file to write out to.
Return type:None

fragments config module

Generate the fragment config files from a supplied shapefile and a raster file to offset from.

The function generate_fragment_csv() contains the full pipeline to generate the fragment csv.

class FragmentConfigHandler[source]

Bases: object

Contains routines for calculating the offsets from a config file.

generate_config(input_shapefile, input_raster, field_name='fragment', field_area='area')[source]

Generates the config file from the shapefile containing the fragments, writing the coordinates of the extent of each fragment to the output csv. The coordinates are calculated from their relative position on the input raster.

Parameters:
  • input_shapefile (str) – shapefile containing the fragments in a “fragments” field, with each defined as a polygon.
  • input_raster (str) – the raster to calculate the coordinates from
  • field_name (str) – optionally provide a field to extract fragment names from
  • field_area (str) – optionally provide a field to extract fragment areas from (the number of individuals that exist in the fragment).
read_csv(input_csv)[source]

Reads the input csv file into the fragments object.

Parameters:input_csv – the csv file to read in
Returns:None
Return type:None
write_csv(output_csv)[source]

Writes the fragments to the output csv.

Parameters:output_csv (str) – the csv to write the output to
generate_fragment_csv(input_shapefile, input_raster, output_csv, field_name='fragment', field_area='area')[source]

Generates the fragment csv from the provided shapefile and raster file. Coordinates written to the csv are calculated from the extent of each polygon in the shapefile, as its relative position on the input raster.

Only the fragment extents are used, so overlapping extents result in individuals in those areas appearing in both fragments. Therefore, only rectangular fragments should be used.

Important

The input shapefile and raster must have the same projection.

Parameters:
  • input_shapefile – the shapefile containing polygons defining fragments. Should contain fields of field_name and field_area
  • input_raster – raster file to calculate the relative coordinates on
  • output_csv – output csv to create
  • field_name – name of the field in the shapefile to acquire fragment names from
  • field_area – name of the field in the shapefile to acquire the number of individuals from
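
The conversion from a polygon extent to a position on the raster can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the function name extent_to_pixel_window is hypothetical, the raster is assumed to be north-up (no rotation terms), and the geotransform follows the GDAL convention (upper left x, x resolution, 0, upper left y, 0, negative y resolution).

```python
import math


def extent_to_pixel_window(extent, geotransform):
    """Convert a (min_x, max_x, min_y, max_y) extent to pixel coordinates.

    Hypothetical helper: assumes a north-up raster with a GDAL-style
    geotransform (ulx, x_res, 0, uly, 0, -y_res).
    """
    min_x, max_x, min_y, max_y = extent
    ulx, x_res, _, uly, _, y_res = geotransform
    x_west = int(math.floor((min_x - ulx) / x_res))
    x_east = int(math.ceil((max_x - ulx) / x_res))
    # y_res is negative for north-up rasters, so rows increase southwards.
    y_north = int(math.floor((max_y - uly) / y_res))
    y_south = int(math.ceil((min_y - uly) / y_res))
    return x_west, y_north, x_east, y_south


window = extent_to_pixel_window(
    extent=(120.0, 150.0, 430.0, 470.0),
    geotransform=(100.0, 10.0, 0.0, 500.0, 0.0, -10.0),
)
```

Because only the bounding extent is converted, any non-rectangular polygon is effectively replaced by its bounding box, which is why overlapping extents lead to shared individuals.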

helper file

Port older simulation outputs to the updated naming conventions. Should not be required by most users.

update_parameter_names(database)[source]

Alters the parameter names of SIMULATION_PARAMETERS in the database so that they match the updated naming convention.

Provided for back-compatibility with older simulations.

Note

If the simulation does not require updating, this function exits silently.

Parameters:database – the database path to alter the names of
Returns:None
Return type:None

hpc_setup file

Compile necsim with a number of intel compiler optimisations for running on high-performance computing systems.

build_hpc()[source]

Compiles necsim with the flags for optimisation on high-performance intel-based systems. On systems with a global variable containing INTEL_LICENSE_FILE, most of these options will be turned on automatically.

Return type:None

installer file

Compile necsim with default or provided compilation options. Intended for internal usage during pip or conda builds, although manual installation is also possible by running this file from the command line. python installer.py configures the install by detecting system components and compiles the C++ files, if possible. Command line flags can be provided to installer.py to modify the install (see Compilation Options for more information).

class Installer(dist, **kwargs)[source]

Bases: setuptools.command.build_ext.build_ext

Wraps configuration and compilation of C++ code.

autoconf()[source]

Runs the autoconf bash function (assuming that autoconf is available) to create the configure executable.

backup_makefile()[source]

Copies the makefile to a saved folder so that even if the original is overwritten, the last successful compilation can be recorded.

build_extension(ext)[source]

Builds the C++ and Python extension.

clean()[source]

Runs make clean in the NECSim directory to wipe any previous potential compile attempts.

clean_cmake()[source]

Deletes the cmake files and object locations if they exist.

configure(opts=None)[source]

Runs ./configure --opts with the supplied options. This should create the makefile for compilation; otherwise a RuntimeError will be thrown.

Parameters:opts – a list of options to pass to the ./configure call
configure_and_compile(argv=[None], logging_level=20)[source]

Calls the configure script, then runs the compilation.

Parameters:
  • argv – the arguments to pass to configure script
  • logging_level – the logging level to utilise (defaults to INFO).
Return type:

None

copy_makefile()[source]

Copies the backup makefile to the main directory, if it exists. Throws an IOError if no makefile is found.

create_default_depend()[source]

Runs the default makedepend command, outputting dependencies to lib/depends_default.

Used to generate a default dependency file on a system where makedepend exists, for a system where it does not.

do_compile()[source]

Compiles the C++ necsim program by running make. This changes the working directory to wherever the module has been installed for the subprocess call.

get_build_dir()[source]

Gets the build directory.

Returns:the build directory path
get_compilation_flags(display_warnings=False)[source]

Generates the compilation flags for passing to ./configure.

Parameters:display_warnings – If true, runs with the -Wall flag for compilation (displaying all warnings). Default is False.

Returns:list of compilation flags.
Return type:list
get_default_cmake_args(output_dir)[source]

Returns the default cmake configure and build arguments.

Parameters:output_dir – the output directory to use
Returns:tuple of two lists, first containing cmake configure arguments, second containing build arguments
Return type:tuple
get_ldflags()[source]

Get the ldflags that Python was compiled with, removing some problematic options.

get_ldshared()[source]

Gets the ldshared Python variables and replaces -bundle with -shared for proper compilation.

get_obj_dir()[source]

Gets the obj directory for installing obj files to.

Returns:the obj directory path
make_depend()[source]

Runs make depend in the lib directory to calculate all dependencies for the header and source files.

Note

Fails silently if makedepend is not installed, printing an error to logging.

move_shared_object_file()[source]

Moves the shared object (.so) file to the build directory.

run()[source]

Runs installation and generates the shared object files - entry point for setuptools

run_cmake(src_dir, cmake_args, build_args, tmp_dir, env)[source]

Runs cmake to compile necsim.

Parameters:
  • src_dir – the source directory for necsim .cpp and .h files
  • cmake_args – arguments to pass to the cmake project
  • tmp_dir – the build directory to output cmake files to
  • env – the os.environ (or other environmental variables) to pass on
run_configure(argv=None, logging_level=20, display_warnings=False)[source]

Configures the install for compile options provided via the command line, or with default options if no options exist. Running with -help or -h will display the compilation configurations called from ./configure.

Parameters:
  • argv – the arguments to pass to configure script
  • logging_level – the logging level to utilise (defaults to INFO).
  • display_warnings – If true, runs with the -Wall flag for compilation (displaying all warnings). Default is False.
setuptools_cmake(ext)[source]

Configures cmake for setuptools usage.

Parameters:ext – the extension to build cmake on
use_default_depends()[source]

Uses the default dependencies, copying all contents of depends_default to the end of Makefile.

Note

Zero error-checking is done here as the Makefiles should not change, and the depends_default file should be created using create_default_depend()

get_python_library(python_version)[source]

Get path to the Python library associated with the current Python interpreter.

landscape file

Generate landscapes and check map file combinations. Parent class for Simulation and DispersalSimulation. Contains Map objects for each relevant map file internally.

class Landscape[source]

Bases: object

Calculates offsets and dimensions of a selection of tif files making up a landscape.

add_historical_map(fine_file, coarse_file, time, rate=0.0)[source]

Adds an extra map to the list of historical maps.

Parameters:
  • fine_file (str) – the historical fine map file to add
  • coarse_file (str) – the historical coarse map file to add
  • time – the time to add (when the map is accurate)
  • rate – the rate to add (the rate of habitat change at this time)
check_maps()[source]

Checks that the maps all exist and that the file structure makes sense.

Raises:
  • TypeError – if a dispersal map or reproduction map is specified, a fine map must also be specified and a coarse map must not be.
  • IOError – if one of the required maps does not exist
Returns:

None

detect_map_dimensions()[source]

Detects all the map dimensions for the provided files (where possible) and sets the respective values. This is intended to be run after set_map_files()

Raises:
  • TypeError – if a dispersal map or reproduction map is specified, a fine map must also be specified and a coarse map must not be.
  • IOError – if one of the required maps does not exist
  • ValueError – if the dimensions of the dispersal map do not make sense when used with the fine map provided
Returns:

None

set_map(map_file, x_size=None, y_size=None)[source]

Quick function for setting a single map file for both the sample map and fine map, of dimensions x and y. Sets the sample file to “null” and coarse file and historical files to “none”.

Parameters:
  • map_file (str) – path to the map file
  • x_size (int) – the x dimension, or None to detect automatically from the “.tif” file
  • y_size (int) – the y dimension, or None to detect automatically from the “.tif” file
set_map_files(sample_file, fine_file=None, coarse_file=None, historical_fine_file=None, historical_coarse_file=None)[source]

Sets the map files (or to null, if none specified). It then calls detect_map_dimensions() to correctly read in the specified dimensions.

If sample_file is “null”, dimension values will remain at 0. If coarse_file is “null”, it will default to the size of fine_file with zero offset. If the coarse file is “none”, it will not be used. If the historical fine or coarse files are “none”, they will not be used.

Parameters:
  • sample_file (str) – the sample map file. Provide “null” if no sample mask is required
  • fine_file (str) – the fine map file. Defaults to “null” if none provided
  • coarse_file (str) – the coarse map file. Defaults to “none” if none provided
  • historical_fine_file (str) – the historical fine map file. Defaults to “none” if none provided
  • historical_coarse_file (str) – the historical coarse map file. Defaults to “none” if none provided
Return type:None
Returns:None

set_map_parameters(sample_file, sample_x, sample_y, fine_file, fine_x, fine_y, fine_x_offset, fine_y_offset, coarse_file, coarse_x, coarse_y, coarse_x_offset, coarse_y_offset, coarse_scale, historical_fine_map, historical_coarse_map)[source]

Set up the map objects with the required parameters. This is required for csv file usage.

Note that this function is not recommended for tif file usage, as it is much simpler to call set_map_files(), which should automatically calculate map offsets, scaling and dimensions.

Parameters:
  • sample_file – the sample file to use, which should contain a boolean mask of where to sample
  • sample_x – the x dimension of the sample file
  • sample_y – the y dimension of the sample file
  • fine_file – the fine map file to use (must be equal to or larger than the sample file)
  • fine_x – the x dimension of the fine map file
  • fine_y – the y dimension of the fine map file
  • fine_x_offset – the x offset of the fine map file
  • fine_y_offset – the y offset of the fine map file
  • coarse_file – the coarse map file to use (must be equal to or larger than fine map file)
  • coarse_x – the x dimension of the coarse map file
  • coarse_y – the y dimension of the coarse map file
  • coarse_x_offset – the x offset of the coarse map file at the resolution of the fine map
  • coarse_y_offset – the y offset of the coarse map file at the resolution of the fine map
  • coarse_scale – the relative scale of the coarse map compared to the fine map (must match x and y scaling)
  • historical_fine_map – the historical fine map file to use (must have dimensions equal to fine map)
  • historical_coarse_map – the historical coarse map file to use (must have dimensions equal to coarse map)
sort_historical_maps()[source]

Sorts the historical maps by time.

landscape_metrics file

Calculates landscape-level metrics, including mean distance to nearest-neighbour for each habitat cell and clumpiness.

class LandscapeMetrics(file=None, logging_level=30)[source]

Bases: pycoalescence.map.Map

Calculates the mean nearest-neighbour for cells across a landscape. See here for details.

get_clumpiness()[source]

Calculates the clumpiness metric for the landscape, a measure of how spread out the points are across the landscape. See here for details.

Returns:the CLUMPY metric
Return type:float
get_mnn()[source]

Calculates the mean nearest-neighbour for cells across a landscape. See here for details.

Returns:the mean distance to the nearest neighbour of a cell.
Return type:float
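
As a rough illustration of the metric, the mean nearest-neighbour distance can be computed by brute force over all habitat cells. The sketch below makes several assumptions: the function name mean_nearest_neighbour is hypothetical, any cell with a value above 0 counts as habitat, and distances are Euclidean in units of cells. The actual method may treat densities or distances differently.

```python
import math


def mean_nearest_neighbour(grid):
    """Mean Euclidean distance from each habitat cell to its nearest habitat cell.

    Hypothetical helper: treats any cell with a value above 0 as habitat.
    """
    cells = [(x, y) for y, row in enumerate(grid) for x, v in enumerate(row) if v > 0]
    if len(cells) < 2:
        raise ValueError("At least two habitat cells are required.")
    total = 0.0
    for i, (x1, y1) in enumerate(cells):
        # Distance to the closest other habitat cell.
        total += min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(cells)
            if j != i
        )
    return total / len(cells)


mnn = mean_nearest_neighbour([[1, 0, 1], [0, 0, 0], [1, 0, 0]])
```

This brute-force form scales quadratically with the number of habitat cells, so it is only suitable for small landscapes.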

map module

Open tif files and detect properties and data using gdal. Detailed here.

class GdalErrorHandler(logger)[source]

Bases: object

Custom error handler for GDAL warnings and errors.

handler(err_level, err_no, err_msg)[source]
Parameters:
  • err_level – the level at which to log outputs
  • err_no – the error number to use
  • err_msg – the error message
Returns:

class Map(file=None, is_sample=None, logging_level=30)[source]

Bases: object

Contains the file name and the variables associated with this map object.

The internal array of the tif file is stored in self.data, and band 1 of the file can be opened by using open()

Important

Currently, Map does not support skewed rasters (those not aligned north/south).

Variables:data – if the map file has been opened, contains the full tif data as a numpy array.
calculate_offset(file_offset)[source]

Calculates the offset of the map object from the supplied file_offset.

This map should be the smaller of the two.

Parameters:file_offset (str/Map) – the path to the file to calculate the offset. Can also be a Map object with the filename contained.
Raises:TypeError – if the spatial reference systems of the two files do not match
Returns:the offset x and y (at the resolution of the file_home) in integers
calculate_scale(file_scaled)[source]

Calculates the scale of map object from the supplied file_scaled.

Parameters:file_scaled (str/Map) – the path to the file to calculate the scale.
Returns:the scale (of the x dimension)
check_map()[source]

Checks that the dimensions for the map have been set and that the map file exists

convert_lat_long(lat, long)[source]

Converts the input latitude and longitude to x, y coordinates on the Map

Parameters:
  • lat – the latitude to obtain the y coordinate of
  • long – the longitude to obtain the x coordinate of
Raises:

IndexError – if the provided coordinates are outside the Map object.

Returns:

[x, y] coordinates on the Map
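
A minimal sketch of this kind of conversion, assuming a north-up map described by its upper left corner and pixel resolution. The function name lat_long_to_x_y and its extra arguments are hypothetical; the documented method reads the equivalent values from the Map object itself.

```python
def lat_long_to_x_y(lat, long, upper_left_x, upper_left_y, x_res, y_res, x_size, y_size):
    """Convert a latitude/longitude to integer cell coordinates on a north-up map.

    Hypothetical helper for illustration only.
    """
    x = int((long - upper_left_x) // x_res)
    # Rows count downwards from the upper left corner, so latitude is inverted.
    y = int((upper_left_y - lat) // abs(y_res))
    if not (0 <= x < x_size and 0 <= y < y_size):
        raise IndexError("Coordinates ({}, {}) are outside the map.".format(lat, long))
    return [x, y]


coords = lat_long_to_x_y(
    lat=9.5, long=0.5,
    upper_left_x=0.0, upper_left_y=10.0,
    x_res=0.25, y_res=-0.25,
    x_size=40, y_size=40,
)
```

Coordinates that fall outside the grid raise IndexError, mirroring the behaviour described above.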

create(file, bands=1, datatype=gdal.GDT_Byte, geotransform=None, projection=None)[source]

Create the file output and writes the data to the output.

Parameters:
  • file (str) – the output file to create
  • bands (int) – optionally provide a number of bands to create
  • datatype (gdal.GDT_Byte) – the data type of the output
  • geotransform (tuple) – optionally provide a geotransform to set for the raster - defaults to (0, 1, 0, 0, 0, -1)
  • projection (str) – optionally provide a projection to set for the raster, in WKT format

create_copy(dst_file, src_file=None)[source]

Creates a file, copying the projection and other attributes over from the source file.

Parameters:
  • dst_file – existing file to create
  • src_file – the source file to copy from
get_band_number()[source]

Gets the number of raster bands in the file.

Return type:int
Returns:the number of bands in the raster
get_cached_subset(x_offset, y_offset, x_size, y_size)[source]

Gets a subset of the map file, but rounds all numbers to integers to save RAM and keeps the entire array in memory to speed up fetches.

Parameters:
  • x_offset (int) – the x offset from the top left corner of the map
  • y_offset (int) – the y offset from the top left corner of the map
  • x_size (int) – the x size of the subset to obtain
  • y_size (int) – the y size of the subset to obtain
Returns:

a numpy array containing the subsetted data

get_dataset(file=None, permissions=gdal.GA_Update)[source]

Gets the dataset from the file.

Parameters:
  • file (str) – path to the file to open
  • permissions (int) – the gdal permission reference to open the dataset
Raises:
  • ImportError – if the gdal module has not been imported correctly
  • IOError – if the supplied filename is not a tif or vrt
  • IOError – if the map does not exist
Returns:

an opened dataset object

get_dimensions()[source]

Calls read_dimensions() if dimensions have not been read, or reads stored information.

Returns:a list containing [0] x, [1] y, [2] x offset, [3] y offset, [4] x resolution, [5] y resolution, [6] upper left x, [7] upper left y
get_dtype(band_no=None)[source]

Gets the data type of the provided band number

Parameters:band_no – band number to obtain the data type of
Return type:int
Returns:the gdal data type number in the raster file
get_extent()[source]

Gets the min and max x, and min and max y values, including accounting for skew.

Returns:list of the x min, x max, y min, y max values.
Return type:list
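
A GDAL-style geotransform maps pixel (column, row) positions to georeferenced coordinates, so an extent that accounts for skew can be found by transforming all four corners of the raster. The sketch below is an illustration only: the function name extent_from_geotransform is hypothetical, and the six-element geotransform layout follows the GDAL convention.

```python
def extent_from_geotransform(geotransform, x_size, y_size):
    """Return [x_min, x_max, y_min, y_max] for a raster of x_size by y_size cells.

    Hypothetical helper: applies the GDAL affine transform to each corner, so
    rotation (skew) terms are taken into account.
    """
    gt = geotransform
    corners = [(0, 0), (x_size, 0), (0, y_size), (x_size, y_size)]
    xs = [gt[0] + col * gt[1] + row * gt[2] for col, row in corners]
    ys = [gt[3] + col * gt[4] + row * gt[5] for col, row in corners]
    return [min(xs), max(xs), min(ys), max(ys)]


north_up = extent_from_geotransform((100.0, 10.0, 0.0, 500.0, 0.0, -10.0), 5, 4)
skewed = extent_from_geotransform((0.0, 1.0, 0.5, 0.0, 0.0, -1.0), 2, 2)
```

For a north-up raster the result reduces to the upper left corner plus the raster size multiplied by the resolution; with skew, the extreme values may come from any of the four corners.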

get_geo_transform()[source]

Gets the geotransform of the file.

Returns:list containing the geotransform parameters
get_no_data(band_no=None)[source]

Gets the no data value for the tif map.

Parameters:band_no – the band number to obtain the no data value from
Returns:the no data value
Return type:float
get_projection()[source]

Gets the projection of the map.

Returns:the projection object of the map in WKT format
Return type:str
get_subset(x_offset, y_offset, x_size, y_size, no_data_value=None)[source]

Gets a subset of the map file

Parameters:
  • x_offset (int) – the x offset from the top left corner of the map
  • y_offset (int) – the y offset from the top left corner of the map
  • x_size (int) – the x size of the subset to obtain
  • y_size (int) – the y size of the subset to obtain
  • no_data_value (float/int) – optionally provide a value to replace all no data values with.
Returns:

a numpy array containing the subsetted data

get_x_y()[source]

Simply returns the x and y dimension of the file.

Returns:the x and y dimensions
has_equal_dimensions(equal_map)[source]

Checks if the supplied Map has equal dimensions to this Map.

Note

Dimension matching uses an absolute value (0.0001) for latitude/longitude, and relative value for pixel resolution. The map sizes must fit perfectly.

Parameters:equal_map (Map) – the Map object to check if dimensions match
Returns:true if the dimensions match, false otherwise
Return type:bool
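
The mix of absolute and relative tolerances described in the note can be sketched as below. The list layout follows the one returned by get_dimensions(); the function name dimensions_match and the relative tolerance of 1e-6 are assumptions, as the source only states the absolute value of 0.0001.

```python
import math


def dimensions_match(dims_a, dims_b, abs_tol=0.0001, rel_tol=1e-6):
    """Compare two dimension lists laid out as
    [x, y, x offset, y offset, x resolution, y resolution, upper left x, upper left y].

    Hypothetical helper; rel_tol is an assumed value.
    """
    # Map sizes (in cells) must fit perfectly.
    if list(dims_a[0:2]) != list(dims_b[0:2]):
        return False
    # Pixel resolutions are compared with a relative tolerance.
    for i in (4, 5):
        if not math.isclose(dims_a[i], dims_b[i], rel_tol=rel_tol):
            return False
    # Upper left coordinates are compared with an absolute tolerance.
    for i in (6, 7):
        if abs(dims_a[i] - dims_b[i]) > abs_tol:
            return False
    return True


base = [10, 10, 0, 0, 0.5, -0.5, 100.0, 200.0]
close = [10, 10, 0, 0, 0.5, -0.5, 100.00005, 200.0]
shifted = [10, 10, 0, 0, 0.5, -0.5, 100.01, 200.0]
```

Here base and close are treated as equal, while shifted differs by more than the absolute tolerance and is rejected.
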
is_within(outside_map)[source]

Checks if the object is within the provided Map object.

Note

Uses the extents of the raster file for checking location, ignoring any offsetting

Parameters:outside_map (Map) – the Map object to check if this class is within
Returns:true if this Map is entirely within the supplied Map
Return type:bool
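
A containment check based purely on raster extents could look like the following sketch. The function name extent_is_within is hypothetical, and each extent is assumed to be ordered as x min, x max, y min, y max.

```python
def extent_is_within(inner, outer):
    """Return True if the inner extent lies entirely within the outer extent.

    Hypothetical helper: extents are (x_min, x_max, y_min, y_max).
    """
    in_x_min, in_x_max, in_y_min, in_y_max = inner
    out_x_min, out_x_max, out_y_min, out_y_max = outer
    return (
        out_x_min <= in_x_min
        and in_x_max <= out_x_max
        and out_y_min <= in_y_min
        and in_y_max <= out_y_max
    )


inside = extent_is_within((2.0, 4.0, 2.0, 4.0), (0.0, 10.0, 0.0, 10.0))
overlapping = extent_is_within((8.0, 12.0, 2.0, 4.0), (0.0, 10.0, 0.0, 10.0))
```

A map that only partially overlaps the outer map, as in the second call, is not considered to be within it.
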
map_exists(file=None)[source]

Checks if the output (or provided file) exists.

If file is provided, self.file_name is set to file.

Parameters:file – optionally, the file to check exists
Returns:true if the output file does exist
Return type:bool
open(file=None, band_no=1)[source]

Reads the raster file into the data object in memory. This allows direct access to the internal numpy array using the data object.

Parameters:
  • file (str) – path to file to open (or None to use self.file_name)
  • band_no (int) – the band number to read from
Return type:

None

plot()[source]

Returns a matplotlib.pyplot.figure object containing an image of the fragmented landscape (with axes removed).

Requires that the fragmented landscape has been created already using create().

Returns:figure object containing the fragmented landscape.
Return type:matplotlib.pyplot.figure
rasterise(shape_file, raster_file=None, x_res=None, y_res=None, output_srs=None, geo_transform=None, field=None, burn_val=None, data_type=gdal.GDT_Float32, attribute_filter=None, x_buffer=None, y_buffer=None, extent=None, **kwargs)[source]

Rasterises the provided shape file to produce the output raster.

If x_res or y_res are not provided, self.x_res and self.y_res will be used.

If a field is provided, the value in that field will become the value in the raster.

If a geo_transform is provided, it overrides the x_res, y_res, x_buffer and y_buffer.

Parameters:
  • shape_file (str/os.path) – path to the .shp vector file to rasterise, or an ogr.DataSource object containing the shape file
  • raster_file (str/os.path) – path to the output raster file (should not already exist)
  • x_res (int/float) – the x resolution of the output raster
  • y_res (int/float) – the y resolution of the output raster
  • output_srs (str/osr.SpatialReference) – optionally define the output projection of the raster file
  • geo_transform (list/tuple) – optionally define the geotransform of the raster file (cannot use resolution or buffer arguments with this option)
  • field (str) – the field to set as raster values
  • burn_val (list/int) – the r,g,b value to use if there is no field for the location
  • data_type (int) – the gdal type for output data
  • attribute_filter (str) – optionally provide a filter to extract features by, of the form “field=fieldval”
  • x_buffer (int/float) – number of extra pixels to include at left and right sides
  • y_buffer (int/float) – number of extra pixels to include at top and bottom
  • extent (list) – list containing the new extent, provided as [ulx, lrx, uly, lry] (output from get_extent())
  • kwargs – additional options to provide to gdal.RasterizeLayer
Raises:
  • IOError – if the shape file does not exist
  • IOError – if the output raster already exists
  • ValueError – if the provided shape_file is not a .shp file
  • RuntimeError – if gdal throws an error during rasterisation
Return type:

None

read_dimensions()[source]

Return a list containing the geospatial coordinate system for the file.

Returns:a list containing [0] x, [1] y, [2] upper left x, [3] upper left y, [4] x resolution, [5] y resolution
reproject_raster(dest_projection=None, source_file=None, dest_file=None, x_scalar=1.0, y_scalar=1.0, resample_algorithm=gdal.GRA_NearestNeighbour, warp_memory_limit=0.0)[source]

Re-writes the file with a new projection.

Note

Writes to an in-memory file which then overwrites the original file, unless dest_file is not None.

Parameters:
  • dest_projection (str/os.path) – the destination file projection, can only be None if rescaling
  • source_file (str/os.path) – optionally provide a file name to reproject. Defaults to self.file_name
  • dest_file (str/os.path) – the destination file to output to (if None, overwrites original file)
  • x_scalar (float) – multiplier to change the x resolution by, defaults to 1
  • y_scalar (float) – multiplier to change the y resolution by, defaults to 1
  • resample_algorithm (gdal.GRA) – should be one of the gdal.GRA algorithms
  • warp_memory_limit (float) – optionally provide a memory cache limit (uses default if 0.0)
set_dimensions(file_name=None, x_size=None, y_size=None, x_offset=None, y_offset=None)[source]

Sets the dimensions and file for the Map object

Parameters:
  • file_name (str/pycoalescence.Map) – the location of the map object (a csv or tif file). If None, file_name must already have been provided.
  • x_size (int) – the x dimension
  • y_size (int) – the y dimension
  • x_offset (int) – the x offset from the north-west corner
  • y_offset (int) – the y offset from the north-west corner
Returns:

None

set_sample(is_sample)[source]

Set the is_sample attribute to true if this is a sample mask rather than an offset map

Parameters:is_sample (bool) – indicates this is a sample mask rather than offset map
translate(dest_file, source_file=None, **kwargs)[source]

Translates the provided source file to the output file, given a set of options to pass to gdal.Translate()

Parameters:
  • dest_file (str) – the destination file to create
  • source_file (str) – the source file to translate, or None to translate this file
  • kwargs – additional keywords to pass to gdal.Translate()
Return type:

None

write(file=None, band_no=None)[source]

Writes the array in self.data to the output file. The output file must exist, and the array in the band will be overwritten. Intended for writing changes to the same file the data was read from.

Parameters:
  • file – the path to the file to write to
  • band_no – the band number to write into
Return type:None

write_subset(array, x_off, y_off)[source]

Writes over a subset of the array to file. The size of the overwritten area is detected from the inputted array, and the offsets describe the location in the output map to overwrite.

The output map file must exist and be larger than the array.

Parameters:
  • array (numpy.ndarray) – the array to write out
  • x_off (int) – the x offset to begin writing out from
  • y_off (int) – the y offset to begin writing out from
Return type:

None

zero_offsets()[source]

Sets the x and y offsets to 0

shapefile_from_wkt(wkts, dest_file, EPSG=4326, fields=None)[source]

Generates a shape file from a list of WKT strings.

Parameters:
  • wkts – a list of well-known text polygons to create in the shapefile
  • dest_file – a destination file to create
  • EPSG – the EPSG to use for the spatial referencing
  • fields – list of dictionaries containing fields to add to the geometries
Return type:

None

merger module

Combine simulation outputs from separate guilds. Detailed here.

Merger will output a single database file, merging the various biodiversity tables into one.

Metrics are also calculated for the entire system, with a guild reference of 0.

All standard routines provided in CoalescenceTree can then be performed on the combined database.

class Merger(database=None, logging_level=30, log_output=None, expected=False)[source]

Bases: pycoalescence.coalescence_tree.CoalescenceTree

Merges simulation outputs into a single database. Inherits from CoalescenceTree to provide all routines in the same object.

add_simulation(input_simulation)[source]

Adds a simulation to the list of merged simulations.

This also calls the relevant merges for the tables that exist in the provided database.

Parameters:input_simulation – either the path to the input simulation, a Coalescence class object, or a CoalescenceTree object which contains the completed simulation.
Returns:None
Return type:None
add_simulations(simulation_list)[source]

A convenience function that adds each simulation from the list of simulations provided and then writes to the database.

Parameters:simulation_list – list of paths to completed simulations
apply()[source]

Generates the coalescence tree for the set of speciation parameters. This must be run after the main coalescence simulations are complete. It will create additional fields and tables in the SQLite database which contain the requested data.

apply_incremental()[source]

Generates the coalescence tree for the set of speciation parameters. Does not write changes to the database, just holds the changes internally.

generate_guild_tables()[source]

Generates a set of tables containing the biodiversity metrics for each guild.

Return type:None
get_added_simulations()[source]

Gets the simulations which have already been added to the database.

Returns:dictionary of simulations and guild numbers
Return type:dict
output()[source]

Outputs the coalescence trees to the same simulation database object.

set_database(filename, expected=False)[source]

Sets the output database for the merged simulations

Assumes no database currently exists, and will create one.

Raises:

IOError – if the output database already exists

Parameters:
  • filename – the filename to output merged simulations into
  • expected – if true, expects the output to exist
Return type:

None

write()[source]

Writes out all stored simulation parameters to the output database and wipes the in-memory objects.

This should be called after all simulations have been added, or when RAM usage becomes too large for large simulations.

patched_landscape module

Generate landscapes of interconnected patches for simulating within a spatially explicit neutral model. Detailed here.

Dispersal probabilities are defined between different patches, and each patch will contain n individuals.

class Patch(id, density)[source]

Bases: object

Contains a single patch, to which the probability of dispersal to every other patch can be added.

add_patch(patch, probability)[source]

Adds dispersal from this patch to another patch object with a set probability. The patch should not already have been added.

Note

The probabilities can be relative, as they can be re-scaled to sum to 1 using re_scale_probabilities().

Raises:
  • KeyError – if the patch already exists in the dispersal probabilities.
  • ValueError – if the dispersal probability is less than 0.
Parameters:
  • patch – the patch id to disperse to
  • probability – the probability of dispersal
re_scale_probabilities()[source]

Re-scales the probabilities so that they sum to 1. Also checks to make sure dispersal from within this patch is defined.

Raises:ValueError – if the self dispersal probability has not been defined, or the dispersal probabilities do not sum to a value greater than 0.
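
The behaviour of add_patch() and re_scale_probabilities() described above can be sketched with a small stand-in class. PatchSketch and its dispersal_probabilities attribute are hypothetical names used for illustration; the real Patch class may store its data differently.

```python
class PatchSketch:
    """Hypothetical stand-in for the documented Patch class."""

    def __init__(self, id, density):
        self.id = id
        self.density = density
        self.dispersal_probabilities = {}

    def add_patch(self, patch, probability):
        # Each target patch may only be added once.
        if patch in self.dispersal_probabilities:
            raise KeyError("Patch {} has already been added.".format(patch))
        if probability < 0:
            raise ValueError("Dispersal probability must not be less than 0.")
        self.dispersal_probabilities[patch] = probability

    def re_scale_probabilities(self):
        # Dispersal within this patch must be defined before re-scaling.
        if self.id not in self.dispersal_probabilities:
            raise ValueError("Self dispersal probability has not been defined.")
        total = sum(self.dispersal_probabilities.values())
        if total <= 0:
            raise ValueError("Dispersal probabilities must sum to more than 0.")
        for key in self.dispersal_probabilities:
            self.dispersal_probabilities[key] /= total


patch = PatchSketch("a", density=10)
patch.add_patch("a", 2.0)
patch.add_patch("b", 1.0)
patch.add_patch("c", 1.0)
patch.re_scale_probabilities()
```

Relative weights of 2, 1 and 1 become probabilities of 0.5, 0.25 and 0.25 after re-scaling.
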
class PatchedLandscape(output_fine_map, output_dispersal_map)[source]

Bases: object

Landscape made up of a list of patches with dispersal probabilities to each other.

add_dispersal(source_patch, target_patch, dispersal_probability)[source]

Adds a dispersal probability from the source patch to the target patch.

Note

Both the source and target patch should already have been added using add_patch().

Parameters:
  • source_patch – the id of the source patch
  • target_patch – the id of the target patch
  • dispersal_probability – the probability of dispersal from source to target
add_patch(id, density, self_dispersal=None, dispersal_probabilities=None)[source]

Add a patch with the given parameters.

Parameters:
  • id – the unique reference for the patch
  • density – the number of individuals that exist in the patch
  • self_dispersal – the relative probability of dispersal from within the same patch
  • dispersal_probabilities – dictionary containing all other patches and their relative dispersal probabilities
generate_files()[source]

Re-scales the dispersal probabilities and generates the patched landscape files. These include the fine map file containing the densities and the dispersal probability map.

The fine map file will be dimensions 1xN where N is the number of patches in the landscape.
The dispersal probability map will be dimensions NxN, where dispersal occurs from the y index cell to the x index cell.
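
The shapes of the two outputs can be illustrated with plain nested lists. In this sketch the function name build_patch_matrices, the input dictionary layout and the ordering of patches by sorted id are all assumptions; the documented method writes map files rather than returning lists.

```python
def build_patch_matrices(patches):
    """Build a 1xN density row and an NxN dispersal matrix.

    Hypothetical helper: patches maps id -> (density, {target_id: weight}).
    """
    ids = sorted(patches)
    density = [[patches[i][0] for i in ids]]  # 1 x N
    dispersal = []
    for source in ids:  # the y index is the source patch
        weights = patches[source][1]
        total = sum(weights.values())
        # The x index is the target patch; undefined targets get probability 0.
        dispersal.append([weights.get(target, 0.0) / total for target in ids])
    return density, dispersal


density, dispersal = build_patch_matrices(
    {
        "a": (10, {"a": 3.0, "b": 1.0}),
        "b": (5, {"b": 1.0}),
    }
)
```

Each row of the dispersal matrix sums to 1, as row y holds the re-scaled probabilities of dispersing from patch y to every patch x.
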
generate_fragment_csv(fragment_csv)[source]

Generates a fragment csv for usage within a coalescence simulation, with each patch becoming one fragment on the landscape.

Parameters:fragment_csv – the path to the output csv to create
Raises:IOError – if the output fragment csv already exists
generate_from_matrix(density_matrix, dispersal_matrix)[source]

Generates the patched landscape from the input matrix and writes out to the files.

Note

Uses a slightly inefficient method of generating the full patched landscape, and then writing back out to the map files so that full error-checking is included. A more efficient implementation is possible by simply writing the matrix to file using the Map class.

Note

The generated density map will have dimensions 1 by xy (where x, y are the dimensions of the original density matrix). However, the dispersal matrix should still be compatible with the original density matrix as an x by y tif file.

Parameters:
  • density_matrix – a numpy matrix containing the density probabilities
  • dispersal_matrix – a numpy matrix containing the dispersal probabilities
has_patch(id)[source]

Checks if the patches object already contains a patch with the provided id.

Parameters:id – id to check for in patches
Returns:true if the patch already exists
convert_index_to_x_y(index, dim)[source]

Converts an index to an x, y coordinate.

Used when mapping from 1-D space to 2-D space.

Parameters:
  • index – the index to convert from
  • dim – the x dimension of the matrix
Returns:

a tuple of integers containing the x and y coordinates

Return type:

tuple
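
One common way to perform this mapping is shown below. It assumes row-major ordering (the index increases along x first, then y); the documentation does not state the ordering, so treat this as a sketch. The name index_to_x_y is used only for this example.

```python
def index_to_x_y(index, dim):
    """Map a 1-D index to (x, y), assuming row-major order with `dim` columns."""
    # The remainder gives the position within the row, the quotient gives the row.
    return index % dim, index // dim


# In a matrix with x dimension 3, index 7 falls in column 1 of row 2.
coordinate = index_to_x_y(7, 3)
```
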

spatial_algorithms file

Simple spatial algorithms required for package functionality.

Algorithms include generation of Voronoi diagrams and spacing points on a landscape using Lloyd’s algorithm.

archimedes_spiral(centre_x, centre_y, radius, theta)[source]

Gets the x, y coordinates on a spiral, given a radius and theta

Parameters:
  • centre_x (int) – the x coordinate of the centre of the spiral
  • centre_y (int) – the y coordinate of the centre of the spiral
  • radius (float) – the distance from the centre of the spiral
  • theta (float) – the angle of rotation
Returns:

tuple of x and y coordinates

Return type:

tuple
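
A plausible reading of this function is a polar-to-Cartesian conversion around the spiral's centre, with the caller growing the radius as theta increases. The sketch below follows that reading; the name spiral_point and the growth rate of 0.5 per radian are assumptions for illustration.

```python
import math


def spiral_point(centre_x, centre_y, radius, theta):
    """Return the (x, y) position at distance `radius` and angle `theta` from the centre."""
    return (
        centre_x + radius * math.cos(theta),
        centre_y + radius * math.sin(theta),
    )


# An Archimedean spiral has a radius proportional to the angle (here r = 0.5 * theta).
points = [spiral_point(0, 0, 0.5 * t, t) for t in (0.0, math.pi / 2, math.pi)]
```
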

calculate_centre_of_mass(points_list)[source]

Calculates the centre of mass for the non-intersecting polygon defined by points_list.

Note

the centre of mass will be incorrect for intersecting polygons.

Note

it is assumed that points_list defines, in order, the vertices of the polygon. The last point is assumed to connect to the first point.

Parameters:points_list – a list of x, y points defining the non-intersecting polygon
Returns:the x,y centre of mass
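
The centre of mass of a simple polygon is usually computed with the shoelace-based centroid formula. The sketch below uses that standard formula; whether the package uses exactly this approach is an assumption, and polygon_centroid is a name chosen for this example.

```python
def polygon_centroid(points):
    """Return the centroid of a non-intersecting polygon given its ordered vertices."""
    twice_area = 0.0  # twice the signed area of the polygon
    cx = 0.0
    cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # the last vertex connects back to the first
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("Polygon has zero area.")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area).
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# A 2 x 2 square with one corner at the origin.
centre = polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)])
```
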
calculate_distance_between(x1, y1, x2, y2)[source]

Calculates the distance between the points (x1, y1) and (x2, y2)

Note

Returns the absolute value

Parameters:
  • x1 – x coordinate of the first point
  • y1 – y coordinate of the first point
  • x2 – x coordinate of the second point
  • y2 – y coordinate of the second point
Returns:

the absolute distance between the points

convert_coordinates(x, y, input_srs, output_srs)[source]

Converts the coordinates from the input srs to the output srs.

Parameters:
  • x – the x coordinate to transform
  • y – the y coordinate to transform
  • input_srs – the input srs to transform from
  • output_srs – the output srs to transform to
Return type:

list

Returns:

transformed [x, y] coordinates

estimate_sigma_from_distance(distance, n)[source]

Estimates the sigma value of a Rayleigh distribution (2-d normal) from the total distance travelled in n steps.

Parameters:
  • distance (float) – the total distance travelled
  • n (int) – the number of steps
Returns:

an estimation of the sigma value required to generate the distance travelled in n steps
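
One way to arrive at such an estimate is sketched below. It assumes that distance is the expected net displacement after n independent steps, each drawn from a 2-d normal with standard deviation sigma per axis. Under that assumption the net displacement follows a Rayleigh distribution with scale sigma * sqrt(n), whose mean is sigma * sqrt(n * pi / 2), and the relation can be inverted. The interpretation of distance and the name sigma_from_distance are assumptions, not taken from the source.

```python
import math


def sigma_from_distance(distance, n):
    """Invert mean = sigma * sqrt(n * pi / 2) to estimate sigma (assumed model)."""
    return distance * math.sqrt(2.0 / (math.pi * n))


# With sigma = 3 and n = 4 steps the expected displacement is 3 * sqrt(2 * pi).
expected_distance = 3.0 * math.sqrt(4 * math.pi / 2.0)
sigma = sigma_from_distance(expected_distance, 4)
```
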

lloyds_algorithm(points_list, maxima, n=7)[source]

Equally spaces the points in the given landscape defined by (0, x_max), (0, y_max) using Lloyd’s algorithm.

The algorithm is:

  • Reflect the points at x=0, x=x_max, y=0 and y=y_max so that the boundaries of the Voronoi diagram on the original set of points have finite edges
  • Define the Voronoi diagram separating the points
  • Find the centres of the regions of the Voronoi diagram for our original set of points
  • Move our points to the centres of their Voronoi regions
  • Repeat n times (for convergence)
  • Edit the points_list to contain the equally-spaced points

Note

all points are assumed to be in the range x in (0, x_max) and y in (0, y_max)

Parameters:
  • points_list – a list of points to be equally spaced in the landscape
  • maxima – the maximum size of the landscape to space out within
  • n – the number of iterations of Lloyd’s algorithm to perform
Returns:

a list containing the new point centres
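
The iteration can be illustrated without a Voronoi library by using a grid-sampled approximation: each grid sample is assigned to its nearest point (a discrete Voronoi region), and each point then moves to the mean of its samples. This swaps the exact Voronoi polygons and boundary reflection described above for a simpler discretisation, so it is a sketch of the idea rather than the package's method. The name lloyd_grid and the resolution parameter are assumptions.

```python
def lloyd_grid(points, maxima, n=7, resolution=50):
    """Approximate Lloyd's algorithm on (0, x_max) x (0, y_max) using grid samples."""
    x_max, y_max = maxima
    points = [tuple(p) for p in points]
    # Sample the landscape at the centre of each grid cell.
    samples = [
        ((i + 0.5) * x_max / resolution, (j + 0.5) * y_max / resolution)
        for i in range(resolution)
        for j in range(resolution)
    ]
    for _ in range(n):
        # Accumulate [sum_x, sum_y, count] for the samples nearest to each point.
        sums = [[0.0, 0.0, 0] for _ in points]
        for sx, sy in samples:
            nearest = min(
                range(len(points)),
                key=lambda k: (points[k][0] - sx) ** 2 + (points[k][1] - sy) ** 2,
            )
            sums[nearest][0] += sx
            sums[nearest][1] += sy
            sums[nearest][2] += 1
        # Move each point to the centroid of its region (unchanged if the region is empty).
        points = [
            (s[0] / s[2], s[1] / s[2]) if s[2] else p
            for s, p in zip(sums, points)
        ]
    return points


# Three clustered points spread out over a 10 x 10 landscape.
spaced = lloyd_grid([(1.0, 1.0), (2.0, 2.0), (1.5, 3.0)], (10.0, 10.0), n=7, resolution=20)
```

Because the samples lie strictly inside the landscape, the region centroids can never leave it, which is the role the reflection step plays in the polygon-based version.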

reflect_dimensions(points, maximums)[source]

Reflects the provided points across x=0, y=0, x=x_max and y=y_max (essentially tiling the polygon 4 times, around the original polygon).

Parameters:
  • points (list) – a list of 2-d points to reflect
  • maximums (tuple) – tuple containing the x and y maximums
Returns:

a list of reflected points
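
Reflection across each of the four boundaries is a simple coordinate transform, sketched below. Whether the returned list also contains the original points is not stated in the documentation; this example returns only the reflected copies, and the name reflect is an assumption.

```python
def reflect(points, maximums):
    """Return each point mirrored across x=0, x=x_max, y=0 and y=y_max."""
    x_max, y_max = maximums
    reflected = []
    for x, y in points:
        reflected.extend(
            [
                (-x, y),  # across x = 0
                (2 * x_max - x, y),  # across x = x_max
                (x, -y),  # across y = 0
                (x, 2 * y_max - y),  # across y = y_max
            ]
        )
    return reflected


mirrored = reflect([(1, 2)], (10, 10))
```
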

sqlite_connection file

Safely open, close and fetch data from an sqlite connection.

SQLiteConnection contains context management for opening sql connections, plus basic functionality for detecting existence and structure of databases.

class SQLiteConnection(filename)[source]

Bases: object

Class containing context management for opening sqlite3 connections. The file name provided can either be a string containing the path to the file, or an sqlite3.Connection object, which will NOT be closed on destruction. This provides two points of entry to the system with the same interface.
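
The two points of entry can be pictured with a small context manager like the one below. It is a sketch under assumptions: the class name ConnectionSketch is hypothetical, and yielding a cursor and committing before closing an owned connection are choices made for this example rather than documented behaviour.

```python
import sqlite3


class ConnectionSketch:
    """Open a database from a path, or reuse an already-open connection."""

    def __init__(self, filename):
        self.filename = filename
        self.conn = None
        # Only connections opened here are closed on exit.
        self.owns_connection = not isinstance(filename, sqlite3.Connection)

    def __enter__(self):
        if self.owns_connection:
            self.conn = sqlite3.connect(self.filename)
        else:
            self.conn = self.filename
        return self.conn.cursor()  # assumption: callers work with a cursor

    def __exit__(self, exc_type, exc_value, traceback):
        if self.owns_connection:
            self.conn.commit()
            self.conn.close()
        return False  # never swallow exceptions


# Passing an existing connection leaves it open after the block ends.
shared = sqlite3.connect(":memory:")
with ConnectionSketch(shared) as cursor:
    cursor.execute("CREATE TABLE t (a INTEGER)")
    cursor.execute("INSERT INTO t VALUES (1)")
rows = shared.execute("SELECT a FROM t").fetchall()
shared.close()
```
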

check_sql_column_exists(database, table_name, column_name)[source]

Checks if the column exists in the database.

Parameters:
  • database (str/sqlite3.Connection) – the database to check existence in
  • table_name (str) – the table name to check within
  • column_name (str) – the column name to check for
Returns:

true if the column exists.

Return type:

bool

check_sql_table_exist(database, table_name)[source]

Checks that the supplied table exists in the supplied database.

Parameters:
  • database (str/sqlite3.Connection) – the database to check existence in
  • table_name (str) – the table name to check for
Returns:

true if the table exists

Return type:

bool
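
Both existence checks above can be written against SQLite's own metadata. The sketch below queries sqlite_master for tables and PRAGMA table_info for columns; the function names are hypothetical and the package may perform these checks differently.

```python
import sqlite3


def table_exists(conn, table_name):
    """Return True if `table_name` is a table in the database."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,),
    ).fetchone()
    return row is not None


def column_exists(conn, table_name, column_name):
    """Return True if `column_name` is a column of `table_name`."""
    if not table_exists(conn, table_name):
        return False
    # PRAGMA statements cannot take bound parameters, so the identifier is quoted
    # by hand (the table name has already been confirmed to exist).
    quoted = table_name.replace('"', '""')
    columns = [row[1] for row in conn.execute('PRAGMA table_info("{}")'.format(quoted))]
    return column_name in columns


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (id INTEGER, abundance INTEGER)")
has_table = table_exists(conn, "species")
has_column = column_exists(conn, "species", "abundance")
missing_column = column_exists(conn, "species", "richness")
conn.close()
```
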

fetch_table_from_sql(database, table_name, column_names=False)[source]

Returns a list of the data contained by the provided table in the database.

Raises:

sqlite3.Error – if the table is not contained in the database (protects against SQL injection).

Parameters:
  • database (str/sqlite3.Connection) – the database to obtain from
  • table_name (str) – the table name to fetch data from
  • column_names (bool) – if true, return the column names as the first row in the output
Returns:

a list of lists, containing all data within the provided table in the database
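
Table names cannot be passed as bound parameters in SQL, which is why checking the name against the known tables first guards against injection. A minimal sketch of that pattern follows; the function name fetch_table is hypothetical and the error message is illustrative.

```python
import sqlite3


def fetch_table(conn, table_name, column_names=False):
    """Return all rows of `table_name` as a list of lists."""
    known = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    ]
    # Refuse anything that is not an existing table before building the query.
    if table_name not in known:
        raise sqlite3.Error("Table {} is not in the database.".format(table_name))
    quoted = table_name.replace('"', '""')
    cursor = conn.execute('SELECT * FROM "{}"'.format(quoted))
    rows = [list(row) for row in cursor.fetchall()]
    if column_names:
        # Prepend the column names as the first row.
        rows.insert(0, [description[0] for description in cursor.description])
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (id INTEGER, abundance INTEGER)")
conn.executemany("INSERT INTO species VALUES (?, ?)", [(1, 10), (2, 4)])
data = fetch_table(conn, "species", column_names=True)
```
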

get_table_names(database)[source]

Gets a list of all table names in the database.

Parameters:database (str/sqlite3.Connection) – the path to the database connection or an already-open database object
Returns:a list of all table names from the database
Return type:list
sql_get_max_from_column(database, table_name, column_name)[source]

Returns the maximum value from the specified column.

Parameters:
  • database (str/sqlite3.Connection) – the database to fetch from
  • table_name (str) – the table name to fetch from
  • column_name (str) – the column name to obtain from
Returns:

the maximum value from the specified column

system_operations file

Basic system-level operations required for package functionality, including subprocess calls, logging methods and file management.

The functions are contained here as they are required by many different modules. Note that logging will not raise an exception if there has been no call to set_logging_method().

cantor_pairing(x1, x2)[source]

Creates a unique integer from the two provided positive integers.

Maps NxN -> N, so only relevant for positive numbers. For any A and B, generates C such that no D and E produce C unless D=A and E=B.

Assigns consecutive numbers to points along diagonals of a plane.

Parameters:
  • x1 – the first number
  • x2 – the second number
Returns:

a unique reference combining the two integers
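
The standard Cantor pairing function is short enough to show in full. The sketch below uses the textbook formula; the name cantor is chosen for this example, and the argument order within the formula is an assumption about the package.

```python
def cantor(x1, x2):
    """Combine two non-negative integers into one unique integer."""
    # Points on the same diagonal share x1 + x2; x2 picks the position along it.
    return (x1 + x2) * (x1 + x2 + 1) // 2 + x2


paired = cantor(2, 3)
```
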

check_file_exists(file_name)[source]

Checks that the specified filename exists, if it is not “null” or “none”.

Parameters:file_name – file path to check for
Returns:None
Raises:IOError if no file exists
check_parent(file_path)[source]

Checks if the parent directory exists, and creates it if it doesn’t.

Note

if file_path is a directory (ends with a “/”), it will be created

Parameters:file_path – the file or directory to check if the parent exists
Return type:None
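
The behaviour described above can be sketched with the standard library as follows. The name ensure_parent is hypothetical, and treating a path that ends in a separator as a directory to create follows the note above.

```python
import os
import tempfile


def ensure_parent(file_path):
    """Create the parent directory of `file_path` if it does not already exist."""
    if file_path.endswith("/") or file_path.endswith(os.sep):
        parent = file_path  # the path itself is a directory
    else:
        parent = os.path.dirname(file_path)
    if parent and not os.path.exists(parent):
        os.makedirs(parent)


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output", "maps", "fine.tif")
    ensure_parent(target)
    parent_created = os.path.isdir(os.path.dirname(target))
    file_created = os.path.exists(target)
```
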
create_logger(logger, file=None, logging_level=30, **kwargs)[source]

Creates a logger object to be assigned to necsim simulations and dispersal tests.

Parameters:
  • logger – the logger to alter
  • file – the file to write out to, defaults to None, writing to terminal
  • logging_level – the logging level to write out at (defaults to 30, which is logging.WARNING)
  • kwargs – optionally provide additional arguments for logging to
Returns:

elegant_pairing(x1, x2)[source]

A more elegant version of cantor pairing, which allows for storing of a greater number of digits without experiencing integer overflow issues.

Cantor pairing assigns consecutive numbers to points along diagonals of a plane

Parameters:
  • x1 – the first number
  • x2 – the second number
Returns:

a unique reference combining the two integers.
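
A well-known function of this kind is Szudzik's "elegant pairing", which fills successive square shells instead of diagonals and therefore produces smaller values for the same inputs. That this is the function implemented here is an assumption; the sketch below shows the textbook form under the hypothetical name szudzik.

```python
def szudzik(x1, x2):
    """Combine two non-negative integers into one unique integer (Szudzik pairing)."""
    if x1 >= x2:
        return x1 * x1 + x1 + x2
    return x2 * x2 + x1


paired = szudzik(2, 3)
```

For inputs below k the result is always below k squared, whereas Cantor pairing can reach roughly twice that.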

execute(cmd, silent=False, **kwargs)[source]

Calls the command using subprocess and yields the running output for printing to terminal. Any errors produced by subprocess call will be redirected to logging.warning() after the subprocess call is complete.

Parameters:
  • cmd – the command to execute using subprocess.Popen()
  • silent – if true, does not log any warnings

Returns:

a line from the execution output
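
The yield-as-you-go pattern described above can be sketched with subprocess.Popen as follows. This is an illustration under assumptions: the name run_and_yield is hypothetical, and buffering stderr in a temporary file is a choice made here to avoid blocking on a full pipe, not necessarily what the package does.

```python
import logging
import subprocess
import sys
import tempfile


def run_and_yield(cmd, silent=False, **kwargs):
    """Run `cmd` and yield each line of its standard output as it is produced."""
    with tempfile.TemporaryFile(mode="w+") as err_file:
        process = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=err_file,
            universal_newlines=True,
            **kwargs
        )
        # readline returns "" only once the stream is exhausted.
        for line in iter(process.stdout.readline, ""):
            yield line
        process.stdout.close()
        process.wait()
        # Report anything written to stderr once the command has finished.
        err_file.seek(0)
        errors = err_file.read()
    if errors and not silent:
        logging.warning(errors)


# Run a trivial command with the current Python interpreter.
lines = list(run_and_yield([sys.executable, "-c", "print('hello')"]))
```
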

execute_log_info(cmd, **kwargs)[source]

Calls execute() with the supplied command and keyword arguments, and redirects stdout to the logging object.

Parameters:
  • cmd – the command to execute using subprocess.Popen()
  • kwargs – keyword arguments to be passed to subprocess.Popen()
Returns:

None

Return type:

None

execute_silent(cmd, **kwargs)[source]

Calls execute() silently with the supplied command and keyword arguments.

Note

If this function fails, no error will be thrown due to its silent nature, unless a full failure occurs.

Parameters:
  • cmd – the command to execute using subprocess.Popen()
  • kwargs – keyword arguments to be passed to subprocess.Popen()
Returns:

None

Return type:

None

set_logging_method(logging_level=20, output=None, **kwargs)[source]

Initiates the logging method.

Parameters:
  • logging_level – the detail in logging output: can be one of logging.INFO (default), logging.WARNING, logging.DEBUG, logging.ERROR or logging.CRITICAL
  • output – the output logfile (or None to redirect to terminal via stdout)
  • kwargs – additional arguments to pass to the logging.basicConfig() call
Returns:

None

write_to_log(i, message, logger)[source]

Writes the message to the provided logger, at the provided level.

This is used by necsim to access the logging module more easily.

Parameters:
  • i (int) – the level to log at (10: debug, 20: info, 30: warning, 40: error, 50: critical)
  • message (str) – the message to write to the logger.
  • logger (logging.Logger) – the logger to write the message to
Return type:

None
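
The numeric levels listed above are the standard values from Python's logging module, so a call of this kind reduces to Logger.log(). The sketch below shows this with a small handler that records messages in a list; ListHandler, log_at_level and the logger name are hypothetical and exist only for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Collect (level, message) pairs in memory so the output can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelno, record.getMessage()))


def log_at_level(i, message, logger):
    """Write `message` to `logger` at numeric level `i`."""
    logger.log(i, message)


logger = logging.getLogger("write_to_log_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example's output away from the root logger
handler = ListHandler()
logger.addHandler(handler)

log_at_level(30, "check the map dimensions", logger)  # warning: recorded
log_at_level(10, "suppressed at INFO level", logger)  # debug: filtered out
```
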