pycoalescence package¶

pycoalescence provides the facilities for running spatially explicit neutral coalescence ecological simulations and performing basic analysis of the simulation outputs. The program requires necsim to function properly.

Key submodules
- simulation module
- coalescence_tree module
Additional submodules

Key submodules ¶

These are the most important modules for running and analysing spatially explicit neutral models, and are most likely to be used directly.

simulation module ¶

Run spatially explicit neutral simulations on provided landscapes with support for a wide range of scenarios and parameters. Detailed here.

The main class is Simulation, which contains routines for setting up and running simulations, plus basic tree generation after simulations have been completed.

input:

- Simulation parameters (such as dispersal kernel, speciation rate)
- Map files representing density over space
- [optional] map files representing relative reproductive ability
- [optional] map files representing dispersal potential
- [optional] historical density map files
output:	Database containing generated coalescence tree, simulation parameters and basic biodiversity metrics. If the simulation does not complete, will instead dump data to a Dump_main__.csv file for resuming simulations.
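
A minimal sketch of how the set-up and run steps might fit together, using only methods documented on this page. The helper name `run_basic_simulation`, the parameter values and the file paths are illustrative assumptions. `sim` is expected to be a `Simulation` instance, which requires pycoalescence and necsim to be installed.

```python
def run_basic_simulation(sim, output_directory, fine_map_path):
    """Configure and run a simulation on an existing Simulation object.

    Hypothetical helper: the values below are placeholders, not recommendations.
    """
    sim.set_simulation_parameters(
        seed=1,
        task=1,
        output_directory=output_directory,
        min_speciation_rate=0.001,
        sigma=2.0,
        deme=1,
        sample_size=0.1,
        max_time=3600,
    )
    # "null" indicates that no sample mask is required
    sim.set_map_files(sample_file="null", fine_file=fine_map_path)
    # Speciation rates to apply once the simulation has finished
    sim.set_speciation_rates([0.001, 0.01])
    # Completes setup, runs the simulation and generates the coalescence tree
    sim.run()
    return sim


# Usage (not executed here; needs pycoalescence and a real map file):
#   from pycoalescence import Simulation
#   sim = run_basic_simulation(Simulation(), "output", "fine_map.tif")
```

Passing the object in, rather than constructing it inside the helper, keeps the sketch importable on machines where necsim has not been compiled.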

class Simulation(logging_level=30, log_output=None, **kwargs)[source]¶

Bases: pycoalescence.landscape.Landscape

A class containing routines to set up and run simulations, including detecting map dimensions from tif files.

add_death_map(death_map)[source]¶

Adds a death map to the simulation.

Parameters:	death_map (str/pycoalescence.Map) – the death map to import
Return type:	None

add_dispersal_map(dispersal_map)[source]¶

Adds a dispersal map to the simulation.

Parameters:	dispersal_map (str/pycoalescence.Map) – the dispersal map to import
Return type:	None

add_gillespie(generations=0.0)[source]¶

Uses the Gillespie algorithm from the given number of generations into the simulation.

Parameters:	generations – the number of generations at which to use the Gillespie algorithm
Return type:	None
Returns:	None

add_reproduction_map(reproduction_map)[source]¶

Adds a reproduction map to the simulation.

Parameters:	reproduction_map (str/pycoalescence.Map) – the reproduction map to import
Return type:	None

add_sample_time(time)[source]¶

Adds an extra sample time to the list of times.

This allows for multiple temporal sample points from within the same simulation.

Parameters:	time – the sample time to add

apply_speciation_rates(speciation_rates=None)[source]¶

Applies the speciation rates to the coalescence tree and outputs to the database.

Parameters:	speciation_rates – a list of speciation rates to apply
Return type:	None

calculate_sql_database()[source]¶: Saves the output database location to self.output_database.

check_can_use_gillespie()[source]¶

Checks if the simulation can use the Gillespie algorithm.

Returns:	true if the simulation can use Gillespie
Return type:	bool

check_death_map()[source]¶: Checks that the death map dimensions match the fine map dimensions.

check_dimensions_match_fine(map_to_check, name='')[source]¶

Checks that the dimensions of the provided map match those of the fine map.

Parameters:

map_to_check (Map) – map to check the dimensions of against the fine map
name (str) – name to write out in error message

Returns:	true if the dimensions match

check_dispersal_map()[source]¶: Checks that the dispersal map dimensions match the fine map dimensions.

check_file_parameters(expected=False)[source]¶

Check that all the required files exist for the simulation and the output doesn’t already exist.

Parameters:	expected – set to true if we expect the output file to already exist
Raises:	RuntimeError – if previous set-up routines are not complete

check_maps()[source]¶

Checks that the maps all exist and that the file structure makes sense.

Raises:	ValueError – if the grid is not within the sample map.
Returns:	None

check_reproduction_map()[source]¶: Checks that the reproduction map dimensions match the fine map dimensions.

check_sample_map_equals_sample_grid()[source]¶

Checks if the grid and sample map are the same size and offset (in which case, future operations can be simplified).

Returns:	true if the grid and sample map dimensions and offsets are equal
Return type:	bool

check_simulation_parameters(expected=False, ignore_errors=False)[source]¶

Checks that simulation parameters have been correctly set and the program is ready for running.

Parameters:

ignore_errors – if true, any FileNotFoundError and FileExistsError raised by checking the output database are ignored
expected – set to true if we expect the output file to exist

check_sql_database(expected=False)[source]¶

Checks whether the output database exists. If its existence does not match the expected variable, raises an error.

Raises:

FileExistsError – if the file already exists when it’s not expected to
FileNotFoundError – if the file does not exist when we expect it to

Parameters:	expected – boolean for expected existence of the output file
Return type:	None
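
The existence check described above can be sketched in plain Python. This is an illustration of the logic only, not the library's implementation: the function name `check_output_exists` and the file name are hypothetical, and Python's built-in FileNotFoundError is used for the missing-file case.

```python
import os
import tempfile


def check_output_exists(path, expected=False):
    """Raise if the existence of `path` does not match `expected`."""
    exists = os.path.exists(path)
    if exists and not expected:
        raise FileExistsError("Output database already exists: {}".format(path))
    if not exists and expected:
        raise FileNotFoundError("Output database not found: {}".format(path))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical database name inside an empty temporary directory
    missing = os.path.join(tmp_dir, "data_1_1.db")
    # Passes silently: the file is absent and we do not expect it
    check_output_exists(missing, expected=False)
    try:
        check_output_exists(missing, expected=True)
        outcome = "no error"
    except FileNotFoundError:
        outcome = "FileNotFoundError"

print(outcome)
```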

count_individuals()[source]¶

Estimates the number of individuals to be simulated. This may be inaccurate if using multiple time points and historical maps.

Returns:	a count of the number of individuals to be simulated
Return type:	float

create_config(output_file=None)[source]¶

Generates the configuration. This will be written out either by providing an output file here, or by calling write_config().

Parameters:	output_file (str) – the file to write the config to. Must be a path to a .txt file.

create_map_config(output_file=None)[source]¶

Generates the map config file from reading the spatial structure of each of the provided files.

Parameters:	output_file (str) – the file to output configuration data to (the map config file)

create_temporal_sampling_config()[source]¶

Creates the time-sampling config file.

Function is called automatically when creating a config file, and should not be manually called.

detect_map_dimensions()[source]¶

Detects all the map dimensions for the provided files (where possible) and sets the respective values. This is intended to be run after set_map_files().

Returns:	None

finalise_setup(expected=False, ignore_errors=False)[source]¶

Runs all setup routines to provide a complete simulation. Should be called immediately before run_coalescence() to ensure the simulation setup is complete.

Parameters:

ignore_errors – if true, any FileNotFoundError and FileExistsError raised by checking the output database are ignored
expected – set to true if we expect the output file to exist

get_average_density()[source]¶: Gets the average density across the fine map, subsetted for the sample grid.

get_optimised_solution()[source]¶

Gets the optimised solution as a dictionary containing the important optimised variables. This can be read back in with set_optimised_solution().

Returns:	dict containing the important optimised variables
Return type:	dict

get_protracted()[source]¶: Gets whether the simulation pointed to by this object is a protracted simulation or not.

get_species_richness(reference=1)[source]¶

Calls coal_analyse.get_species_richness() with the supplied variables.

Requires successful import of coal_analyse and sqlite3.

Parameters:	reference (int) – the community reference to obtain the metrics for.
Returns:	the species richness.

grid_density_actual(x_off, y_off, x_dim, y_dim)[source]¶

Counts the density total for a subset of the grid by sampling from the fine map.

Note that for large maps this can take a very long time.

Parameters:

x_off – the x offset of the grid map subset
y_off – the y offset of the grid map subset
x_dim – the x dimension of the grid map subset
y_dim – the y dimension of the grid map subset

Returns:	the total individuals that exist in the subset.
Return type:	int

grid_density_estimate(x_off, y_off, x_dim, y_dim)[source]¶

Estimates the density total for a subset of the grid from the fine map.

Note:	This is an approximation (based on the average density of the fine map) and does not produce a perfect value. This is done for performance reasons. The actual value can be obtained with grid_density_actual().

Parameters:

x_off – the x offset of the grid map subset
y_off – the y offset of the grid map subset
x_dim – the x dimension of the grid map subset
y_dim – the y dimension of the grid map subset

Returns:	an estimate of the total individuals that exist in the subset.
Return type:	int
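
To illustrate the difference between the exact count and the average-based estimate, here is a small pure-Python sketch. It is not the library's code: the function names, the toy map and the truncation to an integer are assumptions made for the example.

```python
def density_actual(fine_map, x_off, y_off, x_dim, y_dim):
    """Sum the density of every cell inside the requested subset."""
    return sum(
        fine_map[y][x]
        for y in range(y_off, y_off + y_dim)
        for x in range(x_off, x_off + x_dim)
    )


def density_estimate(fine_map, x_dim, y_dim):
    """Approximate the subset total from the map-wide average density."""
    n_cells = sum(len(row) for row in fine_map)
    average = sum(sum(row) for row in fine_map) / n_cells
    return int(average * x_dim * y_dim)


# Toy 4x4 density map (individuals per cell), indexed as fine_map[y][x]
fine_map = [
    [1, 1, 4, 4],
    [1, 1, 4, 4],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]

actual = density_actual(fine_map, x_off=2, y_off=0, x_dim=2, y_dim=2)
estimate = density_estimate(fine_map, x_dim=2, y_dim=2)
print(actual, estimate)  # 16 7
```

The estimate ignores where the subset sits on the map, which is why it is cheap but can differ substantially from the actual count on heterogeneous landscapes.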

import_fine_map_array()[source]¶

Imports the fine map array to the in-memory object, subsetted to the same size as the sample grid.

Return type:	None

import_sample_map_array()[source]¶

Imports the sample map array to the in-memory object.

Return type:	None

load_config(config_file)[source]¶

Loads the config file by reading the lines in order.

Parameters:	config_file (str) – the config file to read in.

optimise_ram(ram_limit)[source]¶

Optimises the maps for a specific RAM usage.

If ram_limit is None, this function does nothing.

Note:	Assumes that the C++ compiler has sizeof(long) = 8 bytes for calculating space usage.
Note:	Only optimises RAM for a square area of the map. For rectangular shapes, will use the shortest length as a maximum size.
Parameters:	ram_limit – the desired amount of RAM to limit to, in GB
Raises:	MemoryError – if the desired simulation cannot be compressed into available RAM

persistent_ram_usage()[source]¶

Gets the persistent RAM usage, which cannot be optimised by the program for a particular set of maps.

Returns:	the total persistent RAM usage in bytes

resume_coalescence(pause_directory, seed, job_type, max_time, out_directory=None, protracted=None, spatial=None)[source]¶

Resumes the simulation from the specified directory, looking for the simulation with the specified seed and task referencing.

Parameters:

pause_directory – the directory to search for the paused simulation
seed – the seed of the paused simulation
job_type – the task of the paused simulation
max_time – the maximum time to run simulations for
out_directory – optionally provide an alternative output location. Defaults to same location as pause_directory
protracted (bool) – protractedness of the simulation
spatial (bool) – if the simulation is to be run with spatial complexity

Returns:	None

run()[source]¶

Convenience function which completes setup, runs the simulation and calculates the coalescence tree for the set speciation rates in one step.

Return type:	None

run_coalescence()[source]¶

Attempt to run the simulation with the given simulation set-up. This is the main routine performing the actual simulation which will take a considerable amount of time.

Returns:	True if the simulation completes successfully, False if the simulation pauses.
Return type:	bool
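
The individual steps can also be called one at a time. The sketch below assumes `sim` is a `Simulation` object whose parameters and map files have already been set; the helper name `run_step_by_step` is hypothetical, and only methods documented on this page are used.

```python
def run_step_by_step(sim, speciation_rates):
    """Finalise setup, run the simulation, then build the tree if it completed.

    Returns True if the simulation completed and False if it paused.
    """
    # Should be called immediately before run_coalescence()
    sim.finalise_setup()
    completed = sim.run_coalescence()
    if completed:
        # Only generate the coalescence tree for a completed simulation
        sim.apply_speciation_rates(speciation_rates)
    return completed
```

When `False` is returned the simulation has paused; it can be continued later with resume_coalescence().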

run_simple(seed, task, output, speciation_rate, sigma, size)[source]¶

Runs a simple coalescence simulation on a square infinite landscape with the provided parameters. This requires a separate compilation of the inf_land version of the coalescence simulator.

Note that this function returns richness=0 for failure to read from the file. It is assumed that there will be at least one species in the simulation.

Note that the maximum time for this function is set as 10 hours (36000 seconds), and an exception is raised if the simulation does not complete in this time.

Raises:	RuntimeError – if the simulation didn’t complete in time.
Parameters:

seed – the simulation seed
task – the task (for file naming)
output – the output directory
speciation_rate – the probability of speciation
sigma – the normal distribution sigma value for dispersal
size – the size of the world (so there will be size^2 individuals simulated)

Returns:	the species richness in the simulation

set_config_file(output_file=None)[source]¶

Sets the config file to the output, over-writing any existing config file that has been stored.

Parameters:	output_file – path to config file to output to

set_map_files(sample_file, fine_file=None, coarse_file=None, historical_fine_file=None, historical_coarse_file=None, dispersal_map=None, death_map=None, reproduction_map=None)[source]¶

Sets the map files (or to null, if none specified). It then calls detect_map_dimensions() to correctly read in the specified dimensions.

If sample_file is “null”, dimension values will remain at 0. If coarse_file is “null”, it will default to the size of fine_file with zero offset. If the coarse file is “none”, it will not be used. If the historical fine or coarse files are “none”, they will not be used.

Note

The dispersal map should be of dimensions xy by xy, where x and y are the fine map dimensions. Each entry gives the dispersal rate from the cell with the row index to the cell with the column index, where a cell's index is x+(y*xdim), x,y are the coordinates of the cell and xdim is the x dimension of the fine map. See the PatchedLandscape class for routines for generating these landscapes.

Parameters:

sample_file (str) – the sample map file. Provide “null” if no sample mask is required
fine_file (str) – the fine map file. Defaults to “null” if none provided
coarse_file (str) – the coarse map file. Defaults to “none” if none provided
historical_fine_file (str) – the historical fine map file. Defaults to “none” if none provided
historical_coarse_file (str) – the historical coarse map file. Defaults to “none” if none provided
dispersal_map (str) – the dispersal map for reading dispersal values. Defaults to “none” if none provided
death_map (str) – a map of relative death probabilities, at the scale of the fine map
reproduction_map (str) – a map of relative reproduction probabilities, at the scale of the fine map

Return type:	None
Returns:	None
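
The cell indexing used by the dispersal map can be sketched as follows. The function names and the 4 by 3 example map are illustrative assumptions; only the formula index = x + (y * xdim) comes from the note above.

```python
def cell_index(x, y, x_dim):
    """Flatten fine map coordinates (x, y) into a dispersal map index."""
    return x + y * x_dim


def dispersal_map_shape(x_dim, y_dim):
    """Shape of the dispersal map: one row and one column per fine map cell."""
    n_cells = x_dim * y_dim
    return (n_cells, n_cells)


x_dim, y_dim = 4, 3  # hypothetical fine map dimensions

source = cell_index(1, 2, x_dim)       # row index of the source cell
destination = cell_index(3, 0, x_dim)  # column index of the destination cell
shape = dispersal_map_shape(x_dim, y_dim)

print(source, destination, shape)  # 9 3 (12, 12)
```

In this example, the dispersal rate from cell (1, 2) to cell (3, 0) would be read from row 9, column 3 of a 12 by 12 dispersal map.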

set_optimised_solution(dict_in)[source]¶

Sets the optimised RAM solution from the variables in the provided dictionary. This should contain the grid_x_size, grid_y_size, grid_file_name, sample_x_offset and sample_y_offset.

Parameters:	dict_in (dict) – the dictionary containing the optimised RAM solution variables
Return type:	None

set_seed(seed)[source]¶

Sets the seed for the simulation.

A seed < 1 should not be set for necsim, as equivalent behaviour is produced for seed and abs(seed), and for seed = 1 and seed = 0. Consequently, for any value less than 1, a very large number plus the seed is used instead. An error is therefore raised if the seed exceeds this very large number (this is an acceptable decrease in usability, as a seed that large is unlikely ever to be used).

Parameters:	seed (int) – the random number seed
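
The remapping of seeds below 1 might look something like the sketch below. The constant `LARGE_OFFSET`, the function name and the choice of ValueError are all assumptions made for illustration; this section does not state the actual value or exception type used by the library.

```python
LARGE_OFFSET = 10 ** 9  # hypothetical stand-in for the "very large number"


def normalise_seed(seed):
    """Map seeds below 1 onto distinct large positive values."""
    if abs(seed) >= LARGE_OFFSET:
        # Assumed guard: such seeds could collide with remapped ones
        raise ValueError("Seed magnitude must be below {}".format(LARGE_OFFSET))
    if seed < 1:
        return LARGE_OFFSET + seed
    return seed


print(normalise_seed(5))   # 5
print(normalise_seed(0))   # 1000000000
print(normalise_seed(-3))  # 999999997
```

With this mapping, seeds 0 and 1, or -3 and 3, no longer produce the same value.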

set_simulation_parameters(seed, task, output_directory, min_speciation_rate, sigma=1.0, tau=1.0, deme=1.0, sample_size=1.0, max_time=3600, dispersal_method=None, m_prob=0.0, cutoff=0, dispersal_relative_cost=1, min_num_species=1, restrict_self=False, landscape_type=False, protracted=False, min_speciation_gen=None, max_speciation_gen=None, spatial=True, uses_spatial_sampling=False, times=None)[source]¶

Set all the simulation parameters apart from the map objects.

Parameters:

seed (int) – the unique job number for this simulation set
task (int) – the task reference number (used for easy file identification after simulations are complete)
output_directory (str) – the output directory to store the SQL database
min_speciation_rate (float) – the minimum speciation rate to simulate
sigma (float) – the dispersal sigma value
tau (float) – the fat-tailed dispersal tau value
deme (float) – the deme size (in individuals per cell)
sample_size (float) – the sample size of the deme (decimal 0-1)
max_time (float) – the maximum allowed simulation time (in seconds)
dispersal_method (str) – the dispersal kernel method. Should be one of [normal, fat-tail, norm-uniform]
m_prob (float) – the probability of drawing from the uniform dispersal. Only relevant for uniform dispersals
cutoff (float) – the maximum value for the uniform dispersal. Only relevant for uniform dispersals.
dispersal_relative_cost (float) – the relative cost of travelling through non-habitat (defaults to 1)
min_num_species (int) – the minimum number of species known to exist (defaults to 1)
restrict_self (bool) – if true, restricts dispersal from own cell
landscape_type (bool/str) – if false or “closed”, restricts dispersal to the provided maps, otherwise can be “infinite”, or a tiled landscape using “tiled_coarse” or “tiled_fine”, or a clamped landscape using “clamped_coarse” or “clamped_fine”.
protracted (bool) – if true, uses protracted speciation application
min_speciation_gen (float) – the minimum amount of time a lineage must exist before speciation occurs.
max_speciation_gen (float) – the maximum amount of time a lineage can exist before speciating.
spatial (bool) – if true, means that the simulation is spatial
uses_spatial_sampling (bool) – if true, the sample mask is interpreted as a proportional sampling mask, where the number of individuals sampled in the cell is equal to the density * deme_sample * cell sampling proportion
times (list) – list of temporal sampling points to apply (in generations)

set_speciation_rates(speciation_rates)[source]¶

Adds speciation rates for analysis at the end of the simulation. This is optional.

Parameters:	speciation_rates (list) – a list of speciation rates to apply at the end of the simulation

setup_necsim()[source]¶

Calculates the type of the simulation (spatial/non-spatial, protracted/non-protracted) and sets the c object appropriately.

Return type:	None

write_config(config_file)[source]¶

Writes the config to the config file provided, overwriting any existing config files.

Parameters:	config_file – the config file to write out to
Return type:	None

coalescence_tree module ¶

Generate the coalescence tree and acquire a number of biodiversity metrics for different parameter sets. Can also be used to compare against a comparison simulation object.

input:

- Completed simulation database from `Simulation`
- Parameters and operations to apply
output:	A variety of biodiversity metrics, including species richness and abundance distributions, locations of each species, alpha and beta diversity, plus equivalent fragment biodiversity metrics. Modifies the simulation database in place.

class CoalescenceTree(database=None, logging_level=30, log_output=None)[source]¶

Bases: object

Contains the coalescence tree and performs various calculations of different biodiversity metrics, which are then stored in the SQLite database.

The general process is:

- Import the database (set_database()) and import the comparison data, if required (import_comparison_data())
- Apply additional speciation rates (if required) using set_speciation_parameters() and then apply()
- Calculate required metrics (such as calculate_fragment_richness())
- Optionally, calculate the goodness of fit (calculate_goodness_of_fit())
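
The process above could be wrapped in a small helper such as the one below. The helper name `analyse_simulation` is hypothetical, and `tree` is expected to be a `CoalescenceTree` instance. The argument lists of set_database() and set_speciation_parameters() are not shown in this section, so passing a database path and a list of speciation rates is an assumption.

```python
def analyse_simulation(tree, database_path, speciation_rates):
    """Apply speciation rates to a completed simulation and return its metrics."""
    # Assumed call shape: a path to the completed simulation database
    tree.set_database(database_path)
    # Assumed call shape: a list of speciation rates to apply
    tree.set_speciation_parameters(speciation_rates)
    # Generates the coalescence tree and writes the results to the database
    tree.apply()
    # Stores landscape richness for each community reference
    tree.calculate_richness()
    # Documented as returning a pandas.DataFrame of all biodiversity metrics
    return tree.get_biodiversity_metrics()
```

Comparison data and goodness-of-fit calculations are left out to keep the sketch short.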

add_metacommunity_parameters(metacommunity_size=None, metacommunity_speciation_rate=None, metacommunity_option=None, metacommunity_reference=0)[source]¶

Adds the metacommunity parameters to the object.

Parameters:

metacommunity_size (float) – the number of individuals in the metacommunity
metacommunity_speciation_rate (float) – the speciation rate within the metacommunity
metacommunity_option (str) – either “simulated”, “analytical”, or a path to a database to read SADs from
metacommunity_reference (int) – the metacommunity reference if using a database to provide the metacommunity

Return type:

None

add_multiple_protracted_parameters(min_speciation_gens=None, max_speciation_gens=None, speciation_gens=None)[source]¶

Adds the protracted parameter set, taking an iterable as an input.

Note

Using the keyword arguments, one can supply either a list of tuples for pairs of speciation generations, or two lists of generations for the min and max, matching in order.

Parameters:

min_speciation_gens – the minimum number of generations required before speciation is permitted. Order should match that of `max_speciation_gens`
max_speciation_gens – the maximum number of generations required before speciation is permitted. Order should match that of `min_speciation_gens`
speciation_gens – a list of tuples of min/max speciation generations
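
The two accepted input styles are equivalent, as the following pure-Python sketch shows. It illustrates the pairing only and is not the library's implementation; the function name `pair_speciation_gens` and the error handling are assumptions.

```python
def pair_speciation_gens(min_speciation_gens=None, max_speciation_gens=None, speciation_gens=None):
    """Return a list of (min, max) speciation generation pairs."""
    if speciation_gens is not None:
        # Already supplied as pairs
        return [tuple(pair) for pair in speciation_gens]
    if min_speciation_gens is None or max_speciation_gens is None:
        raise ValueError("Provide either speciation_gens or both min and max lists.")
    if len(min_speciation_gens) != len(max_speciation_gens):
        raise ValueError("The min and max lists must be the same length.")
    # Pair the two lists by position
    return list(zip(min_speciation_gens, max_speciation_gens))


from_lists = pair_speciation_gens(min_speciation_gens=[10, 20], max_speciation_gens=[100, 200])
from_tuples = pair_speciation_gens(speciation_gens=[(10, 100), (20, 200)])
print(from_lists == from_tuples)  # True
```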

add_protracted_parameters(min_speciation_gen, max_speciation_gen)[source]¶

Adds the protracted parameter set.

Note

Wipes (0.0, 0.0) from protracted parameters, if it is there alone.

Parameters:

min_speciation_gen – the minimum number of generations required before speciation is permitted
max_speciation_gen – the maximum number of generations required before speciation is permitted

add_time(time)[source]¶

Adds the time to the list to be applied.

Parameters:	time – the time to be applied

add_times(times)[source]¶

Adds the list of times to those to be applied.

Parameters:	times – list of times to be applied

adjust_data()[source]¶: Ensures that the numbers of individuals are equalised between the comparison and simulated datasets, and modifies the relevant tables with the new data

apply()[source]¶: Generates the coalescence tree for the set of speciation parameters. This must be run after the main coalescence simulations are complete. It will create additional fields and tables in the SQLite database which contain the requested data.

apply_incremental()[source]¶: Generates the coalescence tree for the set of speciation parameters. Does not write changes to the database, just holds the changes internally.

apply_non_spatial_remaining(database)[source]¶

Applies the non-spatial neutral model to the remaining lineages. This approximation is reasonable on a closed landscape once the lineages themselves are close to randomly distributed.

Parameters:	database – the database file to open
Returns:	None
Return type:	None

calculate_alpha_diversity(output_metrics=True)[source]¶

Calculates the system alpha diversity for each set of parameters stored in COMMUNITY_PARAMETERS. Stores the output in ALPHA_DIVERSITY table.

Parameters:	output_metrics (bool) – output to the BIODIVERSITY_METRICS table

calculate_beta_diversity(output_metrics=True)[source]¶

Calculates the beta diversity for the system for each speciation parameter set and stores the output in BETA_DIVERSITY. Will calculate alpha diversity and species richness tables if they have not already been performed.

Parameters:	output_metrics (bool) – output to the BIODIVERSITY_METRICS table

calculate_comparison_octaves(store=False)[source]¶

Calculates the octave classes for the comparison data and for fragments (if required). If the octaves exist in the FRAGMENT_OCTAVES table in the comparison database, the data will be imported instead of being re-calculated.

Note

If store is True, will store an EDITED version of the comparison octaves, such that the number of individuals is equal between the comparison and simulated data.

Parameters:	store – if True, stores within the comparison database.

calculate_fragment_abundances()[source]¶

Calculates the fragment abundances, including equalising with the comparison database, if it has already been set.

Sets fragment_abundances object.

calculate_fragment_octaves()[source]¶: Calculates the octave classes for each fragment. Outputs the calculated richness into the SQL database within a FRAGMENT_OCTAVES table

calculate_fragment_richness(output_metrics=True)[source]¶

Calculates the fragment richness and stores it in a new table called FRAGMENT_RICHNESS. Also adds the record to BIODIVERSITY_METRICS. If the table already exists, it will simply be returned. Each time point and speciation rate combination will be recorded as a new variable.

Parameters:	output_metrics (bool) – output to the BIODIVERSITY_METRICS table

calculate_goodness_of_fit()[source]¶

Calculates the goodness-of-fit measure based on the calculated biodiversity metrics, scaling each metric by the number of individuals involved in the metric.

This requires that import_comparison_data() has already been successfully run.

Note

This doesn’t calculate anything for values which have not yet been written to the BIODIVERSITY_METRICS table. All in-built functions (e.g. calculate_alpha_diversity, calculate_fragment_richness) write to the BIODIVERSITY_METRICS table automatically, so this is only relevant for custom functions.

The resulting value will then be written to the BIODIVERSITY_METRICS table in the SQL database.

calculate_octaves()[source]¶: Calculates the octave classes for the landscape. Outputs the calculated richness into the SQL database within a FRAGMENT_OCTAVES table.
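
As an illustration of what octave classes are, the sketch below bins species abundances into powers of two. The binning convention is not stated in this section, so the example assumes the common one in which class k holds species with between 2^k and 2^(k+1) - 1 individuals. The function names and abundances are made up.

```python
from collections import Counter


def octave_class(abundance):
    """Return floor(log2(abundance)) for a positive integer abundance."""
    # bit_length avoids floating point rounding at exact powers of two
    return abundance.bit_length() - 1


def octave_counts(abundances):
    """Count the number of species falling in each octave class."""
    counts = Counter(octave_class(n) for n in abundances if n > 0)
    return dict(sorted(counts.items()))


abundances = [1, 1, 2, 3, 4, 7, 8, 20]  # individuals per species
counts = octave_counts(abundances)
print(counts)  # {0: 2, 1: 2, 2: 2, 3: 1, 4: 1}
```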

calculate_octaves_error()[source]¶: Calculates the error in octave classes between the simulated data and the comparison data. Stores each error value as a new entry in BIODIVERSITY_METRICS under fragment_octaves. Calculates the error by comparing each octave class and summing the relative difference. Octaves are then averaged for each fragment.

calculate_richness(output_metrics=True)[source]¶

Calculates the landscape richness from across all fragments and stores the result in a new table, SPECIES_RICHNESS. Stores a separate result for each community reference.

Parameters:	output_metrics (bool) – output to the BIODIVERSITY_METRICS table

calculate_species_distance_similarity(output_metrics=True)[source]¶

Calculates the probability two individuals are of the same species as a function of distance.

Stores the mean distance between individuals of the same species in the BIODIVERSITY_METRICS table, and stores the full data in new table (SPECIES_DISTANCE_SIMILARITY). Distances are binned to the nearest integer.

Parameters:	output_metrics – if true, outputs to the BIODIVERSITY_METRICS table as well, for metric comparison

Note

Extremely slow for large landscape sizes.

check_biodiversity_table_exists()[source]¶

Checks whether the biodiversity table exists and creates the table if required.

Returns:	the max reference value currently existing

clear_calculations()[source]¶: Removes the BIODIVERSITY_METRICS and FRAGMENT_OCTAVES tables completely.

Note

This cannot be undone (other than by re-running the calculations).

dispersal_parameters()[source]¶

Reads the dispersal parameters from the database and returns them.

Returns:	a dict of the dispersal parameters (dispersal method, sigma, tau, m_probability and cutoff)

downsample(sample_proportion)[source]¶

Down-samples the individuals by a given proportion globally, and at each location.

The original SPECIES_LIST is stored in a new table called SPECIES_LIST_ORIGINAL and a new SPECIES_LIST object is created containing the down-sampled coalescence tree.

Parameters:	sample_proportion (float) – the proportion of individuals to sample at each location
Returns:	None
Return type:	None

downsample_at_locations(fragment_csv, ignore_errors=False)[source]¶

Downsamples the SPECIES_LIST object using a fragment csv.

Each row in the csv file should contain the fragment name, x min, y min, x max, y max and the number of individuals per cell in that fragment.

Parameters:

fragment_csv – a csv file to use for downsampling individuals
ignore_errors – ignore the errors from mismatches in numbers of individuals

Returns:	None
Return type:	None
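
A fragment csv with the column order described above could be generated with Python's csv module, as in this sketch. The fragment names and coordinates are invented, and writing the rows without a header line is an assumption, since this section does not say whether a header is expected.

```python
import csv
import io

# (fragment name, x min, y min, x max, y max, individuals per cell)
fragments = [
    ("fragment_a", 0, 0, 9, 9, 5),
    ("fragment_b", 10, 0, 19, 9, 2),
]

# Written to an in-memory buffer here; use open(path, "w", newline="") for a file
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(fragments)
csv_text = buffer.getvalue()

print(csv_text)
```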

get_all_fragment_abundances()[source]¶

Returns the whole table of fragment abundances from the database.

Returns:	a list of reference, fragment, species_id, no_individuals

get_alpha_diversity(reference=1)[source]¶

Gets the system alpha diversity for the provided community reference parameters. Alpha diversity is the mean number of species per fragment.

Parameters:	reference – the community reference for speciation parameters
Returns:	the alpha diversity of the system

get_alpha_diversity_pd()[source]¶

Gets the alpha diversity for each set of community parameters.

Returns:	all alpha diversity values
Return type:	pandas.DataFrame

get_beta_diversity(reference=1)[source]¶

Gets the system beta diversity for the provided community reference parameters. Beta diversity is the true beta diversity (gamma / alpha).

Parameters:	reference – the community reference for speciation parameters
Returns:	the beta diversity of the system
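
The relationship between alpha, gamma and beta diversity can be illustrated with a toy example. This is a sketch of the definitions only, not of how the library queries its database. Gamma is taken here to be the total number of distinct species across all fragments, which is an assumption, and the fragment data are invented.

```python
def alpha_diversity(fragment_species):
    """Mean number of species per fragment."""
    return sum(len(species) for species in fragment_species.values()) / len(fragment_species)


def gamma_diversity(fragment_species):
    """Total number of distinct species across all fragments."""
    return len(set().union(*fragment_species.values()))


def beta_diversity(fragment_species):
    """True beta diversity: gamma divided by alpha."""
    return gamma_diversity(fragment_species) / alpha_diversity(fragment_species)


# Species identifiers present in each (hypothetical) fragment
fragment_species = {
    "fragment_a": {1, 2, 3},
    "fragment_b": {3, 4},
    "fragment_c": {5},
}

alpha = alpha_diversity(fragment_species)
gamma = gamma_diversity(fragment_species)
beta = beta_diversity(fragment_species)
print(alpha, gamma, beta)  # 2.0 5 2.5
```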

get_beta_diversity_pd()[source]¶

Gets the beta diversity for each set of community parameters.

Returns:	all beta diversity values
Return type:	pandas.DataFrame

get_biodiversity_metrics()[source]¶

Get calculated biodiversity metrics.

Returns:	all biodiversity metrics
Return type:	pandas.DataFrame

get_community_parameters(reference=1)[source]¶

Returns a dictionary containing the parameters for the calculated community.

Parameters:	reference – the reference key for the calculated parameters (default is 1)
Returns:	dictionary containing the speciation_rate, time, fragments, metacommunity_reference and min/max speciation generation for protracted sims
Return type:	dict

get_community_parameters_pd()[source]¶

Gets all the calculated community parameter sets from the database.

Returns:	the community parameters
Return type:	pandas.DataFrame

get_community_reference(speciation_rate, time, fragments, metacommunity_size=0, metacommunity_speciation_rate=0.0, metacommunity_option=None, external_reference=0, min_speciation_gen=0.0, max_speciation_gen=0.0)[source]¶

Gets the community reference associated with the supplied community parameters

Raises:	KeyError – if COMMUNITY_PARAMETERS (or METACOMMUNITY_PARAMETERS) does not exist in database or no reference exists for the supplied parameters
Parameters:

speciation_rate (float) – the speciation rate of the community
time (float) – the time in generations of the community
fragments (bool/int) – whether fragments were determined for the community
metacommunity_size (int/float) – the metacommunity size
metacommunity_speciation_rate (float) – the metacommunity speciation rate
metacommunity_option (str) – option used for metacommunity creation
external_reference (int) – the metacommunity reference for external metacommunity databases
min_speciation_gen (float) – the minimum number of generations required before speciation
max_speciation_gen (float) – the maximum number of generations required before speciation

Returns:	the reference associated with this set of simulation parameters

get_community_references()[source]¶

Gets a list of all the community references already calculated for the simulation.

Returns:	list of all calculated community references
Return type:	list

get_fragment_abundances(fragment, reference)[source]¶

Gets the species abundances for the supplied fragment and community reference.

Parameters:

fragment – the name of the fragment to obtain
reference – the reference for speciation parameters to obtain for

Returns:	a list of species ids and abundances

get_fragment_abundances_pd()[source]¶

Gets the fragment abundances for each set of community parameters.

Returns:	the fragment abundances for each associated community reference
Return type:	pandas.DataFrame

get_fragment_list(community_reference=1)[source]¶

Returns a list of all fragments that exist in FRAGMENT_ABUNDANCES.

Parameters:	community_reference – community reference to obtain for (default 1)
Returns:	list of all fragment names

get_fragment_octaves(fragment=None, reference=None)[source]¶

Get the pre-calculated octave data for the specified fragment, speciation rate and time. If fragment and reference are None, returns the entire FRAGMENT_OCTAVES object. This requires self.calculate_fragment_octaves() to have been run successfully at some point previously.

Returns are of form [id, fragment, community_reference, octave class, number of species]

Parameters:

fragment – the desired fragment (defaults to None)
reference – the reference key for the calculated community parameters

Returns:	output from FRAGMENT_OCTAVES for the selected variables

get_fragment_octaves_pd()[source]¶

Gets the octave classes for each fragment and community parameter set

Returns:	all fragment octave classes
Return type:	pandas.DataFrame

get_fragment_richness(fragment=None, reference=None)[source]¶

Gets the fragment richness for each speciation rate and time for the specified simulation.

If the fragment richness has not yet been calculated, it tries to calculate the fragment richness.

Parameters:	fragment (str) – the desired fragment (defaults to None) reference (int) – the reference key for the calculated community parameters
Raises:	sqlite3.Error if no table FRAGMENT_ABUNDANCES exists
Raises:	RuntimeError if no data for the specified fragment, speciation rate and time exists.
Returns:	A list containing the fragment richness, or a value of the fragment richness
Return type:	list

get_fragment_richness_pd()[source]¶

Gets the fragment richness for each set of community parameters.

Returns:	the fragment richness for each associated community reference
Return type:	pandas.DataFrame

get_goodness_of_fit(reference=1)[source]¶

Returns the goodness of fit from the file.

Parameters:	reference – the community reference to get from
Returns:	the full output from the SQL query
Return type:	float

get_goodness_of_fit_fragment_octaves(reference=1)[source]¶

Returns the goodness of fit for fragment octaves from the file.

Note

If more than one metric matches the specified criteria, only the first will be returned.

Raises:	ValueError – if BIODIVERSITY_METRICS table does not exist.
Parameters:	reference – the community reference number
Returns:	the full output from the SQL query
Return type:	double

get_goodness_of_fit_fragment_richness(reference=1)[source]¶

Returns the goodness of fit for fragment richness from the file.

Raises:	ValueError – if BIODIVERSITY_METRICS table does not exist.
Parameters:	reference – the community reference number
Returns:	the full output from the SQL query
Return type:	float

get_goodness_of_fit_metric(metric, reference=1)[source]¶

Gets the goodness-of-fit measure for the specified metric and community reference.

Parameters:	metric – the metric for which the goodness of fit has been calculated reference – the community reference to fetch fits for
Returns:	the goodness of fit value
Return type:	float

get_job()[source]¶

Gets the job number (the seed) and the job type (the task identifier).

Returns:	list containing [seed, job_type (the task identifier)]

get_metacommunity_parameters(reference=1)[source]¶

Returns a dictionary containing the parameters for the calculated community.

Parameters:	reference – the reference key for the calculated parameters. (default is 1)
Raises:	sqlite3.Error – if the METACOMMUNITY_PARAMETERS table does not exist, or some other sqlite error occurs KeyError – if the supplied reference does not exist in the METACOMMUNITY_PARAMETERS table
Returns:	dictionary containing the speciation_rate, metacommunity_size, metacommunity option and metacommunity reference.
Return type:	dict

get_metacommunity_parameters_pd()[source]¶

Gets all the calculated metacommunity parameter sets from the database.

Returns:	the metacommunity parameters
Return type:	pd.DataFrame

get_metacommunity_references()[source]¶

Gets a list of all the metacommunity references already calculated for the simulation.

Note

Returns an empty list and logs an error message if the METACOMMUNITY_PARAMETERS table does not exist.

Returns:	list of all calculated metacommunity references
Return type:	list
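
The behaviour in the note (an empty list when the table is missing) can be sketched with the standard sqlite3 module. This is a minimal illustration, not the package's implementation: the column name reference and the helpers table_exists and list_references are assumptions made for the example.

```python
import sqlite3


def table_exists(connection, table_name):
    """Check sqlite_master for a table with the given name."""
    row = connection.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,),
    ).fetchone()
    return row is not None


def list_references(connection, table_name="METACOMMUNITY_PARAMETERS"):
    """Return all reference values, or an empty list if the table is missing."""
    if not table_exists(connection, table_name):
        return []
    # Table names cannot be bound as parameters, hence the existence check first.
    rows = connection.execute(
        f'SELECT reference FROM "{table_name}" ORDER BY reference'
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
missing = list_references(conn)  # table not created yet
conn.execute(
    "CREATE TABLE METACOMMUNITY_PARAMETERS (reference INTEGER, speciation_rate REAL)"
)
conn.executemany(
    "INSERT INTO METACOMMUNITY_PARAMETERS VALUES (?, ?)", [(1, 0.001), (2, 0.01)]
)
present = list_references(conn)
```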

get_number_individuals(fragment=None, community_reference=None)[source]¶

Gets the number of individuals that exist, either in the provided fragment, or on the whole landscape in one time slice. Counts individuals from FRAGMENT_ABUNDANCES or SPECIES_ABUNDANCES, respectively.

If a community reference is provided, only individuals for that time slice will be counted, otherwise a mean is taken across time slices.

Parameters:	fragment – the name of the fragment to get a count of individuals from community_reference – the reference to the community parameters
Returns:	the number of individuals that exist in the desired location

get_octaves(reference)[source]¶

Get the pre-calculated octave data for the parameters associated with the supplied reference. This will call self.calculate_octaves() if it hasn’t been called previously.

Returns are of form [id, ‘whole’, time, speciation rate, octave class, number of species]

Parameters:	reference – community reference which contains the parameters of interest
Returns:	output from FRAGMENT_OCTAVES on the whole landscape for the selected variables

get_octaves_pd()[source]¶

Gets the species octaves for all calculated community parameters

Returns:	all octave classes for the whole landscape
Return type:	pandas.DataFrame

get_parameter_description(key=None)[source]¶

Gets the description of the parameter matching the key from those contained in SIMULATION_PARAMETERS

Simply accesses the _parameter_descriptions data stored in parameter_descriptions.json

Returns:	string containing the parameter description or a dict containing all values if no key is supplied
Return type:	str

get_simulation_parameters(guild=None)[source]¶

Reads the simulation parameters from the database and returns them.

Returns:	a dictionary mapping names to values for seed, job_type, output_dir, speciation_rate, sigma, L_value, deme, sample_size, maxtime, dispersal_relative_cost, min_spec, habitat_change_rate, gen_since_historical, time_config, coarse_map vars, fine map vars, sample_file, gridx, gridy, historical coarse map, historical fine map, sim_complete, dispersal_method, m_probability, cutoff, landscape_type, protracted, min_speciation_gen, max_speciation_gen, dispersal_map

get_species_abundances(fragment=None, reference=None)[source]¶

Gets the species abundance for a particular fragment, speciation rate and time. If fragment is None, returns the whole landscape species abundances.

Parameters:	fragment (str) – the fragment to obtain the species abundance of. If None, returns landscape abundances. reference (int) – the community reference to obtain metrics for
Returns:	list of species abundances [reference, species ID, speciation rate, number of individuals, generation]

get_species_abundances_pd()[source]¶

Gets the species abundances for all community parameter sets.

Returns:	all species abundances
Return type:	pandas.DataFrame

get_species_distance_similarity(community_reference=1)[source]¶

Gets the species distance similarity table for the provided community reference.

Returns:	list containing the distance, number of similar species with that distance

get_species_list()[source]¶

Gets the entirety of the SPECIES_LIST table, returning a tuple with an entry for each row. This can be used to construct custom analyses of the coalescence tree.

Note

The species list will be produced in an unprocessed format

Returns:	a list of each coalescence and speciation event, with locations, performed in the simulation
Return type:	tuple

get_species_locations(community_reference=None)[source]¶

Gets the list of species locations after coalescence.

If a community reference is provided, will return just the species for that community reference, otherwise returns the whole table

Parameters:	community_reference (int) – community reference number
Returns:	a list of lists containing each row of the SPECIES_LOCATIONS table

get_species_richness(reference=1)[source]¶

Get the system richness for the parameters associated with the supplied community reference.

Note

Richness of 0 is returned if there has been some problem; it is assumed that species richness will be above 0 for any simulation.

Note

if species richness has previously been calculated and stored in SPECIES_RICHNESS table, it gets the species richness value from there, otherwise it calculates the species richness

Parameters:	reference – community reference which contains the parameters of interest
Returns:	either a list containing the community references and respective species richness values OR (if community_reference is provided), the species richness for that community reference.
Return type:	int, list

get_species_richness_pd()[source]¶

Gets the species richness for all calculated parameters from the database.

Returns:	all species richness values with their associated community reference
Return type:	pandas.DataFrame

get_total_number_individuals()[source]¶

Gets the total number of individuals that exist in the simulation.

Returns:	the total number of individuals simulated across time slices

import_comparison_data(filename, ignore_mismatch=False)[source]¶

Imports the SQL database that contains the biodiversity metrics that we want to compare against.

This can either be real data (for comparing simulated data) or other simulated data (for comparing between models).

If the SQL database does not contain the relevant biodiversity metrics, they will be calculated (if possible) or skipped.

The expected form of the database is the same as the BIODIVERSITY_METRICS table, except without any speciation rates or time references, and a new column containing the number of individuals involved in each metric.

Note

This also equalises the comparison data if ignore_mismatch is not True, so that the number of individuals is equal between the simulated and comparison datasets.

Parameters:	filename (str) – the file containing the comparison biodiversity metrics. ignore_mismatch (bool) – set to true to ignore abundance mismatches between the comparison and simulated data.

is_protracted()[source]¶

Indicates whether the simulation is a protracted simulation or not. This is read from the completed database file.

Returns:	boolean, true if the simulation was performed with protracted speciation.

output()[source]¶

Outputs the coalescence trees to the same simulation database object.

revert_downsample()[source]¶

Reverts the downsample process by restoring the original SPECIES_LIST table.

Returns:	None
Return type:	None

sample_fragment_richness(fragment, number_of_individuals, community_reference=1, n=1)[source]¶

Samples from the database from FRAGMENT_ABUNDANCES, the desired number of individuals.

Randomly selects the desired number of individuals from the database n times and returns the mean richness for the random samples.

Raises:	IOError – if the FRAGMENT_ABUNDANCES table does not exist in the database.
Parameters:	fragment – the reference of the fragment to acquire the richness for number_of_individuals – the number of individuals to sample community_reference – the reference for the community parameters n – number of times to repeatedly sample
Returns:	the mean of the richness from the repeats
Return type:	float
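
The repeated sampling described above can be sketched in plain Python. This is an illustration under stated assumptions: individuals are drawn without replacement, abundances are given as a dictionary of species id to count, and mean_sampled_richness is a hypothetical helper rather than the method itself.

```python
import random


def mean_sampled_richness(abundances, number_of_individuals, n=1, rng=None):
    """Mean species richness over n random samples without replacement.

    abundances maps a species id to its number of individuals.
    """
    rng = rng or random.Random()
    # Expand to one entry per individual so each is equally likely to be drawn.
    individuals = [sp for sp, count in abundances.items() for _ in range(count)]
    if number_of_individuals > len(individuals):
        raise ValueError("cannot sample more individuals than exist")
    total = 0
    for _ in range(n):
        sample = rng.sample(individuals, number_of_individuals)
        total += len(set(sample))  # richness = number of distinct species
    return total / n


# Illustrative fragment with four species and 100 individuals
fragment = {1: 50, 2: 30, 3: 15, 4: 5}
richness_all = mean_sampled_richness(fragment, 100, n=3, rng=random.Random(1))
richness_some = mean_sampled_richness(fragment, 10, n=100, rng=random.Random(1))
```

Sampling every individual recovers the full richness of the fragment, while smaller samples give a mean somewhere between one species and that maximum.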

sample_landscape_richness(number_of_individuals, n=1, community_reference=1)[source]¶

Samples from the landscape the required number of individuals, returning the mean of the species richnesses produced.

If number_of_individuals is a dictionary mapping fragment names to numbers sampled, will sample the respective number from each fragment and return the whole landscape richness.

Raises:	KeyError – if the dictionary supplied contains more sampled individuals than exist in a fragment, or if the fragment is not contained within the dictionary.
Parameters:	number_of_individuals (int/dict) – either an int containing the number of individuals to be sampled, or a dictionary mapping fragment names to numbers of individuals to be sampled n – the number of repeats to average over community_reference – the community reference to fetch abundances for
Returns:	the mean of the richness from the repeats for the whole landscape
Return type:	float
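
The dictionary form of number_of_individuals can be sketched as follows. This is only an illustration of the documented behaviour (per-fragment sampling, with KeyError for a missing fragment or an oversized sample); the data layout and the function name sample_landscape_richness_sketch are assumptions for the example.

```python
import random


def sample_landscape_richness_sketch(fragment_abundances, number_of_individuals, n=1, rng=None):
    """Mean landscape richness when sampling a set number from each fragment.

    fragment_abundances: {fragment name: {species id: count}}
    number_of_individuals: {fragment name: number to sample}
    """
    rng = rng or random.Random()
    pools = {}
    for name, abundances in fragment_abundances.items():
        if name not in number_of_individuals:
            raise KeyError(f"fragment {name!r} missing from sample dictionary")
        pool = [sp for sp, count in abundances.items() for _ in range(count)]
        if number_of_individuals[name] > len(pool):
            raise KeyError(f"fragment {name!r} has too few individuals")
        pools[name] = pool
    total = 0
    for _ in range(n):
        species = set()
        # Pool the species found across all fragment samples.
        for name, pool in pools.items():
            species.update(rng.sample(pool, number_of_individuals[name]))
        total += len(species)
    return total / n


landscape = {"a": {1: 3, 2: 2}, "b": {2: 4, 3: 1}}
full = sample_landscape_richness_sketch(
    landscape, {"a": 5, "b": 5}, n=2, rng=random.Random(0)
)
```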

set_database(filename)[source]¶

Sets the database to the specified file and opens the sqlite connection.

This must be done before any other operations can be performed and the file must exist.

Raises:	IOError – if the simulation is not complete, as analysis can only be performed on complete simulations. However, the database WILL be set before the error is thrown, allowing for analysis of incomplete simulations if the error is handled correctly.
Parameters:	filename (pycoalescence.simulation.Simulation/str) – the SQLite database file to import

set_speciation_parameters(speciation_rates, record_spatial=False, record_fragments=False, sample_file=None, times=None, protracted_speciation_min=None, protracted_speciation_max=None, metacommunity_size=None, metacommunity_speciation_rate=None, metacommunity_option=None, metacommunity_reference=None)[source]¶

Set the parameters for the application of speciation rates. If no config files or time_config files are provided, they will be taken from the main coalescence simulation.

Parameters:

speciation_rates (float/list) – a single float, or list of speciation rates to apply
record_spatial (bool/str) – a boolean of whether to record spatial data (default=False)
record_fragments (bool/str) – either a csv file containing fragment data, or T/F for whether fragments should be calculated from squares of continuous habitat (default=False)
sample_file (str) – a sample tif or csv specifying the sampling mask
times (list) – a list of times to apply (should have been run with the original simulation)
protracted_speciation_min (float) – the minimum number of generations required for speciation to occur
protracted_speciation_max (float) – the maximum number of generations before speciation occurs
metacommunity_size (float) – the size of the metacommunity to apply
metacommunity_speciation_rate (float) – speciation rate for the metacommunity
metacommunity_option (str) – either “simulated”, “analytical”, or a path to a database to read SADs from
metacommunity_reference (int) – the metacommunity reference if using a database to provide the metacommunity

Return type:

None

speciate_remaining(database)[source]¶

Speciates the remaining lineages in a paused database.

Parameters:	database (str/pycoalescence.simulation.Simulation) – the paused database to open
Return type:	None

wipe_data()[source]¶

Wipes all calculated data apart from the original, unformatted coalescence tree. The Speciation_Counter program will have to be re-run to perform any analyses.

write_all_to_csvs(output_location, file_naming)[source]¶

Outputs all tables from the database to csvs contained in the provided directory and following the naming structure of the supplied file naming.

Parameters:	output_location (str) – the folder to generate files in file_naming – the naming for the output csvs - will be appended with _{table_name}.csv
Returns:

Note

dots and “.csv” extensions are removed from the file_naming output
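
The naming rule for write_all_to_csvs can be sketched with the standard library. The exact cleaning of file_naming (strip a trailing ".csv", then drop remaining dots), the header row, and the example SPECIES_RICHNESS columns are assumptions; csv_path and dump_tables are hypothetical helpers.

```python
import csv
import os
import sqlite3
import tempfile


def csv_path(output_location, file_naming, table_name):
    """Build '<output_location>/<file_naming>_<table_name>.csv'."""
    base = file_naming[:-4] if file_naming.endswith(".csv") else file_naming
    base = base.replace(".", "")  # assumed handling of stray dots
    return os.path.join(output_location, f"{base}_{table_name}.csv")


def dump_tables(connection, output_location, file_naming):
    """Write every table in the database to its own csv; return the paths."""
    names = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
    ]
    paths = []
    for name in names:
        cursor = connection.execute(f'SELECT * FROM "{name}"')
        path = csv_path(output_location, file_naming, name)
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow([col[0] for col in cursor.description])  # header
            writer.writerows(cursor)
        paths.append(path)
    return paths


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SPECIES_RICHNESS (community_reference INTEGER, richness INTEGER)"
)
conn.execute("INSERT INTO SPECIES_RICHNESS VALUES (1, 42)")
out_dir = tempfile.mkdtemp()
written = dump_tables(conn, out_dir, "sim.v1.csv")
```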

write_to_csv(output_csv, table_name)[source]¶

Writes a specified table from the database to the output csv.

Parameters:	output_csv (str) – path to the csv output location table_name (str) – name of the output table
Returns:	None
Return type:	None

get_parameter_description(key=None)[source]¶

Gets the parameter descriptions for the supplied key. If the key is None, returns all keys.

Parameters:	key – the simulation parameter
Returns:	string containing the parameter description or a dict containing all values if no key is supplied

scale_simulation_fit(simulated_value, actual_value, number_individuals, total_individuals)[source]¶

Calculates goodness of fit for the provided values, and scales based on the total number of individuals that exist. The calculation is 1 - (abs(x - y)/max(x, y)) * n/n_tot for x, y simulated and actual values, n, n_tot for metric and total number of individuals.

Parameters:	simulated_value – the simulated value of the metric actual_value – the actual value of the metric number_individuals – the number of individuals this metric relates to total_individuals – the total number of individuals across all sites for this metric
Returns:	the scaled fit value
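
The calculation above can be written out as a small function. The sketch assumes the n/n_tot weight applies to the whole fit term, i.e. (1 - abs(x - y)/max(x, y)) * n/n_tot; a strictly literal reading of the expression's operator precedence would give a different value, so treat the grouping as an assumption. scaled_fit is a hypothetical name.

```python
def scaled_fit(simulated_value, actual_value, number_individuals, total_individuals):
    """Goodness of fit weighted by the share of individuals involved."""
    largest = max(simulated_value, actual_value)
    if largest == 0:
        raise ValueError("at least one value must be non-zero")
    # Unscaled fit: 1 for a perfect match, falling towards 0 as values diverge.
    fit = 1 - abs(simulated_value - actual_value) / largest
    # Assumed grouping: weight the whole fit by n / n_tot.
    return fit * number_individuals / total_individuals


# Simulated richness 8 against an actual richness of 10, for a site
# holding 50 of the 100 individuals across all sites.
example = scaled_fit(8, 10, 50, 100)
```

Here the unscaled fit is 0.8 and the site holds half of all individuals, so the scaled value is about 0.4.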

Additional submodules ¶

All additional modules which are required for package functionality, but are unlikely to be used directly.

dispersal_simulation module ¶

Simulate dispersal kernels on landscapes. Detailed here.

input:	Map file to simulate on Set of dispersal parameters, including the dispersal kernel, number of repetitions and landscape properties
output:	Database containing each distance travelled so that metrics can be calculated. A table is created for mean dispersal distance over a single step or for mean distance travelled.

class DispersalSimulation(dispersal_db=None, file=None, logging_level=30)[source]¶

Bases: pycoalescence.landscape.Landscape

Simulates a dispersal kernel upon a tif file to calculate landscape-level dispersal metrics.

check_base_parameters(number_repeats=None, seed=None, sequential=None, number_workers=None, dispersal=False)[source]¶

Checks that the parameters have been set properly.

Parameters:	number_repeats (int) – the number of times to iterate on the map seed (int) – the random seed sequential (bool) – if true, runs repeats in the dispersal simulation sequentially number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel dispersal (bool) – true iff a dispersal simulation, rather than a distance simulation, is to be run
Return type:	None

complete_setup()[source]¶

Completes the setup for the dispersal simulation, including importing the map files and setting the historical maps.

get_all_dispersal(database=None, parameter_reference=1)[source]¶

Gets all mean dispersal values from the database if run_mean_dispersal has already been run.

Raises:	ValueError if dispersal_database is None and so run_mean_dispersal() has not been run
Raises:	IOError if the output database does not exist
Parameters:	database (str) – the database to open parameter_reference (int) – the parameter reference to use (default 1)
Returns:	the dispersal values from the database

get_all_distances(database=None, parameter_reference=1)[source]¶

Gets all total distances travelled from the database if run_mean_distance_travelled or run_all_distance_travelled or run_sample_distance_travelled has already been run.

Raises:	ValueError if dispersal_database is None and so run_mean_dispersal() has not been run
Raises:	IOError if the output database does not exist
Parameters:	database (str) – the database to open parameter_reference (int) – the parameter reference to use (default 1)
Returns:	the dispersal values from the database

get_database_parameters(reference=None)[source]¶

Gets the dispersal simulation parameters from the dispersal_db

Parameters:	reference – the reference to obtain parameters for
Returns:	the dispersal simulation parameters
Return type:	dict

get_database_references()[source]¶

Gets the references from the database.

Returns:	a list of references from the database
Return type:	list

get_distances_map(shape, database=None, parameter_reference=1)[source]¶

Gets all total distances travelled from the database if run_mean_distance_travelled or run_all_distance_travelled or run_sample_distance_travelled has already been run and puts them inside a numpy matrix

Raises:	ValueError if dispersal_database is None and so run_mean_dispersal() has not been run
Raises:	IOError if the output database does not exist
Raises:	IndexError if the output database contains coordinates outside a matrix with shape=shape
Parameters:	shape ((int, int)) – shape of the numpy matrix to return which will contain the distances database (str) – the database to open parameter_reference (int) – the parameter reference to use (default 1)
Returns:	the dispersal values from the database
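
How coordinate records end up in a matrix, and why out-of-range coordinates raise IndexError, can be sketched as below. The sketch uses a nested list in place of a numpy matrix to stay dependency-free; the (x, y, distance) record layout, the (rows, columns) ordering of shape, and the name distances_to_grid are all assumptions.

```python
def distances_to_grid(records, shape):
    """Place (x, y, distance) records into a rows-by-columns nested list.

    shape is (rows, columns); cells without a record stay at 0.0.
    """
    rows, cols = shape
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, distance in records:
        # Check bounds explicitly: negative indices would otherwise wrap silently.
        if not (0 <= y < rows and 0 <= x < cols):
            raise IndexError(f"coordinate ({x}, {y}) lies outside shape {shape}")
        grid[y][x] = distance
    return grid


grid = distances_to_grid([(0, 0, 1.5), (2, 1, 3.0)], shape=(2, 3))
```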

get_mean_dispersal(database=None, parameter_reference=1)[source]¶

Gets the mean dispersal for the map if run_mean_dispersal has already been run.

Raises:	ValueError if dispersal_database is None and so run_mean_dispersal() has not been run
Raises:	IOError if the output database does not exist
Parameters:	database (str) – the database to open parameter_reference (int) – the parameter reference to use (default 1).
Returns:	mean dispersal from the database

get_mean_distance_travelled(database=None, parameter_reference=1)[source]¶

Gets the mean dispersal for the map if run_mean_distance_travelled or run_all_distance_travelled or run_sample_distance_travelled has already been run.

Raises:	ValueError if dispersal_database is None and so test_average_dispersal() has not been run
Raises:	IOError if the output database does not exist
Parameters:	database (str) – the database to open parameter_reference (int) – the parameter reference to use (or 1 for default parameter reference).
Returns:	mean of dispersal from the database

get_stdev_dispersal(database=None, parameter_reference=1)[source]¶

Gets the standard deviation of dispersal for the map if run_mean_dispersal has already been run.

Raises:	ValueError if dispersal_database is None and so test_average_dispersal() has not been run
Raises:	IOError if the output database does not exist
Parameters:	database (str) – the database to open parameter_reference (int) – the parameter reference to use (or 1 for default parameter reference).
Returns:	standard deviation of dispersal from the database

get_stdev_distance_travelled(database=None, parameter_reference=1)[source]¶

Gets the standard deviation of the distance travelled for the map if run_mean_distance_travelled or run_all_distance_travelled or run_sample_distance_travelled has already been run.

Raises:	ValueError if dispersal_database is None and so test_average_dispersal() has not been run
Raises:	IOError if the output database does not exist
Parameters:	database (str) – the database to open parameter_reference (int) – the parameter reference to use (or 1 for default parameter reference).
Returns:	standard deviation of dispersal from the database
Return type:	float

run_all_distance_travelled(number_repeats=None, number_steps=None, seed=None, number_workers=None)[source]¶

Tests the dispersal kernel on all cells on the provided map, producing a database containing the average distance travelled after number_steps have been moved.

Parameters:	number_repeats (int) – the number of times to average over for each cell number_steps (int/list) – the number of steps to take each time before recording the distance travelled seed (int) – the random seed number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
Return type:	None

run_mean_dispersal(number_repeats=None, seed=None, sequential=None)[source]¶

Tests the dispersal kernel on the provided map, producing a database containing each dispersal distance for analysis purposes.

Note

should be equivalent to run_mean_distance_travelled() with number_steps = 1

Parameters:	number_repeats (int) – the number of times to iterate on the map seed (int) – the random seed sequential (bool) – if true, runs repeats sequentially

run_mean_distance_travelled(number_repeats=None, number_steps=None, seed=None, number_workers=None)[source]¶

Tests the dispersal kernel on the provided map, producing a database containing the average distance travelled after number_steps have been moved.

Note

mean distance travelled with number_steps=1 should be equivalent to running run_mean_dispersal()

Parameters:	number_repeats (int) – the number of times to iterate on the map number_steps (int/list) – the number of steps to take each time before recording the distance travelled seed (int) – the random seed number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
Return type:	None
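
The relationship between number_steps and the mean distance travelled can be illustrated with a toy random walk. This sketch assumes an unbounded, homogeneous landscape and takes "normal" dispersal to mean an independent normal displacement with standard deviation sigma on each axis; it does not reproduce the map-based simulation the package runs, and the function below only reuses the parameter names for readability.

```python
import math
import random


def mean_distance_travelled(sigma, number_steps, number_repeats, seed=1):
    """Mean straight-line distance from the origin after number_steps moves."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(number_repeats):
        x = y = 0.0
        for _ in range(number_steps):
            # Each step is an independent 2D normal displacement.
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
        total += math.hypot(x, y)
    return total / number_repeats


one_step = mean_distance_travelled(sigma=1.0, number_steps=1, number_repeats=20000)
four_steps = mean_distance_travelled(sigma=1.0, number_steps=4, number_repeats=20000)
```

Under these assumptions the single-step mean approaches sigma * sqrt(pi / 2), and the mean after k steps grows with sqrt(k), so four steps travel roughly twice as far as one.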

run_sample_distance_travelled(samples_X, samples_Y, number_repeats=None, number_steps=None, seed=None, number_workers=None)[source]¶

Tests the dispersal kernel on the sampled cells on the provided map, producing a database containing the average distance travelled after number_steps have been moved.

Parameters:

samples_X (list) – list of the integer x coordinates of the sampled cells
samples_Y (list) – list of the integer y coordinates of the sampled cells
number_repeats (int) – the number of times to average over for each cell
number_steps (int/list) – the number of steps to take each time before recording the distance travelled
seed (int) – the random seed
number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel

Return type:

None

set_dispersal_parameters(dispersal_method='normal', dispersal_file='none', sigma=1, tau=1, m_prob=1, cutoff=100, dispersal_relative_cost=1, restrict_self=False)[source]¶

Sets the dispersal parameters.

Parameters:

dispersal_method (str) – the dispersal method to use (“normal”, “fat-tailed” or “norm-uniform”)
dispersal_file (str) – path to the dispersal map file, or none.
sigma (float) – the sigma value to use for normal and norm-uniform dispersal
tau (float) – the tau value to use for fat-tailed dispersal
m_prob (float) – the m_prob to use for norm-uniform dispersal
cutoff (float) – the cutoff value to use for norm-uniform dispersal
dispersal_relative_cost (float) – relative dispersal ability through non-habitat
restrict_self (bool) – if true, self-dispersal is prohibited

set_map_files(fine_file, sample_file='null', coarse_file=None, historical_fine_file=None, historical_coarse_file=None, deme=1)[source]¶

Sets the map files.

Uses a null sampling regime, as the sample file should have no effect.

Parameters:

fine_file (str) – the fine map file. Defaults to “null” if none provided
coarse_file (str) – the coarse map file. Defaults to “none” if none provided
historical_fine_file (str) – the historical fine map file. Defaults to “none” if none provided
historical_coarse_file (str) – the historical coarse map file. Defaults to “none” if none provided
deme (int) – the number of individuals per cell

Return type:

None

set_simulation_parameters(number_repeats=None, output_database='output.db', seed=1, number_workers=1, dispersal_method='normal', landscape_type='closed', sigma=1, tau=1, m_prob=1, cutoff=100, sequential=False, dispersal_relative_cost=1, restrict_self=False, number_steps=1, dispersal_file='none')[source]¶

Sets the simulation parameters for the dispersal simulations.

Parameters:

number_repeats (int) – the number of times to iterate on the map
output_database (str) – the path to the output database
seed (int) – the random seed
number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
dispersal_method (str) – the dispersal method to use (“normal”, “fat-tailed” or “norm-uniform”)
landscape_type (str) – the landscape type to use (“infinite”, “tiled_coarse”, “tiled_fine”, “clamped_coarse”, “clamped_fine” or “closed”)
sigma (float) – the sigma value to use for normal and norm-uniform dispersal
tau (float) – the tau value to use for fat-tailed dispersal
m_prob (float) – the m_prob to use for norm-uniform dispersal
cutoff (float) – the cutoff value to use for norm-uniform dispersal
sequential (bool) – if true, end locations of one dispersal event are used as the start for the next. Otherwise, a new random cell is chosen
dispersal_relative_cost (float) – relative dispersal ability through non-habitat
restrict_self (bool) – if true, self-dispersal is prohibited
number_steps (list/int) – the number to calculate for mean distance travelled, provided as an int or a list of ints
dispersal_file (str) – path to the dispersal map file, or none.

update_parameters(number_repeats=None, number_steps=None, seed=None, number_workers=None, dispersal_method=None, dispersal_file=None, sigma=None, tau=None, m_prob=None, cutoff=None, dispersal_relative_cost=None, restrict_self=None)[source]¶

Provides a convenience function for updating all parameters which can be updated.

Parameters:

number_repeats (int) – the number of repeats to perform the dispersal simulation for
number_steps (list/int) – the number of steps to iterate for in calculating the mean distance travelled
seed (int) – the random number seed
number_workers (int) – the number of threads (>= 1) launched to run the distance simulation in parallel
dispersal_method (str) – the method of dispersal
dispersal_file (str) – the dispersal file (alternative to dispersal_method)
sigma (float) – the sigma dispersal value
tau (float) – the tau dispersal value
m_prob (float) – the probability of drawing from a uniform distribution
cutoff (float) – the maximum value for the uniform distribution
dispersal_relative_cost (float) – the relative cost of moving through non-habitat
restrict_self (bool) – if true, prohibits dispersal from the same cell

Return type:

None
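
The roles of m_prob and cutoff described above can be sketched as a mixture draw. This is an assumed form of norm-uniform dispersal: with probability m_prob the distance is uniform between 0 and cutoff, otherwise it is the length of a 2D normal displacement with standard deviation sigma. The exact kernel used by the underlying simulation may differ, and draw_norm_uniform_distance is a hypothetical helper.

```python
import math
import random


def draw_norm_uniform_distance(sigma, m_prob, cutoff, rng):
    """Draw one dispersal distance from a normal/uniform mixture."""
    if rng.random() < m_prob:
        # Long-distance component: uniform between 0 and the cutoff.
        return rng.uniform(0.0, cutoff)
    # Short-distance component: length of a 2D normal displacement.
    return math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


rng = random.Random(42)
# m_prob=1 always uses the uniform component; m_prob=0 never does.
uniform_only = [draw_norm_uniform_distance(1.0, 1.0, 100.0, rng) for _ in range(1000)]
normal_only = [draw_norm_uniform_distance(1.0, 0.0, 100.0, rng) for _ in range(1000)]
```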

fragments module ¶

Generate fragmented landscapes with specific properties. Detailed here.

Contains FragmentedLandscape for creating a fragmented landscape using hexagonal packing and an even spread of individuals between fragments. Requires scipy and matplotlib.

class Fragment(x=None, y=None)[source]¶

Bases: object

Simple class containing the centres of fragments for a fragmented landscape

place_on_grid()[source]¶

Changes the x and y positions to integers (always rounds down).

Return type:	None

setup(x, y)[source]¶

Sets up the fragment from the x and y position.

Parameters:	x – the x position of the fragment centre y – the y position of the fragment centre
Return type:	None

class FragmentedLandscape(number_fragments=None, size=None, total=None, output_file=None)[source]¶

Bases: object

Contains hexagonal packing algorithms for spacing clumps evenly on the landscape. Includes a Lloyd’s smoothing algorithm for better spacing of fragments.

Note

Fragments will not be distinct units for largely unfragmented landscapes (those with more than around 50% habitat cover).

create(override_smoothing=None, n=10)[source]¶

Creates the landscape, including running the hexagonal packing and smoothing algorithms (if required).

Note

smoothing is recommended for any landscape that doesn’t contain a square number of fragments.

Parameters:	override_smoothing – if true, overrides the default smoothing settings (enabled for landscapes with fewer than 100000 fragments). n – the number of iterations to run Lloyd’s algorithm for
Return type:	None

fill_grid()[source]¶

Distributes the sizes evenly between the fragments, generating the actual landscape.

Return type:	None
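
An even distribution of individuals between fragments can be sketched with integer division. How the package handles a total that does not divide exactly is not stated here, so giving the remainder to the first fragments is an assumption, and split_evenly is a hypothetical helper.

```python
def split_evenly(total, number_fragments):
    """Split total individuals into fragment sizes differing by at most one."""
    if number_fragments < 1:
        raise ValueError("need at least one fragment")
    base, remainder = divmod(total, number_fragments)
    # The first `remainder` fragments each take one extra individual.
    return [base + 1 if i < remainder else base for i in range(number_fragments)]


sizes = split_evenly(10, 3)
```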

generate(override_smoothing=None, n=10)[source]¶

Convenience function for creating fragments in one function. Generates the landscape and writes out to the output file.

If smoothing is true, will run Lloyd’s algorithm after the hexagonal packing algorithm to increase the equality of the spacing.

Note

smoothing is recommended for any landscape that doesn’t contain a square number of fragments.

Parameters:	override_smoothing – if true, overrides the default smoothing settings (enabled for landscapes with fewer than 100000 fragments). n – the number of iterations to run Lloyd’s algorithm for
Return type:	None

place_fragments(smoothing=True, n=10)[source]¶

Places the fragments evenly on the landscape. If smoothing is true, will run Lloyd’s algorithm after the hexagonal packing algorithm to increase the equality of the spacing.

Note

smoothing is recommended for any landscape that doesn’t contain a square number of fragments.

Parameters:	smoothing – if true, runs Lloyd’s algorithm after the hexagonal packing n – the number of iterations to run Lloyd’s algorithm for
Return type:	None
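
The smoothing step can be illustrated with a generic Lloyd relaxation on a square grid. This is a textbook version of the algorithm, not the package's implementation: each cell midpoint is assigned to its nearest centre, then every centre moves to the centroid of its cells, repeated n times. The function name lloyd_relax and the starting centres are assumptions for the example.

```python
def lloyd_relax(centres, size, n=10):
    """Run n iterations of Lloyd's algorithm on a size-by-size grid of cells."""
    centres = [tuple(c) for c in centres]
    for _ in range(n):
        # Accumulate [sum_x, sum_y, count] of the cells nearest each centre.
        sums = [[0.0, 0.0, 0] for _ in centres]
        for col in range(size):
            for row in range(size):
                px, py = col + 0.5, row + 0.5  # cell midpoint
                nearest = min(
                    range(len(centres)),
                    key=lambda i: (centres[i][0] - px) ** 2 + (centres[i][1] - py) ** 2,
                )
                sums[nearest][0] += px
                sums[nearest][1] += py
                sums[nearest][2] += 1
        # Move each centre to the centroid of its cells; keep it if it owns none.
        centres = [
            (sx / count, sy / count) if count else old
            for (sx, sy, count), old in zip(sums, centres)
        ]
    return centres


# Two centres that start close together on a 10 x 10 landscape
relaxed = lloyd_relax([(1.0, 5.0), (2.0, 5.0)], size=10, n=10)
```

After a few iterations the two centres drift apart until each sits in the middle of its own half of the grid, which is the more even spacing the smoothing aims for.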

plot()[source]¶

Returns a matplotlib.pyplot.figure object containing an image of the fragmented landscape (with axes removed).

Requires that the fragmented landscape has been created already using create().

Returns:	figure object containing the fragmented landscape.
Return type:	matplotlib.pyplot.figure

setup(number_fragments, size, total, output_file)[source]¶

Sets up the landscape by checking parameters and setting object sizes.

Parameters:	number_fragments – the number of individual fragments to exist on the landscape size – the size of the x and y dimensions of the landscape total – the total number of individuals to place on the landscape output_file – the output tif file to write the output to
Return type:	None

write_to_raster()[source]¶

Writes the landscape to a tif file.

Raises:	FileExistsError – if the output file already exists
Parameters:	output_file – the path to the tif file to write out to.
Return type:	None

fragments config module ¶

Generate the fragment config files from a supplied shapefile and a raster file to offset from.

The function generate_fragment_csv() contains the full pipeline to generate the fragment csv.

class FragmentConfigHandler[source]¶

Bases: object

Contains routines for calculating the offsets from a config file.

generate_config(input_shapefile, input_raster, field_name='fragment', field_area='area')[source]¶

Generates the config file from the shapefile containing the fragments, writing the coordinates of the extent of each fragment to the output csv. The coordinates are calculated from their relevant position on the input raster.

Parameters:

input_shapefile (str) – shapefile containing the fragments in a “fragments” field, with each defined as a polygon.
input_raster (str) – the raster to calculate the coordinates from
field_name (str) – optionally provide a field to extract fragment names from
field_area (str) – optionally provide a field to extract fragment areas from (the number of individuals that exist in the fragment).

read_csv(input_csv)[source]¶

Reads the input csv file into the fragments object.

Parameters:	input_csv – the csv file to read in
Returns:	None
Return type:	None

write_csv(output_csv)[source]¶

Writes the fragments to the output csv.

Parameters:	output_csv (str) – the csv to write the output to

generate_fragment_csv(input_shapefile, input_raster, output_csv, field_name='fragment', field_area='area')[source]¶

Generates the fragment csv from the provided shapefile and raster file. Coordinates written to the csv are calculated from the extent of each polygon in the shapefile, expressed as its relative position on the input raster.

Only the fragment extents are used, so where the extents of fragments overlap, individuals in those areas appear in both fragments. Therefore, only rectangular fragments should be used.
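
Because only extents are compared, it can be worth checking for overlapping extents before generating the csv. The sketch below is a minimal, hypothetical check and not part of pycoalescence: the helper names and the (x_min, y_min, x_max, y_max) tuple layout are assumptions for illustration.

```python
def extents_overlap(a, b):
    """Return True if two extents, each (x_min, y_min, x_max, y_max), share any area."""
    a_x_min, a_y_min, a_x_max, a_y_max = a
    b_x_min, b_y_min, b_x_max, b_y_max = b
    return (
        a_x_min < b_x_max
        and b_x_min < a_x_max
        and a_y_min < b_y_max
        and b_y_min < a_y_max
    )


def find_overlaps(extents):
    """List the pairs of fragment names whose extents overlap."""
    names = sorted(extents)
    return [
        (first, second)
        for i, first in enumerate(names)
        for second in names[i + 1:]
        if extents_overlap(extents[first], extents[second])
    ]


# Hypothetical fragment extents: "a" and "b" overlap, "c" is separate.
fragments = {"a": (0, 0, 10, 10), "b": (5, 5, 15, 15), "c": (20, 20, 30, 30)}
overlaps = find_overlaps(fragments)  # [("a", "b")]
```

Extents that only touch along an edge are not counted as overlapping here.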

Important

The input shapefile and raster must have the same projection.

Parameters:

input_shapefile – the shapefile containing polygons defining fragments. Should contain fields of field_name and field_area
input_raster – raster file to calculate the relative coordinates on
output_csv – output csv to create
field_name – name of the field in the shapefile to acquire fragment names from
field_area – name of the field in the shapefile to acquire the number of individuals from

helper file ¶

Port older simulation outputs to the updated naming conventions. Should not be required by most users.

update_parameter_names(database)[source]¶

Alters the parameter names of SIMULATION_PARAMETERS in the database so that they match the updated naming convention.

Provided for back-compatibility with older simulations.

Note

If the simulation does not require updating, this function exits silently.

Parameters:	database – the database path to alter the names of
Returns:	None
Return type:	None

hpc_setup file ¶

Compile necsim with a number of intel compiler optimisations for running on high-performance computing systems.

build_hpc()[source]¶

Compiles necsim with the flags for optimisation on high-performance intel-based systems. On systems with a global variable containing INTEL_LICENSE_FILE, most of these options will be turned on automatically.

Return type:	None

installer file ¶

Compile necsim with default or provided compilation options. Intended for internal usage during pip or conda builds, although manual installation is also possible by running this file from the command line. python installer.py configures the install by detecting system components and compiles the C++ files, if possible. Command line flags can be provided to installer.py to modify the install (see Compilation Options for more information).

class Installer(dist, **kwargs)[source]¶

Bases: setuptools.command.build_ext.build_ext

Wraps configuration and compilation of C++ code.

autoconf()[source]¶: Runs the autoconf bash function (assuming that autoconf is available) to create the configure executable.

backup_makefile()[source]¶: Copies the makefile to a saved folder so that even if the original is overwritten, the last successful compilation can be recorded.

build_extension(ext)[source]¶: Builds the C++ and Python extension.

clean()[source]¶: Runs make clean in the NECSim directory to wipe any previous potential compile attempts.

clean_cmake()[source]¶: Deletes the cmake files and object locations if they exist.

configure(opts=None)[source]¶

Runs ./configure –opts with the supplied options. This should create the makefile for compilation, otherwise a RuntimeError will be thrown.

Parameters:	opts – a list of options to pass to the ./configure call

configure_and_compile(argv=[None], logging_level=20)[source]¶

Calls the configure script, then runs the compilation.

Parameters:	argv – the arguments to pass to configure script logging_level – the logging level to utilise (defaults to INFO).
Return type:	None

copy_makefile()[source]¶: Copies the backup makefile to the main directory, if it exists. Throws an IOError if no makefile is found.

create_default_depend()[source]¶

Runs the default makedepend command, outputting dependencies to lib/depends_default.

Used to generate a default dependency file on a system where makedepend exists, for a system where it does not.

do_compile()[source]¶: Compiles the C++ necsim program by running make. This changes the working directory to wherever the module has been installed for the subprocess call.

get_build_dir()[source]¶

Gets the build directory.

Returns:	the build directory path

get_compilation_flags(display_warnings=False)[source]¶

Generates the compilation flags for passing to ./configure.

Parameters:	display_warnings – If true, runs with the -Wall flag for compilation (displaying all warnings). Default is False.

Returns:	list of compilation flags.
Return type:	list

get_default_cmake_args(output_dir)[source]¶

Returns the default cmake configure and build arguments.

Parameters:	output_dir – the output directory to use
Returns:	tuple of two lists, first containing cmake configure arguments, second containing build arguments
Return type:	tuple

get_ldflags()[source]¶: Get the ldflags that Python was compiled with, removing some problematic options.

get_ldshared()[source]¶: Get the ldshared Python variables and replaces -bundle with -shared for proper compilation.

get_obj_dir()[source]¶

Gets the obj directory for installing obj files to.

Returns:	the obj directory path

make_depend()[source]¶: Runs make depend in the lib directory to calculate all dependencies for the header and source files.

Note

Fails silently if makedepend is not installed, printing an error to logging.

move_shared_object_file()[source]¶: Moves the shared object (.so) file to the build directory.

run()[source]¶: Runs installation and generates the shared object files - entry point for setuptools

run_cmake(src_dir, cmake_args, build_args, tmp_dir, env)[source]¶

Runs cmake to compile necsim.

Parameters:	src_dir – the source directory for necsim .cpp and .h files cmake_args – arguments to pass to the cmake project tmp_dir – the build directory to output cmake files to env – the os.environ (or other environmental variables) to pass on

run_configure(argv=None, logging_level=20, display_warnings=False)[source]¶

Configures the install for compile options provided via the command line, or with default options if no options exist. Running with -help or -h will display the compilation configurations called from ./configure.

Parameters:	argv – the arguments to pass to configure script logging_level – the logging level to utilise (defaults to INFO). display_warnings – If true, runs with the -Wall flag for compilation (displaying all warnings). Default is False.

setuptools_cmake(ext)[source]¶

Configures cmake for setuptools usage.

Parameters:	ext – the extension to build cmake on

use_default_depends()[source]¶: Uses the default dependencies, copying all contents of depends_default to the end of Makefile.

Note

Zero error-checking is done here as the Makefiles should not change, and the depends_default file should be created using create_default_depend()

get_python_library(python_version)[source]¶: Get path to the Python library associated with the current Python interpreter.

landscape file ¶

Generate landscapes and check map file combinations. Parent class for Simulation and DispersalSimulation. Contains Map objects for each relevant map file internally.

class Landscape[source]¶

Bases: object

Calculates offsets and dimensions of a selection of tif files making up a landscape.

add_historical_map(fine_file, coarse_file, time, rate=0.0)[source]¶

Adds an extra map to the list of historical maps.

Parameters:	fine_file (str) – the historical fine map file to add coarse_file (str) – the historical coarse map file to add time – the time to add (when the map is accurate) rate – the rate to add (the rate of habitat change at this time)

check_maps()[source]¶

Checks that the maps all exist and that the file structure makes sense.

Raises:	TypeError – if a dispersal map or reproduction map is specified, we must have a fine map specified, but not a coarse map. IOError – if one of the required maps does not exist
Returns:	None

detect_map_dimensions()[source]¶

Detects all the map dimensions for the provided files (where possible) and sets the respective values. This is intended to be run after set_map_files()

Raises:	TypeError – if a dispersal map or reproduction map is specified, we must have a fine map specified, but not a coarse map. IOError – if one of the required maps does not exist ValueError – if the dimensions of the dispersal map do not make sense when used with the fine map provided
Returns:	None

set_map(map_file, x_size=None, y_size=None)[source]¶

Quick function for setting a single map file for both the sample map and fine map, of dimensions x and y. Sets the sample file to “null” and coarse file and historical files to “none”.

Parameters:	map_file (str) – path to the map file x_size (int) – the x dimension, or None to detect automatically from the “.tif” file y_size (int) – the y dimension, or None to detect automatically from the “.tif” file

set_map_files(sample_file, fine_file=None, coarse_file=None, historical_fine_file=None, historical_coarse_file=None)[source]¶

Sets the map files (or to null, if none specified). It then calls detect_map_dimensions() to correctly read in the specified dimensions.

If sample_file is “null”, dimension values will remain at 0. If coarse_file is “null”, it will default to the size of fine_file with zero offset. If the coarse file is “none”, it will not be used. If the historical fine or coarse files are “none”, they will not be used.

Parameters:	sample_file (str) – the sample map file. Provide “null” if no sample mask is required fine_file (str) – the fine map file. Defaults to “null” if none provided coarse_file (str) – the coarse map file. Defaults to “none” if none provided historical_fine_file (str) – the historical fine map file. Defaults to “none” if none provided historical_coarse_file (str) – the historical coarse map file. Defaults to “none” if none provided
Return type:	None
Returns:	None

set_map_parameters(sample_file, sample_x, sample_y, fine_file, fine_x, fine_y, fine_x_offset, fine_y_offset, coarse_file, coarse_x, coarse_y, coarse_x_offset, coarse_y_offset, coarse_scale, historical_fine_map, historical_coarse_map)[source]¶

Set up the map objects with the required parameters. This is required for csv file usage.

Note that this function is not recommended for tif file usage, as it is much simpler to call set_map_files(), which should automatically calculate map offsets, scaling and dimensions.

Parameters:

sample_file – the sample file to use, which should contain a boolean mask of where to sample
sample_x – the x dimension of the sample file
sample_y – the y dimension of the sample file
fine_file – the fine map file to use (must be equal to or larger than the sample file)
fine_x – the x dimension of the fine map file
fine_y – the y dimension of the fine map file
fine_x_offset – the x offset of the fine map file
fine_y_offset – the y offset of the fine map file
coarse_file – the coarse map file to use (must be equal to or larger than fine map file)
coarse_x – the x dimension of the coarse map file
coarse_y – the y dimension of the coarse map file
coarse_x_offset – the x offset of the coarse map file at the resolution of the fine map
coarse_y_offset – the y offset of the coarse map file at the resolution of the fine map
coarse_scale – the relative scale of the coarse map compared to the fine map (must match x and y scaling)
historical_fine_map – the historical fine map file to use (must have dimensions equal to fine map)
historical_coarse_map – the historical coarse map file to use (must have dimensions equal to coarse map)

sort_historical_maps()[source]¶: Sorts the historical maps by time.

landscape_metrics file ¶

Calculates landscape-level metrics, including mean distance to nearest-neighbour for each habitat cell and clumpiness.

class LandscapeMetrics(file=None, logging_level=30)[source]¶

Bases: pycoalescence.map.Map

Calculates the mean nearest-neighbour for cells across a landscape. See here for details.

get_clumpiness()[source]¶

Calculates the clumpiness metric for the landscape, a measure of how spread out the points are across the landscape. See here for details.

Returns:	the CLUMPY metric
Return type:	float

get_mnn()[source]¶

Calculates the mean nearest-neighbour for cells across a landscape. See here for details.

Returns:	the mean distance to the nearest neighbour of a cell.
Return type:	float

map module ¶

Open tif files and detect properties and data using gdal. Detailed here.

class GdalErrorHandler(logger)[source]¶

Bases: object

Custom error handler for GDAL warnings and errors.

handler(err_level, err_no, err_msg)[source]¶

Parameters:	err_level – the level at which to log outputs err_no – the error number to use err_msg – the error message
Returns:

class Map(file=None, is_sample=None, logging_level=30)[source]¶

Bases: object

Contains the file name and the variables associated with this map object.

The internal array of the tif file is stored in self.data, and band 1 of the file can be opened by using open()

Important

Currently, Map does not support skewed rasters (not north/south).

Variables:	data – if the map file has been opened, contains the full tif data as a numpy array.

calculate_offset(file_offset)[source]¶

Calculates the offset of the map object from the supplied file_offset.

The self map should be the smaller of the two maps.

Parameters:	file_offset (str/Map) – the path to the file to calculate the offset. Can also be a Map object with the filename contained.
Raises:	TypeError – if the spatial reference systems of the two files do not match
Returns:	the offset x and y (at the resolution of the file_home) in integers

calculate_scale(file_scaled)[source]¶

Calculates the scale of map object from the supplied file_scaled.

Parameters:	file_scaled (str/Map) – the path to the file to calculate the scale.
Returns:	the scale (of the x dimension)

check_map()[source]¶: Checks that the dimensions for the map have been set and that the map file exists

convert_lat_long(lat, long)[source]¶

Converts the input latitude and longitude to x, y coordinates on the Map

Parameters:	lat – the latitude to obtain the y coordinate of long – the longitude to obtain the x coordinate of
Raises:	IndexError – if the provided coordinates are outside the Map object.
Returns:	[x, y] coordinates on the Map
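
The conversion can be sketched from the values that read_dimensions() is documented to return. This is an illustrative stand-alone function, not the Map method itself: the name lat_long_to_x_y, the use of floor for cell assignment and the assumption of a north-up raster (negative y resolution) are choices made for this example.

```python
import math


def lat_long_to_x_y(lat, long, dimensions):
    """Convert a latitude and longitude to integer cell coordinates.

    dimensions is (x_size, y_size, upper_left_x, upper_left_y, x_res, y_res),
    with y_res negative for a north-up raster.
    """
    x_size, y_size, ul_x, ul_y, x_res, y_res = dimensions
    x = math.floor((long - ul_x) / x_res)
    y = math.floor((lat - ul_y) / y_res)
    if not (0 <= x < x_size and 0 <= y < y_size):
        raise IndexError("Coordinates ({}, {}) are outside the map.".format(lat, long))
    return [x, y]


# A hypothetical 20 x 20 map with its upper-left corner at (0.0, 10.0) and 0.5 degree cells.
dims = (20, 20, 0.0, 10.0, 0.5, -0.5)
coords = lat_long_to_x_y(lat=7.75, long=1.25, dimensions=dims)  # [2, 4]
```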

create(file, bands=1, datatype=gdal.GDT_Byte, geotransform=None, projection=None)[source]¶

Create the file output and writes the data to the output.

Parameters:	file (str) – the output file to create bands (int) – optionally provide a number of bands to create datatype (gdal.GDT_Byte) – the datatype of the output geotransform (tuple) – optionally provide a geotransform to set for the raster - defaults to (0, 1, 0, 0, 0, -1) projection (str) – optionally provide a projection to set for the raster, in WKT format

create_copy(dst_file, src_file=None)[source]¶

Creates a file copying projection and other attributes over from the desired copy

Parameters:	dst_file – existing file to create src_file – the source file to copy from

get_band_number()[source]¶

Gets the number of raster bands in the file.

Return type:	int
Returns:	the number of bands in the raster

get_cached_subset(x_offset, y_offset, x_size, y_size)[source]¶

Gets a subset of the map file, BUT rounds all numbers to integers to save RAM and keeps the entire array in memory to speed up fetches.

Parameters:	x_offset (int) – the x offset from the top left corner of the map y_offset (int) – the y offset from the top left corner of the map x_size (int) – the x size of the subset to obtain y_size (int) – the y size of the subset to obtain
Returns:	a numpy array containing the subsetted data

get_dataset(file=None, permissions=gdal.GA_Update)[source]¶

Gets the dataset from the file.

Parameters:	file (str) – path to the file to open permissions (int) – the gdal permission reference to open the dataset
Raises:	ImportError – if the gdal module has not been imported correctly IOError – if the supplied filename is not a tif or vrt IOError – if the map does not exist
Returns:	an opened dataset object

get_dimensions()[source]¶

Calls read_dimensions() if dimensions have not been read, or reads stored information.

Returns:	a list containing [0] x, [1] y, [2] x offset, [3] y offset, [4] x resolution, [5] y resolution, [6] upper left x, [7] upper left y

get_dtype(band_no=None)[source]¶

Gets the data type of the provided band number

Parameters:	band_no – band number to obtain the data type of
Return type:	int
Returns:	the gdal data type number in the raster file

get_extent()[source]¶: Gets the min and max x, and min and max y values, including accounting for skew.

Returns:	list of the x min, x max, y min, y max values.
Return type:	list

get_geo_transform()[source]¶

Gets the geotransform of the file.

Returns:	list containing the geotransform parameters

get_no_data(band_no=None)[source]¶

Gets the no data value for the tif map.

Parameters:	band_no – the band number to obtain the no data value from
Returns:	the no data value
Return type:	float

get_projection()[source]¶

Gets the projection of the map.

Returns:	the projection object of the map in WKT format
Return type:	str

get_subset(x_offset, y_offset, x_size, y_size, no_data_value=None)[source]¶

Gets a subset of the map file

Parameters:	x_offset (int) – the x offset from the top left corner of the map y_offset (int) – the y offset from the top left corner of the map x_size (int) – the x size of the subset to obtain y_size (int) – the y size of the subset to obtain no_data_value (float/int) – optionally provide a value to replace all no data values with.
Returns:	a numpy array containing the subsetted data
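
The offset and size arguments describe a rectangular window measured from the top-left corner. The sketch below shows the same windowing on a nested list rather than a numpy array, so it runs without gdal or numpy. The function name subset_of and the explicit no_data argument are assumptions for this example; the real method reads the no data value from the file.

```python
def subset_of(data, x_offset, y_offset, x_size, y_size, no_data=None, no_data_value=None):
    """Return the y_size by x_size window of data starting at (x_offset, y_offset)."""
    rows = data[y_offset:y_offset + y_size]
    subset = [row[x_offset:x_offset + x_size] for row in rows]
    if no_data_value is not None:
        # Replace every no data cell with the requested value.
        subset = [
            [no_data_value if cell == no_data else cell for cell in row]
            for row in subset
        ]
    return subset


# Rows are indexed by y and columns by x; -99 marks no data in this toy grid.
grid = [[1, 2, 3, 4], [5, -99, 7, 8], [9, 10, 11, 12]]
window = subset_of(grid, x_offset=1, y_offset=0, x_size=2, y_size=2,
                   no_data=-99, no_data_value=0)  # [[2, 3], [0, 7]]
```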

get_x_y()[source]¶

Simply returns the x and y dimension of the file.

Returns:	the x and y dimensions

has_equal_dimensions(equal_map)[source]¶

Checks if the supplied Map has equal dimensions to this Map.

Note

Dimension matching uses an absolute value (0.0001) for latitude/longitude, and relative value for pixel resolution. The map sizes must fit perfectly.

Parameters:	equal_map (Map) – the Map object to check if dimensions match
Returns:	true if the dimensions match, false otherwise
Return type:	bool
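
The mix of absolute and relative tolerances described in the note can be sketched with math.isclose. This is only an illustration: the function name dimensions_match and the relative tolerance of 1e-6 are assumptions, since the source states only the absolute value (0.0001). The tuples follow the order documented for read_dimensions().

```python
import math


def dimensions_match(first, second, abs_tol=0.0001, rel_tol=1e-6):
    """Compare two (x, y, upper_left_x, upper_left_y, x_res, y_res) tuples."""
    x1, y1, ul_x1, ul_y1, x_res1, y_res1 = first
    x2, y2, ul_x2, ul_y2, x_res2, y_res2 = second
    # Map sizes are integers and must match exactly.
    sizes_equal = (x1, y1) == (x2, y2)
    # Corner coordinates are compared with an absolute tolerance.
    corners_close = (
        math.isclose(ul_x1, ul_x2, rel_tol=0.0, abs_tol=abs_tol)
        and math.isclose(ul_y1, ul_y2, rel_tol=0.0, abs_tol=abs_tol)
    )
    # Pixel resolutions are compared with a relative tolerance.
    resolutions_close = (
        math.isclose(x_res1, x_res2, rel_tol=rel_tol)
        and math.isclose(y_res1, y_res2, rel_tol=rel_tol)
    )
    return sizes_equal and corners_close and resolutions_close


base = (10, 10, 0.0, 10.0, 0.5, -0.5)
nudged = (10, 10, 0.00005, 10.0, 0.5, -0.5)   # corner differs by less than 0.0001
shifted = (10, 10, 0.001, 10.0, 0.5, -0.5)    # corner differs by more than 0.0001
```

With these inputs, dimensions_match(base, nudged) is True and dimensions_match(base, shifted) is False.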

is_within(outside_map)[source]¶

Checks if the object is within the provided Map object.

Note

Uses the extents of the raster file for checking location, ignoring any offsetting

Parameters:	outside_map (Map) – the Map object to check if this class is within
Returns:	true if this Map is entirely within the supplied Map
Return type:	bool

map_exists(file=None)[source]¶

Checks if the output (or provided file) exists.

If file is provided, self.file_name is set to file.

Parameters:	file – optionally, the file to check exists
Returns:	true if the output file does exist
Return type:	bool

open(file=None, band_no=1)[source]¶

Reads the raster file from memory into the data object. This allows direct access to the internal numpy array using the data object.

Parameters:	file (str) – path to file to open (or None to use self.file_name) band_no (int) – the band number to read from
Return type:	None

plot()[source]¶

Returns a matplotlib.pyplot.figure object containing an image of the fragmented landscape (with axes removed).

Requires that the fragmented landscape has been created already using create().

Returns:	figure object containing the fragmented landscape.
Return type:	matplotlib.pyplot.figure

rasterise(shape_file, raster_file=None, x_res=None, y_res=None, output_srs=None, geo_transform=None, field=None, burn_val=None, data_type=gdal.GDT_Float32, attribute_filter=None, x_buffer=None, y_buffer=None, extent=None, **kwargs)[source]¶

Rasterises the provided shape file to produce the output raster.

If x_res or y_res are not provided, self.x_res and self.y_res will be used.

If a field is provided, the value in that field will become the value in the raster.

If a geo_transform is provided, it overrides the x_res, y_res, x_buffer and y_buffer.

Parameters:	shape_file (str/os.path) – path to the .shp vector file to rasterise, or an ogr.DataSource object containing the shape file raster_file (str/os.path) – path to the output raster file (should not already exist) x_res (int/float) – the x resolution of the output raster y_res (int/float) – the y resolution of the output raster output_srs (str/osr.SpatialReference) – optionally define the output projection of the raster file geo_transform (list/tuple) – optionally define the geotransform of the raster file (cannot use resolution or buffer arguments with this option) field (str) – the field to set as raster values burn_val (list/int) – the r,g,b value to use if there is no field for the location data_type (int) – the gdal type for output data attribute_filter (str) – optionally provide a filter to extract features by, of the form “field=fieldval” x_buffer (int/float) – number of extra pixels to include at left and right sides y_buffer (int/float) – number of extra pixels to include at top and bottom extent (list) – list containing the new extent, provided as [ulx, lrx, uly, lry] (output from get_extent()) kwargs – additional options to provide to gdal.RasterizeLayer
Raises:	IOError – if the shape file does not exist IOError – if the output raster already exists ValueError – if the provided shape_file is not a .shp file RuntimeError – if gdal throws an error during rasterisation
Return type:	None

read_dimensions()[source]¶

Return a list containing the geospatial coordinate system for the file.

Returns:	a list containing [0] x, [1] y, [2] upper left x, [3] upper left y, [4] x resolution, [5] y resolution

reproject_raster(dest_projection=None, source_file=None, dest_file=None, x_scalar=1.0, y_scalar=1.0, resample_algorithm=gdal.GRA_NearestNeighbour, warp_memory_limit=0.0)[source]¶

Re-writes the file with a new projection.

Note

Writes to an in-memory file which then overwrites the original file, unless dest_file is not None.

Parameters:

dest_projection (str/os.path) – the destination file projection, can only be None if rescaling
source_file (str/os.path) – optionally provide a file name to reproject. Defaults to self.file_name
dest_file (str/os.path) – the destination file to output to (if None, overwrites original file)
x_scalar (float) – multiplier to change the x resolution by, defaults to 1
y_scalar (float) – multiplier to change the y resolution by, defaults to 1
resample_algorithm (gdal.GRA) – should be one of the gdal.GRA algorithms
warp_memory_limit (float) – optionally provide a memory cache limit (uses default if 0.0)

set_dimensions(file_name=None, x_size=None, y_size=None, x_offset=None, y_offset=None)[source]¶

Sets the dimensions and file for the Map object

Parameters:	file_name (str/pycoalescence.Map) – the location of the map object (a csv or tif file). If None, required that file_name is already provided. x_size (int) – the x dimension y_size (int) – the y dimension x_offset (int) – the x offset from the north-west corner y_offset (int) – the y offset from the north-west corner
Returns:	None

set_sample(is_sample)[source]¶

Set the is_sample attribute to true if this is a sample mask rather than an offset map

Parameters:	is_sample (bool) – indicates this is a sample mask rather than offset map

translate(dest_file, source_file=None, **kwargs)[source]¶

Translates the provided source file to the output file, given a set of options to pass to gdal.Translate()

Parameters:	dest_file (str) – the destination file to create source_file (str) – the source file to translate, or None to translate this file kwargs – additional keywords to pass to gdal.Translate()
Return type:	None

write(file=None, band_no=None)[source]¶

Writes the array in self.data to the output array. The output file must exist, and the array will be overridden in the band. Intended for writing changes to the same file the data was read from.

Parameters:	file – the path to the file to write to band_no – the band number to write into

Return type:	None

write_subset(array, x_off, y_off)[source]¶

Writes over a subset of the array to file. The size of the overwritten area is detected from the inputted array, and the offsets describe the location in the output map to overwrite.

The output map file must exist and be larger than the array.

Parameters:	array (numpy.ndarray) – the array to write out x_off (int) – the x offset to begin writing out from y_off (int) – the y offset to begin writing out from
Return type:	None

zero_offsets()[source]¶: Sets the x and y offsets to 0

shapefile_from_wkt(wkts, dest_file, EPSG=4326, fields=None)[source]¶

Generates a shape file from a WKT string.

Parameters:	wkts – a list of well-known text polygons to create in the shapefile dest_file – a destination file to create EPSG – the EPSG to use for the spatial referencing fields – list of dictionaries containing fields to add to the geometries
Return type:	None

merger module ¶

Combine simulation outputs from separate guilds. Detailed here.

Merger will output a single database file, merging the various biodiversity tables into one.

Metrics are also calculated for the entire system, with a guild reference of 0.

All standard routines provided in CoalescenceTree can then be performed on the combined database.

class Merger(database=None, logging_level=30, log_output=None, expected=False)[source]¶

Bases: pycoalescence.coalescence_tree.CoalescenceTree

Merges simulation outputs into a single database. Inherits from CoalescenceTree to provide all routines in the same object.

add_simulation(input_simulation)[source]¶

Adds a simulation to the list of merged simulations.

This also calls the relevant merges for the tables that exist in the provided database.

Parameters:	input_simulation – either the path to the input simulation, a Coalescence class object, or a CoalescenceTree object which contains the completed simulation.
Returns:	None
Return type:	None

add_simulations(simulation_list)[source]¶

A convenience function that adds each simulation from the list of simulations provided and then writes to the database.

Parameters:	simulation_list – list of paths to completed simulations

apply()[source]¶: Generates the coalescence tree for the set of speciation parameters. This must be run after the main coalescence simulations are complete. It will create additional fields and tables in the SQLite database which contain the requested data.

apply_incremental()[source]¶: Generates the coalescence tree for the set of speciation parameters. Does not write changes to the database, just holds the changes internally.

generate_guild_tables()[source]¶

Generates a set of tables containing the biodiversity metrics for each guild.

Return type:	None

get_added_simulations()[source]¶

Gets the simulations which have already been added to the database.

Returns:	dictionary of simulations and guild numbers
Return type:	dict

output()[source]¶: Outputs the coalescence trees to the same simulation database object.

set_database(filename, expected=False)[source]¶

Sets the output database for the merged simulations

Assumes no database currently exists, and will create one.

Raises:	IOError – if the output database already exists
Parameters:	filename – the filename to output merged simulations into expected – if true, expects the output to exist
Return type:	None

write()[source]¶

Writes out all stored simulation parameters to the output database and wipes the in-memory objects.

This should be called after all simulations have been added, or when RAM usage gets too large for large simulations.

patched_landscape module ¶

Generate landscapes of interconnected patches for simulating within a spatially explicit neutral model. Detailed here.

Dispersal probabilities are defined between different patches, and each patch will contain n individuals.

class Patch(id, density)[source]¶

Bases: object

Contains a single patch, to which the probability of dispersal to every other patch can be added.

add_patch(patch, probability)[source]¶

Adds dispersal from this patch to another patch object with a set probability. The patch should not already have been added.

Note

The probabilities can be relative, as they can be re-scaled to sum to 1 using re_scale_probabilities().

Raises:	KeyError – if the patch already exists in the dispersal probabilities. ValueError – if the dispersal probability is less than 0.
Parameters:	patch – the patch id to disperse to probability – the probability of dispersal

re_scale_probabilities()[source]¶

Re-scales the probabilities so that they sum to 1. Also checks to make sure dispersal from within this patch is defined.

Raises:	ValueError – if the self dispersal probability has not been defined, or the dispersal probabilities do not sum to > 0.
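
The re-scaling and its two error conditions can be sketched with a plain dictionary. This is a stand-alone illustration rather than the Patch method: the function name re_scale and the self_id argument are assumptions for the example.

```python
def re_scale(probabilities, self_id):
    """Return a copy of probabilities (patch id -> relative value) that sums to 1."""
    if self_id not in probabilities:
        raise ValueError("Self-dispersal probability has not been defined.")
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("Dispersal probabilities do not sum to more than 0.")
    return {patch: value / total for patch, value in probabilities.items()}


# Relative weights of 2:1:1 become probabilities of 0.5, 0.25 and 0.25.
scaled = re_scale({"a": 2.0, "b": 1.0, "c": 1.0}, self_id="a")
```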

class PatchedLandscape(output_fine_map, output_dispersal_map)[source]¶

Bases: object

Landscape made up of a list of patches with dispersal probabilities to each other.

add_dispersal(source_patch, target_patch, dispersal_probability)[source]¶

Adds a dispersal probability from the source patch to the target patch.

Note

Both the source and target patch should already have been added using add_patch().

Parameters:	source_patch – the id of the source patch target_patch – the id of the target patch dispersal_probability – the probability of dispersal from source to target

add_patch(id, density, self_dispersal=None, dispersal_probabilities=None)[source]¶

Add a patch with the given parameters.

Parameters:	id – the unique reference for the patch density – the number of individuals that exist in the patch self_dispersal – the relative probability of dispersal from within the same patch dispersal_probabilities – dictionary containing all other patches and their relative dispersal probabilities

generate_files()[source]¶

Re-scales the dispersal probabilities and generates the patches landscape files. These include the fine map file containing the densities and the dispersal probability map.

The fine map file will be dimensions 1xN where N is the number of patches in the landscape. The dispersal probability map will be dimensions NxN, where dispersal occurs from the y index cell to the x index cell.
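
The layout of the two outputs can be sketched with nested lists. This is an illustration only: the function name build_maps, the input dictionary layout and the ordering of patches by sorted id are assumptions, and the real method writes tif files rather than returning lists.

```python
def build_maps(patches):
    """Build a 1 x N density map and an N x N dispersal map.

    patches maps each patch id to (density, {target id: dispersal probability}).
    """
    ids = sorted(patches)
    # One row, with one cell per patch.
    fine_map = [[patches[patch_id][0] for patch_id in ids]]
    # Row (y index) is the source patch, column (x index) is the target patch.
    dispersal_map = [
        [patches[source][1].get(target, 0.0) for target in ids]
        for source in ids
    ]
    return fine_map, dispersal_map


# Two hypothetical patches with already re-scaled dispersal probabilities.
patches = {
    "a": (10, {"a": 0.75, "b": 0.25}),
    "b": (5, {"a": 0.5, "b": 0.5}),
}
fine_map, dispersal_map = build_maps(patches)
```

Here dispersal_map[0][1] is the probability of dispersal from patch "a" to patch "b".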

generate_fragment_csv(fragment_csv)[source]¶

Generates a fragment csv for usage within a coalescence simulation, with each patch becoming one fragment on the landscape.

Parameters:	fragment_csv – the path to the output csv to create
Raises:	IOError – if the output fragment csv already exists

generate_from_matrix(density_matrix, dispersal_matrix)[source]¶

Generates the patched landscape from the input matrix and writes out to the files.

Note

Uses a slightly inefficient method of generating the full patched landscape, and then writing back out to the map files so that full error-checking is included. A more efficient implementation is possible by simply writing the matrix to file using the Map class.

Note

The generated density map will have dimensions 1 by xy (where x, y are the dimensions of the original density matrix). However, the dispersal matrix should still be compatible with the original density matrix as an x by y tif file.

Parameters:	density_matrix – a numpy matrix containing the density probabilities dispersal_matrix – a numpy matrix containing the dispersal probabilities

has_patch(id)[source]¶

Checks if the patches object already contains a patch with the provided id.

Parameters:	id – id to check for in patches
Returns:	true if the patch already exists

convert_index_to_x_y(index, dim)[source]¶

Converts an index to an x, y coordinate.

Used when mapping from 1-D space to 2-D space.

Parameters:	index – the index to convert from dim – the x dimension of the matrix
Returns:	a tuple of integers containing the x and y coordinates
Return type:	tuple
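
A minimal sketch of the 1-D to 2-D mapping, assuming row-major ordering (the index increases along x first, then y). That ordering is an assumption rather than something stated above, and the name `index_to_x_y` is hypothetical.

```python
def index_to_x_y(index, dim):
    """Map a 1-D index onto (x, y), assuming row-major order with `dim` columns."""
    y, x = divmod(index, dim)
    return x, y


# With 3 columns, index 7 sits in row 2 (y) and column 1 (x).
coordinates = index_to_x_y(7, 3)
```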

spatial_algorithms file ¶

Simple spatial algorithms required for package functionality.

Algorithms include generation of Voronoi diagrams and spacing points on a landscape using Lloyd’s algorithm.

archimedes_spiral(centre_x, centre_y, radius, theta)[source]¶

Gets the x, y coordinates on a spiral, given a radius and theta.

Parameters:	centre_x (int) – the x coordinate of the centre of the spiral centre_y (int) – the y coordinate of the centre of the spiral radius (float) – the distance from the centre of the spiral theta (float) – the angle of rotation
Returns:	tuple of x and y coordinates
Return type:	tuple

calculate_centre_of_mass(points_list)[source]¶

Calculates the centre of mass for the non-intersecting polygon defined by points_list.

Note

the centre of mass will be incorrect for intersecting polygons.

Note

it is assumed that points_list defines, in order, the vertices of the polygon. The last point is assumed to connect to the first point.

Parameters:	points_list – a list of x, y points defining the non-intersecting polygon
Returns:	the x,y centre of mass
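
For reference, the centre of mass of a non-intersecting polygon can be computed with the standard shoelace-based centroid formula. The sketch below is an illustration of that formula, not necessarily the implementation used here; the name `polygon_centre_of_mass` is hypothetical.

```python
def polygon_centre_of_mass(points_list):
    """Centroid of a simple polygon whose vertices are listed in order."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(points_list)
    for i in range(n):
        x0, y0 = points_list[i]
        # The last vertex connects back to the first.
        x1, y1 = points_list[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0.0:
        raise ValueError("Polygon has zero area.")
    # The signed area is twice_area / 2, so the 1 / (6 * area) factor is 1 / (3 * twice_area).
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


centre = polygon_centre_of_mass([(0, 0), (4, 0), (4, 2), (0, 2)])
```

The signed area cancels between numerator and denominator, so the result is the same for clockwise and anticlockwise vertex orderings.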

calculate_distance_between(x1, y1, x2, y2)[source]¶

Calculates the distance between the points (x1, y1) and (x2, y2)

Note

Returns the absolute value

Parameters:	x1 – x coordinate of the first point y1 – y coordinate of the first point x2 – x coordinate of the second point y2 – y coordinate of the second point
Returns:	the absolute distance between the points

convert_coordinates(x, y, input_srs, output_srs)[source]¶

Converts the coordinates from the input srs to the output srs.

Parameters:	x – the x coordinate to transform y – the y coordinate to transform input_srs – the input srs to transform from output_srs – the output srs to transform to
Return type:	list
Returns:	transformed [x, y] coordinates

estimate_sigma_from_distance(distance, n)[source]¶

Estimates the sigma value from a Rayleigh distribution (2-d normal) from a total distance travelled in n steps.

Parameters:	distance (float) – the total distance travelled n (int) – the number of steps
Returns:	an estimation of the sigma value required to generate the distance travelled in n steps
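
One way to arrive at such an estimate is sketched below. It assumes that `distance` is the expected net displacement after n independent steps, each drawn from a 2-d normal with standard deviation sigma per axis. Under that assumption the net displacement is Rayleigh-distributed with scale sigma * sqrt(n) and mean sigma * sqrt(n * pi / 2), which can be inverted for sigma. Whether the function above uses exactly this estimator is not confirmed here, and the name `sigma_from_distance` is hypothetical.

```python
import math


def sigma_from_distance(distance, n):
    """Invert E[distance] = sigma * sqrt(n * pi / 2) for sigma (assumed model)."""
    if n <= 0:
        raise ValueError("n must be a positive number of steps.")
    return distance * math.sqrt(2.0 / (math.pi * n))


# Round trip: the expected displacement for sigma = 2 over 10 steps maps back to sigma.
expected_distance = 2.0 * math.sqrt(10 * math.pi / 2.0)
estimate = sigma_from_distance(expected_distance, 10)
```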

lloyds_algorithm(points_list, maxima, n=7)[source]¶

Equally spaces the points in the given landscape defined by (0, x_max), (0, y_max) using Lloyd’s algorithm.

Algorithm is:

1. Reflect the points at x=0, x=x_max, y=0 and y=y_max so that the boundaries of the Voronoi diagram on the original set of points have finite edges
2. Define the Voronoi diagram separating the points
3. Find the centres of the regions of the Voronoi diagram for our original set of points
4. Move our points to the centres of their Voronoi regions
5. Repeat n times (for convergence)

Edits the points_list to contain the equally-spaced points.

Note

all points are assumed to be in the range x in (0, x_max) and y in (0, y_max)

Parameters:	points_list – a list of points to be equally spaced in the landscape maxima – the maximum size of the landscape to space out within n – the number of iterations to perform Lloyd’s algorithm for.
Returns:	a list containing the new point centres.
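
The relaxation loop can be illustrated without a Voronoi library. The sketch below swaps the exact Voronoi construction (and the reflection step) for a discrete approximation: the landscape is sampled on a regular grid, each sample is assigned to its nearest point, and each point moves to the mean of its samples. It is a simplified stand-in for the method described above, and the name `lloyd_relaxation` and the `samples` parameter are hypothetical.

```python
def lloyd_relaxation(points_list, maxima, n=7, samples=40):
    """Approximate Lloyd's algorithm on (0, x_max) x (0, y_max) using grid samples."""
    x_max, y_max = maxima
    points = [tuple(point) for point in points_list]
    # Cell centres of a samples x samples grid covering the landscape.
    grid = [
        ((i + 0.5) * x_max / samples, (j + 0.5) * y_max / samples)
        for i in range(samples)
        for j in range(samples)
    ]
    for _ in range(n):
        # Per point: sum of x, sum of y and count of the samples nearest to it.
        sums = [[0.0, 0.0, 0] for _ in points]
        for gx, gy in grid:
            nearest = min(
                range(len(points)),
                key=lambda k: (points[k][0] - gx) ** 2 + (points[k][1] - gy) ** 2,
            )
            sums[nearest][0] += gx
            sums[nearest][1] += gy
            sums[nearest][2] += 1
        # Move each point to the centroid of its region; keep it if the region is empty.
        points = [
            (sx / count, sy / count) if count else point
            for (sx, sy, count), point in zip(sums, points)
        ]
    return points


spaced = lloyd_relaxation([(1.0, 1.0), (2.0, 1.0)], (10.0, 10.0))
```

Because the grid samples never leave the landscape, the moved points stay inside the bounds, which is the role the reflection step plays in the exact version.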

reflect_dimensions(points, maximums)[source]¶

Reflects the provided points across x=0, y=0, x=x_max and y=y_max (essentially tiling the polygon 4 times, around the original polygon).

Parameters:	points (list) – a list of 2-d points to reflect maximums (tuple) – tuple containing the x and y maximums
Returns:	a list of reflected points
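
A minimal sketch of the four reflections. It assumes the returned list holds only the mirrored copies and not the original points, which the description above does not specify; the name `reflect_points` is hypothetical.

```python
def reflect_points(points, maximums):
    """Mirror each point across x=0, x=x_max, y=0 and y=y_max."""
    x_max, y_max = maximums
    reflected = []
    for x, y in points:
        reflected.append((-x, y))  # across x = 0
        reflected.append((2 * x_max - x, y))  # across x = x_max
        reflected.append((x, -y))  # across y = 0
        reflected.append((x, 2 * y_max - y))  # across y = y_max
    return reflected


mirrored = reflect_points([(1, 2)], (10, 20))
```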

sqlite_connection file ¶

Safely open, close and fetch data from an sqlite connection.

SQLiteConnection contains context management for opening sql connections, plus basic functionality for detecting existence and structure of databases.

class SQLiteConnection(filename)[source]¶

Bases: object

Class containing context management for opening sqlite3 connections. The file name provided can either be a string containing the path to the file, or an sqlite3.Connection object, which will NOT be closed on destruction. This provides two points of entry to the system with the same interface.
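
The two points of entry can be illustrated with a small context manager built on the standard sqlite3 module. This is a sketch of the pattern only: the class name `ConnectionSketch` is hypothetical, and returning a cursor from `__enter__` is an assumption rather than documented behaviour.

```python
import sqlite3


class ConnectionSketch(object):
    """Open a connection from a path, or borrow an existing sqlite3.Connection."""

    def __init__(self, filename):
        self.filename = filename
        self.connection = None
        self.owns_connection = False

    def __enter__(self):
        if isinstance(self.filename, sqlite3.Connection):
            # Borrowed connection: the caller remains responsible for closing it.
            self.connection = self.filename
            self.owns_connection = False
        else:
            self.connection = sqlite3.connect(self.filename)
            self.owns_connection = True
        return self.connection.cursor()

    def __exit__(self, exc_type, exc_value, traceback):
        if self.owns_connection:
            self.connection.close()
        self.connection = None
        return False  # never suppress exceptions raised inside the with block


existing = sqlite3.connect(":memory:")
with ConnectionSketch(existing) as cursor:
    cursor.execute("CREATE TABLE t (a INTEGER)")
# The borrowed connection is still open after the with block.
existing.execute("INSERT INTO t VALUES (1)")
```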

check_sql_column_exists(database, table_name, column_name)[source]¶

Checks if the column exists in the database.

Parameters:	database (str/sqlite3.Connection) – the database to check existence in table_name (str) – the table name to check within column_name (str) – the column name to check for
Returns:	true if the column exists.
Return type:	bool

check_sql_table_exist(database, table_name)[source]¶

Checks that the supplied table exists in the supplied database.

Parameters:	database (str/sqlite3.Connection) – the database to check existence in table_name (str) – the table name to check for
Returns:	true if the table exists
Return type:	bool
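
Table and column checks of this kind can be written against SQLite's own metadata. The sketch below uses the `sqlite_master` table and `PRAGMA table_info`, both standard SQLite features. The helper names `table_exists` and `column_exists` are hypothetical, and unlike the functions above they take an open connection only.

```python
import sqlite3


def table_exists(connection, table_name):
    """True if a table with this name is listed in sqlite_master."""
    row = connection.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,),
    ).fetchone()
    return row is not None


def column_exists(connection, table_name, column_name):
    """True if the table exists and has a column with this name."""
    if not table_exists(connection, table_name):
        return False
    # Table names cannot be bound as parameters, so quote the (verified) name.
    quoted = '"{}"'.format(table_name.replace('"', '""'))
    columns = connection.execute("PRAGMA table_info({})".format(quoted)).fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return any(column[1] == column_name for column in columns)


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE SPECIES_LIST (id INTEGER, x INTEGER)")
```

Checking the table name against `sqlite_master` before interpolating it into a statement is one way to guard against SQL injection through table names.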

fetch_table_from_sql(database, table_name, column_names=False)[source]¶

Returns a list of the data contained by the provided table in the database.

Raises:	sqlite3.Error – if the table is not contained in the database (protects against SQL injection).
Parameters:	database (str/sqlite3.Connection) – the database to obtain from table_name (str) – the table name to fetch data from column_names (bool) – if true, return the column names as the first row in the output
Returns:	a list of lists, containing all data within the provided table in the database

get_table_names(database)[source]¶

Gets a list of all table names in the database.

Parameters:	database (str/sqlite3.Connection) – the path to the database connection or an already-open database object
Returns:	a list of all table names from the database
Return type:	list

sql_get_max_from_column(database, table_name, column_name)[source]¶

Returns the maximum value from the specified column.

Parameters:	database (str/sqlite3.Connection) – the database to fetch from table_name (str) – the table name to fetch from column_name (str) – the column name to obtain the maximum from
Returns:	the maximum value from the specified column

system_operations file ¶

Basic system-level operations required for package functionality, including subprocess calls, logging methods and file management.

The functions are contained here as they are required by many different modules. Note that logging will not raise an exception if there has been no call to set_logging_method().

cantor_pairing(x1, x2)[source]¶

Creates a unique integer from the two provided positive integers.

Maps NxN -> N, so only relevant for non-negative integers. For any A and B, generates C such that no D and E produce C unless D=A and E=B.

Assigns consecutive numbers to points along diagonals of a plane

Parameters:	x1 – the first number x2 – the second number
Returns:	a unique reference combining the two integers
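
The standard Cantor pairing function is shown below as an illustration; the name `cantor_pair` is hypothetical and the input check is an addition for the sketch.

```python
def cantor_pair(x1, x2):
    """Standard Cantor pairing of two non-negative integers."""
    if x1 < 0 or x2 < 0:
        raise ValueError("Cantor pairing is only defined for non-negative integers.")
    # Pairs on the same diagonal share x1 + x2; x2 gives the offset along it.
    return (x1 + x2) * (x1 + x2 + 1) // 2 + x2


paired = [cantor_pair(a, b) for a in range(20) for b in range(20)]
```

Every pair in the 20 by 20 grid above receives a distinct value, although the values are not contiguous: the largest, for (19, 19), is 760.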

check_file_exists(file_name)[source]¶

Checks that the specified filename exists, if it is not “null” or “none”.

Parameters:	file_name – file path to check for
Returns:	None
Raises:	IOError – if no file exists

check_parent(file_path)[source]¶

Checks if the parent directory exists, and creates it if it doesn’t.

Note

if file_path is a directory (ends with a “/”), it will be created

Parameters:	file_path – the file or directory to check if the parent exists
Return type:	None
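
A minimal sketch of the behaviour described, using only the standard library. The name `ensure_parent` is hypothetical, and treating the platform separator the same as “/” is an assumption.

```python
import os


def ensure_parent(file_path):
    """Create the parent directory of file_path if it does not already exist."""
    # A trailing separator marks file_path itself as a directory to create.
    if file_path.endswith(("/", os.sep)):
        directory = file_path
    else:
        directory = os.path.dirname(file_path)
    # An empty directory name means the current working directory, which exists.
    if directory and not os.path.exists(directory):
        os.makedirs(directory)
```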

create_logger(logger, file=None, logging_level=30, **kwargs)[source]¶

Creates a logger object to be assigned to NECSim sims and dispersal tests.

Parameters:	logger – the logger to alter file – the file to write out to, defaults to None, writing to terminal logging_level – the logging level to write out at (defaults to 30, i.e. logging.WARNING) kwargs – optionally provide additional arguments for logging to
Returns:

elegant_pairing(x1, x2)[source]¶

A more elegant version of cantor pairing, which allows for storing of a greater number of digits without experiencing integer overflow issues.

Cantor pairing assigns consecutive numbers to points along diagonals of a plane

Parameters:	x1 – the first number x2 – the second number
Returns:	a unique reference combining the two integers.
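
The description is consistent with Szudzik's "elegant" pairing function, which is sketched below on that assumption; the source does not name the function, and `szudzik_pair` is a hypothetical name.

```python
def szudzik_pair(x1, x2):
    """Szudzik's pairing of two non-negative integers (assumed variant)."""
    if x1 < 0 or x2 < 0:
        raise ValueError("Pairing is only defined for non-negative integers.")
    if x1 >= x2:
        return x1 * x1 + x1 + x2
    return x2 * x2 + x1


packed = [szudzik_pair(a, b) for a in range(20) for b in range(20)]
```

For inputs below 20 this function fills the range 0 to 399 exactly, whereas the standard Cantor pairing reaches 760 for (19, 19). That denser packing is what delays integer overflow.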

execute(cmd, silent=False, **kwargs)[source]¶

Calls the command using subprocess and yields the running output for printing to terminal. Any errors produced by subprocess call will be redirected to logging.warning() after the subprocess call is complete.

Parameters:	cmd – the command to execute using subprocess.Popen() silent – if true, does not log any warnings

Returns:	a line from the execution output
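
The pattern of yielding output while the process runs and reporting errors afterwards can be sketched as follows. This is an illustration built on subprocess.Popen, not the package's implementation. The name `stream_command` is hypothetical, and sending stderr to a temporary file (to avoid a full pipe blocking the child process) is a design choice of the sketch.

```python
import logging
import subprocess
import sys
import tempfile


def stream_command(cmd, silent=False, **kwargs):
    """Yield stdout lines from cmd, then log anything written to stderr."""
    with tempfile.TemporaryFile(mode="w+") as error_file:
        process = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=error_file,
            universal_newlines=True,
            **kwargs
        )
        for line in process.stdout:
            yield line.rstrip("\n")
        process.stdout.close()
        process.wait()
        # Errors are only reported once the subprocess has finished.
        error_file.seek(0)
        errors = error_file.read().strip()
    if errors and not silent:
        logging.warning(errors)


lines = list(stream_command([sys.executable, "-c", "print('a'); print('b')"]))
```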

execute_log_info(cmd, **kwargs)[source]¶

Calls execute() with the supplied command and keyword arguments, and redirects stdout to the logging object.

Parameters:	cmd – the command to execute using subprocess.Popen() kwargs – keyword arguments to be passed to subprocess.Popen()
Returns:	None
Return type:	None

execute_silent(cmd, **kwargs)[source]¶

Calls execute() silently with the supplied command and keyword arguments.

Note

If this function fails, no error will be thrown due to its silent nature, unless a full failure occurs.

Parameters:	cmd – the command to execute using subprocess.Popen() kwargs – keyword arguments to be passed to subprocess.Popen()
Returns:	None
Return type:	None

set_logging_method(logging_level=20, output=None, **kwargs)[source]¶

Initiates the logging method.

Parameters:	logging_level – the detail in logging output: can be one of logging.INFO (default), logging.WARNING, logging.DEBUG, logging.ERROR or logging.CRITICAL output – the output logfile (or None to redirect to terminal via stdout) kwargs – additional arguments to pass to the logging.basicConfig() call
Returns:	None

write_to_log(i, message, logger)[source]¶

Writes the message to the provided logger, at the provided level.

This is used by necsim to access the logging module more easily.

Parameters:	i (int) – the level to log at (10: debug, 20: info, 30: warning, 40: error, 50: critical) message (str) – the message to write to the logger. logger (logging.Logger) – the logger to write the message to
Return type:	None
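
The numeric levels listed above are the standard values of the Python logging module, so a level-based write can be sketched with `Logger.log()`. The names `log_at_level` and `ListHandler` are hypothetical, and the handler exists only to capture records for the demonstration.

```python
import logging


class ListHandler(logging.Handler):
    """Collect (level, message) pairs so that the output can be inspected."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.records = []

    def emit(self, record):
        self.records.append((record.levelno, record.getMessage()))


def log_at_level(i, message, logger):
    """Write message to logger at numeric level i (10, 20, 30, 40 or 50)."""
    logger.log(i, message)


logger = logging.getLogger("necsim_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

log_at_level(10, "hidden debug message", logger)  # below INFO, so dropped
log_at_level(30, "a warning", logger)
```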