necsim
A Neutral Ecology Coalescence Simulator
Introduction
necsim is a generic spatial coalescence simulator for neutral systems. It is intended to simulate an ecological community under individual-based neutral dynamics. It applies the model to maps for the supplied parameters and outputs information for each individual to a SQL database.
Functionality is also provided for applying varying speciation rates to outputs of necsim for analysis after simulations are complete. This enables the main simulation to be run with the minimum speciation rate required and afterwards analysis can be completed using different speciation rates. The same functionality is also provided within necsim for application of speciation rates immediately after simulations are complete.
Important
The recommended method of installing the program and running simulations is to use the pycoalescence module.
A Note on the Neutral Theory of Ecology
Neutral theory in ecology refers to the idea that individuals can be modelled as ecologically identical entities, undergoing dispersal, drift and speciation without niche effects or other competitive elements.
Whilst obviously not realistic, the patterns produced by such models can often give a surprisingly accurate portrayal of real-world systems. For more information on the topic, please see Hubbell (2001).
Instructions
Compiling the program
For compilation, there are several provided options:
- Compilation can be handled within pycoalescence by running python setup.py. This is the recommended option.
- Alternatively, compilation can be completed with additional options using make. The steps are outlined below.
  - You might need to first run autoconf from within the necsim directory to generate the configure executable.
  - Run ./configure (located within the necsim directory). Provide additional compilation flags if necessary (detailed below).
  - Run make all to create the executable.
  - [Optional] Move the executable (called necsim) to the build/Default directory in pycoalescence.
- If you require compilation outside of the pycoalescence module, make use of the file Makefile located in Makefiles/SimpleCompile. This can be modified and run using make to generate the executable.
Note
If you want to also compile the module for integrating with Python, provide the Python library (PYTHON_LIB) and Python include directories (PYTHON_INCLUDE) as command line flags to ./configure. You must then run make all.
This is not recommended, and it is advised you use the install procedure outlined here.
Warning
Compilation on high-performance clusters will likely require an icc compiler and custom linking to the required libraries.
See the Requirements section for a full list of the necessary prerequisites.
Requirements
- The SQLite library available here.
- The Boost library available here.
- C++ compiler (such as GNU g++) with C++14 support.
- Access to the relevant folders for Default simulations (see FAQS).
Recommended, but not essential:
- gdal library available here: provides reading of tif files.
- The fast-cpp-csv-parser by Ben Strasser, available here: provides much faster csv read and write capabilities. This is only really necessary if you have extremely large csv files as your map objects. If that is the case, I would highly recommend moving to using tif files, as they provide much more error-checking support within necsim, as well as much more advanced tools for dealing with spatial data.
Note
To use the fast-cpp-csv-parser, copy the whole folder (with csv.h in) to the necsim directory. Then reconfigure and recompile your code, making sure that you see ‘fast-cpp-csv-parser = enabled’ in the configuration output.
Compiler Options
Recognised compiler options include:
Option | Description |
---|---|
--with-debug | Adds additional debugging information to a log file in /logs |
--with-gdal=DIR | Define a gdal library at DIR |
--with-hpc | Compile ready for HPC, using Intel’s icpc compilation and a variety of optimisation flags. |
--with-boost=DIR | Define a boost library at DIR |
Additional C++ compilation flags can be specified by CPPFLAGS=opts for additional library paths or compilation options as required.
Note that gdal and fast-cpp-csv-parser availability will be automatically detected and included in the compilation if possible.
Running simulations
As of version 3.1, the routine relies on command line arguments (see below) for all the major simulation variables. Alternatively, supply a config .txt file using ./necsim -c /path/to/config.txt and the simulation variables will be parsed from the text file instead.
Command Line Arguments
Deprecated: This method of supplying simulation parameters is not recommended and is provided for backwards-compatibility only. Support will be dropped completely in a future release.
The following command line arguments are required. This list can be accessed by running ./necsim -h or ./necsim -help
As of version 3.6, the command line options to be specified are, in order:
1. the seed for the simulation.
2. the simulation task (for file reference).
3. the map config file.
4. the output directory.
5. the minimum speciation rate.
6. the dispersal z_fat value.
7. the dispersal L value.
8. the deme size.
9. the deme sample size.
10. the maximum simulation time (in seconds).
11. the lambda value for moving through non-habitat.
12. the temporal sampling file containing generation values for sampling points in time (null for only sampling the present).
13. the minimum number of species known to exist. (Currently has no effect).
14. (and onwards) speciation rates to apply after simulation.
In this format, the map config file and temporal sampling file are as described in Config Files.
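The positional order above can be sketched as a small Python helper that assembles the argument list. The function name, the parameter names and the example values are hypothetical; only the ordering of the arguments is taken from the list above.

```python
def build_necsim_args(seed, task, map_config, output_dir, min_spec_rate,
                      z_fat, dispersal_l, deme, sample_size, max_time,
                      lambda_value, time_config="null", min_species=1,
                      spec_rates=()):
    """Return the condensed command line as a list, in the documented order."""
    positional = [seed, task, map_config, output_dir, min_spec_rate, z_fat,
                  dispersal_l, deme, sample_size, max_time, lambda_value,
                  time_config, min_species]
    # Argument 14 and onwards: speciation rates applied after the simulation.
    positional.extend(spec_rates)
    return ["./necsim"] + [str(value) for value in positional]


# Hypothetical example values, for illustration only.
cmd = build_necsim_args(seed=1, task=1, map_config="map.txt",
                        output_dir="output", min_spec_rate=0.001,
                        z_fat=2.0, dispersal_l=4.0, deme=1, sample_size=0.1,
                        max_time=3600, lambda_value=1.0,
                        spec_rates=[0.01, 0.1])
print(" ".join(cmd))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the list is later passed to subprocess.run.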
Alternatively, by specifying the -f flag (full mode) as the first argument, the program can read in pre-3.6 command line arguments, which are as follows.
1. the task_iter used for setting the seed.
2. the sample grid x dimension
3. the sample grid y dimension
4. the fine map file relative path.
5. the fine map x dimension
6. the fine map y dimension
7. the fine map x offset
8. the fine map y offset
9. the coarse map file relative path.
10. the coarse map x dimension
11. the coarse map y dimension
12. the coarse map x offset
13. the coarse map y offset
14. the scale of the coarse map compared to the fine (10 means resolution of coarse map = 10 x resolution of fine map)
15. the output directory
16. the speciation rate.
17. the dispersal sigma value.
18. the deme size
19. the deme sample size (as a proportion of deme size)
20. the time to run the simulation (in seconds).
21. lambda - the relative cost of moving through non-forest
22. the_task - for referencing the specific task later on.
23. the minimum number of species the system is known to contain.
24. the historical fine map file to use
25. the historical coarse map file to use
26. the rate of forest change from historical
27. the time (in generations) since the historical forest was seen.
28. the dispersal tau value (the width of the fat-tailed kernel).
29. the sample mask, with binary 1:0 values for areas that we want to sample from. If this is not provided then this will default to mapping the whole area.
30. the path to the file listing every generation at which the list should be expanded. This should be in the format of a list.
31. (and onwards) - speciation rates to apply after the simulation is complete.
Warning
This method of running simulations is provided for legacy purposes only, and is no longer recommended. For increased functionality, use the condensed command-line format, or switch to using config files.
Config Files
A configuration file can be used for setting simulation parameters. The options are grouped as follows:
- Main config file: Contains the main simulation parameters, including dispersal parameters, speciation rates, sampling information and file referencing information. It also includes the paths to the other config files, which must be specified if the main simulation config is used.
- Map config options: Contains the map parameters, including paths to the relevant map files, map dimensions, offsets and scaling. This option cannot be null (map dimensions at least must be specified).
- Time config options: Contains the temporal sampling points, in generations. If this is ‘null’, sampling will automatically occur only at the present (generation time=0).
When running the simulation using config files, the path to the main simulation config file should be specified, e.g. ./necsim -c /path/to/main/config.txt.
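For scripted runs, the same invocation can be wrapped from Python. This is a minimal sketch using the standard subprocess module; the helper names are hypothetical, and it assumes the necsim executable sits in the current working directory.

```python
import subprocess


def necsim_command(config_path, executable="./necsim"):
    """Build the argument list for a config-file run."""
    return [executable, "-c", str(config_path)]


def run_necsim(config_path, executable="./necsim"):
    """Run the simulation; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(necsim_command(config_path, executable), check=True)


print(necsim_command("/path/to/main/config.txt"))
```

Separating the command construction from the call makes it possible to log or inspect the command without launching a simulation.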
Main Config File
The configuration containing the majority of the simulation set up, outside of map dimensions. This file can be automatically generated by create_config() in pycoalescence. An example of this configuration is given below:
[main]
seed = 6
task = 6
output_directory = output
min_spec_rate = 0.5
sigma = 4
tau = 4
deme = 1
sample_size = 0.1
max_time = 1
lambda = 1
min_species = 1
[spec_rates]
spec_rate1 = 0.6
spec_rate2 = 0.8
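Where pycoalescence is not available, a file with the same layout can be produced with Python's standard configparser module. This is a sketch only: the helper name is hypothetical, and it reproduces the section and key names shown in the example above rather than any validation that create_config() may perform.

```python
import configparser
import io


def main_config_text(main_options, spec_rates):
    """Render a [main] and [spec_rates] config in the layout shown above."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve the case of keys as given
    parser["main"] = {key: str(value) for key, value in main_options.items()}
    parser["spec_rates"] = {
        f"spec_rate{index}": str(rate)
        for index, rate in enumerate(spec_rates, start=1)
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


text = main_config_text(
    {"seed": 6, "task": 6, "output_directory": "output",
     "min_spec_rate": 0.5, "sigma": 4, "tau": 4, "deme": 1,
     "sample_size": 0.1, "max_time": 1, "lambda": 1, "min_species": 1},
    spec_rates=[0.6, 0.8],
)
print(text)
```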
Map Config Options
The map options contain the information for setting up all maps required by the simulation. This involves maps at all times and at all scales. An example is given below.
[sample_grid]
path = null
x = 13
y = 13
mask = null
[fine_map]
path = sample/SA_sample_fine.tif
x = 13
y = 13
x_off = 0
y_off = 0
[coarse_map]
path = sample/SA_sample_coarse.tif
x = 35
y = 41
x_off = 11
y_off = 14
scale = 1.0
[historical_fine0]
path = sample/SA_sample_fine_historical1.tif
number = 0
time = 1
rate = 0.5
[historical_coarse0]
path = sample/SA_sample_coarse_historical1.tif
number = 0
time = 1
rate = 0.5
[historical_fine1]
path = sample/SA_sample_fine_historical2.tif
number = 1
time = 4
rate = 0.7
[historical_coarse1]
path = sample/SA_sample_coarse_historical2.tif
number = 1
time = 4
rate = 0.7
Note
The rates and times between the pairs of historical fine maps and historical coarse maps must match up. Without matching values here, there could be undetermined errors, or coarse map values being ignored.
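Because mismatched pairs may not be reported by the simulation itself, it can be worth checking a config before running. The following sketch uses Python's configparser to compare the time and rate of each historical_fineN section with its historical_coarseN partner; the function name is hypothetical and the check is not part of necsim.

```python
import configparser


def mismatched_historical_pairs(config_text):
    """Return (suffix, key) for each fine/coarse pair whose values differ."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    problems = []
    for section in parser.sections():
        if not section.startswith("historical_fine"):
            continue
        suffix = section[len("historical_fine"):]
        coarse = "historical_coarse" + suffix
        if coarse not in parser:
            continue  # no coarse partner to compare against
        for key in ("time", "rate"):
            if parser[section].getfloat(key) != parser[coarse].getfloat(key):
                problems.append((suffix, key))
    return problems


example = """
[historical_fine0]
time = 1
rate = 0.5
[historical_coarse0]
time = 1
rate = 0.7
"""
print(mismatched_historical_pairs(example))
```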
Note
Historical maps assume the same dimensions as their respective present-day equivalents.
Note
In older versions of this program these options were contained in a separate file. However, as of 1.2.4 all simulation options are contained in the same file.
Time Config Options
The temporal sampling options are specified as follows.
[main]
time0 = 0.0
time1 = 1.0
Note
For each speciation rate, all biodiversity measures (such as species’ abundances and species’ richness) will be calculated for each time supplied separately.
Default parameters
To run the program with the default parameters for testing purposes, run with the command line arguments -d or -dl (for the larger default run). Note that this will require access to the Default/ folder relative to the path of the program for storing the outputs to the default runs.
Outputs
Upon successful completion of a simulation, the outputs and parameters are stored in an SQLite database file in the output directory. This database contains all important simulation data over several tables, which can be accessed using a program like DB Browser for SQLite or Microsoft Access. Alternatively, most programming languages have an SQLite interface (for example RSQLite for R, or sqlite3 for Python).
The tables in the SQLite database are
SIMULATION_PARAMETERS
contains the parameters the simulation was performed with for referencing later.
SPECIES_LIST
contains the locations of every coalescence event. This is used by SpeciationCounter to reconstruct the coalescence tree for application of speciation rates after simulations are complete.
SPECIES_ABUNDANCES
contains the species abundance distributions for each speciation rate and time point that has been specified.
SPECIES_LOCATIONS [optional]
contains the x, y coordinates of every individual at each time point and for every specified speciation rate, along with species ID numbers.
FRAGMENT_ABUNDANCES [optional]
contains the species abundance distributions for each habitat fragment, either specified by the fragment csv file, or detected from squares across the map.
Additional information can be found in SpeciationCounter regarding the optional database tables.
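As a starting point for exploring an output file, the tables and their columns can be listed with Python's built-in sqlite3 module. The sketch below makes no assumptions about column names; it asks the database for them. The function name is hypothetical, and the path should point at an existing database file produced by a simulation.

```python
import sqlite3


def describe_database(path):
    """Map each table name in the SQLite file to its list of column names."""
    connection = sqlite3.connect(path)
    try:
        tables = [row[0] for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        return {
            table: [column[1] for column in
                    connection.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        connection.close()
```

Calling describe_database on a simulation output should list the tables named above, with SPECIES_LOCATIONS and FRAGMENT_ABUNDANCES present only if they were requested.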
SpeciationCounter
SpeciationCounter provides a method for applying additional speciation rates to outputs from necsim, without having to re-run the entire simulation. SpeciationCounter works by reconstructing the coalescence tree, checking at each point whether speciation has occurred under the additional speciation rate. As such, SpeciationCounter can only apply speciation rates higher than the initial speciation rate the program was run with.
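The restriction to higher rates can be illustrated with a toy model. The sketch below assumes each branch of the tree stores a uniform random draw and its length in generations, and that a branch speciates when the draw falls below 1 - (1 - rate)^generations. This is an illustrative assumption, not a description of necsim's internal data layout, and the branch values are made up.

```python
def has_speciated(random_draw, speciation_rate, generations):
    """True if a branch of the given length speciates at this rate."""
    return random_draw < 1.0 - (1.0 - speciation_rate) ** generations


# Hypothetical branches: (stored random draw, length in generations).
branches = [(0.05, 3), (0.40, 10), (0.90, 2), (0.70, 1)]


def speciated_branches(rate):
    """Indices of the branches that speciate under the given rate."""
    return {index for index, (draw, generations) in enumerate(branches)
            if has_speciated(draw, rate, generations)}


for rate in (0.1, 0.5, 0.8):
    print(rate, sorted(speciated_branches(rate)))
```

Because the threshold grows with the rate, every branch that speciates at a low rate also speciates at any higher rate. A tree simulated at the minimum rate therefore already contains every lineage needed for higher rates, whereas a lower rate would require lineages that were never simulated.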
Applying Speciation Rates
The command-line interface for applying speciation rates post-simulation has been removed; instead, use CoalescenceTree in the pycoalescence module.
Debugging
Most errors will return an error code in the form “ERROR_NAME_XXX: Description”, a list of which can be found in ERROR_REF.txt.
Brief Class Descriptions
A brief description of the important classes is given below.
The Tree class
- The most important class!
- Contains the main setup, run and data output routines.
- setup() imports the data files from csv (if necessary) and creates the in-memory objects for the storing of the coalescence tree and the spatial grid of active lineages. Setup time mostly depends on the size of the csv file being imported.
- runSimulation() continually loops over successive coalescence, move or speciation events until all individuals have speciated or coalesced. This is where the majority of the simulation time will be spent, and it is mostly dependent on the number of individuals, speciation rate and size of the spatial grid.
- At the end of the simulation, the applyMultipleRates() routine will generate the in-memory SQLite database for storing the coalescent tree. It can run multiple times if multiple speciation rates are required.
- outputData() is called internally to output the SQLite database to file.
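The event loop described for runSimulation() can be illustrated with a toy, non-spatial version in Python. This is a simplified sketch and not necsim's algorithm: it assumes a single well-mixed community with no landscape or dispersal kernel, and the function and parameter names are hypothetical.

```python
import random


def toy_coalescence(n_lineages, community_size, speciation_rate, seed=0):
    """Trace lineages backwards until each has speciated or coalesced.

    Requires n_lineages <= community_size and speciation_rate > 0.
    Returns (number of speciation events, number of coalescence events).
    """
    rng = random.Random(seed)
    # Place each sampled lineage in a distinct slot of the community.
    slots = rng.sample(range(community_size), n_lineages)
    active = {slot: lineage for lineage, slot in enumerate(slots)}
    n_speciations = 0
    n_coalescences = 0
    while active:
        slot = rng.choice(list(active))
        lineage = active.pop(slot)
        if rng.random() < speciation_rate:
            n_speciations += 1  # speciation: the lineage founds a species
            continue
        parent_slot = rng.randrange(community_size)
        if parent_slot in active:
            n_coalescences += 1  # coalescence: merge with the lineage there
        else:
            active[parent_slot] = lineage  # move: lineage stays active
    return n_speciations, n_coalescences


species, merges = toy_coalescence(50, 100, 0.1, seed=1)
print(species, merges)
```

Each lineage ends in exactly one speciation or coalescence event, so the two counts sum to the number of sampled lineages, and the number of speciation events is the species richness of the sample.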
The TreeNode class
- Contains a single record of a node on the phylogenetic tree, to be used in reassembling the tree structure at the end of the simulation.
The DataPoint class
- Contains a single record of the location of a lineage.
The NRrand class
- Contains the random number generator, as written by James Rosindell (j.rosindell@imperial.ac.uk).
The Landscape class
- Contains the routines for importing and calling values from the Map objects.
- The getVal() and runDispersal() functions can be modified to produce altered dispersal behaviour.
The DispersalCoordinator class
- Used to generate dispersal distances from a particular dispersal kernel. The dispersal kernel can also take the form of a dispersal map, which defines probabilities of dispersal between every cell on the landscape.
The Matrix and Row classes
- Based on code written by James Rosindell (j.rosindell@imperial.ac.uk).
- Handles indexing of the 2D object plus importing values from a csv or tif file.
The Map class
- Derived from Matrix with extra functionality for handling tif file parameter reads and calculation of spatial data.
The DataMask and Samplematrix classes
- Special maps that define the location of sampling across a landscape. Contain extra functionality for defining “null” maps (sampling everywhere).
The SpeciesList class
- Contains the list of individuals, for application in a matrix, to essentially create a 3D array.
- Handles the positioning of individuals in space within a grid cell.
The ConfigOption class
- Contains basic functions for importing command line arguments from a config file, providing an alternative way of setting up simulations.
The Community class
- Provides the routines for applying different speciation rates to a phylogenetic tree, to be used either immediately after simulation within necsim, or at a later time using SpeciationCounter.
- Used to generate a community of individuals for a particular set of parameters, providing options for generating species identities, species abundance distributions and species locations.
Known Bugs
- Simulations run until completion, rather than aiming for a desired number of species. This is an intentional change. Functions related to this functionality remain but are deprecated.
- In SpeciationCounter, only contiguous rectangular fragments are properly calculated. Other shapes must be calculated by post-processing.
- In SpeciationCounter, 3 fragments instead of 2 will be calculated for certain adjacent rectangular patches.
FAQS (WIP)
- Why doesn’t the default simulation output anything?
  - Check that the program has access to the folders relative to the program at Default/
- Why can’t I compile the program?
  - This could be due to a number of reasons, most likely that you haven’t compiled with access to the sqlite3 or boost packages. Installation and compilation differ across systems; for most UNIX systems, compiling with the linker arguments -lsqlite3, -lboost_filesystem and -lboost_system will solve problems with the compiler not finding the sqlite or boost header file.
  - Another option could be the potential lack of access to the fast-cpp-csv-parser by Ben Strasser, available here. If use_csv has been defined at the head of the file, try without use_csv or download the csv parser and locate the folder within your working directory at compilation.
- Every time the program runs I get error code XXX.
  - Check the ERROR_REF.txt file for descriptions of the error codes. Try compiling with the DEBUG preprocessor definition to gain more information on the problem. It is most likely a problem with the set up of the map data (error checking is not yet properly implemented here).
Version
Version 1.2.7.post13
Contacts
Author: Samuel Thompson
Contact: samuelthompson14@imperial.ac.uk - thompsonsed@gmail.com
Institution: Imperial College London and National University of Singapore
Based heavily on code by James Rosindell
Contact: j.rosindell@imperial.ac.uk
Institution: Imperial College London
Licence
This project is released under the MIT licence. See the file LICENSE.txt or go to here for full licence details.
You are free to modify and distribute the code for any non-commercial purpose.
Class Hierarchy
- Namespace necsim
- Struct CommunitiesArray
- Struct CommunityParameters
- Struct ConfigException
- Struct FatalException
- Struct Fragment
- Struct HistoricalMapParameters
- Struct MapLocation
- Struct MetacommunitiesArray
- Struct MetacommunityParameters
- Struct ProtractedSpeciationParameters
- Struct SectionOption
- Struct SimParameters
- Struct SQLStatement
- Class ActivityMap
- Class AnalyticalSpeciesAbundancesHandler
- Class Cell
- Class Community
- Class ConfigParser
- Class DataMask
- Class DataPoint
- Class DispersalCoordinator
- Class GillespieHeapNode
- Class GillespieProbability
- Class Landscape
- Class LogFile
- Class Logger
- Template Class Map
- Template Class Matrix
- Class Metacommunity
- Class ProtractedSpatialTree
- Class ProtractedTree
- Class PyLogger
- Class Samplematrix
- Class SimulateDispersal
- Class SimulatedSpeciesAbundancesHandler
- Class SpatialTree
- Class SpeciationCommands
- Class SpeciesAbundancesHandler
- Class SpeciesList
- Class SpecSimParameters
- Class SQLiteHandler
- Class Step
- Class Tree
- Class TreeNode
- Enum CellEventType
- Enum EventType
- Namespace random_numbers
- Class RNGController
- Class SplitMix64
- Class Xoroshiro256plus
- Struct module_state
- Class LandscapeMetricsCalculator
- Class ProtractedTree
- Template Class PyCommunityTemplate
- Class PyLMC
- Class PySimulateDispersal
- Template Class PyTemplate
File Hierarchy
- Directory necsim
- Directory eastl
- File heap.h
- File ActivityMap.cpp
- File ActivityMap.h
- File AnalyticalSpeciesAbundancesHandler.cpp
- File AnalyticalSpeciesAbundancesHandler.h
- File Cell.cpp
- File Cell.h
- File Community.cpp
- File Community.h
- File ConfigParser.cpp
- File ConfigParser.h
- File cpl_custom_handler.cpp
- File cpl_custom_handler.h
- File custom_exceptions.h
- File DataMask.cpp
- File DataMask.h
- File DataPoint.cpp
- File DataPoint.h
- File DispersalCoordinator.cpp
- File DispersalCoordinator.h
- File double_comparison.cpp
- File double_comparison.h
- File file_system.cpp
- File file_system.h
- File GillespieCalculator.cpp
- File GillespieCalculator.h
- File Landscape.cpp
- File Landscape.h
- File LicenseHeader.h
- File LogFile.cpp
- File LogFile.h
- File Logger.cpp
- File Logger.h
- File Logging.cpp
- File Logging.h
- File main.cpp
- File Map.h
- File MapLocation.cpp
- File MapLocation.h
- File Matrix.h
- File Metacommunity.cpp
- File Metacommunity.h
- File neutral_analytical.cpp
- File neutral_analytical.h
- File parameters.cpp
- File parameters.h
- File ProtractedSpatialTree.h
- File ProtractedTree.cpp
- File ProtractedTree.h
- File README.md
- File RNGController.h
- File setup.cpp
- File setup.h
- File SimParameters.h
- File SimulateDispersal.cpp
- File SimulateDispersal.h
- File SimulatedSpeciesAbundancesHandler.cpp
- File SimulatedSpeciesAbundancesHandler.h
- File SimulationTemplates.h
- File SpatialTree.cpp
- File SpatialTree.h
- File SpeciationCommands.cpp
- File SpeciationCommands.h
- File SpeciesAbundancesHandler.cpp
- File SpeciesAbundancesHandler.h
- File SpeciesList.cpp
- File SpeciesList.h
- File SpecSimParameters.h
- File SQLiteHandler.cpp
- File SQLiteHandler.h
- File Step.h
- File Tree.cpp
- File Tree.h
- File TreeNode.cpp
- File TreeNode.h
- File Xoroshiro256plus.h
- File CCommunity.h
- File CLandscapeMetricsCalculator.h
- File CSimulateDispersal.h
- File CSimulation.h
- File LandscapeMetricsCalculator.cpp
- File LandscapeMetricsCalculator.h
- File necsim.cpp
- File necsim.h
- File PyImports.cpp
- File PyImports.h
- File PyLogger.cpp
- File PyLogger.h
- File PyLogging.cpp
- File PyLogging.h
- File PyTemplates.h