Help for package PNC

Type:

Package

Title:

Phylogenetic Niche Conservatism Analysis for Ecological Communities

Version:

0.1.0

Date:

2025-11-05

Maintainer:

Yan He <heyan@njfu.edu.cn>

Description:

Provides functions for testing phylogenetic niche conservatism, a key prerequisite in community assembly studies. The package integrates global functional trait data across major taxonomic groups and implements methods such as Pagel's Lambda and Blomberg's K to quantify phylogenetic signals in ecological communities. Methods are described in Münkemüller et al. (2012) <doi:10.1111/j.2041-210X.2012.00196.x>.

License:

GPL-3

Encoding:

UTF-8

LazyData:

true

Depends:

R (≥ 3.5.0)

Imports:

ape, phytools, stats, utils, geiger

RoxygenNote:

7.3.2

Suggests:

testthat (≥ 3.0.0)

Config/testthat/edition:

NeedsCompilation:

Packaged:

2025-11-05 01:15:58 UTC; Administrator

Author:

Yan He [aut, cre], Yu Xia [aut], Rui Yang [aut], Lingfeng Mao [aut]

Repository:

CRAN

Date/Publication:

2025-11-07 13:40:13 UTC

AVONET Bird Morphological Dataset

Description

Comprehensive morphological dataset for bird species, including taxonomic information from BirdLife International and detailed morphological measurements.

Usage

AVONET

Format

A data frame with 11,009 rows and 14 columns, where each row represents a bird species:

species: Species scientific name
genus: Genus name
family: Family name, according to BirdLife International taxonomy
Beak.Length_Culmen: Length from beak tip to skull base, in millimeters
Beak.Length_Nares: Length from nostril anterior edge to beak tip, in millimeters
Beak.Width: Beak width at the anterior edge of nostrils, in millimeters
Beak.Depth: Beak depth at the anterior edge of nostrils, in millimeters
Tarsus.Length: Tarsus length from posterior notch between tibia and tarsus to the last scale end, in millimeters
Wing.Length: Length from carpal joint to longest primary feather tip, in millimeters
Kipps.Distance: Length from first secondary feather tip to longest primary feather tip, in millimeters
Secondary1: Length from carpal joint to first secondary feather tip, in millimeters
Hand-Wing.Index: 100*DK/Lw, where DK is Kipp's distance and Lw is wing length
Tail.Length: Distance from longest rectrix tip to point where central rectrices protrude from skin, in millimeters
Mass: Species average body mass, including both male and female, in grams
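
As a quick check of the Hand-Wing.Index definition above, the following sketch recomputes the index from Kipp's distance and wing length. The two measurement pairs are made-up illustrative values, not records taken from AVONET.

```r
# Illustrative (made-up) measurements in millimeters
wing_length    <- c(60, 250)   # Lw: carpal joint to longest primary tip
kipps_distance <- c(12, 100)   # DK: first secondary tip to longest primary tip

# Hand-Wing Index as defined in the Format list: 100 * DK / Lw
hand_wing_index <- 100 * kipps_distance / wing_length
hand_wing_index
```

Larger values correspond to more elongated, pointed wings relative to wing length.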

Details

This dataset provides morphological measurements of birds, covering beak, wing, tarsus, tail, and body mass. The data originate from a study of bird morphological, ecological, and geographical characteristics (Tobias et al., 2022).

Note

Taxonomic information is based on BirdLife International
Measurements represent species averages
Hand-Wing Index reflects flight capability and ecological adaptation

References

Tobias, J. A., Sheard, C., Pigot, A. L., Devenish, A. J. M., Yang, J., Sayol, F., Neate-Clegg, M. H. C., Alioravainen, N., Weeks, T. L., Barber, R. A., Walkden, P. A., MacGregor, H. E. A., Jones, S. E. I., Vincent, C., Phillips, A. G., Marples, N. M., Montaño-Centellas, F. A., Leandro-Silva, V., Claramunt, S., Darski, B., et al. (2022). AVONET: morphological, ecological and geographical data for all birds. Ecology Letters, 25(3), 581-597. doi:10.1111/ele.13898

Examples

data(AVONET)
head(AVONET)

AmphiBIO: Global Amphibian Ecological Traits Database

Description

A comprehensive global database of ecological traits for amphibian species, compiled to provide insights into the life history and ecological characteristics of amphibians worldwide.

Usage

AmphiBIO

Format

A data frame with multiple variables:

species: Scientific name of the amphibian species
genus: Taxonomic genus of the species
family: Taxonomic family of the species
Body_mass_g: Maximum adult body mass.
Age_at_maturity_min_y: Minimum age at maturation or sexual maturity.
Age_at_maturity_max_y: Maximum age at maturation or sexual maturity.
Body_size_mm: Maximum adult body size. In Anura, body size is reported as snout to vent length. In Gymnophiona and Caudata, body size is reported as total length.
Size_at_maturity_min_mm: Minimum size at maturation or sexual maturity.
Size_at_maturity_max_mm: Maximum size at maturation or sexual maturity.
Longevity_max_y: Maximum life span.
Litter_size_min_n: Minimum no. of offspring or eggs per clutch.
Litter_size_max_n: Maximum no. of offspring or eggs per clutch.
Reproductive_output_y: Maximum no. reproduction events per year.
Offspring_size_min_mm: Minimum offspring or egg size.
Offspring_size_max_mm: Maximum offspring or egg size.

References

Oliveira, B. F., São-Pedro, V. A., Santos-Barrera, G., Penone, C., & Costa, G. C. (2017). AmphiBIO, a global database for amphibian ecological traits. Scientific data, 4(1), 1-7. doi:10.1038/sdata.2017.123

Examples

# Load the dataset
data(AmphiBIO)
head(AmphiBIO)

Barro Colorado Island (BCI) dataset

Description

The Barro Colorado Island (BCI) dataset contains comprehensive ecological data from the 50-hectare forest dynamics plot on Barro Colorado Island, Panama. This dataset includes phylogenetic information and community composition data for tropical forest species.

Usage

BCI

Format

A list containing four main components:

splist: A data frame with species information including species names, genus, and family classifications.
phy_species: A phylogenetic tree representing species-level evolutionary relationships, rooted and including branch lengths.
phy_genus: A phylogenetic tree with 183 tips and 174 internal nodes, rooted and including branch lengths.
com: A community matrix showing species abundance across different sampling plots, with species counts for each location.

Source

Barro Colorado Island (BCI)

References

Condit, R., Pérez, R., Aguilar, S., Lao, S., Foster, R., & Hubbell, S. P. (2019). Complete data from the Barro Colorado 50-ha plot: 423617 trees, 35 years, 2019 version. Dryad Digital Repository. doi:10.15146/5xcp-0d46

Examples

# Load the dataset
data(BCI)
head(BCI)

COMBINE: Mammal Trait Database

Description

A comprehensive dataset of mammalian traits compiled from multiple sources, providing detailed ecological and biological information for various mammal species.

Usage

COMBINE

Format

A data frame with the following columns:

species: Species name
genus: Genus name
family: Taxonomic family
adult_mass_g: Body mass of an adult individual in grams
adult_brain_mass_g: Weight of the brain of an adult individual in grams
adult_body_length_mm: Total length from tip of the nose to anus or base of the tail of an adult individual in millimeters
adult_forearm_length_mm: Total length from elbow to wrist of an adult individual in millimeters, specific to order Chiroptera
max_longevity_d: Maximum reported age at death for the species in days
maturity_d: The amount of time needed to reach sexual maturity in days
female_maturity_d: The amount of time needed for a female to reach sexual maturity in days
male_maturity_d: The amount of time needed for a male to reach sexual maturity in days
age_first_reproduction_d: Age at which females give birth to their first litter or their young attach to teats in days
gestation_length_d: Length of time of fetal growth in days
teat_number_n: Total number of teats present in an individual of the species
litter_size_n: Number of offspring born per litter per female
litters_per_year_n: Number of litters per female per year
interbirth_interval_d: Time between reproduction events in days
neonate_mass_g: Weight of an individual at birth in grams
weaning_age_d: Age at which primary nutritional dependency on the mother ends and independent foraging begins in days
weaning_mass_g: Weight at weaning in grams
generation_length_d: Average age of parents of the current cohort in days
dispersal_km: The distance an animal travels between its place of birth to the place where it reproduces in kilometers
density_n_km2: Number of individuals of the species per square kilometer
home_range_km2: Size of the area within which everyday activities of individuals or groups of individuals are typically restricted in km2
social_group_n: Number of individuals in a group that spends most of their daily time together
dphy_invertebrate: Percentage of the diet composed of invertebrates
dphy_vertebrate: Percentage of the diet composed of vertebrates
dphy_plant: Percentage of the diet composed of plants and/or fungi
det_inv: Percentage of the diet composed of invertebrates
det_vend: Percentage of the diet composed of mammals, birds
det_vect: Percentage of the diet composed of reptiles, snakes, amphibians, salamanders
det_vfish: Percentage of the diet composed of fish
det_vunk: Percentage of the diet composed of vertebrates – general or unknown
det_scav: Percentage of the diet composed of scavenge, garbage, offal, carcasses, trawlers, carrion
det_fruit: Percentage of the diet composed of fruit, drupes
det_nect: Percentage of the diet composed of nectar, pollen, plant exudates, gums
det_seed: Percentage of the diet composed of seed, maize, nuts, spores, wheat, grains
det_plantother: Percentage of the diet composed of other plant elements
det_diet_breadth_n: Number of prevalent EltonTraits dietary categories consumed at 20 percent or more
upper_elevation_m: Upper elevation limit at which the species can be found in meters
lower_elevation_m: Lower elevation limit at which the species can be found in meters
altitude_breadth_m: Difference between the upper and lower elevation limits of a species in meters
habitat_breadth_n: Number of distinct suitable level 1 IUCN habitats

References

Soria, C. D., M. Pacifici, M. Di Marco, S. M. Stephen, and C. Rondinini. (2021). COMBINE: a coalesced mammal database of intrinsic and extrinsic traits. Ecology, 102(6):e03344. doi:10.1002/ecy.3344

Examples

data(COMBINE)
head(COMBINE)

Fishlife Dataset

Description

A comprehensive dataset of fish life history traits across multiple species, compiled by Thorson et al. (2023). The dataset provides various morphological, ecological, and biological characteristics of fish species.

Usage

Fishlife

Format

A data frame with multiple variables:

species: Scientific species name
genus: Genus of the fish species
family: Family classification
age_max: Maximum age, years
trophic_level: Trophic level, where 1 is primary producers, etc., dimensionless
aspect_ratio: Caudal fin height and length divided by area, dimensionless
fecundity: Annual eggs produced, number/year
growth_coefficient: von Bertalanffy growth coefficient, year-1
temperature: Average temperature from portion of population sampled, degrees Celsius
length_max: Maximum length, cm
length_infinity: von Bertalanffy asymptotic maximum length, cm
length_maturity: Length at 50% maturity, cm
age_maturity: Age at 50% sexual maturity, years
natural_mortality: Natural mortality rate M, year-1
weight_infinity: Asymptotic maximum weight, g
max_body_depth: Maximum body depth, cm
max_body_width: Maximum body width, cm
lower_jaw_length: Length of lower jaw, cm
min_caudal_pedoncule_depth: Minimum depth of the caudal peduncle, which connects the caudal fin to the body
offspring_size: Size of offspring, kg

References

Thorson, J. T., Maureaud, A. A., Frelat, R., Mérigot, B., Bigman, J. S., Friedman, S. T., Palomares, M. L. D., Pinsky, M. L., Price, S. A., & Wainwright, P. (2023). Identifying direct and indirect associations among traits by merging phylogenetic comparative methods and structural equation models. Methods in Ecology and Evolution, 14(5), 1243-1255. doi:10.1111/2041-210X.14076

Examples

data(Fishlife)
head(Fishlife)

Himalayan Birds Dataset

Description

The 'HimalayanBirds' dataset provides information on bird species in the Himalayas, including their species names, genera, families, phylogenetic relationships, and community composition across elevation bands. This dataset is used to explore elevational patterns of bird functional and phylogenetic diversity and the ecological processes that structure bird communities.

Usage

HimalayanBirds

Format

A list with three components:

splist

A data frame with 151 rows and 3 variables:

species: Scientific name of the bird species.
genus: Genus of the bird species.
family: Family of the bird species.

phy_species

A phylogenetic tree (object of class "phylo") representing the evolutionary relationships among the bird species. It contains edge, edge.length, Nnode, tip.label, and node.label.

com

A community matrix representing the presence (1) or absence (0) of each bird species across 12 elevation bands (ele1 to ele12). The rows represent the elevation bands, and the columns represent the bird species.

References

Ding, Z., Hu, H., Cadotte, M.W., Liang, J., Hu, Y., & Si, X. (2021). Elevational patterns of bird functional and phylogenetic structure in the central Himalaya. Ecography, 44(9), 1403-1417. doi:10.1111/ecog.05660

Examples

# Load the dataset
data(HimalayanBirds)
head(HimalayanBirds)

ReptTraits: A Comprehensive Dataset of Ecological Traits in Reptiles

Description

A comprehensive dataset containing ecological and morphological characteristics of reptiles. The dataset provides detailed information about reptile species, including elevation, seasonal precipitation, body mass, and reproductive features.

Usage

ReptTraits

Format

A data frame with the following columns:

species: Scientific species name
genus: Genus name
family: Family name
Minimal_elevation: Minimum elevation where the species was observed (meters above sea level)
Maximum_elevation: Maximum elevation where the species was observed (meters above sea level)
Mean_Annual_Temperature: Mean annual temperature, °C
Temperature_Seasonality: Temperature seasonality, standard deviation × 100
Seasonality_Precipitation: Seasonal precipitation information
Maximum_Longevity: Longevity data are the maximum age reported for each species from the literature, years
Maximum_body_mass: Maximum body mass of the species (grams)
Maximum_length: Maximum snout-vent length (SVL, mm), or straight carapace length (SCL, mm) for turtles
Mean_number_of_offspring: Mean number of offspring or eggs per clutch
Smallest_clutch_size: Minimum clutch/litter size
Largest_clutch_size: Maximum clutch/litter size
Mean_Tb: Mean of the reported mean body temperatures of the species, °C

References

Oskyrko, O., Mi, C., Meiri, S., & Du, W. (2024). ReptTraits: a comprehensive dataset of ecological traits in reptiles. Scientific Data, 11(1), 243. doi:10.1038/s41597-024-03079-5

Examples

data(ReptTraits)
head(ReptTraits)

TRY Plant Trait Database

Description

A comprehensive global database of plant functional traits from the TRY initiative. This dataset contains standardized measurements of key plant functional traits across multiple species, genera, and families.

Usage

TRY

Format

A data frame with 58,964 rows and 23 variables:

species: Character. Species name
genus: Character. Genus name
family: Character. Family name
DispersalUnitLength: Numeric. Dispersal unit length, mm. (TraitID: 237)
LA: Numeric. Leaf area (in case of compound leaves: leaflet, undefined if petiole is in- or excluded), mm2. (TraitID: 3113)
LDMC: Numeric. Leaf dry mass per leaf fresh mass (leaf dry matter content, LDMC), g/g. (TraitID: 47)
LeafC: Numeric. Leaf carbon (C) content per leaf dry mass, mg/g. (TraitID: 13)
LeafN: Numeric. Leaf nitrogen (N) content per leaf dry mass, mg/g. (TraitID: 14)
LeafNPratio: Numeric. Leaf nitrogen/phosphorus (N/P) ratio, g/g. (TraitID: 56)
LeafNperArea: Numeric. Leaf nitrogen (N) content per leaf area, g m-2. (TraitID: 50)
LeafP: Numeric. Leaf phosphorus (P) content per leaf dry mass, mg/g. (TraitID: 15)
Leafdelta15N: Numeric. Leaf nitrogen (N) isotope signature (delta 15N), per mill. (TraitID: 78)
Leaffreshmass: Numeric. Leaf fresh mass, g. (TraitID: 163)
LMA: Numeric. Leaf mass per area. (1/SLA)
PlantHeight: Numeric. Plant height vegetative, m. (TraitID: 3106)
RootingDepth: Numeric. Root rooting depth, m. (TraitID: 6)
SeedLength: Numeric. Seed length, mm. (TraitID: 27)
SeedMass: Numeric. Seed dry mass, mg. (TraitID: 26)
SeedNumber: Numeric. Seed number per reproduction unit, number. (TraitID: 138)
SLA: Numeric. Leaf area per leaf dry mass (specific leaf area, SLA or 1/LMA): petiole excluded, mm2 mg-1. (TraitID: 3115)
SSD: Numeric. Stem specific density (SSD, stem dry mass per stem fresh volume) or wood density, g/cm3. (TraitID: 4)
StemConduitDensity: Numeric. Stem conduit density (vessels and tracheids), mm-2. (TraitID: 169)
WoodVesselLength: Numeric. Wood vessel element length; stem conduit (vessel and tracheids) element length, µm. (TraitID: 282)

Details

The TRY database represents a global effort to compile plant functional trait data from multiple sources and research groups. Plant functional traits are morphological, physiological, and phenological characteristics that influence fitness and ecosystem functioning. This dataset includes key traits related to:

Leaf economics (SLA, LDMC, leaf nutrients)
Plant architecture (height, rooting depth)
Reproductive strategy (seed mass, seed number)
Wood anatomy (vessel length, conduit density)
Chemical composition (C, N, P content)

Missing values (NA) are common in trait databases due to the difficulty of measuring all traits for all species.

Source

TRY Plant Trait Database (https://www.try-db.org/)

References

Kattge, J., Bönisch, G., Díaz, S., et al. (2020). TRY plant trait database – enhanced coverage and open access. Global Change Biology, 26(1), 119-188. doi:10.1111/gcb.14904

Examples

# Load the dataset
data(TRY)

Calculate Phylogenetic Niche Conservatism Across Multiple Communities

Description

This function conducts comprehensive phylogenetic niche conservatism analysis across multiple communities simultaneously. It evaluates phylogenetic signal for trait data across different community assemblages using various statistical methods, enabling comparative assessment of niche conservatism patterns among communities. The function processes community composition matrices, species trait information, and phylogenetic trees to determine whether closely related species consistently occupy similar ecological niches across different habitats or sampling locations.

Usage

compnc(
  com,
  trait_data,
  phylo_tree,
  methods = c("lambda", "K"),
  pca_axes = c("PC1", "PC2"),
  sig_levels = c(0.001, 0.01, 0.05),
  min_abundance = 0,
  nsim = 1000,
  verbose = TRUE
)

Arguments

com

A community matrix with sites as rows and species as columns

trait_data

A data frame or matrix containing trait data with species as rows

phylo_tree

A phylogenetic tree object of class "phylo"

methods

Character vector specifying methods to use. Options: "lambda", "K"

pca_axes

Character vector specifying which PCA axes to include (e.g., c("PC1", "PC2"))

sig_levels

Numeric vector of significance levels for marking results

min_abundance

Minimum abundance threshold for including species

nsim

Number of permutations for significance testing

verbose

Logical indicating whether to show progress and warnings

Value

A data frame containing phylogenetic signal results for all communities
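
To make the per-community workflow concrete, here is a minimal base R sketch of how a community matrix can be split into one species list per site before each list is analysed separately. The toy matrix and the rule "abundance strictly greater than min_abundance" are assumptions made for illustration; the package's internal filtering may differ.

```r
# Toy community matrix: sites as rows, species as columns (made-up counts)
com <- matrix(
  c(5, 0, 2,
    0, 3, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("site1", "site2"), c("sp_a", "sp_b", "sp_c"))
)

min_abundance <- 0

# Assumption: a species is part of a community when its abundance
# exceeds the threshold
species_by_site <- lapply(rownames(com), function(site) {
  colnames(com)[com[site, ] > min_abundance]
})
names(species_by_site) <- rownames(com)

species_by_site
```

Each element of species_by_site could then be used to subset trait_data and prune phylo_tree before the phylogenetic signal is estimated for that community.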

Examples


# Load example data
data(BCI)
data(TRY)

# Extract trait data
sp <- colnames(BCI$com)
subtraits <- extract_traits(sp, TRY, rank = "species",
                           traits = c("LA", "LMA", "LeafN", "PlantHeight", "SeedMass", "SSD"))

compnc(com = BCI$com, subtraits, BCI$phy_species, methods = "lambda", pca_axes = NULL)

Test Robustness of Phylogenetic Niche Conservatism Analysis Across Multiple Communities

Description

This function evaluates the robustness of phylogenetic signal estimates across multiple communities by simulating trait data with the same phylogenetic signal strength as observed, applying the original missing data pattern, and testing how consistently the statistical significance is recovered across multiple simulations for each community.

Usage

compnc_robustness(
  com,
  trait_data,
  phylo_tree,
  methods = "lambda",
  pca_axes = c("PC1", "PC2"),
  sig_levels = c(0.001, 0.01, 0.05),
  min_abundance = 0,
  n_simulations = 100,
  alpha_level = 0.05,
  tolerance = 0.05,
  verbose = TRUE
)

Arguments

com

A community matrix with sites as rows and species as columns

trait_data

A data frame or matrix containing trait data with species as rows

phylo_tree

A phylogenetic tree object of class "phylo"

methods

Character string specifying method to use. Options: "lambda" or "K". Default is "lambda"

pca_axes

Character vector specifying which PCA axes to include (e.g., c("PC1", "PC2")). Default is c("PC1", "PC2")

sig_levels

Numeric vector of significance levels for marking results

min_abundance

Minimum abundance threshold for including species

n_simulations

Integer. Number of simulations to run for robustness testing. Default is 100

alpha_level

Numeric. Significance level for statistical testing. Default is 0.05

tolerance

Numeric. Acceptable difference between target and estimated signal values during trait simulation. Default is 0.05

verbose

Logical indicating whether to show progress and warnings

Value

A data frame containing the original phylogenetic signal results with additional columns:

robustness: Percentage of simulations that maintain the same statistical significance conclusion as the original analysis
signal_sd: Standard deviation of phylogenetic signal values across successful simulations

Examples


# Load example data
data("HimalayanBirds")
str(HimalayanBirds)
data("AVONET")
head(AVONET)

# species level
sp <- colnames(HimalayanBirds$com)
sp
subtraits <- extract_traits(sp, AVONET, rank = "species")
head(subtraits)
coverage(subtraits)
pnc(subtraits, HimalayanBirds$phy_species, methods = "lambda", pca_axes = c("PC1", "PC2"))

compnc(com = HimalayanBirds$com, subtraits, HimalayanBirds$phy_species,
       methods = "lambda", pca_axes = NULL)

# Test robustness of phylogenetic signal analysis
# Note: this function can take a long time to run
compnc_robustness(HimalayanBirds$com,
                  subtraits,
                  HimalayanBirds$phy_species,
                  methods = "lambda",
                  pca_axes = NULL,
                  n_simulations = 5)

Calculate Trait Coverage Statistics

Description

This function calculates comprehensive coverage statistics for trait data, including individual trait coverage rates, complete case coverage, and overall data coverage. It provides both summary statistics and detailed breakdowns of missing and available data.

Usage

coverage(data)

Arguments

data

A data frame containing trait data. Each column represents a trait and each row represents an observation (e.g., species, samples).

Details

The function performs the following calculations:

Individual trait coverage: For each trait, calculates the number and percentage of available (non-NA) values
Complete case coverage: Counts rows with no missing values across all traits and calculates the percentage
Overall coverage: Calculates the percentage of all cells in the dataset that contain non-missing values

The function also prints the overall trait coverage rate to the console before returning the detailed summary table.
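
The three calculations can be reproduced by hand in base R. The sketch below is not the package's implementation; it only illustrates the definitions, using the same sample data as the Examples section. The variable names are chosen for illustration.

```r
trait_data <- data.frame(
  PlantHeight = c(1.2, 1.5, NA, 2.1, 1.8),
  LDMC = c(0.5, NA, 0.8, 1.2, 0.9),
  LA = c(15.2, 18.5, 12.3, NA, 16.7)
)

# Individual trait coverage: non-NA count and percentage per column
available_count <- colSums(!is.na(trait_data))
trait_rate <- 100 * available_count / nrow(trait_data)

# Complete case coverage: rows with no missing value in any trait
complete_count <- sum(complete.cases(trait_data))
complete_rate <- 100 * complete_count / nrow(trait_data)

# Overall coverage: share of all cells that are non-missing
overall_rate <- 100 * sum(!is.na(trait_data)) /
  (nrow(trait_data) * ncol(trait_data))
```

Here each trait has four of five values available, two rows are complete, and 12 of the 15 cells are filled.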

Value

A data frame with the following columns:

Trait: Character. Names of traits plus an "All" row for complete cases
Available_count: Integer. Number of non-missing values for each trait
Missing_count: Integer. Number of missing (NA) values for each trait
Trait_coverage_rate: Character. Percentage of available data for each trait

The "All" row shows statistics for complete cases (rows with no missing values).

Examples

# Create sample trait data
trait_data <- data.frame(
  PlantHeight = c(1.2, 1.5, NA, 2.1, 1.8),
  LDMC = c(0.5, NA, 0.8, 1.2, 0.9),
  LA = c(15.2, 18.5, 12.3, NA, 16.7)
)

# Calculate coverage statistics
coverage(trait_data)

Extract Plant Traits from Trait Database

Description

This function extracts plant trait data from the TRY database or similar datasets for a specified list of taxa at different taxonomic ranks (species, genus, or family). For numeric traits at genus and family levels, it calculates mean values across all available records.

Usage

extract_traits(sp.list, dataset, rank = "species", traits = NULL)

Arguments

sp.list

A character vector containing the names of taxa to extract traits for. The names should match the taxonomic rank specified in the 'rank' parameter.

dataset

A data frame containing trait data, for example the TRY dataset shipped with this package. Must contain columns named "species", "genus", and "family" for taxonomic information.

rank

A character string specifying the taxonomic rank to match against. Must be one of "species", "genus", or "family". Default is "species".

traits

A character vector specifying which traits to extract. If NULL (default), all available traits in the dataset will be extracted. Available traits are all columns except "species", "genus", and "family".

Details

The function performs the following operations:

Validates input parameters
Identifies available traits in the dataset
Matches input taxa with dataset entries
Reports missing taxa
Extracts trait data based on the specified taxonomic rank
For numeric traits at genus/family level, calculates mean values
For non-numeric traits, uses the first available value
Handles NaN values by converting them to NA
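
The genus-level aggregation described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the function's actual code: the taxon names and trait values are invented, and only the matching, averaging, and NaN-to-NA steps are shown.

```r
# Toy dataset with invented taxa and values
dataset <- data.frame(
  species = c("GenusA one", "GenusA two", "GenusB one"),
  genus = c("GenusA", "GenusA", "GenusB"),
  PlantHeight = c(0.10, 0.30, NA),
  SeedMass = c(1.0, NA, 4.0),
  stringsAsFactors = FALSE
)
genus_list <- c("GenusA", "GenusB", "GenusC")
trait_cols <- c("PlantHeight", "SeedMass")

# Match requested taxa and report the ones that are absent
matched <- dataset[dataset$genus %in% genus_list, ]
missing_taxa <- setdiff(genus_list, dataset$genus)

# Mean per genus, ignoring NA; a genus with no data yields NaN
genus_means <- aggregate(
  matched[trait_cols],
  by = list(genus = matched$genus),
  FUN = function(v) mean(v, na.rm = TRUE)
)

# Convert NaN (mean of zero values) to NA
genus_means[trait_cols] <- lapply(genus_means[trait_cols], function(v) {
  v[is.nan(v)] <- NA
  v
})
rownames(genus_means) <- genus_means$genus
```

In this toy case GenusC is reported as missing, and PlantHeight for GenusB becomes NA because no record is available.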

Value

A data frame with taxa names as row names and trait names as column names. For species-level extraction, returns the first occurrence of each species. For genus/family-level extraction, returns mean values for numeric traits and the first occurrence for non-numeric traits. Missing values are represented as NA.

Examples

# Load the dataset
data(TRY)

# Extract all traits for species
species_list <- c("Acaena novae-zelandiae", "Adiantum capillus-veneris", "Zuelania guidonia")
extract_traits(species_list, TRY, rank = "species")

# Extract specific traits for species
extract_traits(species_list, TRY, rank = "species",
               traits = c("LA", "LMA", "LeafN", "PlantHeight", "SeedMass", "SSD"))

# Extract specific traits at genus level
genus_list <- c("Acaena", "Adiantum")
extract_traits(genus_list, TRY, rank = "genus",
               traits = c("LDMC", "PlantHeight", "SeedMass"))

Merge Two Datasets Based on Species Column

Description

This function merges two data frames based on the 'species' column, handling missing values and column differences intelligently. It provides flexible options for resolving conflicts when the same species appears in both datasets.

Usage

merge_dataset(main_data, additional_data, priority = "main")

Arguments

main_data

A data frame containing the primary dataset. Must include a 'species' column.

additional_data

A data frame containing the secondary dataset. Must include a 'species' column.

priority

A character string specifying how to handle conflicts when both datasets contain non-missing values for the same species and column. Options are:

"main" (default): Use values from main_data
"additional": Use values from additional_data
"mean": Calculate mean for numeric values, use main_data for non-numeric

Details

The function performs the following operations:

Combines all unique species from both datasets
Includes all columns from both datasets
Handles missing values by using available non-missing values
Resolves conflicts based on the specified priority
For duplicate species within a dataset, only the first occurrence is used
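
The gap-filling and conflict rules can be illustrated for a single numeric column. The helper resolve_values() below is hypothetical and not part of the package; it assumes the two vectors are already aligned by species and that a conflict means both values are non-missing.

```r
# Hypothetical helper: combine two aligned vectors according to a priority rule
resolve_values <- function(main, additional,
                           priority = c("main", "additional", "mean")) {
  priority <- match.arg(priority)
  # Fill gaps: use whichever value is available
  out <- ifelse(is.na(main), additional, main)
  # A conflict occurs only where both sources have a value
  conflict <- !is.na(main) & !is.na(additional)
  if (priority == "additional") {
    out[conflict] <- additional[conflict]
  } else if (priority == "mean" && is.numeric(main) && is.numeric(additional)) {
    out[conflict] <- (main[conflict] + additional[conflict]) / 2
  }
  out
}

# Made-up leaf area values for three species present in both sources
la_main       <- c(NA, 2050.24, 449.15)
la_additional <- c(25.58, 2000.00, NA)

res_main <- resolve_values(la_main, la_additional, "main")
res_add  <- resolve_values(la_main, la_additional, "additional")
res_mean <- resolve_values(la_main, la_additional, "mean")
```

Only the second species is a true conflict, so it is the only entry that changes with the priority setting.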

Value

A data frame containing all unique species from both input datasets, with all columns from both datasets. The 'species' column is placed first, followed by all other columns in alphabetical order.

Note

Both input datasets must contain a 'species' column
If a species appears multiple times in a dataset, only the first occurrence is used
When priority is "mean", non-numeric values default to main_data values
The function preserves the original data types of columns

Examples

# Create sample datasets
main_data <- data.frame(
  species = c("Abies alba", "Coussapoa trinervia", "Crataegus monogyna"),
  genus = c("Abies", "Coussapoa", "Crataegus"),
  family = c("Pinaceae", "Urticaceae", "Rosaceae"),
  LA = c(NA, 2050.24, 449.15),
  LeafN = c(13.10, 14.52, 17.46),
  Seedmass = c(53.64, NA, 95.92),
  stringsAsFactors = FALSE
)

additional_data <- data.frame(
  species = c("Abies alba", "Corydalis solida"),
  genus = c("Abies", "Corydalis"),
  family = c("Pinaceae", "Papaveraceae"),
  LA = c(25.58, NA),
  LMA = c(0.19, 0.2),
  PlantHeight = c(53.66, 0.14),
  stringsAsFactors = FALSE
)

# Merge with main data priority (default)
merge_dataset(main_data, additional_data)

Analyze Phylogenetic Niche Conservatism in Ecological Communities

Description

This function performs in-depth phylogenetic niche conservatism analysis for communities by quantifying phylogenetic signal in trait data using multiple statistical methods. The function integrates trait data preprocessing, phylogenetic tree manipulation, optional principal component analysis, and robust statistical testing to provide detailed insights into evolutionary constraints on trait evolution.
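
The optional PCA step can be pictured with base R's prcomp(). This sketch assumes that the PCA is run on complete cases with centred and scaled traits; whether pnc() applies the same scaling or any transformation is not stated here, and the trait values are invented.

```r
# Toy trait matrix with invented values and two incomplete rows
traits <- data.frame(
  LA = c(10, 25, 40, NA, 55, 70),
  LeafN = c(12, 15, 14, 20, 22, 25),
  SSD = c(0.70, 0.62, 0.55, 0.50, NA, 0.35),
  row.names = paste0("sp", 1:6)
)

# Keep only species with all traits measured
complete <- traits[complete.cases(traits), , drop = FALSE]

# Assumption: centre and scale traits before the PCA
pca <- prcomp(complete, center = TRUE, scale. = TRUE)

# Species scores on the requested axes
pca_axes <- c("PC1", "PC2")
scores <- pca$x[, pca_axes, drop = FALSE]
scores
```

The resulting score columns can be treated as additional traits when testing for phylogenetic signal.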

Usage

pnc(
  trait_data,
  phylo_tree,
  methods = "lambda",
  pca_axes = c("PC1", "PC2"),
  sig_levels = c(0.001, 0.01, 0.05),
  nsim = 1000,
  verbose = TRUE
)

Arguments

trait_data

A data frame or matrix containing trait data with species as rows

phylo_tree

A phylogenetic tree object of class "phylo"

methods

Character vector specifying methods to use. Options: "lambda", "K"

pca_axes

Character vector specifying which PCA axes to include (e.g., c("PC1", "PC2"))

sig_levels

Numeric vector of significance levels for marking results

nsim

Number of permutations for significance testing

verbose

Logical indicating whether to show progress and warnings

Value

A data frame containing phylogenetic signal results
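
One common way to mark results against sig_levels is to add one star per threshold that the p-value falls below. The sketch below shows that convention in base R; the exact markers and column layout returned by pnc() are not documented here, so treat this as an assumption. The p-values are invented.

```r
sig_levels <- c(0.001, 0.01, 0.05)
p_values <- c(0.0005, 0.004, 0.03, 0.2)   # made-up p-values

# One star for each significance level the p-value is below
marks <- vapply(
  p_values,
  function(p) strrep("*", sum(p < sig_levels)),
  character(1)
)

data.frame(p_value = p_values, significance = marks)
```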

References

Münkemüller, T., Lavergne, S., Bzeznik, B., Dray, S., Jombart, T., Schiffers, K. and Thuiller, W. (2012). How to measure and test phylogenetic signal. Methods in Ecology and Evolution, 3(4), 743-756. doi:10.1111/j.2041-210X.2012.00196.x

Examples


# Load example data
data(BCI)
data(TRY)

# Extract trait data
sp <- colnames(BCI$com)
subtraits <- extract_traits(sp, TRY, rank = "species",
                            traits = c("LA", "LMA", "LeafN", "PlantHeight", "SeedMass", "SSD"))

# Calculate phylogenetic signal using Lambda method
pnc(subtraits, BCI$phy_species, methods = "lambda")

# Calculate without PCA analysis
pnc(subtraits, BCI$phy_species, methods = "lambda", pca_axes = NULL)

Test Robustness of Phylogenetic Niche Conservatism Analysis

Description

This function evaluates the robustness of phylogenetic signal estimates by simulating trait data with the same phylogenetic signal strength as observed, applying the original missing data pattern, and testing how consistently the statistical significance is recovered across multiple simulations.

Usage

pnc_robustness(
  trait_data,
  phylo_tree,
  methods = "lambda",
  pca_axes = c("PC1", "PC2"),
  n_simulations = 100,
  alpha_level = 0.05,
  tolerance = 0.05
)

Arguments

trait_data

A data frame or matrix containing trait data with species as rows

phylo_tree

A phylogenetic tree object of class "phylo"

methods

Character string specifying method to use. Options: "lambda" or "K". Default is "lambda"

pca_axes

Character vector specifying which PCA axes to include (e.g., c("PC1", "PC2")). Default is c("PC1", "PC2")

n_simulations

Integer. Number of simulations to run for robustness testing. Default is 100

alpha_level

Numeric. Significance level for statistical testing. Default is 0.05

tolerance

Numeric. Acceptable difference between target and estimated signal values during trait simulation. Default is 0.05

Details

The robustness testing procedure involves:

1. Performing baseline phylogenetic signal analysis using pnc()

2. For each trait, simulating new trait data with the same phylogenetic signal strength as observed in the original data

3. Applying the exact missing data pattern from the original dataset to the simulated data

4. Re-testing phylogenetic signal on the simulated data and recording p-values

5. Calculating the percentage of simulations that maintain the same statistical significance conclusion (significant vs. non-significant)

The function uses simulate_lambda_trait() or simulate_K_trait() internally to generate trait data with target phylogenetic signal values.

For PCA axes, the missing data pattern corresponds to complete cases from the original trait matrix. For individual traits, the original missing pattern is preserved exactly.
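
Steps 3 and 5 of the procedure can be sketched in base R as follows. This is only an illustration under stated assumptions: apply_missing_pattern() and robustness_pct() are hypothetical helper names that do not exist in the package, and the strict p < alpha_level comparison is an assumption about how significance is decided.

```r
# Hypothetical helpers for illustration; they are not part of the PNC package.

# Step 3: copy the NA pattern of the observed trait onto a simulated trait.
apply_missing_pattern <- function(simulated, observed) {
  stopifnot(length(simulated) == length(observed))
  simulated[is.na(observed)] <- NA
  simulated
}

# Step 5: percentage of simulations whose significance call matches the
# original one (assumes "significant" means p < alpha_level).
robustness_pct <- function(p_original, p_simulated, alpha_level = 0.05) {
  p_simulated <- p_simulated[!is.na(p_simulated)]
  if (length(p_simulated) == 0) return(NA_real_)
  same_call <- (p_simulated < alpha_level) == (p_original < alpha_level)
  100 * mean(same_call)
}

# Made-up values: two species lack observations in the original data.
observed  <- c(sp1 = 2.1, sp2 = NA, sp3 = 3.4, sp4 = NA)
simulated <- c(sp1 = 0.3, sp2 = -1.2, sp3 = 0.8, sp4 = 0.1)
masked <- apply_missing_pattern(simulated, observed)

# Made-up p-values from four simulations; the original result is significant.
p_sim <- c(0.01, 0.03, 0.20, 0.04)
robustness_pct(p_original = 0.02, p_simulated = p_sim)
```

Here three of the four simulated p-values fall below 0.05, matching the significant original result, so the sketch reports 75 percent.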

Value

A data frame containing the original phylogenetic signal results with additional columns:

robustness: Percentage of simulations that maintain the same statistical significance conclusion as the original analysis
signal_sd: Standard deviation of phylogenetic signal values across successful simulations

Returns the enhanced results from the baseline pnc() analysis

Examples


# Load example data
data(BCI)
data(TRY)

# Extract trait data
sp <- colnames(BCI$com)
subtraits <- extract_traits(sp, TRY, rank = "species",
                          traits = c("LA", "LMA", "LeafN", "PlantHeight"))

# Test robustness of phylogenetic signal analysis
# Note: this function can take a long time to run, so n_simulations is kept small here
pnc_robustness(subtraits, BCI$phy_species, methods = "lambda", n_simulations = 5)

Simulate Trait Data with Target Phylogenetic Signal (Blomberg's K)

Description

This function generates trait data that matches a specified phylogenetic signal strength (Blomberg's K) through iterative simulation and testing.

Usage

simulate_K_trait(target_K, tree, max_attempts = 1e+05, tolerance = 0.02)

Arguments

target_K

Numeric. The desired phylogenetic signal strength (K value):

K = 0: No phylogenetic signal (star phylogeny)
K = 1: Expected signal under Brownian motion evolution
K > 1: Stronger phylogenetic signal than expected under Brownian motion
0 < K < 1: Weaker phylogenetic signal than expected under Brownian motion

tree

An object of class "phylo". The phylogenetic tree for trait simulation.

max_attempts

Integer. Maximum number of simulation attempts before giving up. Default is 100000.

tolerance

Numeric. Acceptable difference between target and estimated K. Default is 0.02.

Details

The function works by:

1. Transforming the phylogenetic tree according to the target K value

2. Simulating trait data using phytools::fastBM() on the transformed tree

3. Estimating the phylogenetic signal using phytools::phylosig()

4. Repeating until the estimated K is within tolerance of the target

Tree transformation strategies:

When target_K = 0: Creates a star phylogeny using ape::stree()
When target_K = 1: Uses the original tree without transformation
When target_K > 1: Scales all branch lengths by the target K value
When 0 < target_K < 1: Interpolates between original tree and uniform branch lengths
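
The simulate, estimate, and repeat loop described in steps 2 to 4 can be sketched generically in base R. This is not the package's implementation: simulate_until_match(), simulate_fn, and estimate_fn are hypothetical names, the toy example uses a sample mean in place of a phylogenetic signal estimate, and the "within tolerance" check is assumed to be inclusive.

```r
# Generic accept/reject loop. `simulate_fn` and `estimate_fn` are hypothetical
# stand-ins for the simulation step and the signal-estimation step.
simulate_until_match <- function(target, simulate_fn, estimate_fn,
                                 max_attempts = 1e5, tolerance = 0.02) {
  for (attempt in seq_len(max_attempts)) {
    x <- simulate_fn()
    est <- estimate_fn(x)
    # Accept the first draw whose estimate is close enough to the target.
    if (is.finite(est) && abs(est - target) <= tolerance) {
      return(list(values = x, estimate = est, attempts = attempt))
    }
  }
  NULL  # target not reached within max_attempts
}

# Toy usage: the "signal" is just the mean of 20 normal draws.
set.seed(42)
hit <- simulate_until_match(
  target = 0.5,
  simulate_fn = function() rnorm(20, mean = 0.5),
  estimate_fn = mean
)

# An unreachable target exhausts the attempts and yields NULL.
miss <- simulate_until_match(
  target = 100,
  simulate_fn = function() rnorm(20, mean = 0.5),
  estimate_fn = mean,
  max_attempts = 10
)
```

In a phylogenetic setting, simulate_fn would wrap a call such as phytools::fastBM() on the transformed tree and estimate_fn would extract K from phytools::phylosig(). A tighter tolerance raises the expected number of attempts, which is why convergence can be slow.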

Value

A data.frame with one column named 'trait' containing the simulated trait values. Row names correspond to tip labels from the phylogenetic tree. Returns NULL if the target K cannot be achieved within the specified tolerance and attempts.
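
Because the return value may be NULL, downstream code should check for it before using the result. The sketch below shows one way to do this. as_trait_vector() is a hypothetical helper, not part of the package, and the data frame is a hand-built stand-in with the documented shape (a single 'trait' column, tip labels as row names).

```r
# Hypothetical helper, not part of PNC: convert the documented return value
# into a named numeric vector, passing NULL through with a warning.
as_trait_vector <- function(trait_data) {
  if (is.null(trait_data)) {
    warning("Simulation did not reach the target; consider a larger tolerance.")
    return(NULL)
  }
  stopifnot(is.data.frame(trait_data), "trait" %in% names(trait_data))
  setNames(trait_data$trait, rownames(trait_data))
}

# Stand-in for a successful simulation result (made-up values).
mock <- data.frame(trait = c(0.4, -1.1, 2.3),
                   row.names = c("t1", "t2", "t3"))
vec <- as_trait_vector(mock)
```

The resulting named vector has the form expected by functions that match trait values to tip labels by name.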

Note

Blomberg's K measures the strength of phylogenetic signal relative to what would be expected under a Brownian motion model of evolution. Unlike Pagel's lambda, K can exceed 1, indicating stronger phylogenetic clustering than expected.

The function may take considerable time to converge for certain K values. Consider adjusting the tolerance parameter if convergence is slow.

Examples

# Generate a random tree
tree <- ape::rtree(50)

# Simulate a trait with signal slightly below the Brownian motion expectation (K = 0.9)
trait_data <- simulate_K_trait(0.9, tree)

# Verify the phylogenetic signal
trait_vector <- setNames(trait_data$trait, rownames(trait_data))
phytools::phylosig(tree, trait_vector, method = "K", test = TRUE)

Simulate Trait Data with Target Phylogenetic Signal (Lambda)

Description

This function generates trait data that matches a specified phylogenetic signal strength (Pagel's lambda) through iterative simulation and testing.

Usage

simulate_lambda_trait(
  target_lambda,
  tree,
  max_attempts = 1e+05,
  tolerance = 0.02
)

Arguments

target_lambda

Numeric. The desired phylogenetic signal strength (lambda value). Should be between 0 and 1:

0: No phylogenetic signal (star phylogeny)
1: Full phylogenetic signal (Brownian motion)

tree

An object of class "phylo". The phylogenetic tree for trait simulation.

max_attempts

Integer. Maximum number of simulation attempts before giving up. Default is 100000.

tolerance

Numeric. Acceptable difference between target and estimated lambda. Default is 0.02.

Details

The function works by:

1. Transforming the phylogenetic tree according to the target lambda value using rescale()

2. Simulating trait data using fastBM() on the transformed tree

3. Estimating the phylogenetic signal using phylosig()

4. Repeating until the estimated lambda is within tolerance of the target

Special cases:

When target_lambda = 0: Sets internal branch lengths to 0, keeping only terminal branches
When target_lambda = 1: Uses the original tree without transformation
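
The effect of the lambda transformation can be illustrated on the covariance matrix that a tree implies under Brownian motion: the off-diagonal elements (shared branch lengths) are multiplied by lambda while the diagonal (root-to-tip distances) is left unchanged. The base R sketch below works on a made-up four-species matrix rather than a "phylo" object. lambda_transform() and simulate_bm() are hypothetical names, and the code is an illustration of the transformation, not the package's implementation.

```r
# Made-up covariance matrix for the tree ((sp1,sp2),(sp3,sp4)):
# diagonal = root-to-tip distance, off-diagonal = shared branch length.
vcv <- matrix(c(1.0, 0.6, 0.0, 0.0,
                0.6, 1.0, 0.0, 0.0,
                0.0, 0.0, 1.0, 0.7,
                0.0, 0.0, 0.7, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("sp", 1:4), paste0("sp", 1:4)))

# Hypothetical helper: scale shared history by lambda, keep tip variances.
lambda_transform <- function(vcv, lambda) {
  out <- vcv * lambda
  diag(out) <- diag(vcv)
  out
}

# Hypothetical helper: one Brownian motion draw with the given covariance,
# using the Cholesky factor of the covariance matrix.
simulate_bm <- function(vcv, sig2 = 1) {
  chol_lower <- t(chol(sig2 * vcv))
  x <- as.vector(chol_lower %*% rnorm(nrow(vcv)))
  setNames(x, rownames(vcv))
}

v0     <- lambda_transform(vcv, 0)    # no shared history (star phylogeny)
v_half <- lambda_transform(vcv, 0.5)  # shared history halved

set.seed(1)
trait <- simulate_bm(v_half)
```

With lambda = 0 the matrix becomes diagonal, which corresponds to the star phylogeny case listed above; with lambda = 1 the matrix is returned unchanged.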

Value

A data.frame with one column named 'trait' containing the simulated trait values. Row names correspond to tip labels from the phylogenetic tree. Returns NULL if the target lambda cannot be achieved within the specified tolerance and attempts.

Note

The function may take considerable time to converge for certain lambda values, especially intermediate ones.

Consider adjusting the tolerance parameter if convergence is slow.

If 'target_lambda' is greater than 1, it will be automatically capped at 1, as lambda values typically range from 0 to 1.

Examples

# Generate a random tree
tree <- ape::rtree(50)

# Simulate trait with strong phylogenetic signal
trait_data <- simulate_lambda_trait(0.8, tree)

# Verify the phylogenetic signal
trait_vector <- setNames(trait_data$trait, rownames(trait_data))
phytools::phylosig(tree, trait_vector, method = "lambda", test = TRUE)