About this FAQ
We have compiled the most frequently asked questions we have received since the launch of the package in 2021, about its use and the multidimensional space approach for computing functional diversity indices.
If you have any questions or encounter issues while using the
mFD
package after reading this FAQ and the
tutorials on our website,
email me (camille.magneville@gmail.com).
Disclaimer/Reminder: While designing the package we did our best to code internal checks about key inputs and write detailed warning or error messages which appear in red color in the R console. They should not alarm (stress) you but rather help you understand what needs attention or what might be wrong with your input.
Functional traits
What is the difference between nominal and ordinal traits?
Nominal traits are used to describe features that could be biologically categorized, such as growth form in plants or diet for animals. Hence, nominal coding implies all categories are equally distinct from each other.
Ordinal traits are coded using discrete values like nominal traits, but the categories have a meaningful ranking, either because the trait is intrinsically continuous (e.g. body size being categorized into small, medium, large bins), or because it is ecologically relevant to consider ordered categories. For example, if the period of activity of animals is coded as “diurnal”, “nocturnal” or “both”, it is relevant to consider that generalist species “both” are functionally intermediate between the two specialists “diurnal” and “nocturnal”.
This distinction is important for calculating functional distances
using the Gower metric (e.g., with the mFD
::funct.dist()
function). In fact, using nominal traits species with different trait
values will have the same distance between each other, while using
ordinal traits, species pairs with neighboring categories (e.g., “small”
and “medium” size) will have lower distances than species pairs with
distant categories (e.g., “small” and “large”). Ordinal traits are thus
treated more similarly to continuous traits than nominal ones.
Trait coding could vary across case studies depending on the categories considered. For example, if diet is coded using categories describing main preys (e.g. leaves, seeds, fruits, invertebrates, vertebrates) nominal coding is the most relevant, while if diet is coded using broader categories actually reflecting trophic levels (e.g. plants, plants & animals, animals) it could be considered as an ordinal trait. Note that if there are only two categories, coding the trait as ordinal or ordinal has no impact.
The mFD::funct.dist()
function, uses the gawdis()
function (from the gawdis package)(https://rdrr.io/cran/gawdis/man/gawdis.html) which
handles ordinal traits with various methods. Using the
mFD::funct.dist()
function, you are able to choose among
three methods commonly used, which can be: classic
which
treats ordinal variables as continuous variables, metric
which refers to Eq. 3 of Podani (1999), podani
which refers
to Eqs. 2a-b of Podani (1999). The last two options convert ordinal
variables to ranks. The mFD::funct_dist()
default is
classic
for which the final distance between two species
doesn’t depend on the traits values of the other species (as it is the
case for the two other options).
How should I handle my data if it shows correlated traits?
If two traits are highly correlated (> 0.8), we advise removing one of the two traits to not overweight this facet. We suggest removing the trait which is more correlated to the other traits or the one which ecologically makes least sense, if any. If traits are moderately correlated (0.6 < r < 0.8), using a PCA-based functional space will ensure functional axes are not correlated to each other.
Intraspecific variation in traits
I collected species traits in
my studied sites, how can I incorporate these traits data measured at
the site level into the mFD
framework?
To be able to eventually compare values of FD indices between assemblages, it is compulsory to have all the FD indices computed in the same functional space built on all species present in all your sites (just like when assessing phylogenetic diversity on a single tree linking all species).
Hence, when a study aims to account for intraspecific variability,
you could consider your combinations of species_site as the “functional”
units for FD computation. So you will have a
species_site*traits
matrix instead of the
species*traits
in our tutorial example, and similarly a
sites*species_site
matrix with abundance. In these
matrices, trait values and abundances should be averaged across
replicates for each site.
My species*traits data contains NA, what should I do?
If you were not able to measure/get trait values for all the studied
species and hence have a species*traits
dataframe
containing NA, some functions of mFD
will return an error
message. Indeed, while having NA values does not prevent from
computing Gower distance, such missing values often yield patterns of
species pairwise distance. In fact, distance between species is
computed using only traits which have values for both species within a
pair, hence two species could have a distance of 0 if they have the same
trait values for traits without NA, while having different distances to
a third species depending on values of traits with shared NA. So we
suggest imputing missing trait values using the most relevant methods
depending on the study case (cf https://onlinelibrary.wiley.com/doi/full/10.1111/geb.13185).
If you decide to continue the functional analysis with NA, you will have to set the argument stop_if_NA to FALSE otherwise the process will stop.
PCoA and negative eigenvalues
I am computing the Gower
distance between species, as my data includes quantitative, categorical,
and fuzzy traits. However, applying PCoA with Gower distances can lead
to negative eigenvalues, and the mFD
package does not
support corrections like Lingoes or Cailliez. What should I do?
As ecologists, we know that collecting trait values for all our studied species is time consuming so eventually we want to be sure that the FD indices are computed in a space that faithfully represents those traits values, hence the Gower distance computed on them.
Faithfulness is measured with deviation metrics such as mAD or rmSD which compare raw distances based on traits values and distances in the functional space. Based on simulations, Maire et al (2015) demonstrated that there is always a faithful space (with a low mAD) without applying a correction (note that the first PC axes kept to build such faithful spaces have always positive eigenvalues).
Even if square root transformation of distance before PCoA prevents negative eigenvalues, it actually alters the actual distances between species that are based on trait values you took a lot of care (and effort/time) to accurately measure/retrieve .Similarly, Cailliez correction also modifies distances by adding a constant to all values, affecting the relative differences between them. Therefore, since our goal is to ensure that FD indices are computed in a space that faithfully represents the raw trait values, we do not recommend applying any corrections.
Functional space
Overlaying traits as vectors in the functional space
Is there an inbuilt function
to overlay traits as vectors in the PCoA representing the
multidimensional functional space using
mFD::funct.space.plot()
instead of the boxplot produced by
mFD::trait.faxes.corr()
?
No, adding vectors to the PCoA space to show how traits relate to
axes is not possible in the mFD
package. This approach is
typically used with PCA, but PCoA differs from PCA, making it complex to
compute and represent actual vector lengths. To check the correlation
between axes and traits in the mFD
package, we recommend
using the mFD::trait.faxes.corr()
function.
Cumulative variance explained by PCoA axes
How can I check the cumulative variance explained by the PCoA axes in the functional space? Where can I access the variance explained by each PCoA axis? (as usually done with a PCA)
The goal of PCA is to find axes which reflect as best as possible traits variability. Relying on a matrix of correlation between traits, it is looking for axes which best reflect traits variability, and thus reduces the traits number to a lower number of variables, called principal components.
Unlike PCA, the goal of PCoA is to optimize distances between samples
- here species. Relying on a distance matrix (here, distances between
species based on their traits), it optimizes the representation of these
distances, to visualize them in a low-dimensional space, without too
much loss of information. As in a PCA, the first axis explains as much
of the variation as possible, the second axis explains as much of the
remaining variation as possible etc. However, while decomposing
distances in principal components, some eigenvalues can have negative
values - they represent possible mathematical axes which are difficult
to envision. As explained earlier (FAQ - Part “Functional traits”),
mFD
doesn’t include an option to correct for these negative
eigenvalues as these options alter trait-based distances, thus the total
variation explained by PCoA axes might not faithfully represent the
total variation captured by all axes.
We therefore recommend using the “mad” metric to assess the quality of the functional space, as this metric shows how much trait-based distances are faithfully represented in the multidimensional space.
Chosing the number of functional axes
The output of the
mFD::quality.fspaces()
function indicates that the best
functional space uses 5 axes. However, some of my assemblages have 5
species or fewer, and using the 5D space would require me to exclude
these small assemblages. What should I do?
Choosing a functional space is always a trade-off between its quality and being able to compute functional indices on most of the assemblages given the number of species they have. Some indices, like FRic and FDiv, can only be computed if the number of species is greater than the number of functional axes. If you want to compute these indices, the best quality functional space might not be your best option. For example, if the 5D space has the highest quality but some of your assemblages have 5 species or fewer, consider the quality of the 4D space and use it, if it still maintains good quality. While making this trade-off, remember that 2D spaces generally have poor quality (cf. Maire et al. (2015).
Functional Entities vs Species frameworks
How should I choose between working with Functional Entities or Species?
Functional Entities (FEs) are groups of species sharing the same traits combination, in other words, the functional distance between species pairs belonging to a given FE is zero.
The mFD::funct.dist()
function returns a warning when at
least one species pair has a functional distance which equals zero
(“Functional distance between some species is equal to 0. You can choose
to gather species into Functional Entities gathering species with
similar traits values”). But whether or not you will work with FEs
depends on how many species pairs have a functional distance equaling 0.
If it doesn’t concern a lot of your species pairs, you can continue
working with species, whereas if a substantial number of the species
pairs have a functional distance equaling 0, you might want to work with
FEs.
If you decide not to work with FEs but some of your species pairs have a functional distance equaling 0, keep in mind that you might encounter errors while computing functional indices, depending on the distribution of species within your assemblages. In fact, as species belonging to the same FE have the same position in the functional space, it could lead to problems while computing functional indices, for instance not being able to compute the convex-hull of Functional Richness if species delineating the assemblage have the same position in the functional space.
Which indices can I compute with the Functional Entities framework and how?
When using Functional Entities (FEs), you can compute functional indices based on those described by Mouillot et al. (2014) (see our tutorial, as well as alpha and beta metrics from our general workflow tutorial, including FRic, FDiv, FDis, FEve, FSpe, FMPD, FNND, FIde, and FORi.
To work with Functional Entities, you need data frames similar to those used for species but with FEs information. Specifically, you need:
- A FEs × assemblages matrix (assemblages in rows and FEs in columns)
- A traits × FEs matrix (FEs in rows and traits in columns)
To compute indices based on Functional Entities:
- Use
mFD::sp.to.fe()
to group species into FEs and create the traits × FEs dataframe. - Build the assemblage × FEs dataframe using fe_nm, which lists which species belong to each FE, and the species × assemblages dataframe.
- Compute functional distances between FEs with
mFD::funct.dist()
. - Follow the general workflow tutorial with the matrix of functional distances between FEs.
We have added new functions in the development version of the
mFD
package to assist with these steps:
-
mFD::from.spfe.to.feasb()
: Computes the assemblages × FEs dataframe with outputs of themFD
::sp.to.fe() function. -
mFD::fe.sp.df.computation()
: Creates a dataframe linking FEs names to species names with outputs of themFD
::sp.to.fe() function. -
mFD::search.sp.nm()
: Finds a species name given an FE name. -
mFD::from.fecoord.to.spcoord()
: Converts the dataframe of FEs coordinates to one with species coordinates.
Functional indices
Access functional specialisation and origianlity per species
I computed functional
specialization (FSpe) and originality (FOri) values for my assemblages
using the mFD::alpha.fd.multidim()
function. The outputs
are the mean value of these indices per assemblage. How can I access
these values per species?
If you want to retrieve the distance to the nearest species of the
global pool, use the output: details$asb_dist_nn_pool
(which is computed only if you are computing the FOri index). If you
want to retrieve the distance to the nearest species from a given
assemblage, use the output: details$asb_dist_nn_asb
(which
is computed only if you are computing the FNND index)
Fuzzy traits and alpha diversity
My dataset includes 2 fuzzy traits and 2 non-fuzzy traits. In the provided tutorials, fuzzy traits are only used when calculating diversity indices based on Hill Numbers or Beta Diversity. Is this because the authors omitted fuzzy traits from the alpha diversity tutorial for simplicity, or is it not advisable to use fuzzy traits for calculating alpha diversity indices?
We omitted fuzzy traits in the general workflow showing how to compute alpha and beta indices in a multidimensional space to keep the example simple. The workflow does work with fuzzy traits.
Fuzzy traits and Weights
I’m using fuzzy traits and
giving a weight for each modality. While computing the distance matrix
with the mFD::funct.dist()
function, a warning suggests
omitting columns related to fuzzy traits due to uneven distribution.
Since these traits are proportional values from 0 to 1 and sum to 1 with
other columns for the same fuzzy trait, is it safe to ignore this
warning?
It’s a warning message from the gawdis() function of the gawdis
package. The warning is made based on a check for each trait modality,
you can check the code here
(last lines for the warning message). For more information you can read
the gawdis package description and the paper
from De Bello. The mFD
package, does not (yet) include
this corrected-weight approach but if you want to use it, you can use
the gawdis::gawdis() function instead of mFD::funct.dist()
and use its outputs in other mFD
functions.
High beta diversity in functional space & low beta diversity using Hill numbers
How can I explain a high beta diversity between assemblages in the multidimensional space and low functional beta diversity using the generalization of Hill numbers?
The beta-diversity indices based on overlap of convex hulls (computed
through mFD::beta.fd.multidim())
and those based on
distance between species (computed through
mFD::beta.fd.hill()
) are measuring different types of
dissimilarity. Convex hulls account for the species with the most
extreme coordinates, while Hill index accounts for position (and biomass
if available) of all species. Hence they provide complementary
indicators of how assemblages differ to each other.
Beta FD Hill and pairs of sites
The output of the
mFD::beta.fd.hill()
function is a distance matrix between
assemblages. The article
presenting these indices states that one value of the beta metric is
defined for all sites together. How can I link the outputs of the
mFD::beta.fd.hill()
functions to the overall values
highlighted in the article?
Gamma=alpha*beta decompositions are presented for the general case of
N communities in Chao et al (2019). In the mFD
package, we
have coded the most common use for ecologists, which is N=2. Thus, the
beta function calculates values for all possible pairs among the N.
Gamma and alpha values are returned if store.details=TRUE is set.
What does “functionally equally dstinct species” mean?
While reading the “Compute Functional Diversity Hill Indices”, I am struggling to understand the phrase “functionally equally distinct species”. What does it mean?
The “functionally equally distinct species” refers to the concept of Hill numbers (see Jost (2006) Ecology) that refers “to the number of equivalent species”. The main point is that the higher the index, the higher the proportion of biomass on the most distant species. If FDq equals 1, it means that all species are functionally identical so it is like having a single “functional” unit.
Null values of Beta FD Hill
I’m using the
mFD::beta.fd.hill()
function with presence-absence data and
setting q = 0. However, the output only contains null values
(beta_fd_q$q0 = 0). Why is this happening?
In Chao’s framework, the parameter tau is used to indicate that all species with a distance < tau are considered to be in the same functional unit (equivalent to species in taxonomy). You get 0 in your case, as even though the species have different trait values, they are similar to each other. If you want more sensitivity for q=0, you can set tau=min.
How to plot species detected in both assemblages with a third color?
There is currently no option in the
mFD::alpha.multidim.plot()
function to do this directly.
However, you can use the functions within
mFD::alpha.multidim.plot()
to achieve this by creating a
“false” assemblage that includes species from both assemblages you want
to plot. In the following
tutorial we have shown how to use these functions.