Coding for tree taxonomy: TREEmendous

The R package to unify tree species names

Solving today’s ecological challenges requires combining datasets and knowledge from all over the globe. Unfortunately, matching species names across those datasets is difficult, because different datasets use different accepted names, synonyms, and spellings for the same species.

When combining multiple datasets with thousands of different records, these discrepancies can lead to large losses of data simply because species’ names can’t be matched to each other. 

“Indeed, different sources will use different taxonomic backbones or even just be in conflict between old and new species names, names that are synonyms or species that have been split. Getting the databases to talk to each other under a unified ‘language’ (unified species names) is time-consuming and often requires ‘translating’ all datasets to an intermediary common ‘language’ to then have them talk to each other,” explains Associate Prof. Dr. Andrea Paz Velez of the University of Montreal.

To address this challenge, Crowther Lab data scientist Felix Specker created TREEmendous, an R package for standardizing the taxonomic names of tree species. “This package addresses the challenges of unifying datasets by trying to leverage the information and relationships across all these backbones, while ensuring that resulting species lists are still accepted and consistent within a single backbone,” says Specker.

Felix’s work was supported by Dr. Andrea Paz Velez and Dr. Daniel Maynard during their tenure at Crowther Lab.

“This R package addresses this issue by providing a simple, reproducible, and streamlined approach to combining different taxonomic species lists, even if they are incomplete or inconsistent with each other,” says Dr. Daniel Maynard, Associate Professor of Quantitative Ecology at University College London. By identifying synonym relationships across several existing backbones, the package allows users to “translate” their species list into a target list, minimizing the loss of data and increasing reproducibility and transparency. “In doing so, this package can foster collaboration and data sharing among environmental scientists, restoration ecologists, and conservation biologists.”
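To make the idea of “translating” a species list concrete, here is a minimal conceptual sketch in Python. It is not the TREEmendous API: the function name, the tiny synonym table, and the target list are all hypothetical and written by hand for illustration. It only shows the general principle of resolving names through synonym relationships so that fewer records are lost.

```python
# Conceptual sketch only: this does NOT use the TREEmendous package or its API.
# TARGET_ACCEPTED and SYNONYMS are tiny hand-written toy tables, not real backbones.
TARGET_ACCEPTED = {"Picea abies", "Pinus sylvestris"}
SYNONYMS = {"Picea excelsa": "Picea abies"}  # synonym -> accepted name


def translate(names, accepted, synonyms):
    """Map each input name to an accepted target name, or None if unmatched."""
    result = {}
    for name in names:
        # Normalize whitespace and capitalization ("Genus species").
        cleaned = " ".join(name.split()).capitalize()
        if cleaned in accepted:
            # Direct match against the target list.
            result[name] = cleaned
        elif synonyms.get(cleaned) in accepted:
            # Matched indirectly through a known synonym relationship.
            result[name] = synonyms[cleaned]
        else:
            # No match: keep the record visible instead of silently dropping it.
            result[name] = None
    return result


translated = translate(
    ["Picea excelsa", "pinus  Sylvestris", "Quercus unknownii"],
    TARGET_ACCEPTED,
    SYNONYMS,
)
for original, target in translated.items():
    print(f"{original!r} -> {target!r}")
```

In this toy run, the synonym is resolved to its accepted name, the inconsistently formatted name is matched after normalization, and the made-up name is reported as unmatched rather than discarded.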

Large biological datasets that come from different sources, or even from different researchers within a single group, are always time-consuming to unify. One could say they are written in different languages and need to be translated.

“This tool allows for easy and fast translations between databases of tree species names, including translating a database into your own target, and can potentially be expanded to other taxonomic groups,” says Associate Professor Paz Velez.

Do you have any questions or encounter any issues with TREEmendous? Please reach out and let us know!