Method motivation
Microbes do not operate as independent entities, but instead form a complex network to jointly shape biodiversity and regulate ecosystem functions. However, quantifying biotic associations among pairs of microbes at the community level on behalf of potential facilitation and competition across contrasting environments is extremely challenging due to the organism’s intricacy and associated culturing difficulties.
We developed a workflow—the quantifying community-level microbial interactions ‘qcmi’ R package—to identify putative biotic associations from microbial ecological networks and quantify them at the community level. This R package is based on a series of common and advanced approaches in microbial ecology, and is designed to be user-friendly, easily applied, and flexible.
Package features
qcmi computational modules are independent of each other and can be manually selected according to the user’s needs.
qcmi assigns the community assembly processes to microbial associations to identify putative biotic associations from complex microbial ecological networks.
qcmi was developed based on matrix multiplication to quantify the strength of putative biotic associations from the taxon level to the community level.
Main steps and key functions
The method consists of four main steps: (ⅰ) constructing ecological networks, (ⅱ) assigning assembly processes, (ⅲ) quantifying biotic associations, and (ⅳ) assessing ecological consequence.
First, reliable ecological networks were constructed, after which some abiotic-driven associations, i.e., dispersal limitations and environmental selections, were filtered out by link tests. This ensured that the filtered networks represent putative biotic associations as far as possible. Second, the average associating strength was weighted with corresponding microbial abundance, and the resulting metrics for each sample were quantified to describe microbial biotic associations at the community level. After verifying that the metrics improved the application methodology for interpreting and predicting microbial biodiversity and functions, and confirming that this method offers a simple way for microbial ecologists to explore the importance of community-level negative (antagonistic) and positive (mutualistic) associations in their study systems, the workflow was developed as a flexible R package (https://github.com/joshualiuxu/qcmi).
The igraph format is the basic class. All the other classes depend on the igraph package.
The objects inside the rectangle with dot line represent input files. The red rectangle with full line means it is output file or object. The italics denotes the key functions that deserve more attention.
The trans_ps()
function converts the OTU abundance table
and the corresponding taxonomic information into the phyloseq format,
which enables the user to customize the types of data included in the
analysis.
The filter_ps()
function utilizes phyloseq for filtering
data by both ubiquity and abundance.
The cal_network()
function was applied to construct
ecological networks.
the methods of random matrix theory rmt()
and
idirect()
were also provided to filter ecological
associations
The assigned_process()
function, which incorporates the
test_link_env()
and test_link_dl()
embedded
functions, is a tool for calculating the relative contribution of
assembly processes to microbial ecological associations (also called
link tests in previous study).
The number of zeros was calculated using zero()
, while
pos()
and neg()
were applied to calculate the
average value of all positive and negative connectedness for each taxon,
respectively.
Using a qcmi()
core function, the species (corresponding
to ASVs or phylotypes) associations (connectedness) weighted
corresponding abundance was quantified to the community level via matrix
multiplication.
A standard method that included the cal_alphacon()
and
cal_betacon()
functions was compiled to explore the
ecological consequence of biotic associations between microbes on
biodiversity as well as functions.
Example for a general workflow
This method was applied to empirical data including otu_table, tax_info, geo_info, and env_info.
We explained the results obtained from the workflow and the functions involved in each step.
Firstly, we should load related R packages
including phyloseq ggClusterNet tidyverse WGCNA igraph Matrix vegan reticulate
like this:
library("phyloseq")
library("ggClusterNet")
library("tidyverse")
library("WGCNA")
library("igraph")
library("Matrix")
library("vegan")
library("magrittr")
library("reticulate")
library("SpiecEasi")
library("ggstatsplot")
If these related packages are not installed, first install it
# from CRAN
#install.packages("reticulate")
#install.packages("magrittr")
#install.packages("vegan")
#install.packages("Matrix")
#install.packages("igraph")
#install.packages("WGCNA")
#install.packages("tidyverse")
#install.packages("ggalluvial")
#install.packages("ggstatsplot")
# from BiocManager
#BiocManager::install("phyloseq")
#BiocManager::install(c("AnnotationDbi", "impute","GO.db", "preprocessCore"))
#from GitHub
#devtools::install_github("taowenmicro/ggClusterNet")
#devtools::install_github("zdk123/SpiecEasi")
We install the qcmi package from the GitHub and load it
# If devtools package is not installed, first install it
#install.packages("devtools")
#devtools::install_github("joshualiuxu/qcmi")
library(qcmi)
Then, we set the working path
="./data"
wd="./result"
save.wdif(!dir.exists(wd)){dir.create(wd)}
Collate the data involved in the workflow of qcmi and construct microbial ecological networks
setwd(wd)
=read.table("otu_table.txt",header=TRUE, row.names=1)
otu=read.table("tax.txt",header=TRUE, row.names=1)
tax=read.table("env.txt",header=TRUE, row.names=1)
env=read.table("geo.txt",header=TRUE, row.names=1) geo
Change the work path for saving the subsequent output results
if(!dir.exists(save.wd)){dir.create(save.wd)}
setwd(save.wd)
Check the imported feature table
library(kableExtra)
kable(head(otu,c(6L, 10L)), "html") %>%
kable_styling(full_width = F)
LX1 | LX2 | LX3 | LX4 | LX5 | LX6 | LX7 | LX8 | LX9 | DZ1 | |
---|---|---|---|---|---|---|---|---|---|---|
4479944 | 40 | 30 | 26 | 31 | 18 | 40 | 27 | 35 | 34 | 54 |
244331 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 1 |
244336 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
973124 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4436710 | 1 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
101767 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Check the imported taxonomic table
library(kableExtra)
kable(head(tax,c(6L, 6L)), "html") %>%
kable_styling(full_width = F)
Phylum | Class | |
---|---|---|
4479944 | Actinobacteria | MB-A2-108 |
244331 | Proteobacteria | Alphaproteobacteria |
244336 | Firmicutes | Bacilli |
973124 | Bacteroidetes | [Saprospirae] |
4436710 | Nitrospirae | Nitrospira |
101767 | Proteobacteria | Alphaproteobacteria |
Check the imported environmental table
library(kableExtra)
kable(head(env,c(6L, 6L)), "html") %>%
kable_styling(full_width = F)
pH | SM | OC | DOC | TN | DTN | |
---|---|---|---|---|---|---|
LX1 | 8.36 | 17.04036 | 10.586006 | 52.24753 | 1.258441 | 83.42257 |
LX2 | 8.48 | 25.89928 | 12.247537 | 58.36148 | 1.346855 | 82.19653 |
LX3 | 8.39 | 29.92701 | 10.801080 | 65.17704 | 1.238134 | 18.52955 |
LX4 | 8.61 | 20.51282 | 9.196798 | 58.71469 | 1.011230 | 26.72758 |
LX5 | 8.18 | 15.67329 | 12.329535 | 51.06561 | 1.284343 | 240.60114 |
LX6 | 8.65 | 26.65037 | 11.303184 | 54.70042 | 1.246163 | 26.16724 |
Check the imported geographical table
library(kableExtra)
kable(head(geo,c(6L, 2L)), "html") %>%
kable_styling(full_width = F)
long | lat | |
---|---|---|
LX1 | 116.6106 | 37.28731 |
LX2 | 116.6251 | 37.30519 |
LX3 | 116.6336 | 37.32939 |
LX4 | 116.6489 | 37.26731 |
LX5 | 116.6633 | 37.28381 |
LX6 | 116.6663 | 37.30561 |
Next, we convert these data to phyloseq format and filter the feature table by ubiquity and abundance
= trans_ps(otu_table = otu, taxa_table = tax)
psprint(ps)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 74750 taxa and 243 samples ]
## tax_table() Taxonomy Table: [ 74750 taxa by 2 taxonomic ranks ]
Based on experience and published studies, we generally recommend to retain taxa with > 25% occurrence throughout the whole community (ubiquity) and a relative abundance ≥ 0.001% (abundance)
= filter_ps(ps= ps, occurrence.threshold=60, abundance.threshold=500)
filteredpsprint(filteredps)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 1420 taxa and 243 samples ]
## tax_table() Taxonomy Table: [ 1420 taxa by 2 taxonomic ranks ]
Then, weconstruct the ecological network through different methods (e.g., Pearson, Spearman, SpiecEasi)
It is worth noting that the correlation-based approach has its own drawbacks including the above mentioned difficulty in dealing with zero-inflated models (ignoring rare species in the community) and filtering for weakly correlated relationships (e.g., commensalism and amensalism)
Spiec-Easi’s estimate of entries in the inverse covariance matrix depends on the conditional states of all available nodes, which helps avoid detection of indirect network interactions; thus, we recommend this method to infer the network.
#To calculate the speed of cases, we selected the top 100 taxa with the highest relative abundance as the display
= cal_network(ps = filteredps,N=100,lambda.min.ratio=1e-2,nlambda=20,method ="spieceasi")
result_netsummary(result_net)
## Length Class Mode
## occor.r 10000 dsCMatrix S4
## elist 3 sparseSummary list
## method 1 -none- character
## phyloseq 1 phyloseq S4
## igraph 100 igraph list
## result.se 5 pulsar.refit list
To evaluate the network reliability, we use the function of
getStability()
#around 0.05 is acceptable
= getStability(result_net$result.se)
net_reliability print(net_reliability)
## [1] 0.04959063
Similarly, if you have used other network inference methods to build your network, you have the flexibility to import the igraph format file here.
write.csv(result_net$elist,file="edgelist.csv")
import edgelist into R and convert to igraph
<- read.csv("D:/YANJIUSHENG/method/qcmi/v0107/RMD/result/edgelist.csv")
edgelist =graph_from_edgelist(as.matrix(edgelist[,-3]) , directed = FALSE)
ig.spieceasi=set_edge_attr(ig.spieceasi, 'weight', index = E(ig.spieceasi), as.numeric(edgelist[,3]) ) ig.spieceasi
Check the number of nodes and edges in the constructed network
length(V(ig.spieceasi))#number of nodes
## [1] 100
length(E(ig.spieceasi))#number of edges
## [1] 240
View the distribution of association weights
hist(result_net$elist[,3], main='', xlab='edge weights')
Assign edgelist formatted network results to a new object
=result_net$elist
edgelistlibrary(kableExtra)
kable(head(edgelist,c(6L, 3L)), "html") %>%
kable_styling(full_width = F)
i | j | x |
---|---|---|
1 | 4 | 0.0095135 |
5 | 6 | 0.3037520 |
5 | 7 | 0.0680487 |
3 | 9 | 0.0290132 |
4 | 11 | -0.0980523 |
10 | 11 | 0.0403767 |
Optional
Disentangling direct from indirect relationships in association
networks could be conducted by idirect()
Note that since
idirect is compiled from python, the function only provides support for
py3.5 3.8 3.9, cited by Xiao et al., 2022 First, save the edgelist file
to txt format, convenient to calculate by python module in R
Notably, idirect currently only handles absolute values of association weights and needs to reassign the positive and negative type after the calculation is complete.
idirect(pathname='D:/YANJIUSHENG/method/qcmi/v0107/RMD/result', filename = "D:/YANJIUSHENG/method/qcmi/v0107/RMD/result/edgelist.txt", outputname = "D:/YANJIUSHENG/method/qcmi/v0107/RMD/result/output_idirect1.txt")
## [1] 240
In addition to idirect()
for disentangling direct from
indirect relationships, if the correlations requires the RMT algorithm
rmt()
to filter the correlation coefficients, the RMT
function wiil be calculated.
= cal_network(ps = filteredps,N=100,r.threshold=0,p.threshold=0.05,method ="pearson")
result_net1 =result_net1$occor.r
adj_mat<- rmt(adj_mat ,lcor=0.4, hcor=0.6) tc1
## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50
## [1] 51
## [1] 52
## [1] 53
## [1] 54
## [1] 55
## [1] 56
## [1] 57
## [1] 58
## [1] 59
## [1] 60
print(message("The optimized COR threshold: ", tc1, "...\n"))
## NULL
abs(adj_mat)< tc1] = 0 adj_mat[
Convert the correlation-based approach to igraph format as well to compare with the network inferred by Spiec-Easi
=graph_from_adjacency_matrix(adj_mat, mode = "undirected",weighted = TRUE, diag = FALSE) ig.pearson
We also calculate the topology for the level of nodes and networks, using ggClusterNet package
#for Spiec-Easi
=node_properties(ig.spieceasi)
results_node_properties_se library(kableExtra)
kable(head(results_node_properties_se,c(6L, 3L)), "html") %>%
kable_styling(full_width = F)
igraph.degree | igraph.closeness | igraph.betweenness |
---|---|---|
4 | 0.2903226 | 106.1048 |
5 | 0.2886297 | 350.6054 |
2 | 0.2578125 | 0.0000 |
7 | 0.3036810 | 202.3105 |
7 | 0.3224756 | 148.4480 |
6 | 0.2928994 | 108.0314 |
=net_properties(ig.spieceasi)
results_net_properties_se library(kableExtra)
kable(head(results_net_properties_se,c(6L, 1L)), "html") %>%
kable_styling(full_width = F)
value | |
---|---|
num.edges | 240.0000000 |
num.pos.edges | 162.0000000 |
num.neg.edges | 78.0000000 |
num.vertices | 100.0000000 |
connectance | 0.0484848 |
average.degree | 4.8000000 |
#for Pearson
=node_properties(ig.pearson)
results_node_properties_pearson library(kableExtra)
kable(head(results_node_properties_pearson,c(6L, 3L)), "html") %>%
kable_styling(full_width = F)
igraph.degree | igraph.closeness | igraph.betweenness | |
---|---|---|---|
828667 | 13 | 0.4545455 | 14.316054 |
996116 | 19 | 0.4846939 | 33.635714 |
223418 | 13 | 0.4398148 | 5.564854 |
1012112 | 42 | 0.5757576 | 139.460634 |
717396 | 43 | 0.5900621 | 116.885781 |
892000 | 36 | 0.5459770 | 80.349590 |
=net_properties(ig.pearson)
results_net_properties_pearson library(kableExtra)
kable(head(results_net_properties_pearson,c(6L, 1L)), "html") %>%
kable_styling(full_width = F)
value | |
---|---|
num.edges | 1098.0000000 |
num.pos.edges | 748.0000000 |
num.neg.edges | 350.0000000 |
num.vertices | 100.0000000 |
connectance | 0.2218182 |
average.degree | 21.9600000 |
Continuing with the example of the network obtained by Spiec-Easi inference, we assign the assembly processes to significant network associations
First, we calculate the relative abundance of involved nodes.
= filter_OTU_ps(ps = filteredps,Top = 100)
ps_sub = as.data.frame(t(vegan_otu(ps_sub)))
otu_table row.names(otu_table)<-c(1:100)
Classify ecological associations of dispersal limitation, based on the geographical distance
= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= geo,cutoff=0, method="dl")
result_dl
library(kableExtra)
kable(head(result_dl,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 | OTU2 | cor1 | P1 | cor2 | P2 | dl | |
---|---|---|---|---|---|---|---|
16 | 11 | 20 | 0.307669753711242 | 0 | 0.254228347372594 | 0 | yes |
22 | 1 | 27 | 0.270631576262993 | 0 | 0.241122060035479 | 0 | yes |
66 | 27 | 49 | 0.241122060035479 | 0 | 0.381612264424945 | 0 | yes |
Classify ecological associations of environmental filtering, based on the selected environmental dissimilarity
First, detect redundancy of environment variables
library(Hmisc)
plot(varclus(as.matrix(env) ))
Remove variables with rho>0.5 to avoid multicollinearity
library(Hmisc)
=env[,-c(3,6)]
envplot(varclus(as.matrix(env) ))
In this case, we select soil pH, soil moisture, DOC and TN as the variables of environmental selection
= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['pH'],cutoff=0, method="ef")
result_pH= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['SM'],cutoff=0, method="ef")
result_SM= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['DOC'],cutoff=0, method="ef")
result_DOC= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['TN'],cutoff=0, method="ef")
result_TN
library(kableExtra)
kable(head(result_pH,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 | OTU2 | cor1 | P1 | cor2 | P2 | ef |
---|---|---|---|---|---|---|
1 | 4 | 0.268684799772225 | 0 | 0.443727414612255 | 0 | yes |
5 | 6 | 0.723948580141444 | 0 | 0.701867034073904 | 0 | yes |
5 | 7 | 0.723948580141444 | 0 | 0.57906776258965 | 0 | yes |
kable(head(result_SM,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 | OTU2 | cor1 | P1 | cor2 | P2 | ef | |
---|---|---|---|---|---|---|---|
16 | 11 | 20 | 0.0189453830340267 | 0.00116037984779224 | 0.0120862512867735 | 0.0382428571960613 | yes |
147 | 37 | 74 | 0.0195514514009895 | 0.000800203088686429 | 0.0161098650449364 | 0.00573659774176314 | yes |
198 | 22 | 89 | 0.0177079718909344 | 0.00239303550230058 | 0.0226595641331924 | 0.000101977576680781 | yes |
kable(head(result_DOC,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 | OTU2 | cor1 | P1 | cor2 | P2 | ef | |
---|---|---|---|---|---|---|---|
8 | 7 | 12 | 0.0185455371037244 | 0.00147174597248044 | 0.0124528633890703 | 0.03273469883588 | yes |
17 | 6 | 23 | 0.0209541041336835 | 0.000326509449779305 | 0.0122467876732842 | 0.0357303140797139 | yes |
26 | 4 | 31 | 0.0157444674909832 | 0.00693803171808941 | 0.0158319189797347 | 0.00663154390920595 | yes |
kable(head(result_TN,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 | OTU2 | cor1 | P1 | cor2 | P2 | ef | |
---|---|---|---|---|---|---|---|
15 | 17 | 19 | 0.0138351354606266 | 0.0176861930560371 | 0.0200383479628713 | 0.00059062271339281 | yes |
55 | 17 | 44 | 0.0138351354606266 | 0.0176861930560371 | 0.0202390759778341 | 0.000522378970510267 | yes |
60 | 6 | 46 | 0.0181092671666077 | 0.00190038876227798 | 0.0197620914084732 | 0.000701857438086419 | yes |
Note: Some studies classify elevation as a comprehensive environment, while others classify it as dispersal limitation, which is only shown here.
# Save the results
write.csv(result_pH,"result_pH.csv")
write.csv(result_SM,"result_SM.csv")
write.csv(result_DOC,"result_DOC.csv")
write.csv(result_TN,"result_TN.csv")
write.csv(result_dl,"result_dl.csv")
We combined microbial associations attributed to dispersal limitation and environmental selection to obtain putative biotic associations from the ecological network
library(dplyr)
=row.names(edgelist)
total_link=union(as.character(row.names(result_pH)),as.character(row.names(result_SM)))
ef_link=union(ef_link,as.character(row.names(result_DOC)))
ef_link=union(ef_link,as.character(row.names(result_TN)))
ef_link
=as.character(row.names(result_dl))
dl_link
=setdiff(total_link,ef_link)
bi_link=setdiff(bi_link,dl_link)
bi_link
=edgelist[ef_link,]
efedgewrite.csv(efedge,"efedge.csv")
=edgelist[dl_link,]
dledgewrite.csv(dledge,"dledge.csv")
=edgelist[bi_link,]
biedgewrite.csv(biedge,"effilteredlist.csv")
Convert biotic associations to igraph format and assign weight
=graph_from_edgelist(as.matrix(biedge[,1:2]) , directed = FALSE)
ig.bi=set_edge_attr(ig.bi, 'weight', index = E(ig.bi), as.numeric(biedge[,3]) ) ig.bi
For the output files with igraph format, the node properties have no abundance information.
Now, we show the node colors with the Phylum information and the edges colors with the positive and negative correlations by the softwares of cytoscape and gephi.
We further carry out the qcmi process to quantify the complexity of microbial associations (including positive and negative relationships) at the community level
= qcmi(igraph= ig.bi, OTU= otu_table, pers.cutoff=0) result_bi
hist(result_bi[[1]], main='', xlab='negative connectedness')
hist(result_bi[[2]], main='', xlab='positive connectedness')
hist(result_bi[[3]], main='', xlab='the community complexity of negative associations')
hist(result_bi[[4]], main='', xlab='the community complexity of positive associations')
Finally, we used the results of community complexity of positive and negative associations of microbes in conjunction with microbial diversity and soil pH to assess the ecological consequences of putative biotic associations
#load used packages
library(MASS)
library(adespatial)
library(lmerTest)
We combine the biotic factors represented by putative biotic associations and the abiotic factors represented by environmental and spatial variables
#data=scale(data,center=T,scale=T)
=cbind(result_bi[[3]],result_bi[[4]],env)
datacolnames(data)=c("PNA","PPA",colnames(env))
=as.matrix(data)
data
library(kableExtra)
kable(head(data,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)
PNA | PPA | pH | SM | DOC | TN | NO3 | |
---|---|---|---|---|---|---|---|
LX1 | -0.0536575 | 0.0249428 | 8.36 | 17.04036 | 52.24753 | 1.258441 | 27.838636 |
LX2 | -0.0496019 | 0.0283030 | 8.48 | 25.89928 | 58.36148 | 1.346855 | 24.498240 |
LX3 | -0.0485577 | 0.0292142 | 8.39 | 29.92701 | 65.17704 | 1.238134 | 3.693799 |
LX4 | -0.0487986 | 0.0254426 | 8.61 | 20.51282 | 58.71469 | 1.011230 | 7.867440 |
LX5 | -0.0506057 | 0.0264540 | 8.18 | 15.67329 | 51.06561 | 1.284343 | 73.295009 |
LX6 | -0.0496283 | 0.0225921 | 8.65 | 26.65037 | 54.70042 | 1.246163 | 7.719697 |
First, the contribution of biotic associations on alpha diversity of microbial community is assessed, using two common methods incluidng forward.sel and olslm
#forward.sel
=cal_alphacon(data,"forward.sel",filteredps) res_forward.sel
## Testing variable 1
## Testing variable 2
## Testing variable 3
## Testing variable 4
## Testing variable 5
## Procedure stopped (alpha criteria): pvalue for variable 5 is 0.242000 (> 0.050000)
library(kableExtra)
kable(head(res_forward.sel,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)
variables | order | R2 | R2Cum | AdjR2Cum | F | pvalue |
---|---|---|---|---|---|---|
pH | 3 | 0.3455316 | 0.3455316 | 0.3428159 | 127.237778 | 0.001 |
NH4 | 8 | 0.0793970 | 0.4249286 | 0.4201363 | 33.135496 | 0.001 |
PNA | 1 | 0.0475305 | 0.4724591 | 0.4658372 | 21.533490 | 0.001 |
TN | 6 | 0.0103507 | 0.4828098 | 0.4741175 | 4.763165 | 0.036 |
#olslm
=cal_alphacon(data,"olslm",filteredps) res_olslm
## Start: AIC=1864.28
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 +
## DON + TP + TK + AP + AK
##
## Df Sum of Sq RSS AIC
## - TP 1 140 465120 1862.3
## - AK 1 186 465166 1862.4
## - AP 1 212 465191 1862.4
## - SM 1 483 465462 1862.5
## - TK 1 752 465731 1862.7
## - PPA 1 862 465842 1862.7
## - DOC 1 1033 466013 1862.8
## - DON 1 2377 467357 1863.5
## <none> 464980 1864.3
## - NO3 1 4004 468984 1864.4
## - TN 1 6266 471246 1865.5
## - pH 1 34551 499531 1879.7
## - PNA 1 38070 503050 1881.4
## - NH4 1 39685 504665 1882.2
##
## Step: AIC=1862.35
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 +
## DON + TK + AP + AK
##
## Df Sum of Sq RSS AIC
## - AP 1 103 465223 1860.4
## - AK 1 175 465294 1860.4
## - SM 1 410 465530 1860.6
## - PPA 1 798 465917 1860.8
## - TK 1 879 465999 1860.8
## - DOC 1 989 466109 1860.9
## - DON 1 2258 467378 1861.5
## <none> 465120 1862.3
## - NO3 1 4417 469536 1862.6
## - TN 1 7071 472191 1864.0
## + TP 1 140 464980 1864.3
## - PNA 1 38157 503277 1879.5
## - NH4 1 39870 504990 1880.3
## - pH 1 45290 510409 1882.9
##
## Step: AIC=1860.4
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 +
## DON + TK + AK
##
## Df Sum of Sq RSS AIC
## - AK 1 209 465432 1858.5
## - SM 1 363 465586 1858.6
## - TK 1 798 466021 1858.8
## - PPA 1 818 466041 1858.8
## - DOC 1 989 466212 1858.9
## - DON 1 2455 467678 1859.7
## <none> 465223 1860.4
## - NO3 1 4407 469630 1860.7
## - TN 1 7001 472224 1862.0
## + AP 1 103 465120 1862.3
## + TP 1 32 465191 1862.4
## - PNA 1 38130 503353 1877.5
## - NH4 1 39993 505216 1878.4
## - pH 1 48019 513242 1882.3
##
## Step: AIC=1858.51
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 +
## DON + TK
##
## Df Sum of Sq RSS AIC
## - SM 1 481 465913 1856.8
## - TK 1 623 466055 1856.8
## - PPA 1 758 466190 1856.9
## - DOC 1 1071 466503 1857.1
## - DON 1 2278 467710 1857.7
## <none> 465432 1858.5
## - NO3 1 4226 469657 1858.7
## - TN 1 6903 472335 1860.1
## + AK 1 209 465223 1860.4
## + AP 1 138 465294 1860.4
## + TP 1 19 465413 1860.5
## - PNA 1 37955 503387 1875.6
## - NH4 1 39790 505222 1876.5
## - pH 1 48914 514346 1880.8
##
## Step: AIC=1856.76
## diversity[, 1] ~ PNA + PPA + pH + DOC + TN + NO3 + NH4 + DON +
## TK
##
## Df Sum of Sq RSS AIC
## - PPA 1 601 466514 1855.1
## - TK 1 625 466538 1855.1
## - DOC 1 1132 467045 1855.3
## - DON 1 1954 467867 1855.8
## <none> 465913 1856.8
## - NO3 1 4469 470382 1857.1
## - TN 1 6436 472349 1858.1
## + SM 1 481 465432 1858.5
## + AK 1 327 465586 1858.6
## + AP 1 82 465831 1858.7
## + TP 1 4 465909 1858.8
## - PNA 1 37477 503390 1873.6
## - NH4 1 39462 505375 1874.5
## - pH 1 50013 515925 1879.5
##
## Step: AIC=1855.08
## diversity[, 1] ~ PNA + pH + DOC + TN + NO3 + NH4 + DON + TK
##
## Df Sum of Sq RSS AIC
## - TK 1 387 466901 1853.3
## - DOC 1 1241 467755 1853.7
## <none> 466514 1855.1
## - DON 1 3978 470492 1855.1
## - NO3 1 6003 472517 1856.2
## + PPA 1 601 465913 1856.8
## - TN 1 7132 473646 1856.8
## + SM 1 324 466190 1856.9
## + AK 1 237 466277 1857.0
## + AP 1 101 466413 1857.0
## + TP 1 0 466514 1857.1
## - PNA 1 37188 503702 1871.7
## - NH4 1 39925 506439 1873.0
## - pH 1 99116 565630 1899.9
##
## Step: AIC=1853.28
## diversity[, 1] ~ PNA + pH + DOC + TN + NO3 + NH4 + DON
##
## Df Sum of Sq RSS AIC
## - DOC 1 1218 468119 1851.9
## <none> 466901 1853.3
## - DON 1 4578 471480 1853.7
## - NO3 1 5907 472808 1854.3
## + TK 1 387 466514 1855.1
## + PPA 1 364 466538 1855.1
## + SM 1 351 466550 1855.1
## + AK 1 79 466822 1855.2
## + TP 1 30 466871 1855.3
## + AP 1 25 466876 1855.3
## - TN 1 7964 474865 1855.4
## - PNA 1 36864 503765 1869.7
## - NH4 1 39922 506823 1871.2
## - pH 1 98924 565826 1898.0
##
## Step: AIC=1851.91
## diversity[, 1] ~ PNA + pH + TN + NO3 + NH4 + DON
##
## Df Sum of Sq RSS AIC
## <none> 468119 1851.9
## - DON 1 4143 472263 1852.0
## - NO3 1 5419 473538 1852.7
## + DOC 1 1218 466901 1853.3
## + PPA 1 453 467667 1853.7
## + SM 1 391 467728 1853.7
## + TK 1 364 467755 1853.7
## + AK 1 137 467982 1853.8
## + AP 1 29 468091 1853.9
## + TP 1 8 468111 1853.9
## - TN 1 8003 476123 1854.0
## - PNA 1 37861 505980 1868.8
## - NH4 1 40483 508602 1870.1
## - pH 1 98598 566718 1896.4
library(kableExtra)
kable(head(res_olslm$coefficients,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)
Estimate | Std. Error | t value | Pr(>|t|) | |
---|---|---|---|---|
(Intercept) | 1344.7185641 | 55.3378541 | 24.300157 | 0.0000000 |
PNA | 4011.4331168 | 918.1745939 | 4.368922 | 0.0000187 |
pH | 18.6569407 | 2.6462309 | 7.050383 | 0.0000000 |
TN | -30.0485658 | 14.9593210 | -2.008685 | 0.0457098 |
NO3 | -0.1989349 | 0.1203622 | -1.652802 | 0.0997008 |
NH4 | -1.3299222 | 0.2943819 | -4.517676 | 0.0000099 |
Then, we also evaluate to what extent the biotic associations explain the community composition (beta diversity) by the Mantel test
#Mantel
library(vegan)
= cal_betacon(data,otu_table,"spearman")
res_mantel
library(kableExtra)
kable(head(res_mantel,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)
variable_name | cor_method | corr_res | p_res | significance |
---|---|---|---|---|
PNA | spearman | 0.4409807 | 0.001 | *** |
PPA | spearman | 0.5888146 | 0.001 | *** |
pH | spearman | 0.7306087 | 0.001 | *** |
SM | spearman | 0.0505215 | 0.040 |
|
DOC | spearman | 0.0996807 | 0.002 | ** |
TN | spearman | 0.0009720 | 0.489 |
Next, we establish the relationships between biotic associations and environmental variables
set.seed(123)
ggcorrmat(
data = as.data.frame(data) ,
colors = c("#B2182B", "white", "#4D4D4D"),
p.adjust.method = "none",
title = "Correlalogram for biotic and abiotic dataset"
)
In particular, the linear relationships between biotic associations and soil pH are established
set.seed(123)
ggscatterstats(
data = as.data.frame(data) ,
x = PPA,
y = pH,
xlab = "PPA",
ylab = "Soil pH",
title = "Relationships between putative positive associations and Soil pH"
)
set.seed(123)
ggscatterstats(
data = as.data.frame(data) ,
x = PNA,
y = pH,
xlab = "PNA",
ylab = "Soil pH",
title = "Relationships between putative negative associations and Soil pH"
)
Notes and outlooks
Overall, qcmi is a user-friendly and reliable workflow for quantifying putative biotic associations of microbes at the community level. The package features: (ⅰ) the ability to insert select data and/or variables at each step, thus enabling a customized approach for specific, individualized use; (ⅱ) assembly processes assignment to each significant association, which helps depict true biotic interactions; and (ⅲ) use of matrix multiplication for quantifying the nature of biotic associations from the taxon to community level. Furthermore, testing with empirical community data demonstrated that qcmi effectively removes abiotic-driven associations and significantly improves diversity variation explanations. This methodology circumvents and resolves known challenges and flaws in microbial analysis related to quantifying microbial biotic associations. Thus, this workflow is expected to have broad applications in many fields across contrasting biomes and environments.
In previously published studies, microbial ecological networks were widely used to infer ecological associations between pairs of co-occurring taxa, but often drastically misused as microbial interactions. This study built a general workflow for quantifying biotic associations that avoids the exaggerated description of ecologically meaningless associations, and thus, disentangles assembly processes in significant ecological associations. Every step in the workflow is well documented in previous studies (#1: Spiec-Easi, #2: Goberna’ s method, #3: Cohesion, #4: microeco package), making this effort a process-based as opposed to technology-based innovation.