Method motivation

Microbes do not operate as independent entities, but instead form a complex network to jointly shape biodiversity and regulate ecosystem functions. However, quantifying biotic associations among pairs of microbes at the community level on behalf of potential facilitation and competition across contrasting environments is extremely challenging due to the organism’s intricacy and associated culturing difficulties.

We developed a workflow—the quantifying community-level microbial interactions ‘qcmi’ R package—to identify putative biotic associations from microbial ecological networks and quantify them at the community level. This R package is based on a series of common and advanced approaches in microbial ecology, and is designed to be user-friendly, easily applied, and flexible.

Package features

qcmi computational modules are independent of each other and can be manually selected according to the user’s needs.

qcmi assigns the community assembly processes to microbial associations to identify putative biotic associations from complex microbial ecological networks.

qcmi was developed based on matrix multiplication to quantify the strength of putative biotic associations from the taxon level to the community level.

Main steps and key functions

The method consists of four main steps: (ⅰ) constructing ecological networks, (ⅱ) assigning assembly processes, (ⅲ) quantifying biotic associations, and (ⅳ) assessing ecological consequence.

First, reliable ecological networks were constructed, after which some abiotic-driven associations, i.e., dispersal limitations and environmental selections, were filtered out by link tests. This ensured that the filtered networks represent putative biotic associations as far as possible. Second, the average associating strength was weighted with corresponding microbial abundance, and the resulting metrics for each sample were quantified to describe microbial biotic associations at the community level. After verifying that the metrics improved the application methodology for interpreting and predicting microbial biodiversity and functions, and confirming that this method offers a simple way for microbial ecologists to explore the importance of community-level negative (antagonistic) and positive (mutualistic) associations in their study systems, the workflow was developed as a flexible R package (https://github.com/joshualiuxu/qcmi).

The igraph format is the basic class. All the other classes depend on the igraph package.

The objects inside the rectangle with dot line represent input files. The red rectangle with full line means it is output file or object. The italics denotes the key functions that deserve more attention.

The trans_ps() function converts the OTU abundance table and the corresponding taxonomic information into the phyloseq format, which enables the user to customize the types of data included in the analysis.

The filter_ps() function utilizes phyloseq for filtering data by both ubiquity and abundance.

The cal_network() function was applied to construct ecological networks.

the methods of random matrix theory rmt() and idirect() were also provided to filter ecological associations

The assigned_process() function, which incorporates the test_link_env() and test_link_dl() embedded functions, is a tool for calculating the relative contribution of assembly processes to microbial ecological associations (also called link tests in previous study).

The number of zeros was calculated using zero(), while pos() and neg() were applied to calculate the average value of all positive and negative connectedness for each taxon, respectively.

Using a qcmi() core function, the species (corresponding to ASVs or phylotypes) associations (connectedness) weighted corresponding abundance was quantified to the community level via matrix multiplication.

A standard method that included the cal_alphacon() and cal_betacon() functions was compiled to explore the ecological consequence of biotic associations between microbes on biodiversity as well as functions.

Example for a general workflow

This method was applied to empirical data including otu_table, tax_info, geo_info, and env_info.

We explained the results obtained from the workflow and the functions involved in each step.

Firstly, we should load related R packages

including phyloseq ggClusterNet tidyverse WGCNA igraph Matrix vegan reticulate

like this:

library("phyloseq")
library("ggClusterNet")
library("tidyverse")
library("WGCNA")
library("igraph")
library("Matrix")
library("vegan")
library("magrittr")
library("reticulate")
library("SpiecEasi")
library("ggstatsplot")

If these related packages are not installed, first install it

# from CRAN
#install.packages("reticulate")
#install.packages("magrittr")
#install.packages("vegan")
#install.packages("Matrix")
#install.packages("igraph")
#install.packages("WGCNA")
#install.packages("tidyverse")
#install.packages("ggalluvial")
#install.packages("ggstatsplot")
# from BiocManager
#BiocManager::install("phyloseq")
#BiocManager::install(c("AnnotationDbi", "impute","GO.db", "preprocessCore"))
#from GitHub
#devtools::install_github("taowenmicro/ggClusterNet")
#devtools::install_github("zdk123/SpiecEasi")

We install the qcmi package from the GitHub and load it

# If devtools package is not installed, first install it
#install.packages("devtools")
#devtools::install_github("joshualiuxu/qcmi")
library(qcmi)

Then, we set the working path

wd="./data"
save.wd="./result"
if(!dir.exists(wd)){dir.create(wd)}

Collate the data involved in the workflow of qcmi and construct microbial ecological networks

setwd(wd)
otu=read.table("otu_table.txt",header=TRUE, row.names=1)
tax=read.table("tax.txt",header=TRUE, row.names=1)
env=read.table("env.txt",header=TRUE, row.names=1)
geo=read.table("geo.txt",header=TRUE, row.names=1)

Change the work path for saving the subsequent output results

if(!dir.exists(save.wd)){dir.create(save.wd)}
setwd(save.wd)

Check the imported feature table

library(kableExtra)
kable(head(otu,c(6L, 10L)), "html") %>%
kable_styling(full_width = F)
LX1 LX2 LX3 LX4 LX5 LX6 LX7 LX8 LX9 DZ1
4479944 40 30 26 31 18 40 27 35 34 54
244331 0 0 0 0 0 0 2 1 0 1
244336 0 0 0 0 0 0 0 0 0 0
973124 0 0 0 0 0 0 0 0 0 0
4436710 1 1 0 0 2 0 0 0 0 0
101767 0 0 0 0 0 0 0 0 0 0

Check the imported taxonomic table

library(kableExtra)
kable(head(tax,c(6L, 6L)), "html") %>%
kable_styling(full_width = F)
Phylum Class
4479944 Actinobacteria MB-A2-108
244331 Proteobacteria Alphaproteobacteria
244336 Firmicutes Bacilli
973124 Bacteroidetes [Saprospirae]
4436710 Nitrospirae Nitrospira
101767 Proteobacteria Alphaproteobacteria

Check the imported environmental table

library(kableExtra)
kable(head(env,c(6L, 6L)), "html") %>%
kable_styling(full_width = F)
pH SM OC DOC TN DTN
LX1 8.36 17.04036 10.586006 52.24753 1.258441 83.42257
LX2 8.48 25.89928 12.247537 58.36148 1.346855 82.19653
LX3 8.39 29.92701 10.801080 65.17704 1.238134 18.52955
LX4 8.61 20.51282 9.196798 58.71469 1.011230 26.72758
LX5 8.18 15.67329 12.329535 51.06561 1.284343 240.60114
LX6 8.65 26.65037 11.303184 54.70042 1.246163 26.16724

Check the imported geographical table

library(kableExtra)
kable(head(geo,c(6L, 2L)), "html") %>%
kable_styling(full_width = F)
long lat
LX1 116.6106 37.28731
LX2 116.6251 37.30519
LX3 116.6336 37.32939
LX4 116.6489 37.26731
LX5 116.6633 37.28381
LX6 116.6663 37.30561

Next, we convert these data to phyloseq format and filter the feature table by ubiquity and abundance

ps= trans_ps(otu_table = otu, taxa_table =  tax)
print(ps)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 74750 taxa and 243 samples ]
## tax_table()   Taxonomy Table:    [ 74750 taxa by 2 taxonomic ranks ]

Based on experience and published studies, we generally recommend to retain taxa with > 25% occurrence throughout the whole community (ubiquity) and a relative abundance ≥ 0.001% (abundance)

filteredps= filter_ps(ps= ps, occurrence.threshold=60,  abundance.threshold=500)
print(filteredps)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1420 taxa and 243 samples ]
## tax_table()   Taxonomy Table:    [ 1420 taxa by 2 taxonomic ranks ]

Then, weconstruct the ecological network through different methods (e.g., Pearson, Spearman, SpiecEasi)

It is worth noting that the correlation-based approach has its own drawbacks including the above mentioned difficulty in dealing with zero-inflated models (ignoring rare species in the community) and filtering for weakly correlated relationships (e.g., commensalism and amensalism)

Spiec-Easi’s estimate of entries in the inverse covariance matrix depends on the conditional states of all available nodes, which helps avoid detection of indirect network interactions; thus, we recommend this method to infer the network.

#To calculate the speed of cases, we selected the top 100 taxa with the highest relative abundance as the display
result_net= cal_network(ps = filteredps,N=100,lambda.min.ratio=1e-2,nlambda=20,method ="spieceasi")
summary(result_net)
##           Length Class         Mode     
## occor.r   10000  dsCMatrix     S4       
## elist         3  sparseSummary list     
## method        1  -none-        character
## phyloseq      1  phyloseq      S4       
## igraph      100  igraph        list     
## result.se     5  pulsar.refit  list

To evaluate the network reliability, we use the function of getStability()

#around 0.05 is acceptable
net_reliability = getStability(result_net$result.se)
print(net_reliability)
## [1] 0.04959063

Similarly, if you have used other network inference methods to build your network, you have the flexibility to import the igraph format file here.

write.csv(result_net$elist,file="edgelist.csv")

import edgelist into R and convert to igraph

edgelist <- read.csv("D:/YANJIUSHENG/method/qcmi/v0107/RMD/result/edgelist.csv")
ig.spieceasi=graph_from_edgelist(as.matrix(edgelist[,-3]) , directed = FALSE)  
ig.spieceasi=set_edge_attr(ig.spieceasi, 'weight', index = E(ig.spieceasi),  as.numeric(edgelist[,3]) )

Check the number of nodes and edges in the constructed network

length(V(ig.spieceasi))#number of nodes
## [1] 100
length(E(ig.spieceasi))#number of edges
## [1] 240

View the distribution of association weights

hist(result_net$elist[,3], main='', xlab='edge weights')

Assign edgelist formatted network results to a new object

edgelist=result_net$elist
library(kableExtra)
kable(head(edgelist,c(6L, 3L)), "html") %>%
kable_styling(full_width = F)
i j x
1 4 0.0095135
5 6 0.3037520
5 7 0.0680487
3 9 0.0290132
4 11 -0.0980523
10 11 0.0403767

Optional

Disentangling direct from indirect relationships in association networks could be conducted by idirect() Note that since idirect is compiled from python, the function only provides support for py3.5 3.8 3.9, cited by Xiao et al., 2022 First, save the edgelist file to txt format, convenient to calculate by python module in R

Notably, idirect currently only handles absolute values of association weights and needs to reassign the positive and negative type after the calculation is complete.

idirect(pathname='D:/YANJIUSHENG/method/qcmi/v0107/RMD/result', filename = "D:/YANJIUSHENG/method/qcmi/v0107/RMD/result/edgelist.txt", outputname = "D:/YANJIUSHENG/method/qcmi/v0107/RMD/result/output_idirect1.txt")
## [1] 240

In addition to idirect() for disentangling direct from indirect relationships, if the correlations requires the RMT algorithm rmt() to filter the correlation coefficients, the RMT function wiil be calculated.

result_net1 = cal_network(ps = filteredps,N=100,r.threshold=0,p.threshold=0.05,method ="pearson")
adj_mat=result_net1$occor.r
tc1 <- rmt(adj_mat ,lcor=0.4, hcor=0.6)
## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50
## [1] 51
## [1] 52
## [1] 53
## [1] 54
## [1] 55
## [1] 56
## [1] 57
## [1] 58
## [1] 59
## [1] 60
print(message("The optimized COR threshold: ", tc1, "...\n"))
## NULL
adj_mat[abs(adj_mat)< tc1] = 0

Convert the correlation-based approach to igraph format as well to compare with the network inferred by Spiec-Easi

ig.pearson=graph_from_adjacency_matrix(adj_mat, mode =  "undirected",weighted = TRUE, diag = FALSE)

We also calculate the topology for the level of nodes and networks, using ggClusterNet package

#for Spiec-Easi
results_node_properties_se =node_properties(ig.spieceasi)
library(kableExtra)
kable(head(results_node_properties_se,c(6L, 3L)), "html") %>%
kable_styling(full_width = F)
igraph.degree igraph.closeness igraph.betweenness
4 0.2903226 106.1048
5 0.2886297 350.6054
2 0.2578125 0.0000
7 0.3036810 202.3105
7 0.3224756 148.4480
6 0.2928994 108.0314
results_net_properties_se =net_properties(ig.spieceasi)
library(kableExtra)
kable(head(results_net_properties_se,c(6L, 1L)), "html") %>%
kable_styling(full_width = F)
value
num.edges 240.0000000
num.pos.edges 162.0000000
num.neg.edges 78.0000000
num.vertices 100.0000000
connectance 0.0484848
average.degree 4.8000000
#for Pearson
results_node_properties_pearson =node_properties(ig.pearson)
library(kableExtra)
kable(head(results_node_properties_pearson,c(6L, 3L)), "html") %>%
kable_styling(full_width = F)
igraph.degree igraph.closeness igraph.betweenness
828667 13 0.4545455 14.316054
996116 19 0.4846939 33.635714
223418 13 0.4398148 5.564854
1012112 42 0.5757576 139.460634
717396 43 0.5900621 116.885781
892000 36 0.5459770 80.349590
results_net_properties_pearson =net_properties(ig.pearson)
library(kableExtra)
kable(head(results_net_properties_pearson,c(6L, 1L)), "html") %>%
kable_styling(full_width = F)
value
num.edges 1098.0000000
num.pos.edges 748.0000000
num.neg.edges 350.0000000
num.vertices 100.0000000
connectance 0.2218182
average.degree 21.9600000

Continuing with the example of the network obtained by Spiec-Easi inference, we assign the assembly processes to significant network associations

First, we calculate the relative abundance of involved nodes.

ps_sub = filter_OTU_ps(ps = filteredps,Top = 100)
otu_table = as.data.frame(t(vegan_otu(ps_sub)))
row.names(otu_table)<-c(1:100)

Classify ecological associations of dispersal limitation, based on the geographical distance

result_dl= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= geo,cutoff=0, method="dl")

library(kableExtra)
kable(head(result_dl,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 OTU2 cor1 P1 cor2 P2 dl
16 11 20 0.307669753711242 0 0.254228347372594 0 yes
22 1 27 0.270631576262993 0 0.241122060035479 0 yes
66 27 49 0.241122060035479 0 0.381612264424945 0 yes

Classify ecological associations of environmental filtering, based on the selected environmental dissimilarity

First, detect redundancy of environment variables

library(Hmisc)
plot(varclus(as.matrix(env) ))

Remove variables with rho>0.5 to avoid multicollinearity

library(Hmisc)
env=env[,-c(3,6)]
plot(varclus(as.matrix(env) ))

In this case, we select soil pH, soil moisture, DOC and TN as the variables of environmental selection

result_pH= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['pH'],cutoff=0, method="ef")
result_SM= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['SM'],cutoff=0, method="ef")
result_DOC= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['DOC'],cutoff=0, method="ef")
result_TN= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['TN'],cutoff=0, method="ef")

library(kableExtra)
kable(head(result_pH,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 OTU2 cor1 P1 cor2 P2 ef
1 4 0.268684799772225 0 0.443727414612255 0 yes
5 6 0.723948580141444 0 0.701867034073904 0 yes
5 7 0.723948580141444 0 0.57906776258965 0 yes
kable(head(result_SM,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 OTU2 cor1 P1 cor2 P2 ef
16 11 20 0.0189453830340267 0.00116037984779224 0.0120862512867735 0.0382428571960613 yes
147 37 74 0.0195514514009895 0.000800203088686429 0.0161098650449364 0.00573659774176314 yes
198 22 89 0.0177079718909344 0.00239303550230058 0.0226595641331924 0.000101977576680781 yes
kable(head(result_DOC,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 OTU2 cor1 P1 cor2 P2 ef
8 7 12 0.0185455371037244 0.00147174597248044 0.0124528633890703 0.03273469883588 yes
17 6 23 0.0209541041336835 0.000326509449779305 0.0122467876732842 0.0357303140797139 yes
26 4 31 0.0157444674909832 0.00693803171808941 0.0158319189797347 0.00663154390920595 yes
kable(head(result_TN,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)
OTU1 OTU2 cor1 P1 cor2 P2 ef
15 17 19 0.0138351354606266 0.0176861930560371 0.0200383479628713 0.00059062271339281 yes
55 17 44 0.0138351354606266 0.0176861930560371 0.0202390759778341 0.000522378970510267 yes
60 6 46 0.0181092671666077 0.00190038876227798 0.0197620914084732 0.000701857438086419 yes

Note: Some studies classify elevation as a comprehensive environment, while others classify it as dispersal limitation, which is only shown here.

#   Save the results
write.csv(result_pH,"result_pH.csv")
write.csv(result_SM,"result_SM.csv")
write.csv(result_DOC,"result_DOC.csv")
write.csv(result_TN,"result_TN.csv")
write.csv(result_dl,"result_dl.csv")

We combined microbial associations attributed to dispersal limitation and environmental selection to obtain putative biotic associations from the ecological network

library(dplyr)
total_link=row.names(edgelist)
ef_link=union(as.character(row.names(result_pH)),as.character(row.names(result_SM)))
ef_link=union(ef_link,as.character(row.names(result_DOC)))
ef_link=union(ef_link,as.character(row.names(result_TN)))

dl_link=as.character(row.names(result_dl))

bi_link=setdiff(total_link,ef_link)
bi_link=setdiff(bi_link,dl_link)

efedge=edgelist[ef_link,]
write.csv(efedge,"efedge.csv")

dledge=edgelist[dl_link,]
write.csv(dledge,"dledge.csv")

biedge=edgelist[bi_link,]
write.csv(biedge,"effilteredlist.csv")

Convert biotic associations to igraph format and assign weight

ig.bi=graph_from_edgelist(as.matrix(biedge[,1:2]) , directed = FALSE)
ig.bi=set_edge_attr(ig.bi, 'weight', index = E(ig.bi),  as.numeric(biedge[,3]) )

For the output files with igraph format, the node properties have no abundance information.

Now, we show the node colors with the Phylum information and the edges colors with the positive and negative correlations by the softwares of cytoscape and gephi.

We further carry out the qcmi process to quantify the complexity of microbial associations (including positive and negative relationships) at the community level

result_bi= qcmi(igraph= ig.bi, OTU= otu_table, pers.cutoff=0)
hist(result_bi[[1]], main='', xlab='negative connectedness')

hist(result_bi[[2]], main='', xlab='positive connectedness')

hist(result_bi[[3]], main='', xlab='the community complexity of negative associations')

hist(result_bi[[4]], main='', xlab='the community complexity of positive associations')

Finally, we used the results of community complexity of positive and negative associations of microbes in conjunction with microbial diversity and soil pH to assess the ecological consequences of putative biotic associations

#load used packages
library(MASS)
library(adespatial)
library(lmerTest)

We combine the biotic factors represented by putative biotic associations and the abiotic factors represented by environmental and spatial variables

#data=scale(data,center=T,scale=T)
data=cbind(result_bi[[3]],result_bi[[4]],env)
colnames(data)=c("PNA","PPA",colnames(env))
data=as.matrix(data)

library(kableExtra)
kable(head(data,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)
PNA PPA pH SM DOC TN NO3
LX1 -0.0536575 0.0249428 8.36 17.04036 52.24753 1.258441 27.838636
LX2 -0.0496019 0.0283030 8.48 25.89928 58.36148 1.346855 24.498240
LX3 -0.0485577 0.0292142 8.39 29.92701 65.17704 1.238134 3.693799
LX4 -0.0487986 0.0254426 8.61 20.51282 58.71469 1.011230 7.867440
LX5 -0.0506057 0.0264540 8.18 15.67329 51.06561 1.284343 73.295009
LX6 -0.0496283 0.0225921 8.65 26.65037 54.70042 1.246163 7.719697

First, the contribution of biotic associations on alpha diversity of microbial community is assessed, using two common methods incluidng forward.sel and olslm

#forward.sel
res_forward.sel=cal_alphacon(data,"forward.sel",filteredps)
## Testing variable 1
## Testing variable 2
## Testing variable 3
## Testing variable 4
## Testing variable 5
## Procedure stopped (alpha criteria): pvalue for variable 5 is 0.242000 (> 0.050000)
library(kableExtra)
kable(head(res_forward.sel,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)
variables order R2 R2Cum AdjR2Cum F pvalue
pH 3 0.3455316 0.3455316 0.3428159 127.237778 0.001
NH4 8 0.0793970 0.4249286 0.4201363 33.135496 0.001
PNA 1 0.0475305 0.4724591 0.4658372 21.533490 0.001
TN 6 0.0103507 0.4828098 0.4741175 4.763165 0.036
#olslm
res_olslm=cal_alphacon(data,"olslm",filteredps)
## Start:  AIC=1864.28
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 + 
##     DON + TP + TK + AP + AK
## 
##        Df Sum of Sq    RSS    AIC
## - TP    1       140 465120 1862.3
## - AK    1       186 465166 1862.4
## - AP    1       212 465191 1862.4
## - SM    1       483 465462 1862.5
## - TK    1       752 465731 1862.7
## - PPA   1       862 465842 1862.7
## - DOC   1      1033 466013 1862.8
## - DON   1      2377 467357 1863.5
## <none>              464980 1864.3
## - NO3   1      4004 468984 1864.4
## - TN    1      6266 471246 1865.5
## - pH    1     34551 499531 1879.7
## - PNA   1     38070 503050 1881.4
## - NH4   1     39685 504665 1882.2
## 
## Step:  AIC=1862.35
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 + 
##     DON + TK + AP + AK
## 
##        Df Sum of Sq    RSS    AIC
## - AP    1       103 465223 1860.4
## - AK    1       175 465294 1860.4
## - SM    1       410 465530 1860.6
## - PPA   1       798 465917 1860.8
## - TK    1       879 465999 1860.8
## - DOC   1       989 466109 1860.9
## - DON   1      2258 467378 1861.5
## <none>              465120 1862.3
## - NO3   1      4417 469536 1862.6
## - TN    1      7071 472191 1864.0
## + TP    1       140 464980 1864.3
## - PNA   1     38157 503277 1879.5
## - NH4   1     39870 504990 1880.3
## - pH    1     45290 510409 1882.9
## 
## Step:  AIC=1860.4
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 + 
##     DON + TK + AK
## 
##        Df Sum of Sq    RSS    AIC
## - AK    1       209 465432 1858.5
## - SM    1       363 465586 1858.6
## - TK    1       798 466021 1858.8
## - PPA   1       818 466041 1858.8
## - DOC   1       989 466212 1858.9
## - DON   1      2455 467678 1859.7
## <none>              465223 1860.4
## - NO3   1      4407 469630 1860.7
## - TN    1      7001 472224 1862.0
## + AP    1       103 465120 1862.3
## + TP    1        32 465191 1862.4
## - PNA   1     38130 503353 1877.5
## - NH4   1     39993 505216 1878.4
## - pH    1     48019 513242 1882.3
## 
## Step:  AIC=1858.51
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 + 
##     DON + TK
## 
##        Df Sum of Sq    RSS    AIC
## - SM    1       481 465913 1856.8
## - TK    1       623 466055 1856.8
## - PPA   1       758 466190 1856.9
## - DOC   1      1071 466503 1857.1
## - DON   1      2278 467710 1857.7
## <none>              465432 1858.5
## - NO3   1      4226 469657 1858.7
## - TN    1      6903 472335 1860.1
## + AK    1       209 465223 1860.4
## + AP    1       138 465294 1860.4
## + TP    1        19 465413 1860.5
## - PNA   1     37955 503387 1875.6
## - NH4   1     39790 505222 1876.5
## - pH    1     48914 514346 1880.8
## 
## Step:  AIC=1856.76
## diversity[, 1] ~ PNA + PPA + pH + DOC + TN + NO3 + NH4 + DON + 
##     TK
## 
##        Df Sum of Sq    RSS    AIC
## - PPA   1       601 466514 1855.1
## - TK    1       625 466538 1855.1
## - DOC   1      1132 467045 1855.3
## - DON   1      1954 467867 1855.8
## <none>              465913 1856.8
## - NO3   1      4469 470382 1857.1
## - TN    1      6436 472349 1858.1
## + SM    1       481 465432 1858.5
## + AK    1       327 465586 1858.6
## + AP    1        82 465831 1858.7
## + TP    1         4 465909 1858.8
## - PNA   1     37477 503390 1873.6
## - NH4   1     39462 505375 1874.5
## - pH    1     50013 515925 1879.5
## 
## Step:  AIC=1855.08
## diversity[, 1] ~ PNA + pH + DOC + TN + NO3 + NH4 + DON + TK
## 
##        Df Sum of Sq    RSS    AIC
## - TK    1       387 466901 1853.3
## - DOC   1      1241 467755 1853.7
## <none>              466514 1855.1
## - DON   1      3978 470492 1855.1
## - NO3   1      6003 472517 1856.2
## + PPA   1       601 465913 1856.8
## - TN    1      7132 473646 1856.8
## + SM    1       324 466190 1856.9
## + AK    1       237 466277 1857.0
## + AP    1       101 466413 1857.0
## + TP    1         0 466514 1857.1
## - PNA   1     37188 503702 1871.7
## - NH4   1     39925 506439 1873.0
## - pH    1     99116 565630 1899.9
## 
## Step:  AIC=1853.28
## diversity[, 1] ~ PNA + pH + DOC + TN + NO3 + NH4 + DON
## 
##        Df Sum of Sq    RSS    AIC
## - DOC   1      1218 468119 1851.9
## <none>              466901 1853.3
## - DON   1      4578 471480 1853.7
## - NO3   1      5907 472808 1854.3
## + TK    1       387 466514 1855.1
## + PPA   1       364 466538 1855.1
## + SM    1       351 466550 1855.1
## + AK    1        79 466822 1855.2
## + TP    1        30 466871 1855.3
## + AP    1        25 466876 1855.3
## - TN    1      7964 474865 1855.4
## - PNA   1     36864 503765 1869.7
## - NH4   1     39922 506823 1871.2
## - pH    1     98924 565826 1898.0
## 
## Step:  AIC=1851.91
## diversity[, 1] ~ PNA + pH + TN + NO3 + NH4 + DON
## 
##        Df Sum of Sq    RSS    AIC
## <none>              468119 1851.9
## - DON   1      4143 472263 1852.0
## - NO3   1      5419 473538 1852.7
## + DOC   1      1218 466901 1853.3
## + PPA   1       453 467667 1853.7
## + SM    1       391 467728 1853.7
## + TK    1       364 467755 1853.7
## + AK    1       137 467982 1853.8
## + AP    1        29 468091 1853.9
## + TP    1         8 468111 1853.9
## - TN    1      8003 476123 1854.0
## - PNA   1     37861 505980 1868.8
## - NH4   1     40483 508602 1870.1
## - pH    1     98598 566718 1896.4
library(kableExtra)
kable(head(res_olslm$coefficients,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1344.7185641 55.3378541 24.300157 0.0000000
PNA 4011.4331168 918.1745939 4.368922 0.0000187
pH 18.6569407 2.6462309 7.050383 0.0000000
TN -30.0485658 14.9593210 -2.008685 0.0457098
NO3 -0.1989349 0.1203622 -1.652802 0.0997008
NH4 -1.3299222 0.2943819 -4.517676 0.0000099

Then, we also evaluate to what extent the biotic associations explain the community composition (beta diversity) by the Mantel test

#Mantel
library(vegan)
res_mantel= cal_betacon(data,otu_table,"spearman")

library(kableExtra)
kable(head(res_mantel,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)
variable_name cor_method corr_res p_res significance
PNA spearman 0.4409807 0.001 ***
PPA spearman 0.5888146 0.001 ***
pH spearman 0.7306087 0.001 ***
SM spearman 0.0505215 0.040
DOC spearman 0.0996807 0.002 **
TN spearman 0.0009720 0.489

Next, we establish the relationships between biotic associations and environmental variables

set.seed(123)
ggcorrmat(
  data     = as.data.frame(data) ,
  colors   = c("#B2182B", "white", "#4D4D4D"),
   p.adjust.method = "none",
  title    = "Correlalogram for biotic and abiotic dataset"
)

In particular, the linear relationships between biotic associations and soil pH are established

set.seed(123)
ggscatterstats(
  data  = as.data.frame(data) ,
  x     = PPA,
  y     = pH,
  xlab  = "PPA",
  ylab  = "Soil pH",
  title = "Relationships between putative positive associations and Soil pH"
)

set.seed(123)
ggscatterstats(
  data  = as.data.frame(data) ,
  x     = PNA,
  y     = pH,
  xlab  = "PNA",
  ylab  = "Soil pH",
  title = "Relationships between putative negative associations and Soil pH"
)

Notes and outlooks

Overall, qcmi is a user-friendly and reliable workflow for quantifying putative biotic associations of microbes at the community level. The package features: (ⅰ) the ability to insert select data and/or variables at each step, thus enabling a customized approach for specific, individualized use; (ⅱ) assembly processes assignment to each significant association, which helps depict true biotic interactions; and (ⅲ) use of matrix multiplication for quantifying the nature of biotic associations from the taxon to community level. Furthermore, testing with empirical community data demonstrated that qcmi effectively removes abiotic-driven associations and significantly improves diversity variation explanations. This methodology circumvents and resolves known challenges and flaws in microbial analysis related to quantifying microbial biotic associations. Thus, this workflow is expected to have broad applications in many fields across contrasting biomes and environments.

In previously published studies, microbial ecological networks were widely used to infer ecological associations between pairs of co-occurring taxa, but often drastically misused as microbial interactions. This study built a general workflow for quantifying biotic associations that avoids the exaggerated description of ecologically meaningless associations, and thus, disentangles assembly processes in significant ecological associations. Every step in the workflow is well documented in previous studies (#1: Spiec-Easi, #2: Goberna’ s method, #3: Cohesion, #4: microeco package), making this effort a process-based as opposed to technology-based innovation.