Quantify Community-level Microbial Interactions

Method motivation

Microbes do not operate as independent entities, but instead form a complex network to jointly shape biodiversity and regulate ecosystem functions. However, quantifying biotic associations among pairs of microbes at the community level on behalf of potential facilitation and competition across contrasting environments is extremely challenging due to the organism’s intricacy and associated culturing difficulties.

We developed a workflow—the quantifying community-level microbial interactions ‘qcmi’ R package—to identify putative biotic associations from microbial ecological networks and quantify them at the community level. This R package is based on a series of common and advanced approaches in microbial ecology, and is designed to be user-friendly, easily applied, and flexible.

Package features

qcmi computational modules are independent of each other and can be manually selected according to the user’s needs.

qcmi assigns the community assembly processes to microbial associations to identify putative biotic associations from complex microbial ecological networks.

qcmi was developed based on matrix multiplication to quantify the strength of putative biotic associations from the taxon level to the community level.

Main steps and key functions

The method consists of four main steps: (ⅰ) constructing ecological networks, (ⅱ) assigning assembly processes, (ⅲ) quantifying biotic associations, and (ⅳ) assessing ecological consequence.

First, reliable ecological networks were constructed, after which some abiotic-driven associations, i.e., dispersal limitations and environmental selections, were filtered out by link tests. This ensured that the filtered networks represent putative biotic associations as far as possible. Second, the average associating strength was weighted with corresponding microbial abundance, and the resulting metrics for each sample were quantified to describe microbial biotic associations at the community level. After verifying that the metrics improved the application methodology for interpreting and predicting microbial biodiversity and functions, and confirming that this method offers a simple way for microbial ecologists to explore the importance of community-level negative (antagonistic) and positive (mutualistic) associations in their study systems, the workflow was developed as a flexible R package (https://github.com/joshualiuxu/qcmi).

The igraph format is the basic class. All the other classes depend on the igraph package.

The objects inside the rectangle with dot line represent input files. The red rectangle with full line means it is output file or object. The italics denotes the key functions that deserve more attention.

The trans_ps() function converts the OTU abundance table and the corresponding taxonomic information into the phyloseq format, which enables the user to customize the types of data included in the analysis.

The filter_ps() function utilizes phyloseq for filtering data by both ubiquity and abundance.

The cal_network() function was applied to construct ecological networks.

the methods of random matrix theory rmt() and idirect() were also provided to filter ecological associations

The assigned_process() function, which incorporates the test_link_env() and test_link_dl() embedded functions, is a tool for calculating the relative contribution of assembly processes to microbial ecological associations (also called link tests in previous study).

The number of zeros was calculated using zero(), while pos() and neg() were applied to calculate the average value of all positive and negative connectedness for each taxon, respectively.

Using a qcmi() core function, the species (corresponding to ASVs or phylotypes) associations (connectedness) weighted corresponding abundance was quantified to the community level via matrix multiplication.

A standard method that included the cal_alphacon() and cal_betacon() functions was compiled to explore the ecological consequence of biotic associations between microbes on biodiversity as well as functions.

Example for a general workflow

This method was applied to empirical data including otu_table, tax_info, geo_info, and env_info.

We explained the results obtained from the workflow and the functions involved in each step.

Firstly, we should load related R packages

including phyloseq ggClusterNet tidyverse WGCNA igraph Matrix vegan reticulate

like this:

library("phyloseq")
library("ggClusterNet")
library("tidyverse")
library("WGCNA")
library("igraph")
library("Matrix")
library("vegan")
library("magrittr")
library("reticulate")
library("SpiecEasi")
library("ggstatsplot")

If these related packages are not installed, first install it

# from CRAN
#install.packages("reticulate")
#install.packages("magrittr")
#install.packages("vegan")
#install.packages("Matrix")
#install.packages("igraph")
#install.packages("WGCNA")
#install.packages("tidyverse")
#install.packages("ggalluvial")
#install.packages("ggstatsplot")
# from BiocManager
#BiocManager::install("phyloseq")
#BiocManager::install(c("AnnotationDbi", "impute","GO.db", "preprocessCore"))
#from GitHub
#devtools::install_github("taowenmicro/ggClusterNet")
#devtools::install_github("zdk123/SpiecEasi")

We install the qcmi package from the GitHub and load it

# If devtools package is not installed, first install it
#install.packages("devtools")
#devtools::install_github("joshualiuxu/qcmi")
library(qcmi)

Then, we set the working path

wd="./data"
save.wd="./result"
if(!dir.exists(wd)){dir.create(wd)}

Collate the data involved in the workflow of qcmi and construct microbial ecological networks

setwd(wd)
otu=read.table("otu_table.txt",header=TRUE, row.names=1)
tax=read.table("tax.txt",header=TRUE, row.names=1)
env=read.table("env.txt",header=TRUE, row.names=1)
geo=read.table("geo.txt",header=TRUE, row.names=1)

Change the work path for saving the subsequent output results

if(!dir.exists(save.wd)){dir.create(save.wd)}
setwd(save.wd)

Check the imported feature table

library(kableExtra)
kable(head(otu,c(6L, 10L)), "html") %>%
kable_styling(full_width = F)

	LX1	LX2	LX3	LX4	LX5	LX6	LX7	LX8	LX9	DZ1
4479944	40	30	26	31	18	40	27	35	34	54
244331	0	0	0	0	0	0	2	1	0	1
244336	0	0	0	0	0	0	0	0	0	0
973124	0	0	0	0	0	0	0	0	0	0
4436710	1	1	0	0	2	0	0	0	0	0
101767	0	0	0	0	0	0	0	0	0	0

Check the imported taxonomic table

library(kableExtra)
kable(head(tax,c(6L, 6L)), "html") %>%
kable_styling(full_width = F)

	Phylum	Class
4479944	Actinobacteria	MB-A2-108
244331	Proteobacteria	Alphaproteobacteria
244336	Firmicutes	Bacilli
973124	Bacteroidetes	[Saprospirae]
4436710	Nitrospirae	Nitrospira
101767	Proteobacteria	Alphaproteobacteria

Check the imported environmental table

library(kableExtra)
kable(head(env,c(6L, 6L)), "html") %>%
kable_styling(full_width = F)

	pH	SM	OC	DOC	TN	DTN
LX1	8.36	17.04036	10.586006	52.24753	1.258441	83.42257
LX2	8.48	25.89928	12.247537	58.36148	1.346855	82.19653
LX3	8.39	29.92701	10.801080	65.17704	1.238134	18.52955
LX4	8.61	20.51282	9.196798	58.71469	1.011230	26.72758
LX5	8.18	15.67329	12.329535	51.06561	1.284343	240.60114
LX6	8.65	26.65037	11.303184	54.70042	1.246163	26.16724

Check the imported geographical table

library(kableExtra)
kable(head(geo,c(6L, 2L)), "html") %>%
kable_styling(full_width = F)

	long	lat
LX1	116.6106	37.28731
LX2	116.6251	37.30519
LX3	116.6336	37.32939
LX4	116.6489	37.26731
LX5	116.6633	37.28381
LX6	116.6663	37.30561

Next, we convert these data to phyloseq format and filter the feature table by ubiquity and abundance

ps= trans_ps(otu_table = otu, taxa_table =  tax)
print(ps)

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 74750 taxa and 243 samples ]
## tax_table()   Taxonomy Table:    [ 74750 taxa by 2 taxonomic ranks ]

Based on experience and published studies, we generally recommend to retain taxa with > 25% occurrence throughout the whole community (ubiquity) and a relative abundance ≥ 0.001% (abundance)

filteredps= filter_ps(ps= ps, occurrence.threshold=60,  abundance.threshold=500)
print(filteredps)

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1420 taxa and 243 samples ]
## tax_table()   Taxonomy Table:    [ 1420 taxa by 2 taxonomic ranks ]

Then, weconstruct the ecological network through different methods (e.g., Pearson, Spearman, SpiecEasi)

It is worth noting that the correlation-based approach has its own drawbacks including the above mentioned difficulty in dealing with zero-inflated models (ignoring rare species in the community) and filtering for weakly correlated relationships (e.g., commensalism and amensalism)

Spiec-Easi’s estimate of entries in the inverse covariance matrix depends on the conditional states of all available nodes, which helps avoid detection of indirect network interactions; thus, we recommend this method to infer the network.

#To calculate the speed of cases, we selected the top 100 taxa with the highest relative abundance as the display
result_net= cal_network(ps = filteredps,N=100,lambda.min.ratio=1e-2,nlambda=20,method ="spieceasi")
summary(result_net)

##           Length Class         Mode     
## occor.r   10000  dsCMatrix     S4       
## elist         3  sparseSummary list     
## method        1  -none-        character
## phyloseq      1  phyloseq      S4       
## igraph      100  igraph        list     
## result.se     5  pulsar.refit  list

To evaluate the network reliability, we use the function of getStability()

#around 0.05 is acceptable
net_reliability = getStability(result_net$result.se)
print(net_reliability)

## [1] 0.04959063

Similarly, if you have used other network inference methods to build your network, you have the flexibility to import the igraph format file here.

write.csv(result_net$elist,file="edgelist.csv")

import edgelist into R and convert to igraph

edgelist <- read.csv("D:/YANJIUSHENG/method/qcmi/v0107/RMD/result/edgelist.csv")
ig.spieceasi=graph_from_edgelist(as.matrix(edgelist[,-3]) , directed = FALSE)  
ig.spieceasi=set_edge_attr(ig.spieceasi, 'weight', index = E(ig.spieceasi),  as.numeric(edgelist[,3]) )

Check the number of nodes and edges in the constructed network

length(V(ig.spieceasi))#number of nodes

## [1] 100

length(E(ig.spieceasi))#number of edges

## [1] 240

View the distribution of association weights

hist(result_net$elist[,3], main='', xlab='edge weights')

Assign edgelist formatted network results to a new object

edgelist=result_net$elist
library(kableExtra)
kable(head(edgelist,c(6L, 3L)), "html") %>%
kable_styling(full_width = F)

i	j	x
1	4	0.0095135
5	6	0.3037520
5	7	0.0680487
3	9	0.0290132
4	11	-0.0980523
10	11	0.0403767

Optional

Disentangling direct from indirect relationships in association networks could be conducted by idirect() Note that since idirect is compiled from python, the function only provides support for py3.5 3.8 3.9, cited by Xiao et al., 2022 First, save the edgelist file to txt format, convenient to calculate by python module in R

Notably, idirect currently only handles absolute values of association weights and needs to reassign the positive and negative type after the calculation is complete.

idirect(pathname='D:/YANJIUSHENG/method/qcmi/v0107/RMD/result', filename = "D:/YANJIUSHENG/method/qcmi/v0107/RMD/result/edgelist.txt", outputname = "D:/YANJIUSHENG/method/qcmi/v0107/RMD/result/output_idirect1.txt")

## [1] 240

In addition to idirect() for disentangling direct from indirect relationships, if the correlations requires the RMT algorithm rmt() to filter the correlation coefficients, the RMT function wiil be calculated.

result_net1 = cal_network(ps = filteredps,N=100,r.threshold=0,p.threshold=0.05,method ="pearson")
adj_mat=result_net1$occor.r
tc1 <- rmt(adj_mat ,lcor=0.4, hcor=0.6)

## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50
## [1] 51
## [1] 52
## [1] 53
## [1] 54
## [1] 55
## [1] 56
## [1] 57
## [1] 58
## [1] 59
## [1] 60

print(message("The optimized COR threshold: ", tc1, "...\n"))

## NULL

adj_mat[abs(adj_mat)< tc1] = 0

Convert the correlation-based approach to igraph format as well to compare with the network inferred by Spiec-Easi

ig.pearson=graph_from_adjacency_matrix(adj_mat, mode =  "undirected",weighted = TRUE, diag = FALSE)

We also calculate the topology for the level of nodes and networks, using ggClusterNet package

#for Spiec-Easi
results_node_properties_se =node_properties(ig.spieceasi)
library(kableExtra)
kable(head(results_node_properties_se,c(6L, 3L)), "html") %>%
kable_styling(full_width = F)

igraph.degree	igraph.closeness	igraph.betweenness
4	0.2903226	106.1048
5	0.2886297	350.6054
2	0.2578125	0.0000
7	0.3036810	202.3105
7	0.3224756	148.4480
6	0.2928994	108.0314

results_net_properties_se =net_properties(ig.spieceasi)
library(kableExtra)
kable(head(results_net_properties_se,c(6L, 1L)), "html") %>%
kable_styling(full_width = F)

	value
num.edges	240.0000000
num.pos.edges	162.0000000
num.neg.edges	78.0000000
num.vertices	100.0000000
connectance	0.0484848
average.degree	4.8000000

#for Pearson
results_node_properties_pearson =node_properties(ig.pearson)
library(kableExtra)
kable(head(results_node_properties_pearson,c(6L, 3L)), "html") %>%
kable_styling(full_width = F)

	igraph.degree	igraph.closeness	igraph.betweenness
828667	13	0.4545455	14.316054
996116	19	0.4846939	33.635714
223418	13	0.4398148	5.564854
1012112	42	0.5757576	139.460634
717396	43	0.5900621	116.885781
892000	36	0.5459770	80.349590

results_net_properties_pearson =net_properties(ig.pearson)
library(kableExtra)
kable(head(results_net_properties_pearson,c(6L, 1L)), "html") %>%
kable_styling(full_width = F)

	value
num.edges	1098.0000000
num.pos.edges	748.0000000
num.neg.edges	350.0000000
num.vertices	100.0000000
connectance	0.2218182
average.degree	21.9600000

Continuing with the example of the network obtained by Spiec-Easi inference, we assign the assembly processes to significant network associations

First, we calculate the relative abundance of involved nodes.

ps_sub = filter_OTU_ps(ps = filteredps,Top = 100)
otu_table = as.data.frame(t(vegan_otu(ps_sub)))
row.names(otu_table)<-c(1:100)

Classify ecological associations of dispersal limitation, based on the geographical distance

result_dl= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= geo,cutoff=0, method="dl")

library(kableExtra)
kable(head(result_dl,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)

	OTU1	OTU2	cor1	cor2	dl
16	11	20	0.307669753711242	0.254228347372594	yes
22	1	27	0.270631576262993	0.241122060035479	yes
66	27	49	0.241122060035479	0.381612264424945	yes

Classify ecological associations of environmental filtering, based on the selected environmental dissimilarity

First, detect redundancy of environment variables

library(Hmisc)
plot(varclus(as.matrix(env) ))

Remove variables with rho>0.5 to avoid multicollinearity

library(Hmisc)
env=env[,-c(3,6)]
plot(varclus(as.matrix(env) ))

In this case, we select soil pH, soil moisture, DOC and TN as the variables of environmental selection

result_pH= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['pH'],cutoff=0, method="ef")
result_SM= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['SM'],cutoff=0, method="ef")
result_DOC= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['DOC'],cutoff=0, method="ef")
result_TN= assigned_process(link_table_row=edgelist, OTUabd=otu_table, p=0.05 , data= env['TN'],cutoff=0, method="ef")

library(kableExtra)
kable(head(result_pH,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)

OTU1	OTU2	cor1	cor2	ef
1	4	0.268684799772225	0.443727414612255	yes
5	6	0.723948580141444	0.701867034073904	yes
5	7	0.723948580141444	0.57906776258965	yes

kable(head(result_SM,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)

	OTU1	OTU2	cor1	P1	cor2	P2	ef
16	11	20	0.0189453830340267	0.00116037984779224	0.0120862512867735	0.0382428571960613	yes
147	37	74	0.0195514514009895	0.000800203088686429	0.0161098650449364	0.00573659774176314	yes
198	22	89	0.0177079718909344	0.00239303550230058	0.0226595641331924	0.000101977576680781	yes

kable(head(result_DOC,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)

	OTU1	OTU2	cor1	P1	cor2	P2	ef
8	7	12	0.0185455371037244	0.00147174597248044	0.0124528633890703	0.03273469883588	yes
17	6	23	0.0209541041336835	0.000326509449779305	0.0122467876732842	0.0357303140797139	yes
26	4	31	0.0157444674909832	0.00693803171808941	0.0158319189797347	0.00663154390920595	yes

kable(head(result_TN,c(3L, 7L)), "html") %>%
kable_styling(full_width = F)

	OTU1	OTU2	cor1	P1	cor2	P2	ef
15	17	19	0.0138351354606266	0.0176861930560371	0.0200383479628713	0.00059062271339281	yes
55	17	44	0.0138351354606266	0.0176861930560371	0.0202390759778341	0.000522378970510267	yes
60	6	46	0.0181092671666077	0.00190038876227798	0.0197620914084732	0.000701857438086419	yes

Note: Some studies classify elevation as a comprehensive environment, while others classify it as dispersal limitation, which is only shown here.

#   Save the results
write.csv(result_pH,"result_pH.csv")
write.csv(result_SM,"result_SM.csv")
write.csv(result_DOC,"result_DOC.csv")
write.csv(result_TN,"result_TN.csv")
write.csv(result_dl,"result_dl.csv")

We combined microbial associations attributed to dispersal limitation and environmental selection to obtain putative biotic associations from the ecological network

library(dplyr)
total_link=row.names(edgelist)
ef_link=union(as.character(row.names(result_pH)),as.character(row.names(result_SM)))
ef_link=union(ef_link,as.character(row.names(result_DOC)))
ef_link=union(ef_link,as.character(row.names(result_TN)))

dl_link=as.character(row.names(result_dl))

bi_link=setdiff(total_link,ef_link)
bi_link=setdiff(bi_link,dl_link)

efedge=edgelist[ef_link,]
write.csv(efedge,"efedge.csv")

dledge=edgelist[dl_link,]
write.csv(dledge,"dledge.csv")

biedge=edgelist[bi_link,]
write.csv(biedge,"effilteredlist.csv")

Convert biotic associations to igraph format and assign weight

ig.bi=graph_from_edgelist(as.matrix(biedge[,1:2]) , directed = FALSE)
ig.bi=set_edge_attr(ig.bi, 'weight', index = E(ig.bi),  as.numeric(biedge[,3]) )

For the output files with igraph format, the node properties have no abundance information.

Now, we show the node colors with the Phylum information and the edges colors with the positive and negative correlations by the softwares of cytoscape and gephi.

We further carry out the qcmi process to quantify the complexity of microbial associations (including positive and negative relationships) at the community level

result_bi= qcmi(igraph= ig.bi, OTU= otu_table, pers.cutoff=0)

hist(result_bi[[1]], main='', xlab='negative connectedness')

hist(result_bi[[2]], main='', xlab='positive connectedness')

hist(result_bi[[3]], main='', xlab='the community complexity of negative associations')

hist(result_bi[[4]], main='', xlab='the community complexity of positive associations')

Finally, we used the results of community complexity of positive and negative associations of microbes in conjunction with microbial diversity and soil pH to assess the ecological consequences of putative biotic associations

#load used packages
library(MASS)
library(adespatial)
library(lmerTest)

We combine the biotic factors represented by putative biotic associations and the abiotic factors represented by environmental and spatial variables

#data=scale(data,center=T,scale=T)
data=cbind(result_bi[[3]],result_bi[[4]],env)
colnames(data)=c("PNA","PPA",colnames(env))
data=as.matrix(data)

library(kableExtra)
kable(head(data,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)

	PNA	PPA	pH	SM	DOC	TN	NO3
LX1	-0.0536575	0.0249428	8.36	17.04036	52.24753	1.258441	27.838636
LX2	-0.0496019	0.0283030	8.48	25.89928	58.36148	1.346855	24.498240
LX3	-0.0485577	0.0292142	8.39	29.92701	65.17704	1.238134	3.693799
LX4	-0.0487986	0.0254426	8.61	20.51282	58.71469	1.011230	7.867440
LX5	-0.0506057	0.0264540	8.18	15.67329	51.06561	1.284343	73.295009
LX6	-0.0496283	0.0225921	8.65	26.65037	54.70042	1.246163	7.719697

First, the contribution of biotic associations on alpha diversity of microbial community is assessed, using two common methods incluidng forward.sel and olslm

#forward.sel
res_forward.sel=cal_alphacon(data,"forward.sel",filteredps)

## Testing variable 1
## Testing variable 2
## Testing variable 3
## Testing variable 4
## Testing variable 5
## Procedure stopped (alpha criteria): pvalue for variable 5 is 0.242000 (> 0.050000)

library(kableExtra)
kable(head(res_forward.sel,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)

variables	order	R2	R2Cum	AdjR2Cum	F	pvalue
pH	3	0.3455316	0.3455316	0.3428159	127.237778	0.001
NH4	8	0.0793970	0.4249286	0.4201363	33.135496	0.001
PNA	1	0.0475305	0.4724591	0.4658372	21.533490	0.001
TN	6	0.0103507	0.4828098	0.4741175	4.763165	0.036

#olslm
res_olslm=cal_alphacon(data,"olslm",filteredps)

## Start:  AIC=1864.28
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 + 
##     DON + TP + TK + AP + AK
## 
##        Df Sum of Sq    RSS    AIC
## - TP    1       140 465120 1862.3
## - AK    1       186 465166 1862.4
## - AP    1       212 465191 1862.4
## - SM    1       483 465462 1862.5
## - TK    1       752 465731 1862.7
## - PPA   1       862 465842 1862.7
## - DOC   1      1033 466013 1862.8
## - DON   1      2377 467357 1863.5
## <none>              464980 1864.3
## - NO3   1      4004 468984 1864.4
## - TN    1      6266 471246 1865.5
## - pH    1     34551 499531 1879.7
## - PNA   1     38070 503050 1881.4
## - NH4   1     39685 504665 1882.2
## 
## Step:  AIC=1862.35
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 + 
##     DON + TK + AP + AK
## 
##        Df Sum of Sq    RSS    AIC
## - AP    1       103 465223 1860.4
## - AK    1       175 465294 1860.4
## - SM    1       410 465530 1860.6
## - PPA   1       798 465917 1860.8
## - TK    1       879 465999 1860.8
## - DOC   1       989 466109 1860.9
## - DON   1      2258 467378 1861.5
## <none>              465120 1862.3
## - NO3   1      4417 469536 1862.6
## - TN    1      7071 472191 1864.0
## + TP    1       140 464980 1864.3
## - PNA   1     38157 503277 1879.5
## - NH4   1     39870 504990 1880.3
## - pH    1     45290 510409 1882.9
## 
## Step:  AIC=1860.4
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 + 
##     DON + TK + AK
## 
##        Df Sum of Sq    RSS    AIC
## - AK    1       209 465432 1858.5
## - SM    1       363 465586 1858.6
## - TK    1       798 466021 1858.8
## - PPA   1       818 466041 1858.8
## - DOC   1       989 466212 1858.9
## - DON   1      2455 467678 1859.7
## <none>              465223 1860.4
## - NO3   1      4407 469630 1860.7
## - TN    1      7001 472224 1862.0
## + AP    1       103 465120 1862.3
## + TP    1        32 465191 1862.4
## - PNA   1     38130 503353 1877.5
## - NH4   1     39993 505216 1878.4
## - pH    1     48019 513242 1882.3
## 
## Step:  AIC=1858.51
## diversity[, 1] ~ PNA + PPA + pH + SM + DOC + TN + NO3 + NH4 + 
##     DON + TK
## 
##        Df Sum of Sq    RSS    AIC
## - SM    1       481 465913 1856.8
## - TK    1       623 466055 1856.8
## - PPA   1       758 466190 1856.9
## - DOC   1      1071 466503 1857.1
## - DON   1      2278 467710 1857.7
## <none>              465432 1858.5
## - NO3   1      4226 469657 1858.7
## - TN    1      6903 472335 1860.1
## + AK    1       209 465223 1860.4
## + AP    1       138 465294 1860.4
## + TP    1        19 465413 1860.5
## - PNA   1     37955 503387 1875.6
## - NH4   1     39790 505222 1876.5
## - pH    1     48914 514346 1880.8
## 
## Step:  AIC=1856.76
## diversity[, 1] ~ PNA + PPA + pH + DOC + TN + NO3 + NH4 + DON + 
##     TK
## 
##        Df Sum of Sq    RSS    AIC
## - PPA   1       601 466514 1855.1
## - TK    1       625 466538 1855.1
## - DOC   1      1132 467045 1855.3
## - DON   1      1954 467867 1855.8
## <none>              465913 1856.8
## - NO3   1      4469 470382 1857.1
## - TN    1      6436 472349 1858.1
## + SM    1       481 465432 1858.5
## + AK    1       327 465586 1858.6
## + AP    1        82 465831 1858.7
## + TP    1         4 465909 1858.8
## - PNA   1     37477 503390 1873.6
## - NH4   1     39462 505375 1874.5
## - pH    1     50013 515925 1879.5
## 
## Step:  AIC=1855.08
## diversity[, 1] ~ PNA + pH + DOC + TN + NO3 + NH4 + DON + TK
## 
##        Df Sum of Sq    RSS    AIC
## - TK    1       387 466901 1853.3
## - DOC   1      1241 467755 1853.7
## <none>              466514 1855.1
## - DON   1      3978 470492 1855.1
## - NO3   1      6003 472517 1856.2
## + PPA   1       601 465913 1856.8
## - TN    1      7132 473646 1856.8
## + SM    1       324 466190 1856.9
## + AK    1       237 466277 1857.0
## + AP    1       101 466413 1857.0
## + TP    1         0 466514 1857.1
## - PNA   1     37188 503702 1871.7
## - NH4   1     39925 506439 1873.0
## - pH    1     99116 565630 1899.9
## 
## Step:  AIC=1853.28
## diversity[, 1] ~ PNA + pH + DOC + TN + NO3 + NH4 + DON
## 
##        Df Sum of Sq    RSS    AIC
## - DOC   1      1218 468119 1851.9
## <none>              466901 1853.3
## - DON   1      4578 471480 1853.7
## - NO3   1      5907 472808 1854.3
## + TK    1       387 466514 1855.1
## + PPA   1       364 466538 1855.1
## + SM    1       351 466550 1855.1
## + AK    1        79 466822 1855.2
## + TP    1        30 466871 1855.3
## + AP    1        25 466876 1855.3
## - TN    1      7964 474865 1855.4
## - PNA   1     36864 503765 1869.7
## - NH4   1     39922 506823 1871.2
## - pH    1     98924 565826 1898.0
## 
## Step:  AIC=1851.91
## diversity[, 1] ~ PNA + pH + TN + NO3 + NH4 + DON
## 
##        Df Sum of Sq    RSS    AIC
## <none>              468119 1851.9
## - DON   1      4143 472263 1852.0
## - NO3   1      5419 473538 1852.7
## + DOC   1      1218 466901 1853.3
## + PPA   1       453 467667 1853.7
## + SM    1       391 467728 1853.7
## + TK    1       364 467755 1853.7
## + AK    1       137 467982 1853.8
## + AP    1        29 468091 1853.9
## + TP    1         8 468111 1853.9
## - TN    1      8003 476123 1854.0
## - PNA   1     37861 505980 1868.8
## - NH4   1     40483 508602 1870.1
## - pH    1     98598 566718 1896.4

library(kableExtra)
kable(head(res_olslm$coefficients,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	1344.7185641	55.3378541	24.300157	0.0000000
PNA	4011.4331168	918.1745939	4.368922	0.0000187
pH	18.6569407	2.6462309	7.050383	0.0000000
TN	-30.0485658	14.9593210	-2.008685	0.0457098
NO3	-0.1989349	0.1203622	-1.652802	0.0997008
NH4	-1.3299222	0.2943819	-4.517676	0.0000099

Then, we also evaluate to what extent the biotic associations explain the community composition (beta diversity) by the Mantel test

#Mantel
library(vegan)
res_mantel= cal_betacon(data,otu_table,"spearman")

library(kableExtra)
kable(head(res_mantel,c(6L, 7L)), "html") %>%
kable_styling(full_width = F)

variable_name	cor_method	corr_res	p_res	significance
PNA	spearman	0.4409807	0.001	***
PPA	spearman	0.5888146	0.001	***
pH	spearman	0.7306087	0.001	***
SM	spearman	0.0505215	0.040
DOC	spearman	0.0996807	0.002	**
TN	spearman	0.0009720	0.489

Next, we establish the relationships between biotic associations and environmental variables

set.seed(123)
ggcorrmat(
  data     = as.data.frame(data) ,
  colors   = c("#B2182B", "white", "#4D4D4D"),
   p.adjust.method = "none",
  title    = "Correlalogram for biotic and abiotic dataset"
)

In particular, the linear relationships between biotic associations and soil pH are established

set.seed(123)
ggscatterstats(
  data  = as.data.frame(data) ,
  x     = PPA,
  y     = pH,
  xlab  = "PPA",
  ylab  = "Soil pH",
  title = "Relationships between putative positive associations and Soil pH"
)

set.seed(123)
ggscatterstats(
  data  = as.data.frame(data) ,
  x     = PNA,
  y     = pH,
  xlab  = "PNA",
  ylab  = "Soil pH",
  title = "Relationships between putative negative associations and Soil pH"
)

Notes and outlooks

Overall, qcmi is a user-friendly and reliable workflow for quantifying putative biotic associations of microbes at the community level. The package features: (ⅰ) the ability to insert select data and/or variables at each step, thus enabling a customized approach for specific, individualized use; (ⅱ) assembly processes assignment to each significant association, which helps depict true biotic interactions; and (ⅲ) use of matrix multiplication for quantifying the nature of biotic associations from the taxon to community level. Furthermore, testing with empirical community data demonstrated that qcmi effectively removes abiotic-driven associations and significantly improves diversity variation explanations. This methodology circumvents and resolves known challenges and flaws in microbial analysis related to quantifying microbial biotic associations. Thus, this workflow is expected to have broad applications in many fields across contrasting biomes and environments.

In previously published studies, microbial ecological networks were widely used to infer ecological associations between pairs of co-occurring taxa, but often drastically misused as microbial interactions. This study built a general workflow for quantifying biotic associations that avoids the exaggerated description of ecologically meaningless associations, and thus, disentangles assembly processes in significant ecological associations. Every step in the workflow is well documented in previous studies (#1: Spiec-Easi, #2: Goberna’ s method, #3: Cohesion, #4: microeco package), making this effort a process-based as opposed to technology-based innovation.