One Ecosystem :
Review Article

Corresponding author: James Benjamin Grace (gracej@usgs.gov)
Academic editor: Gbenga Akomolafe
Received: 14 Apr 2021  Accepted: 03 Jul 2021  Published: 08 Jul 2021
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem 6: e67320. https://doi.org/10.3897/oneeco.6.e67320

In this paper, we consider the problem of how to quantitatively characterise the degree to which a study object exhibits a generalised response. By generalised response, we mean a multivariate response where numerous individual properties change in concerted fashion due to some internal integration. In latent variable structural equation modelling (LVSEM), we would typically approach this situation using a latent variable to represent a general property of interest (e.g. performance) and multiple observed indicator variables that reflect the specific features associated with that general property. While ecologists have used LVSEM in a number of cases, there is substantial potential for its wider application. One obstacle is that LV models can be complex and easily overspecified, degrading their value as a means of generalisation. It can also be challenging to diagnose causes of misspecification and understand which model modifications are sensible. In this paper, we present a protocol, consisting of a series of questions, designed to guide the researchers through the evaluation process. These questions address: (1) theoretical development, (2) data requirements, (3) whether responses to perturbation are general, (4) unique reactions by individual measures and (5) how far generality can be extended. For this illustration, we reference a recent study considering the potential consequences of maintaining biodiversity as part of agricultural management on the overall quality of grapes used for winemaking. We extend our presentation to include the complexities that occur when there are multiple species with unique reactions.
generalised responses, latent variables, multivariate responses, statistical modelling, structural equation modelling
The quest for generalisation in the ecological sciences is a fundamental challenge. One way that a general reaction by a system or organism can be detected is if there is a multivariate response where numerous individual properties change in concerted fashion. While such concerted reactions are often described using standard multivariate statistical analyses, causal investigations of the nature of integrated multivariate responses fall primarily into the purview of latent variable structural equation modelling (LVSEM,
Fig.
Structural equation metamodel representing the general modelling goal. Note that a metamodel is a generalisation that defines a finite set of possible fully specified models. Dotted outlines are used to convey that the entities represented are general concepts rather than specific variables.
The hypothesis represented in Fig.
Fig.
Structural equation model representing the hypothesis that observed intercorrelations amongst response indicator variables (1 – 4) can be explained by a common cause, the generalised response. In contrast to Fig.
Numerous complexities can be encountered when analysing models containing latent variables with multiple indicators. The available literature is arguably inadequate for helping the beginning user of SEM navigate the various diagnostics and decisions required for such models. Our primary objective in this paper is to provide a series of questions that can guide the investigator through the process. Our advice is targeted for the general objective outlined in Fig.
There exist many technical descriptions of the analytical machinery used to implement LVSEM. Here, we provide a nontechnical summary and refer the reader to
Fig.
Example SE model for the study of a general response to a specific perturbation and the role of a specific hypothesised mediator variable. See text for a discussion of notation.
The classical approach to implementing SEM involves the analysis of covariances. For this, the rows of raw data are converted into a square variance-covariance matrix. Hypothesised models represent a set of expectations about the patterns of covariances that should be found in data. Typically, covariance modelling estimates the parameters of the causal diagram via maximum likelihood while respecting the assumed causal relationships specified in the causal graph. Covariance SEM also produces a statistic that summarises the differences between the observed covariances and those predicted under the model structure, and tests the null hypothesis that the observed and predicted covariances are equal, except for random sampling variation. Failure to reject this null hypothesis indicates that the data are consistent with the assumed causal structure, although it does not prove that the structure is correct.
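To make the idea of model-implied covariances concrete, the following minimal sketch computes the covariance matrix implied by a single latent variable with four indicators. The loadings, latent variance and residual variances are hypothetical values chosen only for illustration; they are not estimates from the study discussed here.

```r
# Hypothetical parameter values (assumptions for illustration only)
lambda <- c(1.0, 0.8, 0.6, 0.9)   # loadings of four indicators on one latent variable
psi    <- 1.5                     # variance of the latent variable
theta  <- c(0.5, 0.7, 0.9, 0.4)   # residual (error) variances of the indicators

# Model-implied covariance matrix: Sigma = psi * lambda lambda' + diag(theta)
Sigma <- psi * tcrossprod(lambda) + diag(theta)
dimnames(Sigma) <- list(paste0("y", 1:4), paste0("y", 1:4))
print(round(Sigma, 3))

# Under this model, the covariance between two indicators depends only on
# their loadings and the latent variance
implied_12 <- lambda[1] * lambda[2] * psi
```

Estimation can then be thought of as searching for parameter values that bring this implied matrix as close as possible to the observed one.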
For LVSEM, model structure is described using equations representing the relationships between latent variables and their indicators and equations describing relationships amongst latent variables (Suppl. material
Concepts related to the structural equation metamodel in Figure 4 and their relationships to measured variables (from
| Concept of interest | Measurements | Scientific rationale |
|---|---|---|
| Management intensity | Intensity is a three-level index {1,2,3}. 1 = minimal control of inter-row vegetation, 2 = vegetation removal in every other row between grape plants, 3 = vegetation removal in all rows between grape plants. | The primary purpose of management is to reduce competitive effects of non-crop plants on grape plants. It is assumed that competition primarily acts through reductions in soil water and nutrients, but other forms of interference could be possible. |
| Non-crop vegetation properties | Plant species richness (numbers), abundance of N-fixing plants (% cover) | One possibility we wished to consider was a general beneficial effect of plant richness on grape qualities due to complementarity. Another possibility of interest was a specific effect of the abundance of N-fixing plants on grape properties due to facilitation. |
| Soil nitrogen | Total soil N content (%) | We considered it possible that variations in total soil N might help explain variations in grape N. Such an effect either might or might not be indirectly related to management intensity. |
| Grape qualities | Nitrogen concentration; sugar concentration; tartaric acid; malic acid | We measured a suite of standard grape chemical parameters of importance for winemaking. While all of these parameters determine the character of wine, N concentration is perhaps of primary concern because of its critical role in the fermentation process (Bell and Henschke 2005). |
The overall study objectives are summarised in Fig.
Question #1: What are the Anticipated Characteristics of the Theoretical Construct(s) of Interest?
We learn about latent variables indirectly. More specifically, we learn about them through theorising and empirical investigations, rather than direct measurement. It is important, therefore, that we consider the theoretical meaning of constructs carefully and explicitly. Most ecologists are accustomed to using descriptive procedures, such as principal components analysis (PCA), when faced with a set of related measurements. PCA seeks to reduce a set of variables to some smaller number of composite variables (aka components) that contain most of the information in the set. PCA is purely a data-reduction method and there is no basis for drawing causal interpretations of the resulting components (
With LVSEM, we might pose a hypothesis, such as the one shown in Fig.
(A) The initial hypothesis evaluated by
In thinking about our theoretical constructs, of fundamental importance is whether we think the concept is unidimensional (behaves like it is one thing) or multidimensional (behaves like a collection of different things). Taken literally (which software estimation will do), the hypothesis being evaluated in Fig.
There are many other possibilities that might be supported by theory. The most common alternative is that a theoretical “construct” or concept may be a collection of independent or semi-independent causes. The details that accompany this situation are beyond our purpose in this paper and the reader is referred to
Question #2: Are there Appropriate Measured Variables that can Serve as Indicators of the General Theoretical Constructs?
When interested in a general property of a study system, it is recommended that one gives careful consideration to the previous question about expected attributes when designing the sampling scheme. This is one of those interesting differences between science practice in the social sciences versus the ecological sciences. In the social sciences, particularly when studies involve human behaviour, the default assumption is that the latent properties are of primary interest. Studies may involve human attitudes and motivations, which are assumed from the outset to be “deeply latent” and only discernible indirectly. This has led to the development of a process for careful consideration of the development of proper measures for the constructs of interest. For example, the American Psychological Association Dictionary of Psychology (
“The process of creating a new instrument [a set of specific measurements] for measuring an unobserved or latent construct, such as depression, sociability, or fourth-grade mathematics ability. The process includes defining the construct and test specifications, generating items and response scales, piloting the items in a large sample, conducting analyses to fine-tune the measure, and then readministering the refined measure to develop norms (if applicable) and to assess aspects of reliability and validity.”
Our purpose here is to raise awareness of the fact that there has been substantial development of methodologies in other scientific disciplines that could be of interest to natural scientists, but that has been systematically ignored to the detriment of our scientific studies. It is beyond the scope of the present paper to consider this body of knowledge in detail, though the expected requirements for a set of indicators to represent a theoretical construct will be illustrated via our presentation. For a more general introduction to scale development, one can refer to
When one wishes to develop a latent variable SE model, it is possible to proceed with one or more indicator measurements. Having only a single measure provides limited opportunities. The most commonly adopted approach is simply to assume that the measured variable is a perfect representation of the latent property. The main accomplishment of such a model is to make a conceptual distinction between the concept of interest and the observed measure. When we have some estimate of the reliability (repeatability) of a measurement process, we can insert that information into our model and remove bias due to measurement error. Once we have two or more indicators, it becomes possible to test for the presence of a latent cause. This is the example situation we address in the current paper.
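As a rough sketch of how a reliability estimate might be inserted for a single-indicator latent variable, the indicator's error variance can be fixed at (1 - reliability) times its observed variance. The reliability of 0.90 below is a hypothetical value, not one reported in the study; the variance of Mgt (1.366) is taken from the covariance matrix supplied with the R code later in the paper. The generated text uses lavaan's `~~` operator with a fixed numeric value.

```r
reliability <- 0.90    # hypothetical value, assumed for illustration
var_mgt     <- 1.366   # observed variance of Mgt from the covariance matrix

# Portion of the observed variance attributed to measurement error
err_var <- (1 - reliability) * var_mgt

# lavaan model syntax that fixes the indicator's error variance at that value
model_syntax <- sprintf("ManInten =~ 1*Mgt\nMgt ~~ %.4f*Mgt", err_var)
cat(model_syntax, "\n")
```

Setting the reliability to 1 reproduces the common default of treating the measured variable as a perfect representation of the latent property.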
Indicator validity refers to the requirement that measured variables are interpretable as measures of the concept of interest. This is a theoretical requirement, but one that should be addressed explicitly in a paper. We recommend constructing a table such as Table 1 as a formal means of stating the logic that connects indicators to latent variables.
Question #3: What do the Patterns of Intercorrelations Amongst Indicator Variables Suggest?
It is one thing to conceptualise a set of observed variables as reflections of a concept of interest, but it is another thing for the data to agree with one’s conceptualisation. A simple first approach to this problem is to construct a correlation matrix to see if the patterns of correlations amongst indicator variables are roughly consistent with theoretical expectations. For this exercise, we focus on the submodel shown in Fig.
Fig.
When one starts working with LVSEM, it is found that there are many ways that data may deviate from showing equal correlation strengths amongst indicators, aside from error correlations, some of which are suggested in Fig.
| | Nitrogen | Sugars | Tartaric Acid | Malic Acid |
|---|---|---|---|---|
| Nitrogen | 1.00 | | | |
| Sugars | 0.53 | 1.00 | | |
| Tartaric Acid | 0.40 | 0.36 | 1.00 | |
| Malic Acid | 0.61 | 0.41 | 0.27 | 1.00 |
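The correlations tabulated above can be recovered from the covariance matrix supplied with the R code later in the paper. A minimal base-R sketch, using the first four rows of that lower-triangular matrix (the object names here are illustrative, not the authors'):

```r
# Lower triangle (row by row) of the covariance matrix for the four grape indicators
vals <- c(2.602,
          1.187, 1.896,
          1.038, 0.781, 2.536,
          1.270, 0.726, 0.559, 1.688)
vars <- c("Nitrogen", "Sugars", "Tartaric", "Malic")

S <- matrix(0, 4, 4, dimnames = list(vars, vars))
S[upper.tri(S, diag = TRUE)] <- vals     # row-wise lower triangle = column-wise upper triangle
S[lower.tri(S)] <- t(S)[lower.tri(S)]    # mirror to make the matrix symmetric

R <- cov2cor(S)                          # rescale covariances to correlations
print(round(R, 2))
```

Rounded to two decimals, the off-diagonal values agree with the table, which offers a quick check that the covariance input has been entered correctly before any model is fitted.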
Question #4: Do Analyses Support There Being a Generalised Response?
It is customary in SEM practice to analyse latent variable models in two stages, first evaluating the fit between latent variables and indicators (Fig.
Table 3 presents the code used to conduct a CFA examination of the model shown in Fig.
R code for the Latent Response Model (Fig.
library(lavaan)

# Lower triangle of the covariance matrix (row by row, including the diagonal)
input.cov <- '
2.602
1.187  1.896
1.038  0.781  2.536
1.270  0.726  0.559  1.688
0.592  0.451  0.147  0.219  1.670
0.821  0.364  0.455  0.578  0.864  1.366 '
cov.dat <- getCov(input.cov, names = c("N", "Sugars", "Tart", "Malic", "Nfixers", "Mgt"))

# One latent variable (GrapeQual) measured by four indicators
cfa1 <- 'GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic'

cfa1.fit <- sem(cfa1, sample.cov = cov.dat, sample.nobs = 50)
Tables of results for all models run in the paper are provided in Suppl. material
Examination of results focuses initially on overall model fit (Suppl. material
Results show strong support for our initial model (Table S2.1). A test statistic (Model Chi-square) value of 0.808 with an associated p-value of 0.668 was found. This p-value is well above the 0.05 criterion, indicating no detectable major model-data discrepancies. A Comparative Fit Index value of 1.000 further indicates a near-perfect explanation of the observed covariances by the model. Thus, it is extremely unlikely that additions to our model, such as shown in Fig.
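The reported p-value can be reproduced from the test statistic once the model degrees of freedom are known. The sketch below assumes the usual identification for a single latent variable with four indicators (first loading fixed to 1), giving three free loadings, four residual variances and one latent variance. That parameter count is an assumption about the specification, not a figure quoted from the results table.

```r
n_indicators <- 4
n_moments <- n_indicators * (n_indicators + 1) / 2   # unique variances and covariances
n_free    <- 3 + 4 + 1    # free loadings + residual variances + latent variance (assumed)
df        <- n_moments - n_free

chisq   <- 0.808          # Model Chi-square reported in the text
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
print(c(df = df, p = round(p_value, 3)))
```

Under these assumptions the model has 2 degrees of freedom, and the upper-tail probability rounds to the 0.668 given in the text.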
Having assessed the global model fit, we turn attention to the parameter estimates (Table S2.1). Again, we do not treat p-values as absolute cutoffs, but instead as continuous measures of evidence that a parameter or model deviates from the default expectation (
Question #5: Does the Generalised Response Exhibit a Concerted Reaction to Perturbation? and
Question #6: Are there Unique Reactions by Specific Indicators?
The complexity of SE models and the variety of inferences we typically wish to make lead us to move through the evaluation of our overall hypothesis in stages. It is important to keep in mind that conclusions one might draw, based on the analysis of submodels, may need to be reconsidered once the full model is examined. Having examined the latent response submodel, we now move to a pair of competing models shown in Fig.
In Fig.
## Initial Net Effect Model
LVNet1 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten =~ lambda5*Mgt
  GrapeQual ~ gamma1*ManInten'
LVNet1.fit <- sem(LVNet1, sample.cov = cov.dat, sample.nobs = 50)
show(LVNet1.fit); fitMeasures(LVNet1.fit, "cfi")
subset(modindices(LVNet1.fit), mi > 3)  # list modification indices above 3
## Revised Net Effect Model
LVNet2 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten =~ lambda5*Mgt
  GrapeQual ~ gamma1*ManInten
  Tart ~ gamma3*ManInten'
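One way the initial and revised net effect models might be compared side by side is sketched below. It reuses the model syntax and covariance values given in the paper, fits both models only if the lavaan package is installed, and tabulates a few global fit measures. The object names `lower` and `fits` are illustrative, and no particular output is implied.

```r
# Model syntax as given above, one statement per line
LVNet1 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten  =~ lambda5*Mgt
  GrapeQual ~ gamma1*ManInten
'
# Revised model: add the indicator-specific effect on Tart
LVNet2 <- paste0(LVNet1, "  Tart ~ gamma3*ManInten\n")

if (requireNamespace("lavaan", quietly = TRUE)) {
  lower <- '
    2.602
    1.187 1.896
    1.038 0.781 2.536
    1.270 0.726 0.559 1.688
    0.592 0.451 0.147 0.219 1.670
    0.821 0.364 0.455 0.578 0.864 1.366'
  cov.dat <- lavaan::getCov(lower,
                            names = c("N", "Sugars", "Tart", "Malic", "Nfixers", "Mgt"))
  fits <- list(
    initial = lavaan::sem(LVNet1, sample.cov = cov.dat, sample.nobs = 50),
    revised = lavaan::sem(LVNet2, sample.cov = cov.dat, sample.nobs = 50)
  )
  # Global fit measures for the two models, one column per model
  print(sapply(fits, lavaan::fitMeasures,
               fit.measures = c("chisq", "df", "pvalue", "cfi")))
}
```

Building the revised syntax from the initial one keeps the two specifications identical apart from the single added path, which makes the comparison easier to audit.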
Results for the initial model (Fig.
As illustrated in Fig.
Our second question, represented in Fig.
### Initial Mediated Effect Model
LVmed1 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten =~ lambda5*Mgt
  NonCrop =~ lambda6*Nfixers
  GrapeQual ~ gamma1*ManInten + beta1*NonCrop
  NonCrop ~ gamma2*ManInten'
### Revised Mediated Effect Model
LVmed2 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten =~ lambda5*Mgt
  NonCrop =~ lambda6*Nfixers
  GrapeQual ~ gamma1*ManInten + beta1*NonCrop
  NonCrop ~ gamma2*ManInten
  Tart ~ gamma3*ManInten' # added direct effect
Question #7: Can we Simplify the Model, Thereby Increasing Generality?
Since SE models are used for explanatory representations of scientists' understanding of systems (
Regarding our example, we next examine individual parameter estimates to determine whether model LVmed2 can be simplified (Table S2.4). P-values provide strong support for all estimated lambdas (all < 0.001), as well as all other estimated parameters, except beta1 (p = 0.713), which is the effect of the mediator NonCrop Vegetation on Grape Qualities. We estimated a simplified model (not shown) with beta1 set to zero (beta1 == 0) and determined that model fit was improved: discrepancy increased only very slightly while the number of estimated parameters was reduced by one (fit being a measure of the amount of discrepancy pro rata to the number of estimated parameters). We continue discussing ways to minimise the number of estimated parameters in the next section where we address the complexity that arises when there is more than one variety of grape being modelled.
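The logic of judging a single constraint can be sketched with a chi-square difference calculation. The helper `chisq_diff_test` and the numbers passed to it are hypothetical, included only to show the arithmetic; they are not results from the paper.

```r
# Chi-square difference test for a restricted model nested within a fuller model
chisq_diff_test <- function(chisq_restricted, df_restricted, chisq_full, df_full) {
  d_chisq <- chisq_restricted - chisq_full
  d_df    <- df_restricted - df_full
  stopifnot(d_df > 0, d_chisq >= 0)
  c(diff = d_chisq, df = d_df,
    p = pchisq(d_chisq, df = d_df, lower.tail = FALSE))
}

# Hypothetical values: fixing one parameter to zero raises the discrepancy slightly
res <- chisq_diff_test(chisq_restricted = 5.14, df_restricted = 8,
                       chisq_full = 5.00, df_full = 7)
print(round(res, 3))

# For a single constraint, the increase in discrepancy would need to exceed
# this critical value to be judged significant at the 0.05 level
crit <- qchisq(0.95, df = 1)
print(round(crit, 2))
```

A small increase in discrepancy relative to this critical value suggests that the constrained parameter can be dropped without materially worsening the agreement between model and data.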
Question #8: What About Generality Across Groups?
LVSEM has the capacity to formally evaluate parameter equality across groups. Referred to as multigroup analysis, the investigator can test hypotheses by asking whether models of the same general form apply beyond single groups. With regard to the Swiss grape study, the investigators sampled vineyards that cultivated two different varieties of grapes, Chasselas and Pinot noir. Suppl. material
If multigroup models are specified without constraints, all parameters will be independently estimated for each group by default. One way to set equality constraints across groups is to add labels to the code. In this case, one first uses the format c("label1", "label2") to create names for the parameters where there are two groups. This example will generate two independent parameter estimates, one for each group, since the labels are unique. If we specify c("lambda1", "lambda1"), the repeated use of a common label means a single value will be estimated for both groups (Table
## CFA independence model with distinct labels for each group
mg.mod0 <- '
  GrapeQual =~ c("lambda1a","lambda1b")*N + c("lambda2a","lambda2b")*Sugars +
               c("lambda3a","lambda3b")*Tart + c("lambda4a","lambda4b")*Malic'
## CFA model with parameters equal across groups (using repeat labels)
mg.mod1 <- '
  GrapeQual =~ c("lambda1","lambda1")*N + c("lambda2","lambda2")*Sugars +
               c("lambda3","lambda3")*Tart + c("lambda4","lambda4")*Malic'
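Because the only difference between the two specifications is whether the labels repeat, the syntax can be generated rather than typed by hand. The helpers `group_label` and `make_cfa` below are hypothetical conveniences written in base R for illustration; they are not part of lavaan or of the paper's code.

```r
# Build lavaan's per-group label syntax, e.g. c("lambda1a","lambda1b")
group_label <- function(base, equal = FALSE, suffixes = c("a", "b")) {
  labels <- if (equal) rep(base, length(suffixes)) else paste0(base, suffixes)
  sprintf("c(%s)", paste0('"', labels, '"', collapse = ","))
}

indicators <- c("N", "Sugars", "Tart", "Malic")

# Assemble the measurement model for GrapeQual with free or equal loadings
make_cfa <- function(equal) {
  terms <- mapply(
    function(i, v) paste0(group_label(paste0("lambda", i), equal), "*", v),
    seq_along(indicators), indicators
  )
  paste("GrapeQual =~", paste(terms, collapse = " + "))
}

mg.free  <- make_cfa(equal = FALSE)  # separate loadings for each group
mg.equal <- make_cfa(equal = TRUE)   # loadings constrained equal across groups
cat(mg.free, mg.equal, sep = "\n")
```

Generating both versions from one function reduces the chance that a mistyped label accidentally frees or constrains a parameter.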
Using the approach in Table
mg.mod4 <- '
  # declare latent variables
  GrapeQual =~ c("lambda1","lambda1")*N + c("lambda2","lambda2")*Sugars +
               c("lambda3","lambda3")*Tart + c("lambda4","lambda4")*Malic
  ManInten =~ c("lambda5","lambda5")*Mgt
  NonCrop =~ c("lambda6","lambda6")*Nfixers
  # regressions
  GrapeQual ~ c("gamma1a","gamma1b")*ManInten + c("beta1a","beta1b")*NonCrop
  NonCrop ~ c("gamma2a","gamma2b")*ManInten
  Tart ~ c("gamma3a","gamma3b")*ManInten
  # set constraints
  beta1a == 0
  gamma3b == 0
  gamma2a == gamma2b'
It is important to be able to judge whether a system exhibits a generalised multivariate response to environmental change rather than an independent collection of uncoordinated responses. This paper presents an approach to addressing that question. A particular aspect of the approach demonstrated is that it invokes causal reasoning. We ask if suites of observed properties behave as if they are jointly influenced by a “hidden hand” or integrative cause.
Studying generalised responses is inherently challenging. Our objective is to focus our attention on the general, while moving the specifics to the background, at least initially. The sequence of operations described supports a “general first, specifics second” perspective. Ultimately, SEM forces us to address both. Along the way, we must confront the large number of possible explanations that can exist for the actual functioning of the system being studied. This complexity means one cannot take a rigid approach, but must follow clues along a path to selecting a final model to use for interpretation. We suggest a series of questions that can guide investigators through several critical steps in model evaluation. In addition, we recognise that the research context matters, so the list may need to be modified for particular applications.
Success in applying a flexible, adaptive approach requires a solid understanding of how the analytical system ‘thinks’ about things. Within LVSEM, latent variables represent the common variance or overlapping information for a set of measures. They represent, in essence, the consensus opinion about the latent factor that functions as their common causal connection. There will, of course, be unique information associated with the individual measures, particularly if they are selected to represent multiple facets of a theoretical construct. Our core challenge is to capture the general opinions of the data without becoming overly distracted by the unique responses.
Fig.
A number of mysteries are exposed in our multigroup model (Fig.
It is our hope that this paper both demonstrates how to use LVSEM to investigate multivariate responses and hints at the variety of scientific insights that can be gleaned from the effort. We believe there is an important opportunity for LVSEM to play a greater role in our quantitative understanding of ecological responses to environmental change.
We thank two anonymous reviewers for helpful comments and suggestions. This work was supported by the USGS Ecosystems and Land Change Science Climate Research and Development Programs. Any use of trade, firm or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
This text file contains the equations and notation mentioned in Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem.
This file contains the results tables for the demonstrations included in Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem.
This text file contains the R code used to develop the demonstrations included in Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem.