Discriminant validity assumes that items should correlate more highly with one another than with items measuring other constructs with which they are theoretically not supposed to correlate. However, using hierarchical omega for disattenuation is problematic because it introduces an additional assumption that the minor factors (e.g., disturbances in the second-order factor model and group factors in the bifactor model) are also uncorrelated between the two scales, which is neither applied nor tested when reliability estimates are calculated separately for both scales, as is typically the case. I would conclude from this that the correlation matrix provides evidence for both convergent and discriminant validity, all in one analysis! Based on a study of measurement invariance assessment by Meade et al. We assessed the discriminant validity of the first two factors, varying their correlation as an experimental condition. This effect and the general undercoverage of the CIs were most pronounced in small samples. Additional evidence for discriminant validity was gathered by means of confirmatory factor analysis (CFA). For example, a correlation of .87 would be classified as Marginal. First, CICFA(cut) is less likely to be misused than χ2(cut). First, these comparisons involve assessing a single item or scale at a time, which is incompatible with the idea that discriminant validity is a feature of a measure pair. His methodological research focuses on reliability and validity, and he is the recipient of the 2015 Organizational Research Methods Best Paper Award. By analyzing the relationships between the reported item scales (e.g., 5 vs.
7 point) or variances, correlation estimates, and Δχ2 values, we could find instances where a model with an estimated correlation not close to 1 (e.g., .8) did not fit statistically significantly better than a model in which the covariance was constrained to be 1, whereas a model with a correlation close to 1.0 (e.g., .95) did differ significantly from the constrained model; at the same time, the Δχ2 clearly depended on the item scales. Eunseong Cho is a professor of marketing in the College of Business Administration, Kwangwoon University, Republic of Korea. All disattenuation techniques and CFA performed better, and in large samples (250, 1,000), their performance was indistinguishable. Following Cho’s (2016) suggestion and including the assumption of each reliability coefficient in its name will hopefully also reduce the chronic misuse of these reliability coefficients. In the discriminant validity literature, high correlations between scales or scale items are considered problematic (J. A. Shaffer et al., 2016; Voorhees et al., 2016). The fourth and final issue is that the χ2(1) technique is a very powerful test for detecting whether the factor correlation is exactly 1. Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Some studies demonstrated that correlations were not significantly different from zero, whereas others showed that correlations were significantly different from one. The conclusion section is structured as a set of guidelines for applied researchers and presents two techniques. Changes and additions by Conjoint.ly. The question is simple: how “high” do correlations need to be to provide evidence for convergence, and how “low” do they need to be to provide evidence for discrimination?
In the cross-loading conditions, we also estimated a correctly specified CFA model in which the cross-loadings were estimated. Example 2. We then review techniques that have been proposed for discriminant validity assessment, demonstrating some problems and equivalencies of these techniques that have gone unnoticed by prior research. We prove that the HTMT index is simply a scale score correlation disattenuated with parallel reliability (i.e., the standardized alpha) and thus should not be expected to outperform modern CFA techniques, which our simulation demonstrates. While this is not a problem for the χ2 test itself, it produces a warning in the software and may cause unnecessary confusion.14 This can be addressed by adding the implied equality constraints, but none of the reviewed works did this. While both measure the same quantity, they correlate only at approximately .45 because the temperature would always be out of the range of one of the thermometers, which would consequently display zero centigrade.18 In the social sciences, a well-known example is the measurement of happiness and sadness, two constructs that can be thought of as opposite poles of mood (D. P. Green et al., 1993; Tay & Jebb, 2018). In the smallest sample size (50), CFA was slightly more biased and less efficient than the disattenuation-based techniques, but the differences were in the third digit and thus inconsequential. As in the case of Study 1, convergent and discriminant validity were assessed using factor analysis. Equation 3 shows an equivalent scale-level comparison (part of category 1 in Table 2) focusing on two distinct scales k and l. The factor correlations are solved from the interitem correlations by multiplying with the left and right inverses of the factor pattern matrix to correct for measurement error and are then compared against a perfect correlation.
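The claimed identity between HTMT and a scale score correlation disattenuated with the standardized alpha can be checked numerically. The sketch below is a minimal illustration: the six-item correlation matrix and the item-to-scale assignment are invented for the example, but the identity holds for any inter-item correlation matrix.

```python
import numpy as np

def htmt(R, idx1, idx2):
    """HTMT: mean heterotrait correlation over the geometric mean of the
    mean monotrait (within-scale) correlations."""
    hetero = R[np.ix_(idx1, idx2)].mean()
    mono1 = R[np.ix_(idx1, idx1)][np.triu_indices(len(idx1), 1)].mean()
    mono2 = R[np.ix_(idx2, idx2)][np.triu_indices(len(idx2), 1)].mean()
    return hetero / np.sqrt(mono1 * mono2)

def std_alpha(R_sub):
    """Standardized (parallel) alpha from a within-scale correlation matrix."""
    k = R_sub.shape[0]
    rbar = R_sub[np.triu_indices(k, 1)].mean()
    return k * rbar / (1 + (k - 1) * rbar)

def disattenuated_scale_corr(R, idx1, idx2):
    """Correlation of unit-weighted sums of standardized items,
    disattenuated with the standardized alphas of the two scales."""
    r_ss = R[np.ix_(idx1, idx2)].sum() / np.sqrt(
        R[np.ix_(idx1, idx1)].sum() * R[np.ix_(idx2, idx2)].sum())
    a1 = std_alpha(R[np.ix_(idx1, idx1)])
    a2 = std_alpha(R[np.ix_(idx2, idx2)])
    return r_ss / np.sqrt(a1 * a2)

# Hypothetical inter-item correlation matrix: items 0-2 form scale A, items 3-5 scale B.
R = np.array([
    [1.00, 0.60, 0.55, 0.30, 0.28, 0.32],
    [0.60, 1.00, 0.50, 0.31, 0.29, 0.30],
    [0.55, 0.50, 1.00, 0.27, 0.33, 0.26],
    [0.30, 0.31, 0.27, 1.00, 0.70, 0.65],
    [0.28, 0.29, 0.33, 0.70, 1.00, 0.60],
    [0.32, 0.30, 0.26, 0.65, 0.60, 1.00],
])
print(htmt(R, [0, 1, 2], [3, 4, 5]))                     # the two values coincide
print(disattenuated_scale_corr(R, [0, 1, 2], [3, 4, 5]))
```

Because both expressions reduce to the mean between-scale correlation divided by the geometric mean of the mean within-scale correlations, the two functions return the same number for any input matrix.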
However, two conclusions that are new to the discriminant validity literature can be drawn: First, the lack of cross-loadings in the population (i.e., factorial validity) is not a strict prerequisite for discriminant validity assessment as long as the cross-loadings are modeled appropriately. Techniques Included in the Simulation of This Study. We demonstrate this problem in Online Supplement 1. The follow-up study by Voorhees et al. This seems a more reasonable idea, and it helps us avoid the problem of how high or low correlations need to be to say that we’ve established convergence or discrimination. The Šidák and related Bonferroni corrections address the universal null hypothesis that all individual null hypotheses are true (Hancock & Klockars, 1996; Perneger, 1998; J. P. Shaffer, 1995). The assumptions of parallel, tau-equivalent, and congeneric reliability. While the idea that a number less than 1 can be used as a cutoff was briefly mentioned in John and Benet-Martínez (2000) and J. The results in Table 10 show that all estimates become biased toward 1. ρDCR was slightly more robust to these misspecifications than the other techniques, but the differences between the techniques were not large. While the disattenuation formula (Equation 4) is often claimed to assume that the only source of measurement error is random noise or unreliability, the assumption is in fact more general: All variance components in the scale scores that are not due to the construct of interest are independent of the construct and of the measurement errors of the other scale scores.
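As a minimal sketch of the disattenuation formula (Equation 4), the observed scale-score correlation is divided by the geometric mean of the two reliabilities; the numeric values below are invented for illustration.

```python
def disattenuate(r_xy, rel_x, rel_y):
    """Equation 4: observed scale-score correlation corrected for
    unreliability in both scales."""
    return r_xy / (rel_x * rel_y) ** 0.5

# An observed correlation of .72 between scales with reliabilities .81 and .64
# implies a perfect correlation between the underlying constructs,
# because sqrt(.81 * .64) = .72:
print(disattenuate(0.72, 0.81, 0.64))  # 1.0 (up to floating point)
```

This is exactly why moderately high observed correlations can still signal a discriminant validity problem once unreliability is taken into account.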
Given our focus on single-method and one-time measurements, we address only single-administration reliability, where measurement errors are operationalized by uniqueness estimates, ignoring time and rater effects that are incalculable in these designs. We also provide a few guidelines for improved reporting. The original meaning of the term “discriminant validity” was tied to MTMM matrices, but the term has since evolved to mean a lack of a perfect or excessively high correlation between two measures after considering measurement error. At least that helps a bit. First, it clearly states that discriminant validity is a feature of measures and not constructs and that it is not tied to any particular statistical test or cutoff (Schmitt, 1978; Schmitt & Stults, 1986). For example, defining discriminant validity in terms of a (true) correlation between constructs implies that a discriminant validity problem cannot be addressed with better measures. Notice, however, that while the high intercorrelations demonstrate that the four items are probably related to the same construct, that doesn’t automatically mean that the construct is self esteem. While some of the model fit indices do depend on factor correlations, they do so only weakly and indirectly (Kline, 2011). Third, calculating the CIs for a disattenuated correlation is complicated (Oberski & Satorra, 2013). Multitrait-Multimethod Correlation Matrix and Original Criteria for Discriminant Validity. The effect for other techniques was an increase in precision, which was expected because more indicators provide more information from which to estimate the correlation. The former had slightly more power but a larger false positive rate than the latter.
Based on our study, CICFA(cut) and χ2(cut) appear to be the leading techniques, but recommending one over the other solely on a statistical basis is difficult due to their similar performance. However, this also has the disadvantage that it steers a researcher toward making yes/no decisions instead of assessing the degree to which discriminant validity holds in the data. In the χ2(1) test, the constrained model has the correlation between two factors fixed to 1, after which the model is compared against the original one with a nested model χ2 test. In the studies that considered discriminant validity as the degree to which each item measured one construct only and not something else, various factor analysis techniques were the most commonly used, typically either evaluating the fit of the model where cross-loadings were constrained to be zero or estimating the cross-loadings and comparing their values against various cutoffs. Reviews of PLS use suggest that these recommendations have been widely applied in published research in the field of management information systems (Ringle et al.). As the two examples show, a moderately small correlation between measures does not always imply that two constructs are distinct, and a high correlation does not imply that they are not. As discussed earlier, SEM models estimate factor covariances, and implementing χ2(1) involves constraining one of these covariances to 1.12 However, methodological articles commonly fail to explain that the 1 constraint must be accompanied by setting the variances of the latent variables to 1 instead of scaling the latent variables by fixing the first item loadings (J. Detection Rates of the Discriminant Validity Problem as a Perfect Correlation by Technique.
But while the pattern supports discriminant and convergent validity, does it show that the three self esteem measures actually measure self esteem or that the three locus of control measures actually measure locus of control? Scale score correlation ρSS was always negatively biased due to the well-known attenuation effect. If problematically high correlations are observed, their sources must be identified. The most common misuse is to include unnecessary comparisons, for example, by testing alternative models with two or more factors fewer than the hypothesized model. First, validity is a feature of a test or a measure or its interpretation (Campbell & Fiske, 1959), not of any particular statistical analysis. Proposed Classification and Cutoffs. (A) constrained model for χ2(1), (B) common misuse of χ2(1), (C) constrained model for χ2(merge), (D) model equivalent to C. The key advantage of these techniques is that they provide a test statistic and a p-value. Nevertheless, there is a clear conceptual difference between the two. A correlation belongs to the highest class from which it is not statistically significantly different. This definition encompasses the early idea that even moderately high correlations between distinct measures can invalidate those measures if measurement error is present (Thorndike, 1920), which serves as the basis of discriminant validity (Campbell & Fiske, 1959). and (b) How exactly should discriminant validity be defined? Sample size was the final design factor and was varied at 50, 100, 250, and 1,000.
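The stepwise rule just quoted (a correlation belongs to the highest class that it is not statistically significantly different from) can be sketched as a comparison of the CI's upper limit against descending cutoffs. The labels follow the Marginal/Moderate/Severe wording used in the text, but the specific .80/.90 cutoffs and the one-sided decision rule are illustrative assumptions, not the authors' exact procedure.

```python
def classify(ci_upper, cutoffs=((1.0, "Severe"), (0.9, "Moderate"), (0.8, "Marginal"))):
    """Assign a correlation to the highest problem class whose cutoff it is not
    significantly below: a cutoff is ruled out only when the CI's upper limit
    falls under it."""
    for cutoff, label in cutoffs:
        if ci_upper >= cutoff:
            return label
    return "No problem"

print(classify(0.87))  # Marginal: .9 and 1 are ruled out, .8 is not
print(classify(0.95))  # Moderate
print(classify(0.74))  # No problem
```

Note that the decision uses the interval limit rather than the point estimate, so an imprecise estimate (wide CI) is pushed toward the more severe classes.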
However, it is not limited to simple linear common factor models where each indicator loads on just one factor; rather, it supports any statistical technique, including more complex factor structures (Asparouhov et al., 2015; Marsh et al., 2014; Morin et al., 2017; Rodriguez et al., 2016) and nonlinear models (Foster et al., 2017; Reise & Revicki, 2014), as long as the technique can estimate correlations that are properly corrected for measurement error, and it supports scale-item level evaluations. In the Marginal case, the interpretation of the scales as representations of distinct constructs is probably safe. However, it is unclear whether this alternative cutoff has more or less power (i.e., whether 1+.002(χB2−dfB) is greater or less than 3.84) because the effectiveness of CFI(1) has not been studied. Second, the disattenuation equation assumes that the scales are unidimensional and that all measurement errors are uncorrelated, whereas a CFA simply assumes that the model is correctly specified and identified. If significantly different, the correlation is classified into the current class. Thus, the square root of the AVE for each construct should be greater than the correlations involving that construct. The inappropriateness of the AVE as an index of discriminant validity. Thus, the CFI difference can be written as ΔCFI = CFIM − CFIC = ((χC2−dfC)−(χM2−dfM))/(χB2−dfB), where C is the constrained model in which a correlation value is fixed to 1 in the model of interest (i.e., M) and B is the baseline model. This idea is adapted from Cheung and Rensvold’s (2002) proposal in the measurement invariance literature, and the .002 cutoff is based on the simulation by Meade et al. OK, so where does this leave us? Ideally, the coverage of a 95% CI should be .95, and the balance should be close to zero.
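The comparison between 1+.002(χB2−dfB) and 3.84 can be illustrated numerically: under the usual CFI definition, the ΔCFI between the model of interest M and the constrained model C reduces to the Δχ2 scaled by the baseline misfit, so the .002 rule is a Δχ2 test with a sample-dependent critical value. All fit statistics below are invented for illustration.

```python
def cfi(chisq, df, chisq_b, df_b):
    """Comparative fit index against the baseline (null) model B."""
    return 1 - max(chisq - df, 0) / max(chisq - df, chisq_b - df_b, 0)

chisq_m, df_m = 120.0, 50   # hypothetical model of interest M
chisq_c, df_c = 124.0, 51   # M with one factor correlation fixed to 1
chisq_b, df_b = 900.0, 66   # hypothetical baseline model B

delta_cfi = cfi(chisq_m, df_m, chisq_b, df_b) - cfi(chisq_c, df_c, chisq_b, df_b)
delta_chisq = chisq_c - chisq_m            # 4.0 on 1 df
critical = 1 + 0.002 * (chisq_b - df_b)    # chi-square cutoff implied by CFI(1)

print(delta_cfi > 0.002)     # True: CFI(1) flags the constrained model
print(delta_chisq > critical)  # True: the same decision stated as a chi-square test
print(delta_chisq > 3.841)   # chi2(1) test at the 5% level, for comparison
```

In this example the baseline misfit (χB2−dfB = 834) makes the CFI(1) cutoff (2.668) more liberal than the 3.84 of the χ2(1) test, but with a worse-fitting baseline the ordering can reverse, which is the ambiguity the text points out.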
Similar classifications are used in other fields to characterize essentially continuous phenomena: Consider a doctor’s diagnosis of hypertension. (2015) suggested cutoffs of .85 and .9 based on prior literature (e.g., Kline, 2011). We also omit the two low correlation conditions (i.e., .5, .6) because the false positive rates are already clear in the .7 condition. One thing that we can say is that the convergent correlations should always be higher than the discriminant ones. All bootstrap analyses were calculated with 1,000 replications. Second, while a lack of factorial validity can lead scale-item pairs to have a complete lack of discriminant validity (see Equation 2), it does not always invalidate scale-level discriminant validity (see Equation 3) as long as it is properly modeled. This mathematical fact is why the cross-loading technique produced strange results in their simulation, which was not explained in the original paper. This practice also fits the dictionary meaning of evaluation, which is not simply calculating and reporting a number but rather dividing the object into several qualitative categories based on the number, thus requiring the use of cutoffs, although these do not need to be the same in every study. Paradoxically, this power to reject the null hypothesis has been interpreted as a lack of power to detect discriminant validity (Voorhees et al., 2016). To establish discriminant validity, you need to show that measures that should not be related are in reality not related.
The number of required model comparisons is the number of unique correlations between the variables, given by k(k−1)/2, where k is the number of factors. For simplicity, we followed the design used by Voorhees et al. In simple words I would describe what they are doing as follows: To estimate the degree to which any two measures are related to each other we typically use the correlation coefficient. (2016) suggest that comparing the differences in the CFIs between the two models instead of χ2 can produce a test whose result is less dependent on sample size than the χ2(1) test. 13. While it is nearly impossible to identify scaling errors without access to the actual analysis and CFA result files or the item-level covariance matrix, which would allow different specifications to be tested by replication, there exists indirect evidence of this problem. However, outside the smallest sample sizes, the differences were negligible in the third decimals. The term “discriminant validity” was typically used without a definition or a citation, giving the impression that there is a well-known and widely accepted definition of the term. In general we want convergent correlations to be as high as possible and discriminant ones to be as low as possible, but there is no hard and fast rule. In sum, it seems that an ideal cutoff cannot be derived from simulation results and must instead be established by consensus in the field. Reliability can be estimated in different ways, including test-retest reliability, interrater reliability, and single-administration reliability,7 which each provide information on different sources of measurement error (Le et al., 2009; Schmidt et al., 2003). If the model tests cannot be automated, we suggest the following alternative workflow.
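The comparison count, together with a familywise correction such as the Šidák adjustment mentioned elsewhere in the text, can be computed directly; this is a minimal sketch.

```python
def n_comparisons(k):
    """Number of unique factor pairs among k factors: k(k-1)/2."""
    return k * (k - 1) // 2

def sidak_alpha(alpha, m):
    """Per-comparison significance level that keeps the familywise
    error rate at alpha over m independent tests."""
    return 1 - (1 - alpha) ** (1 / m)

k = 5                                    # e.g., a five-factor model
m = n_comparisons(k)
print(m)                                 # 10 pairwise discriminant validity tests
print(round(sidak_alpha(0.05, m), 5))    # ~0.00512 per comparison
```

With five factors, ten pairwise tests are needed, and holding the familywise error at .05 tightens the per-test level to roughly .005.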
While many articles about discriminant validity consider it a matter of degree (e.g., “the extent to…”) instead of a yes/no issue (e.g., “whether…”), most guidelines on evaluation techniques, including Campbell and Fiske’s (1959) original proposal, focus on making a dichotomous judgment as to whether a study has a discriminant validity problem (B in Figure 6). However, the term was introduced without a clear definition of the concept (Reichardt & Coleman, 1995); instead, the article focuses on discriminant validation, or how discriminant validity can be shown empirically using multitrait-multimethod (MTMM) matrices. Because the differences between ρDPR (i.e., HTMT) and ρDTR were negligible, only the former is reported. For instance, we might use a face validity or content validity approach to demonstrate that the measures reflect the constructs we say they are (see the discussion on types of construct validity for more information). Thus, convergent and discriminant validity are demonstrated. We will next address the various techniques in more detail. Cross-loadings indicate a relationship between an indicator and a factor other than the main factor on which the indicator loads. But neither one alone is sufficient for establishing construct validity. OK, so how do we get to the really interesting question? The assumption appears to be invalid, as it ignores an important difference between these uses: Whereas the degrees of freedom of an invariance test scale roughly linearly with the number of indicators, the degrees of freedom in CFI(1) are always one. In larger samples, the power of the two techniques was similar, but χ2(cut) generally had the lowest false positive rate. To address the issue that the χ2(1) test can flag correlations that differ from 1 by trivial amounts as significant, some recent articles (Le et al., 2010; J.
Because this test imposes more constraints than χ2(1) does, it has more statistical power. In this approach, the observed variables are first standardized before taking a sum or a mean; alternatively, a weighted sum or mean with 1/σxi as the weights (i.e., X=∑iXi/σxi) is taken (Bobko et al., 2007). Thus, while the test can be applied for testing perfect overlap between two latent variables, it cannot answer the question of whether the latent variables are sufficiently distinct. Similarly, CICFA and CIDCR were largely unaffected and retained their performance from the tau-equivalent condition. Some of the other techniques can be useful for specific purposes. by Prof William M.K. Trochim. In sum, CICFA(cut) is simpler to implement, easier to understand, and less likely to be misapplied. Our explanation for this finding is that although χ2(merge) imposes more constraints on the model, these constraints work differently when the factors are perfectly correlated and when they are not. A. Shaffer et al., 2016) have suggested comparing models by calculating the difference between the comparative fit indices (CFIs) of two models (ΔCFI), which is compared against the .002 cutoff (CFI(1)). (1996) acknowledged discriminant validity between self-esteem and optimism based on ρDTR of .83 (ρSS=.72), but Credé et al. We theorize that all four items reflect the idea of self esteem (this is why I labeled the top part of the figure Theory). 17. Consider two binary variables “to which gender do you identify” and “what is your biological sex.” If 0.5% of the population are transgender or gender nonconforming (American Psychological Association, 2015) and half of these people indicate identification with a gender opposite to their biological sex, the correlation between the two variables would be .995. The next three steps are referred to as Marginal, Moderate, and Severe problems, respectively.
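The .995 figure in footnote 17 can be reproduced with the phi (Pearson) correlation for two binary variables; the 50/50 split of biological sex and the even split of the 0.25% discordant mass across the two off-diagonal cells are simplifying assumptions of this sketch.

```python
def phi(p11, p10, p01, p00):
    """Pearson (phi) correlation of two binary variables from their
    joint proportions."""
    r1, r0 = p11 + p10, p01 + p00   # margins of the first variable
    c1, c0 = p11 + p01, p10 + p00   # margins of the second variable
    return (p11 * p00 - p10 * p01) / (r1 * r0 * c1 * c0) ** 0.5

# 0.25% of the population report a gender identity opposite to their
# biological sex, split evenly over the two discordant cells.
d = 0.0025 / 2
print(phi(0.5 - d, d, d, 0.5 - d))  # 0.995
```

Even a near-perfect .995 correlation thus does not mean that gender identity and biological sex are the same construct, which is the footnote's point.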
These correlations were evaluated by comparing the correlations with the square root of the average variance extracted (AVE) or by comparing their CIs against cutoffs. A. Shaffer et al. Equation 2 is an item-level comparison (category 2 in Table 2), where the correlation between items i and j, which are designed to measure different constructs, is compared against the implied correlation when the items depend on perfectly correlated factors but are not perfectly correlated because of measurement error. Our main results concern inference against a cutoff and are relevant when a researcher wants to make a yes/no decision about discriminant validity. Alternatively, the data can be used as such, in which case large standard errors will indicate that little can be said about the relative effects of the two variables, or the two variables can be combined as an index (Wooldridge, 2013). Computational resources were provided by the Aalto Science-IT project.
We recommend applying the Šidák correction. For comparison, we also estimated a correlation of .84 (ρSS=.66).