Nonparametric Limits of Agreement in Method Comparison Studies: A Simulation Study on Extreme Quantile Estimation

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Associated Data

GUID: FB471912-68B3-45D3-B2A6-D72A575568EC

Abstract

Bland–Altman limits of agreement and the underlying plot are a well-established means in method comparison studies on quantitative outcomes. Normally distributed paired differences, a constant bias, and variance homogeneity across the measurement range are implicit assumptions to this end. Whenever these assumptions are not fully met and cannot be remedied by an appropriate transformation of the data or the application of a regression approach, the 2.5% and 97.5% quantiles of the differences have to be estimated nonparametrically. Earlier, a simple Sample Quantile (SQ) estimator (a weighted average of the observations closest to the target quantile), the Harrell–Davis estimator (HD), and estimators of the Sfakianakis–Verginis type (SV) outperformed 10 other quantile estimators in terms of mean coverage for the next observation in a simulation study, based on sample sizes between 30 and 150. Here, we investigate the variability of the coverage probability of these three and another three promising nonparametric quantile estimators with n = 50 ( 50 ) 200 , 250 ( 250 ) 1000 . The SQ estimator outperformed the HD and SV estimators for n = 50 and was slightly better for n = 100 , whereas the SQ, HD, and SV estimators performed identically well for n ≥ 150 . The similarity of the boxplots for the SQ estimator across both distributions and sample sizes was striking.

Keywords: agreement, Bland–Altman analysis, coverage, limits of agreement, method comparison, quantile estimation, repeatability, reproducibility

1. Introduction

When comparing methods for the measurement of a continuous outcome, the Bland–Altman Limits of Agreement (BA LoAs) are a well-known and well-established means to this end [1,2,3]. Mean values are plotted against the paired differences in a scatter plot that is supplemented by an estimate of the bias (i.e., the mean of the paired differences) and the so-called LoAs (bias estimate +/− 1.96 standard deviations of the paired differences), including the respective 95% confidence intervals [4,5,6,7]. Under the assumptions of normally distributed paired differences, the BA LoAs represent estimates for the boundaries between which 95% of all population differences are supposed to lie. In the case of the variance heterogeneity of the paired differences, a normalizing transformation (like the natural logarithm) may render the usual analysis (on the log-scale) possible, whereas a bias that is non-constant across the measurement range might require a regression approach [3,8]. If neither a transformation of the data, nor a regression approach are applicable, the LoAs are often estimated by simple empirical 2.5% and 97.5% quantiles. However, various nonparametric quantile estimators have been proposed in the literature during the recent decades, which likewise may serve in the nonparametric estimation of 2.5% and 97.5% quantiles [9,10,11,12]. In an earlier endeavor, we performed a literature search on nonparametric quantile estimators from which we compared 15 in a simulation study [13]. We assessed the performance of these estimators by the average coverage probability for one newly generated observation from 20,000 fictive trials, and we assumed six different distributions and sample sizes between 30 and 150 to this end. We found that a simple sample quantile estimator based on two rank statistics [9], the Harrell–Davis estimator [14], and the estimators of the Sfakianakis–Verginis type [15] performed, on average, closely to the nominal coverage probability of 95%. The purpose of this paper is to illuminate the variability of the coverage probability of these nonparametric quantile estimators in a simulation study. Three further, possibly promising nonparametric quantile estimators from our former investigation [13] and the classical BA LoAs [3] (as a benchmark measure and for illustrative purposes) are added. In Section 2, all nonparametric quantile estimators and the simulation setup are described. In Section 3, the results are given and illustrated by boxplots. A discussion closes the paper.

2. Materials and Methods

The description of the six different methods of nonparametric quantile estimation considered here is kept brief. Further details can be found elsewhere [13].

2.1. Nonparametric Quantile Estimators

2.1.1. Sample Quantile Estimator

A random sample of n paired differences X 1 , … X n is sorted in increasing order X ( 1 ) ≤ X ( 2 ) ≤ … ≤ X ( n ) ; the symbols here denote the order statistics of the random sample. The SQ estimator is a weighted average of the two order statistics that are closest to including p percent of all the observations in the sample:

S Q = ( 1 − α ) X ( r ) + α X ( r + 1 )

with α = p ( n + 1 ) − r , r = p ( n + 1 ) , and x is the greatest integer that is less than or equal to x [10,12].

2.1.2. Harrell–Davis Estimator

The HD estimator, as well as those estimators that follow, employs linear combinations of all the available order statistics, weighting them according to their relative closeness to the target percentile. They can be given as L-statistics, which is, ∑ j = 1 n W j · X ( j ) , where W j and X ( j ) is the weight for the j-th order statistic and the j-th order statistic itself, respectively [16]. The Harrell–Davis estimator is given by:

H D = ∑ i = 1 n W i X ( i )

with weight function:

W i = I i / n p ( n + 1 ) , ( 1 − p ) ( n + 1 ) − I ( i − 1 ) / n p ( n + 1 ) , ( 1 − p ) ( n + 1 ) ,

where I i / n a , b is the incomplete beta function [9,14].

2.1.3. Bernstein Polynomial Estimator

The BP estimator employs the binomial probability of observing exactly i out of n events with an event probability of p, B ( i ; n , p ) , and is given by:

B P = ∑ i = 1 n B ( i − 1 ; n − 1 , p ) X ( i )

according to Cheng [17].

2.1.4. HD Estimator Using a Level Crossing Empirical Distribution

Huang [18] modified the Harrell–Davis estimator (2) by applying a weighted empirical distribution function instead of the empirical distribution with equal weights 1 / n . The Harrell–Davis estimator using a level crossing empirical distribution function can be written as:

H D l c = ∑ i = 1 n W i X ( i )

with weight function:

W i = I q i p ( n + 1 ) , ( 1 − p ) ( n + 1 ) − I q i − 1 p ( n + 1 ) , ( 1 − p ) ( n + 1 ) ,

the incomplete beta function I q i a , b , q i = ∑ j = 1 i w j , i = 1 , … , n , and:

w j = 1 2 1 − n − 2 n ( n − 1 ) if j = 1 , n 1 n ( n − 1 ) if j = 2 , 3 … , n − 1 .

2.1.5. Sfakianakis–Verginis Estimator

Sfakianakis and Verginis [15] proposed a group of three estimators, from which we chose the first one due to the similarity of the results in our former investigation [13]. SV estimators are supposed to better estimate quantiles in the tails of a distribution when using small samples, and they employ also the binomial probabilities B ( i ; n , p ) as weights for the ordered statistics X ( i ) , i = 1 , … , n :

S V = 2 B ( 0 ; n , p ) + B ( 1 ; n , p ) 2 X ( 1 ) + B ( 0 ; n , p ) 2 X ( 2 ) − B ( 0 ; n , p ) 2 X ( 3 ) + ∑ i = 2 n − 1 B ( i ; n , p ) + B ( i − 1 ; n , p ) 2 X ( i ) − B ( n ; n , p ) 2 X ( n − 2 ) + B ( n ; n , p ) 2 X ( n − 1 ) + 2 B ( n ; n , p ) + B ( n − 1 ; n , p ) 2 X ( n ) .

2.1.6. Navruz–Özdemir Estimator

Recently, Navruz and Özdemir [19] introduced a new quantile estimator, which is also a linear function of the order statistics with weights of binomial probabilities:

N O = ( B ( 0 ; n , p ) 2 p + B ( 1 ; n , p ) p ) X ( 1 ) + B ( 0 ; n , p ) ( 2 − 3 p ) X ( 2 ) − B ( 0 ; n , p ) ( 1 − p ) X ( 3 ) + ∑ i = 1 n − 2 ( B ( i ; n , p ) ( 1 − p ) + B ( i + 1 ; n , p ) p ) X ( i + 1 ) − B ( n ; n , p ) p X ( n − 2 ) + B ( n ; n , p ) ( 3 p − 1 ) X ( n − 1 ) + ( B ( n − 1 ; n , p ) ( 1 − p ) + B ( n ; n , p ) ( 2 − 2 p ) ) X ( n ) .

2.2. Simulation Setup

We assumed six distributions ( Figure 1 ), the choice and number of which were motivated by former simulation studies on nonparametric quantile estimation, i.e., 4–6 symmetric and asymmetric distributions [11,14,15,18,19]:

the standard normal distribution; a lognormal distribution with meanlog = 1 and sdlog = 1; a beta distribution with shape parameters α = 2 and β = 5 (non-centrality parameter λ = 0 ); a beta distribution with shape parameters α = 2 and β = 2 ( λ = 0 ); a chi-squared distribution with 4 degrees of freedom; and an exponential distribution with rate parameter 1.

An external file that holds a picture, illustration, etc. Object name is ijerph-17-08330-g001.jpg

Density functions of the assumed distributions.

For sample sizes n = 50 ( 50 ) 200 , 250 ( 250 ) 1000 , each distribution, and quantile estimator, we simulated 5000 fictive trials and derived the LoAs. Their respective coverage probability c was derived by applying the cumulative distribution function F ( x ) to the LoAs, i.e.,

c = F ( u p p e r L o A ) − F ( l o w e r L o A ) .

This resulted in the distributions of coverage probabilities, which we display with boxplots. We compared the nonparametric quantile estimators with respect to (a) the average (i.e., median and mean) coverage probability and the closeness to the nominal 95% level and (b) their variability in terms of the 5% quantile and the first quartile. The classical BA LoAs served as the benchmark indication of expectable variation under the standard normal distribution; their performance under the remaining distributions illustrated the inappropriateness under non-normal distributions. The Root Mean Squared Error (RMSE) for the estimation of the 2.5% and 97.5% quantiles was supplemented. Here, the true values for these quantiles were known, as were the assumed distributions. Note, though, that the investigation of the coverage probability enabled a simultaneous assessment of both LoAs, whereas the RMSE was related to one of the two LoAs at a time. For instance, underestimation of the lower LoA and underestimation of the upper LoA for one set of estimated LoAs can imply the very same coverage probability as an overestimated lower LoA and an overestimated upper LoA for another set of estimated LoAs. The R code is available as Supplementary Material Code S1 ( R Version 4.0.2).

3. Results

3.1. BA LoA

For n = 50 , the median (mean) coverage probability was close to 0.95 for three distributions, but above the nominal level for the beta distributions (Distributions 3 and 4 in Figure 2 ; 0.957 (0.953) and 0.972 (0.967), respectively) and slightly below for the exponential distribution (Distribution 6 in Figure 2 ; 0.943 (0.940)). For Distribution 4, even the first quartile exceeded 0.95 (0.953), indicating overestimation of the BA LoA, which is visible by the box lying completely above 0.95 in Figure 2 . The same applied in the case of Distribution 3 for sample sizes of n ≥ 200 and for Distribution 2 for sample sizes of n ≥ 500 ( Figure 2 and Figure 3 ). In the case of Distribution 6, the median (mean) coverage probability fell short of the nominal level for all sample sizes, even for n = 1000 (0.948 (0.948)).