With external quality assessment (EQA), also called “round robin” or proficiency testing, a series of samples is prepared and distributed to different laboratories, allowing parallel testing of the ‘same’ sample set within a certain time period. In several European countries EQA is performed on the same molecular pathology parameters, e.g. EGFR mutation analysis in non-small cell lung cancer (Normanno et al., 2013; Thunnissen et al., 2011). Recently a European guideline on EQA in molecular pathology was published, reporting the combined experience from different countries (Van Krieken et al., 2013).

To gain more insight at a European level, it is useful to combine data from EQA schemes in different countries. At a meeting in Brussels, supported by the European Society of Pathology (ESP), three issues were discussed: i) definitions for a central database, ii) the level of input into the central database, and iii) which data will be made publicly available on the ESP website. The aim of this manuscript is to present the outcome of the discussion on these three issues.

In establishing a central database, the use of a common terminology and a common scoring system is essential. The agreed definitions for a common terminology for reporting of EQA are shown in table 1.

Two types of samples are distinguished for EQA: a test sample and an educational sample. The test sample measures the performance of the labs, while the ‘educational’ sample may be included for interest, e.g. because of low prevalence or for other reasons. For definitions see table 1. A sample with a low fraction of neoplastic cells is acceptable as a reference sample, as such samples are part of daily practice. Such a sample may be challenging, and is evaluated as a test sample. The EQA provider must show that all test samples give concordant results in at least two analyses by reference laboratories. Participants are expected to report the analytical sensitivity of their method for the mutations tested.

For the fraction of neoplastic cells in a tumor sample, it is recognised that this estimation is, across the board, not a robust method, but it may be the best approximation available. The word ‘neoplastic’ refers specifically to the cells with genetic changes and does not take the stromal and inflammatory cells into account. If a neoplastic cell has a homozygous mutation, then the fraction of mutated alleles may be the same as the fraction of neoplastic cells, assuming diploidy. However, if there is amplification of the mutated gene, then the fraction of mutated alleles may be higher than the fraction of neoplastic cells. The fraction of neoplastic cells may be categorized as high, moderate, low or very low, as shown in table 2. For preparation of EQA samples it is preferable to give an estimate of the allelic frequency, measured e.g. with pyrosequencing or another equivalent method.
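The relation between the fraction of neoplastic cells and the fraction of mutated alleles can be illustrated with a small calculation. The sketch below is an idealised copy-number model and not part of the agreed EQA definitions: the function name and its parameters are hypothetical, non-neoplastic cells are assumed to be diploid and unmutated, and all neoplastic cells are assumed to carry the same number of gene copies.

```python
def mutant_allele_fraction(neoplastic_fraction, mutant_copies=1,
                           tumour_copies=2, normal_copies=2):
    """Expected fraction of mutated alleles in a mixed sample.

    Illustrative model (assumption): every neoplastic cell carries
    `tumour_copies` copies of the gene, of which `mutant_copies` are mutated;
    every other cell carries `normal_copies` unmutated copies.
    """
    if not 0.0 <= neoplastic_fraction <= 1.0:
        raise ValueError("neoplastic_fraction must lie in [0, 1]")
    if tumour_copies < 1 or not 0 <= mutant_copies <= tumour_copies:
        raise ValueError("need 0 <= mutant_copies <= tumour_copies, tumour_copies >= 1")
    mutant = neoplastic_fraction * mutant_copies
    total = (neoplastic_fraction * tumour_copies
             + (1.0 - neoplastic_fraction) * normal_copies)
    return mutant / total


# A sample with 40% neoplastic cells under three scenarios.
heterozygous = mutant_allele_fraction(0.4, mutant_copies=1, tumour_copies=2)
homozygous = mutant_allele_fraction(0.4, mutant_copies=2, tumour_copies=2)
amplified = mutant_allele_fraction(0.4, mutant_copies=8, tumour_copies=9)
```

In this model a heterozygous mutation gives an allele fraction of half the neoplastic fraction, a homozygous mutation gives an allele fraction equal to it, and amplification of the mutated allele pushes the allele fraction above it, in line with the text.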

The scoring system for genotype error is shown in table 3.

The codes for data transfer are shown in appendix 1.

A distinction is made between the level of data entry and the level of data publishing. For the purpose of discussion, different levels are recognised. At this point in time it was agreed that for data input the laboratory data will be anonymous, i.e. level 1. The summary data will be publicly available.

The number of samples to be tested is not critical. The website will be designed for a maximum of 20 samples. With the Bayesian 90% credible interval (90% CI) around the level of performance, sets of samples obtained in different (sub)rounds can be combined.

**Statistical approach**

For a binomial distribution with conjugate beta prior, the likelihood of the data (the number of correct answers, k out of n) is

$$P(k \mid \theta, n) = \binom{n}{k}\,\theta^{k}(1-\theta)^{n-k},$$

where θ is the unknown probability that the lab answers a case correctly (the probability is assumed to be equal for all cases). In our Bayesian framework a beta distribution is taken as the prior distribution for the unknown probability θ, i.e.

$$p(\theta) \propto \theta^{\beta_1}(1-\theta)^{\beta_2},$$

where β1 ≥ 0 and β2 ≥ 0 are fixed parameters. The prior distribution is used to model the uncertainty in the probability θ before observing any data. In this setting a so-called uninformative prior will be used with β1 = β2 = 0. These values for the parameters β1 and β2 correspond to a uniform prior distribution, reflecting the situation that before observing the data all values for θ in the interval [0,1] are equally likely. The posterior distribution for the parameter is

$$p(\theta \mid k, n) \propto \theta^{k+\beta_1}(1-\theta)^{n-k+\beta_2}, \qquad (1)$$

which is again a beta distribution but with different parameters. Under the uniform prior the posterior mode k/n for θ (the posterior mean is (k + 1)/(n + 2)) provides an estimate for the level of performance of the lab. A 90% credible interval can be obtained using the 5th and 95th percentiles of the posterior distribution in (1), see appendix 2. Repeated updating with k1, k2, …, kR correct answers from n1, n2, …, nR samples in R rounds, respectively, gives, under the uniform prior, the posterior distribution

$$p(\theta \mid k_1, \dots, k_R, n_1, \dots, n_R) \propto \theta^{\sum_{r=1}^{R} k_r}\,(1-\theta)^{\sum_{r=1}^{R} (n_r - k_r)}.$$

The previous result is found by first updating the uniform prior with the results from the first round, yielding the posterior

$$p(\theta \mid k_1, n_1) \propto \theta^{k_1}(1-\theta)^{n_1-k_1}.$$

This posterior distribution serves as a new prior (taking β1 = k1 and β2 = n1 - k1) to be combined with the data observed in the second round, yielding the posterior distribution

$$p(\theta \mid k_1, k_2, n_1, n_2) \propto \theta^{k_1+k_2}(1-\theta)^{(n_1-k_1)+(n_2-k_2)}.$$

This process can be repeated to obtain the posterior after R rounds.

A single update assuming k1 + k2 + … + kR correct answers from n1 + n2 + … + nR samples gives the same posterior distribution (see equation (1)).
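The equivalence of repeated and single updating, and the calculation of the 90% credible interval, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the implementation of appendix 2: all function names are hypothetical, the uniform prior (β1 = β2 = 0) is assumed so that the posterior after k correct answers out of n is the standard Beta(k + 1, n − k + 1) distribution, and the percentiles are found by bisection on a cumulative distribution function that is only valid for integer counts.

```python
from math import comb


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b (binomial-sum identity)."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1.0 - x)**(m - j) for j in range(a, m + 1))


def beta_quantile(p, a, b, tol=1e-10):
    """Percentile of Beta(a, b), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def posterior_params(rounds, beta1=0, beta2=0):
    """Update the exponents round by round; return standard Beta(a, b) parameters."""
    for k, n in rounds:
        beta1 += k        # correct answers add to the exponent of theta
        beta2 += n - k    # incorrect answers add to the exponent of (1 - theta)
    return beta1 + 1, beta2 + 1


def credible_interval(a, b, level=0.90):
    """Equal-tailed credible interval, e.g. 5th and 95th percentiles for 90%."""
    tail = (1.0 - level) / 2.0
    return beta_quantile(tail, a, b), beta_quantile(1.0 - tail, a, b)


# Hypothetical example: three subrounds with 3/3, 4/4 and 2/3 correct answers.
sequential = posterior_params([(3, 3), (4, 4), (2, 3)])
single = posterior_params([(9, 10)])
lower, upper = credible_interval(*sequential)
```

Both routes give the same beta parameters, so the credible interval does not depend on how the samples were split over the (sub)rounds.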

The Bayesian framework combines prior information on the level of performance (specified in terms of a statistical distribution referred to as the prior distribution) with observed data to form the so-called posterior distribution for the level of performance. The posterior distribution combines both sources of information (prior distribution and data) and quantifies the uncertainty in the level of performance, taking into account the data observed so far. The posterior distribution can serve as a new prior distribution when additional data are observed. Therefore, for EQA rounds with 3-4 samples quarterly, the posterior distribution and the associated credible intervals can be updated as soon as new data become available. Repeated updating with data from the subrounds will yield the same Bayesian credible interval as would be obtained by a single update using all data observed in the (sub)rounds.

The width of the 90% CI denotes the range of performance to be expected for the next assay(s) in practice. For 10 samples the interval is still rather broad. Reducing the width to the desired proportions (lower bound of the 90% CI > 95%) requires very high sample numbers, which is not realistic for EQA testing. Instead, an upper limit of the 95% CI below 95% is used to delineate the threshold of poor performance. This is to some extent an arbitrary threshold, which translates into accepting 1 error out of 10 cases (90%). In subjective terms this is felt to be a low threshold. At the 2nd meeting on external quality assessment in molecular pathology (22-23 March 2013, Naples), the threshold for successful performance of proficiency testing was set at 92.5%, implying that one real genotype error in 10 samples will lead to unsatisfactory performance. A reasonable number of samples to test is 10 per year. The Bayesian approach will also be used to determine the overall performance across EQA schemes.
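The effect of the credible-interval threshold for poor performance can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, the uniform prior is assumed (posterior Beta(k + 1, n − k + 1)), and the upper limit is taken as the 95th percentile of the posterior, which is an assumption about how the interval is read. The upper limit lies below the threshold exactly when the posterior probability of θ being below the threshold exceeds 0.95, so no percentile needs to be computed.

```python
from math import comb


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b (binomial-sum identity)."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1.0 - x)**(m - j) for j in range(a, m + 1))


def is_poor_performance(k, n, threshold=0.95, upper_quantile=0.95):
    """True if the upper credible limit for theta falls below `threshold`.

    Assumes a uniform prior; `upper_quantile` is the percentile used as the
    upper limit of the interval (0.95 for a 90% equal-tailed interval).
    """
    a, b = k + 1, n - k + 1
    return beta_cdf(threshold, a, b) > upper_quantile


# Ten samples with zero, one or two errors.
flags = {errors: is_poor_performance(10 - errors, 10) for errors in (0, 1, 2)}
```

Under these assumptions one error in ten samples is still accepted, while two errors in ten samples are flagged as poor performance.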

The EQA data will be aggregated per sample and per laboratory.

For each sample the following will be calculated: i) mutation present (or not); ii) number of participants; iii) number of correct answers; iv) number of analytical failures; v) number of false positives; vi) number of false negatives; vii) number of genotyping errors (1 point); ix) number of genotyping errors (1.5 point); x) analytical failure rate (no test result); xi) error rate over participants with a test result; xii) success rate (accuracy) over participants with a test result; xiii) mean score (maximum = 2, if all participants have the correct answer). For the coded laboratories (i.e. identified by number) the following will be calculated: i) total points; ii) maximum lab result over all samples. The summary data over all samples and all laboratories (with ‘average’ in the ‘title’ of the item) will be publicly available.
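The per-sample rates listed above can be sketched as follows. This is a minimal illustration and not the database implementation: the record layout, the outcome labels and the function name are hypothetical, the points per result are supplied by the caller because the scoring table is not reproduced here, and the mean score is assumed to be taken over all participants.

```python
from dataclasses import dataclass


@dataclass
class Result:
    lab: int        # coded laboratory number (anonymous)
    outcome: str    # hypothetical labels: "correct", "false_positive",
                    # "false_negative", "genotype_error", "failure"
    score: float    # points awarded by the scheme (2 for a correct answer)


def summarise_sample(results):
    """Aggregate one sample over all participating laboratories."""
    participants = len(results)
    failures = sum(r.outcome == "failure" for r in results)
    correct = sum(r.outcome == "correct" for r in results)
    with_result = participants - failures
    errors = with_result - correct
    return {
        "participants": participants,
        "correct": correct,
        "analytical_failures": failures,
        "analytical_failure_rate": failures / participants if participants else None,
        "error_rate": errors / with_result if with_result else None,
        "success_rate": correct / with_result if with_result else None,
        # Assumption: mean over all participants, so the maximum is 2.
        "mean_score": sum(r.score for r in results) / participants if participants else None,
    }


# Hypothetical sample tested by four laboratories.
summary = summarise_sample([
    Result(1, "correct", 2.0),
    Result(2, "correct", 2.0),
    Result(3, "false_negative", 0.0),
    Result(4, "failure", 0.0),
])
```

The error rate and the success rate are computed over participants with a test result only, so an analytical failure lowers neither of them but does appear in the analytical failure rate.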

Normanno, N., Pinto, C., Taddei, G., Gambacorta, M., Castiglione, F., Barberis, M., Clemente, C., et al. (2013). Results of the First Italian External Quality Assurance Scheme for Somatic EGFR Mutation Testing in Non-Small-Cell Lung Cancer. Journal of thoracic oncology : official publication of the International Association for the Study of Lung Cancer. doi:10.1097/JTO.0b013e31828c2b08

Thunnissen, E., Bovée, J. V. M. G., Bruinsma, H., Van den Brule, A. J. C., Dinjens, W., Heideman, D. A. M., Meulemans, E., et al. (2011). EGFR and KRAS quality assurance schemes in pathology: generating normative data for molecular predictive marker analysis in targeted therapy. Journal of clinical pathology, 64(10), 884–92. doi:10.1136/jclinpath-2011-200163

Van Krieken, J. H., Normanno, N., Blackhall, F., Boone, E., Botti, G., Carneiro, F., Celik, I., et al. (2013). Guideline on the requirements of external quality assessment programs in molecular pathology. Virchows Archiv : an international journal of pathology, 462(1), 27–37. doi:10.1007/s00428-012-1354-4

**Erik Thunnissen**