The Ring Study: an international comparison of PD-L1 diagnostic assays and their interpretation in non-small cell lung cancer, head and neck squamous cell cancer and urothelial cancer
Department of Clinical Laboratory Sciences and Medical Biotechnology, National Taiwan University, Taipei, Taiwan; Department of Laboratory Medicine, National Taiwan University Hospital, Taipei, Taiwan
Tissue Pathology and Diagnostic Oncology, NSW Health Pathology, Royal Prince Alfred Hospital, Sydney, NSW, Australia; Sydney Medical School, University of Sydney, Sydney, NSW, Australia; School of Medicine, Western Sydney University, Sydney, NSW, Australia
Biobank and Tissue Bank and Department of Pathology, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University College of Medicine, Kaohsiung, Taiwan
Department of Pathology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, South Korea; Cancer Research Institute, Seoul National University, Seoul, South Korea
National Taiwan University Cancer Center, Taipei, Taiwan; Department of Otolaryngology, National Taiwan University Hospital and College of Medicine, Taipei, Taiwan
Pathology and Molecular Diagnostics, Diagnósticos da América, DASA, São Paulo, Brazil; Molecular Oncology Research Center, Hospital de Amor de Barretos, Barretos, Brazil
Institute of Anatomical Pathology, Rede D’Or São Luiz Hospitals Network, Rio de Janeiro and São Paulo, Brazil; D’Or Institute for Research and Education, Rio de Janeiro and São Paulo, Brazil
PD-L1 immunohistochemistry has been approved as a diagnostic assay for immunotherapy. However, an international comparison across multiple cancers is lacking. This study aimed to assess the performance of PD-L1 diagnostic assays in non-small cell lung cancer (NSCLC), head and neck squamous cell cancer (HNSCC) and urothelial cancer (UC). The excisional specimens of NSCLC, HNSCC and UC were assayed by Ventana SP263 and scored at three sites in each country, including Australia, Brazil, Korea, Mexico, Russia and Taiwan. All slides were rotated to two other sites for interobserver scoring. The same cohort of NSCLC was assessed with Dako 22C3 pharmDx PD-L1 for comparison. The PD-L1 immunopositivity was scored according to the approved PD-L1 scoring algorithms which were the percentage of PD-L1-expressing tumour cell (TC) and tumour proportion score (TPS) by Ventana SP263 and Dako 22C3 staining, respectively. In NSCLC, the comparison demonstrated the comparability of the SP263 and 22C3 assays (cut-off of 1%, κ=0.71; 25%, κ=0.75; 50%, κ=0.81). The interobserver comparisons showed moderate to almost perfect agreement for SP263 in TC staining at 25% cut-off (NSCLC, κ=0.72 to 0.86; HNSCC, κ=0.60 to 0.82; UC, κ=0.68 to 0.91) and at 50% cut-off for NSCLC (κ=0.64 to 0.90). Regarding the immune cell (IC) scoring in UC, there was a lower correlation (concordance correlation coefficient=0.10 to 0.68) and poor to substantial agreements at the 1%, 5%, 10% and 25% cut-offs (κ= –0.04 to 0.76). The interchangeability of SP263 and 22C3 in NSCLC might be acceptable, especially at the 50% cut-off. In HNSCC, the performance of SP263 is comparable across five countries. In UC, there was low concordance of IC staining, which may affect treatment decisions. Overall, the study showed the reliability and reproducibility of SP263 in NSCLC, HNSCC and UC.
Although immune checkpoint inhibitors (ICIs) targeting programmed death-1 (PD-1)/programmed death ligand-1 (PD-L1) provide promising survival benefits, overall response rates (ORR) are still low without stratifying patients by PD-L1 expression.
At least six immune checkpoint agents targeting PD-1 or PD-L1 (atezolizumab, avelumab, cemiplimab-rwlc, durvalumab, nivolumab and pembrolizumab) are approved by the US Food and Drug Administration (FDA).
FDA approval summary: pembrolizumab, atezolizumab, and cemiplimab-rwlc as single agents for first-line treatment of advanced/metastatic PD-L1 high NSCLC.
The indications for these drugs included melanoma, non-small cell lung cancer (NSCLC), head and neck squamous cell cancer (HNSCC), urothelial cancer (UC), gastric cancer, renal cell cancer, cervical cancer, triple-negative breast cancer (TNBC), colorectal cancer, etc.
In clinical trials of ICIs, particularly KEYNOTE-024 (pembrolizumab), CheckMate-057 (nivolumab) and MEDI4736-1108 (durvalumab), patients with high PD-L1 expression had a better response to ICI treatment and longer survival than those with low or negative PD-L1 expression, indicating the importance of this biomarker.
Based on the results of biomarkers co-developed in clinical trials, each ICI has a specific PD-L1 immunohistochemistry (IHC) diagnostic assay and an individual cut-off to identify the PD-L1-positive patients who are most likely to benefit from, and are eligible to receive, the treatment. The PD-L1 IHC assays approved by the US FDA for ICI treatment include the Dako 22C3 pharmDx, Dako 28-8 pharmDx, Ventana SP142 and Ventana SP263 assays (Supplementary Table 1, Appendix A). Each approval of a PD-L1 companion diagnostic assay is specific to a drug, a scoring algorithm, a cut-off for PD-L1 expression in tumour cells (TCs), immune cells (ICs) or both, a disease, and a patient population defined by stage, treatment status and previous therapies. Thus, based on the development of the above four assays, there are more than ten approvals of companion and complementary diagnostic assays available in NSCLC, HNSCC and UC.
Specifics associated with Dako 22C3 pharmDx and Ventana SP263 such as the associated drugs, scoring methods, cut-offs and relevant clinical trials in NSCLC, HNSCC and UC are compared in Table 1. The Dako 22C3 assay has two scoring algorithms, tumour proportion score (TPS) and combined positive score (CPS).
The cut-off using TPS was approved in NSCLC with pembrolizumab and cemiplimab-rwlc treatments, while CPS cut-offs were approved in HNSCC, oesophageal cancer, TNBC and cervical cancer with pembrolizumab treatment (Table 1). For the Ventana SP263 assay, TC (%) and IC (%) are the proportions of tumour cells and immune cells with PD-L1 staining within the total tumour and immune cell area, respectively. Of note, the IC scoring method of the SP263 assay differs from that of Ventana SP142. The IC (%) of the Ventana SP142 assay is measured as the proportion of the tumour area occupied by PD-L1-positive stained ICs.
The companion and complementary diagnostic assays and related cut-offs were approved by the US FDA. The 22C3 assay is approved as a companion diagnostic assay for the treatment of NSCLC, HNSCC and UC [20]. The SP263 assay is approved as a companion diagnostic assay for the treatment of NSCLC and as a complementary diagnostic assay for the treatment of UC [60,61], but the durvalumab indication in UC was withdrawn in 2021 [27].
Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): a randomised, open-label, controlled, phase 3 trial.
Pembrolizumab versus docetaxel for previously treated, PD-L1-positive, advanced non-small-cell lung cancer (KEYNOTE-010): a randomised controlled trial.
Pembrolizumab alone or with chemotherapy versus cetuximab with chemotherapy for recurrent or metastatic squamous cell carcinoma of the head and neck (KEYNOTE-048): a randomised, open-label, phase 3 study.
First-line pembrolizumab in cisplatin-ineligible patients with locally advanced and unresectable or metastatic urothelial cancer (KEYNOTE-052): a multicentre, single-arm, phase 2 study.
Cemiplimab monotherapy for first-line treatment of advanced non-small-cell lung cancer with PD-L1 of at least 50%: a multicentre, open-label, global, phase 3, randomised, controlled trial.
Roche Diagnostics states that the SP263 assay can be used to identify NSCLC patients eligible for treatment with durvalumab, pembrolizumab and nivolumab [62]. This claim is based on a comparison study of the SP263, 22C3 and 28-8 assays carried out by AstraZeneca [41]. Based on this study, not clinical trials, the SP263 assay was CE-IVD marked in Europe for diagnostic use related to treatment with pembrolizumab, durvalumab or nivolumab [63].
Durvalumab for recurrent or metastatic head and neck squamous cell carcinoma: results from a single-arm, phase II study in patients with >/=25% tumour cell PD-L1 expression who have progressed on platinum-based chemotherapy.
Safety and efficacy of durvalumab with or without tremelimumab in patients with PD-L1-low/negative recurrent or metastatic HNSCC: the Phase 2 CONDOR Randomized Clinical Trial.
Adjuvant atezolizumab after adjuvant chemotherapy in resected stage IB-IIIA non-small-cell lung cancer (IMpower010): a randomised, multicentre, open-label, phase 3 trial.
Five-year outcomes from the randomized, phase III trials CheckMate 017 and 057: nivolumab versus docetaxel in previously treated non-small-cell lung cancer.
CE-IVD, Conformité Européenne marking for in vitro diagnostic; HNSCC, head and neck squamous cell cancer; IC, immune cells; ICP, immune cells present; NSCLC, non-small cell lung cancer; NSQ, non-squamous; TC, tumour cells; TPS, tumour proportion score; UC, urothelial cancer.
a CPS: combined positive score, the number of PD-L1-positive tumour cells (membrane staining) and PD-L1-positive tumour-infiltrating immune cells relative to the total number of tumour cells (PD-L1-positive TCs and ICs/total TCs × 100).
In NSCLC, the Dako 22C3 pharmDx has been approved as a companion diagnostic at the cut-off of 1% for first-line and second-line treatment with pembrolizumab and at the cut-off of 50% for first-line treatment with cemiplimab-rwlc (Table 1). Recently, the Ventana SP263 assay was approved as a companion diagnostic tool to select NSCLC patients whose tumours have PD-L1 expression (≥1% of tumour cells) for adjuvant treatment with atezolizumab.
Adjuvant atezolizumab after adjuvant chemotherapy in resected stage IB-IIIA non-small-cell lung cancer (IMpower010): a randomised, multicentre, open-label, phase 3 trial.
In HNSCC, the Dako 22C3 assay was approved for first-line treatment with pembrolizumab at a CPS cut-off ≥1; a cut-off of 25% TC has been studied for the Ventana SP263 assay in the clinical trials of durvalumab (Table 1).
Durvalumab for recurrent or metastatic head and neck squamous cell carcinoma: results from a single-arm, phase II study in patients with >/=25% tumour cell PD-L1 expression who have progressed on platinum-based chemotherapy.
Safety and efficacy of durvalumab with or without tremelimumab in patients with PD-L1-low/negative recurrent or metastatic HNSCC: the Phase 2 CONDOR Randomized Clinical Trial.
In UC, the Ventana SP263 assay has been approved as the complementary diagnostic test for durvalumab at the cut-off of 25% TC or IC PD-L1-positive expression.
Although AstraZeneca announced a voluntary withdrawal of the durvalumab indication in February 2021 for previously treated patients with locally advanced or metastatic urothelial cancer, the Ventana SP263 assay is likely to be developed as a diagnostic test for other ICIs, such as atezolizumab, which has been approved by the FDA for the treatment of UC patients.
FDA approval summary: atezolizumab or pembrolizumab for the treatment of patients with advanced urothelial carcinoma ineligible for cisplatin-containing chemotherapy.
Hence, the performance of the Ventana SP263 assay in UC needs to be assessed to determine its reliability.
Although the performance of the PD-L1 IHC assays in multiple tumour types has been compared previously, this is the first comprehensive international study using resected samples to assess the reliability of Ventana SP263 across different tumour types. Previous studies have mostly been conducted with a small sample size or in a limited region.
There are increasing requirements for PD-L1 IHC assays using resected samples for making treatment decisions. Therefore, the Ring Study evaluates the PD-L1 immunopositivity in resected tissues and is conducted across six countries. We aim to compare the performance and interobserver concordance of Ventana SP263 in NSCLC, HNSCC and UC and assess the comparability of Ventana SP263 and Dako 22C3 pharmDx assays in NSCLC.
Materials and methods
Study design and sample selection
The Ring Study was conducted in six countries, including Australia, Brazil, Korea, Mexico, Russia and Taiwan. Three laboratories were named Site A, B and C in each country but only two laboratories participated in Korea. The detailed information is listed in Supplementary Table 2 (Appendix A). All studies were approved by their respective institutional review boards and followed the relevant regulations in every country. For the Ventana SP263 assay, Site A was assigned to prepare and stain NSCLC tissue samples. Site B was responsible for HNSCC and Site C was responsible for UC. Additionally, the serial sections from the same cohort of NSCLC were simultaneously stained with Dako 22C3 pharmDx in Site A. All stained sections were rotated among Site A, B and C, and scored by one pathologist at each site. Site A was designated to collect, analyse and report all the results in each country. Roche Ventana provided refresher training courses to participating pathologists on the use of the Ventana SP263 PD-L1 IHC assay. The board-certified pathologists were also trained in the interpretation of the IHC slides stained with SP263 and completed the PD-L1 scoring training sessions.
A target of 50 cases was selected covering 0% to 100% PD-L1-positive TCs, with a minimum of five samples for each staining level [negative (<1%), low (1–10%), low-medium (11–30%), medium (31–50%), and high (>50%)]. However, due to the schedule and time limit of the global trial, not all scoring results were available for further analysis. For Ventana SP263 staining, Russia provided 34 cases for UC but no cases for HNSCC; Korea provided results from Site A and Site B only. For Dako 22C3 staining, 50 cases from each respective country (Australia, Brazil, Korea and Taiwan) were assayed.
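The case-selection rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the handling of non-integer scores that fall between the stated level boundaries (for example 10.5%) is an assumption, since the text defines the levels with integer bounds.

```python
from collections import Counter

# The five staining levels named in the text, in ascending order.
LEVELS = ["negative", "low", "low-medium", "medium", "high"]


def staining_level(score):
    """Map a PD-L1 percentage (0-100) to one of the five staining levels.

    Assumption: fractional scores are assigned to the level whose upper
    bound they do not exceed (e.g. 10.5% counts as low-medium).
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 1:
        return "negative"
    if score <= 10:
        return "low"
    if score <= 30:
        return "low-medium"
    if score <= 50:
        return "medium"
    return "high"


def underfilled_levels(scores, minimum=5):
    """Return the levels that contain fewer than `minimum` cases."""
    counts = Counter(staining_level(s) for s in scores)
    return [level for level in LEVELS if counts[level] < minimum]
```

Calling `underfilled_levels` on a cohort's TC scores would list the staining levels that still fall short of the five-case minimum.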
PD-L1 IHC staining and evaluation
The archived formalin-fixed, paraffin-embedded (FFPE) surgical specimens were obtained from patients with histological NSCLC, HNSCC or UC diagnosed within 3 years. The corresponding haematoxylin and eosin (H&E) sections were reviewed and were confirmed to contain at least 100 tumour cells. According to the standard protocols of each assay, samples of NSCLC, HNSCC, and UC were stained with Ventana PD-L1 SP263 (Ventana Medical Systems, USA) and the same NSCLC samples were further stained with Dako PD-L1 IHC 22C3 pharmDx (Agilent, USA). For NSCLC, all countries performed the Ventana SP263 assay; Australia, Brazil, Korea and Taiwan performed the Dako 22C3 assay. For HNSCC, all countries performed Ventana SP263 staining except for Russia. For UC, all countries performed Ventana SP263 staining. The guidelines of Ventana PD-L1 (SP263) and Dako 22C3 pharmDx assays were used for the interpretation and scoring.
The interpretations of pitfalls and artifacts were also covered in the training course.
The whole resected tissue on the slide was evaluated. The PD-L1-positive score of SP263 staining and 22C3 staining were assessed as TC (%) and tumour proportion score (TPS), respectively, defined as the percentage of tumour cells with any membrane staining above the background relative to all viable tumour cells present in the sample (positive and negative). In UC, the PD-L1 immunopositivity score of ICs was determined by the proportion of IC at any intensity of PD-L1 staining above the background within the IC present area. The algorithms are listed below:
TC (%) = TCs with PD-L1-positive staining/total TCs present in the sample × 100
TPS (%) = TCs with PD-L1-positive staining/total TCs present in the sample × 100
IC (%) = ICs with PD-L1-positive staining/total ICs present in the sample × 100
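As a worked illustration of these scoring algorithms, the following Python sketch computes a percentage score from cell counts and applies a cut-off. The function names and the cell counts are hypothetical and are not taken from the study; the UC rule (positive if either TC or IC reaches 25%) follows the cut-off described for durvalumab.

```python
def percent_positive(positive_cells, total_cells):
    """Percentage of PD-L1-positive cells among all cells of that type.

    The same arithmetic applies to TC (%), TPS (%) and IC (%).
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= positive_cells <= total_cells:
        raise ValueError("positive_cells must lie between 0 and total_cells")
    return 100.0 * positive_cells / total_cells


def meets_cutoff(score, cutoff):
    """Binary call: True when the score is at or above the cut-off."""
    return score >= cutoff


def uc_positive(tc_score, ic_score, cutoff=25.0):
    """UC rule: positive if either the TC or the IC score reaches the cut-off."""
    return meets_cutoff(tc_score, cutoff) or meets_cutoff(ic_score, cutoff)


# Hypothetical counts, for illustration only.
tc = percent_positive(130, 400)  # 130 of 400 tumour cells stained
ic = percent_positive(12, 200)   # 12 of 200 immune cells stained
```

With these illustrative counts the case would be called positive at the 25% TC cut-off but negative at the 50% cut-off, which is the kind of binary categorisation used in the agreement analyses.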
Cut-offs chosen for comparison in this study were based on Table 1. All participating pathologists were blinded to the results from other pathologists.
Statistical analysis
The TC, TPS and IC scores for PD-L1 expression were collected and dichotomised at the indicated cut-offs. For each case, each staining under a given scoring algorithm (TC or IC) yielded three scores, one from each site. To evaluate the interobserver agreement of binary and continuous data, Fleiss' and Cohen's kappa and the concordance correlation coefficient (CCC) were calculated. The pair-wise Cohen's kappa was calculated by comparing Site A vs B, B vs C or A vs C. Group Fleiss' kappa was calculated for the comparison among Site A, B and C. Light's kappa equals the average of all Cohen's kappa values among raters. The interpretation of Cohen's kappa and Fleiss' kappa coefficients was as follows: <0.00 indicates poor agreement; 0.00–0.20 slight agreement; 0.21–0.40 fair agreement; 0.41–0.60 moderate agreement; 0.61–0.80 substantial agreement and ≥0.81 almost perfect agreement. The interpretation of CCC was as follows: <0.5 indicates poor reliability, 0.5–0.74 moderate reliability, 0.75–0.9 good reliability, and >0.90 excellent reliability. Statistical analyses were performed using SPSS software, version 24 (IBM Corporation, USA).
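The agreement statistics described above can be illustrated with a short, standard-library Python sketch. It is not the SPSS procedure used in the study; the function names are hypothetical, and the example ratings are invented binary calls (1 = at or above the cut-off) from three sites. Values falling between the printed interpretation bands (for example 0.205) are assigned to the next band up, which is an assumption.

```python
from itertools import combinations


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters scoring the same cases."""
    r1, r2 = list(r1), list(r2)
    if len(r1) != len(r2) or not r1:
        raise ValueError("raters must score the same non-empty set of cases")
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    expected = sum((r1.count(c) / n) * (r2.count(c) / n)
                   for c in set(r1) | set(r2))
    if expected == 1.0:  # both raters used one identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def light_kappa(raters):
    """Light's kappa: mean of Cohen's kappa over all rater pairs."""
    pairs = list(combinations(raters, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def fleiss_kappa(raters):
    """Fleiss' kappa for several raters who all scored every case."""
    n_raters, n_cases = len(raters), len(raters[0])
    categories = sorted({x for r in raters for x in r})
    totals = {c: 0 for c in categories}
    agreement = 0.0
    for case in zip(*raters):  # one tuple of ratings per case
        counts = {c: case.count(c) for c in categories}
        for c in categories:
            totals[c] += counts[c]
        agreement += (sum(v * v for v in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    p_bar = agreement / n_cases
    p_exp = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    if p_exp == 1.0:
        return 1.0
    return (p_bar - p_exp) / (1 - p_exp)


def interpret_kappa(kappa):
    """Agreement bands as listed in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented binary calls from three sites for six cases.
site_a = [1, 1, 0, 0, 1, 0]
site_b = [1, 1, 0, 0, 0, 0]
site_c = [1, 0, 0, 0, 1, 0]
```

For these invented ratings, the pair-wise Cohen's kappa values are 2/3 (A vs B), 1/4 (B vs C) and 2/3 (A vs C), so Light's kappa is their mean, 19/36, while Fleiss' kappa treats the three sites jointly.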
Results
Comparability of SP263 staining in NSCLC
To assess the performance of the Ventana SP263 and Dako 22C3 pharmDx IHC assays, we conducted the Ring Study in six countries (Fig. 1). Due to sample availability and time limitations, not all countries were able to collect samples at each level of PD-L1 expression. The proportion of cases in the five categories of PD-L1 score and the descriptive metrics for the number of cases show the distribution of PD-L1 scores in each cohort (Fig. 2; Supplementary Table 3, Appendix A). Representative images of the five categories of SP263 PD-L1 staining in NSCLC, HNSCC and UC are shown in Supplementary Fig. 1–3 (Appendix A). In NSCLC, the interobserver agreement of SP263 was measured by Cohen's kappa coefficient at clinical cut-offs of 1%, 25% and 50%. Among the six countries, the kappa value ranged from 0.64 to 0.90 at a cut-off of 50%, from 0.72 to 0.86 at a cut-off of 25% and from 0.57 to 0.83 at a cut-off of 1% (Table 2). The results indicated that interobserver agreement of SP263 was greater at the 50% cut-off than at the 1% cut-off.
Fig. 1. Experimental design of the international Ring Study. (A) Sample preparation, staining and rotation for reading sections at A, B and C sites among six countries. (B) Number of cases analysed and interpreted in each country. TC and TPS scores were assessed for SP263 and 22C3 staining, respectively. TC (%) and TPS were defined as the percentage of tumour cells with any membrane staining above the background relative to all viable tumour cells present in the sample (positive and negative). A, Australia; B, Brazil; K, Korea; M, Mexico; R, Russia; T, Taiwan. NSCLC, non-small cell lung cancer; HNSCC, head and neck squamous cell cancer; UC, urothelial cancer. TC, tumour cells; TPS, tumour proportion score for 22C3 assay.
Fig. 2. Distribution of Ventana SP263 and Dako 22C3 pharmDx staining percentage on NSCLC, UC and HNSCC tumour cells or immune cells in six countries. PD-L1 scores were categorised into five expression levels: <1%, 1–10%, 11–30%, 31–50% and >50%. (A) The proportions of cases in five PD-L1 categories with Ventana SP263 or Dako 22C3 pharmDx assays in tumour cells of NSCLC. (B) In UC, the proportion of cases with Ventana SP263 PD-L1 staining in tumour cells or immune cells. (C) The proportion of cases with Ventana SP263 staining of HNSCC tumour cells in five countries. NSCLC, non-small cell lung cancer; HNSCC, head and neck squamous cell cancer; UC, urothelial cancer. A, Site A; B, Site B; C, Site C; AU, Australia; BR, Brazil; KR, Korea; MX, Mexico; RU, Russia; TW, Taiwan.
Cohen's kappa coefficient; in Korea, only Site A and B participated in reading the sample.
Country | SP263 interobserver κ | 22C3 interobserver κ | SP263 vs 22C3 κ
… | … | … | 0.89 (0.77–1.00)
Mexico | 0.81 (0.65–0.98) | N/A | N/A
Russia | 0.76 (0.64–0.88) | N/A | N/A
Taiwan | 0.64 (0.48–0.80) | 0.60 (0.44–0.76) | 0.78 (0.65–0.91)
≥25% | | | 0.75 (0.65–0.83)
Australia | 0.86 (0.50–1.00) | 0.81 (0.64–0.98) | 0.70 (0.58–0.83)
Brazil | 0.86 (0.50–1.00) | 0.90 (0.50–1.00) | 0.83 (0.71–0.95)
Korea | 0.82 (0.62–1.00) | 0.90 (0.75–1.00) | 0.81 (0.67–0.94)
Mexico | 0.78 (0.61–0.94) | N/A | N/A
Russia | 0.76 (0.64–0.88) | N/A | N/A
Taiwan | 0.72 (0.56–0.88) | 0.75 (0.59–0.91) | 0.65 (0.53–0.78)
≥1% | | | 0.71 (0.53–0.79)
Australia | 0.72 (0.56–0.88) | 0.88 (0.50–1.00) | 0.53 (0.40–0.66)
Brazil | 0.74 (0.58–0.90) | 0.83 (0.67–0.99) | 0.79 (0.68–0.89)
Korea | 0.63 (0.42–0.85) | 0.67 (0.47–0.88) | 0.78 (0.66–0.90)
Mexico | 0.57 (0.40–0.73) | N/A | N/A
Russia | 0.83 (0.71–0.95) | N/A | N/A
Taiwan | 0.72 (0.56–0.88) | 0.75 (0.59–0.91) | 0.65 (0.53–0.78)
The concordances of SP263 and 22C3 in Site A, B and C are calculated using Fleiss's kappa statistics. The kappa values are characterised as >0.75, excellent; 0.40–0.75, fair to good; <0.40, poor.
To evaluate the performance of the SP263 and 22C3 assays in NSCLC, the average PD-L1 expression per case is illustrated in Fig. 3A–D. For Mexico and Russia, only SP263 staining results were available (Supplementary Fig. 4, Appendix A). Detailed PD-L1 expression by SP263 and 22C3 at Site A, B and C was compared for each participating country (Supplementary Fig. 5A–J, Appendix A). The interobserver agreements of SP263 and 22C3 between Site A vs B, B vs C and A vs C were measured by Cohen's kappa coefficient at clinical cut-offs of 25% and 50%, respectively. The comparisons between two sites in most countries showed substantial to almost perfect agreement, with the kappa coefficient ranging from 0.61 to 1.00. However, only moderate agreement was observed for both assays in Taiwan (SP263 assay, Site A vs Site B, κ=0.58; 22C3 assay, Site A vs Site B, κ=0.56; Site A vs Site C, κ=0.48) (Supplementary Fig. 6, Appendix A).
Fig. 3. Comparability of Ventana SP263 and Dako 22C3 pharmDx assays on tumour cells of NSCLC in four countries. Comparison of SP263 and 22C3 PD-L1 staining in Site A, B and C in (A) Australia, (B) Brazil, (C) Korea, (D) Taiwan. NSCLC, non-small cell lung cancer.
The Light's kappa values of all countries for SP263 at a 25% cut-off and 22C3 at a 50% cut-off were 0.80 (min–max 0.72–0.86) and 0.82 (min–max 0.60–0.94), respectively, which showed that SP263 had less variation than 22C3 at their respective clinical cut-offs (Table 2). The categorical concordance between the SP263 and 22C3 assays was calculated at several cut-offs (Table 2). The results showed better consistency at higher cut-offs in most countries. When the Cohen's kappa coefficient between SP263 and 22C3 was analysed for each site, there was excellent agreement between the two assays at a cut-off of 50% across all participating sites in Brazil and Korea (Brazil, κ=0.85 to 1.00; Korea, κ=0.85 to 0.93) (Supplementary Table 4, Appendix A). We further calculated the CCC for the agreement on a continuous measurement (Supplementary Table 5, Appendix A). In NSCLC, the CCC between the SP263 and 22C3 staining of all sites ranged from 0.79 to 0.95, demonstrating good to excellent comparability between these two assays.
Comparability of SP263 staining in HNSCC
The SP263 staining of HNSCC was assessed in Australia, Brazil, Korea, Mexico and Taiwan (Supplementary Fig. 7A–E, Appendix A). Among the five countries, all had data from three sites, except for Korea, which only had data from Site A and B. According to the results of clinical trials, TC ≥25% is the clinical cut-off for PD-L1 immunopositivity in HNSCC.
Durvalumab for recurrent or metastatic head and neck squamous cell carcinoma: results from a single-arm, phase II study in patients with >/=25% tumour cell PD-L1 expression who have progressed on platinum-based chemotherapy.
The Cohen's kappa coefficients of 25% cut-off between the two scoring sites are illustrated in Fig. 4A. Overall, the interobserver agreement of SP263 in each country was assessed by Fleiss's kappa coefficient (Supplementary Table 6, Appendix A). The results showed moderate to almost perfect agreement of SP263 assay staining in HNSCC (κ=0.60 to 0.82).
Fig. 4. Consistency of Ventana SP263 PD-L1 staining on tumour cells of HNSCC, and on tumour cells and immune cells of UC between two reading sites. Comparison of Cohen's kappa coefficient of SP263 staining between two reading sites in (A) HNSCC and (B) UC. The cut-off setting in HNSCC is TC≥25% and in UC is TC≥25%, IC≥25%, and TC≥25% or IC≥25%. TC, tumour cells; IC, immune cells. AU, Australia; BR, Brazil; KR, Korea; MX, Mexico; RU, Russia; TW, Taiwan. HNSCC, head and neck squamous cell cancer; UC, urothelial cancer.
Comparability of SP263 staining in UC
In UC, SP263 PD-L1 staining of tumour cells (TC) and immune cells (IC) was assessed. First, the comparability between Site A vs B, B vs C and C vs A in Australia, Brazil, Mexico, Russia and Taiwan was assessed for each case (Supplementary Fig. 8A–D,G–L, Appendix A). The SP263 staining of TC and IC at Site A vs B in Korea is illustrated in Supplementary Fig. 8E and F, respectively (Appendix A). The average PD-L1 positivity of TC and IC is shown in Supplementary Fig. 9A–F (Appendix A). The CCC estimation showed that SP263 had poor to moderate reliability in IC staining (0.10–0.68), while SP263 TC staining showed excellent reliability in Brazil, Mexico, Russia and Taiwan (0.94–0.96) (Table 3).
Table 3. The concordance correlation coefficient of AB, BC or AC reading sites in six countries
Country | Cancer type | A vs B | B vs C | A vs C | Average a
Australia | NSCLC | 0.92 (0.86–0.95) | 0.97 (0.95–0.98) | 0.92 (0.86–0.95) | 0.93 (0.92–0.97)
Australia | HNSCC | 0.92 (0.86–0.95) | 0.87 (0.80–0.92) | 0.79 (0.67–0.87) | 0.86 (0.79–0.92)
Australia | UC (TC) | 0.92 (0.87–0.95) | 0.76 (0.67–0.84) | 0.81 (0.71–0.88) | 0.83 (0.76–0.92)
Australia | UC (IC) | 0.63 (0.45–0.77) | 0.61 (0.44–0.74) | 0.58 (0.37–0.73) | 0.61 (0.58–0.63)
Brazil | NSCLC | 0.83 (0.73–0.90) | 0.99 (0.98–0.99) | 0.82 (0.71–0.89) | 0.88 (0.82–0.99)
Brazil | HNSCC | 0.88 (0.82–0.93) | 0.88 (0.81–0.92) | 0.96 (0.94–0.98) | 0.91 (0.88–0.96)
Brazil | UC (TC) | 0.95 (0.91–0.97) | 0.93 (0.89–0.96) | 0.93 (0.89–0.96) | 0.94 (0.93–0.95)
Brazil | UC (IC) | 0.56 (0.39–0.70) | 0.61 (0.42–0.75) | 0.85 (0.75–0.91) | 0.68 (0.56–0.85)
Korea | NSCLC | 0.93 (0.88–0.96) | N/A | N/A | 0.93 (0.88–0.96)
Korea | HNSCC | 0.82 (0.72–0.89) | N/A | N/A | 0.82 (0.72–0.89)
Korea | UC (TC) | 0.74 (0.65–0.81) | N/A | N/A | 0.74 (0.65–0.81)
Korea | UC (IC) | 0.10 (0.02–0.18) | N/A | N/A | 0.10 (0.02–0.18)
Mexico | NSCLC | 0.92 (0.86–0.95) | 0.85 (0.75–0.92) | 0.86 (0.76–0.92) | 0.88 (0.85–0.92)
Mexico | HNSCC | 0.89 (0.82–0.94) | 0.88 (0.79–0.93) | 0.87 (0.78–0.92) | 0.88 (0.87–0.89)
Mexico | UC (TC) | 0.95 (0.91–0.98) | 0.96 (0.91–0.98) | 0.96 (0.92–0.97) | 0.95 (0.95–0.96)
Mexico | UC (IC) | 0.53 (0.27–0.71) | 0.62 (0.39–0.77) | 0.59 (0.39–0.74) | 0.58 (0.53–0.62)
Russia | NSCLC | 0.92 (0.89–0.95) | 0.89 (0.84–0.93) | 0.87 (0.81–0.91) | 0.90 (0.87–0.92)
Russia | UC (TC) | 0.97 (0.94–0.98) | 0.96 (0.92–0.98) | 0.96 (0.92–0.98) | 0.96 (0.96–0.97)
Russia | UC (IC) | 0.51 (0.27–0.69) | 0.40 (0.20–0.56) | 0.26 (0.12–0.39) | 0.39 (0.26–0.51)
Taiwan | NSCLC | 0.75 (0.62–0.84) | 0.91 (0.85–0.95) | 0.90 (0.84–0.94) | 0.85 (0.75–0.91)
Taiwan | HNSCC | 0.91 (0.84–0.95) | 0.96 (0.93–0.98) | 0.95 (0.91–0.97) | 0.94 (0.91–0.96)
Taiwan | UC (TC) | 0.95 (0.93–0.97) | 0.97 (0.95–0.98) | 0.94 (0.90–0.96) | 0.95 (0.94–0.97)
Taiwan | UC (IC) | 0.38 (0.14–0.58) | 0.41 (0.15–0.61) | 0.67 (0.48–0.80) | 0.48 (0.38–0.67)
The concordance correlation coefficient was defined as follows: <0.5, poor reliability; 0.5–0.74, moderate reliability; 0.75–0.9 good reliability; >0.90 excellent reliability.
CI, confidence interval; IC, immune cells; N/A, not available; NSCLC, non-small cell lung cancer; HNSCC, head and neck squamous cell cancer; TC, tumour cells; UC, urothelial cancer.
a Average concordance correlation coefficient of AB, BC and AC sites.
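For readers who want to see how a concordance correlation coefficient of the kind reported in Table 3 is obtained from paired continuous scores, a minimal sketch of Lin's CCC follows. It is not the SPSS procedure used in the study; the function names are hypothetical, population (1/n) variances are used, and scores that fall between the printed reliability bands (for example 0.745) are assigned to the lower band, which is an assumption.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired scores.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    with variances and covariance computed using 1/n.
    """
    x, y = list(x), list(y)
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both raters give one constant score")
    return 2 * cov / denominator


def interpret_ccc(ccc):
    """Reliability bands as listed in the text."""
    if ccc < 0.5:
        return "poor"
    if ccc < 0.75:
        return "moderate"
    if ccc <= 0.90:
        return "good"
    return "excellent"


# Invented TC (%) scores from two sites; site 2 reads 10 points higher throughout.
site_1 = [0, 10, 20, 30]
site_2 = [10, 20, 30, 40]
ccc = concordance_correlation(site_1, site_2)
```

In this invented example the two sites are perfectly correlated, yet the CCC is 250/350 (about 0.71, moderate reliability) because the coefficient also penalises the systematic 10-point offset between readers, which a plain correlation coefficient would ignore.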
In general, over 80% of cases had a score lower than 40% PD-L1 immunopositivity in IC (Supplementary Fig. 8, Appendix A). To assess agreement on categorical data in UC, IC cut-offs of 1%, 5%, 10% and 25% were evaluated and compared with TC. As shown in Table 4, there was substantial to almost perfect agreement at the 10% and 25% cut-offs in TC (κ=0.68 to 0.91). However, the kappa values in IC ranged from –0.04 to 0.76, indicating poor to substantial agreement depending on the cut-off (Table 4). To further assess the concordance of positive IC and TC staining at AB, BC and AC sites, we calculated Cohen's kappa coefficient at the previous clinical cut-offs of TC ≥25%, IC ≥25% and either one ≥25% (Fig. 4B). Collectively, the interobserver agreement of TC ≥25% was superior to that of IC ≥25% by SP263 PD-L1 staining in UC.
Table 4. The concordance of PD-L1 immunopositivity detected by Ventana SP263 in UC
Comparability of all TC staining results in NSCLC, HNSCC and UC
The performance of SP263 across NSCLC, HNSCC and UC was evaluated at the cut-off of 25%. The interobserver comparisons showed moderate to almost perfect agreement of SP263 in TC staining, with kappa values ranging from 0.72 to 0.86 in NSCLC, 0.60 to 0.82 in HNSCC and 0.68 to 0.91 in UC (Supplementary Table 6, Appendix A). To measure the concordance of SP263 on continuous results, we calculated the CCC for NSCLC, HNSCC and UC. The average concordance of SP263 staining in tumour cells was greater than 0.80 across the three cancers in all six countries, with the exception of the CCC value in UC of 0.74 (95% CI, 0.65 to 0.81) in Korea (Table 3).
Discussion
PD-L1 IHC has been developed as complementary or companion diagnostic assays for different ICIs and each assay has its specific requirement of antibody clones, staining platforms, scoring algorithms, and cut-offs for the determination of PD-L1 immunopositivity.
This is critical for understanding the reliability of these assays in surgical resections. Overall, our data showed excellent concordance in the interobserver agreement of TCs in Ventana SP263 in NSCLC, HNSCC and UC (CCC=0.82 to 0.96) and the comparability of Ventana SP263 and Dako 22C3 assays in NSCLC.
The Blueprint 2 project of NSCLC specifically used surgical specimens and biopsies from routine clinical pathology practice. The intraclass correlation coefficient (ICC) for Dako 22C3 and Ventana SP263 assays was 0.88 and 0.92, respectively. At the cut-offs of 25% and 50%, a high level of reliability was demonstrated in both assays (Fleiss' κ>0.7) but was slightly diminished at the cut-off of 1%.
Similar to the results of Blueprint 2, the Ventana SP263 assay had good to excellent reliability across six countries (Table 3). Moreover, the interobserver concordance of the Dako 22C3 and Ventana SP263 assays was also better at the cut-offs of 25% and 50% than at 1% (Table 2). Consistent with previous studies, when TC staining was compared between the Dako 22C3 and Ventana SP263 assays, the 22C3 assay in our study showed less sensitivity in detecting PD-L1 expression in Australia, Brazil and Taiwan (Fig. 3).
Multicenter comparison of 22C3 PharmDx (Agilent) and SP263 (Ventana) assays to test PD-L1 expression for NSCLC patients to be treated with immune checkpoint inhibitors.
The interobserver agreement of the Ventana SP263 assay was equally good for biopsies and surgical specimens but tended to be better at the 50% cut-off (biopsy, κ=0.90; surgery, κ=0.88) than at 1% (biopsy, κ=0.67; surgery, κ=0.70).
These results demonstrated that the performance of the Dako 22C3 and Ventana SP263 assays was comparable in both biopsies and surgical specimens, with slightly lower agreement at the 1% cut-off.
Regarding the interchangeability of Dako 22C3 and Ventana SP263 assays, the Blueprint comparison projects demonstrated good concordance in tumour cell membrane staining.
When computed tomography-guided transthoracic needle biopsy was assessed, the concordance of PD-L1 expression levels between 22C3 and SP263 assays was high (ICC=0.892). Agreements at cut-off levels of 1%, 25% and 50% were also good, with kappa values of 0.878, 0.698 and 0.790, respectively.
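Agreement between two assays at a dichotomous cut-off, as in the kappa values quoted above, is usually summarised with Cohen's kappa. The sketch below shows one way to compute it at the 1%, 25% and 50% cut-offs from paired TC percentages. All scores are hypothetical. The verbal labels follow the widely used Landis and Koch bands; treating them as the scale behind terms such as "substantial" in this article is an assumption.

```python
def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two paired lists of binary (True/False) calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("calls must be paired and non-empty")
    n = len(calls_a)
    p_obs = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    pos_a = sum(calls_a) / n
    pos_b = sum(calls_b) / n
    # Agreement expected by chance from each assay's positivity rate.
    p_exp = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)
    if p_exp == 1:
        raise ValueError("kappa is undefined when both assays give one constant call")
    return (p_obs - p_exp) / (1 - p_exp)


def kappa_by_cutoff(scores_a, scores_b, cutoffs=(1, 25, 50)):
    """Dichotomise paired percentages at each cut-off and return {cutoff: kappa}."""
    return {
        c: cohen_kappa([s >= c for s in scores_a], [s >= c for s in scores_b])
        for c in cutoffs
    }


def landis_koch_label(kappa):
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical TC percentages for the same eight cases stained with both assays.
sp263 = [0, 2, 10, 30, 40, 55, 70, 90]
dako_22c3 = [0, 0, 8, 20, 45, 60, 65, 95]
kappas = kappa_by_cutoff(sp263, dako_22c3)
for cutoff, kappa in kappas.items():
    print(f"cut-off {cutoff}%: kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```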
In the Ring Study, the concordance between the SP263 and 22C3 assays was substantial to almost perfect (50% cut-off, κ=0.81; 25%, κ=0.75; 1%, κ=0.71). Moreover, the interchangeability of Ventana SP263 and 22C3 pharmDx at their approved clinical cut-offs is recognised on the basis of the high correlation between the two assays.
Thus, the Ventana SP263 assay gained approval in Europe for pembrolizumab in NSCLC on the basis of the reported high concordance between the SP263 and 22C3 assays (Table 1).
A meta-analysis of the PD-L1 IHC accuracy in NSCLC indicated that none of the standard assays, including 22C3, 28-8, SP263 and SP142, could be regarded as interchangeable if the interchangeability is defined as achieving ≥90% sensitivity and specificity between two assays.
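The interchangeability criterion described above can be made concrete with a short sketch: one assay is treated as the reference, both are dichotomised at the same cut-off, and the second assay counts as interchangeable only if its sensitivity and specificity against the reference are both at least 90%. The choice of reference assay, the function name and the scores are assumptions for illustration only, not the meta-analysis's data or exact procedure.

```python
def interchangeability(reference_scores, test_scores, cutoff, threshold=0.90):
    """Return (sensitivity, specificity, meets_threshold) of test vs reference calls."""
    if len(reference_scores) != len(test_scores):
        raise ValueError("scores must be paired")
    ref = [s >= cutoff for s in reference_scores]
    test = [s >= cutoff for s in test_scores]
    tp = sum(r and t for r, t in zip(ref, test))
    fn = sum(r and not t for r, t in zip(ref, test))
    tn = sum(not r and not t for r, t in zip(ref, test))
    fp = sum(not r and t for r, t in zip(ref, test))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference needs at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Both metrics must reach the threshold for the assays to count as interchangeable.
    meets = sensitivity >= threshold and specificity >= threshold
    return sensitivity, specificity, meets


# Hypothetical TC percentages; 22C3 is arbitrarily taken as the reference here.
dako_22c3 = [0, 0, 8, 20, 45, 60, 65, 95]
sp263 = [0, 2, 10, 30, 40, 55, 70, 90]
for cutoff in (25, 50):
    sens, spec, ok = interchangeability(dako_22c3, sp263, cutoff)
    print(f"cut-off {cutoff}%: sensitivity={sens:.2f}, specificity={spec:.2f}, "
          f"interchangeable={ok}")
```

Because the criterion requires both metrics to pass, a single discordant case near the cut-off can be enough to fail it in a small cohort, which makes it stricter than a kappa-based reading of the same data.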
Our data indicated the reliability of both the Ventana SP263 and Dako 22C3 pharmDx assays across most laboratories and countries at any cut-off. When the concordance between the SP263 and 22C3 assays was compared, the overall agreement in NSCLC was slightly better at the 50% cut-off than at the 25% and 1% cut-offs (TC≥50%, κ=0.81; TC≥25%, κ=0.75; TC≥1%, κ=0.71) (Table 2). Similar results have been reported in previous studies, indicating that higher cut-off levels are likely to be more robust against interobserver effects.
Assessment of programmed cell death ligand-1 expression by 4 diagnostic assays and its clinicopathological correlation in a large cohort of surgical resected non-small cell lung carcinoma.
Predictive performance of four programmed cell death ligand 1 assay systems on nivolumab response in previously treated patients with non-small cell lung cancer.
Performance of the Food and Drug Administration/EMA-approved programmed cell death ligand-1 assays in urothelial carcinoma with emphasis on therapy stratification for first-line use of atezolizumab and pembrolizumab.
Different staining intensities between SP263 and 22C3 may also contribute to differences in interpretation, especially when the result is close to the cut-off value or the percentage is low. In addition to using the dichotomous proportion cut-offs, we assessed the CCC between the Ventana SP263 and Dako 22C3 pharmDx assays in Australia, Brazil, Korea and Taiwan. The correlation was good to excellent, ranging from 0.79 to 0.95, which also demonstrates the similar quality of these two assays. Based on the accumulated evidence, the interchangeability of the Ventana SP263 and Dako 22C3 pharmDx assays may be acceptable, especially at a high cut-off (50%), but testing should always be performed under well-optimised conditions with appropriate quality assurance measures in place.
Prior comparisons of single or multiple PD-L1 IHC assays for immunotherapy have mostly been conducted in NSCLC.
The US FDA has granted accelerated approval to immunotherapy agents for breast cancer, HNSCC and UC (Supplementary Table 1, Appendix A). Fleiss' kappa values for the Ventana SP263 assay on tissue microarrays were 0.836 and 0.710 for HNSCC and UC, respectively. Across breast cancer, HNSCC and UC, the interobserver agreement of Ventana SP263 staining on TC was almost perfect at the cut-offs of 5% and 25%.
Inter- and intraobserver agreement of programmed death ligand 1 scoring in head and neck squamous cell carcinoma, urothelial carcinoma and breast carcinoma.
In the Ring Study, the superior performance of the SP263 assay on TC scoring was also found in HNSCC and UC. The highest CCC achieved in HNSCC was 0.94 in Taiwan and that in UC was 0.96 in Russia. At the clinically relevant cut-off of 25%, moderate to almost perfect agreement in HNSCC was observed across every country (Fleiss' κ=0.60 to 0.82). Collectively, the performance of the SP263 assay on TC staining in NSCLC, HNSCC and UC was reliable and reproducible among pathologists from different countries. However, even though all the trained pathologists enrolled in this study were aware of interpretive pitfalls and artifacts, including non-specific background staining, non-specific granular staining of crushed or necrotic tissue, alveolar macrophages and stromal cells, it is impossible to quantify all minute artifacts. The interpretation of PD-L1 immunopositivity may be obscured by abundant keratinised cells. Necrotic tumour cells, inflammatory cells and fibrosis may also obscure the edge of the tumour nest, influencing the determination of the total TC area. In NSCLC, lepidic adenocarcinoma spreads along alveolar structures, whereas tumour cells of squamous cell carcinoma form solid tumour nests.
In this study, the surgical resections were scored without any field selection that would maximise sample uniformity and minimise intra-tumoural heterogeneity. Whether tumour cell density and histological subtype influence interobserver concordance needs further investigation.
In the setting of clinical study MEDI4736-1108 that utilised the clinical algorithm (TC ≥25% or IC ≥25%) for advanced UC, inter-reader and intrareader overall agreement of Ventana SP263 assay were 93.0% and 92.4%, respectively. Interlaboratory reproducibility testing had an overall agreement of 92.6%.
Analytical validation and clinical utility of an immunohistochemical programmed death ligand-1 diagnostic assay and combined tumor and immune cell scoring algorithm for durvalumab in urothelial carcinoma.
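To make the combined scoring rule and the agreement figures above concrete, the sketch below classifies each case as PD-L1 high when TC ≥25% or IC ≥25%, then computes the overall percent agreement between two readers. It implements the rule only as stated in the text; the full assay labelling may include further conditions that are not reproduced here. The function names and reader scores are hypothetical.

```python
def combined_positive(tc_percent, ic_percent, tc_cutoff=25, ic_cutoff=25):
    """PD-L1 high if either the TC or the IC percentage reaches its cut-off."""
    return tc_percent >= tc_cutoff or ic_percent >= ic_cutoff


def overall_percent_agreement(calls_a, calls_b):
    """Fraction of cases on which two readers make the same binary call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("calls must be paired and non-empty")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)


# Hypothetical (TC %, IC %) scores from two readers on the same five UC cases.
reader_1 = [(30, 5), (10, 40), (5, 5), (0, 20), (60, 60)]
reader_2 = [(20, 5), (10, 30), (5, 10), (0, 30), (50, 70)]

calls_1 = [combined_positive(tc, ic) for tc, ic in reader_1]
calls_2 = [combined_positive(tc, ic) for tc, ic in reader_2]
opa = overall_percent_agreement(calls_1, calls_2)
print(f"Overall agreement = {opa:.0%}")
```

Because a case is positive if either component passes, disagreement on IC alone (the fourth case here) is enough to flip the combined call, so variable IC scoring feeds directly into the combined result.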
In the Ring Study, fair to almost perfect interobserver agreement of the SP263 assay was demonstrated with the combined TC and IC algorithm (Cohen's κ=0.35 to 0.83) (Fig. 4). The variation in agreement with the combined algorithm may partially come from the varying concordance of IC scoring (κ= –0.04 to 0.76). Compared to TC staining, IC staining in UC showed less agreement at each cut-off. The lowest Fleiss' kappa was –0.04 at the cut-off of 1%, 0.12 at the cut-off of 5%, 0.05 at the cut-off of 10% and 0.00 at the cut-off of 25% (Table 4). In a previous study, the interobserver agreement at a 5% cut-off of SP263 IC staining was substantial (κ=0.76), as demonstrated in a tissue microarray with 251 UC samples.
Another tissue microarray study also demonstrated stable substantial interobserver agreement in IC staining (κ=0.712 at a cut-off of 1%, 0.681 at a cut-off of 5%, and 0.650 at a cut-off of 25%).
Instead of using small biopsy specimens or tissue microarrays, we used the surgical resections of NSCLC, HNSCC and UC that reflect more tumour heterogeneity. It is speculated that the heterogeneity of immune cells makes it more difficult to estimate the IC area than TC.
The low agreement on IC may be due to interobserver variation in the definition of the IC area, because the infiltrating ICs are scattered throughout the tumour mass. The smaller size of immune cells may also increase the difficulty and subjectivity of interpreting IC staining. Compared to TC scoring, the reduced concordance for IC scoring may be due to pathologists' relative lack of experience with IC scoring and to less methodological standardisation.
Continuous training may improve the harmonisation of PD-L1 interpretation in scoring the tumour infiltrating immune cells.
Ideally, comparison testing should be conducted in clinical trials that assess clinical outcome following ICI treatment, but there are many barriers to undertaking such costly and time-consuming global clinical trials to evaluate the various assays with different cut-offs. The compilation of local ring studies could reflect the actual scenario, in which slides from around the world cannot be prepared in a central laboratory for every assay. The multicentre design also strengthens the generalisability of the performance of the Ventana SP263 and Dako 22C3 IHC assays. Nevertheless, this study is limited by a low case count, unavailable data and potentially different staining conditions in different centres. While the study is meaningful in its own way, a true ring study would require a single set of samples to be read by each hospital in every country to calculate the degree of concordance.
In conclusion, our study shows comparable results for the Ventana SP263 and Dako 22C3 pharmDx assays in TC scoring of NSCLC in the six countries, especially at the 50% cut-off, when whole surgical sections are analysed. The CCC results demonstrate good to excellent reliability of the Ventana SP263 assay in TC assessment in NSCLC, HNSCC and UC, whereas IC assessment in UC displays poor to moderate reliability across the six countries. The interobserver agreement for IC at the clinical cut-off of 25% is lower than that for TC. Overall, the Ring Study provides comprehensive results to pathologists, clinicians and regulators, locally and internationally.
Acknowledgement
The work is the project ‘Assessment of testing capability of Ventana SP263 PD-L1 IHC assay in NSCLC, HNSCC and comparative performance UBC with Dako pharmDx 22C3 PD-L1 IHC assay in NSCLC’, sponsored by AstraZeneca, which contributed to the conception and design of the study, data collection, and publication.
Conflicts of interest and sources of funding
This work is supported by AstraZeneca, which contributed to the conception and design of the study, data collection, and publication. S. Fox receives research support from AstraZeneca, Bristol-Myers Squibb, Roche, and Amgen, the funds of which go to his institution. T. Y. Chou receives research support from Roche (Foundation One) and honoraria for lectures and advisory boards from AstraZeneca, MSD and Roche. There are no potential conflicts of interest declared by the other authors.
Appendix A. Supplementary data
The following is the Supplementary data to this article:
References
FDA approval summary: pembrolizumab, atezolizumab, and cemiplimab-rwlc as single agents for first-line treatment of advanced/metastatic PD-L1 high NSCLC.
Adjuvant atezolizumab after adjuvant chemotherapy in resected stage IB-IIIA non-small-cell lung cancer (IMpower010): a randomised, multicentre, open-label, phase 3 trial.
Durvalumab for recurrent or metastatic head and neck squamous cell carcinoma: results from a single-arm, phase II study in patients with ≥25% tumour cell PD-L1 expression who have progressed on platinum-based chemotherapy.
Safety and efficacy of durvalumab with or without tremelimumab in patients with PD-L1-low/negative recurrent or metastatic HNSCC: the Phase 2 CONDOR Randomized Clinical Trial.
FDA approval summary: atezolizumab or pembrolizumab for the treatment of patients with advanced urothelial carcinoma ineligible for cisplatin-containing chemotherapy.
Multicenter comparison of 22C3 PharmDx (Agilent) and SP263 (Ventana) assays to test PD-L1 expression for NSCLC patients to be treated with immune checkpoint inhibitors.
Assessment of programmed cell death ligand-1 expression by 4 diagnostic assays and its clinicopathological correlation in a large cohort of surgical resected non-small cell lung carcinoma.
Predictive performance of four programmed cell death ligand 1 assay systems on nivolumab response in previously treated patients with non-small cell lung cancer.
Performance of the Food and Drug Administration/EMA-approved programmed cell death ligand-1 assays in urothelial carcinoma with emphasis on therapy stratification for first-line use of atezolizumab and pembrolizumab.
Inter- and intraobserver agreement of programmed death ligand 1 scoring in head and neck squamous cell carcinoma, urothelial carcinoma and breast carcinoma.
Analytical validation and clinical utility of an immunohistochemical programmed death ligand-1 diagnostic assay and combined tumor and immune cell scoring algorithm for durvalumab in urothelial carcinoma.
Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): a randomised, open-label, controlled, phase 3 trial.
Pembrolizumab versus docetaxel for previously treated, PD-L1-positive, advanced non-small-cell lung cancer (KEYNOTE-010): a randomised controlled trial.
Pembrolizumab alone or with chemotherapy versus cetuximab with chemotherapy for recurrent or metastatic squamous cell carcinoma of the head and neck (KEYNOTE-048): a randomised, open-label, phase 3 study.
First-line pembrolizumab in cisplatin-ineligible patients with locally advanced and unresectable or metastatic urothelial cancer (KEYNOTE-052): a multicentre, single-arm, phase 2 study.
Cemiplimab monotherapy for first-line treatment of advanced non-small-cell lung cancer with PD-L1 of at least 50%: a multicentre, open-label, global, phase 3, randomised, controlled trial.
Five-year outcomes from the randomized, phase III trials CheckMate 017 and 057: nivolumab versus docetaxel in previously treated non-small-cell lung cancer.