The Use of Scoring Systems in Selecting Patients for Lung Resection: Work-up Bias Comes Full-Circle

The process of selecting patients for lung resection is complex and, to outsiders such as the first author, fascinating. There are many good, clear accounts of the individual roles of different forms of clinical information (radiographs, CT, PET, pathology, or spirometry) in determining diagnosis, TNM stage, and the other important factors that determine the choice of treatment. Less is written about the process by which management decisions emerge from a clinical team.

The team must balance, negotiate, or choose from a wealth of internal and external inputs including patient-specific data, published evidence, national guidelines, local protocols, and the experience, opinion, and preferences of a number of health care professionals, not forgetting the wishes of the patient. In an ideal world one might consider all the information available, sort it into some order of importance to the case under discussion, and then agree on diagnosis, stage, and management plan. That is not quite how the process works in practice.

In the authors’ observations of many multidisciplinary meetings, experienced chest physicians, surgeons, oncologists, radiologists, and pathologists assimilate these streams of independent evidence and do indeed agree on the most likely histology, the most likely stage of the disease, and the fitness of the patient for treatment. The negotiation of the agreed diagnosis and stage, however, has started before and continues during each specialty’s presentation of its findings rather than being held until all the information is set out.

It is perhaps natural that people do not wish the findings from their specialty, be it CT, PET, or pathology, to be at odds with the majority view. As an example, in a recent lung cancer multidisciplinary meeting, a pathologist was equivocal as to whether the sample from a solitary pulmonary nodule confirmed a diagnosis of primary lung cancer. Further discussion elicited the patient’s smoking history, at which point the pathologist exclaimed, “Well, if you’d told me that…”

The purpose of selecting patients for surgery

Surgeons judge themselves and are judged by others according to the levels of perioperative mortality and morbidity among their patients and the duration and quality of postoperative survival. During the workings of the multidisciplinary team and in discussions with patients, selection of patients appropriate for surgery, balancing the short-term risks against long-term survival, is a key contribution of the surgeon. The original use of scoring systems in cardiothoracic surgery was to redress the imbalance caused by differential selection of cases between the private and the university systems in the United States when coronary artery surgery was burgeoning.

There was a perception that the private clinics could “cherry pick” low-risk cases, leaving the higher-risk patients (those who were older, hypertensive, diabetic, or had damaged ventricles) to the university hospitals. The proposed system made it “feasible to analyze operative results by risk groups and to compare results in similar groups between institutions”. In coronary artery surgery, the determinants of long-term outcome (the completeness of restoration of coronary blood flow and the maintenance of myocardial function) also are the predominant factors determining perioperative survival rates.

In cancer surgery, however, strategies to reduce risk in the short term may run counter to the long-term benefit of eradicating the cancer. Using low perioperative mortality rates as the measure of a system’s performance may therefore be counterproductive. At a population level, the objective in selecting patients for surgery is to maximize the net benefit conferred on all patients who have lung cancer by a team deploying a range of treatment modalities.

Risk models: what goes in and what comes out

The first task in constructing a model of perioperative risk or postoperative survival is to identify “candidate prognostic factors” and amass data on these factors and the outcomes of interest. More often than not, it is a case of identifying candidate factors from an existing data set. Ideally, candidate factors should be objective measures that are available for all patients and that bear a putative relationship to the outcome to be predicted.

Data analysis, typically using regression techniques but increasingly based on other techniques such as neural networks, is undertaken to identify which of the candidate factors are associated independently with the outcome measure of interest. These factors then are used to derive an equation for the perioperative risk or the postoperative survival curve that applies to any group of patients who have a given presentation in terms of these factors. Risk scores are clearly of limited use in selecting patients for surgery if they rely on data not available at the time of the decision, such as pathologic stage.
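To make concrete what deriving an equation for perioperative risk means, the sketch below shows the general form a fitted logistic regression model takes once its coefficients are known. The factor names and coefficient values are hypothetical, chosen only for illustration, and do not come from any published model. The function refuses to produce a score when a required item is missing, mirroring the point that a score is of little use if it needs data unavailable at the time of the decision.

```python
import math

# Hypothetical coefficients for illustration only; not taken from any published model.
INTERCEPT = -6.0
COEFFICIENTS = {
    "age_years": 0.04,     # change in log-odds of death per year of age
    "low_ppo_fev1": 1.2,   # 1 if predicted postoperative FEV1 is low, else 0
}


def predicted_risk(patient: dict) -> float:
    """Return the model's predicted probability of in-hospital death.

    Every factor must be known before the operation; a missing item
    (for example, anything that needs pathologic stage) cannot be scored.
    """
    missing = [name for name in COEFFICIENTS if name not in patient]
    if missing:
        raise ValueError(f"missing preoperative data: {missing}")
    linear_predictor = INTERCEPT + sum(
        coef * patient[name] for name, coef in COEFFICIENTS.items()
    )
    # Logistic link: convert log-odds into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-linear_predictor))


print(predicted_risk({"age_years": 70, "low_ppo_fev1": 0}))
print(predicted_risk({"age_years": 70, "low_ppo_fev1": 1}))
```

The same equation applies to every patient with the same values of the included factors, which is exactly why anything left out of the model plays no part in the score.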

At a more subtle level, the inclusion of treatment-related factors, such as the extent of the resection, presents a similar problem. A decision to perform a sublobar resection typically is made only for patients considered unlikely to tolerate a lobectomy; it is almost a product of the selection process rather than a free choice open to surgeon and patient within that process. A useful review article by Birim and colleagues tabulates the clinical, tumor-, and treatment-related factors recently found to be associated with perioperative mortality and morbidity and those recently found to be associated with long-term survival.

Interpreting models of risk and survival

If numerous studies identify the same clinical factor as being associated with increased risk or poor survival, this agreement strengthens confidence that the association is real rather than a statistical fluke. If a factor is identified in only one study, this may simply reflect that the item of data was collected only in that study. For instance, when constructing a model for the risk of in-hospital death following lung resection using data from the European Thoracic Database Project, the authors had DLCO data for less than a third of patients and had no Charlson comorbidity index data (to give just one example) for any of the 3426 patients included in the study.

None of the 13 studies mentioned in the previous section identified more than four factors as being independently associated with differential survival following lung resection. There are statistical limits to the number of prognostic factors that can be identified, set by the relatively small number of deaths in the databases. (Statisticians may be heard grumbling that the perioperative mortality rate is too low to allow them to build good models.) In addition, there is a pragmatic limit to the number of data items that can be completed accurately for all cases.
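The statistical limit can be made concrete with a commonly quoted rule of thumb for logistic regression: roughly ten outcome events (here, deaths) per candidate factor. The sketch below applies that heuristic; it is an assumption for illustration, not the method used in the studies discussed, and the mortality rates used are hypothetical.

```python
import math


def max_candidate_factors(n_patients: int, mortality_rate: float,
                          events_per_variable: int = 10) -> int:
    """Rough upper bound on the number of candidate factors a logistic model
    can support, using the events-per-variable rule of thumb (an assumed
    heuristic, not a fixed statistical law)."""
    expected_deaths = n_patients * mortality_rate
    return math.floor(expected_deaths / events_per_variable)


# 3426 patients (the size of the data set described above) at a hypothetical 3% mortality.
print(max_candidate_factors(3426, 0.03))   # prints 10
# The same data set at a hypothetical 1.5% mortality supports half as many factors.
print(max_candidate_factors(3426, 0.015))  # prints 5
```

Halving the mortality rate halves the number of factors the data can support, which is one way of reading the statisticians’ grumble.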

These observations point to an intrinsic limitation in relating risk scores and survival models to individual patients and hence in the use of such modeling to inform decisions concerning surgery. Risk models are not complete. It would be entirely wrong to suppose that a clinical feature does not affect the risk for a particular patient simply because that feature does not appear in a risk model.

The appropriateness of risk models over time

Models of perioperative risk become out of date. Although the language is of “intrinsic risk faced by the patient” and “patient related risk,” improvements in surgical and anesthetic techniques (including the adoption of new technology) and the standards of intensive care mean that the risks faced by different groups of patients change over time. This development is widely understood; for example, few cardiac surgeons today boast of “outperforming” the Parsonnet risk-scoring system developed in the late 1980s. (They may, of course, have become less prone to boasting.)
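A common way to express outperforming a score such as Parsonnet’s is the ratio of observed to expected deaths, where the expected count is the sum of the risks the model predicts for each patient. The sketch below computes this ratio for a hypothetical cohort; the figures are invented for illustration. For a model developed decades earlier, a ratio well below 1 says at least as much about the age of the model as about the performance of the surgeon.

```python
def observed_to_expected(predicted_risks: list[float], died: list[bool]) -> float:
    """Ratio of observed deaths to deaths expected under a fixed risk model.

    Values below 1 are often read as 'outperforming' the model; with an
    out-of-date model they may simply reflect improved care since it was built.
    """
    if len(predicted_risks) != len(died):
        raise ValueError("need one prediction per patient")
    expected = sum(predicted_risks)
    observed = sum(died)  # True counts as 1, False as 0
    return observed / expected


# Hypothetical recent cohort: 100 patients, each given a 5% risk by an old model,
# of whom 2 died. Expected deaths = 5, so the ratio is about 0.4.
ratio = observed_to_expected([0.05] * 100, [True] * 2 + [False] * 98)
print(round(ratio, 2))
```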

Another mechanism whereby risk models may become out of date also applies to models of long-term survival. Suppose that a surgical risk model became widely accepted and used in the selection of patients. Specifically, suppose that a certain category of patients was shown to have poor postoperative survival. What could be expected if another risk model–building exercise was performed 10 years later?

It could be that no patients exhibiting the unfavorable characteristics populate the new surgical database; alternatively it could be that there are such patients but that they are considered by clinical teams to be patients for whom the risk model, for one reason or another, was not the whole picture. Either way, if the first risk model changes clinical practice in selecting patients, it is unlikely the same factors will emerge with the same relationship to risk in a later risk model.
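The thought experiment can be sketched as a small simulation under explicitly hypothetical assumptions: a recorded factor, present in 30% of candidates, raises in-hospital mortality from 3% to 15%, but only in patients whom the team would judge unfit, a judgement that is never entered in the database. In the first era every candidate is operated on; in the second, the first-era model is in use and patients with the factor are operated on only when judged fit. None of these figures comes from real data.

```python
import random
from typing import NamedTuple


class Patient(NamedTuple):
    has_factor: bool  # recorded in the database (hypothetical adverse factor)
    judged_fit: bool  # clinical judgement that is NOT recorded in the database
    died: bool        # in-hospital death if operated on


def simulate_cohort(n: int, rng: random.Random) -> list[Patient]:
    cohort = []
    for _ in range(n):
        has_factor = rng.random() < 0.3
        judged_fit = rng.random() < 0.5
        # Hypothetical risks: the factor raises risk only in patients not judged fit.
        risk = 0.15 if (has_factor and not judged_fit) else 0.03
        cohort.append(Patient(has_factor, judged_fit, rng.random() < risk))
    return cohort


def mortality_by_factor(patients: list[Patient]) -> dict[bool, float]:
    """Observed mortality among operated patients with and without the factor."""
    rates = {}
    for flag in (True, False):
        group = [p.died for p in patients if p.has_factor == flag]
        rates[flag] = sum(group) / len(group)
    return rates


rng = random.Random(42)
# Era 1: no model in use; every candidate is operated on.
era1 = simulate_cohort(20_000, rng)
# Era 2: the era-1 model guides selection; patients with the factor are
# operated on only when the team judges them fit.
era2 = [p for p in simulate_cohort(20_000, rng) if not p.has_factor or p.judged_fit]

print(mortality_by_factor(era1))
print(mortality_by_factor(era2))
```

In this sketch the factor shows a clear excess mortality in the first-era data and essentially none in the second, even though its underlying effect on risk has not changed; only the selection has.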


Author: Martin Utley, PhD, Tom Treasure, MD, FRCS