Multi conditional lung nodule synthesis for improved nodule malignancy classification in Computed Tomography scans

J. Spronck

Master's thesis, 2020.

Deep learning systems are increasingly being researched for Computer-Aided Diagnosis (CAD) of lung nodules in lung cancer screening. To achieve clinically relevant accuracy, these supervised deep learning systems rely on large amounts of annotated data. However, labels in medical imaging are scarce because obtaining annotations from expert clinicians is costly. To compensate for this data scarcity, the synthesis of additional training data by Generative Adversarial Networks (GANs) has gained increased attention and has been shown to be useful for improving several supervised learning tasks. This study compares multiple advanced training methods that use synthesized data from a multi-conditional Wasserstein GAN (mcWGAN) to improve a nodule malignancy classifier. It specifically examines the potential of an mcWGAN to generate hard-to-classify synthetic nodules by taking its input conditions from misclassified nodules. This approach was compared with conventional nodule synthesis, where the input conditions are sampled from the distributions of all nodules rather than from the misclassified nodules only. We also examined whether pre-training on synthesized nodules or on ImageNet further improves the classifier's performance. The proposed mcWGAN proved successful at generating a wide variety of additional training data with manipulable malignancy and nodule diameter attributes. We show that ImageNet pre-training, combined with synthetic data augmentation, consistently outperforms other training approaches.
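
The difference between the two condition-sampling strategies compared in this study can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Nodule` record, its `predicted` field, the `sample_conditions` helper, and the toy data are all hypothetical names introduced for illustration, and the sketch assumes that a condition is simply a (malignancy, diameter) pair copied from an existing nodule.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Nodule:
    # Hypothetical record; field names are assumptions, not from the thesis.
    diameter_mm: float  # nodule diameter used as a generator condition
    malignant: bool     # ground-truth malignancy label
    predicted: bool     # malignancy predicted by the current classifier


def sample_conditions(nodules, n, hard_only, rng):
    """Draw n (malignancy, diameter) condition pairs for a conditional generator.

    hard_only=True  -> sample only from misclassified nodules.
    hard_only=False -> sample from all nodules (conventional synthesis).
    """
    if hard_only:
        pool = [nod for nod in nodules if nod.malignant != nod.predicted]
    else:
        pool = list(nodules)
    if not pool:
        raise ValueError("no nodules available to sample conditions from")
    picks = rng.choices(pool, k=n)  # sampling with replacement
    return [(p.malignant, p.diameter_mm) for p in picks]


# Toy data (made up for this sketch): two of the four nodules are misclassified.
demo_nodules = [
    Nodule(4.0, malignant=False, predicted=False),
    Nodule(6.5, malignant=True, predicted=False),   # misclassified
    Nodule(14.0, malignant=False, predicted=True),  # misclassified
    Nodule(22.0, malignant=True, predicted=True),
]

rng = random.Random(0)
hard_conditions = sample_conditions(demo_nodules, 50, hard_only=True, rng=rng)
all_conditions = sample_conditions(demo_nodules, 50, hard_only=False, rng=rng)
```

In this sketch, `hard_conditions` only ever contains the attributes of the two misclassified nodules, whereas `all_conditions` can draw from every nodule. Either list would then be passed to the generator as its conditioning input.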