Data reduction for X‐ray serial crystallography using machine learning

Rahmani, Vahid
Nawaz, Shah
Pennicard, David
Setty, Shabarish Pala Ramakantha
Graafsma, Heinz

DOI: https://doi.org/10.1107/S1600576722011748
Persistent URL: http://resolver.sub.uni-goettingen.de/purl?gldocs-11858/10850
Rahmani, Vahid; Nawaz, Shah; Pennicard, David; Setty, Shabarish Pala Ramakantha; Graafsma, Heinz, 2023: Data reduction for X‐ray serial crystallography using machine learning. In: Journal of Applied Crystallography, 56, 1, 200-213, DOI: https://doi.org/10.1107/S1600576722011748. 
 
Nawaz, Shah; Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, Hamburg 22607, Germany
Pennicard, David; Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, Hamburg 22607, Germany
Setty, Shabarish Pala Ramakantha; Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, Hamburg 22607, Germany
Graafsma, Heinz; Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, Hamburg 22607, Germany

Abstract

Serial crystallography experiments produce massive amounts of experimental data. Yet in spite of these large‐scale data sets, only a small percentage of the data are useful for downstream analysis. Thus, it is essential to differentiate reliably between acceptable data (hits) and unacceptable data (misses). To this end, a novel pipeline is proposed to categorize the data, which extracts features from the images, summarizes these features with the 'bag of visual words' method and then classifies the images using machine learning. In addition, a novel study of various feature extractors and machine learning classifiers is presented, with the aim of finding the best feature extractor and machine learning classifier for serial crystallography data. The study reveals that the oriented FAST and rotated BRIEF (ORB) feature extractor with a multilayer perceptron classifier gives the best results. Finally, the ORB feature extractor with multilayer perceptron is evaluated on various data sets including both synthetic and experimental data, demonstrating superior performance compared with other feature extractors and classifiers.


A machine learning method for distinguishing good and bad images in serial crystallography is presented. To reduce the computational cost, the method uses the oriented FAST and rotated BRIEF feature extraction method from computer vision to detect image features, followed by a multilayer perceptron (neural network) to classify the images.
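
The 'bag of visual words' step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that binary descriptors (such as those ORB produces) are already available as integers and that a vocabulary of visual words has already been built. The function names, the 8-bit descriptors and the 3-word vocabulary are all hypothetical and chosen only for illustration. In a full pipeline the descriptors would come from a feature extractor and the resulting histogram would be passed to a classifier such as a multilayer perceptron; both stages are left out here.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def bovw_histogram(descriptors, vocabulary):
    """Summarize one image's descriptors as a normalized visual-word histogram."""
    counts = Counter()
    for d in descriptors:
        # Assign each descriptor to its nearest visual word
        # (ties go to the word with the lowest index).
        word = min(range(len(vocabulary)), key=lambda i: hamming(d, vocabulary[i]))
        counts[word] += 1
    total = len(descriptors)
    if total == 0:
        # An image with no detected features gets an all-zero histogram.
        return [0.0] * len(vocabulary)
    return [counts[i] / total for i in range(len(vocabulary))]


# Hypothetical 8-bit descriptors and a 3-word vocabulary, for illustration only.
vocabulary = [0b00000000, 0b11110000, 0b11111111]
image_descriptors = [0b00000001, 0b11110001, 0b11100000, 0b11111110]
histogram = bovw_histogram(image_descriptors, vocabulary)
print(histogram)
```

Each image is reduced to a vector whose length equals the vocabulary size, regardless of how many features were detected, which is what allows a fixed-input classifier to be applied to images with varying numbers of features.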