Microplastics Finder: Why, What and How?
Spectroscopic measurement of microplastics, and Fourier-transform infrared (FTIR) spectroscopy in particular, has proven to be one of the most promising methods, because it identifies particles by their chemical structure (characteristic vibrational bands). With FTIR imaging devices, an entire microplastics filter can be analysed in a short time, and particle size (and type) are determined entirely from chemical information rather than from visual appearance alone. However, the enormous amount of data generated during measurement poses a challenge. A typical dataset contains millions of spectra, corresponding to file sizes between 5 and 70 GB. To exploit this chemical information fully, a robust data analysis method is needed.
Microplastics are identified by comparing the measured spectra with reference spectra from particles of known composition. Oxidation, biofilms, total absorption in large particles, residues of the sample matrix and many other factors can affect the infrared spectra of microplastics to varying degrees. Reliable detection therefore requires reference spectra that reflect these effects, which in turn calls for an extensive spectral collection. In a classical database comparison, each spectrum in the acquired image is compared with every reference spectrum in the database. Because the number of comparisons equals the number of image spectra multiplied by the number of reference spectra, total computing time quickly adds up to hours, which limits how many reference spectra can be used. Spectral libraries therefore force a trade-off between computing time and analytical quality. In addition, there is no standardized database for microplastics analysis, which raises questions about the comparability of results.
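The cost of a classical library search can be made concrete with a small sketch. It assumes Pearson correlation as the similarity score and uses made-up five-point spectra with hypothetical names (`polymer_A`, `polymer_B`, `matrix_C`); real software may use other hit-quality measures and preprocessing. What matters is the nested loop: every image spectrum is scored against every reference spectrum.

```python
import math
from typing import Dict, List, Tuple

Spectrum = List[float]  # absorbance values on a shared wavenumber grid


def pearson(a: Spectrum, b: Spectrum) -> float:
    """Pearson correlation coefficient between two equally sampled spectra."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum carries no band information
    return cov / math.sqrt(var_a * var_b)


def library_search(
    image: List[Spectrum], library: Dict[str, Spectrum]
) -> Tuple[List[Tuple[str, float]], int]:
    """Score every image spectrum against every reference spectrum.

    Returns the best-matching reference name and score per image spectrum,
    plus the total number of comparisons performed.
    """
    results: List[Tuple[str, float]] = []
    comparisons = 0
    for spectrum in image:
        best_name, best_score = "", float("-inf")
        for name, reference in library.items():
            score = pearson(spectrum, reference)
            comparisons += 1
            if score > best_score:
                best_name, best_score = name, score
        results.append((best_name, best_score))
    return results, comparisons


# Hypothetical five-point reference spectra (real spectra have far more points).
library = {
    "polymer_A": [0.1, 0.9, 0.2, 0.1, 0.0],
    "polymer_B": [0.0, 0.1, 0.8, 0.3, 0.1],
    "matrix_C": [0.5, 0.4, 0.3, 0.2, 0.1],
}
# Hypothetical image spectra: scaled or offset copies of the references.
image = [
    [0.2, 1.8, 0.4, 0.2, 0.0],
    [0.05, 0.15, 0.85, 0.35, 0.15],
    [1.0, 0.8, 0.6, 0.4, 0.2],
]

results, comparisons = library_search(image, library)
print([name for name, _ in results])  # ['polymer_A', 'polymer_B', 'matrix_C']
print(comparisons)  # 9 = 3 image spectra x 3 reference spectra
```

With, for example, one million image spectra and 12,000 reference spectra, the same loop would perform 12 billion correlations, each over many wavenumber points, which is why computing time grows with every spectrum added to the library.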
The limitations of conventional spectral libraries in handling large data files and the high spectral variability seen in microplastics analysis prompted a search for more advanced data analysis solutions. The key word here is machine learning. Unlike "classical" database matching, model-based machine learning analysis can incorporate practically any number of reference spectra and substance classes, because a trained model classifies a new spectrum without comparing it with each reference spectrum individually. These reference spectra, now referred to as training data, are the basis from which machine learning algorithms derive statistical models.
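The difference between matching and a trained model can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-centroid classifier only because it fits in a few lines; it is not the algorithm behind the Microplastics Finder, which this text does not specify. The training spectra, class labels and the mean-centring plus unit-length normalization are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Spectrum = List[float]  # absorbance values on a shared wavenumber grid


def normalize(s: Spectrum) -> Spectrum:
    """Mean-centre and scale to unit length so overall intensity does not dominate."""
    mean = sum(s) / len(s)
    centred = [x - mean for x in s]
    norm = math.sqrt(sum(x * x for x in centred)) or 1.0  # guard against flat spectra
    return [x / norm for x in centred]


def train(training_data: List[Tuple[Spectrum, str]]) -> Dict[str, Spectrum]:
    """Derive the 'model': one mean (centroid) spectrum per class label."""
    sums: Dict[str, Spectrum] = {}
    counts: Dict[str, int] = defaultdict(int)
    for spectrum, label in training_data:
        s = normalize(spectrum)
        if label not in sums:
            sums[label] = [0.0] * len(s)
        sums[label] = [a + b for a, b in zip(sums[label], s)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def classify(model: Dict[str, Spectrum], spectrum: Spectrum) -> str:
    """Return the class whose centroid is nearest in Euclidean distance.

    The work per spectrum depends on the number of classes, not on how many
    training spectra were used to build the centroids.
    """
    s = normalize(spectrum)

    def distance(centroid: Spectrum) -> float:
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, centroid)))

    return min(model, key=lambda label: distance(model[label]))


# Hypothetical labelled training spectra (two per class for brevity).
training_data = [
    ([0.1, 0.9, 0.2, 0.1, 0.0], "polymer_A"),
    ([0.2, 1.0, 0.3, 0.1, 0.1], "polymer_A"),
    ([0.0, 0.1, 0.8, 0.3, 0.1], "polymer_B"),
    ([0.1, 0.2, 0.9, 0.4, 0.1], "polymer_B"),
    ([0.5, 0.4, 0.3, 0.2, 0.1], "matrix"),
    ([0.6, 0.5, 0.3, 0.2, 0.2], "matrix"),
]

model = train(training_data)
print(len(model))  # 3 centroids, one per class
print(classify(model, [0.3, 2.0, 0.5, 0.2, 0.1]))  # polymer_A
print(classify(model, [1.1, 0.9, 0.6, 0.4, 0.3]))  # matrix
```

Adding more training spectra changes the centroids but not the number of comparisons per classified spectrum. More expressive models replace the centroids with other learned parameters, but the principle is the same: the work per spectrum is set by the model rather than by a one-to-one comparison with every reference spectrum.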
Purency applied this approach to build a robust data analysis solution for microplastics analysis based on microFTIR images: the Microplastics Finder. The current version of the Microplastics Finder (R2021a) is based on a unique, expert-curated training dataset of more than 12,000 reference spectra, about half of them polymer spectra and half matrix spectra. The training data come from real-life samples and therefore also include “imperfect” spectra, for example spectra with (partial) total absorption and spectra from environmental matrices such as wastewater, sediment and sewage sludge. As a result, the solution is robust to these challenging effects.
Analysing a new, unknown spectrum and deciding whether it is microplastic takes a fraction of a second. For a spectroscopic image of typical size (approx. 5–10 GB), the complete results are available within a few minutes. The Microplastics Finder is also broadly applicable: more than 20 polymer types can be analysed across a wide range of environmental matrices.