Semi-automatic scheme for predicting solubility shows potential to accelerate early-stage exploration in material development

A team of researchers from Sony Group Corporation and ICReDD has developed a semi-automatic system for predicting the solubility of porphyrin derivatives in a range of solvents. Porphyrin derivatives are indispensable in applications such as photothermal and photodynamic therapy, dye-sensitized solar cells, and optoelectronic materials, and much research has gone into improving and optimizing their properties. However, porphyrin derivatives sometimes form aggregates or become insoluble, so predicting their solubility is essential for selecting appropriate solvents.

To predict the solubility of porphyrin derivatives, the research team developed a semi-automatic scheme for early-stage material search consisting of four steps: 1) defining a practical chemical search space; 2) prioritizing the molecules in that space with an extended algorithm for submodular function maximization, which requires neither biased variable selection nor pre-existing data; 3) synthesizing the porphyrin derivatives and automatically measuring their UV–Vis absorption spectra; and 4) estimating machine-learning models from four key characteristics of each spectrum.
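Step 2 is the least familiar part of the scheme. The article specifies neither the extended algorithm nor the submodular function it maximizes, so the sketch below uses a common stand-in: the standard greedy algorithm applied to a facility-location function, in which every molecule is scored by its best Tanimoto similarity to any already-selected molecule. The fingerprint sets and the functions `tanimoto`, `facility_location`, and `greedy_priority` are hypothetical. Because facility location is monotone and submodular when similarities are non-negative, adding the molecule with the largest marginal gain at each step yields a priority ranking, and for such functions under a cardinality constraint this greedy procedure is known to reach at least (1 − 1/e) of the optimal value.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def facility_location(selected: Sequence[int], sim: List[List[float]]) -> float:
    """f(S) = sum over every molecule i of its best similarity to a member of S.

    Monotone and submodular when all similarities are non-negative; f(empty set) = 0.
    """
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_priority(fps: Sequence[FrozenSet[int]], k: int) -> List[int]:
    """Rank up to k molecules by repeatedly adding the one with the largest marginal gain."""
    n = len(fps)
    sim = [[tanimoto(fps[i], fps[j]) for j in range(n)] for i in range(n)]
    order: List[int] = []
    current = 0.0  # value of f(order) so far
    for _ in range(min(k, n)):
        best, best_gain = -1, -1.0
        for cand in range(n):
            if cand in order:
                continue
            gain = facility_location(order + [cand], sim) - current
            if gain > best_gain:  # exact ties keep the lower index
                best, best_gain = cand, gain
        order.append(best)
        current += best_gain
    return order


if __name__ == "__main__":
    # Hypothetical fingerprints: molecules 0, 1, 4 form one cluster; 2 and 3 another.
    fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9}),
           frozenset({7, 8}), frozenset({1, 2, 3, 4})]
    print(greedy_priority(fps, 3))
```

On these toy fingerprints the greedy order starts with molecule 4, the most central member of the larger cluster, and then takes the two molecules of the second cluster before returning to near-duplicates of molecule 4. This preference for unexplored regions is what lets a short prefix of the ranking represent many similar molecules.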

By evaluating molecules in the order selected by submodular function maximization in step 2, the team covered a relatively large number of similar molecules (32% of all target molecules) while evaluating only 10 molecules (0.13% of all target molecules). This coverage rate is higher than those of conventional methods (random sampling: ~7%; uncertainty sampling: ~4%). The binary classification model obtained with this scheme predicted good solvents for porphyrin derivatives with more than 80% accuracy. The newly developed method is expected to accelerate the early stages of material exploration.
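The coverage figures can be made concrete with a small calculation. The article does not define when a molecule counts as covered, so the sketch below assumes, hypothetically, that a target molecule is covered if its Tanimoto similarity to at least one evaluated molecule reaches a threshold (0.7 here, an arbitrary choice). The reported numbers also fix the scale of the search: if 10 molecules are 0.13% of the targets, the space holds roughly 10 / 0.0013 ≈ 7,700 molecules, so 32% coverage corresponds to roughly 2,500 molecules. The functions `coverage_rate` and `implied_library_size` are illustrative names, not part of the original work.

```python
from typing import FrozenSet, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def coverage_rate(evaluated: Sequence[int], fps: Sequence[FrozenSet[int]],
                  threshold: float = 0.7) -> float:
    """Fraction of all molecules within `threshold` similarity of an evaluated molecule.

    Hypothetical definition of coverage; an evaluated molecule always covers itself.
    """
    covered = sum(
        1 for fp in fps
        if any(tanimoto(fp, fps[j]) >= threshold for j in evaluated)
    )
    return covered / len(fps)


def implied_library_size(n_evaluated: int, fraction_evaluated: float) -> float:
    """Total number of targets implied by 'n molecules = this fraction of all targets'."""
    return n_evaluated / fraction_evaluated


if __name__ == "__main__":
    fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9}),
           frozenset({7, 8}), frozenset({1, 2, 3, 4})]
    print(coverage_rate([4], fps))     # 0.6 (3 of 5 molecules covered)
    print(coverage_rate([4, 2], fps))  # 0.8 (4 of 5 molecules covered)
    print(round(implied_library_size(10, 0.0013)))  # 7692
```

Because 0.13% is itself a rounded figure, the implied library size is only an estimate; values between 0.125% and 0.135% would put it between about 7,400 and 8,000 molecules.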

This work was collaborative research between Sony Group Corporation and Hokkaido University and was partly supported by JSPS KAKENHI (JP21H01924) and JST-ERATO (JPMJER1903).

Conceptual diagram of the semi-automatic scheme.
STEP 1: The priority ranking of the INPUT molecule set is constructed using the submodular function maximization algorithm.
STEP 2: The highest-priority molecules are prepared or synthesized, and a solution of each is made for measurement in the next step.
STEP 3: The system performs UV–Vis absorption spectroscopy to observe the spectrum.
STEP 4: The system computes four representative indicators from each spectrum and, from the accumulated data, estimates both qualitative and quantitative prediction models.
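The qualitative model in STEP 4 is a binary classifier that labels a solvent as good or poor for a given porphyrin derivative. The article names neither the four spectral indicators nor the model type, so the sketch below uses plain logistic regression trained by batch gradient descent as a stand-in; the feature values, labels, and the functions `fit_logistic`, `predict`, and `accuracy` are hypothetical placeholders. Only the qualitative model is shown; the quantitative model would instead be a regression on the same indicators.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Minimize mean log-loss by batch gradient descent; returns [bias, w1, ..., wd]."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """1 = predicted good solvent, 0 = predicted poor solvent (hypothetical labels)."""
    return int(sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= 0.5)


def accuracy(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Placeholder data: four made-up spectral indicators per sample.
    X = [[0.9, 0.1, 0.2, 0.8], [0.8, 0.2, 0.1, 0.9], [0.85, 0.15, 0.25, 0.7],
         [0.1, 0.9, 0.8, 0.2], [0.2, 0.8, 0.9, 0.1], [0.15, 0.85, 0.7, 0.3]]
    y = [1, 1, 1, 0, 0, 0]
    w = fit_logistic(X, y)
    print(accuracy(w, X, y))
```

Training accuracy on a toy set says nothing about real performance; an accuracy figure such as the one reported for this scheme should come from held-out data or cross-validation.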