About the Research
Theory and applications of multi-armed bandits
My research is in machine learning, in particular the theory and applications of multi-armed bandit algorithms, which are a kind of reinforcement learning.
Broadly speaking, machine learning can be divided into supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the task is to construct an algorithm that returns the correct output for a given input, based on annotated training data. In unsupervised learning, the task is to extract patterns from data that has no annotation.
In reinforcement learning, on the other hand, the task is to learn by trial and error. In a given “environment”, the program selects an “action”; as a result of that action, a “reward” is given and the “environment” changes. In this setting, the goal is either to maximize the cumulative sum of rewards or to learn the best behavior. A famous application of reinforcement learning to game AI is AlphaGo, which beat a world champion of Go.
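As a rough illustration of this trial-and-error loop in the bandit setting, the sketch below runs UCB1, a standard textbook bandit algorithm, on a simulated problem with Bernoulli rewards. Unlike the general case, the environment here has no state that changes between actions. The function name `ucb1`, the arm means, and the horizon are assumptions chosen for illustration; this is not one of the algorithms developed in my research.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit.

    arm_means: hypothetical success probability of each arm (unknown to the learner).
    Returns (number of pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # how many times each arm was pulled
    sums = [0.0] * n_arms   # summed reward observed for each arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Choose the arm with the largest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # The "environment" returns a 0/1 reward for the selected action.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


if __name__ == "__main__":
    pulls, reward = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", pulls)
    print("total reward:", reward)
```

Over time the pulls concentrate on the arm with the highest mean, while the other arms are still tried occasionally so that their estimates stay reliable.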
Using the multi-armed bandit framework, I am developing algorithms such as a method to reduce the cost and time required for drug screening experiments and a method to accelerate cancer diagnosis by microscopy.
Representative Research Achievements
- A bad arm existence checking problem: How to utilize asymmetric problem structure?
Tabata, K., Nakamura, A., Honda, J., Komatsuzaki, T., Mach Learn, 2020, 109, 327–372
- Raman spectroscopic histology using machine learning for nonalcoholic fatty liver disease
Helal KM, Taylor JN, Cahyadi H, Okajima A, Tabata K, Itoh Y, Tanaka H, Fujita K, Harada Y, Komatsuzaki T, FEBS Lett., 2019, 593, 2535-2544
- Feature selection as Monte-Carlo Search in Growing Single Rooted Directed Acyclic Graph by Best Leaf Identification
A. Pélissier, A. Nakamura, K. Tabata, SDM, 2019
- An Efficient Approximate Algorithm for the 1-Median Problem on a Graph
K. Tabata, A. Nakamura, M. Kudo, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D, 994–1002
- An Algorithm for Influence Maximization in a Two-Terminal Series Parallel Graph and its Application to a Real Network
K. Tabata, A. Nakamura, M. Kudo, DS, 2015, 275–283