Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN)

Adopted Project [Details]

EX24301 Enhanced Component-Wise Natural Gradient Descent Training Method for Deep Neural Networks
Project Representative: TRAN VAN SANG (Graduate School of Information Science and Technology, The University of Tokyo)
Abstract

Deep neural networks have become integral to advances in numerous computational applications, such as image recognition, natural language processing, and autonomous driving. As networks grow in complexity, training them, i.e., the process by which the network learns to make predictions or decisions from data, becomes increasingly difficult. This leads to issues such as slow convergence during training and poor predictive performance of the resulting model. These limitations largely stem from the fact that researchers currently have few options for network training, restricting their ability to address and overcome these challenges effectively. In this study, we aim to expand the range of training options by introducing a new training algorithm. The algorithm increases the likelihood of resolving existing training issues and gives researchers a more robust toolkit from which to select the methods best suited to their specific problems.


Central to our approach is a refinement of the second-order training method Natural Gradient Descent, which traditionally requires prohibitive computational resources. We make second-order training feasible on deep neural networks by approximating the curvature matrix it relies on, the Fisher Information Matrix. By focusing on the significant elements and ignoring minor interactions between distant network parameters, we approximate the Fisher Information Matrix as a block-diagonal matrix. This significantly reduces computational overhead while preserving essential curvature information. We name our method Component-Wise Natural Gradient Descent (CW-NGD).
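To make the block-diagonal idea concrete, the sketch below shows one way a per-component natural-gradient step could look in NumPy. It builds an empirical Fisher estimate from per-example gradients for each parameter block and applies a damped solve; the function name, the damping scheme, and the grouping of parameters into blocks are illustrative assumptions, not the exact CW-NGD implementation.

    import numpy as np

    def block_diagonal_ngd_step(params, grads, per_sample_grads, lr=0.01, damping=1e-3):
        """Hypothetical component-wise natural-gradient step.

        params, grads: lists of 1-D arrays, one entry per parameter block (e.g. per layer).
        per_sample_grads: list of (batch_size, block_dim) arrays of per-example gradients.
        """
        updated = []
        for theta, g, G in zip(params, grads, per_sample_grads):
            # Empirical Fisher block for this component only; interactions with
            # parameters in other blocks are ignored (block-diagonal approximation).
            fisher_block = G.T @ G / G.shape[0]
            # Damping keeps the block invertible and the solve well-conditioned.
            precond_grad = np.linalg.solve(fisher_block + damping * np.eye(theta.size), g)
            updated.append(theta - lr * precond_grad)
        return updated

Solving one small system per block costs roughly the cube of each block's dimension, rather than the cube of the full parameter count, which is what makes a second-order update of this kind tractable on deep networks.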

Historically, advances in hardware, exemplified by Moore's law, have steadily increased transistor counts on microchips, but CPU clock speeds have plateaued, necessitating the shift to multicore architectures and distributed computing. A common issue in distributed training is the need to increase the batch size, i.e., to process more data simultaneously. A larger batch size can hurt training efficiency by reducing the frequency of model updates and potentially degrading the model's ability to generalize from the training data. Our method addresses this challenge with a mechanism named Sub-Batch update, which can improve the accuracy of the trained network while enabling isolated calculations suitable for parallel processing; a sketch of one plausible reading follows.
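The abstract does not spell out the Sub-Batch mechanism, so the following is only a minimal sketch of one plausible interpretation: the large batch is split into sub-batches whose gradients are computed in isolation, for example on separate workers, and the model is updated per sub-batch rather than once per full batch. The helpers grad_fn and step_fn are hypothetical placeholders, not names from the dissertation.

    import numpy as np

    def sub_batch_update(params, batch_x, batch_y, grad_fn, step_fn, num_sub_batches=4):
        """Hypothetical sub-batch update loop (illustrative, not the exact CW-NGD scheme).

        grad_fn(params, x, y) -> per-block gradients for one sub-batch (isolated work
        that could run on a separate worker); step_fn(params, grads) -> updated params,
        e.g. the block-diagonal natural-gradient step sketched above.
        """
        x_splits = np.array_split(batch_x, num_sub_batches)
        y_splits = np.array_split(batch_y, num_sub_batches)
        for x_sub, y_sub in zip(x_splits, y_splits):
            grads = grad_fn(params, x_sub, y_sub)  # independent per-sub-batch statistics
            params = step_fn(params, grads)        # more frequent updates per large batch
        return params

Under this reading, each sub-batch's computation is self-contained, so the slices can be processed in parallel while the model still receives several updates per large batch instead of one.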

We implemented the proposed method and conducted a thorough hyperparameter search to identify optimal settings, making the method user-friendly and practical. We validated the efficacy of the Sub-Batch update mechanism and confirmed that it is effective exclusively in combination with CW-NGD.

To demonstrate the method's effectiveness, we compared it against the established methods Adam and SGD on eight datasets with various models, including the large-scale ImageNet dataset. On the small-scale datasets, CW-NGD achieved higher testing accuracy and converged faster and more stably than Adam and SGD. On the medium-scale datasets with a small batch size, CW-NGD achieved the highest testing accuracy on all five datasets and converged as fast as Adam. On the same medium-scale datasets plus the large-scale ImageNet dataset, CW-NGD achieved higher testing accuracy than all the other methods on all six datasets while converging faster than SGD and slightly slower than Adam.

This dissertation is structured to first lay down the mathematical groundwork and definitions needed to understand the new training algorithm, followed by a detailed presentation of the methodology and its implementation. The evaluation chapter substantiates the theoretical advantages of CW-NGD with empirical evidence, showcasing its effectiveness in real-world scenarios. The final chapters summarize this work's contributions and propose directions for future research aimed at further refining and expanding CW-NGD's capabilities in the high-performance computing landscape.


Reports: Research Introduction Poster / Final Report