Toran Billups - Fine-tuning language models with Axon
Leave comments at: https://elixirforum.com/t/elixirconf-2023-toran-billups-fine-tuning-language-models-with-axon/59071

Access to large language models has revolutionized natural language processing, but to take better advantage of this capability, you need to fine-tune them for distinct tasks and with industry-specific data. This process seemed like magic until recently, when I got my hands dirty solving a classification problem with `RoBERTa`. In this session, we will explore the key aspects of fine-tuning with Axon and Bumblebee, from data engineering and model selection to optimization techniques and evaluation strategies. By the end of the session, attendees will have a firm understanding of the nuance involved in fine-tuning language models and the confidence to apply these techniques effectively with Elixir.

Highlights:

- __Connecting the Dots with Deep Learning:__ Attendees without prior knowledge of deep learning will get an introduction to help them comprehend the training techniques discussed throughout the session.
- __Classification is the "Hello World" of Fine-Tuning:__ We will delve into the fundamentals of fine-tuning and discuss why classification was an ideal starting point for exploring the potential of large language models.
- __Training Datasets:__ We will address the complexities involved in data engineering and what to be aware of when you start pulling together real training data.
- __Accelerated Training with NVIDIA GPUs:__ The faster you train your models, the more quickly you can learn and adapt. We will explore the advantages of NVIDIA GPUs and how I got started with Pop!_OS. We will also cover out-of-memory errors and pitfalls I ran into while training on a GPU with limited VRAM.
- __Model Selection:__ Attendees will learn how to leverage different models from the Hugging Face library by modifying specifications, allowing for flexibility and customization with help from Bumblebee.
- __Optimization:__ Understanding the impact of batch size and sequence length on model performance is crucial. We will delve into the intricacies of these parameters and provide optimization strategies specific to NLP.
- __Trainer Accuracy:__ While trainer accuracy is a useful metric, it is important to approach it with skepticism and manually verify results. Attendees will gain insights into effective methods for validating model performance.
- __Comparative Analysis:__ We will guide attendees through the process of training multiple models such as BERT and `RoBERTa` and demonstrate the importance of comparing and contrasting results to determine the most suitable model for a given dataset.
- __Quantity vs. Quality:__ More training data is not always better, it turns out. Attendees will gain an understanding of the tradeoffs involved in using larger training datasets and the risks of overfitting.
- __Evaluation:__ We will conclude the session by addressing common challenges you face throughout the training process and how you might perform static analysis on your fine-tuned models to be sure the network is adequately trained.