SMU Data Science Review

Abstract

This study explores the use of Retrieval Augmented Fine-Tuning (RAFT) to improve the performance of Large Language Models (LLMs) on domain-specific Retrieval Augmented Generation (RAG) tasks. By integrating domain-specific information during retrieval, RAG aims to reduce hallucination and improve the accuracy of LLM outputs. We investigate RAFT, an approach that enhances LLMs by incorporating domain-specific knowledge and training them to handle distractor documents. This paper validates previous work, which found that RAFT can considerably improve the performance of Llama2-7B in specific domains. We also extend that work to new state-of-the-art open-source models and additional datasets, with mixed results. After fine-tuning three models (Llama2-7B, Llama3-8B, and Mistral-7B-v0.3) with RAFT and evaluating them against their instruction-tuned counterparts, our results suggest that RAFT improves accuracy on domain-specific data only for older LLMs; this effect was not found in the latest generation of open-source LLMs.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License.
