[Large Language Models] LLM & RAG
- Minwu Kim
- July 1, 2024
- 4 min read
Last modified: August 5, 2024
[Continuously updated - last update: 240909]
I'm doing LLM research too, the field everyone is piling into like a stampede these days. I'm one of the herd. I figured it would be good to organize the papers I've read, so I'm leaving notes here.
Organizing and recording what you've learned turns out to be harder than it sounds. When doing research, you absorb knowledge scattershot, hopping from one paper to another, and there are plenty of parts you skip over without fully understanding. That makes it hard to connect and structure what you've learned. Of course, with enough time it's doable, but the efficiency just isn't there. And above all, I'm simply a bit lazy.
So I only leave a short note for each paper.
The ones in bold are papers whose ideas struck me as clever.
Corrective Retrieval Augmented Generation
Core idea: Given a user query, an evaluator decides whether there is relevant context in the database. Do retrieval if yes and do web search otherwise
The evaluator is fine-tuned with LLM-generated QA pairs. We mark the document with the highest cosine similarity as relevant and the one with the lowest as irrelevant.
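A minimal sketch of that routing logic, assuming a hypothetical scoring evaluator and placeholder thresholds (none of the names below are from the paper):

```python
# Hypothetical sketch of CRAG-style routing: an evaluator scores the retrieved documents
# and the system uses them, falls back to web search, or combines both.
def corrective_answer(query, retriever, evaluator, web_search, llm,
                      upper=0.7, lower=0.3):          # thresholds made up for illustration
    docs = retriever(query)
    score = max(evaluator(query, d) for d in docs)    # best relevance score among candidates
    if score >= upper:                                # confident: the database has what we need
        context = docs
    elif score <= lower:                              # no relevant context: search the web instead
        context = web_search(query)
    else:                                             # ambiguous: combine both sources
        context = docs + web_search(query)
    return llm(query, context)
```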
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Core idea: Instead of asking for direct answers, let the LLM conduct reasoning.
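For illustration, the difference is roughly the following (the exemplar wording is my own paraphrase of the few-shot style used in the paper):

```python
# Direct prompting vs. chain-of-thought prompting (the exemplar text is illustrative).
direct_prompt = (
    "Q: A shop sells 3 pens for $2. How much do 12 pens cost?\n"
    "A:"
)

# The few-shot demonstration includes intermediate reasoning, so the model imitates
# the reasoning pattern before giving the final answer to the new question.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: A shop sells 3 pens for $2. How much do 12 pens cost?\n"
    "A:"
)
```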
Chain-of-verification Reduces Hallucination in Large Language Models
Core idea: Prevent Hallucination through fact-checks
Implement chain-of-thought prompting
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Core idea: All the processes (retrieval check, relevance check, etc.) are done by a single generator. This is computationally efficient and intuitively makes sense given its autoregressive nature.
Fine-tune the generator such that it produces reflective tokens for every unit (sentence in the case of the paper)
Fine-tune the critique network for training data generation only. It is never used in the inference process.
However, instability is an issue, as reflective tokens should be generated properly every single time. Otherwise, string parsing collapses.
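A toy example of that fragility; the token names only loosely mimic the paper's reflection tokens, and the output format here is my assumption:

```python
import re

# A Self-RAG-style generation interleaves reflection tokens with the answer text;
# the control loop depends on every segment carrying well-formed tokens.
output = (
    "[Retrieve=Yes]<p>...retrieved passage...</p>"
    "[IsRel=Relevant] The answer is X. [IsSup=Fully supported][IsUse=5]"
)

match = re.search(r"\[IsRel=(\w+)\]", output)
if match is None:
    # If the generator ever drops or misspells a reflection token, this parsing
    # step fails and the whole retrieve/critique pipeline breaks down.
    raise ValueError("missing relevance token")
print(match.group(1))  # -> "Relevant"
```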
MemGPT: Towards LLMs as Operating Systems
Core idea: Dynamic control of the context window (like virtual memory) rather than naive FIFO approach. This allows permanent memory.
Update database based on conversation history, and update the context window based on the user query and the updated database.
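A rough sketch of the paging idea under my own naming (this is not MemGPT's actual API):

```python
# MemGPT-style "virtual context": instead of dropping the oldest messages (naive FIFO),
# evicted messages are persisted to an external store and can be paged back in later.
class VirtualContext:
    def __init__(self, max_messages, archive):
        self.max_messages = max_messages   # size of the "main context" the LLM sees
        self.window = []                   # in-context messages
        self.archive = archive             # external store, e.g. a vector DB (placeholder)

    def add(self, message):
        self.window.append(message)
        if len(self.window) > self.max_messages:
            evicted = self.window.pop(0)   # would be silently forgotten under plain FIFO
            self.archive.insert(evicted)   # here it becomes permanent memory instead

    def recall(self, query, k=3):
        # page memories relevant to the current query back into the context window
        self.window.extend(self.archive.search(query, k))
```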
Making Retrieval-Augmented Language Models Robust to Irrelevant Context
Core idea: Similar to CRAG, it fine-tunes an evaluator that decides whether retrieval is needed.
The NLI part can be ignored.
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
Core idea: RAG for multimodal LLM.
TextGrad: Automatic "Differentiation" via Text
Core idea: Recursively improve the input prompt with backward feedback.
All the improvement is done by the LLM. No parameter tuning or backpropagation is involved here. THERE IS NO DIFFERENTIATION HERE. IT IS JUST "FEEDBACK".
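A caricature of the loop, with `llm` and `score` as placeholder callables (not the library's real API):

```python
# Caricature of the TextGrad loop: the "gradient" is an LLM-written critique and the
# "update" is another LLM call that rewrites the prompt using that critique.
def textgrad_step(prompt, batch, llm, score):
    outputs = [llm(f"{prompt}\n{x}") for x in batch]
    feedback = llm(
        "The prompt below produced these (output, score) pairs. "
        "Explain how the prompt should change to improve the scores.\n"
        f"Prompt: {prompt}\nResults: {[(o, score(o)) for o in outputs]}"
    )
    return llm(
        "Rewrite the prompt according to the feedback.\n"
        f"Prompt: {prompt}\nFeedback: {feedback}"
    )

# A few "epochs" of purely textual optimization; no parameters are updated anywhere.
# for _ in range(5):
#     prompt = textgrad_step(prompt, batch, llm, score)
```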
OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning
Core idea: Train LLM with private data.
Haven't read it properly yet.
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Core idea: Use LLM-annotated data to train a new LLM. It shows better performance than a model of the same parameter size pre-trained on raw text data.
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Core idea: Recursive clustering & summarization -> robust to long documents.
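A minimal sketch of the tree construction, with `embed`, `cluster`, and `summarize` as placeholders (the paper uses GMM-based clustering and an LLM summarizer):

```python
# Sketch of RAPTOR-style tree building: cluster the leaf chunks, summarize each cluster,
# then repeat on the summaries until a single root remains. Nodes from *all* levels are
# indexed, so retrieval can answer at different levels of abstraction.
def build_raptor_tree(chunks, embed, cluster, summarize):
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        nodes = levels[-1]
        groups = cluster([embed(n) for n in nodes])   # lists of node indices; assumed to shrink each round
        summaries = [summarize([nodes[i] for i in group]) for group in groups]
        levels.append(summaries)
    return levels
```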
Differentiable Ranking and Sorting using Optimal Transport
Core idea: Traditional ranking is discrete and simply computes metrics (e.g., precision, recall) on top of the scores, so the ranker can't be optimized end-to-end because ranking is not differentiable. The paper therefore proposes a differentiable ranking/sorting operator based on optimal transport (a rough sketch follows below).
This video is excellent.
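A tiny sketch of the idea: entropy-regularized OT (Sinkhorn) between the scores and fixed target positions yields a soft, differentiable permutation. This is a simplification, not the paper's exact construction:

```python
import numpy as np

def soft_rank(scores, reg=0.1, n_iters=50):
    """Soft ranks via entropy-regularized optimal transport between the scores
    and fixed sorted 'anchor' positions (a simplification of the paper)."""
    n = len(scores)
    anchors = np.arange(n, dtype=float)                # target positions 0..n-1
    cost = (scores[:, None] - anchors[None, :]) ** 2   # transport cost matrix
    K = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iters):                           # Sinkhorn iterations
        v = (np.ones(n) / n) / (K.T @ u)
        u = (np.ones(n) / n) / (K @ v)
    P = u[:, None] * K * v[None, :]                    # doubly-stochastic (soft) permutation
    return n * P @ anchors                             # expected rank of each score

print(soft_rank(np.array([0.1, 2.0, -1.0])))           # roughly [1, 2, 0]
```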
Differentiable Patch Selection for Image Recognition
A use of differentiable top-k (haven't read it yet).
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Store and augment the data in the form of a graph, thereby making RAG robust to global-context questions.
Employ a hierarchical structure to support different levels of globality -> same idea as RAPTOR.
RAFT: Adapting Language Model to Domain Specific RAG
Core idea: Collect (Q, D*, D1, D2, ..., Dk, A*), freeze the retriever, and fine-tune the LLM.
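A sketch of building one training example; the field names and the 0.8 oracle-keep probability are mine, not the paper's exact recipe:

```python
import random

# Sketch of a RAFT-style training example: the question, the oracle document D*, a few
# distractors D1..Dk, and an answer grounded in D*. A fraction of examples intentionally
# drops D* so the model also learns to ignore unhelpful context.
def make_raft_example(question, oracle_doc, distractors, answer, p_keep_oracle=0.8):
    docs = random.sample(distractors, k=min(3, len(distractors)))
    if random.random() < p_keep_oracle:   # the 0.8 is illustrative, not the paper's value
        docs.append(oracle_doc)
    random.shuffle(docs)
    prompt = "\n\n".join(docs) + f"\n\nQuestion: {question}\nAnswer:"
    return {"prompt": prompt, "completion": answer}   # fine-tune the LLM on these; retriever stays frozen
```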
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
Fine-tune a single LLM so that it simultaneously ranks context relevance and generates the answer.
I'd rather do Self-RAG than this.
REPLUG: Retrieval-Augmented Black-Box Language Models
Core idea 1: Inference with an ensembled token probability distribution.
Core idea 2: Use the LM itself as the scoring function (a perplexity-style metric).
Minimize the KL divergence between the retrieval likelihood and the LM likelihood, with 'y' being the ground truth (rough sketch below).
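A rough numeric sketch of both ideas (array shapes and names are mine, and this is a paraphrase of the objective, not the paper's code):

```python
import numpy as np

# Sketch of REPLUG inference: run the LM once per retrieved document, then mix the
# next-token distributions with weights from the softmaxed retrieval scores.
def replug_next_token(lm_probs_per_doc, retrieval_scores):
    # lm_probs_per_doc: (num_docs, vocab_size), retrieval_scores: (num_docs,)
    w = np.exp(retrieval_scores - retrieval_scores.max())
    w /= w.sum()                                    # lambda(d, x)
    return (w[:, None] * lm_probs_per_doc).sum(0)   # ensembled p(y | x)

# Sketch of the LSR training signal: push the retrieval distribution P_R toward Q,
# where Q scores each document by how well the LM predicts the ground truth y with it.
def lsr_kl(retrieval_scores, gt_loglik_per_doc, beta=1.0):
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()
    p_r = softmax(retrieval_scores)
    q = softmax(np.asarray(gt_loglik_per_doc) / beta)
    return float((p_r * (np.log(p_r) - np.log(q))).sum())   # KL(P_R || Q)
```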
Reinforcement Learning for Optimizing RAG for Domain Chatbots
Core idea: Decide whether to retrieve or not with RL (the decision being the policy); a toy sketch of the loop follows the list below.
State (the input to the policy model): previous policy actions and the current query.
Reward: Quality of response graded by ChatGPT
Action/Policy: Retrieve or not
For certain patterns/sequences of queries, we can get a good answer from the bot even without fetching the FAQ context. Examples of such scenarios can be:
1. For a follow-up query, FAQ context need not be retrieved if it has already been fetched for the previous query.
2. For the sequence of queries referring to the same FAQ, a context can be fetched only once at the start.
3. For OOD queries, the LLM prompt itself can guide the bot to generate the answer.
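A toy sketch of one turn of that loop, with every component as a placeholder (not the paper's implementation):

```python
# Toy sketch of one chatbot turn: the policy looks at the query (plus recent actions)
# and decides whether to fetch FAQ context; a grader LLM's score acts as the reward.
def chatbot_turn(query, action_history, policy, retriever, llm, grade):
    state = {"query": query, "recent_actions": action_history[-3:]}
    action = policy(state)                              # "retrieve" or "skip"
    context = retriever(query) if action == "retrieve" else ""
    answer = llm(query, context)
    reward = grade(answer)                              # e.g. a 1-5 quality score from ChatGPT
    action_history.append(action)
    return answer, (state, action, reward)              # transition later used to update the policy
```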
Concern: Why RL, though? RL is for maximizing cumulative rewards, but RAG is a task that should be optimized at every single round, no? This point hasn't been addressed clearly enough.
RRAML: Reinforced Retrieval Augmented Machine Learning
Gives a high-level framework for optimizing a RAG system with RL, but never clearly defines the state, action, and reward.
LLM parameters frozen.
Enhancing Generative Retrieval with Reinforcement Learning from Relevance Feedback
Use RL to optimize generative retriever
Haven't fully understood it yet. To understand it, I first need to study the paper below (on generative retrieval).
Learning to Tokenize for Generative Retrieval (youtube link)
Preliminary: instead of Dense Passage Retrieval (DPR), where similarities have to be computed every time, we can use a Differentiable Search Index (DSI), where the query directly generates the DocID.
Core idea: GenRet - DocID generated during the inference process.
Strength over DPR: Unlike DPR, DSI's performance improves significantly as model size increases.
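A contrast sketch under placeholder names: DPR scores the query against every document embedding, while a DSI/GenRet-style model just decodes the identifier:

```python
# Contrast sketch: DPR scores the query embedding against every document embedding,
# while a DSI/GenRet-style model autoregressively decodes the document identifier.
def dpr_retrieve(query, encode_query, doc_embeddings, doc_ids):
    q = encode_query(query)                  # dense query vector
    scores = doc_embeddings @ q              # similarity against the whole index
    return doc_ids[int(scores.argmax())]

def dsi_retrieve(query, seq2seq_model):
    # the "index" lives inside the model's parameters; generation itself is retrieval
    return seq2seq_model.generate("retrieve: " + query)   # -> a DocID string
```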
Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization
Core Idea 1: Use the straight-through Gumbel-Softmax reparameterization trick.
Core Idea 2: Expected utility maximization (the utility itself is not differentiable -> a weighted sum/expectation over it is differentiable).
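A small PyTorch sketch of the straight-through Gumbel-Softmax selection (the shapes and the downstream "utility" are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Straight-through Gumbel-Softmax: the forward pass commits to a hard one-hot document
# selection, while gradients flow through the soft relaxation in the backward pass.
def st_gumbel_select(retrieval_logits, tau=1.0):
    y_soft = F.gumbel_softmax(retrieval_logits, tau=tau, hard=False)
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    return y_hard + (y_soft - y_soft.detach())   # hard values, soft gradients

logits = torch.randn(5, requires_grad=True)      # scores for 5 candidate documents
selection = st_gumbel_select(logits)
utility = (selection * torch.arange(5.0)).sum()  # stand-in for the downstream utility
utility.backward()                               # gradients reach the retrieval logits
print(logits.grad)
```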