
[Large Language Models] LLM & RAG

  • Author: Minwu Kim
  • July 1, 2024
  • 4 min read

Last updated: August 5, 2024

[Continuously being updated; latest update: 240909]


I'm also doing LLM research, the field people are swarming into like a pack of dogs. I'm one of the pack myself. It seemed worth writing down the papers I've read, so I'm leaving these notes here.


Organizing and recording what I've learned is harder than it looks. When doing research, you absorb knowledge in scattered pieces while hopping from paper to paper, and there are plenty of parts you skip over without really understanding. That makes it hard to connect and structure what you've learned. With enough time you could of course sort it all out, but the efficiency just isn't there. And above all, I'm simply a bit lazy.


So I only leave a short note for each paper.

The ones in bold are papers whose ideas struck me as particularly clever.


  • Corrective Retrieval Augmented Generation

    • Core idea: Given a user query, an evaluator decides whether relevant context exists in the database; retrieve if yes, fall back to web search otherwise (see the sketch below).

    • The evaluator is fine-tuned on LLM-generated QA pairs: the document with the highest cosine similarity to a question is labeled relevant, and the one with the lowest is labeled irrelevant.
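    • A rough Python sketch of this routing. The function names (retriever, evaluator, web_search, llm) and the single threshold are placeholders of mine, not the paper's API:

          def corrective_rag(query, retriever, evaluator, web_search, llm, threshold=0.5):
              docs = retriever(query)                       # candidates from the vector DB
              scores = [evaluator(query, d) for d in docs]  # relevance score per document

              if scores and max(scores) >= threshold:       # relevant context exists: keep it
                  context = [d for d, s in zip(docs, scores) if s >= threshold]
              else:                                         # nothing relevant: go to the web
                  context = web_search(query)

              return llm(query=query, context=context)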




  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

    • Core idea: All of the steps (retrieval check, relevance check, etc.) are done by a single generator. This is computationally efficient, and intuitively it makes sense thanks to the model's autoregressive nature.

    • Fine-tune the generator so that it produces reflective tokens for every unit (a sentence, in the case of the paper).

    • Fine-tune the critic model only to generate training data; it is never used during inference.

    • However, instability is an issue: the reflective tokens have to come out well-formed every single time, otherwise the string parsing breaks (see the sketch below).
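    • A small illustration of that parsing step in Python. The token names loosely follow the paper's reflection tokens, but the regex and function are my own sketch, not the released code:

          import re

          REFLECTIVE = re.compile(r"\[(Retrieval|No Retrieval|Relevant|Irrelevant|Utility:[1-5])\]")

          def parse_segment(segment: str):
              tokens = REFLECTIVE.findall(segment)
              if not tokens:
                  # the instability: if the generator emits a malformed or missing token,
                  # there is nothing for the downstream logic to branch on
                  raise ValueError(f"no reflective token found in: {segment!r}")
              text = REFLECTIVE.sub("", segment).strip()
              return tokens, text

          tokens, text = parse_segment("[Retrieval] [Relevant] Paris is the capital of France. [Utility:5]")
          print(tokens, "|", text)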


  • MemGPT: Towards LLMs as Operating Systems

    • Core idea: Dynamic control of the context window (like virtual memory) rather than a naive FIFO approach. This enables persistent memory (toy sketch below).

    • Update the database based on the conversation history, and update the context window based on the user query and the updated database.
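    • A toy Python sketch of the paging idea. The class and its methods are my own construction to illustrate the mechanism, not MemGPT's actual interface:

          from collections import deque

          class VirtualContext:
              def __init__(self, max_tokens=2048, count=lambda m: len(m.split())):
                  self.max_tokens = max_tokens
                  self.count = count
                  self.main = deque()        # what actually goes into the LLM prompt
                  self.archive = []          # "disk": full history / external database

              def add(self, message):
                  self.main.append(message)
                  while sum(self.count(m) for m in self.main) > self.max_tokens:
                      self.archive.append(self.main.popleft())   # page out instead of dropping

              def build_prompt(self, query, search, k=3):
                  recalled = search(query, self.archive, k)      # page relevant entries back in
                  return list(recalled) + list(self.main) + [query]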












  • REPLUG: Retrieval-Augmented Black-Box Language Models

    • Core idea 1: Inference with an ensembled next-token probability distribution, averaged over the retrieved documents (sketched below)

    • Core idea 2: Using the LM itself as the scoring function (a perplexity-style signal) to train the retriever

      • Minimize the KL divergence between the retrieval likelihood and the LM likelihood, with y being the ground-truth continuation
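    • A small numpy sketch of the ensemble (the names and notation are mine). Each retrieved document is prepended to the query, the LM's next-token distributions are computed separately, and they are averaged with weights from a softmax over the retrieval scores; the KL objective above then trains only the retriever while the black-box LM stays frozen:

          import numpy as np

          def replug_next_token(lm_probs, retrieval_scores, temperature=1.0):
              """lm_probs: (k, vocab) next-token distributions, one per retrieved doc.
              retrieval_scores: (k,) similarity scores of those docs to the query."""
              weights = np.exp(np.array(retrieval_scores) / temperature)
              weights /= weights.sum()                      # lambda(d, x)
              return (weights[:, None] * np.asarray(lm_probs)).sum(axis=0)

          probs = replug_next_token(lm_probs=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
                                    retrieval_scores=[2.0, 1.0])
          print(probs, probs.sum())   # a valid distribution over a toy 3-token vocab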


  • Reinforcement Learning for Optimizing RAG for Domain Chatbots

    • Core idea: Decide whether to retrieve or not with RL, the retrieval decision being the policy's action (toy sketch at the end of this note)

      • State: the input to the policy model, i.e. the previous policy actions plus the current query.

      • Reward: Quality of response graded by ChatGPT

      • Action/Policy: Retrieve or not

    • For certain patterns/sequences of queries, the bot can give a good answer even without fetching the FAQ context. Examples of such scenarios:

      • 1. For a follow-up query, the FAQ context need not be retrieved if it was already fetched for the previous query.

      • 2. For a sequence of queries referring to the same FAQ, the context can be fetched just once at the start.

      • 3. For OOD queries, the LLM prompt itself can guide the bot to generate the answer.

    • Concern: Why RL, necessarily? RL is about maximizing cumulative reward, but isn't RAG a task that should be optimized for every single round on its own? This point hasn't been addressed clearly enough.
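    • A toy Python sketch of the decision loop (all names are mine, and a crude bandit-style preference update stands in for the paper's actual RL algorithm):

          import math, random

          class RetrievalPolicy:
              def __init__(self, lr=0.1):
                  self.pref = {}                  # state -> preference for "retrieve"
                  self.lr = lr

              def state(self, query, prev_action):
                  # crude features: does the query look like a follow-up, and did we retrieve last turn?
                  return (query.strip().lower().startswith(("and", "what about")),
                          prev_action == "retrieve")

              def act(self, s):
                  p = 1.0 / (1.0 + math.exp(-self.pref.get(s, 0.0)))
                  return "retrieve" if random.random() < p else "skip"

              def update(self, s, action, reward):
                  # reward = external grade of the final answer (ChatGPT in the paper)
                  sign = 1.0 if action == "retrieve" else -1.0
                  self.pref[s] = self.pref.get(s, 0.0) + self.lr * reward * sign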




  • Learning to Tokenize for Generative Retrieval (youtube link)

    • Preliminary: instead of Dense Passage Retrieval (DPR), where relevance comes from learned embedding similarities searched over the whole index at query time, we can use a Differentiable Search Index (DSI), where the model directly generates the DocID from the query (contrast sketched below).

    • Core idea: GenRet - DocID generated during the inference process.

    • Strength over DPR: Unlike DPR, DSI's performance improves significantly as model size increases.
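    • A contrast sketch in Python (the interfaces are my own pseudo-code, not the paper's): DPR scores the query against every document embedding, while generative retrieval decodes a DocID token by token, restricted to identifiers that actually exist:

          def dpr_retrieve(query, encode, doc_embeddings):
              q = encode(query)
              scores = {doc_id: sum(a * b for a, b in zip(q, emb))   # dot-product similarity
                        for doc_id, emb in doc_embeddings.items()}
              return max(scores, key=scores.get)

          def generative_retrieve(query, next_token, valid_doc_ids, max_len=8):
              # next_token(query, prefix, allowed) is assumed to return the model's most
              # likely continuation among the allowed characters (constrained decoding)
              doc_id = ""
              for _ in range(max_len):
                  if doc_id in valid_doc_ids:
                      return doc_id
                  allowed = {i[len(doc_id)] for i in valid_doc_ids
                             if i.startswith(doc_id) and len(i) > len(doc_id)}
                  doc_id += next_token(query, prefix=doc_id, allowed=allowed)
              return doc_id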






 
 
 
