
[Large Language Models] LLM & RAG

  • Author: Minwu Kim
  • July 1, 2024
  • 4 min read

Last updated: August 5, 2024

[Continuously being updated; latest update: 240909]


I'm also doing LLM research, the field people are swarming into like a pack of dogs. I'm one of the pack myself. It seemed worth writing down the papers I've read, so I'm leaving these notes here.


Organizing and recording what I've learned is harder than it looks. When doing research, you absorb knowledge in scattered pieces while hopping from paper to paper, and there are plenty of parts you skip over without really understanding. That makes it hard to connect and structure what you've learned. With enough time you could of course sort it all out, but the efficiency just isn't there. And above all, I'm simply a bit lazy.


So I only leave a short note for each paper.

The ones in bold are papers whose ideas struck me as particularly clever.


  • Corrective Retrieval Augmented Generation

    • Core idea: Given a user query, an evaluator decides whether relevant context exists in the database; retrieve if yes, fall back to web search otherwise (see the sketch below).

    • The evaluator is fine-tuned on LLM-generated QA pairs: the document with the highest cosine similarity to a question is labeled relevant, and the one with the lowest is labeled irrelevant.
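    • A rough Python sketch of this routing. The function names (retriever, evaluator, web_search, llm) and the single threshold are placeholders of mine, not the paper's API:

          def corrective_rag(query, retriever, evaluator, web_search, llm, threshold=0.5):
              docs = retriever(query)                       # candidates from the vector DB
              scores = [evaluator(query, d) for d in docs]  # relevance score per document

              if scores and max(scores) >= threshold:       # relevant context exists: keep it
                  context = [d for d, s in zip(docs, scores) if s >= threshold]
              else:                                         # nothing relevant: go to the web
                  context = web_search(query)

              return llm(query=query, context=context)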




  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

    • Core idea: All of the steps (retrieval check, relevance check, etc.) are done by a single generator. This is computationally efficient, and intuitively it makes sense thanks to the model's autoregressive nature.

    • Fine-tune the generator so that it produces reflective tokens for every unit (a sentence, in the case of the paper).

    • Fine-tune the critic model only to generate training data; it is never used during inference.

    • However, instability is an issue: the reflective tokens have to come out well-formed every single time, otherwise the string parsing breaks (see the sketch below).
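    • A small illustration of that parsing step in Python. The token names loosely follow the paper's reflection tokens, but the regex and function are my own sketch, not the released code:

          import re

          REFLECTIVE = re.compile(r"\[(Retrieval|No Retrieval|Relevant|Irrelevant|Utility:[1-5])\]")

          def parse_segment(segment: str):
              tokens = REFLECTIVE.findall(segment)
              if not tokens:
                  # the instability: if the generator emits a malformed or missing token,
                  # there is nothing for the downstream logic to branch on
                  raise ValueError(f"no reflective token found in: {segment!r}")
              text = REFLECTIVE.sub("", segment).strip()
              return tokens, text

          tokens, text = parse_segment("[Retrieval] [Relevant] Paris is the capital of France. [Utility:5]")
          print(tokens, "|", text)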


  • MemGPT: Towards LLMs as Operating Systems

    • Core idea: Dynamic control of the context window (like virtual memory) rather than a naive FIFO approach. This enables persistent memory (toy sketch below).

    • Update the database based on the conversation history, and update the context window based on the user query and the updated database.
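    • A toy Python sketch of the paging idea. The class and its methods are my own construction to illustrate the mechanism, not MemGPT's actual interface:

          from collections import deque

          class VirtualContext:
              def __init__(self, max_tokens=2048, count=lambda m: len(m.split())):
                  self.max_tokens = max_tokens
                  self.count = count
                  self.main = deque()        # what actually goes into the LLM prompt
                  self.archive = []          # "disk": full history / external database

              def add(self, message):
                  self.main.append(message)
                  while sum(self.count(m) for m in self.main) > self.max_tokens:
                      self.archive.append(self.main.popleft())   # page out instead of dropping

              def build_prompt(self, query, search, k=3):
                  recalled = search(query, self.archive, k)      # page relevant entries back in
                  return list(recalled) + list(self.main) + [query]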












  • REPLUG: Retrieval-Augmented Black-Box Language Models

    • Core idea 1: Inference with an ensembled next-token probability distribution, averaged over the retrieved documents (sketched below)

    • Core idea 2: Using the LM itself as the scoring function (a perplexity-style signal) to train the retriever

      • Minimize the KL divergence between the retrieval likelihood and the LM likelihood, with y being the ground-truth continuation
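    • A small numpy sketch of the ensemble (the names and notation are mine). Each retrieved document is prepended to the query, the LM's next-token distributions are computed separately, and they are averaged with weights from a softmax over the retrieval scores; the KL objective above then trains only the retriever while the black-box LM stays frozen:

          import numpy as np

          def replug_next_token(lm_probs, retrieval_scores, temperature=1.0):
              """lm_probs: (k, vocab) next-token distributions, one per retrieved doc.
              retrieval_scores: (k,) similarity scores of those docs to the query."""
              weights = np.exp(np.array(retrieval_scores) / temperature)
              weights /= weights.sum()                      # lambda(d, x)
              return (weights[:, None] * np.asarray(lm_probs)).sum(axis=0)

          probs = replug_next_token(lm_probs=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
                                    retrieval_scores=[2.0, 1.0])
          print(probs, probs.sum())   # a valid distribution over a toy 3-token vocab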


  • Reinforcement Learning for Optimizing RAG for Domain Chatbots

    • Core idea: Decide whether to retrieve or not with RL, the retrieval decision being the policy's action (toy sketch at the end of this note)

      • State: the input to the policy model, i.e. the previous policy actions plus the current query.

      • Reward: Quality of response graded by ChatGPT

      • Action/Policy: Retrieve or not

    • For certain patterns/sequences of queries, the bot can give a good answer even without fetching the FAQ context. Examples of such scenarios:

      • 1. For a follow-up query, the FAQ context need not be retrieved if it was already fetched for the previous query.

      • 2. For a sequence of queries referring to the same FAQ, the context can be fetched just once at the start.

      • 3. For OOD queries, the LLM prompt itself can guide the bot to generate the answer.

    • Concern: Why RL, necessarily? RL is about maximizing cumulative reward, but isn't RAG a task that should be optimized for every single round on its own? This point hasn't been addressed clearly enough.
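    • A toy Python sketch of the decision loop (all names are mine, and a crude bandit-style preference update stands in for the paper's actual RL algorithm):

          import math, random

          class RetrievalPolicy:
              def __init__(self, lr=0.1):
                  self.pref = {}                  # state -> preference for "retrieve"
                  self.lr = lr

              def state(self, query, prev_action):
                  # crude features: does the query look like a follow-up, and did we retrieve last turn?
                  return (query.strip().lower().startswith(("and", "what about")),
                          prev_action == "retrieve")

              def act(self, s):
                  p = 1.0 / (1.0 + math.exp(-self.pref.get(s, 0.0)))
                  return "retrieve" if random.random() < p else "skip"

              def update(self, s, action, reward):
                  # reward = external grade of the final answer (ChatGPT in the paper)
                  sign = 1.0 if action == "retrieve" else -1.0
                  self.pref[s] = self.pref.get(s, 0.0) + self.lr * reward * sign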




  • Learning to Tokenize for Generative Retrieval (youtube link)

    • Preliminary: instead of Dense Passage Retrieval (DPR), where relevance comes from learned embedding similarities searched over the whole index at query time, we can use a Differentiable Search Index (DSI), where the model directly generates the DocID from the query (contrast sketched below).

    • Core idea: GenRet - DocID generated during the inference process.

    • Strength over DPR: Unlike DPR, DSI's performance improves significantly as model size increases.
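    • A contrast sketch in Python (the interfaces are my own pseudo-code, not the paper's): DPR scores the query against every document embedding, while generative retrieval decodes a DocID token by token, restricted to identifiers that actually exist:

          def dpr_retrieve(query, encode, doc_embeddings):
              q = encode(query)
              scores = {doc_id: sum(a * b for a, b in zip(q, emb))   # dot-product similarity
                        for doc_id, emb in doc_embeddings.items()}
              return max(scores, key=scores.get)

          def generative_retrieve(query, next_token, valid_doc_ids, max_len=8):
              # next_token(query, prefix, allowed) is assumed to return the model's most
              # likely continuation among the allowed characters (constrained decoding)
              doc_id = ""
              for _ in range(max_len):
                  if doc_id in valid_doc_ids:
                      return doc_id
                  allowed = {i[len(doc_id)] for i in valid_doc_ids
                             if i.startswith(doc_id) and len(i) > len(doc_id)}
                  doc_id += next_token(query, prefix=doc_id, allowed=allowed)
              return doc_id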






 
 
 
