LLM for Generative QA on Long Documents

Recently I was working on a problem that required exploring LLMs for question answering. The goal was to take an open-source LLM (not OpenAI GPT), host it on a local system, and use it for QA over custom data. I found a very interesting set of articles and code repos that helped me through this process. This is a fast-moving topic, so the code I wrote and the articles below might be outdated within weeks. My LLM Code repo

  1. This is an amazing article by Abonia Sojasingarayar
    1. The author gives a basic architecture for a QA system that processes long, verbose documents, e.g. contracts, legal agreements and clinical research papers.
    2. The author's codebase is a good starting point for Langchain and Chroma, although the code is a little outdated given how quickly the Langchain community updates the repo to keep up with the latest trends. Code
  2. This article by Mick gives a good overview of the constraints of using LLMs directly for a long-document QA system. Link
  3. Matt Boegner describes a generic knowledge retrieval architecture. Link
  4. An amazing blog post by the Langchain contributors about how they updated the framework for better retrieval by adding more classes and connectors. Link
  5. A couple of YouTube creators I follow for great code samples and explanations on Langchain and LLMs in general:
    1. 1littlecoder playlist
    2. Sam Witteveen playlist
  6. Interesting projects to track. Each space has more projects than these, and the articles above give a broader list; the ones I have used are listed below.
    1. LLM framework - Langchain
    2. Vector stores - ChromaDB, FAISS, Elasticsearch, Milvus, Weaviate, Pinecone, Qdrant. Pinecone seems to be getting a lot of traction right now; I have also seen a stack called OPL (OpenAI, Pinecone and Langchain).
    3. Connect data to LLMs - LlamaIndex
    4. Models to try from Hugging Face - Google Flan-T5-XXL, Facebook BlenderBot 1B distill, Facebook OPT-66B, BigScience BLOOM-560m
    5. To demo a quick version - Gradio, Streamlit
    6. To learn more about prompt templates - James Briggs playlist
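Most of the articles above converge on the same retrieve-then-read pattern: split the long document into chunks, pick the chunks most relevant to the question, and pass only those to the LLM as context, since a whole contract or paper typically will not fit in a model's context window. The snippet below is a minimal, dependency-free sketch of that pattern, assuming plain-text input. Word-count cosine similarity stands in for the embedding model plus vector store (Chroma, FAISS, etc.) that a real Langchain setup would use, and the function names (`chunk_text`, `retrieve`, `build_prompt`) and prompt wording are hypothetical, not any library's API.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping windows of `chunk_size` words (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window already reaches the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks


def _vectorize(text):
    # Bag-of-words counts: a crude stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (what a vector store query does)."""
    q = _vectorize(question)
    return sorted(chunks, key=lambda c: _cosine(q, _vectorize(c)), reverse=True)[:k]


# Hypothetical prompt wording; tune it for the model you host locally.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(question, chunks, k=2):
    """Stuff the top-k retrieved chunks into the prompt sent to the LLM."""
    context = "\n---\n".join(retrieve(question, chunks, k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    chunks = [
        "the payment is due within 30 days",
        "the contract is governed by the laws of france",
        "either party may terminate with notice",
    ]
    print(build_prompt("which laws govern the contract", chunks, k=1))
```

In a real pipeline the two similarity helpers would be replaced by an embedding model and one of the vector stores listed above, and the returned prompt would be passed to the locally hosted model (for example one of the Hugging Face models above). Overlapping chunks are a common choice so that an answer straddling a chunk boundary still appears whole in at least one chunk.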