INSIGHTLLM: Intelligent System for Integrating Global Human & Animal Health Technology
Team Members

Jian Tao (Group Leader)
Assistant Professor in Visualization; Assistant Director at the Texas A&M Institute of Data Science; Affiliated Faculty in the Department of Electrical & Computer Engineering, the Department of Nuclear Engineering, and the Department of Multidisciplinary Engineering

Dheeraj Mudireddy
Master’s Student, Data Science
Overview
The INSIGHTLLM project aims to develop a Retrieval-Augmented Generation (RAG) system that serves as a bridge between animal science and human nutrition research. By leveraging recent reasoning language models together with a range of RAG and document-retrieval techniques, the system delivers precise, cited responses to complex queries spanning both domains.
It is designed to empower researchers with fast, contextualized access to a vast corpus of scientific literature, streamlining interdisciplinary insights.
Project Objectives
- Bridge the gap between animal science and human nutrition through AI
- Implement a custom RAG architecture tailored for scientific query resolution
- Enable citation-backed answers derived from peer-reviewed literature
- Support interdisciplinary research through smart document retrieval
- Design an intuitive user interface for query-answer interaction
- Facilitate reproducible, scalable deployment via modular components
Key Features
- Query Translation & Expansion: Prompted query is rewritten clearly and broken down into multiple sub-queries
- Hybrid Retrieval: Combines dense vector embeddings with sparse BM25 keyword search (sketched below)
- LLM Integration: Answers scientific queries with a locally hosted Ollama model (e.g., LLaMA3.1, DeepSeek-R1)
- Cross-Encoder Reranking: Improves result accuracy with semantic relevance scoring (e.g., a MiniLM-L6-v2 cross-encoder)
- Adaptive Top-k Selection: After reranking, the top k documents are kept, with k chosen dynamically from the reranking scores (see the combined reranking sketch below)
- Deduplication & Filtering: Ensures clean, diverse, and relevant chunk selection
- Chain-of-Thought Reasoning: LLMs are prompted to reason step by step before answering (sketched below)
- Self-Verification: Final answers are reviewed for consistency and accuracy by a lightweight LLM, a pattern known as LLM-as-a-Judge (see the last sketch below)
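The sketches below illustrate how components like these might be wired together in Python. They are minimal illustrations under stated assumptions, not the project's actual implementation; model names, prompts, and thresholds are placeholders. First, query translation and expansion: a local Ollama model (assuming the `ollama` Python client and a running Ollama server) rewrites the query and splits it into sub-queries.

```python
# Sketch: rewrite a user query and decompose it into sub-queries.
# Assumes the `ollama` Python client; the prompt and model tag are illustrative.
import ollama

def expand_query(query: str, model: str = "llama3.1") -> list[str]:
    """Rewrite the query for clarity and split it into focused sub-queries."""
    prompt = (
        "Rewrite the following scientific question clearly, then break it into "
        "2-4 focused sub-questions, one per line, with no numbering:\n\n"
        f"{query}"
    )
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    # Each non-empty line of the model's output becomes one sub-query.
    return [line.strip() for line in response["message"]["content"].splitlines() if line.strip()]
```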
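Next, hybrid retrieval: one plausible way to fuse dense and sparse signals is to min-max normalize each score and blend them with a weight. This sketch assumes `rank_bm25` and `sentence-transformers`; the embedding model and blend weight `alpha` are assumptions. It also includes the exact-text deduplication step mentioned in the feature list.

```python
# Sketch: hybrid retrieval fusing dense embeddings with sparse BM25 scores.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def hybrid_search(query: str, chunks: list[str], top_n: int = 20, alpha: float = 0.5) -> list[str]:
    # Sparse scores: BM25 over whitespace-tokenized chunks.
    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    sparse = np.array(bm25.get_scores(query.lower().split()))

    # Dense scores: cosine similarity between query and chunk embeddings.
    chunk_vecs = encoder.encode(chunks, normalize_embeddings=True)
    query_vec = encoder.encode([query], normalize_embeddings=True)[0]
    dense = chunk_vecs @ query_vec

    # Min-max normalize each signal so the scales are comparable, then blend.
    def norm(x: np.ndarray) -> np.ndarray:
        return (x - x.min()) / (x.max() - x.min() + 1e-9)
    scores = alpha * norm(dense) + (1 - alpha) * norm(sparse)

    # Deduplicate exact-text repeats while keeping score order.
    seen: set[str] = set()
    results: list[str] = []
    for i in np.argsort(scores)[::-1]:
        if chunks[i] not in seen:
            seen.add(chunks[i])
            results.append(chunks[i])
        if len(results) == top_n:
            break
    return results
```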
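Reranking and adaptive top-k can then be combined: a cross-encoder scores each (query, chunk) pair, and k is chosen from the score distribution rather than fixed. The margin rule below is one simple heuristic, not the project's actual policy, and the checkpoint name is an assumption.

```python
# Sketch: cross-encoder reranking with adaptive top-k selection.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative checkpoint

def rerank_adaptive(query: str, candidates: list[str], max_k: int = 8, margin: float = 2.0) -> list[str]:
    # Score each (query, candidate) pair jointly with the cross-encoder.
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)

    # Adaptive k: keep documents within a fixed margin of the best score,
    # so k grows when many chunks are strong and shrinks when few are.
    best_score = ranked[0][1]
    return [c for c, s in ranked if s >= best_score - margin][:max_k]
```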
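Answer generation with chain-of-thought prompting might look like the following: the retained chunks are numbered, and the model is instructed to reason step by step and cite sources inline. The prompt template is an assumption for illustration.

```python
# Sketch: chain-of-thought, citation-backed answer generation via Ollama.
import ollama

def answer(query: str, chunks: list[str], model: str = "llama3.1") -> str:
    # Number the retrieved chunks so the model can cite them as [n].
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    prompt = (
        "Using only the numbered sources below, think step by step and answer "
        "the question. Cite sources inline as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]
```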
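Finally, LLM-as-a-Judge self-verification: a lightweight model reviews the draft answer against the retrieved context. The judge model tag and the PASS/FAIL verdict format are assumptions; a failed check could trigger regeneration upstream.

```python
# Sketch: LLM-as-a-Judge self-verification with a lightweight model.
import ollama

def verify(query: str, draft_answer: str, context: str, model: str = "llama3.2:1b") -> bool:
    prompt = (
        "You are a strict reviewer. Given the context, question, and draft answer, "
        "reply with exactly PASS if the answer is consistent with the context, "
        "and FAIL otherwise.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\n\nDraft answer:\n{draft_answer}"
    )
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    # Treat anything other than a leading PASS as a failed check.
    return response["message"]["content"].strip().upper().startswith("PASS")
```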
Important Links
Source Code Repository (WIP - access required)