Lighthouse Project: Red Flag AI Tool





Team Members

Jian Tao (Group Leader)

Assistant Professor in Visualization, Assistant Director at the Texas A&M Institute of Data Science, Affiliated Faculty at Department of Electrical & Computer Engineering, Department of Nuclear Engineering, and Department of Multidisciplinary Engineering

Sree Kiran Prasad

Master’s Student, NLP & Data Science Researcher

Revanth Reddy

Master’s Student, ML & FullStack Developer

Harikrishnan Raghukumar

Master’s Student, Data Science Researcher



Overview

The aim of this project is to develop algorithms, and prototype software that embodies them, for categorizing project documents by topic.

The software will accept project information as input and produce as output a profile summarizing each document's relevance to each topic, e.g. through a numerical score for each topic and an analysis of the frequencies of keywords in each topic, together with a summary of the set of proposals.
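
The per-topic profile described above can be sketched as follows. This is a minimal illustration only, not the project's implementation: the function name `topic_profile`, the sample text, and the keyword lists are hypothetical, and the score is assumed to be the share of document tokens that match a topic's keywords. Only single-word keywords are handled.

```python
import re
from collections import Counter


def topic_profile(text, topic_keywords):
    """Build a per-topic profile for one document (hypothetical helper).

    For each topic, report how often each keyword occurs and a score equal
    to the fraction of all tokens in the document that are topic keywords.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty documents

    profile = {}
    for topic, keywords in topic_keywords.items():
        keyword_counts = {kw: counts[kw.lower()] for kw in keywords}
        profile[topic] = {
            "score": sum(keyword_counts.values()) / total,
            "keyword_counts": keyword_counts,
        }
    return profile


# Illustrative input: the keywords below are made up for this example.
sample_text = "Human subjects data and human tissue samples"
sample_topics = {"HRPP": ["human", "subjects"], "Privacy": ["data"]}
profile = topic_profile(sample_text, sample_topics)
```

A set of proposals could then be summarized by averaging the per-topic scores over all documents.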


Project Objectives

  • Create an analytical dashboard for the Texas A&M research enterprise
  • Identify Texas A&M research capacity for strategic assessment
  • Identify Texas A&M subject matter experts and research clusters
  • Advance opportunities for interdisciplinary research
  • Strategically map capacity to state/national funding opportunities
  • Define the Texas A&M global research footprint






Key Features

  • Implementation - Frontend with Web Programming
    • Interactive dashboard interface
    • Built with Django, a Python-based open-source web framework that integrates directly with the Python-based analysis backend
    • Authentication required for internal usage
    • Model-View-Controller design pattern (which Django implements as Model-View-Template) to facilitate future development
    • Modular design to scale and sustain



  • Implementation - Backend with Natural Language Processing
    • Define Red Flag Lists - There are 6 red flag lists: AWO, Biosafety, HRPP, EHS, Export Control, and Privacy.
    • Read in PDFs - Read in and transform PDF files into plain text files for further analysis.
    • TF-IDF Analysis - Term Frequency-Inverse Document Frequency (TF-IDF) analysis is used to identify representative terms for each proposal.
    • Similarity Analysis - Carry out similarity analysis between the red flag lists and the parsed proposals to identify compliance issues.
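
The TF-IDF and similarity steps listed above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the project's actual backend: cosine similarity is assumed as the similarity measure, a smoothed IDF is used, each red flag list is treated as one short document, and the function names and red flag terms are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_proposal(proposal_text, red_flag_lists):
    """Score one proposal against each red flag list (hypothetical helper)."""
    names = list(red_flag_lists)
    docs = [proposal_text] + [" ".join(red_flag_lists[n]) for n in names]
    vectors = tfidf_vectors(docs)
    return {name: cosine(vectors[0], vec) for name, vec in zip(names, vectors[1:])}


# Illustrative terms only; the real red flag lists are not reproduced here.
example_lists = {
    "Biosafety": ["pathogen", "recombinant", "biohazard"],
    "Export Control": ["export", "encryption", "foreign"],
}
example_proposal = "We will culture a recombinant pathogen under biohazard containment."
scores = score_proposal(example_proposal, example_lists)
```

In this example the proposal shares terms only with the illustrative Biosafety list, so that list receives a positive score while Export Control scores zero. A threshold on the score could then decide which proposals are flagged for review.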



Important Links

Lighthouse Website (TAMU CAS protected)
Documentation of the project (WIP - TauGroup members only)
GitHub repository (TauGroup members only)