Skip to content

ha0lyu/PaperScreener

Repository files navigation

🧠 PaperScreener

PaperScreener is a lightweight Python tool that automatically collects academic papers from DBLP and filters them based on research interests using a LLM. It helps researchers quickly find relevant papers from top conferences.


🚀 Features

  • 📚 Automatic paper collection — Fetch papers from specific conferences and years via DBLP's SPARQL endpoint.
  • 🤖 LLM-based topic filtering — Automatically determine whether a paper title or abstract matches your research interests.
  • 🔍 Two-stage filtering pipeline
    1. Check relevance based on the title.
    2. If uncertain, fetch paper summary using Tavily API (which provides an AI-generated answer) or Semantic Scholar abstract.
  • 💾 Structured output — Results are stored in a JSON file with clearly separated categories for confirmed and uncertain papers.
  • 🔐 Secure API key management — Uses environment variables to keep API keys safe.

🛠️ Installation

  1. Clone the repository:

    git clone <repository-url>
    cd PaperScreener
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up environment variables:

    cp .env.example .env

    Then edit .env file with your actual API keys:

    # Required
    LLM_API_KEY=your_actual_llm_api_key
    LLM_BASE_URL=https://api.openai.com/v1
    
    # Optional (for higher rate limits or enhanced functionality)
    SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
    TAVILY_API_KEY=your_tavily_api_key

⚙️ Configuration

1. Environment Variables (.env)

Create a .env file based on .env.example:

# LLM API Configuration (Required)
LLM_API_KEY=your_llm_api_key_here
LLM_BASE_URL=your_llm_endpoint_here

# Semantic Scholar API Key (Optional - for direct academic search)
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key_here

# Tavily API Key (Optional - for enhanced web search and abstract retrieval)
# Tavily provides 1,000 free credits per month. Each basic search costs 1 credit.
# Tavily returns an AI-generated 'answer' field that summarizes paper information.
TAVILY_API_KEY=your_tavily_api_key_here

2. Application Configuration (config.json)

Configure your research parameters in config.json:

{
    "conferences": ["icse", "ccs", "uss"],
    "year": 2024,
    "topics": ["software security", "hardware security"],
    "interests": ["Hardware Security", "System Security"],
    "interval": "monthly",
    "llm_model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "llm_max_tokens": 100,
    "llm_temperature": 0.7,
    "semantic_scholar_result_limit": 1
}

Configuration Options:

  • conferences — List of conference names (e.g., "ccs", "micro", "uss")
  • year — Target year for paper collection
  • topics — Research topics for filtering
  • interests — Specific research interests
  • llm_model — LLM model to use for filtering
  • llm_max_tokens — Maximum tokens for LLM responses
  • llm_temperature — Temperature setting for LLM (0.0-1.0)
  • semantic_scholar_result_limit — Number of results to fetch per query

▶️ Usage

Simply run the script:

python PaperScreener.py

The tool will:

  1. Load configuration from config.json and environment variables
  2. Fetch papers from DBLP for specified conferences and year
  3. Filter by title using LLM to determine relevance
  4. Fetch summaries for uncertain papers via Tavily API (returns AI-generated answer; recommended way) or Semantic Scholar (returns abstract)
  5. Filter by summary/abstract for final relevance determination
  6. Save results in structured JSON format

Output Files

The tool generates several output files in the process:

  • *_dblp_*.json — Raw papers from DBLP
  • *_title_filter_*.json — Papers filtered by title relevance
  • *_semantic_scholar_search_*.json — Abstract data from Semantic Scholar (if using S2 API)
  • *_crawled_abstracts_*.json — Summary/abstract data retrieved via Tavily API (contains Tavily's answer field)
  • *_final_filtered_*.json — Final filtered results

🧠 Example Workflow

  1. Set up environment:

    cp .env.example .env
    # Edit .env with your API keys
  2. Configure research parameters:

    # Edit config.json with your conferences, topics, etc.
  3. Run the tool:

    python PaperScreener.py
  4. Review results:

    # Check the generated JSON files for relevant papers

🔐 Security Notes

  • Never commit .env files — They contain sensitive API keys
  • Use .env.example as a template for required environment variables
  • Keep API keys secure — Don't share them in code or documentation

📋 Dependencies

  • requests — For API calls to DBLP and Semantic Scholar
  • python-dotenv — For loading environment variables from .env files
  • tavily-python — For Tavily API integration (enhanced web search)

Install all dependencies with:

pip install -r requirements.txt

💡 Future Improvements

  • Extend to more academic databases
  • Add support for more LLM providers
  • Implement caching for API responses
  • Add batch processing capabilities
  • Monthly automated checks

About

Automatically collects academic papers from DBLP and filters via LLM.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages