PaperScreener is a lightweight Python tool that automatically collects academic papers from DBLP and filters them based on research interests using an LLM. It helps researchers quickly find relevant papers from top conferences.
- 📚 Automatic paper collection — Fetch papers from specific conferences and years via DBLP's SPARQL endpoint.
- 🤖 LLM-based topic filtering — Automatically determine whether a paper title or abstract matches your research interests.
- 🔍 Two-stage filtering pipeline:
  - Check relevance based on the title.
  - If uncertain, fetch a paper summary using the Tavily API (which provides an AI-generated answer) or the Semantic Scholar abstract.
- 💾 Structured output — Results are stored in a JSON file with clearly separated categories for confirmed and uncertain papers.
- 🔐 Secure API key management — Uses environment variables to keep API keys safe.
- Clone the repository:

      git clone <repository-url>
      cd PaperScreener

- Install dependencies:

      pip install -r requirements.txt

- Set up environment variables:

      cp .env.example .env

  Then edit the `.env` file with your actual API keys:

      # Required
      LLM_API_KEY=your_actual_llm_api_key
      LLM_BASE_URL=https://api.openai.com/v1
      # Optional (for higher rate limits or enhanced functionality)
      SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
      TAVILY_API_KEY=your_tavily_api_key

Create a `.env` file based on `.env.example`:

    # LLM API Configuration (Required)
    LLM_API_KEY=your_llm_api_key_here
    LLM_BASE_URL=your_llm_endpoint_here
    # Semantic Scholar API Key (Optional - for direct academic search)
    SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key_here
    # Tavily API Key (Optional - for enhanced web search and abstract retrieval)
    # Tavily provides 1,000 free credits per month. Each basic search costs 1 credit.
    # Tavily returns an AI-generated 'answer' field that summarizes paper information.
    TAVILY_API_KEY=your_tavily_api_key_here

Configure your research parameters in `config.json`:

    {
      "conferences": ["icse", "ccs", "uss"],
      "year": 2024,
      "topics": ["software security", "hardware security"],
      "interests": ["Hardware Security", "System Security"],
      "interval": "monthly",
      "llm_model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
      "llm_max_tokens": 100,
      "llm_temperature": 0.7,
      "semantic_scholar_result_limit": 1
    }

- `conferences`: List of conference names (e.g., `"ccs"`, `"micro"`, `"uss"`)
- `year`: Target year for paper collection
- `topics`: Research topics for filtering
- `interests`: Specific research interests
- `llm_model`: LLM model to use for filtering
- `llm_max_tokens`: Maximum tokens for LLM responses
- `llm_temperature`: Temperature setting for LLM (0.0-1.0)
- `semantic_scholar_result_limit`: Number of results to fetch per query

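As an illustration of how these parameters might be read and sanity-checked before a run, here is a minimal sketch. The `load_config` helper and the choice of which keys to treat as required are assumptions made for this example; they are not taken from PaperScreener's source.

```python
import json
from pathlib import Path

# Keys taken from the config.json example; treating exactly these as
# required is an assumption of this sketch, not the tool's documented behavior.
REQUIRED_KEYS = {"conferences", "year", "topics", "interests", "llm_model"}


def load_config(path="config.json"):
    """Read a config file and run a few basic sanity checks (hypothetical helper)."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))

    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config is missing keys: {sorted(missing)}")

    if not isinstance(config["conferences"], list) or not config["conferences"]:
        raise ValueError("'conferences' must be a non-empty list")

    # The documented range for llm_temperature is 0.0-1.0.
    temperature = config.get("llm_temperature", 0.7)
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("'llm_temperature' must be between 0.0 and 1.0")

    return config
```

Calling `load_config()` with no argument reads `config.json` from the current directory; a missing key or an out-of-range temperature raises `ValueError` before any API call is made.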
Simply run the script:

    python PaperScreener.py

The tool will:
- Load configuration from `config.json` and environment variables
- Fetch papers from DBLP for specified conferences and year
- Filter by title using LLM to determine relevance
- Fetch summaries for uncertain papers via the Tavily API (recommended; returns an AI-generated answer) or Semantic Scholar (returns the abstract)
- Filter by summary/abstract for final relevance determination
- Save results in structured JSON format
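The control flow of the two filtering stages above can be sketched roughly as follows. This is not PaperScreener's actual code: the verdict labels, the `screen_papers` function, and the `"confirmed"`/`"uncertain"` output keys are assumptions for illustration, and the LLM call and the summary lookup are passed in as plain callables so the sketch runs without any API key.

```python
from typing import Callable, Dict, List, Optional

# Verdict labels are assumptions made for this sketch.
RELEVANT, IRRELEVANT, UNCERTAIN = "relevant", "irrelevant", "uncertain"


def screen_papers(
    titles: List[str],
    classify: Callable[[str], str],
    fetch_summary: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Two-stage screening: judge by title first, then by summary if unsure."""
    confirmed: List[str] = []
    uncertain: List[str] = []

    for title in titles:
        # Stage 1: decide from the title alone.
        verdict = classify(title)

        # Stage 2: only for uncertain titles, fetch a summary and re-classify.
        if verdict == UNCERTAIN:
            summary = fetch_summary(title)
            if summary:
                verdict = classify(f"{title}\n\n{summary}")

        if verdict == RELEVANT:
            confirmed.append(title)
        elif verdict == UNCERTAIN:
            # No summary was found, or the summary did not settle it.
            uncertain.append(title)
        # Irrelevant papers are dropped.

    return {"confirmed": confirmed, "uncertain": uncertain}
```

In a real run, `classify` would wrap the LLM request and `fetch_summary` would wrap the Tavily or Semantic Scholar lookup. Keeping them as parameters means the summary service is only called for titles the first stage could not decide, which is what keeps the second stage cheap.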
The tool generates several output files in the process:
- `*_dblp_*.json`: Raw papers from DBLP
- `*_title_filter_*.json`: Papers filtered by title relevance
- `*_semantic_scholar_search_*.json`: Abstract data from Semantic Scholar (if using the S2 API)
- `*_crawled_abstracts_*.json`: Summary/abstract data retrieved via the Tavily API (contains Tavily's answer field)
- `*_final_filtered_*.json`: Final filtered results

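Because the file names contain wildcards, one way to pick up the most recent final result after a run is to glob for the pattern. The sketch below assumes the files are written to the current directory and makes no assumption about the JSON layout inside them; `latest_final_results` is a hypothetical helper, not part of the tool.

```python
import json
from pathlib import Path


def latest_final_results(output_dir="."):
    """Return (path, parsed JSON) for the newest *_final_filtered_*.json, or None."""
    candidates = sorted(
        Path(output_dir).glob("*_final_filtered_*.json"),
        key=lambda p: p.stat().st_mtime,  # oldest first, newest last
    )
    if not candidates:
        return None
    newest = candidates[-1]
    return newest, json.loads(newest.read_text(encoding="utf-8"))


if __name__ == "__main__":
    found = latest_final_results()
    if found is None:
        print("No final filtered results found.")
    else:
        path, data = found
        print(f"Loaded {path.name} ({type(data).__name__} at top level)")
```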
- Set up environment:

      cp .env.example .env
      # Edit .env with your API keys

- Configure research parameters:

      # Edit config.json with your conferences, topics, etc.

- Run the tool:

      python PaperScreener.py

- Review results:

      # Check the generated JSON files for relevant papers

- Never commit `.env` files: they contain sensitive API keys
- Use `.env.example` as a template for required environment variables
- Keep API keys secure: don't share them in code or documentation
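
To make the points above concrete, here is a small sketch of reading the keys from the environment, failing early when a required one is missing, and masking a key before it appears in a log line. It assumes the variables are already in the process environment (for example after `load_dotenv()` from python-dotenv has read `.env`); the `read_api_settings` and `mask` helpers are hypothetical and not part of PaperScreener.

```python
import os

# Variable names come from .env.example; the required/optional split
# follows the comments in that file.
REQUIRED_VARS = ("LLM_API_KEY", "LLM_BASE_URL")
OPTIONAL_VARS = ("SEMANTIC_SCHOLAR_API_KEY", "TAVILY_API_KEY")


def read_api_settings(env=None):
    """Collect API settings from the environment, failing early if incomplete."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")

    settings = {name: env[name] for name in REQUIRED_VARS}
    # Optional keys become None when unset or empty.
    settings.update({name: env.get(name) or None for name in OPTIONAL_VARS})
    return settings


def mask(secret, visible=4):
    """Hide all but the last few characters so a key can be logged safely."""
    if not secret:
        return "<unset>"
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Logging `mask(settings["LLM_API_KEY"])` instead of the raw value shows which key is in use without exposing it.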
- `requests`: For API calls to DBLP and Semantic Scholar
- `python-dotenv`: For loading environment variables from `.env` files
- `tavily-python`: For Tavily API integration (enhanced web search)

Install all dependencies with:

    pip install -r requirements.txt

- Extend to more academic databases
- Add support for more LLM providers
- Implement caching for API responses
- Add batch processing capabilities
- Monthly automated checks