🧠 PaperScreener

PaperScreener is a lightweight Python tool that automatically collects academic papers from DBLP and filters them based on research interests using a LLM. It helps researchers quickly find relevant papers from top conferences.

🚀 Features

📚 Automatic paper collection — Fetch papers from specific conferences and years via DBLP's SPARQL endpoint.
🤖 LLM-based topic filtering — Automatically determine whether a paper title or abstract matches your research interests.
🔍 Two-stage filtering pipeline —
1. Check relevance based on the title.
2. If uncertain, fetch paper summary using Tavily API (which provides an AI-generated answer) or Semantic Scholar abstract.
💾 Structured output — Results are stored in a JSON file with clearly separated categories for confirmed and uncertain papers.
🔐 Secure API key management — Uses environment variables to keep API keys safe.

🛠️ Installation

Clone the repository:

git clone <repository-url>
cd PaperScreener

Install dependencies:
```
pip install -r requirements.txt
```

Set up environment variables:

cp .env.example .env

Then edit .env file with your actual API keys:

# Required
LLM_API_KEY=your_actual_llm_api_key
LLM_BASE_URL=https://api.openai.com/v1

# Optional (for higher rate limits or enhanced functionality)
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
TAVILY_API_KEY=your_tavily_api_key

⚙️ Configuration

1. Environment Variables (.env)

Create a .env file based on .env.example:

# LLM API Configuration (Required)
LLM_API_KEY=your_llm_api_key_here
LLM_BASE_URL=your_llm_endpoint_here

# Semantic Scholar API Key (Optional - for direct academic search)
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key_here

# Tavily API Key (Optional - for enhanced web search and abstract retrieval)
# Tavily provides 1,000 free credits per month. Each basic search costs 1 credit.
# Tavily returns an AI-generated 'answer' field that summarizes paper information.
TAVILY_API_KEY=your_tavily_api_key_here

2. Application Configuration (config.json)

Configure your research parameters in config.json:

{
    "conferences": ["icse", "ccs", "uss"],
    "year": 2024,
    "topics": ["software security", "hardware security"],
    "interests": ["Hardware Security", "System Security"],
    "interval": "monthly",
    "llm_model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "llm_max_tokens": 100,
    "llm_temperature": 0.7,
    "semantic_scholar_result_limit": 1
}

Configuration Options:

conferences — List of conference names (e.g., "ccs", "micro", "uss")
year — Target year for paper collection
topics — Research topics for filtering
interests — Specific research interests
llm_model — LLM model to use for filtering
llm_max_tokens — Maximum tokens for LLM responses
llm_temperature — Temperature setting for LLM (0.0-1.0)
semantic_scholar_result_limit — Number of results to fetch per query

▶️ Usage

Simply run the script:

python PaperScreener.py

The tool will:

Load configuration from config.json and environment variables
Fetch papers from DBLP for specified conferences and year
Filter by title using LLM to determine relevance
Fetch summaries for uncertain papers via Tavily API (returns AI-generated answer; recommended way) or Semantic Scholar (returns abstract)
Filter by summary/abstract for final relevance determination
Save results in structured JSON format

Output Files

The tool generates several output files in the process:

*_dblp_*.json — Raw papers from DBLP
*_title_filter_*.json — Papers filtered by title relevance
*_semantic_scholar_search_*.json — Abstract data from Semantic Scholar (if using S2 API)
*_crawled_abstracts_*.json — Summary/abstract data retrieved via Tavily API (contains Tavily's answer field)
*_final_filtered_*.json — Final filtered results

🧠 Example Workflow

Set up environment:

cp .env.example .env
# Edit .env with your API keys

Configure research parameters:

# Edit config.json with your conferences, topics, etc.

Run the tool:
```
python PaperScreener.py
```

Review results:

# Check the generated JSON files for relevant papers

🔐 Security Notes

Never commit .env files — They contain sensitive API keys
Use .env.example as a template for required environment variables
Keep API keys secure — Don't share them in code or documentation

📋 Dependencies

requests — For API calls to DBLP and Semantic Scholar
python-dotenv — For loading environment variables from .env files
tavily-python — For Tavily API integration (enhanced web search)

Install all dependencies with:

pip install -r requirements.txt

💡 Future Improvements

Extend to more academic databases
Add support for more LLM providers
Implement caching for API responses
Add batch processing capabilities
Monthly automated checks

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
query		query
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
PaperScreener.py		PaperScreener.py
README.md		README.md
config.json		config.json
config_manager.py		config_manager.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧠 PaperScreener

🚀 Features

🛠️ Installation

⚙️ Configuration

1. Environment Variables (.env)

2. Application Configuration (config.json)

Configuration Options:

▶️ Usage

Output Files

🧠 Example Workflow

🔐 Security Notes

📋 Dependencies

💡 Future Improvements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧠 PaperScreener

🚀 Features

🛠️ Installation

⚙️ Configuration

1. Environment Variables (.env)

2. Application Configuration (config.json)

Configuration Options:

▶️ Usage

Output Files

🧠 Example Workflow

🔐 Security Notes

📋 Dependencies

💡 Future Improvements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages