🧬 PCV2 Epitope Mapping & Prediction Platform

A structure-aware, machine learning–driven bioinformatics pipeline for predicting antibody-accessible epitopes on the Porcine Circovirus Type 2 (PCV2) capsid protein (ORF2).

🌐 Live Application

🔗 https://pcv2.epitope.aiconceptlimited.com.ng/

🎯 Overview

This project implements a research-grade computational framework integrating:

Evolutionary sequence analysis
Structural biology (PDB-based features)
Physicochemical characterization
Machine learning (XGBoost)

to identify potential B-cell epitopes on the PCV2 capsid protein.

💡 Applications

Epitope discovery
Vaccine target identification
Viral antigen characterization
Immunoinformatics research

🧠 Core Concept

Epitope prediction is treated as a multi-modal biological inference problem:

Sequence → Evolution → Structure → Features → ML → Prediction → Validation

🔬 Data Sources

Component	Source
Protein sequences	NCBI (Entrez API)
Reference sequence	UniProt
Protein structures	PDB (3R0R, 6EZG)
Epitope validation	IEDB
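
As a rough illustration of the NCBI retrieval step, the sketch below builds E-utilities request URLs using only the standard library. The search term, the `retmax` value, and the helper names `build_esearch_url` and `build_efetch_url` are assumptions made for this example; the query the project actually uses is not documented here. No network request is made.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, db: str = "protein", retmax: int = 500) -> str:
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(ids: list[str], db: str = "protein") -> str:
    """Build an EFetch URL that returns the given records as FASTA text."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Hypothetical query: the project's real search term may differ.
QUERY = '"Porcine circovirus 2"[Organism] AND capsid[Title]'
search_url = build_esearch_url(QUERY)
fetch_url = build_efetch_url(["1", "2"])  # placeholder IDs, not real accessions
```

In practice the IDs returned by the search would be passed to the fetch step in batches, and NCBI asks clients to identify themselves and respect its request-rate limits.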

⚙️ Pipeline Architecture

NCBI Retrieval
     ↓
Sequence Cleaning (capsid-only filtering)
     ↓
Multiple Sequence Alignment (MAFFT)
     ↓
Feature Engineering
  - Conservation
  - Entropy
  - SASA
  - Residue Depth
  - Electrostatics
     ↓
Feature Matrix Construction
     ↓
Epitope Labeling (IEDB)
     ↓
Machine Learning (XGBoost)
     ↓
Prediction
     ↓
3D + Sequence Visualization (Streamlit)

🧬 Feature Engineering

🧬 Evolutionary Features

Conservation score (frequency-based)
Shannon entropy (sequence variability)
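
The two evolutionary features can be sketched per alignment column as follows. This is a minimal example, assuming that "frequency-based" conservation means the frequency of the most common residue in a column and that gaps are ignored; the function name `column_scores` and the toy alignment are hypothetical, and the project's own scripts may differ.

```python
import math
from collections import Counter


def column_scores(alignment: list[str], ignore_gaps: bool = True):
    """Return (conservation, entropy) lists with one value per alignment column."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conservation, entropy = [], []
    for col in range(n_cols):
        residues = [seq[col] for seq in alignment]
        if ignore_gaps:
            residues = [r for r in residues if r != "-"]
        if not residues:  # column made of gaps only
            conservation.append(0.0)
            entropy.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        # Conservation: frequency of the most common residue (assumption).
        conservation.append(max(counts.values()) / total)
        # Shannon entropy in bits: sum of -p * log2(p) over observed residues.
        entropy.append(sum(-(c / total) * math.log2(c / total) for c in counts.values()))
    return conservation, entropy


# Toy alignment for illustration only (not real PCV2 sequences).
msa = ["MTYP", "MTYP", "MSYP", "MA-P"]
cons, ent = column_scores(msa)
```

A fully conserved column scores 1.0 for conservation and 0 bits of entropy; the second column of the toy alignment (T, T, S, A) scores 0.5 and 1.5 bits.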

🧱 Structural Features

Solvent Accessible Surface Area (SASA)
Residue depth
Secondary structure (loop/helix/sheet)
Electrostatics

⚗️ Physicochemical Features

Hydrophobicity
Charge distribution
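
A per-residue lookup is one simple way to derive these features. The sketch below assumes the Kyte-Doolittle hydropathy scale and a simplified formal charge at neutral pH (Lys/Arg +1, Asp/Glu -1, histidine treated as neutral); the README does not say which scale or charge model the project uses, and `physicochemical_features` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the project may use another).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified formal charge at neutral pH; histidine is treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def physicochemical_features(sequence: str) -> list[dict]:
    """Return one feature row per residue, using 1-based positions."""
    rows = []
    for position, aa in enumerate(sequence.upper(), start=1):
        rows.append({
            "position": position,
            "residue": aa,
            # None marks residues outside the 20 standard amino acids.
            "hydrophobicity": KYTE_DOOLITTLE.get(aa),
            "charge": CHARGE.get(aa, 0),
        })
    return rows


# Toy peptide for illustration only.
rows = physicochemical_features("MKDI")
```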

🔄 Contextual Features

Sliding window (±2 residues)
Spatial neighborhood aggregation
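
The ±2 sliding window can be illustrated by averaging a per-residue feature over each residue and its two neighbours on either side. This is a sketch under assumptions: the aggregation is taken to be a mean, windows are truncated at the termini rather than padded, and `window_mean` is a hypothetical name. Spatial neighbourhood aggregation, which needs 3D coordinates, is not shown.

```python
def window_mean(values: list[float], half_width: int = 2) -> list[float]:
    """Average each value with its neighbours within +/- half_width positions.

    Windows are truncated at the sequence ends, so terminal residues
    are averaged over fewer positions.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Toy per-residue values (for example SASA), for illustration only.
sasa = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
smoothed = window_mean(sasa)
```

For the toy input the result is `[20.0, 25.0, 30.0, 40.0, 45.0, 50.0]`: interior residues use a full five-residue window, while the first and last use only three.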

🤖 Machine Learning

Model: XGBoost Classifier
Input: Residue-level feature matrix
Output: Probability of epitope per residue

Training Strategy

Imbalanced dataset handling
Decision-threshold tuning (default probability cutoff: 0.25)
Feature importance extraction
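
The first two items can be sketched without the model itself. The example below computes a negative-to-positive class ratio, a common value for the `scale_pos_weight` parameter of `xgboost.XGBClassifier`, and converts probabilities into epitope calls at the 0.25 cutoff. Whether the project handles imbalance this way is an assumption, as are the helper names, the `>=` comparison, and the toy data.

```python
def positive_class_weight(labels: list[int]) -> float:
    """Ratio of negative to positive examples (candidate scale_pos_weight)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive (epitope) labels in the training set")
    return negatives / positives


def apply_threshold(probabilities: list[float], threshold: float = 0.25) -> list[int]:
    """Call a residue an epitope (1) when its probability reaches the threshold."""
    return [int(p >= threshold) for p in probabilities]


# Toy labels: 8 non-epitope residues and 2 epitope residues.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weight = positive_class_weight(labels)

# Toy predicted probabilities, for illustration only.
probabilities = [0.05, 0.30, 0.25, 0.10, 0.80]
calls = apply_threshold(probabilities)
```

A cutoff below 0.5 trades precision for recall, which is a common choice when positives are rare and missing a true epitope is the more costly error.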

📊 Results

Metric	Value
Total residues	~162–245
Predicted epitopes	~24
Validated (IEDB overlap)	~4
ROC-AUC	~0.70–0.75

🧪 Validation Strategy

Predictions compared with IEDB experimental epitopes
Overlap analysis performed at residue level

Interpretation

✅ Overlapping residues → validated epitopes
🔬 Non-overlapping → novel candidate epitopes
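
Residue-level overlap reduces to set operations on residue positions. The sketch below is illustrative: the function name `overlap_report` and the positions are made up, and both inputs are assumed to use the same reference numbering.

```python
def overlap_report(predicted: list[int], iedb: list[int]) -> dict[str, list[int]]:
    """Split residue positions into validated, novel, and missed sets."""
    predicted_set, iedb_set = set(predicted), set(iedb)
    return {
        "validated": sorted(predicted_set & iedb_set),  # predicted and in IEDB
        "novel": sorted(predicted_set - iedb_set),      # predicted only
        "missed": sorted(iedb_set - predicted_set),     # in IEDB, not predicted
    }


# Toy positions for illustration only (not project results).
report = overlap_report(predicted=[58, 59, 60, 130], iedb=[59, 60, 61])
```

Here positions 59 and 60 count as validated, 58 and 130 as novel candidates, and 61 as an IEDB residue the model missed.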

🧬 Biological Insights

Predicted epitopes are enriched in:

Surface-exposed regions (high SASA)
Loop/coil structures
High-entropy (variable) regions

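👉 This is consistent with the general observation that antibodies bind solvent-exposed, flexible regions, and that such regions tend to vary under immune pressure.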

📁 Project Structure

pcv2_epitope_project/
│
├── data/                  # Metadata, mappings, IEDB data
├── sequences/             # FASTA + alignments
├── structures/            # PDB files (3R0R, 6EZG)
├── features/              # Engineered features
├── results/               # Predictions + evaluation
├── models/                # Trained ML model
├── scripts/               # Feature + analysis scripts
├── pipeline/              # Automation scripts
│
├── dashboard.py           # Streamlit interface
└── run_smart_pipeline.py  # Full pipeline runner

⚙️ Installation

git clone https://github.com/YOUR_USERNAME/pcv2-epitope-platform.git
cd pcv2-epitope-platform

python -m venv pcv2_env
source pcv2_env/bin/activate

pip install -r requirements.txt

▶️ Usage

Run Full Pipeline

python run_smart_pipeline.py

Launch Dashboard

streamlit run dashboard.py

📊 Dashboard Features

📈 Epitope probability plots
🧬 Sequence visualization (UniProt-aligned)
🧊 3D structure mapping (Py3Dmol)
🧪 IEDB validation overlay
📦 Epitope clustering
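
One simple reading of epitope clustering is grouping predicted residues into contiguous sequence segments. The sketch below assumes that interpretation; the dashboard may instead cluster residues in 3D space. The function name `group_positions` and the `max_gap` parameter are hypothetical.

```python
def group_positions(positions: list[int], max_gap: int = 1) -> list[tuple[int, int]]:
    """Merge residue positions into (start, end) segments.

    Two positions fall in the same segment when they are at most
    max_gap apart in sequence numbering.
    """
    segments: list[list[int]] = []
    for pos in sorted(set(positions)):
        if segments and pos - segments[-1][1] <= max_gap:
            segments[-1][1] = pos  # extend the current segment
        else:
            segments.append([pos, pos])  # start a new segment
    return [(start, end) for start, end in segments]


# Toy predicted positions for illustration only.
predicted = [58, 59, 60, 63, 130, 131]
strict = group_positions(predicted)
relaxed = group_positions(predicted, max_gap=3)
```

With the default gap, the toy input yields three segments, `(58, 60)`, `(63, 63)`, and `(130, 131)`; allowing a gap of 3 merges the first two into `(58, 63)`.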

⚠️ Limitations

Limited experimentally validated epitopes (class imbalance)
Predictions are computational (require lab validation)
Sequence–structure mapping introduces approximation

🚀 Future Work

Graph Neural Networks (GNN)
Transformer-based protein models
Improved structural alignment
REST API deployment
Continuous data updates (automated pipeline)

🤝 Collaboration

Open to collaborations in:

Bioinformatics
Immunoinformatics
Vaccine design
Structural biology

📜 Disclaimer

This system provides computational predictions and should not replace experimental validation.

👤 Author

Abubakar
Bioinformatics & Computational Biology

⭐ Acknowledgements

NCBI (sequence data)
RCSB PDB (structural data)
IEDB (epitope data)
Biopython, XGBoost, Streamlit communities
