agarba360-beep/pcv2-epitope-platform

🧬 PCV2 Epitope Mapping & Prediction Platform

A structure-aware, machine learning–driven bioinformatics pipeline for predicting antibody-accessible epitopes on the Porcine Circovirus Type 2 (PCV2) capsid protein (ORF2).


🌐 Live Application

🔗 https://pcv2.epitope.aiconceptlimited.com.ng/


🎯 Overview

This project implements a research-grade computational framework for identifying potential B-cell epitopes on the PCV2 capsid protein. It integrates:

  • Evolutionary sequence analysis
  • Structural biology (PDB-based features)
  • Physicochemical characterization
  • Machine learning (XGBoost)

💡 Applications

  • Epitope discovery
  • Vaccine target identification
  • Viral antigen characterization
  • Immunoinformatics research

🧠 Core Concept

Epitope prediction is treated as a multi-modal biological inference problem:

Sequence → Evolution → Structure → Features → ML → Prediction → Validation

🔬 Data Sources

| Component | Source |
|---|---|
| Protein sequences | NCBI (Entrez API) |
| Reference sequence | UniProt |
| Protein structures | PDB (3R0R, 6EZG) |
| Epitope validation | IEDB |

⚙️ Pipeline Architecture

NCBI Retrieval
     ↓
Sequence Cleaning (capsid-only filtering)
     ↓
Multiple Sequence Alignment (MAFFT)
     ↓
Feature Engineering
  - Conservation
  - Entropy
  - SASA
  - Residue Depth
  - Electrostatics
     ↓
Feature Matrix Construction
     ↓
Epitope Labeling (IEDB)
     ↓
Machine Learning (XGBoost)
     ↓
Prediction
     ↓
3D + Sequence Visualization (Streamlit)
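
The "Sequence Cleaning (capsid-only filtering)" step is not specified in detail here. As a minimal sketch, assuming the filter is a length window plus a standard-residue check, it might look like the following. The function names and the `min_len`/`max_len` defaults are hypothetical placeholders, not values taken from the project.

```python
from typing import Dict, Iterable

# The 20 standard amino acids; anything else (X, B, Z, *) is treated as ambiguous.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def keep_capsid_like(records: Dict[str, str],
                     min_len: int = 225,   # hypothetical lower bound
                     max_len: int = 240    # hypothetical upper bound
                     ) -> Dict[str, str]:
    """Keep sequences inside the length window that contain only standard residues."""
    return {
        header: seq
        for header, seq in records.items()
        if min_len <= len(seq) <= max_len and set(seq) <= VALID_AA
    }
```

The filtered records would then be written back to FASTA and passed to MAFFT for alignment.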

🧬 Feature Engineering

🧬 Evolutionary Features

  • Conservation score (frequency-based)
  • Shannon entropy (sequence variability)
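
Both evolutionary features can be derived column by column from the multiple sequence alignment. The sketch below assumes that conservation is the frequency of the most common residue in a column and that gaps are excluded from the counts; the project's exact definitions may differ, and `column_scores` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Tuple


def column_scores(alignment: List[str], gap: str = "-") -> List[Tuple[float, float]]:
    """Return (conservation, Shannon entropy in bits) for every alignment column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for i in range(length):
        # Assumption: gap characters are ignored when counting residues.
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            scores.append((0.0, 0.0))
            continue
        counts = Counter(residues)
        total = len(residues)
        conservation = counts.most_common(1)[0][1] / total
        entropy = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        scores.append((conservation, entropy))
    return scores
```

A fully conserved column scores `(1.0, 0.0)`; a column split evenly between two residues scores `(0.5, 1.0)`.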

🧱 Structural Features

  • Solvent Accessible Surface Area (SASA)
  • Residue depth
  • Secondary structure (loop/helix/sheet)
  • Electrostatics

⚗️ Physicochemical Features

  • Hydrophobicity
  • Charge distribution
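
The README does not say which hydrophobicity scale or charge model is used. As an illustration only, the sketch below uses the Kyte-Doolittle hydropathy scale and a simple formal charge at neutral pH (histidine treated as neutral); both choices are assumptions, as is the function name.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy index (assumed scale; the project may use another).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Formal side-chain charge near pH 7; histidine is treated as neutral (assumption).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def physicochemical_features(sequence: str) -> List[Tuple[float, float]]:
    """Return (hydropathy, charge) per residue; unknown residues map to 0.0."""
    return [
        (KYTE_DOOLITTLE.get(aa, 0.0), CHARGE.get(aa, 0.0))
        for aa in sequence.upper()
    ]
```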

🔄 Contextual Features

  • Sliding window (±2 residues)
  • Spatial neighborhood aggregation
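
A ±2 sliding window can be read as smoothing each per-residue feature over its two neighbors on either side. The sketch below assumes a plain mean and a window that is truncated at the sequence ends; the project may aggregate differently, and spatial (3D) neighborhoods are not covered here.

```python
from typing import List, Sequence


def window_mean(values: Sequence[float], half_width: int = 2) -> List[float]:
    """Average each position with up to `half_width` neighbors on each side."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)          # truncate at the N-terminus
        hi = min(n, i + half_width + 1)      # truncate at the C-terminus
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

For example, `window_mean([1, 2, 3, 4, 5])` returns `[2.0, 2.5, 3.0, 3.5, 4.0]`.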

🤖 Machine Learning

  • Model: XGBoost Classifier
  • Input: Residue-level feature matrix
  • Output: Probability of epitope per residue

Training Strategy

  • Imbalanced dataset handling
  • Decision-threshold tuning (default probability cutoff: 0.25)
  • Feature importance extraction
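
How the imbalance is handled is not stated. One common option with XGBoost is to pass the negative-to-positive ratio as the `scale_pos_weight` parameter; whether this project does so is an assumption. The sketch below computes that ratio and applies the 0.25 cutoff to predicted probabilities, using hypothetical helper names.

```python
from typing import List, Sequence


def positive_class_weight(labels: Sequence[int]) -> float:
    """Ratio of negatives to positives, e.g. for XGBoost's `scale_pos_weight`."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("at least one positive (epitope) label is required")
    return negatives / positives


def apply_threshold(probabilities: Sequence[float], threshold: float = 0.25) -> List[int]:
    """Label a residue as epitope (1) when its probability reaches the cutoff."""
    return [1 if p >= threshold else 0 for p in probabilities]
```

A cutoff below 0.5 trades precision for recall, which is a typical choice when positives are rare.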

📊 Results

| Metric | Value |
|---|---|
| Total residues | ~162–245 |
| Predicted epitopes | ~24 |
| Validated (IEDB overlap) | ~4 |
| ROC-AUC | ~0.70–0.75 |
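
For reference, ROC-AUC equals the probability that a randomly chosen epitope residue receives a higher score than a randomly chosen non-epitope residue, with ties counted as half. The dependency-free sketch below computes it that way; it illustrates the metric and is not the project's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic, which is acceptable for a few hundred residues.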

🧪 Validation Strategy

  • Predictions compared with IEDB experimental epitopes
  • Overlap analysis performed at residue level

Interpretation

  • ✅ Overlapping residues → validated epitopes
  • 🔬 Non-overlapping → novel candidate epitopes
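
At residue level, the overlap analysis reduces to set operations on residue positions. The sketch below assumes predicted and IEDB positions share one numbering scheme; the function name and the returned keys are hypothetical.

```python
from typing import Dict, Iterable


def overlap_report(predicted: Iterable[int], known: Iterable[int]) -> Dict[str, object]:
    """Split predictions into validated and novel residues and summarize overlap."""
    predicted_set, known_set = set(predicted), set(known)
    validated = sorted(predicted_set & known_set)   # also reported in IEDB
    novel = sorted(predicted_set - known_set)       # candidate epitopes
    missed = sorted(known_set - predicted_set)      # IEDB residues not predicted
    return {
        "validated": validated,
        "novel": novel,
        "missed": missed,
        "precision": len(validated) / len(predicted_set) if predicted_set else 0.0,
        "recall": len(validated) / len(known_set) if known_set else 0.0,
    }
```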

🧬 Biological Insights

Predicted epitopes are enriched in:

  • Surface-exposed regions (high SASA)
  • Loop/coil structures
  • High-entropy (variable) regions

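👉 This is consistent with the general observation that antibodies tend to bind solvent-exposed, flexible, and often sequence-variable regions of an antigen.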


📁 Project Structure

pcv2_epitope_project/
│
├── data/                  # Metadata, mappings, IEDB data
├── sequences/             # FASTA + alignments
├── structures/            # PDB files (3R0R, 6EZG)
├── features/              # Engineered features
├── results/               # Predictions + evaluation
├── models/                # Trained ML model
├── scripts/               # Feature + analysis scripts
├── pipeline/              # Automation scripts
│
├── dashboard.py           # Streamlit interface
└── run_smart_pipeline.py  # Full pipeline runner

⚙️ Installation

git clone https://github.com/YOUR_USERNAME/pcv2-epitope-platform.git
cd pcv2-epitope-platform

python -m venv pcv2_env
source pcv2_env/bin/activate

pip install -r requirements.txt

▶️ Usage

Run Full Pipeline

python run_smart_pipeline.py

Launch Dashboard

streamlit run dashboard.py

📊 Dashboard Features

  • 📈 Epitope probability plots
  • 🧬 Sequence visualization (UniProt-aligned)
  • 🧊 3D structure mapping (Py3Dmol)
  • 🧪 IEDB validation overlay
  • 📦 Epitope clustering
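
The clustering method behind the dashboard is not described and may well be spatial. As one simple possibility, predicted residues can be grouped into contiguous sequence segments; the sketch below does only that, and `max_gap` is a hypothetical parameter.

```python
from typing import Iterable, List, Tuple


def contiguous_segments(positions: Iterable[int], max_gap: int = 1) -> List[Tuple[int, int]]:
    """Merge residue positions into (start, end) segments.

    Two positions join the same segment when they differ by at most `max_gap`.
    """
    segments: List[List[int]] = []
    for pos in sorted(set(positions)):
        if segments and pos - segments[-1][1] <= max_gap:
            segments[-1][1] = pos          # extend the current segment
        else:
            segments.append([pos, pos])    # start a new segment
    return [(start, end) for start, end in segments]
```

Raising `max_gap` merges nearby stretches, which can be useful when a single unpredicted residue interrupts an otherwise continuous epitope.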

⚠️ Limitations

  • Limited experimentally validated epitopes (class imbalance)
  • Predictions are computational (require lab validation)
  • Mapping sequence positions onto PDB structure residues is approximate

🚀 Future Work

  • Graph Neural Networks (GNN)
  • Transformer-based protein models
  • Improved structural alignment
  • REST API deployment
  • Continuous data updates (automated pipeline)

🤝 Collaboration

Open to collaborations in:

  • Bioinformatics
  • Immunoinformatics
  • Vaccine design
  • Structural biology

📜 Disclaimer

This system provides computational predictions and should not replace experimental validation.


👤 Author

Abubakar
Bioinformatics & Computational Biology


⭐ Acknowledgements

  • NCBI (sequence data)
  • RCSB PDB (structural data)
  • IEDB (epitope data)
  • Biopython, XGBoost, Streamlit communities
