pavanbadempet/README.md

👋 Hi, I'm Pavan Badempet

🌌 Data & MLOps Platform Engineer | Big Data Architect

I am a Data Engineer and MLOps Platform Specialist focused on building high-throughput, distributed data platforms, scalable ETL pipelines, and machine learning infrastructure. My expertise lies in designing robust lakehouse architectures (Delta Lake, lakeFS), orchestrating complex workflows (Apache Airflow), and engineering production-grade ML pipelines.

Experienced in implementing healthcare interoperability gates (ABDM compliance), real-time vital signal streaming analytics, and deep learning clinical ensembles (FT-Transformers, Bi-LSTM temporal models) with automated cloud retraining triggers.


🚀 Technical Superpowers

💻 Languages & Core

Python, Scala, SQL (PostgreSQL, MySQL, SQLite), Java, TypeScript, Go, Bash

📊 Big Data & Orchestration

Apache Spark, PySpark Streaming, Delta Lake, Apache Iceberg, Dremio, Snowflake, Apache Airflow, Databricks, Apache Hadoop, Data Quality (Great Expectations)

🤖 Machine Learning & MLOps

Scikit-learn, PyTorch, TensorFlow, TabPFN, Kaggle API, Hugging Face Hub, Conformal Prediction

☁️ Cloud, Databases & DevOps

AWS (EMR, S3, EC2, RDS, IAM), Docker, Kubernetes, MinIO / HDFS, AutoSys, GitHub Actions CI/CD, Pinecone / SimpleVectorStore, Alembic / migrations

📂 Featured Production Projects

Python, PySpark Streaming, Airflow, Delta Lake, FastAPI, Docker, Kubernetes, AWS

  • Built an end-to-end data platform for 250k+ clinical records using Apache Airflow and PySpark pipelines, staging data in partitioned Delta Lake tables.
  • Implemented automated cloud retraining triggers via Kaggle API and model weight synchronization with a private Hugging Face dataset hub.
  • Developed a FastAPI service with a local vector retrieval index (turbovec SIMD), JWT auth, and FHIR R4 clinical compliance serializers.
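
As a rough illustration of the FHIR R4 serializer idea in the last bullet, the sketch below maps a hypothetical internal record to a minimal Patient resource. The `PatientRecord` class, its field names, and the `to_fhir_patient` helper are assumptions made for this example and are not taken from the project; only the output keys follow the FHIR R4 Patient resource.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict

# Administrative gender codes allowed by FHIR R4.
FHIR_GENDERS = {"male", "female", "other", "unknown"}


@dataclass
class PatientRecord:
    """Hypothetical internal record; field names are illustrative only."""
    patient_id: str
    family_name: str
    given_name: str
    gender: str
    birth_date: date


def to_fhir_patient(record: PatientRecord) -> Dict[str, Any]:
    """Serialize a record into a minimal FHIR R4 Patient resource dict."""
    gender = record.gender.strip().lower()
    if gender not in FHIR_GENDERS:
        # Map anything unrecognized to the explicit "unknown" code.
        gender = "unknown"
    return {
        "resourceType": "Patient",
        "id": record.patient_id,
        "name": [{"family": record.family_name, "given": [record.given_name]}],
        "gender": gender,
        "birthDate": record.birth_date.isoformat(),
    }
```

A FastAPI route could return this dict directly as JSON; a real service would likely also validate the result against the full FHIR schema.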

Python, PySpark, Airflow, Delta Lake, Redis, ONNX Runtime, FAISS, FastAPI, Docker

  • Engineered a causal movie recommendation engine using PySpark Medallion pipelines for ETL and feature store curation.
  • Developed a real-time clickstream feedback loop using Redis streams to update user sequential states asynchronously (sub-10ms latency).
  • Implemented an adaptive serving API with hardware-aware fallbacks (NVIDIA GPU ensembling, quantized ONNX CPU, and SIMD vector index search).
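
The hardware-aware fallback in the last bullet can be pictured as an ordered list of capability probes, where the first backend that reports itself usable wins. The following is a minimal sketch under that assumption; the backend names, `select_backend`, and the probe functions are hypothetical and do not come from the project.

```python
import importlib.util
from typing import Callable, Sequence, Tuple

Probe = Tuple[str, Callable[[], bool]]


def gpu_available() -> bool:
    """True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def onnx_cpu_available() -> bool:
    """True if ONNX Runtime is installed (the CPU provider ships by default)."""
    return importlib.util.find_spec("onnxruntime") is not None


# Assumed preference order: GPU ensemble, then quantized ONNX on CPU.
DEFAULT_PROBES: Sequence[Probe] = (
    ("gpu_ensemble", gpu_available),
    ("onnx_cpu", onnx_cpu_available),
)


def select_backend(probes: Sequence[Probe] = DEFAULT_PROBES,
                   default: str = "vector_index") -> str:
    """Return the first backend whose probe succeeds, else the default."""
    for name, probe in probes:
        try:
            if probe():
                return name
        except Exception:
            # A failing probe means the backend is unusable; try the next one.
            continue
    return default
```

Calling `select_backend()` once at startup, rather than per request, keeps the probing cost off the serving path.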

🔍 Career Keywords & Technical Index (SEO)

This profile indexes major industry domains and systems:

  • Core Specializations: Data Platform Architect, Big Data Engineer Portfolio, MLOps Pipelines, Python and Scala Developer, AWS Solutions, Lakehouse Architect.
  • Distributed Platforms: Apache Spark, PySpark Streaming, Delta Lake, Apache Airflow, Databricks, Data Lakehouses, PySpark ETL.
  • AI Infrastructure & Inference: FT-Transformer models, TabPFN models, PyTorch Tabular MLP ensembles, conformal prediction bounds, Hugging Face Hub, Kaggle API integration.
  • Compliance & Health Informatics: Ayushman Bharat Digital Mission (ABDM) gateways, FHIR standards, vital signals streaming.

Pinned

  1. AI-Healthcare-System (Public)

    AI healthcare platform featuring multi-disease prediction & local medical AI assistant. Powered by FastAPI, React Vite, PySpark MLOps & vector database.

    Python, 31 stars, 9 forks

  2. Movie-Recommendation-System (Public)

    Movie recommendation engine featuring a 6-model hybrid ensemble (SBERT semantic, turbovec vector search, Collaborative Filtering, PageRank, Content, Knowledge Graph) with FastAPI & React Vite. Powe…

    Python, 8 stars

  3. PRABC (Public)

    Estimates the chance of developing breast cancer and provides advanced data insights, helping women understand the need for care. This comes with supposed Informatic Applications …

    Python, 8 stars

  4. 1dmusic (Public)

    1D Music (Material Design-based application)

    Java, 10 stars, 3 forks

  5. pavanbadempet.github.io (Public)

    My personal data engineering portfolio: a high-performance, responsive site built to showcase data architecture, technical projects, and engineering skills. Powered by Jekyll and custom SCSS.

    JavaScript, 5 stars

  6. MovieGuide (Public)

    Forked from esoxjem/MovieGuide

    Movie discovery app showcasing MVP, RxJava, Dagger 2, and Clean Architecture

    Java, 8 stars