srptv aserputov

`> whoami`

Software Engineer building LLM infrastructure at scale. I design autonomous agent frameworks, optimize inference pipelines, and build the systems that make large language models actually work in production.

Currently architecting LLM agent systems with the Model Context Protocol (MCP), from code-generation compilers to root-cause analysis agents powered by the Claude API. Experienced in shipping high-throughput distributed systems with ML-driven workloads.

When I'm not at work, I'm benchmarking inference engines, profiling KV cache bottlenecks, and quantizing models on Apple Metal.

What I'm Building

LLM Inference Benchmark

Benchmarking framework that measures inference throughput (tokens/sec), per-token latency, and memory footprint across model sizes. Covers 4-bit and 8-bit integer quantization on the Apple Metal GPU, plus thread-scaling analysis that contrasts sub-linear parallel scaling for decode with near-linear scaling for prefill.

Python, llama.cpp, Metal, GGUF
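As a rough illustration of the throughput and latency metrics described above, here is a minimal sketch that times any iterable of generated tokens. It is not the benchmark's actual code: `BenchResult`, `benchmark_stream`, and the injectable `clock` parameter are hypothetical names, and time to first token is used only as a crude proxy for prefill cost. Memory footprint is not measured here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class BenchResult:
    n_tokens: int
    time_to_first_token_s: float   # rough proxy for prefill cost
    mean_decode_latency_s: float   # mean gap between consecutive tokens
    decode_tokens_per_s: float     # decode throughput


def benchmark_stream(
    token_stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> BenchResult:
    """Time a token stream (hypothetical helper, not tied to any engine)."""
    start = clock()
    stamps: List[float] = []
    for _ in token_stream:
        # Record the arrival time of every token.
        stamps.append(clock())
    if not stamps:
        raise ValueError("stream produced no tokens")

    ttft = stamps[0] - start
    # Gaps between consecutive tokens cover the decode phase only.
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    decode_time = sum(gaps)
    mean_latency = decode_time / len(gaps) if gaps else 0.0
    tokens_per_s = len(gaps) / decode_time if decode_time > 0 else 0.0
    return BenchResult(len(stamps), ttft, mean_latency, tokens_per_s)


if __name__ == "__main__":
    def fake_stream(n: int = 20, delay_s: float = 0.01):
        # Stand-in for a real engine's streaming generator (assumption).
        for i in range(n):
            time.sleep(delay_s)
            yield f"tok{i}"

    print(benchmark_stream(fake_stream()))
```

Separating time to first token from the inter-token gaps keeps the prefill and decode phases apart, which is what makes their different scaling behavior visible.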

Next-Token Prediction Engine

Frequency-based language model implementing core next-token prediction from scratch: n-gram statistics, vocabulary search, and probability ranking. A beam-search-inspired ranking algorithm surfaces high-probability completions in O(log n) time.

Python, NLP, Probability, Beam Search
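To make the frequency-based idea concrete, here is a minimal sketch of next-token prediction from bigram counts. It assumes whitespace tokenization and an n-gram order of 2, and `BigramModel` is a hypothetical name. It ranks candidates with a plain `Counter.most_common` call and does not attempt to reproduce the project's O(log n) ranking algorithm.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class BigramModel:
    """Toy frequency-based next-token predictor (illustrative only)."""

    def __init__(self) -> None:
        # counts[prev][nxt] = number of times `nxt` followed `prev`
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, text: str) -> None:
        # Assumption: lowercase, whitespace-separated tokens.
        tokens = text.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, prev: str, k: int = 3) -> List[Tuple[str, float]]:
        """Return up to k (token, probability) pairs, most likely first."""
        followers = self.counts.get(prev.lower())
        if not followers:
            return []  # unseen context: no prediction
        total = sum(followers.values())
        # Relative frequency serves as the probability estimate.
        return [(tok, n / total) for tok, n in followers.most_common(k)]


if __name__ == "__main__":
    model = BigramModel()
    model.train("the cat sat on the mat the cat ran")
    print(model.predict("the"))
```

In this toy corpus, "cat" follows "the" twice and "mat" once, so "cat" is ranked first with an estimated probability of 2/3.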
