Mushfiq Rahman

Category: AI Infrastructure

AI Infrastructure, LLM Systems

How I Built llm0: An LLM Gateway Architecture Walk-Through

Written by Mushfiq Rahman on May 6, 2026

From a weekend Python prototype to a 3 ms p50 Go gateway with Redis Lua scripts, pgvector semantic caching, and cross-provider failover.

Why I built it

A team I follow…

AI Infrastructure, LLM Systems

Benchmarking Mistral-7B on RTX 4090: vLLM Concurrency, Latency, and Capacity Planning

Written by Mushfiq Rahman on March 5, 2026

Most LLM benchmarks highlight peak tokens per second. That metric is incomplete. For production systems, especially agentic workloads, what actually matters is:

• Where is the latency–throughput knee?
• …