Hi, I’m Mushfiq Rahman.

I’m a Senior Software Engineer and Tech Lead in healthcare. My interests are in LLM systems, agents, inference infrastructure, and distributed systems. I enjoy studying latency/throughput tradeoffs and building systems focused on production reliability.

On this blog I write about LLM inference systems, agents, and lessons from real-world software engineering. Many posts come from hands-on experiments such as benchmarking vLLM, understanding KV cache behavior, identifying the latency–throughput “knee” of the curve, and exploring how modern AI systems behave under load.

Based in Atlanta • Writing about AI infrastructure, LLM inference, agents, and distributed systems.