-
Why I’d Choose a Modular Monolith Over Microservices
This Is an Opinion. This is not a tutorial, just a perspective shaped by building and operating production systems. I’ve worked on systems that needed to process data reliably,…
-
Benchmarking Mistral-7B on RTX 4090: vLLM Concurrency, Latency, and Capacity Planning
Most LLM benchmarks highlight peak tokens per second. That metric is incomplete. For production systems, especially agentic workloads, what actually matters is: • Where is the latency–throughput knee? •…