{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Anass Ez-zouaine — Senior Backend Engineer · Software Architect · AI Engineer",
  "home_page_url": "https://ansezz.com/",
  "feed_url": "https://ansezz.com/feed.json",
  "description": "Senior Lead Backend Engineer, Software Architect, and AI Engineer. 12+ years building production Laravel SaaS, Shopify Plus apps, and AI features (Claude, MCP, RAG, agentic systems). Remote-first since 2014.",
  "language": "en",
  "authors": [
    {
      "name": "Anass Ez-zouaine",
      "url": "https://ansezz.com"
    }
  ],
  "items": [
    {
      "id": "https://ansezz.com/blog/coolify-self-hosted-saas/",
      "url": "https://ansezz.com/blog/coolify-self-hosted-saas/",
      "title": "The Coolify revolution: why I'm ditching expensive cloud providers for self-hosted SaaS",
      "summary": "The 'cloud tax' kills SaaS margins before product-market fit. How Coolify — an open-source, self-hostable Heroku — plus ARM instances on Hetzner or Oracle's free tier cuts a $500/month AWS bill down to single digits, with zero vendor lock-in and a git-push deploy experience.",
      "content_html": "<p>Most SaaS founders are quietly getting robbed by their own cloud provider.</p>\n<p>I have spent over a decade building and scaling web applications, and if there is one thing I have learned, it is that the \"cloud tax\" is the most effective way to kill your margins before you even find product-market fit. We have been conditioned to believe that unless our small CRUD app is running on a multi-region, auto-scaling AWS EKS cluster, we are doing it wrong.</p>\n<p>That is a lie designed to keep you paying for complexity you do not need.</p>\n<h2>The architecture of a trap</h2>\n<p>It starts innocently enough. You sign up for AWS or GCP because they give you $1,000 in credits. You spin up an RDS instance for your database, a few S3 buckets for storage, and maybe a managed Kubernetes service because it feels \"professional.\"</p>\n<p>Then the credits run out.</p>\n<p>Suddenly, you are paying $200 a month for a database that is 99% idle. You are paying for NAT gateways, provisioned IOPS, and \"management fees\" for services that could easily run on a $5 VPS. You are stuck in a web of proprietary APIs and IAM roles that require a full-time DevOps engineer just to update an environment variable.</p>\n<p>This is the agitation: managed services feel like a superpower at the start, but they become a golden cage as you scale. The complexity overhead alone is enough to slow your development velocity to a crawl. 
When I look at how <a href=\"https://ansezz.com/blog/ai-vs-traditional-development/\">AI is changing traditional development</a>, it becomes clear that we need to move faster, not get bogged down in infrastructure molasses.</p>\n<h2>Enter Coolify: Heroku's open-source soulmate</h2>\n<p><img src=\"https://ansezz.com/blog/coolify-self-hosted-saas/features.webp\" alt=\"Coolify dashboard showing its core self-hosting features\" /></p>\n<p>The solution I have moved my entire stack to is Coolify.</p>\n<p>Coolify is an open-source, self-hostable alternative to Vercel, Heroku, and Railway. It gives you that same \"git push to deploy\" experience we all love, but it runs on your own hardware. Whether you have a $4 VPS on Hetzner or a massive ARM-based instance on Oracle Cloud, Coolify turns it into a private PaaS.</p>\n<p>I recently wrote about how <a href=\"https://ansezz.com/blog/coolify-docker-saas-hosting/\">Coolify and Docker are changing SaaS hosting</a>, but the shift is deeper than just a tool change. It is a mindset shift toward technical sovereignty.</p>\n<p>Here is what makes Coolify a game-changer for a senior engineer:</p>\n<ul>\n<li><strong>Zero vendor lock-in</strong> — your configurations are stored on your server. If Coolify disappeared tomorrow, your Docker containers would keep running.</li>\n<li><strong>Automatic SSL</strong> — it handles Let's Encrypt out of the box. No more messing with Nginx configs or certbot.</li>\n<li><strong>Database management</strong> — you can spin up Postgres, MySQL, Redis, or MongoDB in one click. 
They run as containers on your server, meaning you pay $0 in additional managed service fees.</li>\n<li><strong>Pull request deployments</strong> — it creates temporary environments for every PR, just like Vercel, but without the \"team seat\" tax.</li>\n</ul>\n<h2>The magic of ARM (Graviton and OCI)</h2>\n<p>If you want to see a 90% reduction in your infrastructure bill, Coolify is only half of the story: you also need to stop defaulting to x86 and start using ARM.</p>\n<p>AWS Graviton instances typically deliver 20-40% better price-performance than their Intel-based counterparts. But the real \"cheat code\" right now is Oracle Cloud Infrastructure (OCI). Their \"Always Free\" tier gives you 4 ARM Ampere A1 cores and 24 GB of RAM at no cost.</p>\n<p>I can run an entire production SaaS — frontend, backend, database, and Redis — on that single free instance using Coolify.</p>\n<p>When you pair ARM efficiency with a self-hosted orchestrator, the math changes. A startup that was paying $500/month on AWS can often move that entire workload to a $40/month ARM instance on Hetzner or OCI. That extra $460 goes back into your pocket or your marketing budget.</p>\n<p><img src=\"https://ansezz.com/blog/coolify-self-hosted-saas/architecture.webp\" alt=\"Architecture of a self-hosted SaaS running on a single ARM instance\" /></p>\n<h2>Docker and Nix: the engine room</h2>\n<p>Coolify relies heavily on Docker, which is the industry standard for a reason. It ensures that what works on my machine works on the server. But as I move deeper into the \"vibe coding\" era, I'm also looking at how technologies like Nix can further stabilize our environments.</p>\n<p>By using Nix flakes to define our development environment and Docker to package the runtime, we get a far more predictable deployment pipeline. When I use tools like the <a href=\"https://ansezz.com/blog/mcp-context-aware-agents/\">Model Context Protocol (MCP)</a>, I want my AI agents to have a clear, reproducible environment to work within. 
Self-hosting doesn't mean \"unprofessional\" — it means having total control over the stack.</p>\n<h2>Comparison: the hidden cost of \"easy\"</h2>\n<p>Let's look at the numbers for a standard Laravel or Node.js app with a database and a background worker.</p>\n<p><strong>The managed path (Vercel + Supabase + AWS S3):</strong></p>\n<ul>\n<li>Vercel Pro: $20/month per user</li>\n<li>Supabase Pro: $25/month</li>\n<li>AWS S3 + bandwidth: $15/month</li>\n<li><strong>Total: $60+/month (and rising with every user/teammate)</strong></li>\n</ul>\n<p><strong>The Coolify path (Hetzner VPS):</strong></p>\n<ul>\n<li>4 vCPU ARM / 8 GB RAM: $6/month</li>\n<li>Backups to S3-compatible storage: $1/month</li>\n<li>Coolify: $0 (open source)</li>\n<li><strong>Total: $7/month</strong></li>\n</ul>\n<p>At $60+ versus $7, the \"managed\" path is more than 8x as expensive before you even have your first 100 users. For a senior engineer, the 30 minutes it takes to set up Coolify on a fresh Linux box is worth the thousands of dollars saved over the life of the project.</p>\n<h2>Practical steps to join the revolution</h2>\n<p><img src=\"https://ansezz.com/blog/coolify-self-hosted-saas/workspace.webp\" alt=\"A developer workspace set up for self-hosted deployment\" /></p>\n<p>If you are tired of the cloud tax, here is my recommended path to freedom:</p>\n<ol>\n<li><strong>Grab a VPS</strong> — I recommend Hetzner for raw price-performance or OCI for their insane free tier. Pick an ARM-based instance (Ubuntu 24.04).</li>\n<li><strong>Install Coolify</strong> — run the one-line install command from their documentation. It takes about 5 minutes.</li>\n<li><strong>Connect your Git</strong> — link your GitHub or GitLab account.</li>\n<li><strong>Dockerize your app</strong> — if you are using Laravel, it is as simple as adding a <code>Dockerfile</code>. 
For Vite or Next.js, Coolify has built-in builders that don't even require a <code>Dockerfile</code>.</li>\n<li><strong>Move your DB</strong> — export your managed DB and import it into a Coolify-managed container. Set up S3 backups immediately.</li>\n</ol>\n<h2>The bottom line</h2>\n<p>We are entering a cycle where efficiency is the only thing that matters. The days of \"VC-subsidized\" infrastructure are over. Whether you are building a small tool or a massive enterprise SaaS, you owe it to your bottom line to look at self-hosting.</p>\n<p>Coolify has matured to the point where the developer experience is indistinguishable from the big players. The only difference is who owns the keys to the castle.</p>\n<p>I am curious: what is the most \"expensive\" mistake you have ever made on a cloud bill — a forgotten NAT gateway or a runaway Lambda function? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a>. 🤘</p>\n",
      "date_published": "2026-05-30T00:00:00.000Z",
      "date_modified": "2026-05-30T00:00:00.000Z",
      "tags": [
        "devops",
        "coolify",
        "self-hosting",
        "devops",
        "docker",
        "arm",
        "hetzner",
        "saas",
        "cost-optimization"
      ],
      "image": "https://ansezz.com/blog/coolify-self-hosted-saas/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/laravel-octane-high-traffic/",
      "url": "https://ansezz.com/blog/laravel-octane-high-traffic/",
      "title": "Laravel Octane: making PHP scream on high-traffic apps",
      "summary": "Standard PHP rebuilds the whole kitchen for every request. Laravel Octane boots once, stays warm in memory, and feeds requests through a worker pool — taking 50ms responses down to single digits. Swoole vs RoadRunner, worker-pool tuning, and surviving persistent state without memory leaks.",
      "content_html": "<p>Standard PHP is a bit like a restaurant that fires its entire staff and rebuilds the kitchen from scratch for every single customer order. You walk in, they hire a chef, buy a stove, cook your meal, and then demolish the building as soon as you leave. It's consistent and safe, but it's a massive waste of energy when you're trying to serve thousands of people at once.</p>\n<p>This is the PHP-FPM lifecycle. Every request boots the entire Laravel framework, loads your service providers, parses your config, and instantiates your objects. For low-traffic sites, it's fine. But when you hit real scale, those milliseconds of \"boot time\" become a wall you can't climb without throwing excessive amounts of expensive hardware at the problem. Horizontal scaling buys you room, but it doesn't fix the underlying latency.</p>\n<p>The solution is to stop rebuilding the kitchen. Laravel Octane changes the game by booting your application once, keeping it in memory, and then feeding requests to it through a high-performance worker pool. It transforms PHP from a \"short-lived script\" language into a \"long-lived process\" powerhouse.</p>\n<h2>Why your app feels heavy</h2>\n<p><img src=\"https://ansezz.com/blog/laravel-octane-high-traffic/fpm-vs-octane.webp\" alt=\"PHP-FPM rebuilding the framework each request versus Octane staying warm\" /></p>\n<p>The overhead of traditional PHP isn't just about speed; it's about efficiency. In a standard request, your CPU spends a significant chunk of time just getting the application ready to do work. Once it finally starts execution, it does the database query, renders the view, and then dies.</p>\n<p>If you're running a complex Laravel monolith with dozens of packages and custom service providers, your boot time might be 30ms to 50ms before a single line of your actual business logic even runs. Under high traffic, this leads to CPU thrashing. 
You're paying for the \"setup\" over and over again.</p>\n<p>Laravel Octane removes this boot cycle. It runs your app on a high-performance application server such as Swoole or RoadRunner, so the app stays resident in memory. Each worker boots the framework once, and every request after that hits a \"warm\" application. For a boot-heavy app, that can mean going from 50ms responses to sub-10ms responses just by changing how the process is managed.</p>\n<h2>Swoole vs RoadRunner: choosing your engine</h2>\n<p><img src=\"https://ansezz.com/blog/laravel-octane-high-traffic/swoole-vs-roadrunner.webp\" alt=\"Swoole and RoadRunner compared as Octane application servers\" /></p>\n<p>When you drop Octane into your project, you have to choose between two primary engines: Swoole and RoadRunner. I've used both in production, and while they both solve the same problem, they do it differently.</p>\n<h3>Swoole</h3>\n<p>Swoole is a C++ extension for PHP. It's essentially a high-performance networking engine that allows PHP to handle asynchronous tasks, coroutines, and long-lived connections.</p>\n<ul>\n<li><strong>Pros</strong> — it is incredibly fast. Because it lives as an extension, it has deep access to PHP's internals. It also gives you access to the \"Octane cache,\" an in-memory store that's significantly faster than Redis for local data.</li>\n<li><strong>Cons</strong> — it can be a bit of a nightmare to install and debug. Because it's a binary extension, you have to compile it or find the right package for your OS. Xdebug doesn't always play nice with it, and it can be picky about your environment.</li>\n</ul>\n<h3>RoadRunner</h3>\n<p>RoadRunner is written in Go. It acts as a load balancer and process manager that communicates with your PHP workers via a high-speed binary protocol (Goridge).</p>\n<ul>\n<li><strong>Pros</strong> — no extensions required. It's a single binary you drop into your project. 
It's much easier to set up in a <a href=\"https://ansezz.com/blog/coolify-docker-saas-hosting/\">Docker container</a> and generally feels more \"cloud-native.\"</li>\n<li><strong>Cons</strong> — it's slightly slower than Swoole because of the communication overhead between the Go binary and the PHP processes, though for most apps this difference is negligible.</li>\n</ul>\n<p>For most developers starting out with Octane, I recommend RoadRunner for the ease of use. If you are chasing every last millisecond and need features like task workers and the Octane cache, go with Swoole.</p>\n<h2>Tuning your worker pool</h2>\n<p><img src=\"https://ansezz.com/blog/laravel-octane-high-traffic/worker-pool.webp\" alt=\"A pool of warm Octane workers handling incoming requests\" /></p>\n<p>The secret sauce of Octane is the \"worker pool.\" Instead of booting the framework for every request, you keep a fixed number of warm workers waiting to handle incoming traffic. If you misconfigure this, you'll either leave performance on the table or crash your server.</p>\n<p>How you size your worker pool depends on whether your app is CPU-bound or I/O-bound.</p>\n<ol>\n<li><strong>CPU-bound apps</strong> — if you're doing heavy data processing or image manipulation, set your worker count to the number of CPU cores you have. Adding more workers will just cause context switching and slow things down.</li>\n<li><strong>I/O-bound apps</strong> — typical web apps spend most of their time waiting for a database, Redis, or an external API. In this case, you can scale your workers to 2x or even 4x your core count. This allows one worker to wait for the DB while another handles a new request.</li>\n</ol>\n<pre><code># 16 workers for an I/O-heavy app (e.g. 4x the cores of a 4-core server)\nphp artisan octane:start --server=swoole --workers=16\n</code></pre>\n<p>Don't forget about <strong>task workers</strong> if you're using Swoole. 
These are separate from your HTTP workers and run the tasks you hand to <code>Octane::concurrently()</code>, so a single request can fan out several slow operations in parallel. For fire-and-forget work like sending emails or processing webhooks, a regular Laravel queue is still the better fit.</p>\n<h2>The danger of persistent state</h2>\n<p><img src=\"https://ansezz.com/blog/laravel-octane-high-traffic/memory-leaks.webp\" alt=\"Hunting down a memory leak in a long-lived Octane worker\" /></p>\n<p>The biggest hurdle when moving to Octane is the shift in mindset. In traditional PHP, \"leaky\" code doesn't matter much because everything a request creates is thrown away when it finishes. In Octane, a memory leak is a ticking time bomb.</p>\n<p>If you have a static array in a service provider that you append to on every request, that array will grow until your server runs out of RAM. You have to be extremely careful with:</p>\n<ul>\n<li><strong>Static properties</strong> — avoid using them for request-specific data.</li>\n<li><strong>Singletons</strong> — if you register a singleton in your app container, it stays alive. If that singleton caches data, you need to make sure that data is cleared or managed properly between requests.</li>\n<li><strong>Global state</strong> — avoid <code>global</code> variables at all costs (which you should be doing anyway).</li>\n</ul>\n<p>Octane helps you by \"resetting\" some core framework services between requests, but it can't catch everything. Whether you're moving an older monolith onto Octane or splitting it up on the path from <a href=\"https://ansezz.com/blog/monolith-to-microservices/\">monolith to microservices</a>, audit your service providers for any long-lived state.</p>\n<h3>Protecting yourself with max-requests</h3>\n<p>The best insurance policy against memory leaks is the <code>--max-requests</code> flag. 
This tells Octane to kill and restart a worker after it has handled a certain number of requests.</p>\n<pre><code># Restart workers every 1000 requests to prevent memory bloat\nphp artisan octane:start --max-requests=1000\n</code></pre>\n<p>This keeps your memory usage predictable while still giving you the performance benefits of a warm application.</p>\n<h2>Real-world optimization tips</h2>\n<p>Once you have Octane running, there are a few technical levers you can pull to squeeze out even more performance.</p>\n<ol>\n<li><strong>Persistent connections</strong> — in Octane, your workers stay alive, so each one can hold onto its database connection instead of reconnecting on every request. The flip side is that every worker keeps a connection open, so check your <code>config/database.php</code> settings and make sure your total worker count across all app servers stays under the max connection limit on your DB server.</li>\n<li><strong>Octane cache</strong> — if you're using Swoole, use the <code>octane</code> cache store via <code>Cache::store('octane')</code>. It's an in-memory store shared by every worker on the server and it's blistering fast, but it's flushed whenever the server restarts. Use it for frequently accessed configuration or small datasets that don't change often.</li>\n<li><strong>Bytecode caching</strong> — make sure OPcache is enabled and tuned. Set <code>opcache.validate_timestamps=0</code> in production since your code won't be changing while the server is running.</li>\n<li><strong>Graceful reloads</strong> — when you deploy new code, use <code>php artisan octane:reload</code>. This will gracefully restart the workers without dropping current connections. It's essential for zero-downtime deployments.</li>\n</ol>\n<h2>Takeaways for the high-traffic dev</h2>\n<p>Transitioning to Octane isn't just about installing a package; it's about maturing your infrastructure.</p>\n<ul>\n<li><strong>Identify the bottleneck</strong> — only use Octane if your boot time is the problem. 
If your database queries take 2 seconds, Octane won't help you.</li>\n<li><strong>Test for leaks</strong> — watch how your app behaves under sustained load and profile its memory growth over time.</li>\n<li><strong>Monitor workers</strong> — keep an eye on your CPU and RAM usage to find the \"sweet spot\" for your worker count.</li>\n<li><strong>Leverage concurrency</strong> — on Swoole, use <code>Octane::concurrently()</code> to execute multiple tasks at once and return their results, cutting down total response time for complex pages.</li>\n</ul>\n<p>Octane takes PHP to a level where it can compete with Node.js and Go for high-concurrency applications while keeping the developer experience of Laravel intact. If you're building a SaaS that expects heavy traffic, this is your jet engine.</p>\n<p>Are you running Octane in production yet, or is the fear of memory leaks keeping you on PHP-FPM? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — let's talk worker counts. 🤘</p>\n",
      "date_published": "2026-05-29T00:00:00.000Z",
      "date_modified": "2026-05-29T00:00:00.000Z",
      "tags": [
        "laravel",
        "laravel",
        "octane",
        "php",
        "performance",
        "swoole",
        "roadrunner",
        "scaling",
        "high-traffic"
      ],
      "image": "https://ansezz.com/blog/laravel-octane-high-traffic/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/shopify-ucp-quick-start/",
      "url": "https://ansezz.com/blog/shopify-ucp-quick-start/",
      "title": "Your quick-start guide to Shopify UCP: do this before June 15",
      "summary": "AI agents are the biggest spenders of 2026 — and they can't see your store without a UCP manifest. How the Universal Commerce Protocol works, why the /.well-known/ucp endpoint is the robots.txt of agentic commerce, and a pre-deadline checklist to make your Shopify store a first-class citizen in the AI economy.",
      "content_html": "<p>Your store is effectively invisible to the biggest spenders of 2026. I'm not talking about Gen Z or Alpha. I'm talking about AI agents.</p>\n<p>If Google Gemini or OpenAI's \"Operator\" can't find a machine-readable map of your Shopify store, it won't recommend your products. It definitely won't buy them.</p>\n<p>Traditional SEO was built for humans with eyeballs. The Universal Commerce Protocol (UCP) is built for agents with wallets.</p>\n<p>The rollout is moving fast. If you haven't configured your UCP manifest by June 15, you are opting out of the agentic economy.</p>\n<p>Here is how to fix that.</p>\n<h2>Why your legacy SEO is failing</h2>\n<p>For ten years, we've obsessed over meta tags and JSON-LD schema. That was enough when Google was just a list of links. But we've entered the era of <a href=\"https://ansezz.com/blog/agentic-commerce-shopify/\">agentic commerce on Shopify</a>.</p>\n<p>Agents like Gemini don't \"browse\" your collections. They don't look at your hero banners. They want to know three things:</p>\n<ol>\n<li>What do you sell?</li>\n<li>Can I buy it right now?</li>\n<li>Which API do I call to checkout?</li>\n</ol>\n<p>If your site doesn't answer these questions in a standardized way, the agent moves to a competitor who does. Legacy schema tells an agent what a product <em>is</em>. UCP tells an agent how to <em>transact</em>.</p>\n<p><img src=\"https://ansezz.com/blog/shopify-ucp-quick-start/seo-vs-ucp.webp\" alt=\"Side-by-side comparison of human-centric SEO versus agent-centric UCP\" /></p>\n<h2>What is the Universal Commerce Protocol?</h2>\n<p>UCP is an open standard backed by Google and Shopify. It creates a \"handshake\" between a merchant and an AI agent.</p>\n<p>The core of this handshake is the <code>/.well-known/ucp</code> endpoint.</p>\n<p>It's a simple JSON file that acts as a discovery manifest. When an agent hits your domain, it looks for this file to understand your capabilities. 
It's the <code>robots.txt</code> of the AI era, but for money.</p>\n<p>Here is what a raw UCP manifest looks like:</p>\n<pre><code>{\n  \"ucp\": {\n    \"version\": \"2026-01-11\"\n  },\n  \"role\": \"merchant\",\n  \"capabilities\": {\n    \"checkout\": {\n      \"endpoints\": {\n        \"create_session\": \"https://api.yourstore.com/ucp/checkout/sessions\"\n      }\n    },\n    \"payment\": {\n      \"handlers\": [\"shop_pay\", \"google_pay\"]\n    }\n  }\n}\n</code></pre>\n<p>This manifest tells the agent exactly where to open a checkout session and which payment handlers you accept. No scraping required. No \"guessing\" where the add-to-cart button is.</p>\n<h2>How to enable UCP on Shopify</h2>\n<p>If you are running a standard Shopify or Shopify Plus store, the good news is you don't have to write this JSON manually. Shopify is baking this directly into the core.</p>\n<p>But it isn't always on by default.</p>\n<p>To ensure you're ready for the June 15 deadline, go to your Shopify admin. Navigate to <strong>Settings &gt; Apps and sales channels</strong>. Look for <strong>Agentic storefronts</strong>.</p>\n<p>Once you toggle this on, Shopify automatically generates and hosts the manifest at <code>your-store.com/.well-known/ucp</code>.</p>\n<p>If you are running a headless setup with Hydrogen or a custom frontend, you'll need to ensure your middleware correctly proxies this path back to Shopify's servers. We see a lot of developers break their agentic discovery because their Vercel or Netlify rewrites aren't handling the <code>.well-known</code> directory.</p>\n<p><img src=\"https://ansezz.com/blog/shopify-ucp-quick-start/architecture.webp\" alt=\"Architecture diagram of an agent reaching the UCP manifest through middleware\" /></p>\n<h2>The June 15 deadline: why now?</h2>\n<p>Why is everyone panicking about mid-June?</p>\n<p>Google is moving \"AI Mode\" out of beta for a massive segment of search users. 
When Gemini becomes the default shopping assistant, it will prioritize stores that support the UCP handshake.</p>\n<p>This is also when Shopify begins its wider rollout of <a href=\"https://ansezz.com/blog/mcp-context-aware-agents/\">MCP context-aware agents</a>. These agents use the Model Context Protocol to link your store's live data directly into the LLM's reasoning loop.</p>\n<p>If your manifest is missing, your store is a black box. The agent will see your products in the search index, but it won't be able to fulfill the request \"buy me the best hiking boots from a local store.\" It will choose the store where it can verify the checkout capability instantly.</p>\n<h2>Auditing your implementation</h2>\n<p>Don't just flip a switch and hope. You need to verify that your manifest is actually reachable by external bots.</p>\n<ol>\n<li>Open a private browser window.</li>\n<li>Go to <code>https://yourdomain.com/.well-known/ucp</code>.</li>\n<li>You should see a raw JSON object.</li>\n<li>If you see a 404 or a redirect to your homepage, your theme or app is blocking the path.</li>\n</ol>\n<p>Check your <code>robots.txt</code> file as well. Ensure you aren't accidentally disallowing agents from crawling the <code>.well-known</code> directory.</p>\n<p>If you are using a custom Laravel backend to power your Shopify store's logic, make sure your routes file includes a specific entry for this endpoint. At ansezz, we often see custom middleware stripping out these hidden directories for \"security\" reasons. 
In 2026, that security is just costing you sales.</p>\n<p><img src=\"https://ansezz.com/blog/shopify-ucp-quick-start/admin-toggle.webp\" alt=\"Shopify admin showing the Agentic storefronts toggle enabled\" /></p>\n<h2>Technical takeaways</h2>\n<p>Here is your pre-June 15 checklist:</p>\n<ul>\n<li>Enable <strong>Agentic storefronts</strong> in your Shopify admin.</li>\n<li>Verify the <code>/.well-known/ucp</code> path returns valid JSON.</li>\n<li>Ensure your payment handlers (Shop Pay, Google Pay) are correctly listed in the manifest.</li>\n<li>Test your site with an agentic browser or a UCP validator tool.</li>\n<li>Check your <a href=\"https://ansezz.com/blog/coolify-docker-saas-hosting/\">Coolify Docker SaaS hosting</a> configs if you're hosting custom middleware, to ensure the path isn't being dropped by your reverse proxy.</li>\n</ul>\n<p>The transition from a human-centric web to agent-centric commerce is happening whether we're ready or not. UCP is the first real step toward making your store a first-class citizen in the AI economy.</p>\n<p>Are you letting agents shop your store, or are you still building for a world that clicks? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — let's get your manifest live before the deadline. 🤘</p>\n",
      "date_published": "2026-05-28T00:00:00.000Z",
      "date_modified": "2026-05-28T00:00:00.000Z",
      "tags": [
        "shopify",
        "shopify",
        "ucp",
        "agentic-commerce",
        "ai",
        "seo",
        "hydrogen",
        "shopify-plus",
        "agents"
      ],
      "image": "https://ansezz.com/blog/shopify-ucp-quick-start/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/claude-mcp-dev-tools/",
      "url": "https://ansezz.com/blog/claude-mcp-dev-tools/",
      "title": "Claude MCP: why I'm connecting my dev tools to LLMs for real-time context",
      "summary": "The Model Context Protocol is the USB port for LLMs. One MCP server, any MCP-aware host — killing the M×N integration tax. How the host/client/server architecture works, the servers I run daily, and why the sandbox model makes it safe enough to plug into production.",
      "content_html": "<p>Every time I have to build a custom integration for a new tool, a little piece of my developer soul dies. It is a maintenance nightmare that never ends. We have reached a point where building the actual product is often faster than setting up the pipes to make it work with our data.</p>\n<p>If you have spent any time building <a href=\"https://ansezz.com/blog/agentic-workflows-vibe-coding/\">agentic workflows and vibe coding</a>, you know exactly what I am talking about. You have an LLM like Claude that is incredibly smart but essentially locked in a room with no windows. To give it context, you have to manually copy-paste code, export CSV files, or spend three days writing a brittle wrapper for a third-party API just so your assistant can \"see\" your work.</p>\n<p>This fragmentation is the biggest bottleneck in modern software development. We have powerful models, but they are isolated from our local files, our databases, and our production logs. It is like having a world-class architect who isn't allowed to visit the construction site. They are just guessing based on the photos you decide to send them.</p>\n<p>Enter the Model Context Protocol — or as I have been calling it: the USB port for LLMs.</p>\n<h2>The fragmentation tax is killing your productivity</h2>\n<p>The problem is simple but massive. Every AI application — whether it is Claude Desktop, a custom agent, or an IDE extension — wants to talk to your data. On the other side, every data source — your GitHub repos, your Postgres databases, your Slack channels — has its own specific API and authentication flow.</p>\n<p>Without a standard, we are stuck in an M×N problem. If you have 5 AI apps and 10 data sources, you need 50 different integrations. This is why most \"AI-powered\" tools feel shallow. They only support a few basic integrations, and if you want to use your internal company data, you are back to writing custom glue code.</p>\n<p>This agitation is real. 
We are wasting hours building the same connectors over and over again. We are worried about security because every new integration is another potential leak. And we are frustrated because the \"magic\" of AI disappears the moment we hit a data silo.</p>\n<h2>The solution: MCP as a universal standard</h2>\n<p><img src=\"https://ansezz.com/blog/claude-mcp-dev-tools/architecture.webp\" alt=\"MCP sitting between data sources and AI hosts as a universal translator\" /></p>\n<p>The Model Context Protocol (MCP), open-sourced by Anthropic, is the first serious attempt to standardize how AI applications discover and interact with data and tools. Instead of building a specific connector for every model and every tool, you build an MCP server. Each AI app implements the client side once and each data source ships one server, so the M×N problem becomes M+N: 5 apps and 10 data sources need 15 implementations instead of 50.</p>\n<p>The server acts as a translator. It sits between your data and the AI, exposing a consistent interface that any MCP-compliant host (like Claude Desktop) can understand. It is exactly like the USB standard. It doesn't matter if you are plugging in a mouse, a keyboard, or an external drive. The protocol is the same, so it just works.</p>\n<p>This shifts the entire paradigm of <a href=\"https://ansezz.com/blog/mcp-context-aware-agents/\">context-aware agents</a>. Instead of hard-coding logic into the agent, you simply \"plug in\" the servers you need.</p>\n<h3>How the architecture actually works</h3>\n<p>There are three main players in the MCP ecosystem:</p>\n<ol>\n<li><strong>The host</strong> — this is the environment the user interacts with. It could be Claude Desktop, a terminal, or an IDE like Cursor. The host is responsible for managing the lifecycle of the connection.</li>\n<li><strong>The client</strong> — this is the part of the host that speaks the protocol. 
It does the \"handshake\" with the server to find out what it can do.</li>\n<li><strong>The server</strong> — this is a lightweight program that provides context (resources), actions (tools), and prompt templates.</li>\n</ol>\n<p>For example, if I want Claude to have access to my local project files, I run a local MCP server that exposes those files as \"resources.\" The host (Claude Desktop) asks the server: \"what do you have?\" The server replies: \"I have these 10 files and a tool to run grep searches.\"</p>\n<p>The model can then decide to call the \"grep\" tool whenever it needs to find a specific function definition. I didn't have to write a single line of logic inside Claude to make that happen. I just connected the server.</p>\n<h2>Modularity and the MCP server ecosystem</h2>\n<p><img src=\"https://ansezz.com/blog/claude-mcp-dev-tools/servers.webp\" alt=\"A grid of MCP servers — Postgres, GitHub, Google Drive — ready to plug in\" /></p>\n<p>The beauty of this modularity is that once a server is built, anyone can use it. The community has already started building servers for everything you can imagine. I have been using a few in my daily workflow that have completely changed how I code:</p>\n<ul>\n<li><strong>Postgres MCP</strong> — I can point Claude at a local or remote database. It can inspect schemas and even run queries to help me debug data issues without me leaving the chat.</li>\n<li><strong>GitHub MCP</strong> — this allows the model to search through my repositories, list issues, and even create pull requests. It is like having a junior dev who actually knows where the code is.</li>\n<li><strong>Google Drive MCP</strong> — perfect for when I need to cross-reference technical documentation stored in docs with the actual implementation in my IDE.</li>\n</ul>\n<p>This also solves a massive pain point in <a href=\"https://ansezz.com/blog/agentic-commerce-shopify/\">agentic commerce for Shopify</a>. 
Imagine an agent that can talk directly to your Shopify store via MCP to check inventory levels or update product descriptions in real time, all while maintaining a secure, standardized connection.</p>\n<h2>Security first: the sandbox model</h2>\n<p><img src=\"https://ansezz.com/blog/claude-mcp-dev-tools/security.webp\" alt=\"Sandboxed MCP server communicating through a narrow pipe to the host\" /></p>\n<p>The biggest question I get when I talk about connecting dev tools to an LLM is: \"is it safe?\"</p>\n<p>Security is designed into MCP, but it is not automatic. Each server runs as a separate process and only exposes the resources and tools it was built to expose. The filesystem server, for example, only serves the directories you pass it. A local server still runs with your user's permissions, though, so treat installing one like installing any other dependency.</p>\n<p>For local servers, the protocol typically uses <code>stdio</code> (stdin/stdout). The host talks to the server through that very narrow pipe, so the server doesn't need open network ports listening for connections. It only exists as long as the host is running it.</p>\n<p>For remote servers, MCP's authorization spec builds on OAuth 2.1. This allows for fine-grained permissions. Depending on the scopes a server supports, you can authorize a GitHub MCP server to only read public repositories, or a database server to only access specific tables.</p>\n<p>This is a huge improvement over the \"give me your master API key\" approach that we have seen in the past. We can now treat AI tools with the same \"least privilege\" mindset we use for any other service in our stack. This is especially important when you are trying to avoid <a href=\"https://ansezz.com/blog/7-rag-mistakes-production/\">RAG mistakes in production</a>, where data leakage is a top-tier risk.</p>\n<h2>Why I am betting on MCP</h2>\n<p>I have been a developer for over a decade, and I have seen plenty of \"standards\" come and go. What makes MCP different is its simplicity and its backers. Anthropic has made this open source because they realize that the more context a model has, the more valuable it becomes.</p>\n<p>We are moving toward a world of \"agentic\" software development. 
In this world, we don't just use AI to write snippets of code. We use AI as an orchestrator that can reach into our cloud infrastructure on GCP, check our Docker logs, and suggest fixes for a failing Laravel app.</p>\n<p>Without a protocol like MCP, that vision is impossible to scale. It would be too expensive and too risky to build. But with MCP, we are building a world where tools are plug-and-play.</p>\n<h3>Practical takeaways for senior engineers</h3>\n<p>If you are ready to start experimenting with this, here is what I recommend:</p>\n<ul>\n<li><strong>Install the Claude Desktop app</strong> — it is currently the most mature host for MCP.</li>\n<li><strong>Try the filesystem server</strong> — this is the easiest way to feel the power. Give Claude access to a specific folder and watch it navigate your codebase.</li>\n<li><strong>Don't build, search first</strong> — check the official MCP servers repository on GitHub. There are already servers for Brave Search, Postgres, Slack, and more.</li>\n<li><strong>Think in tools, not just prompts</strong> — start thinking about what \"tools\" your internal systems could expose. If you have a custom admin panel, could it be an MCP server?</li>\n</ul>\n<p>Connecting your dev tools to an LLM isn't just about speed. It is about reducing the cognitive load of switching between tabs, terminals, and documentation. It allows you to stay in the \"flow\" longer.</p>\n<p>Are you ready to stop copy-pasting your code into a chat box and start connecting your tools directly to the model? What is the one internal tool you wish you could \"plug in\" to Claude right now? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — let's figure it out. 🤘</p>\n",
      "date_published": "2026-05-27T00:00:00.000Z",
      "date_modified": "2026-05-27T00:00:00.000Z",
      "tags": [
        "ai",
        "mcp",
        "claude",
        "anthropic",
        "agentic-ai",
        "tool-use",
        "context",
        "dev-tools",
        "integrations"
      ],
      "image": "https://ansezz.com/blog/claude-mcp-dev-tools/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/redis-semantic-caching-rag/",
      "url": "https://ansezz.com/blog/redis-semantic-caching-rag/",
      "title": "Caching for speed: Redis and semantic layers in RAG",
      "summary": "Stop paying for the same LLM call twice. Two-tier caching — exact-match Redis keys plus semantic vector lookups via RedisVL — that cuts RAG latency from seconds to milliseconds and slashes API spend by up to 80%. With tenant isolation, TTL tiers, and the precision metrics that keep it honest.",
      "content_html": "<p>You finally shipped your RAG pipeline. It works. The retrieval is accurate. The LLM is snappy. But then you look at your cloud bill and your P99 latency. Every single query — even \"what are your shipping times?\" asked for the tenth time — triggers a full chain of embedding, vector search, and an expensive LLM call.</p>\n<p>At scale, this is a disaster. You are essentially paying for the same computation over and over again. Your users are waiting two seconds for answers that should take twenty milliseconds. Your \"denial of wallet\" risk is through the roof.</p>\n<p>The solution isn't a bigger model or a faster vector DB. It's a smarter cache. I'm talking about semantic caching with Redis. It cuts latency from hundreds of milliseconds to single digits and slashes your API costs by up to 80 percent.</p>\n<p>Here is how I build these systems to handle production traffic.</p>\n<h2>The two-tier cache architecture</h2>\n<p>Standard caching relies on exact matches. If a user asks \"How do I reset my password?\" and another asks \"how do i reset my password\", they might hit the same key if you normalize the string. But if the second user asks \"Can you help me change my password?\", a traditional cache fails.</p>\n<p>In a modern RAG stack, I use a two-tier approach.</p>\n<ol>\n<li><strong>Exact cache</strong> — a simple key-value store in Redis. I normalize the query (lowercase, trim, strip punctuation) and hash it. It's your first line of defense. It costs almost nothing and has zero false positives.</li>\n<li><strong>Semantic cache</strong> — if the exact cache misses, I embed the query and look for \"near enough\" matches in a Redis vector index. 
If I find a previous question with a cosine similarity of 0.9 or higher (a cosine distance of 0.1 or less), I serve that cached response instead of hitting the LLM.</li>\n</ol>\n<p>This architecture ensures that you never do the heavy lifting twice for the same intent.</p>\n<p><img src=\"https://ansezz.com/blog/redis-semantic-caching-rag/architecture.webp\" alt=\"Two-tier cache architecture — exact match, semantic match, LLM fallback\" /></p>\n<h2>Why Redis is the king of semantic caching</h2>\n<p>Most developers think of Redis as just a key-value store. But Redis has built-in vector search, and the <a href=\"https://redis.io/blog/how-to-cache-semantic-search/\">Redis Vector Library (RedisVL)</a> wraps it in a friendly Python client. Together, they turn Redis into a high-performance vector database.</p>\n<p>Why use Redis for this instead of your main vector DB like Pinecone or Weaviate?</p>\n<p>Latency.</p>\n<p>Your main vector DB is likely optimized for searching through millions of document chunks. Your semantic cache is much smaller — it only stores recent queries and answers. By co-locating this cache in Redis, which likely already sits in your application tier, you reduce network hops.</p>\n<p>I typically see vector lookups in Redis finish in under 5ms. Compare that to an embedding API call that takes 100ms and an LLM generation that takes 1500ms. A semantic hit still pays for the embedding, but it skips the generation, so roughly 105ms replaces 1600ms or more. The math is simple.</p>\n<h2>Implementing the semantic layer</h2>\n<p>The trick to a good semantic cache is the similarity threshold. Too low, and you give users wrong answers (the \"semantic trap\"). Too high, and you never get a cache hit.</p>\n<p>I usually start with a distance threshold of 0.1 for cosine distance, which translates to a cosine similarity of 0.9. 
You can implement this quickly using RedisVL's <code>SemanticCache</code> extension.</p>\n<pre><code>from redisvl.extensions.llmcache import SemanticCache\n\n# Initialize the cache with a conservative threshold\n# (cosine distance 0.1 = cosine similarity 0.9)\nllm_cache = SemanticCache(\n    name=\"production_rag_cache\",\n    redis_url=\"redis://localhost:6379\",\n    distance_threshold=0.1,\n)\n\n\ndef answer(query):\n    # Check for a semantically similar cached prompt\n    hits = llm_cache.check(prompt=query)\n    if hits:\n        return hits[0][\"response\"]\n\n    # On a miss, run the full RAG pipeline, then store the result.\n    # run_rag_pipeline stands in for your own pipeline.\n    response = run_rag_pipeline(query)\n    llm_cache.store(prompt=query, response=response)\n    return response\n\n\nprint(answer(\"how do i update my billing info?\"))\n</code></pre>\n<p>This simple wrapper handles the embedding of the incoming query, the vector search in Redis, and the logic for returning the most relevant cached response.</p>\n<p><img src=\"https://ansezz.com/blog/redis-semantic-caching-rag/code.webp\" alt=\"Python semantic-cache snippet on a developer workstation\" /></p>\n<h2>Avoid the semantic trap: context and versioning</h2>\n<p>Semantic caching is powerful but dangerous if you aren't careful. If your underlying data changes, your cache might still be serving old, incorrect information.</p>\n<p>I always include a <code>context_version</code> in my cache keys or metadata. If I re-index my product catalog or update my documentation, I bump the version. The cache immediately starts missing for old entries, forcing a refresh with the new data.</p>\n<p>Another trap is tenant isolation. If User A asks \"what is my balance?\", you absolutely cannot serve that cached response to User B. I solve this by partitioning the cache:</p>\n<ul>\n<li><strong>Use namespaces</strong> — <code>cache:tenant_id:query_hash</code></li>\n<li><strong>Include metadata</strong> — add <code>tenant_id</code> to the vector index filters.</li>\n</ul>\n<p>This ensures that semantic matches only happen within the correct security boundary. 
For more on building secure, multi-tenant systems, check out my thoughts on <a href=\"https://ansezz.com/blog/laravel-multi-tenancy/\">Laravel multi-tenancy</a>, which shares similar isolation principles.</p>\n<h2>Managing TTL and staleness</h2>\n<p>In a standard cache, you just set an expiry of 3600 seconds and forget it. With a semantic cache, I prefer a tiered TTL strategy.</p>\n<ul>\n<li><strong>Exact matches</strong> — 1-hour TTL. If the user asks the exact same thing, they probably want the exact same answer.</li>\n<li><strong>Semantic matches</strong> — 4-hour TTL. These are more expensive to generate, so we want to keep them longer, but we also include a \"last validated\" timestamp.</li>\n<li><strong>Proactive invalidation</strong> — if my Shopify store updates a product price, I trigger a Redis worker to purge all cache entries related to that product ID.</li>\n</ul>\n<p>This hybrid approach keeps the system responsive without serving stale data. I've written about similar <a href=\"https://ansezz.com/blog/event-driven-pubsub/\">event-driven patterns here</a> if you want to dive deeper into how to handle these updates at scale.</p>\n<h2>Measuring success: hit rate and precision</h2>\n<p>Don't just turn on the cache and walk away. You need to monitor two specific metrics:</p>\n<ol>\n<li><strong>Cache hit rate</strong> — what percentage of queries are being handled by Redis? I aim for 30–50 percent for general FAQ-style bots.</li>\n<li><strong>Semantic precision</strong> — are the cached answers actually correct?</li>\n</ol>\n<p>I log every semantic hit along with its similarity score. Once a week, I sample the hits closest to the threshold (similarity between 0.90 and 0.92) and manually review them. 
If I see too many \"near misses\" that are actually different questions, I tighten the threshold.</p>\n<p><img src=\"https://ansezz.com/blog/redis-semantic-caching-rag/dashboard.webp\" alt=\"Cache analytics dashboard — hit rate, latency, LLM cost, precision\" /></p>\n<h2>Final takeaways for senior engineers</h2>\n<p>Implementing Redis as a semantic layer isn't just about speed. It's about making your AI systems sustainable. If you are serious about moving from a prototype to a production-ready SaaS, caching is not optional.</p>\n<p>Here is your checklist for next week:</p>\n<ul>\n<li>Install <code>redisvl</code> and set up a basic vector index in your dev environment.</li>\n<li>Implement a two-tier lookup (exact then semantic).</li>\n<li>Set your distance threshold conservatively (start at 0.05 or 0.1).</li>\n<li>Add a <code>tenant_id</code> or <code>context_version</code> to your metadata to avoid cross-talk.</li>\n<li>Monitor your hit rate and watch your API bill drop.</li>\n</ul>\n<p>Building in public means sharing the war stories, not just the successes. For more technical deep dives into modern architecture, I suggest looking at <a href=\"https://ansezz.com/blog/7-rag-mistakes-production/\">7 RAG mistakes in production</a> to see what else might be slowing you down.</p>\n<p>What is the one query in your system that keeps hitting your LLM unnecessarily? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — let's figure out if a semantic cache would have caught it. 🤘</p>\n",
      "date_published": "2026-05-26T00:00:00.000Z",
      "date_modified": "2026-05-26T00:00:00.000Z",
      "tags": [
        "architecture",
        "redis",
        "caching",
        "ai",
        "rag",
        "semantic-cache",
        "redisvl",
        "vector-search",
        "performance",
        "infrastructure"
      ],
      "image": "https://ansezz.com/blog/redis-semantic-caching-rag/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/smart-auto-scaling-ai/",
      "url": "https://ansezz.com/blog/smart-auto-scaling-ai/",
      "title": "Scaling on demand: smart auto-scaling for modern AI apps",
      "summary": "CPU autoscaling is a lie for GPU workloads. Why queue depth, KV-cache pressure, and TTFT beat CPU as scaling triggers — KEDA-driven patterns, ARIMA forecasting, and composite metrics that scale your AI SaaS before users hit the spinner.",
      "content_html": "<p>Your AI application is lagging, users are complaining, but your cloud dashboard says everything is fine. Your CPU usage is hovering at a comfortable 20 percent while your inference requests are timing out.</p>\n<p>This is the classic scaling trap for AI engineers. Traditional auto-scaling is built for web servers where CPU and memory are the primary bottlenecks. In the world of large language models and vector databases, those metrics are practically useless.</p>\n<p>If you wait for your CPU to hit 80 percent before spinning up a new pod, your service will be dead in the water long before the second instance even starts its boot sequence. GPU-bound workloads require a completely different playbook.</p>\n<p>To build a resilient, cost-effective AI SaaS, you need to move beyond reactive hardware metrics. You need to scale on intent, queue pressure, and the specific physics of GPU memory.</p>\n<h2>Why the CPU lie is killing your UX</h2>\n<p>Most horizontal pod autoscalers (HPA) are configured to watch CPU utilization by default. For a Laravel or Node.js API, this works great. The work is linear — more requests equal more CPU cycles.</p>\n<p>AI models are different. The CPU handles the \"boring\" stuff like tokenization, request routing, and managing HTTP headers. The heavy lifting happens on the GPU.</p>\n<p>I have seen production clusters where the GPU is pinned at 100 percent while the CPU sits idle. Kubernetes sees the low CPU usage and thinks the pod is healthy. It might even try to pack <em>more</em> pods onto that node, leading to a catastrophic failure.</p>\n<p><img src=\"https://ansezz.com/blog/smart-auto-scaling-ai/cpu-lie.webp\" alt=\"CPU usage tells you nothing about GPU saturation\" /></p>\n<h2>GPU utilization vs occupancy: the hardware layer</h2>\n<p>When you finally switch to monitoring GPUs, you encounter two confusing metrics: utilization and occupancy.</p>\n<p>GPU utilization is essentially a duty cycle. 
It tells you the percentage of time the GPU was active over a sample period. It is a lagging indicator. By the time it hits 90 percent, your request queue has likely been building for 30 seconds.</p>\n<p>Occupancy is more granular. It measures how many \"warps\" or hardware slots are filled within the streaming multiprocessors (SM). You can have high utilization but low occupancy if your batch size is too small.</p>\n<p>For scaling, utilization is the baseline, but it isn't the truth. You need to look at what is happening before the request even hits the silicon.</p>\n<h2>Queue depth: your best leading indicator</h2>\n<p>If you want to stop fires before they start, monitor your queue depth. In vLLM or SGLang, this is the number of requests waiting for a slot in the inference engine.</p>\n<p>Queue depth is a direct predictor of latency. If you know your model can handle 16 concurrent requests before P99 latency starts to climb, set your scaling trigger at 12.</p>\n<p>Scaling on queue depth lets you provision capacity while the current hardware is still performing within SLO. It gives you that 60-second head start you need to pull a fresh container and load a 20GB model weights file into memory.</p>\n<p><img src=\"https://ansezz.com/blog/smart-auto-scaling-ai/queue-depth.webp\" alt=\"Queue depth predicts latency before users feel it\" /></p>\n<h2>Token velocity and the KV cache</h2>\n<p>In generative AI, not all requests are created equal. A 10-token summary request is light. A 4,000-token RAG retrieval analysis is a heavyweight.</p>\n<p>This is where token velocity and KV-cache usage come in. The KV cache is the memory on the GPU that stores the context of current conversations. If your KV cache is 95 percent full, the next long request will trigger an eviction or a \"swap to CPU\" event.</p>\n<p>Latency will skyrocket. 
Your P99 will look like a mountain range.</p>\n<p>I recommend scaling based on a combination of:</p>\n<ol>\n<li><strong>Token velocity</strong> — total tokens per second across all active instances.</li>\n<li><strong>KV-cache pressure</strong> — the percentage of available cache blocks currently occupied.</li>\n</ol>\n<p>When the cache is full, it doesn't matter how low your GPU utilization is. You cannot fit more work onto that chip. You must scale.</p>\n<h2>Predictive scaling with ARIMA</h2>\n<p>Reactive scaling is always playing catch-up. Even with fast boot times, there is a delay. For enterprise apps with predictable traffic patterns, I use ARIMA (Auto-Regressive Integrated Moving Average) models to forecast load.</p>\n<p>If I know traffic historically spikes at 9:00 am every Monday, I don't wait for the queue to grow. I use a time-series forecast to spin up the \"base load\" pods at 8:55 am.</p>\n<p>This turns your infrastructure into a proactive system rather than a reactive one. You pay for what you use, but you ensure the capacity is there before the first user clicks \"Generate.\"</p>\n<p><img src=\"https://ansezz.com/blog/smart-auto-scaling-ai/predictive.webp\" alt=\"ARIMA forecast lifting capacity before the 9am spike\" /></p>\n<h2>Practical steps for your stack</h2>\n<p>Implementing this doesn't have to be a nightmare. Here is how I structure it:</p>\n<ul>\n<li><strong>Use KEDA</strong> — the Kubernetes Event-Driven Autoscaler is the gold standard. It lets you scale based on Prometheus metrics like queue depth or P99 latency instead of just CPU.</li>\n<li><strong>Set TTFT SLOs</strong> — measure time-to-first-token (TTFT). This is the most critical metric for user perception. If TTFT P99 exceeds 500ms, you need more replicas.</li>\n<li><strong>Blur the lines</strong> — don't rely on a single metric. 
Create a composite score of GPU utilization, queue depth, and cache pressure.</li>\n<li><strong>Fix your RAG</strong> — sometimes the scaling issue is actually a retrieval issue. If your vector search is slow, the inference engine waits longer, hogging the GPU. Check out these <a href=\"https://ansezz.com/blog/7-rag-mistakes-production/\">common RAG mistakes</a> to ensure your bottleneck isn't upstream.</li>\n<li><strong>Optimize the frontend</strong> — for Shopify apps or custom SaaS, ensure your <a href=\"https://ansezz.com/blog/agentic-workflows-vibe-coding/\">agentic workflows</a> handle retries gracefully when the infrastructure is scaling up.</li>\n</ul>\n<p>Scaling AI isn't about having the biggest GPUs. It is about having the smartest triggers. By moving to service-level metrics, you save money on idle compute and save your users from the dreaded \"thinking...\" spinner.</p>\n<p>Are you still scaling on CPU, or have you made the jump to queue-based triggers yet? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — I love this conversation. 🤘</p>\n",
      "date_published": "2026-05-25T00:00:00.000Z",
      "date_modified": "2026-05-25T00:00:00.000Z",
      "tags": [
        "architecture",
        "auto-scaling",
        "ai",
        "llm",
        "kubernetes",
        "keda",
        "gpu",
        "kv-cache",
        "queue-depth",
        "ttft",
        "infrastructure"
      ],
      "image": "https://ansezz.com/blog/smart-auto-scaling-ai/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/gpu-aware-load-balancing/",
      "url": "https://ansezz.com/blog/gpu-aware-load-balancing/",
      "title": "GPU-aware load balancing: managing AI compute like a pro",
      "summary": "Round-robin is a relic when LLM requests span 50 tokens to 50,000. Prefill vs decode disaggregation, KV-cache-aware routing, prefix matching, and the four metrics that matter — how to route AI traffic so your P99 stops bleeding.",
      "content_html": "<p>You just scaled your RAG application to a hundred concurrent users. Suddenly, your latency spikes. Some users get their answers in two seconds, while others are staring at a loading spinner for thirty. You check your load balancer and it says everything is fine. CPU is at 40%. RAM is stable. But your GPUs are screaming, and your P99 latency is in the gutter.</p>\n<p>The problem is that you are treating your AI models like traditional web servers. Sending a 4,000-token prompt to the same GPU that is currently generating a 50-token summary is a recipe for disaster. Round-robin routing is a relic of the past when it comes to LLM inference. If you don't account for the unique way GPUs handle compute and memory, you aren't just wasting money. You are killing your user experience.</p>\n<p>The solution isn't just \"more GPUs.\" It is building a load balancer that actually understands what is happening inside the model. We need to talk about GPU-aware routing, prefill vs decode disaggregation, and why your KV cache is the most valuable asset in your stack.</p>\n<h2>Why round-robin is a trap for LLMs</h2>\n<p>In traditional software development, a request is a request. Whether it's a <code>GET /users</code> or a <code>POST /orders</code>, the variance in resource consumption is usually predictable and small. Standard load balancers like Nginx or HAProxy work great here. They look at basic health checks and send traffic to the next available worker.</p>\n<p>AI is different. A single request to an LLM has a massive variance in \"weight.\" One user might ask \"what is 2+2?\" while another uploads a 50-page PDF and asks for a deep analysis. If your load balancer sends both to the same GPU, the heavy request will hog the compute resources, forcing the light request to wait in a queue.</p>\n<p>This is why CPU-based metrics are useless. A GPU can be at 100% utilization while performing very different types of work. 
Some work is compute-bound, meaning it needs raw processing power. Other work is memory-bound, meaning it is limited by how fast data can move in and out of VRAM. To solve this, we have to look deeper into the inference lifecycle.</p>\n<h2>Prefill vs decode: the performance gap</h2>\n<p>LLM inference happens in two distinct phases. Understanding the difference between them is the \"aha!\" moment for GPU load balancing.</p>\n<p>The first phase is <strong>prefill</strong>. This is when the model reads your entire prompt and processes all the tokens at once. It is a heavy, compute-intensive task that builds the <strong>KV cache</strong> (key-value cache), which holds the attention keys and values for every prompt token. Because all prompt tokens are processed in parallel, prefill keeps the tensor cores busy. It is where the \"heavy lifting\" happens.</p>\n<p><img src=\"https://ansezz.com/blog/gpu-aware-load-balancing/prefill-vs-decode.webp\" alt=\"Prefill pool vs decode pool — disaggregated GPU fleet\" /></p>\n<p>The second phase is <strong>decode</strong>. This is where the model generates the response one token at a time. Each new token only needs the previous token plus the KV cache, which already holds the keys and values for everything before it. This phase is surprisingly light on compute but incredibly heavy on memory bandwidth, because every step has to read the model weights and the whole KV cache out of VRAM to produce a single token. It is slow and long-lived.</p>\n<p>When you mix these two on the same GPU without a smart scheduler, the \"prefill\" of a new request will often pause the \"decode\" of existing requests. This causes the jittery, stuttering text generation that users hate. By using GPU-aware load balancing, we can prioritize these phases differently across our fleet.</p>\n<h2>Metrics for the real world</h2>\n<p>To build a better router, you need to stop looking at CPU and start looking at these four metrics:</p>\n<ol>\n<li><strong>Token queue depth</strong> — how many tokens are waiting to be processed? This is a much more accurate representation of \"load\" than simple request counts.</li>\n<li><strong>KV cache utilization</strong> — GPUs have a limited amount of VRAM. 
The KV cache stores the \"memory\" of ongoing conversations. If 90% of a GPU's KV cache is already allocated, it cannot accept a large new prompt, even if its compute is currently \"idle.\"</li>\n<li><strong>Time to first token (TTFT)</strong> — this measures queueing time plus the prefill phase. If your TTFT is climbing, your prefill pool is congested.</li>\n<li><strong>Inter-token latency (ITL)</strong> — this measures the speed of the decode phase. If this is high, your GPUs are likely memory-bandwidth constrained.</li>\n</ol>\n<p>I often recommend using tools like <a href=\"https://github.com/vllm-project/vllm\">vLLM</a> because they expose these metrics out of the box. You can pipe these into a custom gateway that makes routing decisions based on real-time KV cache availability rather than just \"is the server up?\"</p>\n<h2>The prefix-aware hack: SkyWalker-style routing</h2>\n<p>Here is a secret — the most expensive part of a RAG request is often re-processing the same system prompt or long context over and over again. If you send five consecutive questions about the same document to five different GPUs, each GPU has to perform the \"prefill\" phase for that document from scratch.</p>\n<p><img src=\"https://ansezz.com/blog/gpu-aware-load-balancing/prefix-routing.webp\" alt=\"Prefix matching — routing prompts to GPUs with warm KV cache\" /></p>\n<p>This is where <strong>prefix-aware routing</strong> (sometimes called SkyWalker-style routing) comes in. Instead of routing blindly, your load balancer tokenizes the start of the prompt and looks for a GPU that already has that specific content in its KV cache.</p>\n<p>By matching the \"prefix\" of a prompt to a specific GPU, you can skip prefill for the shared prefix and only process the new tokens at the end. For a long document, that cuts the prefill cost of the cached portion from hundreds of milliseconds to almost zero. It is one of the most effective ways to optimize costs in production RAG systems. 
I've written before about <a href=\"https://ansezz.com/blog/7-rag-mistakes-production/\">common RAG mistakes</a>, and ignoring cache locality is definitely one of them.</p>\n<h2>Splitting the fleet into specialized pools</h2>\n<p>As you scale, you should stop treating every GPU as a generalist. A senior move is to create <strong>disaggregated inference fleets</strong>.</p>\n<p>I like to split my GPUs into two pools:</p>\n<ul>\n<li><strong>The prefill pool</strong> — high-compute GPUs (like H100s) optimized for processing massive amounts of context quickly. These nodes handle the initial prompt and then \"hand off\" the state.</li>\n<li><strong>The decode pool</strong> — memory-optimized GPUs (like A100s or even cheaper L40s) that focus on churning out tokens for existing requests.</li>\n</ul>\n<p>This separation lets you scale based on your specific workload. If your users are uploading huge documents but only asking for short summaries, you scale your prefill pool. If they are having long, chatty conversations, you scale your decode pool.</p>\n<p>This is the same logic we use in <a href=\"https://ansezz.com/blog/coolify-docker-saas-hosting/\">modern DevOps with Coolify</a>. You wouldn't put your heavy database on the same tiny instance as your frontend — why would you mix your heavy prefill work with your light decode work?</p>\n<h2>Implementing your first GPU-aware router</h2>\n<p>You don't need to build a custom engine from scratch to start doing this. Here is the practical path I follow when setting this up for a new SaaS:</p>\n<ol>\n<li><strong>Centralize your metrics</strong> — use Prometheus to scrape vLLM or TGI metrics from every GPU node.</li>\n<li><strong>Use a smart gateway</strong> — implement a middleware in Go or Rust (or even a heavy-duty Lua script in OpenResty) that queries these metrics before choosing a target.</li>\n<li><strong>Prioritize KV cache</strong> — check if the <code>conversation_id</code> has been seen by a specific node recently. 
If it has, and that node is still below your KV utilization limit, send it there.</li>\n<li><strong>Set hard limits</strong> — if a GPU's KV cache utilization reaches 85%, take it out of the rotation for new prompts until some sessions finish. Watch KV cache usage rather than raw VRAM: engines like vLLM pre-allocate most of the GPU's memory at startup, so VRAM usage looks nearly full even on an idle node.</li>\n</ol>\n<p><img src=\"https://ansezz.com/blog/gpu-aware-load-balancing/metrics-dashboard.webp\" alt=\"SaaS metrics dashboard tracking TTFT, ITL, KV utilization\" /></p>\n<p>Managing AI compute is about moving from \"black box\" infrastructure to \"context-aware\" infrastructure. When your load balancer knows the difference between a 10-token greeting and a 10,000-token context window, your costs go down and your users stay happy.</p>\n<p>It's easy to get lost in the hype of \"agentic systems\" and <a href=\"https://ansezz.com/blog/mcp-context-aware-agents/\">context-aware agents</a>, but none of that matters if your underlying infrastructure is buckling under the weight of unoptimized routing.</p>\n<p>If you are still using round-robin for your AI models, what is the biggest bottleneck you are seeing in your P99 latency right now? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — I love this conversation. 🤘</p>\n",
      "date_published": "2026-05-24T00:00:00.000Z",
      "date_modified": "2026-05-24T00:00:00.000Z",
      "tags": [
        "architecture",
        "gpu",
        "load-balancing",
        "ai",
        "llm",
        "inference",
        "vllm",
        "kv-cache",
        "prefill",
        "decode",
        "infrastructure"
      ],
      "image": "https://ansezz.com/blog/gpu-aware-load-balancing/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/circuit-breakers-vector-db/",
      "url": "https://ansezz.com/blog/circuit-breakers-vector-db/",
      "title": "Circuit breakers: preventing cascading failures in your vector DB",
      "summary": "A slow vector DB kills SaaS faster than a dead one. The circuit-breaker pattern for AI infrastructure — closed/open/half-open states, fallback tiers, semantic caches, LLM-only mode, and Laravel-friendly wiring to keep production from melting under one bad dependency.",
      "content_html": "<p>You built a beautiful RAG pipeline. It works perfectly on your machine with a few hundred vectors. Then you launch. Traffic spikes. Suddenly your managed vector database starts sweating. A single similarity search that used to take 50ms is now taking 5 seconds. Your web workers are all tied up waiting for responses that aren't coming. The database isn't technically down — but it is slow enough to kill your entire application. Your users see spinning loaders until the request finally times out. This is a classic cascading failure, and it is the fastest way to drain your \"innovation budget\" and your users' patience.</p>\n<p>The problem is that we often treat external APIs and databases as if they are always healthy. We write code that assumes the vector DB will return results. When it doesn't, we wait. And while we wait, we hold onto memory and CPU cycles. The solution is an old-school electrical engineering concept applied to software: the circuit breaker.</p>\n<p>In this guide I want to show you how to wrap your AI infrastructure in protective logic so a slow dependency doesn't take your whole SaaS down with it.</p>\n<h2>Understanding the three states</h2>\n<p>The circuit breaker pattern is a state machine that sits between your application code and your external service. It monitors every call you make. It has three specific states that dictate how it handles traffic.</p>\n<p><img src=\"https://ansezz.com/blog/circuit-breakers-vector-db/states.webp\" alt=\"Closed, open, and half-open circuit breaker states\" /></p>\n<p><strong>Closed — the healthy state.</strong>\nIn the closed state, the circuit is complete. Requests flow through to your vector database or LLM provider normally. The breaker is silently watching. It keeps a count of how many requests failed or took too long. As long as the failure rate stays below your threshold, it stays closed. 
This is the \"everything is fine\" mode.</p>\n<p><strong>Open — the fail-fast state.</strong>\nOnce the failure threshold is hit — let's say 50% of requests failed in the last 30 seconds — the breaker \"trips\" and moves to the open state. Now, every time your application tries to call the vector DB, the breaker immediately throws an error or returns a fallback response without even attempting the network call. This gives your database room to breathe and recover. It also ensures your application doesn't waste time waiting on a service that is clearly struggling.</p>\n<p><strong>Half-open — the recovery test.</strong>\nAfter a cooldown period, the breaker moves to the half-open state. It allows a small number of \"test\" requests to pass through. If these test calls succeed, the breaker assumes the service is healthy again and moves back to the closed state. If they fail, it immediately goes back to open for another cooldown cycle. This is a controlled way to probe the system before fully re-engaging.</p>\n<h2>Why your RAG pipeline needs this</h2>\n<p>RAG pipelines are particularly vulnerable because they usually involve multiple high-latency network hops. You have to embed the query, search the vector DB, and then call the LLM. If any of these pieces fail or slow down, the whole experience breaks.</p>\n<p>Most developers make the mistake of only handling hard errors like a <code>500</code> or <code>503</code> status code. But in production, \"slow\" is often more dangerous than \"down.\" A slow vector DB creates a bottleneck that backs up your entire request queue. By the time you realize there is a problem, your server is out of memory because it is holding open thousands of connections.</p>\n<p>If you have read my previous post on <a href=\"https://ansezz.com/blog/7-rag-mistakes-production/\">7 RAG mistakes in production</a>, you know that reliability is the difference between a demo and a product. 
The circuit breaker is your insurance policy against these types of outages.</p>\n<h2>Implementing fallback strategies</h2>\n<p>Tripping the breaker shouldn't always mean showing an error message to the user. The best AI systems use fallbacks to maintain a level of service even when parts of the stack are failing.</p>\n<p><img src=\"https://ansezz.com/blog/circuit-breakers-vector-db/fallback-flow.webp\" alt=\"Architecture flow showing primary path with hot/cold/cache fallbacks\" /></p>\n<p><strong>Hot and cold tiers.</strong>\nYou can think of your vector DB as your \"hot\" knowledge tier. If it fails, you should have a \"cold\" fallback. Maybe you fall back to a standard keyword search in your primary Postgres or MySQL database. The results might not be as contextually relevant as a vector search, but a \"decent\" answer is always better than a \"timed out\" error.</p>\n<p><strong>Cached responses.</strong>\nAnother great strategy is semantic caching. If the circuit is open, you can check a Redis cache for similar queries that were answered recently. Even if you can't generate a fresh answer, you might be able to serve a cached one. This keeps the user moving while your backend recovers.</p>\n<p><strong>LLM-only mode.</strong>\nIf your retrieval step is what's failing, you can still send the user's prompt to the LLM with a note that external knowledge is currently unavailable. The LLM can then answer based on its general training data. It is a degraded experience, but it is still functional. Transparency here is key — tell the user that the \"live\" data isn't available so they know to verify the response.</p>\n<h2>Building it in Laravel</h2>\n<p>Since I spend a lot of time in the Laravel ecosystem, I often use tools that make this easy to implement. You don't need to write the state machine from scratch. 
Packages like <code>spatie/resilience</code>, or a custom wrapper around the <code>illuminate/http</code> client, can get the job done.</p>\n<p>The goal is to wrap your API calls in a block that understands these states. Here is a simplified version of that logic in practice.</p>\n<p><img src=\"https://ansezz.com/blog/circuit-breakers-vector-db/code.webp\" alt=\"Circuit breaker wrapper around a vector-db client\" /></p>\n<p>When you call your vector DB client, you wrap it in the breaker. If the call fails multiple times, the breaker trips. In the <code>catch</code> block, you handle the <code>CircuitBreakerOpenException</code> by returning your fallback data. This keeps your controllers clean and your architecture robust.</p>\n<p>You can also integrate this with your <a href=\"https://ansezz.com/blog/coolify-docker-saas-hosting/\">SaaS hosting on Coolify</a> to ensure that your containers don't get killed by health checks just because an external API is slow. The breaker prevents the resource bloat that usually triggers those health-check failures.</p>\n<h2>Live telemetry and smart routing</h2>\n<p>Senior engineers don't just set a circuit breaker and walk away. They monitor it. You need live telemetry to see how often your circuits are tripping. Tools like Prometheus or even simple logs piped to a dashboard can tell you a lot.</p>\n<p>If you see that your primary vector DB in <code>us-east-1</code> is constantly tripping but your secondary in <code>eu-west-1</code> is healthy, you can implement smart routing. Your circuit breaker can act as a signal to your load balancer or your internal router to shift traffic to the healthy region.</p>\n<p>This kind of <a href=\"https://ansezz.com/blog/event-driven-pubsub/\">event-driven architecture</a> makes your system self-healing. It doesn't wait for a human to wake up at 3am to fix a database. 
It detects the failure, trips the breaker, uses the fallback, and tries to recover automatically.</p>\n<h2>Practical steps to get started</h2>\n<p>If you are ready to harden your AI infrastructure, start here:</p>\n<ol>\n<li><strong>Identify your weakest links</strong> — list every external call in your RAG pipeline. Usually it is the embedding API and the vector DB.</li>\n<li><strong>Define your thresholds</strong> — how many slow requests are you willing to tolerate? Start with a 50% failure rate over 30 seconds and a 2-second timeout.</li>\n<li><strong>Choose your fallbacks</strong> — decide what happens when the breaker is open. Do you show an error, use a cache, or switch to keyword search?</li>\n<li><strong>Wrap your clients</strong> — use a library to wrap your HTTP or database calls. Don't try to build the state machine logic yourself unless you have a very specific use case.</li>\n<li><strong>Monitor the trips</strong> — set up an alert when a circuit stays open for more than a few minutes. This usually indicates a major provider outage that needs your attention.</li>\n</ol>\n<p>The goal is to fail gracefully. Every system has issues, but the ones that survive are the ones that don't let a small fire in a dependency burn down the whole house.</p>\n<p>Have you ever had a slow dependency take down your entire application, or are you still relying on long timeouts and luck? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — I love this conversation. 🤘</p>\n",
      "date_published": "2026-05-23T00:00:00.000Z",
      "date_modified": "2026-05-23T00:00:00.000Z",
      "tags": [
        "architecture",
        "circuit-breakers",
        "ai",
        "rag",
        "resilience",
        "vector-db",
        "fallback",
        "laravel",
        "observability",
        "infrastructure"
      ],
      "image": "https://ansezz.com/blog/circuit-breakers-vector-db/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/message-queues-document-processing/",
      "url": "https://ansezz.com/blog/message-queues-document-processing/",
      "title": "Message queues: handling the heavy lifting of document processing",
      "summary": "Stop running embeddings inside the request-response cycle. A production-grade document ingestion pipeline — staged workers, exponential backoff, dead-letter quarantines, batched embeddings, and queue-depth autoscaling that keeps your AI app from melting under a 500-page PDF.",
      "content_html": "<p>If you are running your document embeddings inside your request-response cycle, you are playing with fire. I have seen too many junior devs build a beautiful RAG application that falls over the second a user uploads a 50MB PDF. The browser spins, the Nginx timeout hits, and the database locks up while your worker tries to chunk 500 pages of legal jargon in real time.</p>\n<p>This is the classic \"heavy lifting\" problem in AI engineering. Document processing — OCR, text extraction, semantic chunking, and embedding — is slow, unpredictable, and resource-heavy. Trying to force it into a synchronous web request is a recipe for a bad user experience and a fragile system.</p>\n<p>The solution is decoupling. I'm talking about message queues. In this guide, I'll walk you through why async work belongs in a queue and how to build a production-grade ingestion pipeline that doesn't melt your server.</p>\n<h2>The synchronous trap</h2>\n<p>Imagine a user uploads a document to your SaaS. Your code receives the file, sends it to an extraction API, waits for the response, loops through the text to create chunks, sends each chunk to an embedding model, and finally saves it to pgvector.</p>\n<p>If any of those steps take more than 30 seconds, the connection drops. If the embedding API has a momentary blip, the whole process fails, and the user has to start over. Worse, while your server is busy doing this heavy work, it's not responding to other users.</p>\n<p>This is where we apply the first rule of senior engineering: if it takes more than 100ms, consider making it async. By moving this work to a message queue, you give your users immediate feedback (\"we're processing your file!\") while the heavy lifting happens safely in the background.</p>\n<h2>The anatomy of a document pipeline</h2>\n<p>A robust RAG pipeline isn't just one big function. It's a series of decoupled stages. 
I like to break it down into modular steps, each triggered by a message in a queue. This lets you scale different parts of the system independently.</p>\n<p><img src=\"https://ansezz.com/blog/message-queues-document-processing/pipeline-stages.webp\" alt=\"Pipeline stages — ingestion, parsing, chunking, embedding\" /></p>\n<p>Here is how I usually structure it:</p>\n<ol>\n<li><strong>Ingestion &amp; discovery</strong> — a user uploads a file. You save it to S3 and push a small message to the queue containing the <code>file_path</code> and <code>tenant_id</code>.</li>\n<li><strong>Parsing &amp; normalization</strong> — a worker picks up the message, downloads the file, and runs it through a parser like pdfplumber or an OCR service. It saves the raw text (to S3 or your database) and pushes a reference to it onto the next queue.</li>\n<li><strong>Chunking</strong> — this worker takes the text and splits it into semantic sections. Doing this in its own stage means you can easily swap chunking strategies (e.g., recursive character vs semantic) without re-running the heavy parsing step.</li>\n<li><strong>Embedding &amp; indexing</strong> — the final stage batches the chunks, hits your embedding API (like OpenAI or a local model), and pushes the vectors into your vector DB.</li>\n</ol>\n<p>This stage-based approach is exactly what I discuss in my post on <a href=\"https://ansezz.com/blog/7-rag-mistakes-production/\">7 RAG mistakes to avoid in production</a>. It provides backpressure control — if your vector DB slows down, the \"index\" queue grows, but the \"parsing\" workers keep humming along.</p>\n<h2>Retries and the beauty of dead letters</h2>\n<p>In the real world, things break. APIs time out. PDFs are malformed. Workers crash.</p>\n<p>When you use a message queue like Redis (with BullMQ or Laravel Queues) or SQS, you get retries for free. If a worker fails, the message goes back onto the queue to be tried again after a short delay. Exponential backoff is your best friend here — don't hammer a failing API every 5 seconds. 
Wait 10, then 60, then 300.</p>\n<p><img src=\"https://ansezz.com/blog/message-queues-document-processing/dlq-retries.webp\" alt=\"Retry strategy and dead-letter quarantine flow\" /></p>\n<p>But what happens when a document simply <em>cannot</em> be processed? Maybe it's a password-protected PDF or a corrupted file. You don't want it retrying forever and clogging up your workers.</p>\n<p>This is where a <strong>Dead Letter Queue (DLQ)</strong> comes in. After a certain number of failed attempts, the message is moved to the DLQ. This acts as a \"quarantine\" zone. I can then inspect these failed jobs, fix the underlying issue, and manually re-queue them. It's a safety net that keeps your main production line moving.</p>\n<h2>Batching for efficiency</h2>\n<p>If you are processing 10,000 chunks, you do not want to make 10,000 individual API calls to your embedding provider. That's slow and expensive.</p>\n<p>Most embedding APIs and vector databases perform much better with batches. A good worker pattern involves pulling multiple messages from the queue (or aggregating them in memory) and sending them as a single bulk request.</p>\n<p>In a Laravel environment, I often use job batching to track the progress of a large document. I can see exactly when 95% of a PDF is processed and update a progress bar for the user. If you're interested in how this fits into a larger architecture, check out my thoughts on <a href=\"https://ansezz.com/blog/event-driven-pubsub/\">event-driven pub/sub systems</a>.</p>\n<h2>Event-driven prefetching</h2>\n<p>Here is a \"senior\" tip — queues aren't just for ingestion. You can use them for <strong>prefetching</strong>.</p>\n<p>If a user is chatting with an AI agent and the conversation is heading toward a specific topic, you can fire off a background job to fetch related documents and warm up the cache before the user even asks the next question. 
This makes your AI feel lightning fast because the context is already \"ready\" when the retrieval step hits.</p>\n<p>By using an event bus, you can decouple the chat interface from these optimization tasks. The chat app just emits a <code>user_asked_question</code> event, and a background worker decides whether it should pre-fetch more data or update the semantic cache.</p>\n<h2>Monitoring the heart of your app</h2>\n<p>Once you move to a queue-based system, your most important metric is no longer just \"request latency.\" You need to watch your <strong>queue depth</strong>.</p>\n<p><img src=\"https://ansezz.com/blog/message-queues-document-processing/monitoring.webp\" alt=\"Queue-depth dashboard with worker autoscaler\" /></p>\n<p>If the queue depth is growing faster than your workers can clear it, you have a bottleneck. This is where tools like Docker and Coolify make life easy — I can spin up five more worker containers to handle a sudden surge in document uploads. You can read more about how I manage this infra in my <a href=\"https://ansezz.com/blog/coolify-docker-saas-hosting/\">Coolify and Docker guide</a>.</p>\n<h2>Practical takeaways for your pipeline</h2>\n<ul>\n<li><strong>Never store large files in the queue</strong> — only pass references (like an S3 key). Keep messages small for better performance.</li>\n<li><strong>Make tasks idempotent</strong> — assume a message might be processed twice. Use <code>upsert</code> instead of <code>insert</code> in your vector DB to avoid duplicates.</li>\n<li><strong>Use structured logging</strong> — every worker log should include the <code>doc_id</code> and <code>tenant_id</code>. 
Searching for \"why did this file fail?\" is impossible without it.</li>\n<li><strong>Scale on queue depth</strong> — set up your autoscaler to add workers based on how many messages are waiting, not just CPU usage.</li>\n<li><strong>Separate worker pools</strong> — have one set of workers for \"fast\" tasks (like metadata updates) and another for \"slow\" tasks (like OCR/embedding). Don't let a huge PDF upload block a simple name change.</li>\n</ul>\n<p>Building a document pipeline is about respecting the time it takes to process data. By moving that work into a queue, you build a system that is resilient, scalable, and — most importantly — provides a smooth experience for your users.</p>\n<p>How are you currently handling long-running AI tasks? Are you still fighting with request timeouts, or have you embraced the queue? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — I love this conversation. 🤘</p>\n",
      "date_published": "2026-05-22T00:00:00.000Z",
      "date_modified": "2026-05-22T00:00:00.000Z",
      "tags": [
        "architecture",
        "message-queues",
        "ai",
        "rag",
        "document-processing",
        "redis",
        "bullmq",
        "laravel-queues",
        "dlq",
        "batching",
        "infrastructure"
      ],
      "image": "https://ansezz.com/blog/message-queues-document-processing/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/rate-limiting-ai-wallet/",
      "url": "https://ansezz.com/blog/rate-limiting-ai-wallet/",
      "title": "Rate limiting: protecting your AI wallet",
      "summary": "One runaway agent loop = $5,000 OpenAI bill. Why request-per-second limits lie for LLM apps, how to architect hierarchical token-bucket limits across global / tenant / user layers, and adaptive throttling patterns that protect margins without breaking UX.",
      "content_html": "<p>One runaway agent loop is all it takes to wake up to a $5,000 OpenAI bill.</p>\n<p>If you're building AI-powered SaaS or RAG systems, your biggest threat isn't a server crash. It's a \"denial of wallet\" attack. A buggy client, a malicious user, or even your own experimental agent can spam your API endpoints and burn through your tokens (and credits) in minutes.</p>\n<p>Traditional web apps care about requests per second to keep the CPU from melting. In the world of LLMs, we care about tokens per minute to keep the bank account from draining. Standard rate limiting isn't enough anymore. You need an architecture that understands cost, context, and the \"noisy neighbor\" problem before a single prompt even hits your vector DB.</p>\n<h2>Why requests per second (QPS) is a lie for AI</h2>\n<p>In a standard Laravel or Node app, a request is a request. Sure, some take longer than others, but they generally consume similar resources. In AI engineering, one request might be a 50-token greeting, while another is a 128,000-token context dump for a RAG pipeline.</p>\n<p>If you only limit requests per second, a single user can stay within their \"10 requests per minute\" limit while still costing you 100× more than everyone else combined. This is where the <a href=\"https://ansezz.com/blog/laravel-multi-tenancy/\">noisy neighbor problem</a> becomes a financial crisis.</p>\n<p>You aren't just protecting your infrastructure. You're protecting your margins. To do this effectively, we have to move from counting \"pings\" to counting \"value.\"</p>\n<p><img src=\"https://ansezz.com/blog/rate-limiting-ai-wallet/dashboard.webp\" alt=\"Token usage dashboard showing skewed cost per user\" /></p>\n<h2>The anatomy of a denial of wallet (DoW) attack</h2>\n<p>A denial of wallet attack is the AI equivalent of a DDoS. The goal isn't necessarily to take your site down. 
It's to exhaust your API quotas or financial budget until your service stops functioning — or you're forced to pay a massive bill.</p>\n<p>I've seen this happen in three ways:</p>\n<ol>\n<li><strong>The agentic loop</strong> — an autonomous agent gets stuck in a logic loop, calling your tool-use functions repeatedly without a \"max steps\" ceiling.</li>\n<li><strong>The scrapers</strong> — malicious bots trying to exfiltrate your entire knowledge base by querying every possible permutation of your RAG system.</li>\n<li><strong>The dev mistake</strong> — a frontend developer accidentally puts an LLM-powered \"autocomplete\" on a search bar that triggers on every keystroke.</li>\n</ol>\n<p>Without token-aware rate limiting, your provider (like OpenAI or Anthropic) will eventually hit you with a <code>429</code> error. But by that time, the damage to your wallet is already done.</p>\n<h2>Solving the noisy neighbor with hierarchical limits</h2>\n<p>To solve this, I implement a three-layer rate limiting strategy at the API gateway level. This ensures that even if one tenant goes rogue, the rest of the platform stays healthy.</p>\n<h3>1. The global provider layer</h3>\n<p>This is your final line of defense. If your OpenAI quota is 500,000 tokens per minute (TPM), set your internal global limit to 450,000. This leaves a safety buffer and keeps you from hitting the provider's hard ceiling, where it starts rejecting requests for every tenant at once.</p>\n<h3>2. The tenant layer</h3>\n<p>Every customer gets their own bucket. I usually tie this to their subscription tier. A \"Pro\" user might get 50,000 TPM, while a \"Free\" user is capped at 2,000. This ensures no single company can eat up your entire global quota.</p>\n<h3>3. The user/session layer</h3>\n<p>Inside a single tenant, you still need limits. You don't want one employee at a customer's company hogging all the tokens allocated to that entire organization. 
I set these at about 20% of the total tenant capacity.</p>\n<p><img src=\"https://ansezz.com/blog/rate-limiting-ai-wallet/architecture.webp\" alt=\"Hierarchical rate-limit architecture — global, tenant, user buckets\" /></p>\n<h2>Implementation: the token bucket algorithm</h2>\n<p>For most of my builds, I use a \"token bucket\" or \"leaky bucket\" algorithm backed by Redis. It's the gold standard for handling bursty traffic while maintaining a steady flow.</p>\n<p>Here is the logic: each user has a \"bucket\" of tokens. Every time they send a prompt, we estimate the total cost (input tokens + expected <code>max_tokens</code>). If the bucket has enough, they proceed and the tokens are deducted. The bucket refills at a constant rate over time.</p>\n<p>If you're <a href=\"https://ansezz.com/about/\">modernizing your stack</a> or building a SaaS on the LEMP stack, you can implement this efficiently in Laravel using middleware and a fast storage layer like Redis.</p>\n<pre><code>// A simplified token-bucket check in Laravel middleware\n// Assumes \"use Closure;\" at the top of the file; tokenizer and limiter are your own services\npublic function handle($request, Closure $next)\n{\n    $tenantId = $request-&gt;user()-&gt;tenant_id;\n\n    // Charge for the whole call: prompt tokens plus the output cap (1024 is an assumed default)\n    $estimatedTokens = $this-&gt;tokenizer-&gt;estimate($request-&gt;input('prompt'))\n        + (int) $request-&gt;input('max_tokens', 1024);\n\n    // consume() returns false when the tenant's bucket can't cover the estimate\n    if (!$this-&gt;limiter-&gt;consume(\"tenant:{$tenantId}:tokens\", $estimatedTokens)) {\n        return response()-&gt;json(['error' =&gt; 'token budget exceeded'], 429);\n    }\n\n    return $next($request);\n}\n</code></pre>\n<p><img src=\"https://ansezz.com/blog/rate-limiting-ai-wallet/code.webp\" alt=\"Laravel middleware token-bucket snippet\" /></p>\n<h2>Token-budget routing and adaptive throttling</h2>\n<p>What happens when a user hits their limit? Most devs just throw a <code>429</code> error. But as a senior engineer, I prefer a more graceful degradation. 
We call this <strong>adaptive throttling</strong>.</p>\n<p>Instead of a hard \"no,\" you can:</p>\n<ul>\n<li><strong>Degrade the model</strong> — switch the request from GPT-4o to a cheaper, faster model like GPT-4o-mini.</li>\n<li><strong>Truncate the context</strong> — if the user is over budget, strip out some of the retrieved RAG documents to lower the input token count.</li>\n<li><strong>Queue the request</strong> — for non-interactive tasks (like background summarization), move the request to a message queue and process it when the token bucket refills.</li>\n</ul>\n<p>This keeps the product usable while protecting your margins. It's about being smart, not just being a gatekeeper.</p>\n<h2>The RAG context: limiting the \"hidden\" calls</h2>\n<p>In a RAG (retrieval-augmented generation) system, one user query often triggers multiple backend actions:</p>\n<ol>\n<li>One embedding call for the query.</li>\n<li>One search query to the vector database.</li>\n<li>One (or more) LLM calls for the final answer.</li>\n</ol>\n<p>If you only rate limit the final LLM call, your vector database might still get hammered by search queries. You need to treat the entire \"RAG flow\" as a single unit of work with its own combined budget. I've written about other <a href=\"https://ansezz.com/blog/7-rag-mistakes-production/\">common RAG production mistakes</a> before, but rate limiting the \"flow\" is often the most overlooked fix.</p>\n<h2>Practical steps to protect your system today</h2>\n<p>If you're launching an AI feature this week, do these three things:</p>\n<ol>\n<li><strong>Set a hard monthly spend cap</strong> — most API providers let you set a maximum dollar amount per month. Set it. It's your parachute.</li>\n<li><strong>Enforce <code>max_tokens</code></strong> — never let a user request an uncapped response. 
Always set a sane default for <code>max_tokens</code> in every API call.</li>\n<li><strong>Implement per-request timeouts</strong> — if an LLM call takes longer than 30 seconds, kill it. Slow calls are often a symptom of a system that is about to spiral out of control.</li>\n</ol>\n<p>Rate limiting isn't just a \"security\" feature. In the AI era, it's a core part of your business model. You can't scale a product that allows a single user to run up a thousand-dollar bill in their first hour.</p>\n<p>Build for fairness. Build for cost. Build for the \"noisy neighbor.\"</p>\n<p>Have you ever seen a \"denial of wallet\" happen in the wild, or are you still running on a wing and a prayer with global provider limits? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — I love this conversation. 🤘</p>\n",
      "date_published": "2026-05-21T00:00:00.000Z",
      "date_modified": "2026-05-21T00:00:00.000Z",
      "tags": [
        "architecture",
        "rate-limiting",
        "ai",
        "llm",
        "rag",
        "multi-tenancy",
        "redis",
        "laravel",
        "denial-of-wallet",
        "api-gateway"
      ],
      "image": "https://ansezz.com/blog/rate-limiting-ai-wallet/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/api-gateway-ai-stack/",
      "url": "https://ansezz.com/blog/api-gateway-ai-stack/",
      "title": "API Gateway: the front door of your AI stack",
      "summary": "Stop exposing LLM providers directly to the frontend. The gateway pattern for AI apps — JWT-scoped tenant isolation, model aliases, denial-of-wallet rate limiting, streaming-safe timeouts, and the wallet-saving guardrails every senior engineer needs.",
      "content_html": "<p>Stop exposing your models to the wild.</p>\n<p>If you are building a production AI app, sending requests directly from your frontend to a RAG orchestrator or — god forbid — straight to an LLM provider is a liability. It is slow. It is insecure. And it is the fastest way to wake up to a five-figure bill you didn't plan for.</p>\n<p>I have spent over a decade building software, and if there is one thing I have learned, it is that engineering for \"it works\" is not the same as engineering for \"it scales.\" In the world of AI, scale isn't just about traffic. It is about cost, latency, and data safety.</p>\n<p>Imagine a \"denial of wallet\" attack where a malicious script spams your completions endpoint. Without a gatekeeper, your API keys are just sitting ducks. Or worse, imagine a multi-tenant app where one user's prompt accidentally retrieves another user's private data from your vector DB.</p>\n<p>This is where the API gateway comes in. It is the first line of defense and the brain of your infrastructure. It handles the boring but critical stuff so your RAG logic can stay focused on actually being smart.</p>\n<h2>The gatekeeper pattern</h2>\n<p>At its core, an API gateway is a reverse proxy that sits between your users and your backend services. But for an AI stack, it does more than just forward traffic. It acts as a centralized brain for auth, routing, and rate limiting.</p>\n<p>When a request hits your gateway, it goes through a gauntlet of checks before it ever touches a model. This \"gatekeeper\" ensures that every millisecond of GPU time or every cent of token cost is intentional.</p>\n<p><img src=\"https://ansezz.com/blog/api-gateway-ai-stack/gateway-stack.webp\" alt=\"Gateway responsibilities — auth, routing, rate limiting, observability\" /></p>\n<h2>Authentication and tenant isolation</h2>\n<p>In a typical SaaS, authentication is about knowing who the user is. 
In an AI-powered SaaS, it is about data isolation.</p>\n<p>If you are building a RAG system, your biggest risk is cross-tenant data leakage. If you want to avoid <a href=\"https://ansezz.com/blog/7-rag-mistakes-production/\">common RAG mistakes</a>, you must handle identity at the very edge.</p>\n<p>I prefer using JWTs (JSON Web Tokens) with custom claims. When a request hits the gateway, I validate the token and extract the <code>tenant_id</code>. Before passing the request to the RAG orchestrator, the gateway drops any tenant header the client sent and injects that verified ID in its place, so a caller can't spoof another tenant.</p>\n<p>This means the orchestrator doesn't have to \"guess\" who the user is. It receives a verified <code>x-tenant-id</code> header and uses it to apply metadata filters on the vector database. The user only \"sees\" data they are allowed to see. No tenant ID? No query. Period.</p>\n<h2>Smart routing for model flexibility</h2>\n<p>The AI world moves fast. Today you are using GPT-4o. Tomorrow, Claude 3.5 Sonnet might be the better play. Next week, you might want to test a fine-tuned Llama 3 model running on your own infrastructure via <a href=\"https://ansezz.com/blog/coolify-docker-saas-hosting/\">Docker and Coolify</a>.</p>\n<p>If your model logic is hardcoded into your frontend or a single monolithic backend, switching models is a nightmare. An API gateway solves this with smart routing.</p>\n<p><img src=\"https://ansezz.com/blog/api-gateway-ai-stack/routing-flow.webp\" alt=\"Smart routing — model aliases, tier-based routing, failover\" /></p>\n<p>I use the gateway to create \"model aliases.\" Instead of the frontend calling a specific model, it calls a generic endpoint like <code>/v1/chat/completions</code>. The gateway then decides where to send that request based on:</p>\n<ol>\n<li><strong>User tier</strong> — free users get routed to a cheaper, faster model like GPT-4o-mini. 
Pro users get the heavy hitters.</li>\n<li><strong>Versioning</strong> — run an A/B test by routing 10% of traffic to a new model version without changing a single line of client-side code.</li>\n<li><strong>Failover</strong> — if OpenAI is having an outage, the gateway can automatically reroute traffic to an Anthropic backup.</li>\n</ol>\n<p>This level of abstraction is what separates a weekend project from a resilient SaaS product.</p>\n<h2>Rate limiting: protecting the wallet</h2>\n<p>We used to rate limit to protect our CPUs. Now, we rate limit to protect our bank accounts.</p>\n<p>AI requests are asymmetric. A user sends a 50-word prompt, and the model might generate a 1,000-word response. Output tokens are typically priced several times higher than input tokens, so that short request can end up costing far more than its prompt suggests.</p>\n<p>A good API gateway implementation allows for tiered rate limiting. Set global limits to prevent your entire system from being overwhelmed, but also set per-tenant or per-user limits.</p>\n<p>I usually implement this using Redis. The gateway checks the user's quota in real time. If they have exceeded their daily token limit or their requests-per-minute (RPM) cap, the gateway returns a <code>429 Too Many Requests</code> immediately.</p>\n<p>This saves your backend from doing expensive work that you won't get paid for. It also stops \"noisy neighbors\" — one user scripting an automated tool that hogs all your capacity and makes the app slow for everyone else.</p>\n<h2>Handling the AI-specific quirks</h2>\n<p>Gateways for AI need to handle two things differently from traditional web apps: streaming and long-running requests.</p>\n<h3>Streaming support</h3>\n<p>Most modern AI apps use Server-Sent Events (SSE) to stream responses token by token. Some older gateways or load balancers try to \"buffer\" the entire response before sending it to the client.
This kills the user experience.</p>\n<p>Make sure your gateway (whether you are using <a href=\"https://konghq.com/\">Kong</a>, <a href=\"https://tyk.io/\">Tyk</a>, or a custom <a href=\"https://ansezz.com/blog/laravel-multi-tenancy/\">Laravel solution</a>) is configured to disable buffering for AI routes. The data should flow through the gateway like water through a pipe, not like a bucket that needs to be filled.</p>\n<h3>Extended timeouts</h3>\n<p>Traditional APIs expect a response in 1–2 seconds. A complex RAG query involving multiple vector searches and a large model generation might take 30 seconds or more.</p>\n<p>You need to adjust your gateway's \"upstream timeout\" settings. If that timeout is shorter than your slowest legitimate request, your users will see constant <code>504 Gateway Timeout</code> errors even when your models are working perfectly.</p>\n<p><img src=\"https://ansezz.com/blog/api-gateway-ai-stack/clean-code.webp\" alt=\"Clean code under load — gateway protecting model traffic\" /></p>\n<h2>Practical steps for your stack</h2>\n<p>You don't need a massive team to set this up. Here is how I usually approach it depending on the project size:</p>\n<ul>\n<li><strong>For startups</strong> — use a cloud-native gateway like AWS API Gateway or Azure API Management. They are serverless, scale automatically, and integrate directly with Cognito or Entra ID for auth.</li>\n<li><strong>For self-hosters</strong> — Kong is the gold standard. It has a great ecosystem of plugins for rate limiting and auth. If you are comfortable with PHP, a thin Laravel app acting as a gateway works surprisingly well for custom logic.</li>\n<li><strong>For Shopify devs</strong> — if you are building <a href=\"https://ansezz.com/blog/agentic-commerce-shopify/\">agentic commerce tools</a>, use the gateway to handle the specific Shopify HMAC validation before passing the request to your AI agents.</li>\n</ul>\n<h2>Wrapping up</h2>\n<p>The API gateway isn't just a piece of infrastructure.
It is a design philosophy. It says that your AI logic is too valuable — and too expensive — to be left unprotected.</p>\n<p>By centralizing auth, routing, and rate limiting, you make your system more modular. You can swap models, change pricing tiers, and update security policies without touching the core code that makes your AI \"smart.\"</p>\n<p>Are you still letting your frontend talk directly to your LLM providers? If so, what is the one thing stopping you from putting a gateway in front of it?</p>\n<p>Stay sharp.\n— a senior dev</p>\n<hr />\n<h3>Actionable takeaways</h3>\n<ol>\n<li><strong>Centralize auth</strong> — never let your RAG orchestrator handle raw user authentication. Do it at the gateway.</li>\n<li><strong>Inject tenant context</strong> — use the gateway to verify the user and inject a <code>tenant_id</code> header to enforce data isolation.</li>\n<li><strong>Implement global + per-user limits</strong> — protect your wallet from both malicious attacks and accidental bugs.</li>\n<li><strong>Configure for streaming</strong> — ensure your gateway doesn't buffer responses, or your \"typing\" effect will break.</li>\n<li><strong>Use model aliases</strong> — route to <code>/chat/pro</code> instead of a specific model name to keep your stack flexible.</li>\n</ol>\n",
      "date_published": "2026-05-20T00:00:00.000Z",
      "date_modified": "2026-05-20T00:00:00.000Z",
      "tags": [
        "architecture",
        "api-gateway",
        "ai",
        "rag",
        "security",
        "rate-limiting",
        "multi-tenancy",
        "streaming",
        "infrastructure"
      ],
      "image": "https://ansezz.com/blog/api-gateway-ai-stack/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/shopify-storefront-web-components/",
      "url": "https://ansezz.com/blog/shopify-storefront-web-components/",
      "title": "Shopify Storefront Web Components: headless commerce for the rest of us",
      "summary": "Headless used to mean six engineers and a Hydrogen rebuild. Shopify Storefront Web Components let you drop products, collections, and cart into any HTML page with a script tag — no React, no build step, no DevOps tax.",
      "content_html": "<p>Headless used to mean hiring a team of six engineers just to show a \"Buy\" button.</p>\n<p>If you wanted a custom Shopify experience outside of their Liquid-based themes, you were usually forced into a massive architectural decision. You had to build a full React or Hydrogen app, manage server-side rendering, handle complex routing, and pray your SEO didn't tank. For many startups and established brands, this was like buying a semi-truck when all they needed was a bicycle.</p>\n<p>The complexity of traditional headless was a gatekeeper. It prevented simple content-driven sites on platforms like Astro or WordPress from easily selling products without a jarring transition to a subdomain. We've spent years over-engineering solutions for problems that could have been solved with a few custom elements.</p>\n<p>Shopify Storefront Web Components have changed the math. Now, adding commerce to any site is as simple as dropping in a script tag.</p>\n<h2>The headless headache</h2>\n<p>Building a headless store often feels like building a house from scratch. You have to worry about the foundation (hosting), the plumbing (GraphQL queries), and the wiring (state management).</p>\n<p>If you are a senior engineer, you know the drill. You spend weeks configuring Shopify Hydrogen or a custom Next.js setup just to get a cart that actually works. And once it's live, you're the one who has to maintain the Node.js environment and the middleware.</p>\n<p>It's expensive. It's slow to deploy. And for 80% of use cases, it is total overkill. We need a way to get the flexibility of headless without the technical debt of a heavy framework.</p>\n<p><img src=\"https://ansezz.com/blog/shopify-storefront-web-components/complexity-vs-simplicity.webp\" alt=\"Headless complexity vs simplicity\" /></p>\n<h2>Entering the era of custom elements</h2>\n<p>Shopify Storefront Web Components are framework-agnostic. 
They are standard Web Components (custom elements) that handle the heavy lifting of communicating with Shopify's Storefront API.</p>\n<p>You don't need React. You don't need a build step. You just need HTML.</p>\n<p>This \"headless light\" approach lets you embed products, collections, and a fully functional cart directly into your existing stack. Whether you are using a static site generator or a legacy CMS, these components act as bridge-builders. They let you keep your content where it is and pull the commerce in dynamically.</p>\n<h2>How to get started in 5 minutes</h2>\n<p>Getting these up and running is refreshing for any developer used to complex APIs. Here is the workflow I use when I want to move fast.</p>\n<h3>1. Connect your store</h3>\n<p>The first step is adding the script tag and the <code>&lt;shopify-store&gt;</code> component to your HTML. This establishes the connection to your Shopify domain.</p>\n<pre><code>&lt;script src=\"https://webcomponents.shopify.dev/components.js\"&gt;&lt;/script&gt;\n\n&lt;shopify-store store-domain=\"your-store.myshopify.com\"&gt;&lt;/shopify-store&gt;\n</code></pre>\n<p>If you need inventory counts or custom data, add a <code>public-access-token</code> from the Shopify headless channel. For basic title and price display, you don't even need that.</p>\n<h3>2. Define the context</h3>\n<p>The magic happens with <code>&lt;shopify-context&gt;</code>. This tells the page which product or collection it should be looking at. You just pass the handle from your Shopify admin.</p>\n<pre><code>&lt;shopify-context type=\"product\" handle=\"awesome-tshirt\"&gt;\n  &lt;template&gt;\n    &lt;!-- your content goes here --&gt;\n  &lt;/template&gt;\n&lt;/shopify-context&gt;\n</code></pre>\n<h3>3. Display the data</h3>\n<p>Inside that template, you use <code>&lt;shopify-data&gt;</code> to pull specific fields. 
It uses dot notation, making it feel very familiar to anyone who has worked with Liquid or JavaScript objects.</p>\n<pre><code>&lt;shopify-data query=\"product.title\"&gt;&lt;/shopify-data&gt;\n&lt;shopify-money query=\"product.selectedOrFirstAvailableVariant.price\"&gt;&lt;/shopify-money&gt;\n</code></pre>\n<p><img src=\"https://ansezz.com/blog/shopify-storefront-web-components/ai-assisted-code.webp\" alt=\"AI-assisted Shopify code\" /></p>\n<h2>The secret sauce: llms.txt</h2>\n<p>Here is the part where it gets interesting for modern developers. Shopify has released a specific <a href=\"https://webcomponents.shopify.dev/llms.txt\">llms.txt</a> file.</p>\n<p>If you've been following the rise of <a href=\"https://ansezz.com/blog/agentic-workflows-vibe-coding/\">agentic workflows and vibe coding</a>, you know that providing context to an AI model is everything. Point your AI agent (like Claude or ChatGPT) at this text file, and it learns exactly how to write code for these specific Web Components.</p>\n<p>This drastically cuts down on hallucinations. Instead of the AI guessing how a Shopify component should look, it uses the official spec. I've found that including this link in my system prompt lets me generate entire product landing pages in seconds that actually work on the first try.</p>\n<p>It turns \"development\" into \"orchestration.\" For more on how to leverage these tools, check out how <a href=\"https://ansezz.com/blog/mcp-context-aware-agents/\">MCP context-aware agents</a> are changing the way we handle technical docs.</p>\n<p><img src=\"https://ansezz.com/blog/shopify-storefront-web-components/llms-txt.webp\" alt=\"llms.txt verification\" /></p>\n<h2>When to use Web Components vs Hydrogen</h2>\n<p>As a senior engineer, you have to pick the right tool for the job.
Here is my rule of thumb.</p>\n<p><strong>Use Storefront Web Components if:</strong></p>\n<ul>\n<li>You have an existing marketing site and want to add \"buy buttons\" or a mini-cart.</li>\n<li>You are building a landing page and need speed above all else.</li>\n<li>Your team doesn't want to maintain a React/Node.js infrastructure.</li>\n<li>You want to stay framework-agnostic.</li>\n</ul>\n<p><strong>Use Shopify Hydrogen if:</strong></p>\n<ul>\n<li>You are building a mission-critical, full-scale custom storefront.</li>\n<li>You need complex server-side logic and deep integrations with multiple APIs.</li>\n<li>You require the absolute highest level of performance optimization via streaming and edge rendering.</li>\n</ul>\n<p>For most people starting out or modernizing a digital presence, Web Components are the winning choice. They offer the best balance of <a href=\"https://ansezz.com/blog/ai-vs-traditional-development/\">AI-driven development</a> and production stability.</p>\n<h2>Takeaways for the modern dev</h2>\n<ul>\n<li>Web Components are the \"headless light\" solution we've been waiting for.</li>\n<li>They work anywhere — WordPress, Astro, plain HTML, or even a local static file.</li>\n<li>Point your AI assistant at the <code>llms.txt</code> file so it writes component code from the official spec instead of guessing.</li>\n<li>Avoid over-engineering — if you don't need a massive React app, don't build one.</li>\n</ul>\n<p>Building for the web in 2026 is about reducing friction. Shopify has finally removed the friction from headless.</p>\n<p>Are you still building full React apps for simple stores, or are you ready to embrace the simplicity of custom elements? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — I love this conversation. 🤘</p>\n",
      "date_published": "2026-05-19T00:00:00.000Z",
      "date_modified": "2026-05-19T00:00:00.000Z",
      "tags": [
        "shopify",
        "web-components",
        "headless",
        "llms-txt",
        "hydrogen",
        "agentic",
        "storefront-api"
      ],
      "image": "https://ansezz.com/blog/shopify-storefront-web-components/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/agentic-commerce-shopify/",
      "url": "https://ansezz.com/blog/agentic-commerce-shopify/",
      "title": "Why agentic commerce will change the way you build Shopify stores",
      "summary": "AI agents don't browse — they query. The shift from human-centric Shopify themes to agent-ready infrastructure: Shopify Catalog, UCP, Agentic Storefronts, MCP servers, and why structured data is the new CSS.",
      "content_html": "<p>Your beautiful Shopify theme is about to become invisible.</p>\n<p>For a decade, we have obsessed over conversion rate optimization, pixel-perfect layouts, and fast-loading Liquid templates. We built for human eyes. But a new type of shopper is arriving at the digital storefront. They don't have eyes, they don't click buttons, and they certainly don't care about your hero slider.</p>\n<p>They are autonomous AI agents.</p>\n<p>The shift from human-centric browsing to agentic commerce is the biggest architectural pivot Shopify has ever seen. If you are still just \"the theme guy\" or \"the Liquid expert,\" your skillset is about to hit a massive wall. I have been building on the web for over ten years, and I can tell you that the era of building for agents is here. It is time to move from designing pixels to designing protocols.</p>\n<h3>The problem: the end of the traditional funnel</h3>\n<p>Most developers are still stuck in the old way of thinking. We build a site, drive traffic to a landing page, and hope the user navigates to the checkout. This is a manual, high-friction process.</p>\n<p>Today, buyers are increasingly using tools like ChatGPT, Gemini, or Microsoft Copilot to do the \"work\" of shopping. They ask for a waterproof hiking boot under $150 with specific shipping requirements. The AI doesn't visit your store to browse. It queries data. If your store isn't built to be \"read\" by an agent, you simply don't exist in that transaction.</p>\n<p><img src=\"https://ansezz.com/blog/agentic-commerce-shopify/shopping-flows.webp\" alt=\"Bento grid of agentic shopping flows\" /></p>\n<h3>The agitation: why your current apps aren't enough</h3>\n<p>We have spent years building \"utility\" apps that handle simple automations. But Shopify is moving fast to pull those features into the core. 
With tools like Shopify Magic and Sidekick, the basic stuff — copywriting, image editing, simple workflows — is being commoditized.</p>\n<p>The real value is moving deeper into the stack. Agents need more than just a product description. They need structured data, real-time inventory logic, and a way to execute a checkout without a browser window. If you are relying on \"fake\" variants or messy metafields, you are breaking the agent's ability to reason about your store.</p>\n<p>When an agent fails to understand your product structure, the buyer gets a \"no results found\" or a hallucinated alternative. That is a lost sale for your client and a failed project for you.</p>\n<h3>The solution: welcome to agentic commerce</h3>\n<p>Agentic commerce is a system where AI agents take over the discovery, comparison, and execution phases of shopping. Shopify is already laying the groundwork for this with three major pillars:</p>\n<ol>\n<li><strong>Shopify Catalog</strong> — a centralized layer that broadcasts your structured data to AI platforms.</li>\n<li><strong>UCP (Universal Commerce Protocol)</strong> — a standard developed with companies like Google to allow agents to handle the entire journey, from discovery to cart and order.</li>\n<li><strong>Agentic Storefronts</strong> — a layer that packages your catalog and checkout into a form AI agents can consume programmatically.</li>\n</ol>\n<p>As developers, our job is shifting. We are no longer just building visual storefronts. We are building \"agent-ready\" infrastructure.</p>\n<p><img src=\"https://ansezz.com/blog/agentic-commerce-shopify/ui-comparison.webp\" alt=\"UI comparison — human vs agent storefront\" /></p>\n<h3>Why schema is the new CSS</h3>\n<p>In the agentic era, your structured data is your most important asset. A perfectly optimized JSON-LD schema or a clean set of Shopify Metaobjects is now more valuable than a fancy hover effect.</p>\n<p>Agents rely on clarity. 
They need to know if a product is compatible with another, what the specific material breakdown is, and exactly when it will arrive. If you want to see how I've helped businesses modernize their digital presence through better architecture, check out some of my <a href=\"https://ansezz.com/work/\">previous work</a>.</p>\n<p>The focus for developers must move toward:</p>\n<ul>\n<li><strong>Native variants only</strong> — stop using \"split products\" or custom Liquid hacks to show variants. Agents need standard <code>ProductVariant</code> records to function.</li>\n<li><strong>Rich Metaobjects</strong> — use these to build knowledge bases that agents can query. Think compatibility tables, brand story, and detailed technical specs.</li>\n<li><strong>GraphQL mastery</strong> — the Shopify Admin and Storefront APIs are the primary languages of agents. If you aren't comfortable with complex GraphQL queries, you are already behind.</li>\n</ul>\n<h3>Building custom agents with MCP and Shopify</h3>\n<p>The most exciting part of this shift is building our own agents. The Model Context Protocol (MCP) allows us to connect AI models to external data sources like the Shopify API.</p>\n<p>Imagine building an \"inventory agent\" for a merchant that doesn't just alert them when stock is low. Instead, it queries recent sales trends via the Admin API, checks lead times from a supplier, and suggests a restock amount — all through a chat interface.</p>\n<p>This is where tools like Laravel and Docker come in handy. We can build custom middleware that acts as an MCP server, exposing specific Shopify \"tools\" to an AI agent. 
This is the type of deep technical work that differentiates a senior engineer from a freelancer who just installs themes.</p>\n<p><img src=\"https://ansezz.com/blog/agentic-commerce-shopify/mcp-architecture.webp\" alt=\"MCP + Shopify architecture diagram\" /></p>\n<h3>The Shopify developer's new roadmap</h3>\n<p>To thrive in the next five years, I recommend focusing on these specific technical areas.</p>\n<p><strong>1. Data hygiene as a service</strong>\nStart offering \"agent-readiness\" audits. Clean up product data, normalize attributes, and ensure all metadata is machine-readable. This is the new SEO.</p>\n<p><strong>2. Headless and API-first builds</strong>\nWhile Liquid isn't dead, headless architectures using Hydrogen and Oxygen are much more aligned with the agentic future. They force you to think in terms of data and APIs rather than just templates.</p>\n<p><strong>3. RAG and vector databases</strong>\nLearn how to use RAG (Retrieval-Augmented Generation) and tools like pgvector. This allows you to build agents that can search through thousands of product reviews or support docs to provide \"expert\" advice to shoppers.</p>\n<p><strong>4. Focus on the \"agentic plan\"</strong>\nShopify is rolling out levels of agentic integration. Stay ahead by understanding how to implement Level 3 — where your store is fully AI-native and provides direct API access for agents to build carts and checkout.</p>\n<p><img src=\"https://ansezz.com/blog/agentic-commerce-shopify/shopping-agent.webp\" alt=\"Customer using an AI shopping agent\" /></p>\n<h3>Final takeaways for the agentic shift</h3>\n<p>The change is happening faster than most of us realize. 
My bet is that within the next few years, AI agents will drive a double-digit share of total orders on major platforms.</p>\n<p>If you want to stay relevant, here is your checklist:</p>\n<ul>\n<li>Move your product logic into native Shopify structures.</li>\n<li>Double down on GraphQL and API integration.</li>\n<li>Start experimenting with MCP servers and local LLMs for internal store operations.</li>\n<li>Remember that the agent is your new \"user.\" Optimize for its understanding first.</li>\n</ul>\n<p>We are moving into a world where commerce is autonomous and frictionless. It is an incredible time to be a developer if you are willing to learn the new protocols. To learn more about my background building high-quality web applications, read more <a href=\"https://ansezz.com/about/\">about me</a>.</p>\n<p>The question is — are you building a store that only humans can see, or are you building a store that the whole world can buy from?</p>\n<p>Are you ready to start building your first AI-native Shopify tool, or are you still holding onto your CSS hacks? Drop a note via <a href=\"https://ansezz.com/contact/\">contact</a> — I love this conversation. 🤘</p>\n",
      "date_published": "2026-05-18T00:00:00.000Z",
      "date_modified": "2026-05-18T00:00:00.000Z",
      "tags": [
        "shopify",
        "agentic-commerce",
        "ai",
        "mcp",
        "hydrogen",
        "graphql",
        "ucp",
        "metaobjects"
      ],
      "image": "https://ansezz.com/blog/agentic-commerce-shopify/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/7-rag-mistakes-production/",
      "url": "https://ansezz.com/blog/7-rag-mistakes-production/",
      "title": "7 mistakes you're making with your production RAG stack (and how to fix them)",
      "summary": "Naive chunking, no reranker, embedding drift, latency blowups, vibe-checking — the seven structural mistakes that turn a slick RAG demo into a production nightmare, and the fixes that actually ship.",
      "content_html": "<p>Getting a RAG (retrieval-augmented generation) demo working is easy. You take a few PDFs, throw them into a vector database like Chroma or Pinecone, and ask a question. It feels like magic.</p>\n<p>But shipping RAG to production is where the magic dies.</p>\n<p>I've seen too many teams launch a feature only to realize that their users are getting irrelevant answers, waiting 10 seconds for a response, or worse, getting hit with \"I don't know\" for questions that are clearly in the documentation. When the \"vibe check\" fails at scale, your users lose trust.</p>\n<p>You're likely making at least one of these seven structural mistakes that turn a cool demo into a production nightmare. I've spent the last few years building <a href=\"https://ansezz.com/work/\">custom web applications</a> and AI systems, and I've had to fix these same leaks in my own stacks.</p>\n<p>Here is how to bridge the gap between \"it works on my machine\" and a production-grade AI system.</p>\n<h2>1. Naive chunking is killing your context</h2>\n<p>Most people start with a simple character-based or token-based splitter. You tell the library to \"give me chunks of 500 tokens with a 50-token overlap.\"</p>\n<p>This is a mistake.</p>\n<p>This \"naive chunking\" treats your data like raw soup. It might cut a sentence in half, split a table in the middle of a row, or separate a coding example from the explanation that precedes it. If the retriever pulls only one of those halves, the LLM has zero chance of giving a correct answer.</p>\n<p><strong>The fix:</strong> use semantic or structural chunking.</p>\n<p>I always recommend chunking based on the actual structure of the document first. Use headers (H1, H2, H3), paragraphs, or even markdown delimiters to ensure related ideas stay together. 
If you're working with complex data, consider recursive character splitting that respects newlines and punctuation before falling back to raw token counts.</p>\n<p><img src=\"https://ansezz.com/blog/7-rag-mistakes-production/chunking.webp\" alt=\"Structural vs naive chunking illustration\" /></p>\n<h2>2. Skipping the reranker step</h2>\n<p>Vector search is great at finding \"roughly similar\" stuff, but it's not a precision instrument. It usually ranks results by cosine similarity, which can be easily fooled by documents that share a similar \"vibe\" but don't actually contain the answer.</p>\n<p>If you're just taking the top 5 results from your vector store and shoving them into your LLM prompt, you're leaving quality on the table.</p>\n<p><strong>The fix:</strong> add a reranking step.</p>\n<p>I look at retrieval as a two-stage process. Stage one is the \"fast and broad\" search where you pull the top 20 or 50 candidates from your vector database. Stage two is using a cross-encoder or a specialized reranking model (like Cohere's Rerank or BGE-Reranker) to score each of those candidates against the query more accurately, then keeping only the best few for the prompt.</p>\n<p>The reranker acts like a bouncer at a club. It doesn't care if a document looks \"okay.\" It only lets in the ones that are actually relevant to the question.</p>\n<p><img src=\"https://ansezz.com/blog/7-rag-mistakes-production/reranker.webp\" alt=\"Two-stage retrieval with reranker\" /></p>\n<h2>3. Ignoring embedding drift and versioning</h2>\n<p>This is the silent killer. I've seen teams upgrade their embedding model from <code>text-embedding-ada-002</code> to <code>text-embedding-3-small</code> without re-indexing their entire database.</p>\n<p>Suddenly, the vectors being generated for new queries don't \"line up\" with the vectors stored in the index. Because both models output 1,536-dimensional vectors, nothing throws an error; the similarity scores just go haywire.
Even worse is when you change the preprocessing logic (like how you format the chunks) but keep the old vectors.</p>\n<p><strong>The fix:</strong> pin your models and version your index.</p>\n<p>Treat your embedding model like a database schema. If you change the model, you must re-index. I always include the model name and version in the metadata of every index I build. This way, if I need to test a new model, I can run the old and new indexes side by side without breaking the production flow. My <a href=\"https://ansezz.com/about/\">experience in cloud infrastructure</a> has taught me that consistency is better than a \"better\" model that doesn't match its data.</p>\n<h2>4. The \"needle in a haystack\" latency problem</h2>\n<p>Everyone wants more context. We see context windows of 128k or even 1M tokens and think, \"great, I'll just give the LLM everything!\"</p>\n<p>This is a trap for two reasons. First, latency — feeding 50k tokens of context into an LLM can make your response time balloon to 20 or 30 seconds. Second, models still struggle with \"lost in the middle\" problems: they tend to ignore information buried in the center of a massive context window.</p>\n<p><strong>The fix:</strong> fit your context to a latency budget.</p>\n<p>I start with a \"latency budget.\" If the user expects a response in under 2 seconds, I can't afford to send 20 chunks. I limit my retrieval to the top 3–5 high-quality chunks and start streaming to the user as soon as the first token is ready.</p>\n<p>If you need more data, consider a multi-step approach: have a cheaper model summarize the retrieved chunks before passing the condensed result to your main model.</p>\n<p><img src=\"https://ansezz.com/blog/7-rag-mistakes-production/latency.webp\" alt=\"Latency budget illustration\" /></p>\n<h2>5. Forgetting about hybrid search</h2>\n<p>Vector search is terrible at finding specific keywords or unique identifiers.
If a user asks for \"error code XF-904,\" a vector search might return documents about \"general error handling\" because the \"vibe\" is similar. But it might miss the one specific document that actually mentions \"XF-904.\"</p>\n<p><strong>The fix:</strong> implement hybrid search.</p>\n<p>I always combine dense vector search with traditional sparse search (like BM25). By merging the two ranked lists with something like Reciprocal Rank Fusion (RRF), which scores each document as the sum of 1/(k + rank) across the lists (k is commonly set to 60), you get the best of both worlds. You get the semantic understanding of vectors and the keyword precision of full-text search. This is non-negotiable for enterprise search or technical documentation.</p>\n<h2>6. Failing to filter by metadata</h2>\n<p>If your RAG system contains documents for different clients, versions, or dates, pure vector search will betray you. You might ask about \"API changes in 2024\" and get results from 2022 because the content is semantically almost identical.</p>\n<p>Relying on the LLM to \"ignore\" the wrong dates in the context is a waste of tokens and a recipe for hallucinations.</p>\n<p><strong>The fix:</strong> use hard metadata filters.</p>\n<p>Before the vector search even happens, apply filters. If you know the user is looking for \"v2\" of your documentation, filter the vector query to only include chunks with <code>version: '2'</code>. This drastically reduces the search space and improves accuracy. I use this heavily when building <a href=\"https://ansezz.com/work/\">Shopify apps</a> where data must be strictly siloed by shop ID.</p>\n<h2>7. Vibe-based evaluation</h2>\n<p>How do you know your RAG stack is getting better? Most devs just ask a few questions, see that the answer looks okay, and ship it.</p>\n<p>This is called \"vibe-checking,\" and it doesn't work.
When you change a prompt or a chunk size, you might improve one answer while breaking ten others you didn't check.</p>\n<p><strong>The fix:</strong> build a golden evaluation set.</p>\n<p>I use the Ragas framework or simple LLM-as-a-judge patterns to run automated evals. I maintain a \"golden set\" of 50–100 questions with ground-truth answers. Every time I change the architecture, I run the eval and look for three metrics:</p>\n<ol>\n<li><strong>Faithfulness</strong> — is the answer actually derived from the context?</li>\n<li><strong>Answer relevance</strong> — does it answer the user's question?</li>\n<li><strong>Context precision</strong> — are the retrieved chunks actually useful?</li>\n</ol>\n<p><img src=\"https://ansezz.com/blog/7-rag-mistakes-production/evals.webp\" alt=\"Evaluation harness illustration\" /></p>\n<h3>Practical takeaways for your stack</h3>\n<ul>\n<li><strong>Start with metadata</strong> — don't let the vector database guess. If you have categories or dates, use them as hard filters.</li>\n<li><strong>Rerank by default</strong> — it's the single biggest quality jump you can make for the lowest effort.</li>\n<li><strong>Monitor retrieval, not just generation</strong> — if your retriever fails, the best LLM in the world can't save you. Log your top-k retrieval results separately.</li>\n<li><strong>Don't over-engineer</strong> — sometimes a simple long-context prompt is better than a complex agentic workflow. Measure before you add complexity.</li>\n</ul>\n<p>Building production RAG is a game of millimeters. 
It's about cleaning your data, pinning your models, and actually measuring what's happening under the hood.</p>\n<p>I've spent years moving from \"it works\" to \"it's reliable.\" If you're struggling with a specific part of your AI pipeline, what's the one thing that's currently keeping you from hitting that \"deploy\" button?</p>\n<p>Drop a comment or <a href=\"https://ansezz.com/contact/\">reach out</a> if you're hitting a wall with your architecture. 🤘</p>\n",
      "date_published": "2026-05-17T00:00:00.000Z",
      "date_modified": "2026-05-17T00:00:00.000Z",
      "tags": [
        "ai",
        "rag",
        "ai",
        "llm",
        "pgvector",
        "reranker",
        "hybrid-search",
        "evals",
        "production"
      ],
      "image": "https://ansezz.com/blog/7-rag-mistakes-production/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/scaling-with-rabbitmq/",
      "url": "https://ansezz.com/blog/scaling-with-rabbitmq/",
      "title": "Scaling with RabbitMQ: why message brokers matter",
      "summary": "Synchronous controllers are how monoliths die. RabbitMQ basics, exchanges and queues, the strangler pattern for going async, idempotent workers, and the Laravel queue setup I use to absorb 100k-row spikes without breaking the login page.",
      "content_html": "<p>The monolith is screaming. Every time a user hits the \"checkout\" button, your server has to generate a PDF, send a welcome email, update the inventory, and ping three different third-party APIs. Your request/response cycle is hanging by a thread. If any of those external services take more than two seconds to respond, your user sees a 504 gateway timeout.</p>\n<p>It starts with a small delay. Then it becomes a bottleneck. Before you know it, you are throwing more RAM at a problem that cannot be solved by bigger hardware. This is the \"monolith wall.\" When everything is synchronous, a single failure in a secondary task brings down the entire user experience.</p>\n<p>I have been in these trenches. I have watched dashboards turn red during a marketing spike because the database was too busy processing background reports to handle new signups. The solution isn't just \"faster code.\" It is a change in how your services talk to each other. It is about <strong>decoupling</strong>. It is about <strong>RabbitMQ</strong>.</p>\n<h2>Why your request path is too crowded</h2>\n<p>In a standard web application, we often fall into the trap of doing too much inside the controller. A user makes a request, and we feel the need to finish every related task before sending back a \"200 OK.\" This is fine for a side project with ten users. For a scaling SaaS, it is a recipe for disaster.</p>\n<p>Think of it like a coffee shop. If the person taking your order also has to grind the beans, froth the milk, and hand-draw the logo on the cup before taking the next order, the line will wrap around the block. The shop fails because the cashier is \"tightly coupled\" to the barista's work.</p>\n<p>To scale, you need a system where the cashier takes the order, writes it on a slip, and hands it off. They are immediately free for the next customer. The work happens \"in the background.\" That slip of paper is your message. 
The counter where they put the slips is your message broker.</p>\n<h2>The RabbitMQ magic: more than just a queue</h2>\n<p>RabbitMQ is an open-source message broker that acts as the \"middleware\" for your architecture. It doesn't just store messages — it routes them with surgical precision.</p>\n<p>At its core, RabbitMQ uses a few key concepts:</p>\n<ul>\n<li><strong>Producers</strong> — your web applications or APIs that create a task.</li>\n<li><strong>Exchanges</strong> — the \"post office\" that decides which queue (or queues) a message should go to, based on the exchange type and its bindings.</li>\n<li><strong>Queues</strong> — the temporary storage where messages sit until they are processed.</li>\n<li><strong>Consumers</strong> — the background workers (often running in <a href=\"https://ansezz.com/work/\">Docker containers</a>) that actually do the heavy lifting.</li>\n</ul>\n<p>By putting RabbitMQ in the middle, your web tier only needs to do one thing: tell RabbitMQ that a task needs to be done. This takes milliseconds. The user gets an instant confirmation, while the heavy work happens whenever your workers are ready.</p>\n<p><img src=\"https://ansezz.com/blog/scaling-with-rabbitmq/architecture-diagram.webp\" alt=\"Architecture diagram of producers, exchanges, queues, and consumers\" /></p>\n<h2>Smoothing out the spikes</h2>\n<p>One of the biggest pains in <a href=\"https://ansezz.com/\">custom web development</a> is handling \"noisy neighbors\" or sudden traffic bursts. If a large enterprise client uploads a 100,000-row CSV for processing, you don't want that to slow down the login page for everyone else.</p>\n<p>With RabbitMQ, those 100,000 rows become 100,000 individual messages in a queue. Your workers will chew through them at a steady pace. If the queue gets too long, you don't need to scale your entire application — you just spin up more worker instances.</p>\n<p>This is called <strong>horizontal scaling</strong>. 
Since the workers are decoupled from the web server, you can scale them independently based on the specific load. If you are on Laravel, its queue system can push these jobs to RabbitMQ, but RabbitMQ is not one of the built-in queue drivers; you add it with a community package such as <code>vladimir-yuldashev/laravel-queue-rabbitmq</code>.</p>\n<h2>How to move from sync to async</h2>\n<p>You don't have to rewrite your entire codebase overnight. I usually recommend the \"strangler pattern.\" Pick one slow, non-critical process. Maybe it is the \"forgot password\" email or an image resize task.</p>\n<p>Here is a simplified look at how you might dispatch a job in a modern PHP environment:</p>\n<pre><code>// instead of sending the email directly\n// $emailService-&gt;sendWelcome($user);\n\n// dispatch a queued job; with a RabbitMQ queue connection configured,\n// this publishes a message to RabbitMQ\nProcessWelcomeEmail::dispatch($user)-&gt;onQueue('high-priority');\n\n// the user gets a response instantly\nreturn response()-&gt;json([\n    'message' =&gt; 'welcome! check your inbox soon.',\n]);\n</code></pre>\n<p>Now, even if your email provider (like SendGrid or Mailgun) is having a bad day, your application stays up. The job waits in RabbitMQ and is retried according to its retry and backoff settings, instead of failing the user's request.</p>\n<h2>Building for the future</h2>\n<p>Moving to a message-broker-first mindset is the first step toward a microservices architecture. Once your monolith starts publishing \"events\" (like <code>order.placed</code> or <code>user.registered</code>), other services can start listening to those events without you ever changing the original code.</p>\n<p>It creates a system that is resilient, observable, and significantly easier to debug. You can look at the RabbitMQ management UI and see exactly how many tasks are pending and how fast they are being processed. No more guessing why the server is slow.</p>\n<p><img src=\"https://ansezz.com/blog/scaling-with-rabbitmq/monitoring-dashboard.webp\" alt=\"Monitoring dashboard mockup of RabbitMQ queue depth and throughput\" /></p>\n<blockquote>\n<p>Already running on GCP? 
The same patterns apply with <a href=\"https://ansezz.com/blog/event-driven-pubsub/\">Google Pub/Sub</a> — pick the broker that matches your hosting stack, not the trend cycle.</p>\n</blockquote>\n<h2>Key takeaways for your next build</h2>\n<ul>\n<li><strong>Don't block the user.</strong> If a task takes more than 100ms, it probably belongs in a queue.</li>\n<li><strong>Decouple early.</strong> Use RabbitMQ to separate your \"thinking\" (web tier) from your \"doing\" (workers).</li>\n<li><strong>Idempotency is key.</strong> Messages can be delivered more than once (for example, when a worker crashes before acknowledging one), so make sure processing the same task twice has no extra side effects, such as a duplicate email or a double charge.</li>\n<li><strong>Monitor your queues.</strong> A massive queue is a leading indicator that you need more workers or that a service is failing.</li>\n</ul>\n<p>Scaling a SaaS isn't about working harder. It is about working smarter by giving your data room to breathe. RabbitMQ is that breathing room.</p>\n<p>What is the slowest part of your application right now? Could it be a background job instead? <a href=\"https://ansezz.com/contact/\">Tell me</a> — I bet we can move it off the request path.</p>\n<p><img src=\"https://ansezz.com/blog/scaling-with-rabbitmq/final-visual.webp\" alt=\"Pop-art final visual of a calm web tier while workers chew through background jobs\" /></p>\n",
      "date_published": "2026-05-16T00:00:00.000Z",
      "date_modified": "2026-05-16T00:00:00.000Z",
      "tags": [
        "architecture",
        "rabbitmq",
        "message-broker",
        "queues",
        "decoupling",
        "scaling",
        "laravel",
        "async",
        "architecture"
      ],
      "image": "https://ansezz.com/blog/scaling-with-rabbitmq/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/event-driven-pubsub/",
      "url": "https://ansezz.com/blog/event-driven-pubsub/",
      "title": "Mastering event-driven architecture with Google Pub/Sub",
      "summary": "Decouple your services or drown in latency. Topics, fan-out, push vs pull, dead-letter queues, idempotent consumers, and the Laravel integration I run on Google Cloud — a practical EDA blueprint from a senior engineer.",
      "content_html": "<p>Building a modern web application usually starts simple. You have a request and you send a response. But as your business grows, that simple flow starts to feel heavy. Maybe you need to send a welcome email, update a CRM, and trigger a data warehouse sync all at once. If you do this synchronously, your users are stuck staring at a loading spinner. If one service fails, the whole request dies. Your system becomes a house of cards.</p>\n<p>This is the problem of tight coupling. Your application logic is tangled like old headphones in a pocket. Every new feature adds more risk and more latency. You want to scale, but your monolithic approach is holding you back. You need a way to let your services talk without being glued together.</p>\n<p>The solution is <strong>event-driven architecture</strong> (EDA). And in the Google Cloud world, the heart of that architecture is <strong>Google Pub/Sub</strong>. It is a globally distributed messaging service that decouples the services that produce events from the services that consume them. It allows you to build systems that are truly scalable, resilient, and ready for the future of AI and big data.</p>\n<h2>Understanding topics and subscriptions</h2>\n<p><img src=\"https://ansezz.com/blog/event-driven-pubsub/architecture-diagram.webp\" alt=\"Architecture diagram of a Pub/Sub topic with multiple subscriptions fanning out\" /></p>\n<p>At its core, Google Pub/Sub is built on two main concepts: <strong>topics</strong> and <strong>subscriptions</strong>. I like to think of a topic as a radio station. It broadcasts information out into the void. It doesn't care who is listening or what they do with the music. It just plays the hits.</p>\n<p>On the other side, you have subscriptions. These are the listeners. A subscription represents a stream of messages from a specific topic. The beauty of this system is the decoupling. The service sending the message (the publisher) only needs to know about the topic. 
It doesn't need to know if there are ten consumers or zero (though if a topic has no subscriptions and no topic retention configured, published messages are simply dropped).</p>\n<p>In a typical <a href=\"https://ansezz.com/\">software development</a> workflow, this is a game changer. When a user signs up on your site, you publish a <code>UserSignedUp</code> event to a topic. Your main app is done. It returns a success message to the user immediately. Meanwhile, various subscribers pick up that event and do their jobs in the background.</p>\n<h2>The power of fan-out</h2>\n<p>One of the most effective patterns in Google Pub/Sub is the fan-out. This is where you publish a single message to a topic, but multiple subscriptions receive a copy of that message.</p>\n<p>Imagine you are running an e-commerce store. When an order is placed, you might have three different services that need to act:</p>\n<ol>\n<li>An inventory service to update stock levels.</li>\n<li>A shipping service to generate a label.</li>\n<li>An analytics service to track revenue.</li>\n</ol>\n<p>Instead of your checkout service calling three different APIs, it sends one message to an <code>order-events</code> topic. Three separate subscriptions (one for inventory, one for shipping, one for analytics) each get their own copy of that order message. They process it at their own pace. If the analytics service is down for maintenance, it doesn't stop the shipping label from being created. Its messages wait in the analytics subscription until the service is back online, for up to the subscription's message retention period (seven days by default).</p>\n<h2>Pull vs push delivery</h2>\n<p><img src=\"https://ansezz.com/blog/event-driven-pubsub/dashboard-mockup.webp\" alt=\"Dashboard mockup comparing pull and push subscription metrics\" /></p>\n<p>When you set up a subscription, you have to decide how you want to receive messages. Google Pub/Sub gives you two main options: <strong>pull</strong> and <strong>push</strong>.</p>\n<p><strong>Push subscriptions</strong> are great for serverless architectures. Google Cloud will literally \"push\" the message to a webhook URL you provide. 
This is perfect for <a href=\"https://ansezz.com/work/\">cloud infrastructure</a> built on Cloud Run or Cloud Functions. It scales automatically and you only pay for what you use. However, you have to make sure your endpoint can handle sudden spikes in traffic.</p>\n<p><strong>Pull subscriptions</strong> work differently. Your consumer service asks Google Pub/Sub for messages when it is ready. This gives you much more control over backpressure. If your worker is busy, it doesn't ask for more work. This is the preferred method for long-running services or when you are using tools like Laravel's queue workers. Pull delivery is generally more robust for heavy processing tasks where you want to fine-tune concurrency.</p>\n<h2>Building resilient systems with DLQs</h2>\n<p><img src=\"https://ansezz.com/blog/event-driven-pubsub/dlq-illustration.webp\" alt=\"Pop-art illustration of a dead-letter queue catching poison messages\" /></p>\n<p>In a distributed system, things will fail. A database might time out or an external API might be down. If a message can't be processed, you don't want to lose it. This is where <strong>Dead Letter Queues</strong> (DLQs) come in.</p>\n<p>In Google Pub/Sub, a DLQ is just another topic (a dead-letter topic) to which messages are forwarded once they have failed to be acknowledged after a configurable number of delivery attempts, between 5 and 100. Instead of retrying forever and clogging up your main pipeline, the \"poison\" message is moved aside.</p>\n<p>I always recommend setting up a DLQ for every critical subscription. It acts as a safety net. You can then build a separate dashboard or a small script to inspect these failed messages, fix the underlying issue, and replay them. It is a professional approach to error handling that prevents data loss and keeps your system moving.</p>\n<h2>Integrating Google Pub/Sub with Laravel</h2>\n<p>For those of us in the <a href=\"https://ansezz.com/\">PHP and Laravel</a> ecosystem, integrating Google Pub/Sub is incredibly smooth. 
While Laravel ships with built-in queue drivers for Redis and SQS, Pub/Sub is not one of them; the official <code>google/cloud-pubsub</code> client library lets you tap into GCP's global scale.</p>\n<p>You can wrap Google Pub/Sub in a custom queue driver, or simply publish to it directly. Here is a quick look at how you might publish a message from a typical service class:</p>\n<pre><code>use Google\\Cloud\\PubSub\\PubSubClient;\n\n$pubsub = new PubSubClient([\n    'projectId' =&gt; 'your-gcp-project-id',\n]);\n\n$topic = $pubsub-&gt;topic('user-events');\n\n$topic-&gt;publish([\n    'data' =&gt; json_encode([\n        'user_id' =&gt; 123,\n        'action' =&gt; 'signup',\n    ]),\n    'attributes' =&gt; [\n        'event_type' =&gt; 'UserSignedUp',\n        'priority' =&gt; 'high',\n    ],\n]);\n</code></pre>\n<p>By using attributes, you can even filter messages at the subscription level. A subscription created with the filter <code>attributes.event_type = \"UserSignedUp\"</code> only receives messages where <code>event_type</code> is <code>UserSignedUp</code>. This saves compute because your worker never even sees the messages it doesn't care about.</p>\n<h2>Monitoring and cost management</h2>\n<p>Monitoring is not an afterthought. It is a requirement. Google Cloud provides deep integration with Cloud Monitoring for Google Pub/Sub. You should keep a close eye on your unacked message count (<code>subscription/num_undelivered_messages</code>) and on the age of the oldest unacked message. If these numbers keep climbing, your subscribers can't keep up with the producers.</p>\n<p>Cost is another factor to watch. Google Pub/Sub is very cheap for low volumes, but as you scale to millions of messages, those bytes add up. Use batching on the publisher side to reduce the number of API calls. Also, be mindful of message retention. If you have enabled topic retention or retention of acknowledged messages for replay, keep those windows no longer than you actually need, since retained messages are billed as storage.</p>\n<h2>Wrap up and takeaways</h2>\n<p>Moving to an event-driven architecture with Google Pub/Sub is a major step toward building senior-level systems. 
It gives you the flexibility to grow your application without it becoming a tangled mess. It is the backbone of many high-performance <a href=\"https://ansezz.com/\">web applications</a> I build for clients today.</p>\n<p>Here are the key takeaways for your next project:</p>\n<ol>\n<li><strong>Start by identifying \"facts\"</strong> in your system (e.g., <code>OrderPlaced</code>) and turn them into events.</li>\n<li><strong>Use the fan-out pattern</strong> to keep your services decoupled and focused on one task.</li>\n<li><strong>Always implement a Dead Letter Queue</strong> to handle failures gracefully.</li>\n<li><strong>Use message attributes for efficient filtering</strong> at the subscription level.</li>\n<li><strong>Design your consumers to be idempotent.</strong> Pub/Sub delivers messages at least once, so receiving the same message twice must not cause errors or double charges.</li>\n</ol>\n<p>Building these kinds of systems takes a bit more planning upfront, but the payoff in stability and scalability is worth every second.</p>\n<p>Are you still using synchronous API calls for everything, or have you started moving toward an event-driven flow? <a href=\"https://ansezz.com/contact/\">Let me know what's stopping you</a> from making the switch.</p>\n",
      "date_published": "2026-05-02T00:00:00.000Z",
      "date_modified": "2026-05-02T00:00:00.000Z",
      "tags": [
        "architecture",
        "pub-sub",
        "event-driven",
        "gcp",
        "google-cloud",
        "laravel",
        "messaging",
        "scalability",
        "architecture"
      ],
      "image": "https://ansezz.com/blog/event-driven-pubsub/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/coolify-docker-saas-hosting/",
      "url": "https://ansezz.com/blog/coolify-docker-saas-hosting/",
      "title": "Effortless SaaS hosting: the Coolify and Docker deployment guide",
      "summary": "Heroku DX, your own server, none of the cloud tax. How Coolify + Docker on a $5 VPS replaces vendor-lock managed platforms with a control plane you actually own — one-click databases, automatic SSL, and zero-downtime deploys.",
      "content_html": "<p>Shipping a SaaS is hard enough without the constant anxiety of a \"surprise\" bill from your hosting provider. I have seen developers start a project on a managed platform only to find that as soon as they add a second team member or cross a certain bandwidth threshold, their costs skyrocket. You are trapped between paying a \"convenience tax\" that eats your margins or spending your entire weekend fighting with Nginx configurations and manual SSH commands. It feels like you are either overpaying for simplicity or overworking for control.</p>\n<p>The agitation grows when you realize that most managed platforms are essentially wrappers around the same open source tools you could run yourself. You are paying for a pretty dashboard and an easy git-push flow. But when you try to leave, you find yourself deep in vendor lock-in. Your databases, your environment variables, and your build pipelines are all tied to a proprietary ecosystem. If the platform goes down or changes its pricing, your business is at the mercy of their support team.</p>\n<p>There is a better way. <strong>Coolify</strong> combined with <strong>Docker</strong> gives you the exact same developer experience as high-end managed platforms but on your own infrastructure. You get the \"one-click\" deploy feel and a beautiful dashboard while keeping 100 percent control over your servers. It is the ultimate setup for a <a href=\"https://ansezz.com/\">modern web application</a> that needs to scale without breaking the bank.</p>\n<h2>Why the cloud is getting more expensive</h2>\n<p>In the early days of a startup, a $20-per-month plan feels reasonable. But as you grow, those costs don't just add up — they multiply. Many platforms now charge per seat. If you have a team of five engineers, you might be paying a hundred dollars a month before you even deploy a single line of code. Then come the usage fees. 
Bandwidth, image optimization, and serverless function execution costs are often opaque and difficult to predict.</p>\n<p>I have worked with clients who moved their entire stack from managed services to a self-hosted Coolify setup and saw their monthly infrastructure bill drop by eighty percent or more. We are talking about moving from $500 a month down to $50 a month while maintaining the same performance and reliability. When you own the server, you own the resources. There are no \"hidden\" charges for extra build minutes or database connections.</p>\n<p><img src=\"https://ansezz.com/blog/coolify-docker-saas-hosting/coolify-architecture.webp\" alt=\"Architecture diagram of Coolify control plane and app nodes\" /></p>\n<h2>What exactly is Coolify?</h2>\n<p>Think of Coolify as an open source, self-hosted version of Heroku or Vercel. It is a control plane that sits on your server and manages everything for you. It handles your deployments, your reverse proxies, your SSL certificates, and your databases. It turns a raw Linux VPS into a powerful hosting platform.</p>\n<p>One of the best parts about Coolify is that it is built on Docker. Every application you deploy is containerized. This means your environment is consistent across development, staging, and production. No more \"it works on my machine\" excuses. If it runs in a container on your laptop, it will run exactly the same way on your Coolify server.</p>\n<h2>The power of Docker containerization</h2>\n<p>Docker is the silent engine that makes this entire workflow possible. Instead of installing PHP, Node.js, or Python directly on your server, you package them into a container image. This approach has several key benefits for SaaS founders:</p>\n<ol>\n<li><strong>Isolation.</strong> Each app runs in its own sandbox. 
With memory limits set, a leak in one app won't take down your entire server.</li>\n<li><strong>Portability.</strong> You can move your containers from Hetzner to DigitalOcean to AWS in minutes.</li>\n<li><strong>Version control.</strong> Your infrastructure is defined as code. You can version your Dockerfile just like your application code.</li>\n<li><strong>Scalability.</strong> Adding more instances of your app is as simple as spinning up another container.</li>\n</ol>\n<p>When I build <a href=\"https://ansezz.com/work/\">custom web solutions</a>, I always prioritize Docker. It ensures that the handoff to the client is seamless. They don't need to worry about the underlying server configuration. They just need a Docker-compatible environment.</p>\n<p><img src=\"https://ansezz.com/blog/coolify-docker-saas-hosting/docker-rack.webp\" alt=\"Pop-art rack of Docker containers stacked in a modular grid\" /></p>\n<h2>Setting up your control plane</h2>\n<p>Getting started is surprisingly simple. You need a fresh VPS with at least two gigabytes of RAM. I typically recommend Ubuntu for the operating system. Once you have your server, you run a single installation command provided by the Coolify documentation.</p>\n<pre><code># the only command you need to bootstrap Coolify\ncurl -fsSL https://cdn.coollabs.io/coolify/install.sh | sudo bash\n</code></pre>\n<p>This script takes care of installing Docker, setting up the Traefik reverse proxy, and launching the Coolify dashboard. Within minutes, you can log in to your own private hosting panel. From there, you can connect your GitHub or GitLab account. Coolify will listen for webhooks and automatically trigger a new build whenever you push code to your main branch.</p>\n<p>It feels magical. You get the same feedback loop as the big-name platforms. You see the build logs in real time. You get preview URLs for your pull requests. 
And you do it all on a $5 VPS.</p>\n<h2>Handling databases and state</h2>\n<p>One of the biggest pain points of self-hosting is managing databases. Nobody wants to manually configure Postgres clusters or worry about backing up Redis instances. Coolify solves this by offering \"one-click\" services.</p>\n<p>You can spin up a Postgres, MySQL, MongoDB, or Redis instance in seconds. Coolify automatically generates secure credentials and provides you with the connection strings. It also handles persistent volumes. This means that even if your container restarts or you update the image, your data stays safe.</p>\n<p>For a SaaS, I usually recommend a dedicated server for your databases if you have high traffic. Coolify makes this easy because it supports multi-server setups. You can have one server acting as your \"control plane\" and several other servers acting as \"worker nodes\" where your apps and databases actually live. (I cover that scaling pattern in more depth in <a href=\"https://ansezz.com/blog/scaling-with-coolify/\">Scaling with confidence: advanced Coolify deployment strategies</a>.)</p>\n<h2>Security and SSL by default</h2>\n<p>Security shouldn't be an afterthought. In the old days, setting up SSL with Let's Encrypt required cron jobs and manual certificates. With Coolify and Traefik, it is entirely automated.</p>\n<p>When you point a domain to your server and add it to your app configuration, Coolify automatically requests and installs an SSL certificate. It also handles the renewal process. Your SaaS is always served over HTTPS without you ever touching a terminal.</p>\n<p>Beyond encryption, Coolify helps you manage your environment variables securely. You don't need to hardcode secrets in your git repository. You can define them in the dashboard, and they are injected into your containers at runtime. 
This is a standard best practice that many developers skip when they are in a rush.</p>\n<p><img src=\"https://ansezz.com/blog/coolify-docker-saas-hosting/ssl-security.webp\" alt=\"Pop-art illustration of automatic SSL certificates and runtime secrets\" /></p>\n<h2>The senior engineer's workflow</h2>\n<p>If you want to do this the \"pro\" way, here is how I structure my deployments.</p>\n<h3>Use a Dockerfile</h3>\n<p>While Coolify can automatically detect many frameworks like Laravel or Node.js, I always recommend writing your own Dockerfile. It gives you total control over the build process. You can optimize your image size by using multi-stage builds. This makes your deployments faster and saves disk space.</p>\n<h3>Leverage Nixpacks</h3>\n<p>If you don't want to write a Dockerfile, Coolify supports Nixpacks. It is a tool developed by Railway that looks at your code and builds an optimized container image automatically. It is incredibly smart and works for almost every major framework.</p>\n<h3>Set up health checks</h3>\n<p>Never deploy without a health check. You want to make sure your app is actually responding before the reverse proxy starts sending traffic to it. Coolify allows you to define a health check endpoint. If the check fails, the old container stays running, and the new one isn't promoted. This is the foundation of zero-downtime deployments.</p>\n<h2>Is self-hosting right for you?</h2>\n<p>Self-hosting isn't for everyone. If you are a solo developer with zero interest in learning how a server works, then paying the \"convenience tax\" might be worth it. Your time is valuable, and if a managed platform saves you five hours of frustration a month, it might pay for itself.</p>\n<p>However, if you are building a real business, you need to understand your stack. Owning your infrastructure is about more than just saving money. It is about autonomy. 
It is about knowing that no matter what happens to a specific provider, you can move your business elsewhere in a heartbeat.</p>\n<p>At <a href=\"https://ansezz.com/about/\">Ansezz</a>, I focus on building robust, scalable systems that empower clients. Whether the work is a complex e-commerce engine on Shopify or a custom Laravel application, the goal is always the same: high performance and long-term stability.</p>\n<h2>Final takeaways for your deployment strategy</h2>\n<p>Hosting shouldn't be a source of stress. By moving to a Docker-based workflow with Coolify, you reclaim your time and your budget. You get the professional features of a top-tier PaaS without the enterprise price tag.</p>\n<p>Here is your checklist for a successful transition:</p>\n<ul>\n<li><strong>Start with a clean VPS</strong> and install Coolify using the official script.</li>\n<li><strong>Containerize your application</strong> using a Dockerfile for maximum control.</li>\n<li><strong>Use one-click services</strong> for your databases and enable automatic backups.</li>\n<li><strong>Set up your domain</strong> and let Coolify handle the SSL certificates.</li>\n<li><strong>Implement health checks</strong> to ensure zero-downtime updates.</li>\n</ul>\n<p>When you stop worrying about the \"how\" of deployment, you can spend more time on the \"what\" of your product. That is where the real value is created.</p>\n<p>What is the one thing stopping you from moving your SaaS to a self-hosted setup today? <a href=\"https://ansezz.com/contact/\">Tell me about it</a> — happy to share war stories.</p>\n",
      "date_published": "2026-04-19T00:00:00.000Z",
      "date_modified": "2026-04-19T00:00:00.000Z",
      "tags": [
        "devops",
        "coolify",
        "docker",
        "self-hosting",
        "devops",
        "saas",
        "deployment",
        "vps",
        "traefik"
      ],
      "image": "https://ansezz.com/blog/coolify-docker-saas-hosting/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/mcp-context-aware-agents/",
      "url": "https://ansezz.com/blog/mcp-context-aware-agents/",
      "title": "MCP and the future of tool-use: building context-aware agents",
      "summary": "The Model Context Protocol kills the era of brittle one-off integrations. Tools, resources, prompts, and the three primitives that let one server talk to any MCP-aware client — with a working TypeScript example you can ship today.",
      "content_html": "<p>The current state of AI agents is a mess of fragmented integrations. Every time I want to give an LLM access to a new data source or a specific tool I find myself writing custom glue code that breaks the moment an API version changes. It is a frustrating cycle of building brittle wrappers. We are effectively forcing highly intelligent models to peer through a keyhole when they should have a wide-open window into our data ecosystems.</p>\n<p>This fragmentation creates a massive technical debt for developers. You spend eighty percent of your time on plumbing and maybe twenty percent on the actual intelligence of the agent. Without a unified way to share context the model often hallucinates because it lacks the grounding of real-time data. It is stuck in a loop of \"I don't have access to that\" or worse \"I'll guess what that data looks like\" — which leads to unreliable outputs and a poor user experience.</p>\n<p>The <strong>Model Context Protocol</strong> (MCP) changes this dynamic entirely. It is an open standard that allows me to build context-aware agents that connect to any data source using a universal language. By standardizing how servers and clients communicate I can focus on building sophisticated logic rather than managing endless API endpoints. It is the missing link in the agentic workflow.</p>\n<h2>Why MCP matters for developers</h2>\n<p><img src=\"https://ansezz.com/blog/mcp-context-aware-agents/architecture-diagram.webp\" alt=\"Architecture diagram showing MCP servers bridging an AI client to many data sources\" /></p>\n<p>I have spent years building custom web applications and one of the biggest hurdles has always been data silos. When I work on <a href=\"https://ansezz.com/work/\">complex technical challenges</a> the goal is usually to make data actionable. Traditional tool-use requires the developer to define every schema and every function call manually for the model. MCP flips this script.</p>\n<p>MCP acts as a bridge. 
It defines a clear boundary between the AI application (the client) and the data sources (the servers). This separation of concerns means I can swap out the underlying model without rebuilding the entire data integration layer. If I move from Claude to another model running in an MCP-capable client, the tools and resources remain the same.</p>\n<p>It also eases the \"context window\" problem. Instead of stuffing a massive document into the prompt, I can expose it as an MCP resource, and the client pulls in only what the task needs, when it needs it. Smaller prompts mean lower token costs. This lets me build agents that are aware of their environment without being overwhelmed by it.</p>\n<h2>The three pillars: tools, resources, and prompts</h2>\n<p><img src=\"https://ansezz.com/blog/mcp-context-aware-agents/context-visual.webp\" alt=\"Visual breakdown of MCP primitives — tools, resources, prompts\" /></p>\n<p>To understand how to build with MCP, I look at its three core primitives. These are the building blocks for any context-aware system.</p>\n<ul>\n<li><strong>Tools</strong> are model-controlled actions. When I give an agent a tool, I am giving it the ability to change the world. This could be writing a file to disk or making a POST request to a Shopify API. The model decides when to call the tool based on the user's intent.</li>\n<li><strong>Resources</strong> are application-controlled data. Think of these as read-only files or database entries that the agent can inspect. Resources provide the necessary grounding. If I am building a support agent, the product documentation would be a resource. The agent can read it to provide accurate answers.</li>\n<li><strong>Prompts</strong> are user-controlled templates. They help guide the interaction. By using MCP prompts, I can standardize how users interact with the agent across different platforms. 
It ensures consistency in how the model interprets tasks.</li>\n</ul>\n<h2>Building your first MCP server</h2>\n<p>I prefer using TypeScript for building MCP servers because of the robust SDK provided by Anthropic. However, the protocol itself is language-agnostic. Here is a simplified look at how I structure a basic server that exposes a weather tool.</p>\n<pre><code>import { Server } from \"@modelcontextprotocol/sdk/server/index.js\";\nimport { StdioServerTransport } from \"@modelcontextprotocol/sdk/server/stdio.js\";\nimport {\n  CallToolRequestSchema,\n  ListToolsRequestSchema,\n} from \"@modelcontextprotocol/sdk/types.js\";\n\n// Identify the server and advertise that it exposes tools\nconst server = new Server(\n  {\n    name: \"weather-server\",\n    version: \"1.0.0\",\n  },\n  {\n    capabilities: {\n      tools: {},\n    },\n  },\n);\n\n// tools/list: describe each tool and its JSON Schema input\nserver.setRequestHandler(ListToolsRequestSchema, async () =&gt; {\n  return {\n    tools: [\n      {\n        name: \"get_weather\",\n        description: \"Get the current weather for a location\",\n        inputSchema: {\n          type: \"object\",\n          properties: {\n            location: { type: \"string\" },\n          },\n          required: [\"location\"],\n        },\n      },\n    ],\n  };\n});\n\n// tools/call: run the requested tool with the model-supplied arguments\nserver.setRequestHandler(CallToolRequestSchema, async (request) =&gt; {\n  if (request.params.name === \"get_weather\") {\n    const location = request.params.arguments?.location;\n    // Arguments arrive as untyped JSON, so validate before using them\n    if (typeof location !== \"string\") {\n      throw new Error(\"location must be a string\");\n    }\n    // Stub: replace with a call to a real weather API\n    return {\n      content: [{ type: \"text\", text: `it is sunny in ${location}` }],\n    };\n  }\n  throw new Error(`unknown tool: ${request.params.name}`);\n});\n\n// stdio transport: the client launches this process and talks over stdin/stdout\nconst transport = new StdioServerTransport();\nawait server.connect(transport);\n</code></pre>\n<p>This snippet illustrates the simplicity of the protocol. I define the tool and how to handle the call; the SDK and the MCP client handle the JSON-RPC plumbing. This modular approach is exactly what I look for when managing <a href=\"https://ansezz.com/work/\">cloud infrastructure</a> or complex backend systems. 
It is clean and scalable.</p>\n<h2>Security and the MCP ecosystem</h2>\n<p><img src=\"https://ansezz.com/blog/mcp-context-aware-agents/dashboard-mockup.webp\" alt=\"Dashboard mockup of an MCP server admin showing fine-grained tool permissions\" /></p>\n<p>Security is a major concern when giving an AI agent access to your data. I have seen many implementations where API keys are hardcoded or permissions are too broad. MCP helps because of its client-server split: the server, not the model, holds the credentials and decides exactly what is exposed.</p>\n<p>The server acts as a gatekeeper. I can implement fine-grained access control at the server level. For example, an MCP server connecting to a database can be restricted to specific tables or to read-only queries. This level of control is essential for enterprise-grade applications.</p>\n<p>The ecosystem is growing rapidly. We are seeing early adoption from major players in the dev tools space. Tools like Zed, Cursor, and Claude Code already integrate MCP, giving their AI assistants better context so developers get better code suggestions. I expect this trend to accelerate as more developers see the value of standardized context.</p>\n<h2>Practical steps for getting started</h2>\n<p>If you are a developer looking to dive into MCP, I recommend following these steps.</p>\n<ol>\n<li><strong>Explore the existing MCP servers</strong> on GitHub. There are already servers for file system access and SQLite databases. See how they are structured.</li>\n<li><strong>Pick a simple data source you use every day.</strong> It could be your Obsidian notes or a local directory of markdown files. Build a basic server to expose these as resources.</li>\n<li><strong>Use a client like Claude Desktop</strong> to test your server. See how the model interacts with your data. Adjust your tool and resource descriptions until the model uses them reliably.</li>\n<li><strong>Compose multiple MCP servers</strong> once you are comfortable. 
Imagine an agent that can read your calendar and then write a draft email based on your upcoming meetings.</li>\n</ol>\n<p>MCP is more than just a new protocol. It is a fundamental shift in how we build AI applications. It moves us away from the era of \"black box\" agents and toward a world of transparent and context-aware assistants. I am excited to see how this technology evolves and how it will transform our development workflows.</p>\n<p><strong>How are you planning to use MCP in your next project?</strong> <a href=\"https://ansezz.com/contact/\">Drop me a line</a> — happy to swap notes on real-world MCP server design.</p>\n",
      "date_published": "2026-04-05T00:00:00.000Z",
      "date_modified": "2026-04-05T00:00:00.000Z",
      "tags": [
        "ai",
        "mcp",
        "anthropic",
        "claude",
        "agentic-ai",
        "typescript",
        "tool-use",
        "context"
      ],
      "image": "https://ansezz.com/blog/mcp-context-aware-agents/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/agentic-workflows-vibe-coding/",
      "url": "https://ansezz.com/blog/agentic-workflows-vibe-coding/",
      "title": "Vibe coding and the architectural shift to agentic workflows",
      "summary": "MCP, agentic loops, and intent-based engineering. How vibe coding becomes a real architecture pattern when AI stops being a chat sidebar and starts owning stateful loops against your tools. The practical Laravel + MCP stack I run today.",
      "content_html": "<p>I've spent the last decade building Laravel applications, managing Docker clusters, and fine-tuning Shopify stores. For most of that time, \"coding\" meant one thing: translating a business requirement into a specific syntax that a machine could execute. It was a manual, linear process of writing line by line, debugging stack traces, and managing state.</p>\n<p>But recently, the ground has shifted. We're moving away from the era of \"writing code\" and into the era of \"orchestrating intent.\"</p>\n<p>This transition — often playfully called <strong>vibe coding</strong> — is more than just a meme. It represents a fundamental architectural shift in how we build software, moving from sequential instruction to agentic loops powered by protocols like <strong>MCP</strong> (Model Context Protocol).</p>\n<h2>The friction of the manual syntax</h2>\n<p>The traditional development lifecycle is riddled with invisible friction. You have an idea (the \"vibe\"), you break it down into tasks, and then you spend 80% of your time fighting with syntax, configuration, and boilerplate.</p>\n<p>In a standard <strong>Laravel</strong> environment, even a simple feature — say, an automated reporting tool — requires you to set up routes, controllers, service classes, and database migrations. You are the compiler. You are the architect. You are the labor.</p>\n<p>The problem is that our human cognitive load is being consumed by the \"how\" rather than the \"what.\" We get stuck in the weeds of <strong>PHP</strong> version compatibility or <strong>Docker</strong> networking issues, losing sight of the actual user value. This manual micromanagement doesn't scale as fast as the demands of modern business.</p>\n<h2>The agitation of the \"black box\" assistant</h2>\n<p>When AI first entered the scene with basic autocomplete, it felt like a shortcut. But it wasn't a solution. 
We ended up with what I call \"the Copilot paradox\": the AI suggests code, but you still have to copy-paste it, test it, find the error, and feed it back to the AI.</p>\n<p>It's a broken feedback loop. The AI is a \"black box\" that doesn't actually know your system. It doesn't know your database schema, your <strong>MCP</strong> servers, or your deployment status on <strong>Coolify</strong>. You are still the manual bridge between the AI's logic and your local environment.</p>\n<p>This creates a new kind of fatigue. Instead of writing code, you're now a high-speed code reviewer, constantly context-switching between your editor and a chat interface. This isn't \"vibe coding\" — it's just accelerated manual labor.</p>\n<p><img src=\"https://ansezz.com/blog/agentic-workflows-vibe-coding/architecture-diagram.webp\" alt=\"Architecture diagram showing the broken loop between developer, AI, and tools\" /></p>\n<h2>The solution: agentic workflows and MCP</h2>\n<p>True <strong>vibe coding</strong> isn't about being lazy; it's about shifting your role to that of a high-level system architect. This becomes possible through <strong>agentic workflows</strong> — systems that don't just \"complete text\" but \"execute tasks in loops.\"</p>\n<p>The breakthrough here is the <strong>Model Context Protocol (MCP)</strong>, an open standard introduced by Anthropic. MCP acts as a \"USB port\" for AI. Instead of you manually pasting context into a chat, the AI application uses an MCP client to talk directly to your tools through MCP servers: your <strong>PostgreSQL</strong> database, your <strong>Slack</strong> channels, or your <strong>GitHub</strong> repositories.</p>\n<h3>The shift from chains to loops</h3>\n<p>In a traditional chain, you give a prompt and get a result. 
In an agentic loop, the architecture looks like this:</p>\n<ol>\n<li><strong>Intent.</strong> You describe the outcome (\"build a Laravel dashboard for my Shopify sales\").</li>\n<li><strong>Reasoning.</strong> The AI (like <strong>Claude</strong>) determines it needs to see the schema.</li>\n<li><strong>Action.</strong> It uses an <strong>MCP</strong> tool to query the database.</li>\n<li><strong>Observation.</strong> It sees a missing table and decides to create a migration.</li>\n<li><strong>Correction.</strong> If the migration fails, it reads the error and fixes it itself.</li>\n</ol>\n<p>I call this \"intent-based engineering.\" You aren't writing the migration — you are approving the architectural decision.</p>\n<p><img src=\"https://ansezz.com/blog/agentic-workflows-vibe-coding/agentic-loop.webp\" alt=\"Bento grid visual of an agentic loop with intent, reasoning, action, observation, correction\" /></p>\n<h2>Implementing the agentic stack</h2>\n<p>As an engineer who values quality, I don't just let the \"vibe\" take over without guardrails. Here is how I'm currently structuring my agentic stack using <strong>Laravel</strong> and <strong>AI</strong>.</p>\n<h3>1. Defined MCP servers</h3>\n<p>I build small, dedicated <strong>MCP</strong> servers that expose only the necessary tools to the AI. This keeps the context window clean and the security tight.</p>\n<pre><code>// Conceptual MCP tool definitions in a PHP environment.\n// MCP describes each tool by name, description, and a JSON Schema inputSchema.\npublic function defineTools(): array\n{\n    return [\n        [\n            'name' =&gt; 'get_database_schema',\n            'description' =&gt; 'Retrieves the structure of the Laravel application tables.',\n            // No arguments; (object) [] makes json_encode emit {} instead of []\n            'inputSchema' =&gt; ['type' =&gt; 'object', 'properties' =&gt; (object) []],\n        ],\n        [\n            'name' =&gt; 'run_artisan_command',\n            'description' =&gt; 'Executes an allow-listed artisan command.',\n            'inputSchema' =&gt; [\n                'type' =&gt; 'object',\n                'properties' =&gt; [\n                    // Illustrative allow-list: never pass arbitrary strings to Artisan\n                    'command' =&gt; ['type' =&gt; 'string', 'enum' =&gt; ['migrate:status', 'route:list']],\n                ],\n                'required' =&gt; ['command'],\n            ],\n        ],\n    ];\n}\n</code></pre>\n<h3>2. 
Stateful loops</h3>\n<p>Instead of one-off chats, I use tools like <strong>Cursor</strong>, <strong>Claude Code</strong>, or <strong>Windsurf</strong> that maintain a stateful connection to my local file system. This allows the agent to \"see\" the impact of its changes in real-time, just like a human developer would.</p>\n<h3>3. The human-in-the-loop (HITL)</h3>\n<p>The most important part of the architecture is the review gate. Even with agentic loops, the human architect must sign off on the \"plan\" before the \"action\" phase. This ensures the <strong>PHP</strong> logic follows clean architecture principles rather than just \"making it work.\"</p>\n<h2>The takeaway for the modern founder</h2>\n<p>If you're a founder or a CTO, the takeaway is simple: stop hiring for syntax and start hiring for system design. The technical barrier is collapsing, but the architectural stakes are higher than ever.</p>\n<ul>\n<li><strong>Embrace the vibe.</strong> Focus on the intent and the user experience.</li>\n<li><strong>Invest in infrastructure.</strong> Build the <strong>MCP</strong> connections and the data pipelines that allow AI to be effective.</li>\n<li><strong>Think in loops.</strong> Design your internal processes so that AI can iterate autonomously, reducing your bottleneck role.</li>\n</ul>\n<p>At <a href=\"https://ansezz.com/\">Ansezz</a>, I'm not just building apps anymore — I'm building agent-ready ecosystems. Whether it's a complex <strong>Shopify</strong> integration or a custom <strong>SaaS</strong>, I ensure the architecture is ready for the agentic future.</p>\n<p>The code might be generated, but the vision is entirely yours.</p>\n<p><strong>Are you ready to stop writing code and start orchestrating your intent?</strong> <a href=\"https://ansezz.com/contact/\">Get in touch</a> — let's design your agent stack together.</p>\n",
      "date_published": "2026-03-22T00:00:00.000Z",
      "date_modified": "2026-03-22T00:00:00.000Z",
      "tags": [
        "architecture",
        "vibe-coding",
        "agentic-ai",
        "mcp",
        "anthropic",
        "claude",
        "laravel",
        "architecture"
      ],
      "image": "https://ansezz.com/blog/agentic-workflows-vibe-coding/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/why-your-rag-is-failing/",
      "url": "https://ansezz.com/blog/why-your-rag-is-failing/",
      "title": "Why your RAG implementation is failing in production (and how to fix it)",
      "summary": "Vector-only retrieval is the silent killer of production RAG. Hybrid search with BM25, reciprocal rank fusion, smarter chunking, re-rankers, and an evaluation harness — the production checklist that turns a flaky demo into a reliable system.",
      "content_html": "<p>You built a RAG (Retrieval-Augmented Generation) demo. On a local machine, with a handful of PDF files, it looked convincing. The answers felt coherent. The system appeared capable.</p>\n<p>Then you pushed it to production.</p>\n<p>That is usually where the illusion breaks.</p>\n<p>Users start reporting that the LLM is \"hallucinating\" when the real issue is retrieval. Obvious answers go missing even though they exist in the documentation. Irrelevant chunks surface because they are semantically adjacent, not actually useful.</p>\n<p>If your RAG system feels unreliable in production, you are not dealing with a model problem first. You are dealing with a retrieval design problem. Most production RAG systems fail because they rely too heavily on vector search and confuse a strong demo with a robust system.</p>\n<p>I've spent a lot of time building custom AI solutions at <a href=\"https://ansezz.com/\">Ansezz</a>, and one pattern keeps showing up: <strong>a demo proves possibility, but production demands precision.</strong></p>\n<h2>The \"vector noise\" trap</h2>\n<p>The philosophical shift from demo RAG to production RAG is simple: in a demo, semantic resemblance often feels good enough. In production, \"good enough\" is where failures begin.</p>\n<p>Embeddings are useful. They let us map text into vectors and retrieve by meaning rather than exact wording. That is powerful. But semantic similarity is not the same thing as retrieval accuracy.</p>\n<p><strong>The problem.</strong> Vector search is strong at finding related concepts, but weak at handling specificity.</p>\n<p>If a user searches for <em>\"Project-X-99 deployment logs,\"</em> a vector search might return documents about \"Project-A deployment\" or \"logging best practices\" because they are semantically close. 
It can miss the exact identifier \"X-99\" because that string carries little semantic weight in a high-dimensional space.</p>\n<p><strong>The agitation.</strong> Once retrieval drifts, the LLM inherits the drift. The model cannot reason its way out of missing or irrelevant context. You end up paying for tokens that produce confident but unhelpful answers, and users lose trust for a reason that often sits one layer below the model itself.</p>\n<h2>The solution: hybrid search (vector + BM25)</h2>\n<p>The move from demo RAG to production RAG usually starts with one realization: meaning alone is not enough. You need semantic retrieval and lexical precision working together. This is <strong>hybrid search</strong>.</p>\n<p><img src=\"https://ansezz.com/blog/why-your-rag-is-failing/hybrid-search-workflow.webp\" alt=\"Hybrid search workflow combining vector and BM25 retrieval\" /></p>\n<h3>What is BM25?</h3>\n<p>BM25 (Best Matching 25) is the standard lexical ranking method behind classic search systems. It does not try to infer meaning. It scores exact term matches: a query term counts for more when it appears often in a document (with diminishing returns and a correction for document length) and when it is rare across the collection.</p>\n<h3>Why you need both</h3>\n<ul>\n<li><strong>Vector search</strong> handles synonyms, multilingual queries, and conceptual matching.</li>\n<li><strong>BM25 search</strong> handles exact matches, IDs, SKUs, product codes, and technical jargon.</li>\n</ul>\n<p>Production systems need both because user questions are rarely pure meaning or pure keyword. They are usually a mix of the two.</p>\n<p><img src=\"https://ansezz.com/blog/why-your-rag-is-failing/vector-vs-keyword.webp\" alt=\"Side-by-side comparison of vector search vs keyword search results\" /></p>\n<h2>Technical insight: reciprocal rank fusion (RRF)</h2>\n<p>When you run two different retrieval strategies, you also create a new design problem: how should they be combined?</p>\n<p>A practical answer is <strong>Reciprocal Rank Fusion (RRF)</strong>. 
It is simple, reliable, and does not require you to pretend that scores from different retrieval systems are directly comparable.</p>\n<p><img src=\"https://ansezz.com/blog/why-your-rag-is-failing/rrf-code-logic.webp\" alt=\"Annotated code snippet for reciprocal rank fusion logic\" /></p>\n<p><strong>The logic breakdown:</strong></p>\n<ol>\n<li><strong>Assign a score.</strong> For every document returned by either search method, calculate a new rank-based score.</li>\n<li><strong>The formula.</strong> <code>score = 1 / (k + rank)</code>, where <code>rank</code> starts at 1. The constant <code>k</code> (commonly set to 60) flattens the curve, so a single first-place finish does not automatically outweigh a document that ranks well in both lists.</li>\n<li><strong>Sum it up.</strong> If a document appears in both the vector and BM25 result sets, its scores are added together.</li>\n<li><strong>Sort.</strong> The documents with the highest combined scores are passed to the LLM.</li>\n</ol>\n<p>Here's the minimal PHP version I drop into a Laravel service:</p>\n<pre><code>// $resultSets: one list of document IDs per retriever, each ordered best-first.\n// Returns [docId =&gt; fused score], highest score first.\nfunction reciprocalRankFusion(array $resultSets, int $k = 60): array\n{\n    $scores = [];\n\n    foreach ($resultSets as $results) {\n        // array_values() gives 0-based positions; +1 turns them into 1-based ranks\n        foreach (array_values($results) as $rank =&gt; $docId) {\n            $scores[$docId] = ($scores[$docId] ?? 0.0) + 1 / ($k + $rank + 1);\n        }\n    }\n\n    // Highest fused score first; arsort() keeps the docId keys\n    arsort($scores);\n\n    return $scores;\n}\n</code></pre>\n<p>This gives you a cleaner retrieval layer. If a document is semantically relevant <em>and</em> lexically precise, it moves toward the top for a reason.</p>\n<h2>The \"second pass\": using re-rankers</h2>\n<p>Hybrid search is a strong retrieval foundation, but production RAG usually needs one more layer of judgment.</p>\n<p>If you want more precise results, add a <strong>re-ranker</strong>.</p>\n<p>A re-ranker such as Cohere Rerank or BGE-Reranker is a cross-encoder model that evaluates the query and the document together. That matters because relevance is relational. It is not just about what a document contains. 
It is about whether that document answers <em>this</em> question.</p>\n<ul>\n<li><strong>Step 1.</strong> Retrieve the top 50 results using hybrid search.</li>\n<li><strong>Step 2.</strong> Pass those 50 results through a re-ranker.</li>\n<li><strong>Step 3.</strong> Send only the top 5 re-ranked results to your LLM.</li>\n</ul>\n<p>This reduces context stuffing and improves the quality of what reaches the model. In practice, it is one of the clearest differences between a RAG demo and a production RAG system that behaves consistently.</p>\n<h2>Your production RAG checklist</h2>\n<h3>The problem</h3>\n<p>A RAG system can feel impressive in a demo and still be structurally weak in production.</p>\n<h3>The agitation</h3>\n<p>Once real users, messy documents, and ambiguous queries enter the picture, weak retrieval turns the LLM into expensive guesswork. That is when confidence and correctness start drifting apart.</p>\n<h3>The solution</h3>\n<p>To move from demo RAG to production RAG, I focus on a few non-negotiables:</p>\n<ul>\n<li><strong>Stop relying on vector-only search.</strong> Add a BM25 layer.</li>\n<li><strong>Implement RRF.</strong> Fuse lexical and semantic retrieval without overcomplicating score calibration.</li>\n<li><strong>Tune chunking deliberately.</strong> If chunks are too small, they lose context. If they are too large, they add noise. I usually find 512–1024 tokens with a 10–15% overlap works well for technical documentation.</li>\n<li><strong>Add a re-ranker.</strong> Refine the final candidate set before anything reaches the LLM.</li>\n<li><strong>Evaluate with RAGAS.</strong> Measure faithfulness and relevance instead of trusting intuition.</li>\n</ul>\n<p>Building AI is easy. Building <em>reliable</em> AI is hard. 
It requires a deeper understanding of retrieval, ranking, and context design, not just the ability to connect an API.</p>\n<p>If you are looking to build a high-performance SaaS or need help modernizing your digital presence with AI that actually works, check out what I do at <a href=\"https://ansezz.com/work/\">Ansezz</a>. I specialize in solving these exact types of technical problems.</p>\n<p><strong>Where does your own system still behave like a demo when it should be behaving like production?</strong> <a href=\"https://ansezz.com/contact/\">Get in touch</a> — I read every war story.</p>\n",
      "date_published": "2026-03-08T00:00:00.000Z",
      "date_modified": "2026-03-08T00:00:00.000Z",
      "tags": [
        "ai",
        "rag",
        "ai",
        "vector-search",
        "bm25",
        "hybrid-search",
        "re-ranker",
        "production"
      ],
      "image": "https://ansezz.com/blog/why-your-rag-is-failing/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/monolith-to-microservices/",
      "url": "https://ansezz.com/blog/monolith-to-microservices/",
      "title": "From monolith to micro-services: a senior dev's guide to pragmatic scaling",
      "summary": "Skip the big-bang rewrite. The strangler fig pattern, anti-corruption layers, Docker-first migration, and GKE/Coolify operations — how I peel services off a Laravel monolith one endpoint at a time without breaking revenue.",
      "content_html": "<p>Your monolith is a ticking time bomb and every feature you add makes the explosion more inevitable.</p>\n<p>I have seen it happen a dozen times. A startup begins with a clean Laravel or Rails app. It is fast. It is easy. It is productive. Then the team grows. The code base swells. Suddenly, a simple change to the checkout logic breaks the authentication system. Deployments that used to take five minutes now take forty. You are not scaling your business anymore — you are managing technical debt.</p>\n<p>This is the point where most developers start dreaming of micro-services. They imagine a world where every service is isolated and deployments are instant. But the reality is often a nightmare. If you do it wrong, you end up with a distributed monolith. You get all the complexity of networking with none of the benefits of isolation.</p>\n<p>The solution is not a \"big bang\" rewrite. It is pragmatic scaling. I use the <strong>strangler fig pattern</strong> to move from monoliths to micro-services without losing my mind or my job.</p>\n<h2>The problem with the big bang</h2>\n<p><img src=\"https://ansezz.com/blog/monolith-to-microservices/strangler-fig.webp\" alt=\"Strangler fig pattern diagram — new services wrap the legacy monolith\" /></p>\n<p>When a monolith becomes too heavy, the immediate reaction is to want to scrap it. I have seen companies spend two years on a rewrite only to ship a product that has half the features of the original. The business dies while the engineers play with new toys.</p>\n<p>The monolith is not your enemy. It is just a phase. The real problem is coupling. When every part of your app knows too much about every other part, you cannot move. You are stuck in a web of dependencies. If you try to jump straight into micro-services, you will likely just port those dependencies into a network layer. 
Now, instead of a function call failing, you have a 500 error across a network socket.</p>\n<p>I prefer a slower, more deliberate approach. I focus on high-value extractions. I look for the parts of the app that hurt the most. Is the image processing service slowing down the web server? Is the reporting engine locking up the database? Those are your first candidates for micro-services.</p>\n<h2>The strangler fig pattern in practice</h2>\n<p>Martin Fowler named this pattern after the strangler fig, a plant that grows around a host tree. It starts as a small vine and eventually replaces the host entirely. In software, this means building new features, and gradually re-implementing old ones, as services while the old monolith keeps running.</p>\n<p>The process starts with an API gateway or a load balancer. I use <a href=\"https://www.nginx.com/\">Nginx</a>, or a load balancer on <a href=\"https://cloud.google.com/\">Google Cloud</a> protected by <a href=\"https://cloud.google.com/armor\">Cloud Armor</a>, to route traffic. If a request comes for <code>/api/v1/orders</code>, it goes to the new service. Everything else goes to the old monolith.</p>\n<p>This allows me to test the new service in production with real traffic while the monolith acts as a safety net. If the new service fails, I just flip the routing back. I do not have to migrate everything at once. I can migrate one endpoint at a time.</p>\n<h2>Containerization with Docker</h2>\n<p><img src=\"https://ansezz.com/blog/monolith-to-microservices/docker-snippet.webp\" alt=\"Annotated Dockerfile snippet for a Laravel micro-service\" /></p>\n<p>I would not attempt micro-services without <a href=\"https://www.docker.com/\">Docker</a>. I treat every service as a black box. The monolith might be running on an old version of PHP, while the new service is a lean Go binary or a modern Laravel instance. Containers let both run side by side.</p>\n<p>I start by containerizing the monolith. Even if it stays as a monolith for another year, putting it in a container forces me to define its environment. 
It makes the infrastructure reproducible.</p>\n<pre><code># a simplified example of a service container\nFROM php:8.3-fpm\n\n# install the PostgreSQL PDO driver before copying code, so this layer stays cached\nRUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends \\\n    libpq-dev \\\n    &amp;&amp; docker-php-ext-install pdo_pgsql \\\n    &amp;&amp; rm -rf /var/lib/apt/lists/*\n\nWORKDIR /app\n# copy the application last: code changes only rebuild this layer\nCOPY . /app\n\n# php-fpm serves FastCGI on port 9000; put Nginx (or another proxy) in front\nEXPOSE 9000\nCMD [\"php-fpm\"]\n</code></pre>\n<p>Once the monolith and the new services are containerized, I can deploy them to a platform like <a href=\"https://cloud.google.com/kubernetes-engine\">Google Kubernetes Engine (GKE)</a>. This is where the real power of micro-services comes in. I can scale the order service to fifty instances during a sale while keeping the blog service at two.</p>\n<h2>Communication and the anti-corruption layer</h2>\n<p><img src=\"https://ansezz.com/blog/monolith-to-microservices/routing-diagram.webp\" alt=\"Routing diagram showing API gateway dispatching between monolith and new services\" /></p>\n<p>The hardest part of micro-services is not the code. It is the data. Your monolith has a single database. Your micro-services should each have their own. But how do they talk?</p>\n<p>I use an <strong>anti-corruption layer (ACL)</strong>. When I extract a service, I do not let it reach back into the monolith's database. That would be cheating. Instead, I put a translation interface between them. If the new service needs user data, it asks the monolith via a private API or through a messaging service like <a href=\"https://cloud.google.com/pubsub\">Google Pub/Sub</a>.</p>\n<p>This keeps the new service clean. It does not care about the messy database schema of the legacy app. It only cares about the data it receives through the ACL. Eventually, when the user logic is also migrated, I just update the ACL to point to the new user service.</p>\n<h2>Cloud infrastructure and DevOps</h2>\n<p><img src=\"https://ansezz.com/blog/monolith-to-microservices/cloud-infrastructure.webp\" alt=\"Cloud infrastructure overview — GKE, Pub/Sub, managed databases\" /></p>\n<p>Scaling a monolith usually means buying a bigger server. 
Scaling micro-services means managing a fleet. I rely heavily on cloud-native tools to manage the complexity.</p>\n<p>I use <a href=\"https://www.terraform.io/\">Terraform</a> to manage my infrastructure as code. Staging and production are built from the same definitions, so they do not drift apart. If I need a new database for a service, I define it in code. I do not click around in a dashboard.</p>\n<p>On the DevOps side, I use tools like <a href=\"https://github.com/features/actions\">GitHub Actions</a> or <a href=\"https://coolify.io/\">Coolify</a> for deployments. Every service has its own pipeline. If I update the checkout service, I only deploy the checkout service. As long as its API contract holds, the rest of the system is unaffected.</p>\n<h2>The hidden costs of micro-services</h2>\n<p>I would be lying if I said this was all sunshine and rainbows. Micro-services come with a \"complexity tax.\" You now have to deal with distributed logging, service discovery, and eventual consistency.</p>\n<p>I tell my clients that they should only move to micro-services when the pain of the monolith is greater than the cost of the complexity tax. If your team is three people and your app is simple, stay in the monolith. You will move faster.</p>\n<p>But if you are hitting walls every day and your developers are afraid to touch the code, it is time to start strangling.</p>\n<h2>Pragmatic takeaways for your next move</h2>\n<ul>\n<li><strong>Start with an API gateway</strong> to handle routing.</li>\n<li><strong>Containerize your monolith first</strong> to normalize the environment.</li>\n<li><strong>Use the strangler fig pattern</strong> to migrate one domain at a time.</li>\n<li><strong>Build an anti-corruption layer</strong> to keep new services clean.</li>\n<li><strong>Invest in infrastructure as code</strong> early on.</li>\n<li><strong>Only split when the monolith starts to hurt</strong> your productivity.</li>\n</ul>\n<p>Migration is a marathon, not a sprint. 
I have spent months on a single extraction just to make sure it was perfect. The goal is not to have micro-services. The goal is to have a system that can grow with your business.</p>\n<p>Have you ever tried a \"big bang\" rewrite only to regret it six months later? <a href=\"https://ansezz.com/contact/\">Tell me about it</a> — I collect these stories for a reason.</p>\n",
      "date_published": "2026-02-22T00:00:00.000Z",
      "date_modified": "2026-02-22T00:00:00.000Z",
      "tags": [
        "architecture",
        "monolith",
        "micro-services",
        "scaling",
        "strangler-fig",
        "docker",
        "kubernetes",
        "devops",
        "laravel"
      ],
      "image": "https://ansezz.com/blog/monolith-to-microservices/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/laravel-multi-tenancy/",
      "url": "https://ansezz.com/blog/laravel-multi-tenancy/",
      "title": "Laravel multi-tenancy: how I built a scalable SaaS architecture",
      "summary": "Single DB vs multi-DB, global scopes that stop data leaks, stancl/tenancy in production, isolated storage, automated migrations, and the Docker + Google Cloud setup I run for high-trust SaaS clients.",
      "content_html": "<p>I still remember the panic of my first SaaS launch. I was watching the logs as the third customer signed up. Suddenly I realized I had no idea if customer A could see customer B's data. That realization is a rite of passage for every developer.</p>\n<p>Building a software-as-a-service (SaaS) platform is a massive technical challenge. The biggest hurdle is almost always data isolation. You need to ensure that every tenant feels like they have the whole application to themselves. If you get this wrong early on, it will haunt you forever.</p>\n<p>I have spent years refining a scalable architecture using Laravel. It is the most robust way I have found to handle multiple customers on a single codebase. Here is how I approach the architecture to ensure security and scalability.</p>\n<h2>Why simple database structures fail SaaS</h2>\n<p>Most developers start with a single database. They add a <code>user_id</code> or <code>team_id</code> to every table. It works fine for the first ten users. Then the complexity grows.</p>\n<p>You start adding more relationships. You forget to add a <code>where</code> clause in one obscure controller. Suddenly one customer is seeing another customer's private invoices. This is a catastrophic failure. It kills trust and can end your business overnight.</p>\n<p>Performance also becomes a nightmare. As your database grows to millions of rows, queries get slower. Indexing helps, but it does not solve the fundamental problem of data bloat. You need a strategy that isolates data while keeping your infrastructure manageable.</p>\n<p><img src=\"https://ansezz.com/blog/laravel-multi-tenancy/architecture.webp\" alt=\"Tenant isolation architecture diagram\" /></p>\n<h2>Choosing your isolation strategy</h2>\n<p>When I build custom web solutions, I always start by choosing between two main paths. You either go with a single database or a multi-database setup.</p>\n<p>The <strong>single database</strong> approach uses a shared schema. 
Every row has a <code>tenant_id</code>. It is cheap to run and easy to update. I recommend this for startups where costs need to stay low. You can manage thousands of small tenants on a single server this way.</p>\n<p>The <strong>multi-database</strong> approach is the gold standard for enterprise. Every customer gets their own database. This offers the strongest isolation and makes per-tenant backups and restores easy. If one database crashes, the others stay online. I use this for high-value clients who have strict compliance needs.</p>\n<p>I often use the <code>stancl/tenancy</code> package for Laravel. It is the most flexible tool in the ecosystem. It allows you to switch between these strategies as your business grows. You can find out more about how I handle these complex technical challenges on the <a href=\"https://ansezz.com/work/\">Ansezz work page</a>.</p>\n<h2>Building with global scopes</h2>\n<p>The secret to sleeping well at night is automation. I never rely on my memory to filter data. Instead, I use Laravel global scopes.</p>\n<p>A global scope automatically adds a filter to every Eloquent query on a model. It ensures that <code>Customer::all()</code> only returns customers for the current tenant. It happens behind the scenes, so you cannot forget it.</p>\n<p><img src=\"https://ansezz.com/blog/laravel-multi-tenancy/code-snippet.webp\" alt=\"Annotated code snippet showing a BelongsToTenant trait\" /></p>\n<p>I create a <code>BelongsToTenant</code> trait. I apply this trait to every model that needs isolation. It handles the filtering and automatically sets the <code>tenant_id</code> when a new record is created. 
It is a simple safeguard against exactly the leak described earlier: a forgotten <code>where</code> clause. Keep in mind that global scopes only apply to Eloquent queries, so <code>DB::table()</code> calls and raw SQL still need an explicit tenant filter.</p>\n<p>Here's what that trait looks like in practice:</p>\n<pre><code>&lt;?php\n\nnamespace App\\Concerns;\n\nuse App\\Models\\Tenant;\nuse App\\Scopes\\TenantScope;\nuse Illuminate\\Database\\Eloquent\\Relations\\BelongsTo;\n\ntrait BelongsToTenant\n{\n    // Laravel calls boot{TraitName}() automatically when the model boots\n    protected static function bootBelongsToTenant(): void\n    {\n        // filter every Eloquent query on this model by the current tenant\n        static::addGlobalScope(new TenantScope());\n\n        // stamp new records with the current tenant's id, if one is bound\n        static::creating(function ($model): void {\n            if (! $model-&gt;tenant_id &amp;&amp; app()-&gt;bound('tenant')) {\n                $model-&gt;tenant_id = app('tenant')-&gt;id;\n            }\n        });\n    }\n\n    public function tenant(): BelongsTo\n    {\n        return $this-&gt;belongsTo(Tenant::class);\n    }\n}\n</code></pre>\n<p>You also need to isolate your cache and your file storage. If two tenants upload a file named <code>logo.png</code>, they should not overwrite each other. I configure Laravel to use tenant-specific prefixes for all storage paths. This creates a true \"sandbox\" environment for every tenant.</p>\n<h2>Scaling on the cloud</h2>\n<p>Your architecture is only as good as the infrastructure it runs on. I usually deploy my Laravel apps using Docker and cloud providers like Google Cloud or AWS.</p>\n<p>Containerization is key. It allows me to scale the application horizontally. When traffic spikes, I can spin up more instances of the web server. Because the multi-tenancy logic is handled at the application level, the infrastructure stays clean.</p>\n<p><img src=\"https://ansezz.com/blog/laravel-multi-tenancy/infrastructure.webp\" alt=\"Cloud infrastructure diagram for a multi-tenant Laravel deployment\" /></p>\n<p>I use managed database services like Google Cloud SQL. They handle the heavy lifting of backups and scaling. For multi-database setups, I use automated scripts to provision new databases whenever a customer signs up. 
This \"infrastructure as code\" approach ensures that scaling is a button click away.</p>\n<p>If you are looking to modernize your digital presence with a scalable cloud setup, you can see my full range of services at <a href=\"https://ansezz.com/\">ansezz.com</a>.</p>\n<h2>The tenant switcher experience</h2>\n<p>The final piece of the puzzle is the user interface. Customers need a seamless way to move between different accounts if they own multiple businesses.</p>\n<p>I build clean dashboards using Vue. The frontend communicates with the Laravel backend via GraphQL APIs. This setup allows for a very fast and reactive user experience. The tenant switcher is always accessible and shows the user exactly which context they are working in.</p>\n<p><img src=\"https://ansezz.com/blog/laravel-multi-tenancy/tenant-switcher.webp\" alt=\"Tenant switcher UI mockup in a Vue dashboard\" /></p>\n<p>I focus on making the transition between tenants feel instant. I cache the tenant configuration in the frontend to avoid unnecessary API calls. It is these small details that separate a basic app from a polished, professional product.</p>\n<h2>My SaaS architecture checklist</h2>\n<p>If you are starting a new project today, here are the steps I recommend you follow.</p>\n<ol>\n<li><strong>Choose your package early.</strong> I prefer <code>stancl/tenancy</code> because of its flexibility. It handles subdomain routing and database switching out of the box.</li>\n<li><strong>Implement global scopes immediately.</strong> Do not wait until you have ten models. Add the trait to every tenant-owned model and make it a standard part of your workflow.</li>\n<li><strong>Automate your deployment.</strong> Use Docker from day one. It keeps local development much closer to production. This avoids the \"it works on my machine\" bugs that plague SaaS launches.</li>\n<li><strong>Plan for data migration.</strong> As you update your schema, you need a way to run migrations across hundreds of databases. 
Tools like <code>stancl/tenancy</code> have built-in commands for this, such as <code>php artisan tenants:migrate</code>.</li>\n<li><strong>Keep it simple.</strong> Don't build a multi-database setup if you only have five users. Start small and scale as the revenue grows. My goal is always to deliver exceptional results without over-complicating the technical stack.</li>\n</ol>\n<p>Are you building a SaaS or looking to migrate your current app to a multi-tenant structure? What is the biggest technical hurdle you are facing right now? <a href=\"https://ansezz.com/contact/\">Get in touch</a> — I'd love to hear about it. 🤙</p>\n",
      "date_published": "2026-02-08T00:00:00.000Z",
      "date_modified": "2026-02-08T00:00:00.000Z",
      "tags": [
        "laravel",
        "multi-tenancy",
        "saas",
        "architecture",
        "stancl-tenancy",
        "docker",
        "postgres"
      ],
      "image": "https://ansezz.com/blog/laravel-multi-tenancy/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/ai-vs-traditional-development/",
      "url": "https://ansezz.com/blog/ai-vs-traditional-development/",
      "title": "AI integration vs traditional development: which is better for your business in 2026?",
      "summary": "Speed, control, or a hybrid path? When AI-assisted development pays off, when traditional engineering is non-negotiable, and the hybrid workflow I recommend most often to founders and tech leads.",
      "content_html": "<p>Most teams are asking the wrong question.</p>\n<p>The real problem is not \"AI or traditional development?\" — it is what kind of speed, control, and risk your business can actually afford.</p>\n<p>I see this mistake a lot. Teams chase AI because it feels faster. Or they reject it because it feels messy. Then they end up with the same problem from both directions. Rushed systems with weak foundations, or polished systems that ship too late.</p>\n<p>The better move is to understand where each approach wins, where it breaks, and where a hybrid model gives you the best return.</p>\n<p>Both approaches work. They just solve different problems.</p>\n<h2>AI-powered development: the speed revolution</h2>\n<p>AI integration changes how I build software. Instead of manually writing every repetitive piece, I can use tools that understand context, generate scaffolding, speed up testing, and remove a lot of the drag from delivery.</p>\n<h3>The core advantage: speed</h3>\n<p>This is where AI shines.</p>\n<p>For standard workflows, admin panels, CRUD-heavy systems, internal tools, and first-pass prototypes, AI can cut a serious amount of time. 
What used to take weeks can often be reduced to days if the scope is clear and the review process is tight.</p>\n<p>That speed usually comes from a few places:</p>\n<ul>\n<li><strong>Automated code generation.</strong> Prompts turn into usable boilerplate and feature drafts.</li>\n<li><strong>Faster testing.</strong> AI can draft test cases and edge-case coverage quickly.</li>\n<li><strong>Debugging support.</strong> It helps narrow down likely failures faster.</li>\n<li><strong>Documentation help.</strong> It can turn rough implementation details into clean internal docs.</li>\n</ul>\n<p><img src=\"https://ansezz.com/blog/ai-vs-traditional-development/workflow.webp\" alt=\"Architecture visual showing AI, traditional, and hybrid engineering workflow\" /></p>\n<h3>Who benefits most from AI development</h3>\n<p>I would lean toward AI-heavy workflows when speed matters more than perfect customization on day one.</p>\n<p>That usually means:</p>\n<ul>\n<li>Startups trying to reach product-market fit before the runway gets tight.</li>\n<li>Small teams that need leverage more than headcount.</li>\n<li>Businesses shipping standard features that already follow familiar patterns.</li>\n<li>Teams where non-technical stakeholders want to contribute to discovery and prototyping.</li>\n</ul>\n<p>In those cases, AI acts like a power tool. It does not replace the builder. It just makes the first cut much faster.</p>\n<h3>The trade-offs to consider</h3>\n<p>This is where a lot of teams get burned.</p>\n<p>AI is fast at common patterns. It is weaker at deep product nuance, strange business rules, and systems that need careful long-term architecture. If you skip review, you can ship something that looks finished but behaves like a prototype wearing a production costume.</p>\n<p>That means I would not treat AI output as truth. 
I would treat it as a draft.</p>\n<h2>Traditional development: the control champion</h2>\n<p>Traditional development is slower, but it gives me tighter control over how the system is shaped.</p>\n<p>This is the path I trust most when the business rules are complex, the architecture matters, or the cost of failure is high. Every part of the system is designed with intent instead of inferred from a prompt.</p>\n<h3>The core advantage: control</h3>\n<p>Traditional development is better when the software needs precision.</p>\n<p>That matters for:</p>\n<ul>\n<li><strong>Complex enterprise systems</strong> — lots of moving parts and layered business logic.</li>\n<li><strong>Regulated industries</strong> — where auditability and traceability matter.</li>\n<li><strong>Mission-critical applications</strong> — where downtime or bad behavior is expensive.</li>\n<li><strong>Custom architectures</strong> — where the product does not fit common patterns.</li>\n</ul>\n<h3>The predictability factor</h3>\n<p>One underrated benefit of traditional development is predictability.</p>\n<p>Manual design, explicit code reviews, architecture decisions, and planned testing give me a clearer picture of trade-offs. It is like building with blueprints instead of assembling furniture from a photo.</p>\n<p>That slower process often saves time later because fewer assumptions make it into production.</p>\n<h3>The time investment reality</h3>\n<p>The downside is obvious.</p>\n<p>Manual coding, reviews, debugging, refactoring, and testing take time. 
You need stronger engineering talent, and you need the discipline to keep standards high when deadlines start squeezing the team.</p>\n<p>Traditional development gives more control, but you pay for it in time and cost.</p>\n<h2>Head-to-head comparison</h2>\n<table>\n<thead>\n<tr>\n<th>Factor</th>\n<th>AI-Powered Development</th>\n<th>Traditional Development</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><strong>Development speed</strong></td>\n<td>30–50% faster completion</td>\n<td>Standard industry timelines</td>\n</tr>\n<tr>\n<td><strong>Cost structure</strong></td>\n<td>Lower long-term expenses</td>\n<td>Higher labor costs</td>\n</tr>\n<tr>\n<td><strong>Team requirements</strong></td>\n<td>Mixed skill levels acceptable</td>\n<td>Requires senior expertise</td>\n</tr>\n<tr>\n<td><strong>Customization level</strong></td>\n<td>Limited by AI training data</td>\n<td>Full customization</td>\n</tr>\n<tr>\n<td><strong>Quality assurance</strong></td>\n<td>AI-drafted tests, human review still needed</td>\n<td>Manual review processes</td>\n</tr>\n<tr>\n<td><strong>Risk management</strong></td>\n<td>Variable based on AI reliability</td>\n<td>Predictable risk factors</td>\n</tr>\n<tr>\n<td><strong>Scalability</strong></td>\n<td>Rapid scaling through automation</td>\n<td>Scales with team growth</td>\n</tr>\n</tbody>\n</table>\n<h2>Making the right choice for your business</h2>\n<h3>Choose AI integration when</h3>\n<p>Choose AI when your bottleneck is delivery speed and the work is close to known patterns.</p>\n<p>That usually applies when:</p>\n<ul>\n<li>Your market window is tight.</li>\n<li>You are building standard business apps like portals, dashboards, e-commerce flows, or content systems.</li>\n<li>Your team wants quick prototypes before committing engineering time.</li>\n<li>Your budget is better spent on iteration than on deep custom engineering from day one.</li>\n</ul>\n<h3>Choose traditional development when</h3>\n<p>Choose traditional development when the cost of being wrong is 
higher than the cost of being slower.</p>\n<p>That usually means:</p>\n<ul>\n<li>The app needs a unique architecture.</li>\n<li>Compliance and audit trails are mandatory.</li>\n<li>Reliability matters more than release velocity.</li>\n<li>Your team wants direct ownership of code quality and system design.</li>\n</ul>\n<h3>The hybrid strategy: best of both worlds</h3>\n<p>This is the option I recommend most often.</p>\n<p>The strongest teams do not treat this like a religion. They use AI where speed helps and switch to traditional engineering where judgment matters.</p>\n<p>A practical hybrid setup looks like this:</p>\n<ul>\n<li>Generate boilerplate and first drafts with AI, then review and reshape manually.</li>\n<li>Use AI for prototyping, then rebuild critical paths carefully.</li>\n<li>Automate repetitive testing tasks, but keep human review for logic and architecture.</li>\n<li>Use AI to accelerate docs and support material, while keeping final technical decisions human-led.</li>\n</ul>\n<p>The hybrid model works because it treats AI like a junior accelerator, not like an autopilot.</p>\n<h2>Implementation guidelines</h2>\n<h3>Starting with AI integration</h3>\n<p>If I were introducing AI into an existing team, I would start small.</p>\n<ul>\n<li>Begin with low-risk features.</li>\n<li>Define a review process for all AI-generated code.</li>\n<li>Choose tools that fit the current workflow.</li>\n<li>Train the team on prompting, verification, and code quality checks.</li>\n</ul>\n<h3>Maintaining traditional excellence</h3>\n<p>If the team stays mostly traditional, I would protect the basics.</p>\n<ul>\n<li>Invest in strong senior review.</li>\n<li>Keep documentation current.</li>\n<li>Use clear architecture standards.</li>\n<li>Avoid rushing complex work into fragile implementations.</li>\n</ul>\n<h3>Building hybrid capabilities</h3>\n<p>If the goal is balance, then the workflow matters more than the tools.</p>\n<ul>\n<li>Identify which tasks are repetitive and 
safe to automate.</li>\n<li>Keep humans responsible for architecture and business logic.</li>\n<li>Add quality gates before merge and deployment.</li>\n<li>Measure outcomes, not just speed.</li>\n</ul>\n<h2>The future-ready approach</h2>\n<p>The teams that will win in 2026 are not the ones that blindly choose AI or reject it.</p>\n<p>They are the ones that know where speed is enough, where control is non-negotiable, and where a hybrid model gives them leverage without chaos.</p>\n<p>That is the real solution.</p>\n<p>Use AI to remove friction. Use traditional engineering to protect the parts that matter. Combine both when the business needs speed and reliability at the same time.</p>\n<p><img src=\"https://ansezz.com/blog/ai-vs-traditional-development/workspace.webp\" alt=\"Final workspace visual of a senior developer desk in pop-art comic style\" /></p>\n<p>Your development strategy should match your business goals, not the trend cycle. If you had to choose today, which matters more for your next product: speed, control, or a hybrid path? <a href=\"https://ansezz.com/contact/\">Reach out</a> — I'd love to hear which side you're leaning toward.</p>\n",
      "date_published": "2026-01-25T00:00:00.000Z",
      "date_modified": "2026-01-25T00:00:00.000Z",
      "tags": [
        "architecture",
        "ai",
        "strategy",
        "hybrid",
        "business",
        "decision-making",
        "productivity"
      ],
      "image": "https://ansezz.com/blog/ai-vs-traditional-development/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/scaling-with-coolify/",
      "url": "https://ansezz.com/blog/scaling-with-coolify/",
      "title": "Scaling with confidence: advanced Coolify deployment strategies",
      "summary": "Move past the single-server trap. Multi-node Coolify setups, zero-downtime rolling deploys with health checks, dedicated build servers, managed databases, and GitHub Actions wiring — production-grade self-hosting without a DevOps team.",
      "content_html": "<p>You've finally moved your apps off that messy manual VPS and into Coolify. It feels great. Everything is in one place. But then the traffic starts to spike. You realize that hosting your production database, three web apps, and a memory-heavy build process on a single $10 DigitalOcean droplet is a recipe for disaster.</p>\n<p>The \"single server trap\" is real. It's fine for a side project or a quick MVP. But when you're building for real customers, you need more than just a dashboard. You need a strategy. You're worried about what happens when that one server hits 100% CPU or when a simple deployment takes your whole site down for five minutes.</p>\n<p>I've spent the last decade scaling web applications and building custom solutions at <a href=\"https://ansezz.com/\">Ansezz</a>. I've seen self-hosted setups crumble under pressure because they lacked the right architecture. The good news is that Coolify is more than capable of handling high-scale workloads. You just need to know how to pull the right levers.</p>\n<p>In this guide, I'm going to show you how to move from a basic setup to a production-grade infrastructure using advanced Coolify strategies. We're talking multi-server nodes, zero-downtime deployments, and offloading the heavy lifting so your apps stay snappy.</p>\n<h2>Moving beyond the single-server monolith</h2>\n<p>The biggest mistake I see engineers make is keeping everything on one node. When your build process starts, it eats up CPU and RAM. Your web app starts to lag. Your database gets starved for resources.</p>\n<p>The solution is to decouple your \"control plane\" from your \"workloads.\"</p>\n<p>In a professional setup, you want one small server dedicated solely to running the Coolify instance itself. This is your mission control. 
Then, you add separate \"app servers\" where your actual containers live.</p>\n<p><img src=\"https://ansezz.com/blog/scaling-with-coolify/multi-server.webp\" alt=\"Multi-server architecture bento grid showing control plane and app nodes\" /></p>\n<p>To do this in Coolify, you go to the <strong>Servers</strong> tab and add a new server via SSH. Once it's connected, you can choose which server a specific resource should be deployed to. This makes scaling out straightforward. If one server is getting full, you just spin up another one, add it to Coolify, and point your next app there.</p>\n<p>This separation of concerns is a core pillar of what we do when building <a href=\"https://ansezz.com/work/\">custom web applications</a>. It keeps one overloaded or failing server from taking down your entire digital presence.</p>\n<h2>The art of the zero-downtime deploy</h2>\n<p>Nothing kills user trust faster than a \"502 Bad Gateway\" every time you push a small CSS fix. By default, many self-hosted setups just kill the old container and start the new one. There's a gap. That gap is where your users get frustrated.</p>\n<p>Coolify handles this beautifully with \"rolling updates,\" but it only works if you tell it how to check the health of your app.</p>\n<p>If you don't configure health checks, Traefik (the reverse proxy Coolify uses) might start sending traffic to your new container before the app inside it has even finished booting up.</p>\n<p><img src=\"https://ansezz.com/blog/scaling-with-coolify/health-checks.webp\" alt=\"Dashboard visualization of health check monitoring across rolling deploy\" /></p>\n<p>Here is the workflow I use to keep deploys free of downtime:</p>\n<ol>\n<li><strong>Create a health endpoint.</strong> In your Laravel, Vue, or Node app, create a simple route like <code>/healthz</code>. 
It should return a 200 status code only when the app is ready to serve traffic.</li>\n<li><strong>Configure Coolify.</strong> In your application settings, go to the <strong>Health Check</strong> section. Set the path to <code>/healthz</code> and the interval to something like 5 seconds.</li>\n<li><strong>The rollout.</strong> When you hit deploy, Coolify starts the new container. Traefik waits until that <code>/healthz</code> endpoint returns a success before it switches the traffic over. The old container is only killed after the new one is confirmed live.</li>\n</ol>\n<p>This is a non-negotiable step for any SaaS or e-commerce store where every second of downtime equals lost revenue.</p>\n<h2>Offloading the heavy lifting</h2>\n<p>If you're building a modern app with Docker, the build process can be incredibly resource-intensive. Compiling assets, installing npm packages, and building images can spike your server usage to the moon.</p>\n<p>If you're running that build on the same server that's trying to serve your customers, they're going to feel the slowdown.</p>\n<p>Advanced users leverage a <strong>dedicated build server</strong>.</p>\n<p>You can designate a high-performance, high-CPU server in Coolify specifically for builds. When you trigger a deployment, Coolify builds the image on that server and pushes it to a container registry, and your production app server pulls the finished image from there.</p>\n<p>Your production server never feels a thing. It just gets a fresh, ready-to-run image.</p>\n<h3>What about the database?</h3>\n<p>While Coolify makes it easy to click \"New Database,\" running your production Postgres or MySQL inside a Docker container on the same server as your app is risky.</p>\n<p>For production workloads, I almost always recommend using an external managed database like AWS RDS or Google Cloud SQL. 
It handles backups and point-in-time recovery for you, and it makes scaling much less painful.</p>\n<p>In Coolify, you simply provide the connection string as an environment variable. This keeps your state (the data) separate from your compute (the app). If your app server goes up in flames, your data is safe on a managed platform.</p>\n<h2>Automation at scale with CI/CD</h2>\n<p>Manual deployments are for hobbyists. For a professional workflow, you want your code to move from GitHub to production without you touching a single button in the Coolify UI.</p>\n<p>I prefer using GitHub Actions for this. While Coolify has a great GitHub App integration, using Actions gives you more control. You can run your test suite, lint your code, and only if everything passes, trigger the Coolify deployment via a webhook.</p>\n<p><img src=\"https://ansezz.com/blog/scaling-with-coolify/cicd-pipeline.webp\" alt=\"Pop-art illustration of a CI/CD pipeline flowing from GitHub to Coolify\" /></p>\n<p>Here is a snippet of how I usually structure a simple test-then-deploy pipeline in a <code>.github/workflows/deploy.yml</code> file:</p>\n<pre><code>name: deploy to production\non:\n  push:\n    branches:\n      - main\n\njobs:\n  test:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      # placeholder commands: swap in your project's own install, lint, and test steps\n      - run: composer install --no-interaction --prefer-dist\n      - run: php artisan test\n\n  deploy:\n    needs: test # only runs if the test job succeeds\n    runs-on: ubuntu-latest\n    steps:\n      - name: trigger coolify webhook\n        # COOLIFY_TOKEN is an assumed secret holding a Coolify API token\n        run: |\n          curl --fail \"${{ secrets.COOLIFY_WEBHOOK_URL }}\" -H \"Authorization: Bearer ${{ secrets.COOLIFY_TOKEN }}\"\n</code></pre>\n<p>It's simple and direct. Because the <code>deploy</code> job needs <code>test</code>, code that fails its checks never reaches your servers. It keeps your development cycle clean and your mental health intact.</p>\n<h2>Advanced configuration tips</h2>\n<p>Managing a multi-server setup requires a bit of extra care. Here are a few practical takeaways to keep in your back pocket:</p>\n<ul>\n<li><strong>Resource limits.</strong> Always set CPU and RAM limits in Coolify for each application. 
This prevents a single \"leaky\" container from hogging all the resources and crashing the whole server.</li>\n<li><strong>External backups.</strong> If you do choose to run databases inside Coolify, use the S3-compatible backup feature. I personally use Backblaze B2 or Cloudflare R2 for this. Never rely on local backups alone.</li>\n<li><strong>Docker pruning.</strong> Coolify is good at cleaning up, but it's worth checking your disk space regularly. Large images can eat up your SSD fast. Set up a cron job or use Coolify's built-in cleanup settings.</li>\n<li><strong>Monitoring.</strong> Use a tool like Better Stack or GlitchTip to monitor your endpoints. Coolify tells you if the container is running, but an external monitor tells you if a human can actually use the site.</li>\n</ul>\n<h2>Scaling is a journey</h2>\n<p>Scaling isn't about having the most expensive hardware. It's about having a system that is predictable and resilient. Coolify gives us the tools to act like a giant tech company without the massive overhead of a dedicated DevOps team.</p>\n<p>By splitting your servers, mastering health checks, and automating your builds, you move from \"hoping it works\" to \"knowing it scales.\"</p>\n<p>I've helped dozens of founders and tech leads navigate these waters. Whether you're building a Shopify app or a complex Laravel SaaS, the principles are the same. Keep your compute separate from your data, and your builds separate from your traffic.</p>\n<p>Have you ever had a deployment go sideways because a build process crashed your production server? What's your current \"war story\" from the world of self-hosting? <a href=\"https://ansezz.com/contact/\">Drop me a line</a>.</p>\n",
      "date_published": "2026-01-11T00:00:00.000Z",
      "date_modified": "2026-01-11T00:00:00.000Z",
      "tags": [
        "devops",
        "coolify",
        "deployment",
        "docker",
        "self-hosting",
        "ci-cd",
        "scaling"
      ],
      "image": "https://ansezz.com/blog/scaling-with-coolify/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/shopify-liquid-vs-headless/",
      "url": "https://ansezz.com/blog/shopify-liquid-vs-headless/",
      "title": "Shopify Liquid vs. headless: choosing the right stack for scale",
      "summary": "Hydrogen looks great on paper. Liquid still ships more revenue per week. A practical decision framework for picking between Liquid, headless Hydrogen, and the messy middle — based on what your team can actually operate long-term.",
      "content_html": "<p>Your store is making money, but every new change feels like surgery on a moving car.</p>\n<p>Traffic is up. Orders are up. Pressure is up. But the theme layer is starting to fight back. Simple feature requests turn into fragile hacks. App scripts pile up. Mobile performance gets softer with every install. What used to feel fast and convenient now feels like a ceiling.</p>\n<p>This is where a lot of brands start looking at headless. The promise sounds great. More freedom. Better performance. Cleaner frontend architecture. But this is also where expensive mistakes happen. A headless rebuild can solve real problems — or create a second system your team now has to babysit forever.</p>\n<p>The real question is not which stack sounds more modern. The real question is which stack fits your stage, your team, and your operational reality.</p>\n<h2>The Liquid reality: why it still wins for most stores</h2>\n<p>Liquid is still the default winner for a reason. It is tightly integrated with Shopify, fast to ship, and much easier to maintain than a custom headless frontend.</p>\n<p>For most growth-stage stores, Liquid gives the best speed-to-market. Online Store 2.0, with JSON templates and sections on every page, made theme architecture much more modular than it used to be, so a solid theme setup can go a long way before it becomes a real blocker.</p>\n<p>This is the part people underestimate. A good Liquid stack is like a well-tuned production van. It is not flashy, but it moves product reliably, gets updated quickly, and does not need a pit crew every week.</p>\n<p>The problem shows up when business needs outgrow theme constraints. Maybe the storefront needs deeply custom interactions. Maybe product data has to come from an ERP, a PIM, and a custom backend at the same time. Maybe merchandising logic is getting too complex for theme code to stay clean. 
That is usually the point where headless becomes a serious conversation, not just a trendy one.</p>\n<h2>Performance is the primary driver</h2>\n<p><img src=\"https://ansezz.com/blog/shopify-liquid-vs-headless/performance-metrics.webp\" alt=\"Performance metrics dashboard comparing Liquid vs headless storefront\" /></p>\n<p>Performance is the most common reason teams start looking at headless. And yes, milliseconds matter. On e-commerce storefronts, small delays compound into lower conversion rates, weaker ad efficiency, and a worse mobile experience.</p>\n<p>A well-built headless stack using Hydrogen (Shopify's React-based framework) and Oxygen (Shopify's hosting platform for Hydrogen) can push performance much further than a typical theme setup. You get finer control over rendering, data loading, caching, and frontend execution. That opens the door to a lower LCP (Largest Contentful Paint) and a more responsive storefront.</p>\n<p>But this is where the hype needs a reality check. Headless does not automatically mean faster. It only gets faster when the frontend architecture is actually good.</p>\n<p>If the team over-fetches GraphQL data, hydrates too much JavaScript, or ships a bloated component tree, the custom storefront can end up slower than a decent Liquid theme. That happens more often than people admit.</p>\n<p>So yes, headless can be a performance win. But only if the implementation is disciplined.</p>\n<h2>The hidden complexity of going headless</h2>\n<p><img src=\"https://ansezz.com/blog/shopify-liquid-vs-headless/architecture-diagram.webp\" alt=\"Architecture diagram of a headless Shopify stack with custom frontend and integrations\" /></p>\n<p>Going headless means splitting commerce from presentation. Shopify still handles products, checkout, and admin workflows. Your custom frontend handles the customer experience.</p>\n<p>That sounds clean on paper. 
In practice, it means you now own two systems instead of one.</p>\n<p>You need to manage hosting, deployments, caching, GraphQL queries, error handling, observability, and integration behavior across services. Every app in the stack has to be checked for API compatibility. If an app only works by injecting snippets into a theme, that convenience is gone.</p>\n<p>This is the part that catches teams off guard. In Liquid, many features feel plug-and-play. In headless, the same features often become custom integration work. Reviews, loyalty, search, subscriptions, personalization, analytics. All of it may need extra engineering.</p>\n<p>The easiest way to think about it is this: Liquid is renting a well-equipped shop. Headless is designing your own building. You get more freedom, but now plumbing, wiring, and maintenance are your problem too.</p>\n<h2>Time and money: the real cost of freedom</h2>\n<p><img src=\"https://ansezz.com/blog/shopify-liquid-vs-headless/timeline-comparison.webp\" alt=\"Timeline comparison illustration showing Liquid vs headless project costs\" /></p>\n<p>Time and budget usually decide this faster than architecture opinions do.</p>\n<p>A custom Liquid theme can often be launched in weeks. That means faster testing, faster iteration, and lower implementation cost. If the business mainly needs merchandising flexibility, better UX, and cleaner performance hygiene, Liquid usually gives a better return.</p>\n<p>Headless is a different category of investment. The build takes longer. The team needs stronger frontend engineering. The integration surface is bigger. And maintenance does not stop after launch.</p>\n<p>This is the important part. With headless, you are not just paying for a redesign. You are taking on a software product that needs ongoing care. Deployments, monitoring, API changes, dependency updates, caching strategy, and developer ownership all become part of normal operations.</p>\n<p>For many stores, that trade-off is not worth it yet. 
More freedom is great, but freedom is expensive when the business does not fully need it.</p>\n<h2>My decision framework: which one should you choose?</h2>\n<p>I like to reduce this decision to a few practical questions.</p>\n<ol>\n<li><strong>Is the current theme actually blocking revenue?</strong> If conversion is healthy and the main pain is taste or minor flexibility, Liquid is probably still the right call.</li>\n<li><strong>Does the team have the engineering depth to own a custom storefront long term?</strong> Headless is not a one-time build. It is an operating model.</li>\n<li><strong>Are the integration requirements genuinely complex?</strong> If the storefront needs to combine Shopify with custom product logic, external systems, or a bespoke application layer, headless starts to make more sense.</li>\n<li><strong>Is performance a code problem or an architecture problem?</strong> Many slow stores do not need headless. They need script cleanup, better image handling, less app bloat, and tighter theme code.</li>\n</ol>\n<p>The simplest stack that solves the real bottleneck is usually the best stack. 
Building a spaceship to cross the street is still a bad decision.</p>\n<h2>Practical takeaways for your next move</h2>\n<p>If you are stuck between Liquid and headless, start here:</p>\n<ul>\n<li><strong>Audit site speed before making architectural decisions.</strong> If LCP is acceptable, the problem may not be the stack.</li>\n<li><strong>Remove app bloat.</strong> A lot of slow Liquid stores are just carrying too many scripts and too much leftover code.</li>\n<li><strong>Map every integration.</strong> List what works natively, what depends on theme injection, and what would break in headless.</li>\n<li><strong>Estimate ownership cost, not just build cost.</strong> Launch is only the first invoice.</li>\n<li><strong>Look at Hydrogen if the business really needs headless</strong> — but keep the scope tight.</li>\n</ul>\n<p>Choosing a stack is a long-term commitment. The right answer is not the most advanced one. It is the one that solves the current bottleneck without creating three new ones.</p>\n<p>Are you dealing with real technical limits in Shopify, or just feeling the pull of a more customizable stack? <a href=\"https://ansezz.com/contact/\">Drop me a line</a> — happy to do a 30-minute architecture sanity check.</p>\n",
      "date_published": "2025-12-28T00:00:00.000Z",
      "date_modified": "2025-12-28T00:00:00.000Z",
      "tags": [
        "shopify",
        "liquid",
        "headless",
        "hydrogen",
        "performance",
        "architecture",
        "ecommerce"
      ],
      "image": "https://ansezz.com/blog/shopify-liquid-vs-headless/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/picking-the-right-rag-stack/",
      "url": "https://ansezz.com/blog/picking-the-right-rag-stack/",
      "title": "Picking the right RAG stack: vector databases for AI engineering",
      "summary": "pgvector, Pinecone, Weaviate, Qdrant — a 2026 field guide. Which vector store to pick for your AI app, why hybrid search matters, and how to ship without painting yourself into a corner.",
      "content_html": "<p>You built a cool chatbot. It works great on your local machine until you feed it 50,000 internal documents. Suddenly, it's hallucinating. It's slow. It's pulling data from three years ago when you specifically asked for last week's report.</p>\n<p>Building a Retrieval-Augmented Generation (RAG) system sounds like a weekend project. But once you move past the \"hello world\" stage, you hit the database wall. Choosing the wrong vector store early on is a silent killer. It leads to high latency, soaring cloud costs, and a painful migration six months down the line when your data outgrows your infrastructure.</p>\n<p>I've spent over a decade building <a href=\"https://ansezz.com/\">custom web applications</a> and scaling cloud infrastructure. I've seen teams get paralyzed by the sheer number of options in the AI ecosystem. You don't need a perfect database. You need the right tool for your specific scale and team.</p>\n<p>Let's break down the 2026 vector database landscape so you can stop scrolling and start shipping.</p>\n<h2>Why the database matters in RAG</h2>\n<p>An LLM like Claude or GPT-4 is a genius without a memory. RAG gives it that memory. Your vector database is the librarian. If the librarian is slow or loses books, the genius can't do its job.</p>\n<p>When we talk about RAG stacks, we're looking for three things:</p>\n<ol>\n<li><strong>Latency</strong> — can it find the right \"memory\" in under 50ms?</li>\n<li><strong>Hybrid search</strong> — can it search by meaning (vectors) and exact keywords (full-text)?</li>\n<li><strong>Developer experience</strong> — how much time are you going to spend on DevOps?</li>\n</ol>\n<p><img src=\"https://ansezz.com/blog/picking-the-right-rag-stack/comparison-bento.webp\" alt=\"Comparison bento grid of vector databases\" /></p>\n<h2>The contenders: which one is yours?</h2>\n<h3>1. 
pgvector — the \"I already have a database\" choice</h3>\n<p>If you are already running <a href=\"https://ansezz.com/\">Postgres for your web applications</a>, pgvector is usually your first stop. It's not a new database. It's an extension that adds vector support to the database you already trust.</p>\n<p>It's a great fit if you have under 10 million vectors. You get ACID compliance, easy backups, and your relational data stays right next to your embeddings. No new infra. No new security audits.</p>\n<p><strong>Pros</strong></p>\n<ul>\n<li>Zero new infrastructure if you use Postgres.</li>\n<li>Perfect for joining vector data with user metadata.</li>\n<li>Huge ecosystem support (Laravel, Django, Node.js).</li>\n</ul>\n<p><strong>Cons</strong></p>\n<ul>\n<li>Scaling to 100M+ vectors requires serious server muscle.</li>\n<li>Hybrid search is manual: you combine pgvector with Postgres full-text search and tune the blend yourself.</li>\n</ul>\n<h3>2. Pinecone — the \"I want zero ops\" choice</h3>\n<p>Pinecone is the gold standard for managed vector search. It's a serverless vector database. You don't manage clusters. You don't tune indexes. You just send vectors and get results.</p>\n<p>In 2026, Pinecone is the go-to for teams that want to scale from zero to a billion vectors without hiring a dedicated DevOps engineer. Their serverless architecture means you only pay for what you use.</p>\n<p><strong>Pros</strong></p>\n<ul>\n<li>Truly managed. Pick a region and go.</li>\n<li>World-class performance and low latency.</li>\n<li>Great enterprise features like SOC 2 compliance.</li>\n</ul>\n<p><strong>Cons</strong></p>\n<ul>\n<li>It's a black box. You can't self-host it.</li>\n<li>Costs can climb quickly with high read/write volume.</li>\n</ul>\n<h3>3. Weaviate &amp; Qdrant — the hybrid powerhouses</h3>\n<p>If your RAG app needs to combine semantic search with old-school keyword search, these two are the leaders. 
Weaviate and Qdrant are built from the ground up for high-performance vector retrieval.</p>\n<p>Weaviate excels at \"out-of-the-box\" hybrid search. Qdrant, written in Rust, is incredibly fast and efficient with memory. Both offer open-source versions and managed cloud options.</p>\n<p><strong>Pros</strong></p>\n<ul>\n<li>Best-in-class hybrid search (BM25 + Vector).</li>\n<li>Flexible hosting (self-hosted Docker or managed cloud).</li>\n<li>Highly optimized for filtering (e.g., \"find documents from '2025' that talk about 'security'\").</li>\n</ul>\n<p><strong>Cons</strong></p>\n<ul>\n<li>More operational overhead than Pinecone.</li>\n<li>Requires learning a new database API.</li>\n</ul>\n<p><img src=\"https://ansezz.com/blog/picking-the-right-rag-stack/rag-architecture.webp\" alt=\"Reference RAG architecture diagram\" /></p>\n<h2>How to choose: the engineering trade-offs</h2>\n<p>Picking a database isn't about finding the \"best\" one. It's about matching the tool to your engineering constraints.</p>\n<h3>Factor 1: the \"billions\" problem</h3>\n<p>Most startups don't have a billion vectors. They have a few thousand PDFs. If you're in the sub-1M range, pgvector is almost always the right answer. It's simple and it works.</p>\n<p>If you are building something like a global legal search engine or a massive e-commerce recommendation system, you need the distributed architecture of Milvus or Pinecone. Don't build a massive distributed system if you don't have a massive amount of data.</p>\n<h3>Factor 2: hybrid search is non-negotiable</h3>\n<p>Pure vector search is actually pretty bad at finding specific technical terms. If you search for \"PHP 8.4 features,\" a pure vector search might give you general \"PHP\" articles. A hybrid search combines the \"vibe\" of the vector with the \"exactness\" of a keyword search.</p>\n<p>If search quality is your #1 metric, look at Weaviate or Qdrant. 
They handle the blending of these two search types natively.</p>\n<h3>Factor 3: the \"DevOps\" tax</h3>\n<p>I'm a huge fan of <a href=\"https://ansezz.com/\">cloud infrastructure and deployment</a>. But I also know that every new piece of infra you add to your stack is another thing that can break at 3 AM.</p>\n<p>If you have a small team, lean on managed services like Pinecone or Zilliz. If you have a strong infra team and want to cut cloud costs at high scale, self-hosting Qdrant on a tool like Coolify or Kubernetes is the move.</p>\n<h2>Implementing pgvector with Laravel</h2>\n<p>Since I work a lot with <a href=\"https://ansezz.com/\">custom web development using Laravel</a>, I want to show you how easy this looks in practice. You don't need a PhD in math to run a vector query.</p>\n<pre><code>// Find the document chunks closest to the query embedding.\n// Ai::embed() is a placeholder for your embedding client (e.g. OpenAI or Voyage AI);\n// Anthropic does not ship its own embeddings endpoint.\n$embedding = Ai::embed($query); // returns an array of floats\n\n// PDO cannot bind a PHP array, so build a pgvector literal like '[0.12,-0.03,...]'\n$vector = '[' . implode(',', $embedding) . ']';\n\n$results = Document::query()\n    -&gt;select('content')\n    -&gt;orderByRaw('embedding &lt;=&gt; CAST(? AS vector)', [$vector]) // &lt;=&gt; is pgvector's cosine distance: smaller means closer\n    -&gt;limit(5)\n    -&gt;get();\n</code></pre>\n<p>That snippet is the retrieval core of a RAG system. You pass those chunks to the LLM as context and get a grounded answer.</p>\n<p><img src=\"https://ansezz.com/blog/picking-the-right-rag-stack/code-snippet.webp\" alt=\"Annotated code snippet showing pgvector similarity query\" /></p>\n<h2>Three practical tips for your RAG stack</h2>\n<p>Before you commit to a database, keep these three things in mind. They will save you weeks of refactoring.</p>\n<p><strong>1. Index early, but not too early.</strong>\nVector indexes like HNSW are fast to query but expensive to update, because every insert has to be wired into the graph. If you are doing a massive initial data load, insert your vectors first, then create the index. On large loads, that can be the difference between minutes and hours.</p>\n<p><strong>2. 
Normalize your vectors.</strong>\nMake sure your embedding model and your vector database agree on the distance metric. Use the metric the model was designed for, which is usually cosine. If your vectors are normalized to unit length, cosine similarity and inner product produce the same ranking, so you can use the cheaper inner-product operator. Mixing normalized and unnormalized vectors, or running inner product on unnormalized ones, is where the weird ranking bugs come from.</p>\n<p><strong>3. Keep the metadata lean.</strong>\nIt's tempting to store the entire JSON object of a document inside your vector database. Don't. Store the vector, a simple ID, and only the fields you actually filter on. Keep the heavy data in your primary database (like Postgres). This keeps your vector index small and fast.</p>\n<h2>My personal rule of thumb</h2>\n<p>I've built systems for <a href=\"https://ansezz.com/work/\">startups and established businesses</a>. Here is how I usually guide them:</p>\n<ul>\n<li><strong>Default to pgvector.</strong> It's the path of least resistance for most web apps.</li>\n<li><strong>Move to Pinecone</strong> if you need high performance and don't want to manage servers.</li>\n<li><strong>Choose Weaviate</strong> if your application relies heavily on complex hybrid search and metadata filtering.</li>\n</ul>\n<p>The \"right\" stack is the one that lets you ship your AI features today, not the one that looks the best on a benchmark chart.</p>\n<p>Are you building a RAG system right now? What's the biggest hurdle you've hit with your data retrieval?</p>\n<p>Drop a line or <a href=\"https://ansezz.com/contact/\">reach out</a>. I'd love to hear your war stories.</p>\n<hr />\n<p><strong>Summary takeaways</strong></p>\n<ul>\n<li><strong>pgvector</strong> is king for teams already on Postgres.</li>\n<li><strong>Pinecone</strong> is the best zero-ops solution for scaling.</li>\n<li><strong>Hybrid search</strong> (keyword + vector) is usually better than vector search alone.</li>\n<li>Keep your architecture simple. Don't over-engineer for \"billions\" of vectors if you only have thousands.</li>\n</ul>\n<p><img src=\"https://ansezz.com/blog/picking-the-right-rag-stack/mentor.webp\" alt=\"Mentor robot character offering advice\" /></p>\n",
      "date_published": "2025-12-14T00:00:00.000Z",
      "date_modified": "2025-12-14T00:00:00.000Z",
      "tags": [
        "ai",
        "rag",
        "vector-databases",
        "pgvector",
        "pinecone",
        "weaviate",
        "qdrant",
        "laravel"
      ],
      "image": "https://ansezz.com/blog/picking-the-right-rag-stack/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/vibe-coding/",
      "url": "https://ansezz.com/blog/vibe-coding/",
      "title": "Vibe coding: why your next project needs more than just logic",
      "summary": "Logic is the skeleton. Vibe is the soul. Why taste, intent, and feel are the new senior-engineer superpowers in the Cursor + Claude era — and how to keep the codebase from turning into a ball of mud while you chase it.",
      "content_html": "<p>Most developers are obsessed with logic. We spend years mastering syntax, optimizing database queries, and debating the merits of different architectural patterns. We build systems that are technically perfect but somehow feel completely hollow. They work, but they don't sing.</p>\n<p>The problem is that your users don't care about your clean code or your clever recursive functions. They care about how the software feels. They care about the \"vibe.\"</p>\n<p>If you keep building purely for the machine, you are going to lose. The next generation of successful products won't be the ones with the most features or the tightest algorithms. They will be the ones that master the art of vibe coding.</p>\n<h2>The logic trap</h2>\n<p>I've spent over a decade in the trenches of software development. I've built custom web applications for startups and managed complex cloud infrastructure on Google Cloud and AWS. For a long time, I thought my job was to be a logic machine. I thought that if I followed every best practice and wrote the most efficient Laravel code possible, the project would be a success.</p>\n<p>I was wrong.</p>\n<p>Logic is just the foundation. It is the skeleton that keeps the building from falling down. But nobody wants to live in a skeleton. People want a home with character, warmth, and a specific feeling. In software, that character comes from the vibe.</p>\n<p>When we focus purely on logic, we end up with \"boring\" software. It is the kind of software that does what it says on the tin but leaves the user feeling nothing. Or worse, it feels frustrating because the developer didn't think about the emotional friction of a slow-loading button or a confusing layout.</p>\n<p><img src=\"https://ansezz.com/blog/vibe-coding/logic-vs-vibe.webp\" alt=\"Side-by-side illustration of cold logic vs warm vibe coding\" /></p>\n<h2>Entering the era of vibe coding</h2>\n<p>The term \"vibe coding\" was popularized recently by Andrej Karpathy. 
It describes a shift in how we build things in the age of AI tools like Cursor and Claude. It is a transition from being a writer of code to being a curator of intent.</p>\n<p>Vibe coding is about letting go of the need to micro-manage every semicolon. It is about using natural language to describe the <em>feel</em> and <em>behavior</em> you want, and then letting AI handle the heavy lifting of the implementation.</p>\n<p>In this world, your value as an engineer isn't in how fast you can type. It's in your taste. It's in your ability to recognize when a user interface feels \"off\" and knowing how to steer the AI to fix it. It is about prioritizing the outcome over the output.</p>\n<p>I've seen this shift firsthand in my own work at <a href=\"https://ansezz.com/\">Ansezz</a>. When I'm working on a Shopify store development project, the technical logic of the checkout is important. But the <em>vibe</em> of the checkout — the smooth transitions, the reassuring feedback, the perfect typography — is what actually drives conversions for the business.</p>\n<h2>Tools that fuel the flow</h2>\n<p>To embrace vibe coding, you need tools that don't get in your way. You need tools that allow you to stay in a state of flow where the distance between your idea and the execution is as small as possible.</p>\n<p>Tools like Cursor have changed the game for me. Instead of spending twenty minutes setting up boilerplate for a new Vue component, I can describe the \"vibe\" of the component in the chat. I can say, \"build me a dashboard widget that feels airy and modern, uses a bento grid layout, and gives the user a sense of calm control over their data.\"</p>\n<p><img src=\"https://ansezz.com/blog/vibe-coding/ai-ide.webp\" alt=\"AI-powered IDE pairing with a developer\" /></p>\n<p>The AI generates the code. I review it. If the vibe isn't right, I don't fix the code line-by-line. I talk to the model again. I give it feedback on the <em>feeling</em>. \"This feels too cramped. 
Give it more white space and make the shadows softer.\"</p>\n<p>This is the essence of vibe coding. It's a high-level conversation about intent.</p>\n<h2>The senior developer guardrails</h2>\n<p>Now, I know what some of you are thinking. \"This sounds like a recipe for a messy, unmaintainable codebase.\"</p>\n<p>You are right to be worried. If you just \"vibe\" your way through a project without any discipline, you will end up with a \"ball of mud.\" This is where the senior engineer perspective becomes more critical than ever.</p>\n<p>Vibe coding isn't about being lazy. It's about shifting your focus. You use your senior-level expertise to build the \"robust core\" that allows the \"vibe layer\" to exist.</p>\n<p>For me, that core is often built with Laravel and Docker. I use Laravel because it is built for \"developer happiness.\" The framework itself has a vibe of elegance and simplicity. It provides the solid, logical foundation — the authentication, the database migrations, the API structures — that I can trust.</p>\n<p>Once that robust core is in place, I can afford to be more exploratory with the frontend and the user experience. I can \"vibe code\" the top layer because I know the foundation is solid.</p>\n<p><img src=\"https://ansezz.com/blog/vibe-coding/architecture.webp\" alt=\"Architecture diagram — robust core under the vibe layer\" /></p>\n<h2>Why Shopify and vibe coding are a perfect match</h2>\n<p>If you work in e-commerce, vibe coding is your secret weapon. Shopify is a platform that already understands the importance of the feel. They have spent years perfecting the checkout flow and the admin experience.</p>\n<p>When I do <a href=\"https://ansezz.com/work/\">Shopify customization</a>, I'm not just writing Liquid code. I'm trying to match the brand's vibe. A luxury jewelry brand needs a completely different \"vibe\" than a high-energy fitness store.</p>\n<p>One should feel slow, deliberate, and expensive. 
The other should feel fast, punchy, and motivating. You can't achieve that through logic alone. You achieve it by obsessing over the details that the logic-only dev ignores.</p>\n<h2>How to start vibe coding today</h2>\n<p>If you want to move beyond being a logic-only developer, here are some practical steps you can take:</p>\n<ol>\n<li><strong>Prioritize your taste.</strong> Start looking at software not just as a tool, but as an experience. What apps do you love using? Why? Is it the speed? The animations? The way the buttons click? Start building a \"swipe file\" of great vibes.</li>\n<li><strong>Embrace AI as a partner, not a tool.</strong> Stop using Copilot just for autocompletion. Start using tools like Claude or Cursor to brainstorm high-level concepts. Describe the \"feel\" you want and see what it gives you.</li>\n<li><strong>Build a solid core.</strong> Don't let the vibe turn into chaos. Use frameworks like Laravel or tools like Docker to keep your infrastructure predictable and clean. The more you trust your foundation, the more you can play with the surface.</li>\n<li><strong>Iterate on the feeling.</strong> Instead of trying to get the code perfect the first time, get the \"vibe\" right first. Build a messy prototype that feels great, and then use your technical skills to refactor and harden it.</li>\n<li><strong>Focus on user empathy.</strong> Every time you write a piece of logic, ask yourself: \"how will this make the user feel?\" If the answer is \"nothing,\" you have more work to do.</li>\n</ol>\n<h2>The future is felt, not just calculated</h2>\n<p>We are entering a time where \"coding\" as we knew it is becoming a commodity. Anyone can generate a function to sort an array. But not everyone can create an experience that moves people.</p>\n<p>The future of software development belongs to the engineers who can bridge the gap between the machine and the human heart. 
It belongs to the people who understand that the best code is the code you don't even notice because you're too busy enjoying the vibe.</p>\n<p>I've seen the results of this approach in my own projects and for the clients I work with. When you stop fighting the logic and start leaning into the flow, everything becomes easier. The work becomes more fun, and the results become more impactful.</p>\n<p>Are you ready to stop just writing logic and start building vibes?</p>\n<p>What is the one app you use that just \"feels\" right, and what can you steal from its vibe for your next project?</p>\n",
      "date_published": "2025-11-30T00:00:00.000Z",
      "date_modified": "2025-11-30T00:00:00.000Z",
      "tags": [
        "ai",
        "vibe-coding",
        "cursor",
        "claude",
        "taste",
        "dx",
        "laravel"
      ],
      "image": "https://ansezz.com/blog/vibe-coding/hero.webp"
    },
    {
      "id": "https://ansezz.com/blog/hello-world/",
      "url": "https://ansezz.com/blog/hello-world/",
      "title": "Hello, world. Yes, another developer blog.",
      "summary": "Why this site exists, what I'll write about, and why neobrutalism is the right call for an engineer's portfolio in 2026.",
      "content_html": "<p>Most developer blogs read like a LinkedIn post had a baby with a Medium article.\nThis one won't. I'm Anass. I ship Laravel + Shopify + AI for a living, and this is\nwhere I take notes out loud.</p>\n<h2>Why now</h2>\n<p>I've spent 10+ years remote, mostly head-down in client codebases. The patterns I keep\nreaching for — Laravel Octane at scale, RAG done right, agentic commerce — don't show up\nin tutorials. So I'm writing them down.</p>\n<p>Strong opinions, loosely held. Code that runs in production.</p>\n<h2>What you'll find here</h2>\n<ul>\n<li>Laravel internals you actually use day-to-day</li>\n<li>AI engineering with Anthropic Claude + MCP</li>\n<li>Shopify Plus app patterns</li>\n<li>Architecture &amp; DevOps lessons learned the hard way</li>\n</ul>\n<p>Subscribe via <a href=\"https://ansezz.com/rss.xml\">RSS</a>. No newsletter, no popup, no LinkedIn dance.</p>\n<h2>A small comparison</h2>\n<table>\n<thead>\n<tr><th></th><th>This blog</th><th>AI-slop blog</th></tr>\n</thead>\n<tbody>\n<tr><td>Real production code</td><td>Yes</td><td>No</td></tr>\n<tr><td>Opinions</td><td>Yes</td><td>No</td></tr>\n<tr><td>GPT-written filler</td><td>No</td><td>Yes</td></tr>\n<tr><td>Personality</td><td>Yes</td><td>No</td></tr>\n</tbody>\n</table>\n<h2>Example code</h2>\n<p>Every post that needs code gets the syntax treatment:</p>\n<pre><code>// app/Actions/AskClaude.php\nuse Prism\\Prism\\Prism;\n\n$response = Prism::text()\n    -&gt;using('anthropic', 'claude-opus-4-7')\n    -&gt;withSystemPrompt('You are a senior Laravel engineer.')\n    -&gt;withPrompt('Refactor this controller for clarity.')\n    -&gt;asText();\n</code></pre>\n<p>That's it. First post live. More incoming.</p>\n",
      "date_published": "2025-11-16T00:00:00.000Z",
      "date_modified": "2025-11-16T00:00:00.000Z",
      "tags": [
        "career",
        "meta",
        "intro"
      ]
    }
  ]
}