{"id":49118,"date":"2025-04-15T05:02:12","date_gmt":"2025-04-15T05:02:12","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/?p=49118"},"modified":"2025-04-15T05:02:12","modified_gmt":"2025-04-15T05:02:12","slug":"hybrid-multi%e2%80%91cloud-migration-with-zero-downtime","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/hybrid-multi%e2%80%91cloud-migration-with-zero-downtime\/","title":{"rendered":"Hybrid Multi\u2011Cloud Migration with Zero Downtime"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Overview<\/h2>\n\n\n\n<p>Migrating from Google Cloud (Cloud Run + GKE) to AWS EKS while serving users from both environments requires careful planning. The goal is to <strong>gradually shift traffic to AWS<\/strong> without service interruption, confirm stability on EKS, then fully cut over \u2013 all while maintaining <strong>zero downtime<\/strong>. The domain\u2019s DNS is hosted on Google Cloud DNS (with zones for prod, stage, uat), and this will remain unchanged. 
We need a strategy that allows <strong>hybrid traffic routing<\/strong> (to GCP and AWS) during the transition, <strong>smoothly migrates<\/strong> users to AWS, and provides <strong>instant failover<\/strong> if any backend is unhealthy.<\/p>\n\n\n\n<p>Key requirements and challenges:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Hybrid Serving:<\/strong> Both GCP and AWS instances must serve traffic simultaneously during migration.<\/li>\n\n\n\n<li><strong>Gradual Traffic Shifting:<\/strong> Ability to start with most traffic on GCP and incrementally increase traffic to AWS (for canary testing on EKS).<\/li>\n\n\n\n<li><strong>Zero Downtime:<\/strong> No outages or user-impacting cutovers \u2013 changes must be seamless.<\/li>\n\n\n\n<li><strong>DNS Stays on GCP:<\/strong> We will use Google Cloud DNS for traffic steering (not moving to Route\u00a053 or others).<\/li>\n\n\n\n<li><strong>Consistent Endpoints:<\/strong> Users should keep using the same URLs. We\u2019ll direct those URLs to the appropriate backends under the hood.<\/li>\n<\/ul>\n\n\n\n<p>To meet these goals, we\u2019ll explore <strong>DNS-based routing<\/strong> options, <strong>global load balancers<\/strong>, and <strong>service mesh\/API gateway<\/strong> approaches. Each offers trade-offs in complexity, control, and reliability. Below is a comprehensive guide with recommendations, architecture considerations, and example configurations for each approach.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DNS-Based Traffic Steering<\/h2>\n\n\n\n<p>One of the simplest multi-cloud routing methods is to leverage DNS policies. Google Cloud DNS supports advanced routing policies like <strong>Weighted Round Robin (WRR)<\/strong> and <strong>Geolocation<\/strong> routing, similar to AWS Route&nbsp;53. This allows the authoritative DNS server to decide which backend\u2019s IP to return for a client\u2019s query.<\/p>\n\n\n\n<p><strong>1. 
Weighted DNS (Canary\/Gradual Cutover):<\/strong> With a weighted DNS policy, you create multiple DNS records for the same name, each pointing to a different backend (GCP or AWS) and assign a weight to each. <strong>Traffic is distributed in proportion to these weights<\/strong> \u2013 for example, 80% of DNS responses resolving to the GCP IP, 20% to the AWS IP (<a href=\"https:\/\/cloud.google.com\/blog\/products\/networking\/dns-routing-policies-for-geo-location--weighted-round-robin#:~:text=Cloud%20DNS%20will%20support%20two,using%20the%20Cloud%20DNS%20APIs\" target=\"_blank\" rel=\"noopener\">DNS routing policies for geo-location &amp; weighted round robin | Google Cloud Blog<\/a>) (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=,premise%20environment\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>). By adjusting weights over time, you can smoothly shift load to AWS:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Initial State:<\/strong> GCP weight 1.0 (or 100%), AWS weight 0.0 \u2013 all users resolve to GCP service endpoints.<\/li>\n\n\n\n<li><strong>Canary Phase:<\/strong> Introduce AWS with a small weight (e.g. GCP 0.9, AWS 0.1 for ~10% traffic to AWS). Monitor AWS EKS performance.<\/li>\n\n\n\n<li><strong>Gradual Increase:<\/strong> If stable, increment AWS weight (e.g. 30\/70, 50\/50, etc.) in steps, sending more traffic to EKS (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=,premise%20environment\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>). 
Google Cloud DNS will serve the right IP based on these weights for each query (<a href=\"https:\/\/cloud.google.com\/blog\/products\/networking\/dns-routing-policies-for-geo-location--weighted-round-robin#:~:text=Cloud%20DNS%20will%20support%20two,using%20the%20Cloud%20DNS%20APIs\" target=\"_blank\" rel=\"noopener\">DNS routing policies for geo-location &amp; weighted round robin | Google Cloud Blog<\/a>).<\/li>\n\n\n\n<li><strong>Full Cutover:<\/strong> Eventually set AWS to 1.0 (100%) and GCP to 0.0. At this point all new DNS lookups direct to AWS. GCP services can be turned down after existing TTLs expire.<\/li>\n<\/ul>\n\n\n\n<p>Google Cloud DNS\u2019s weighted round-robin policy makes this possible natively. For example, you could configure <code>app.prod.example.com<\/code> with two A records: one pointing to the GCP load balancer IP, one to the AWS load balancer\u2019s IP, weighted say \u201c0.8=GCP_IP;0.2=AWS_IP\u201d to start (<a href=\"https:\/\/cloud.google.com\/dns\/docs\/configure-routing-policies#:~:text=,the%20target%20is%20calculated%20from\" target=\"_blank\" rel=\"noopener\">Configure DNS routing policies and health checks &nbsp;|&nbsp; Google Cloud<\/a>). Cloud DNS will dynamically compute which IP to return on each query according to those ratios (<a href=\"https:\/\/cloud.google.com\/blog\/products\/networking\/dns-routing-policies-for-geo-location--weighted-round-robin#:~:text=Cloud%20DNS%20will%20support%20two,using%20the%20Cloud%20DNS%20APIs\" target=\"_blank\" rel=\"noopener\">DNS routing policies for geo-location &amp; weighted round robin | Google Cloud Blog<\/a>). You can change weights via the API or gcloud CLI as you progress (these changes take effect at the DNS level immediately, though clients respect TTL).<\/p>\n\n\n\n<p><strong>TTL and Caching Considerations:<\/strong> DNS-based steering relies on clients periodically querying DNS. To minimize lag during changes, use a <strong>low TTL<\/strong> on these records (e.g. 
30 seconds) during the migration (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=In%20ClearScale%E2%80%99s%20experience%2C%20a%20simple,currently%20uses%20the%20most%20providers\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>). Lower TTL ensures that when you adjust weights or switch traffic, clients will pick up new DNS answers quickly. <em>(Be aware that some ISPs or resolvers might not strictly honor very low TTLs, and browsers\/device caches could retain DNS entries for longer (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=,Browser%20caching\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>).<\/em>) It\u2019s wise to lower the TTL well <strong>before<\/strong> the migration starts, so that by the time you make weight changes most clients are already using the low TTL setting (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=In%20ClearScale%E2%80%99s%20experience%2C%20a%20simple,currently%20uses%20the%20most%20providers\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>).<\/p>\n\n\n\n<p><strong>High-Level DNS Setup:<\/strong> In Cloud DNS, you would create a resource record set for the service domain with a <strong>WRR (Weighted Round Robin) routing policy<\/strong>. For example:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">gcloud dns record-sets create app.prod.example.com. 
--type=A --ttl=<span class=\"hljs-number\">30<\/span> --zone=prod-zone \\\n    --routing-policy-type=WRR \\\n    --routing-policy-data=<span class=\"hljs-string\">\"0.8=203.0.113.10;0.2=198.51.100.50\"<\/span>\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>In this hypothetical example, <code>203.0.113.10<\/code> could be the IP of a Google Cloud Load Balancer fronting Cloud Run\/GKE, and <code>198.51.100.50<\/code> an IP (or IP range) of an AWS ALB\/NLB fronting the EKS service (the <code>--zone<\/code> flag names the Cloud DNS managed zone holding the record). Cloud DNS will return the GCP IP ~80% of the time and the AWS IP ~20% (<a href=\"https:\/\/cloud.google.com\/blog\/products\/networking\/dns-routing-policies-for-geo-location--weighted-round-robin#:~:text=Cloud%20DNS%20will%20support%20two,using%20the%20Cloud%20DNS%20APIs\" target=\"_blank\" rel=\"noopener\">DNS routing policies for geo-location &amp; weighted round robin | Google Cloud Blog<\/a>). Over time, you\u2019d update the <code>routing-policy-data<\/code> weights to shift the percentages. (If using hostnames\/CNAMEs \u2013 e.g. Cloud Run custom domain or ALB DNS name \u2013 Cloud DNS can weight those via CNAME records similarly. However, root\/apex domain cannot use CNAME, so an A\/AAAA with direct IPs or using alias\/ANAME-like features would be needed.)<\/p>\n\n\n\n<p><strong>Health Checks &amp; Failover:<\/strong> Basic DNS round-robin by itself doesn\u2019t automatically detect outages \u2013 if the GCP service goes down while still in DNS, some clients might get that IP until TTL expires. 
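<\/p>

<p>One manual safeguard is the weights themselves: because the name resolves via a single WRR policy, one <code>update<\/code> call can drain a failing backend, and clients pick up the change as the 30-second TTL expires. A sketch, reusing the placeholder IPs from the example above with a hypothetical zone name:<\/p>

```shell
# Sketch -- the zone name is hypothetical; the IPs are the placeholders used above.
# Setting a backend's weight to 0.0 drains it; swap the weights to roll back.
gcloud dns record-sets update app.prod.example.com. \
    --zone=prod-zone \
    --type=A --ttl=30 \
    --routing-policy-type=WRR \
    --routing-policy-data='0.0=203.0.113.10;1.0=198.51.100.50'
```

<p>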
To mitigate this, Cloud DNS also supports a <strong>Failover<\/strong> routing policy and can integrate health checks for DNS endpoints (<a href=\"https:\/\/www.redhat.com\/en\/blog\/global-load-balancer-approaches#:~:text=,honored%20by%20the%20service%20consumer\" target=\"_blank\" rel=\"noopener\">Global Load Balancer Approaches<\/a>) (<a href=\"https:\/\/www.redhat.com\/en\/blog\/global-load-balancer-approaches#:~:text=sophisticated%20load%20balancing%20policies%20nor,support%20for%20backend%20health%20checks\" target=\"_blank\" rel=\"noopener\">Global Load Balancer Approaches<\/a>). One approach is to combine policies: for example, use <strong>weighted routing with health checks<\/strong> on each record. Google Cloud DNS health-checking can detect if the GCP or AWS endpoint is down and stop returning its IP (<a href=\"https:\/\/www.redhat.com\/en\/blog\/global-load-balancer-approaches#:~:text=,honored%20by%20the%20service%20consumer\" target=\"_blank\" rel=\"noopener\">Global Load Balancer Approaches<\/a>). Another approach is to use <strong>DNS failover policy<\/strong> once you reach the final cutover: designate AWS as primary and GCP as secondary (failover target). During the hybrid period, though, weighted policies with manual control are typically used (since you <em>want<\/em> both active). Keep TTL low so even if one backend fails, you can quickly adjust weights to 0 for that backend (or rely on the health-check to remove it).<\/p>\n\n\n\n<p><strong>Geo-Location DNS (optional):<\/strong> In addition to weighting, Google Cloud DNS allows geo-based policies (<a href=\"https:\/\/cloud.google.com\/blog\/products\/networking\/dns-routing-policies-for-geo-location--weighted-round-robin#:~:text=The%20geo,is\" target=\"_blank\" rel=\"noopener\">DNS routing policies for geo-location &amp; weighted round robin | Google Cloud Blog<\/a>). 
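<\/p>

<p>Configuration-wise, a geo policy mirrors the weighted one, keyed by Google Cloud region instead of by weight. A sketch with hypothetical region keys, a hypothetical zone name, and the placeholder IPs from earlier:<\/p>

```shell
# Sketch -- the regions, zone name, and IPs are hypothetical placeholders.
# Each query is answered with the record of the nearest listed region.
gcloud dns record-sets create app.prod.example.com. \
    --zone=prod-zone \
    --type=A --ttl=30 \
    --routing-policy-type=GEO \
    --routing-policy-data='europe-west1=203.0.113.10;us-east1=198.51.100.50'
```

<p>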
If your user base is regionally divided or if the GCP and AWS clusters are in different regions, you could route users to the nearest cloud. For example, <em>during migration<\/em> you might direct EU customers to GCP and US customers to AWS (or vice versa) using Geo DNS, gradually expanding the geo coverage of AWS as confidence grows. Geo policies can also ensure optimal latency by keeping users on the closest service (<a href=\"https:\/\/www.redhat.com\/en\/blog\/global-load-balancer-approaches#:~:text=the%20issue%20is%20resolved%20with,honored%20by%20the%20service%20consumer\" target=\"_blank\" rel=\"noopener\">Global Load Balancer Approaches<\/a>) (<a href=\"https:\/\/cloud.google.com\/blog\/products\/networking\/dns-routing-policies-for-geo-location--weighted-round-robin#:~:text=The%20geo,is\" target=\"_blank\" rel=\"noopener\">DNS routing policies for geo-location &amp; weighted round robin | Google Cloud Blog<\/a>). However, in this scenario (gradual migration) weighted routing is more straightforward for splitting traffic globally. Geo-DNS could be combined with weights (e.g. weighted within each region), but Cloud DNS does <strong>not<\/strong> allow combining geo and custom weights simultaneously on the same record set (<a href=\"https:\/\/cloud.google.com\/dns\/docs\/configure-routing-policies#:~:text=,weighted%20WRR%20policy\" target=\"_blank\" rel=\"noopener\">Configure DNS routing policies and health checks &nbsp;|&nbsp; Google Cloud<\/a>). So you\u2019d typically choose one strategy or the other. Weighted routing is usually sufficient unless you have multi-region deployments in both clouds.<\/p>\n\n\n\n<p><strong>Pros &amp; Cons of DNS Steering:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Advantages:<\/em> DNS-based routing is simple to implement with existing Cloud DNS. No new infrastructure is needed. 
It\u2019s a proven technique for blue-green and canary migrations (<a href=\"https:\/\/www.infracloud.io\/blogs\/blue-green-deployments-dns-routing\/#:~:text=As%20you%20might%20have%20understood%2C,achieve%20this%20is%20load%20balancer\" target=\"_blank\" rel=\"noopener\">How to Setup Blue Green Deployments with DNS Routing<\/a>) (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=,premise%20environment\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>). By gradually changing DNS weights, you reduce risk and can rollback by reversing weights if issues arise. Also, DNS can distribute load globally without concentrating traffic through a single point (each user goes directly to whichever endpoint DNS gives them). This can improve latency if the DNS policy is geo-aware or if each user sticks to a nearby backend.<\/li>\n\n\n\n<li><em>Disadvantages:<\/em> DNS changes aren\u2019t instantaneous for all users due to caching. Some users may continue hitting the \u201cold\u201d service for up to the TTL duration (or longer, if their resolver ignores TTL) (<a href=\"https:\/\/www.redhat.com\/en\/blog\/global-load-balancer-approaches#:~:text=1,that%20the%20order%20of%20the\" target=\"_blank\" rel=\"noopener\">Global Load Balancer Approaches<\/a>). This is usually manageable by keeping both environments live during overlap, but it means you can\u2019t perfectly control the <em>exact<\/em> cutover moment for every user \u2013 there\u2019s a fuzzy period. Also, if an environment goes down unexpectedly, clients that cached its IP might fail until they retry DNS. Health-check integrated DNS can alleviate this but may not be as fast as a true load balancer. DNS load balancing is also <strong>stateless<\/strong>: it distributes DNS queries, not actual traffic flows. So there\u2019s no concept of \u201csession stickiness\u201d beyond DNS caching. If your application is stateful (e.g. 
relying on session affinity), a user might get sent to AWS on one DNS lookup and then to GCP on a later lookup, which could be an issue if session data isn\u2019t shared. (Mitigate by using a shared session store or sticky cookies with a common domain if needed.)<\/li>\n<\/ul>\n\n\n\n<p>In practice, <strong>weighted DNS is a great low-complexity approach<\/strong> to achieve near zero-downtime migration. Many organizations use it for cloud migrations \u2013 for example, gradually shifting 5% of traffic at a time (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=,premise%20environment\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>). As long as both old and new services run in parallel and serve identical content\/APIs, end-users will not notice the difference. Just be sure to monitor both environments closely during the shift (e.g. compare error rates, latencies) and plan for how to quickly react if the new environment has issues (e.g. set AWS weight back to 0 or remove that record).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Global Load Balancer Approach<\/h2>\n\n\n\n<p>An alternative (often more sophisticated) method is to put a <strong>global load balancing layer<\/strong> in front of your services. Instead of relying on DNS to make the routing decision, a global load balancer can accept user traffic at a single entry point and then proxy it to either GCP or AWS backends. This can provide faster failover, detailed traffic control (at the request level), and shielding users from any DNS propagation delays.<\/p>\n\n\n\n<p>(<a href=\"https:\/\/blog.cloudflare.com\/load-balancing-with-weighted-pools\/\" target=\"_blank\" rel=\"noopener\">Load Balancing with Weighted Pools<\/a>) <em>Example of a global load balancer splitting traffic 80\/20 between two origin pools (e.g., one in a data center and one in cloud). 
A similar approach can route users to GCP or AWS backends based on assigned weights.<\/em><\/p>\n\n\n\n<p>Two primary options in this category are <strong>Google Cloud HTTP(S) Load Balancing<\/strong> (with hybrid backends) and <strong>AWS Global Accelerator<\/strong>. We\u2019ll also mention third-party anycast networks (like Cloudflare) as an option.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Using Google Cloud External Load Balancer (Anycast Global LB)<\/h3>\n\n\n\n<p>Google Cloud\u2019s external Application Load Balancer (HTTP(S) LB) is a global, Anycast load balancer that can distribute traffic across multiple regions \u2013 and even across different backend types. You can leverage it to route traffic to both your GCP services <strong>and<\/strong> AWS services during the migration:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Single Endpoint:<\/strong> The LB provides a single IP address (anycast globally) or a single domain that clients connect to. You would update DNS <em>once<\/em> to point <code>app.prod.example.com<\/code> to this load balancer\u2019s IP or CNAME. After that, you no longer need to change DNS; all traffic goes to the LB.<\/li>\n\n\n\n<li><strong>Multiple Backends:<\/strong> The LB is configured with backend services representing your environments. For example, one backend might be a <strong>Serverless NEG<\/strong> pointing to the Cloud Run service or a GKE Ingress in GCP, and another backend could be an <strong>Internet NEG<\/strong> pointing to the AWS service endpoint (AWS ALB or a public IP of an AWS NLB). 
Google\u2019s load balancer supports <strong>Internet Network Endpoint Groups<\/strong> \u2013 which means it can send traffic to arbitrary external addresses, like an AWS load balancer, as if they were just another backend (<a href=\"https:\/\/www.doit.com\/helping-a-business-incrementally-migrate-from-aws-and-cloudflare-to-gcp\/#:~:text=Configure%20load%20balancer%20and%20NEG,on%C2%A0GCP\" target=\"_blank\" rel=\"noopener\">Helping A Business Incrementally Migrate From AWS and Cloudflare to Google Cloud | DoiT<\/a>) (<a href=\"https:\/\/www.doit.com\/helping-a-business-incrementally-migrate-from-aws-and-cloudflare-to-gcp\/#:~:text=Select%20%E2%80%9CFully%20qualified%20domain%20name%E2%80%9D,and%20that%20is%20fine%20too\" target=\"_blank\" rel=\"noopener\">Helping A Business Incrementally Migrate From AWS and Cloudflare to Google Cloud | DoiT<\/a>). This setup effectively bridges the two clouds at the load-balancer level.<\/li>\n\n\n\n<li><strong>Weighted Traffic Splitting:<\/strong> With the load balancer in place, you can configure <strong>weight-based traffic splitting<\/strong> among backend services. Google\u2019s global HTTP LB supports advanced traffic management \u2013 you can define a URL mapping where a given path or host is served by multiple backend services with specified weights (<a href=\"https:\/\/cloud.google.com\/load-balancing\/docs\/https\/traffic-management-global#:~:text=match%20at%20L840%20Weighted%20traffic,to%20the%20individual%20backend%20service\" target=\"_blank\" rel=\"noopener\">Traffic management overview for global external Application Load Balancers \u00a0|\u00a0 Load Balancing \u00a0|\u00a0 Google Cloud<\/a>). For instance, you create a single frontend (say <code>app.example.com\/*<\/code>) and attach two backend services to that route: Backend A (GCP) with weight 95, Backend B (AWS) with weight 5 to start. 
The LB will then <strong>route 5% of requests to AWS and 95% to GCP<\/strong>, at the HTTP request level (<a href=\"https:\/\/cloud.google.com\/load-balancing\/docs\/application-load-balancer#:~:text=,legacy%20services%2C%20and%20similar%20processes\" target=\"_blank\" rel=\"noopener\">Application Load Balancer overview \u00a0|\u00a0 Load Balancing \u00a0|\u00a0 Google Cloud<\/a>). This is analogous to weighted DNS, but the balancing is done by the LB on each request, not by DNS responses. You can gradually adjust these weights over time using gcloud or the GCP console, just like with DNS policies. The difference is the LB makes the decision <em>for each incoming request<\/em> in real time.<\/li>\n\n\n\n<li><strong>Health Checks and Failover:<\/strong> The LB continuously health-checks each backend. If the AWS backend becomes unhealthy, the LB will stop sending traffic to it entirely within seconds, regardless of weight (essentially failing over to the healthy backend automatically) (<a href=\"https:\/\/www.whizlabs.com\/blog\/aws-global-accelerator\/#:~:text=%2A%20,Responsive%20Regional%20Failover\" target=\"_blank\" rel=\"noopener\">Introduction to AWS Global Accelerator &#8211; Whizlabs Blog<\/a>) (<a href=\"https:\/\/www.whizlabs.com\/blog\/aws-global-accelerator\/#:~:text=As%20you%20must%20know%20by,Global%20Accelerator%E2%80%99s%20capabilities%2C%20isn%E2%80%99t%20it\" target=\"_blank\" rel=\"noopener\">Introduction to AWS Global Accelerator &#8211; Whizlabs Blog<\/a>). This provides near-instantaneous failover \u2013 something DNS alone cannot guarantee due to caching. Similarly, if GCP backend had an outage, the LB could send all traffic to AWS. This ensures the <em>zero downtime<\/em> requirement is met even in the face of issues.<\/li>\n\n\n\n<li><strong>Latency and Geo-Distribution:<\/strong> An anycast LB will typically route users to the nearest point of presence. 
Google\u2019s global LB has worldwide edge nodes; users hit the closest Google front-end, which then forwards to the chosen backend. If your GCP and AWS backends are in different geographic regions, the LB could be configured with routing rules to prefer the nearest backend by latency or geography (this would be a more complex \u201clatency-based routing\u201d policy at the LB level, or using multiple LB frontends). However, since we control weights manually in this scenario, you might keep both backends active globally and rely on weighted split + the LB\u2019s own network intelligence to handle performance.<\/li>\n<\/ul>\n\n\n\n<p><strong>Architecture Diagram \u2013 GCP LB Hybrid:<\/strong> Imagine this setup: the DNS for <code>app.prod.example.com<\/code> resolves to a <strong>Global LB IP<\/strong> (anycast). A user\u2019s request goes to the LB, which then decides where to forward it:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>GCP path: LB \u2192 Cloud Run\/GKE (within GCP, via the serverless or instance group NEG).<\/li>\n\n\n\n<li>AWS path: LB \u2192 AWS ALB\/NLB (via Internet NEG over the internet). The LB here acts like a reverse proxy; the user\u2019s connection terminates at the Google front-end, then the LB opens a new connection to the AWS endpoint.<\/li>\n<\/ul>\n\n\n\n<p>From the client perspective, they are always talking to one host\/IP (the LB). This indirection adds a bit of overhead (requests to AWS now go through Google\u2019s infrastructure first), but it gives strong control. Google\u2019s LB also supports features like Cloud CDN, Cloud Armor (WAF), etc., which you could use to enhance security\/performance during the transition.<\/p>\n\n\n\n<p><strong>Traffic Shifting with LB:<\/strong> Initially, you configure the LB to send 0% to AWS (all traffic to Cloud Run\/GKE). Then as EKS comes online, start with a small percentage to the AWS backend service. 
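<\/p>

<p>In URL-map terms, that split is a <code>weightedBackendServices<\/code> route action. A minimal sketch, assuming hypothetical project and backend-service names (<code>gcp-backend<\/code> for the GCP backend, <code>aws-backend<\/code> for the internet NEG backend):<\/p>

```shell
# Sketch -- project and backend-service names are hypothetical.
# Writes a URL map with a 95/5 split; import it with the commented command.
cat > urlmap.yaml <<'EOF'
name: app-url-map
defaultService: projects/my-proj/global/backendServices/gcp-backend
hostRules:
- hosts:
  - app.prod.example.com
  pathMatcher: split
pathMatchers:
- name: split
  defaultRouteAction:
    weightedBackendServices:
    - backendService: projects/my-proj/global/backendServices/gcp-backend
      weight: 95
    - backendService: projects/my-proj/global/backendServices/aws-backend
      weight: 5
EOF
# gcloud compute url-maps import app-url-map --source=urlmap.yaml --global
```

<p>Re-importing the map with new weights shifts traffic on the very next request; DNS is untouched.<\/p>

<p>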
Google\u2019s traffic management supports very fine-grained splits (even 1% if desired) (<a href=\"https:\/\/cloud.google.com\/load-balancing\/docs\/application-load-balancer#:~:text=using%20weight,legacy%20services%2C%20and%20similar%20processes\" target=\"_blank\" rel=\"noopener\">Application Load Balancer overview &nbsp;|&nbsp; Load Balancing &nbsp;|&nbsp; Google Cloud<\/a>). Increase AWS share gradually until it\u2019s 100%. At that point, you could even remove the GCP backend from the LB. The DNS doesn\u2019t need to change at cutover at all \u2013 it was already pointing to the LB, so users notice nothing. Essentially, the cutover happens inside the LB configuration.<\/p>\n\n\n\n<p><strong>Zero Downtime and Testing:<\/strong> During this process, the LB ensures no downtime: it will only send traffic to healthy backends and you can adjust weights without interrupting existing connections. You can test AWS in production with a small trickle of real traffic. If any problem is detected, simply dial the AWS backend weight down (even to 0%) and the LB will immediately stop sending new requests there. This offers a very fast rollback mechanism (faster than waiting for DNS TTLs to expire) (<a href=\"https:\/\/www.whizlabs.com\/blog\/aws-global-accelerator\/#:~:text=Configuration%20updates%2C%20changes%20in%20routing,seconds%2C%20thereby%20reducing%20application%20downtime\" target=\"_blank\" rel=\"noopener\">Introduction to AWS Global Accelerator &#8211; Whizlabs Blog<\/a>) (<a href=\"https:\/\/www.whizlabs.com\/blog\/aws-global-accelerator\/#:~:text=%2A%20,Responsive%20Regional%20Failover\" target=\"_blank\" rel=\"noopener\">Introduction to AWS Global Accelerator &#8211; Whizlabs Blog<\/a>).<\/p>\n\n\n\n<p><strong>Costs and Complexity:<\/strong> Introducing a global load balancer has some overhead. There are GCP costs for LB bandwidth\/requests, and configuring the LB (especially with an Internet NEG to AWS) is a bit more work than just adding DNS records. 
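<\/p>

<p>For reference, attaching the AWS side is a handful of commands: create a global internet NEG, point it at the AWS endpoint, and wrap it in a backend service the LB can use. A sketch with hypothetical names, using the FQDN-based variant:<\/p>

```shell
# Sketch -- the NEG name, backend-service name, and ALB hostname are hypothetical.
# 1. A global internet NEG targeting the AWS load balancer by FQDN:
gcloud compute network-endpoint-groups create aws-eks-neg \
    --global --network-endpoint-type=internet-fqdn-port
gcloud compute network-endpoint-groups update aws-eks-neg \
    --global --add-endpoint='fqdn=my-alb.us-east-1.elb.amazonaws.com,port=443'
# 2. A backend service fronting the NEG, attachable to the LB's URL map:
gcloud compute backend-services create aws-backend \
    --global --protocol=HTTPS
gcloud compute backend-services add-backend aws-backend \
    --global --network-endpoint-group=aws-eks-neg \
    --global-network-endpoint-group
```

<p>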
You also need to ensure the AWS service is <strong>exposed publicly<\/strong> in a way the GCP LB can reach \u2013 likely through an AWS ALB or NLB with a public IP. One common pattern is to use an AWS Network Load Balancer with a static Elastic IP, so you have a stable IP for the NEG target. Alternatively, use the AWS ALB\u2019s hostname in a \u201cfully qualified domain name (FQDN) NEG\u201d (the GCP Internet NEG can point to a domain name and will resolve it). Make sure to <strong>allow the LB\u2019s health check IPs<\/strong> and traffic through any firewalls (the GCP LB uses Google Front Ends that will connect from Google IP ranges).<\/p>\n\n\n\n<p><strong>Lifecycle:<\/strong> Once the migration is done and AWS is serving 100%, you have a choice: you could keep the GCP LB in place permanently (still directing everything to AWS). Some teams do this for a period to allow an easy fallback. Eventually, though, you might decide to simplify by pointing DNS directly to the AWS ALB and removing the GCP LB from the path (to reduce an extra network hop). That final DNS change can be done at a convenient time since the AWS backend is already handling all traffic \u2013 or you might even continue to use the GCP LB as a layer of indirection if it offers value (e.g., using Cloud Armor WAF in front of AWS). It\u2019s up to your architecture preferences.<\/p>\n\n\n\n<p><strong>Summary of Pros:<\/strong> The global LB approach provides <strong>fine-grained control and fast failover<\/strong>. Weight changes take effect immediately on new requests (no waiting for DNS). Health checks make it safer \u2013 failing backends are automatically removed from rotation (<a href=\"https:\/\/www.whizlabs.com\/blog\/aws-global-accelerator\/#:~:text=%2A%20,Responsive%20Regional%20Failover\" target=\"_blank\" rel=\"noopener\">Introduction to AWS Global Accelerator &#8211; Whizlabs Blog<\/a>). 
You also get centralized logging and monitoring of all traffic in one place (the LB), which can simplify observing the cutover. And clients only ever see one IP\/endpoint, which can avoid certain DNS sticking issues or cross-origin concerns.<\/p>\n\n\n\n<p><strong>Cons:<\/strong> The main downsides are the added complexity and potential performance impact for cross-cloud calls. For example, if a user and the AWS cluster are in the same region (say both in us-east) but the GCP LB node handling the request is in a different region (or routes inefficiently), you could introduce a slight latency penalty. In practice, Google\u2019s network is very optimized, and any extra latency is usually small (tens of milliseconds). Another consideration is <strong>stateful sessions<\/strong>: if your LB does not have session affinity and you are switching traffic gradually, a user might bounce between GCP and AWS across requests (unless you enable session affinity on the LB by cookie or IP \u2013 though note that <strong>Google\u2019s weighted traffic splitting does not combine with session affinity<\/strong>; if you configure both, the traffic-splitting weights take precedence and affinity may not be honored (<a href=\"https:\/\/cloud.google.com\/load-balancing\/docs\/https\/traffic-management-global#:~:text=Don%27t%20configure%20session%20affinity%20if,traffic%20splitting%20configuration%20takes%20precedence\" target=\"_blank\" rel=\"noopener\">Traffic management overview for global external Application Load Balancers &nbsp;|&nbsp; Load Balancing &nbsp;|&nbsp; Google Cloud<\/a>)). If session stickiness is needed, you might use a different strategy (like route all users of a certain cohort to one side using a header or path). 
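<\/p>

<p>Such cohort pinning can be expressed as a URL-map route rule that matches a request header before the weighted default applies. A sketch with hypothetical backend names and a hypothetical <code>x-cohort<\/code> header:<\/p>

```shell
# Sketch -- the backend names and the x-cohort header are hypothetical.
# Requests carrying x-cohort: eks are pinned to the AWS backend;
# everything else falls through to the default (weighted) route.
cat > cohort-rules.yaml <<'EOF'
pathMatchers:
- name: split
  defaultService: projects/my-proj/global/backendServices/gcp-backend
  routeRules:
  - priority: 1
    matchRules:
    - prefixMatch: /
      headerMatches:
      - headerName: x-cohort
        exactMatch: eks
    service: projects/my-proj/global/backendServices/aws-backend
EOF
```

<p>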
For mostly stateless services or API calls, this isn\u2019t an issue.<\/p>\n\n\n\n<p>In short, using the Google Cloud global load balancer for migration is a powerful approach that essentially gives you <strong>\u201ccloud-agnostic\u201d traffic management<\/strong>: you decouple the user-facing endpoint from the underlying cloud. It requires setup, but it ensures absolutely minimal disruption during the migration.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">AWS Global Accelerator<\/h3>\n\n\n\n<p>AWS Global Accelerator (GA) is another global traffic management service, but it operates at the network layer. GA provides you with a pair of stable anycast IP addresses that edge locations announce globally. It then routes traffic from those edges to designated <strong>endpoint groups<\/strong> in AWS (which can be regional load balancers, EC2 instances, etc.). GA supports weighting traffic between AWS regions using a feature called the <strong>traffic dial<\/strong> \u2013 for example, splitting 70\/30 between two AWS regions (<a href=\"https:\/\/www.whizlabs.com\/blog\/aws-global-accelerator\/#:~:text=AWS%20Global%20Accelerator%20documentation%C2%A0also%20emphasizes,application%20updates%20or%20performance%20testing\" target=\"_blank\" rel=\"noopener\">Introduction to AWS Global Accelerator &#8211; Whizlabs Blog<\/a>). It also monitors health and will fail over if an endpoint goes unhealthy.<\/p>\n\n\n\n<p>In the context of a GCP-to-AWS migration, AWS GA could be useful <strong>after<\/strong> most traffic is in AWS (especially if you plan multi-region deployments in AWS for high availability). However, GA by itself can\u2019t directly split traffic between AWS and GCP, because its endpoints must be AWS resources. One theoretical approach would be to have one GA endpoint group in an AWS region for the EKS cluster, and another endpoint group that points to an AWS resource which forwards to GCP (e.g., an EC2 instance proxying to GCP). 
This is generally not worth the complexity \u2013 essentially it means hairpinning GCP traffic through AWS.<\/p>\n\n\n\n<p>So, while <strong>Global Accelerator is great for multi-region AWS traffic management<\/strong> (and could be part of your end-state architecture for AWS-only, ensuring low latency globally and quick failover across regions), it\u2019s not typically used to manage a hybrid cloud cutover. We mention it for completeness because it\u2019s an example of an anycast load balancer similar in concept to Google\u2019s, but tied to AWS. If in the final state you need global IPs for your service and multi-region resilience, you might deploy GA once you\u2019re fully on EKS, but during the migration, other methods (DNS or GCP\u2019s LB) are more straightforward for cross-cloud balancing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Third-Party Global Load Balancers (Cloudflare, etc.)<\/h3>\n\n\n\n<p>Beyond cloud-native solutions, there are providers like <strong>Cloudflare<\/strong>, <strong>Akamai<\/strong>, <strong>Fastly<\/strong>, or <strong>F5\/Citrix ADC<\/strong> that offer global load balancing as a service (<a href=\"https:\/\/www.redhat.com\/en\/blog\/global-load-balancer-approaches#:~:text=However%2C%20there%20are%20several%20advanced,do%20support%20these%20features%20including\" target=\"_blank\" rel=\"noopener\">Global Load Balancer Approaches<\/a>). For example, Cloudflare\u2019s Load Balancer can sit at the DNS\/proxy level and distribute traffic between multiple origins (which could be GCP and AWS) with weights, health checks, geo-steering, etc. This can be very effective: Cloudflare\u2019s network will direct users to whichever origin you configure (they support session affinity and fine routing rules as well).<\/p>\n\n\n\n<p>To use Cloudflare in this way, you would typically delegate your DNS to Cloudflare or at least configure your domain to proxy through Cloudflare\u2019s CDN. 
Since the question states DNS stays on Google Cloud, switching to Cloudflare DNS may not be desired. However, you could still use Cloudflare by making <code>app.example.com<\/code> a CNAME to a Cloudflare-managed domain that does the load balancing (Cloudflare allows weighted pools as we saw). Similar capabilities exist in other DNS services like NS1 or Dyn Traffic Director \u2013 they sit between the user and your origin servers.<\/p>\n\n\n\n<p><strong>Pros:<\/strong> Third-party solutions can be cloud-agnostic and very feature-rich. For instance, you could set up health checks from multiple continents, do latency-based routing (serve each user from whichever cloud is faster for them), or even do per-user sticky routing (like send a particular user ID consistently to one backend). Cloudflare\u2019s example in the embedded diagram above shows how weights can be adjusted to quickly shift load when one origin pool is scaled up (80\/20 split) (<a href=\"https:\/\/blog.cloudflare.com\/load-balancing-with-weighted-pools\/#:~:text=In%20the%20example%20below%2C%20the,requests%20across%20unequally%20sized%20pools\" target=\"_blank\" rel=\"noopener\">Load Balancing with Weighted Pools<\/a>) (<a href=\"https:\/\/blog.cloudflare.com\/load-balancing-with-weighted-pools\/#:~:text=Image%3A%20Diagram%20showing%20a%20request,a%20weight%20of%2080%20percent\" target=\"_blank\" rel=\"noopener\">Load Balancing with Weighted Pools<\/a>).<\/p>\n\n\n\n<p><strong>Cons:<\/strong> The downside is you\u2019re adding another external dependency and potentially cost. Also, using a third-party means your traffic flows through their network (for Cloudflare in proxy mode, traffic goes through Cloudflare POPs). This can actually improve performance (due to caching and faster routes), but it\u2019s a change to consider. 
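<\/p>\n\n\n\n<p>As a rough illustration, origin weights in Cloudflare are configured on a load-balancer pool. The sketch below assumes the shape of Cloudflare\u2019s v4 API; the account ID, token, pool name, and origin addresses are all placeholders, so verify the endpoint and fields against Cloudflare\u2019s current load-balancing docs before use:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Create a pool whose origins are weighted 80\/20 between GCP and AWS
# (Cloudflare origin weights are fractions between 0 and 1)
curl -X POST \"https:\/\/api.cloudflare.com\/client\/v4\/accounts\/$ACCOUNT_ID\/load_balancers\/pools\" -H \"Authorization: Bearer $CF_API_TOKEN\" -H \"Content-Type: application\/json\" --data '{
    \"name\": \"migration-pool\",
    \"origins\": [
      {\"name\": \"gcp\", \"address\": \"gcp-origin.example.com\", \"enabled\": true, \"weight\": 0.8},
      {\"name\": \"aws\", \"address\": \"eks-prod-123.us-east-1.elb.amazonaws.com\", \"enabled\": true, \"weight\": 0.2}
    ]
  }'
</code><\/pre>\n\n\n\n<p>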
Since our primary focus is using the existing cloud providers, a third-party LB is an option if neither Cloud DNS nor GCP\/AWS native solutions meet a requirement you have (for example, if you needed true latency-based routing across clouds, a service like Cloudflare LB or Cedexis would be needed, as Cloud DNS doesn\u2019t do latency measurements).<\/p>\n\n\n\n<p>In summary, a <strong>global load balancer approach adds an abstraction layer<\/strong> that can greatly smooth out the migration. It\u2019s often used in enterprise multi-cloud deployments. If your team is comfortable setting it up, it provides the most control and safety (at the cost of some complexity).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Service Mesh \/ API Gateway Approach<\/h2>\n\n\n\n<p>A third approach involves the <strong>application layer routing<\/strong> rather than DNS or a global LB. This typically means deploying either a shared API gateway or using a <strong>service mesh<\/strong> that spans both environments.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Multi-Cloud Service Mesh<\/h3>\n\n\n\n<p>Service mesh technologies (like <strong>Istio<\/strong>, <strong>Linkerd<\/strong>, or <strong>Consul<\/strong> mesh) can be used to route traffic between services across clusters. If you have the same application deployed on GKE and EKS, you could establish a mesh that includes both clusters and then use mesh routing features (layer 7 routing) to control traffic splitting. For example, Istio\u2019s <strong>VirtualService<\/strong> resource can be configured to send X% of requests to one service version and Y% to another \u2013 even if those \u201cversions\u201d live in different clusters. Projects like <strong>Istio Multi-Cluster<\/strong> or <strong>Gloo Mesh<\/strong> allow tying two Kubernetes clusters together in one logical mesh. You\u2019d typically need network connectivity between the clusters (VPN or VPC peering across clouds) so that services can talk to each other. 
With that in place, you can deploy a common control plane or a federated service mesh configuration.<\/p>\n\n\n\n<p><strong>How it would work:<\/strong> You might expose the service on GKE (mesh ingress gateway) and also on EKS (mesh gateway). You then configure the mesh so that when requests hit the ingress, it can split them: e.g., 90% to local service (GKE pods) and 10% forwarded to the EKS service (via the mesh\u2019s cross-cluster communication). As you gain confidence, you adjust the weights in the VirtualService to send more to EKS. Eventually, you send 100% to EKS, and you could even switch the DNS to point directly to the EKS ingress at that time. This is essentially a <em>layer 7 load balancing done by the service mesh sidecar proxies<\/em>.<\/p>\n\n\n\n<p><strong>Pros:<\/strong> This approach keeps the traffic management in the application layer, which means you have full context of requests (you can do routing based on HTTP headers, etc., beyond just percentages). It also doesn\u2019t rely on public DNS or public load balancers \u2013 the clusters could be connected privately. It\u2019s a very powerful technique if you already use a service mesh, because you can leverage the same tools for canarying that you use within one cluster but now across clusters. For instance, Istio can even mirror traffic to the new deployment or do gradual rollouts with rich telemetry.<\/p>\n\n\n\n<p><strong>Cons:<\/strong> However, implementing a multi-cloud service mesh is <strong>non-trivial<\/strong>. You need to set up secure connectivity between GCP and AWS (like a direct VPN or use Istio\u2019s mesh VPN capabilities) and ensure service discovery works across clouds. There is also a learning curve and operational overhead to running a service mesh across two environments. If you do not already have a mesh, introducing one just for the migration may be overkill. 
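<\/p>\n\n\n\n<p>For teams that do already run Istio across both clusters, though, the split itself is little more than a <code>VirtualService<\/code>. A hedged sketch, assuming cross-cluster service discovery is already working \u2013 the host names, gateway, and service addresses below are hypothetical:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># virtualservice.yaml - send 90% of ingress traffic to the GKE-local
# service and 10% to the EKS service registered in the mesh
apiVersion: networking.istio.io\/v1beta1
kind: VirtualService
metadata:
  name: app-cross-cluster
spec:
  hosts:
  - app.example.com
  gateways:
  - app-gateway
  http:
  - route:
    - destination:
        host: app.prod.svc.cluster.local   # GKE-local service
      weight: 90
    - destination:
        host: app.eks-remote.example       # EKS service as exposed to the mesh
      weight: 10

# apply it to the cluster running the ingress gateway:
kubectl apply -f virtualservice.yaml
</code><\/pre>\n\n\n\n<p>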
Meshes also typically assume relatively stable, long-lived connectivity between environments; using one for a one-time migration might be more work than benefit, unless you want to adopt a mesh long-term for multi-cloud operations.<\/p>\n\n\n\n<p>In most cases, DNS or global LB solutions are simpler for a short-term migration. That said, if your architecture is microservices-heavy and you foresee staying hybrid for a while, a service mesh could provide a consistent way to manage traffic splitting, security (mTLS between clouds), and monitoring.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">API Gateway<\/h3>\n\n\n\n<p>Another application-layer approach is to use an <strong>API Gateway<\/strong> as the unified front-end. This could be a cloud-managed gateway or a self-hosted one:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cloud-managed<\/strong>: For example, Google Cloud Endpoints\/Apigee or Amazon API Gateway could front the service. But those solutions typically work best when the backends are in the same cloud or publicly accessible. You could configure an Apigee gateway (running in GCP) with targets for both the GCP and AWS services and do weighted routing between them. Similarly, an AWS API Gateway could point to an AWS Lambda that proxies to GCP; however, such chains quickly become Rube Goldberg machines, adding complexity and cost.<\/li>\n\n\n\n<li><strong>Self-hosted<\/strong>: You might run a gateway like <strong>Kong<\/strong>, <strong>NGINX<\/strong>, or <strong>HAProxy<\/strong> on a VM or container that has network access to both environments. That gateway then becomes the entry point (you\u2019d point DNS to it), and it forwards requests either to GCP or AWS. Essentially, this is like running your own global load balancer. 
You could even run such gateways in both clouds for redundancy (and use DNS round-robin between the two gateways, each of which splits traffic internally).<\/li>\n<\/ul>\n\n\n\n<p>The API gateway approach, like the service mesh, gives you a lot of flexibility (you can do things like auth, transformations, etc., in one place during the migration). But again, you are introducing a new component that must be highly available itself. It can become a bottleneck or single point of failure if not done carefully.<\/p>\n\n\n\n<p><strong>When to consider mesh\/gateway:<\/strong> If your system already uses an API gateway layer, then extending it to handle multi-cloud can make sense. Or if you require advanced routing logic (say only certain users go to the new environment \u2013 e.g., internal beta testers \u2013 which could be done by gateway inspecting a header or cookie), an app-layer solution is needed. Otherwise, for pure load distribution, DNS or LBs are typically easier.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Ensuring Zero Downtime During Cutover<\/h2>\n\n\n\n<p>Regardless of which approach you choose, here are some <strong>best practices<\/strong> to ensure zero or minimal downtime:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Use Blue\/Green Principles:<\/strong> Always have the new environment (blue) up and running in parallel with the old (green) before shifting traffic (<a href=\"https:\/\/www.infracloud.io\/blogs\/blue-green-deployments-dns-routing\/#:~:text=Compared%20to%20the%20other%20strategies%2C,different%20systems%20across%20different%20regions\" target=\"_blank\" rel=\"noopener\">How to Setup Blue Green Deployments with DNS Routing<\/a>) (<a href=\"https:\/\/www.infracloud.io\/blogs\/blue-green-deployments-dns-routing\/#:~:text=As%20you%20might%20have%20understood%2C,achieve%20this%20is%20load%20balancer\" target=\"_blank\" rel=\"noopener\">How to Setup Blue Green Deployments with DNS Routing<\/a>). 
This way, users are always hitting a working version. Our strategies above all adhere to this: they route to both old and new in parallel.<\/li>\n\n\n\n<li><strong>Lower TTLs Ahead of Time:<\/strong> If using DNS changes (weighted or not), reduce the TTL well in advance (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=In%20ClearScale%E2%80%99s%20experience%2C%20a%20simple,currently%20uses%20the%20most%20providers\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>). For final cutover DNS changes (like eventually repointing the domain directly to AWS), a low TTL (e.g. 60s) ensures quick propagation. After the migration, you can raise TTLs back to normal.<\/li>\n\n\n\n<li><strong>Implement Health Monitoring:<\/strong> Continuously monitor the health of both environments. If using global LB or DNS health checks, they will do this for you and take action. If not, set up your own synthetic checks. For instance, Google Cloud DNS supports failover records \u2013 once AWS is stable, you might configure AWS as the primary and GCP as the failover target. If you are not using that, be prepared to manually adjust DNS or LB settings in case of a failure. The key is to catch any issue <em>before<\/em> it affects users widely. Use logging, APM, etc., in both clouds.<\/li>\n\n\n\n<li><strong>Gradual Transition with Monitoring:<\/strong> Treat the migration like a canary release. Start by sending a small percentage to AWS EKS and verify: are error rates low? Is performance good? Compare it to the baseline on GCP. Only increase traffic when metrics look healthy. Use dashboards to watch both sets of servers. This minimizes risk \u2013 if something goes wrong at 10% traffic, you can roll back quickly with minimal impact.<\/li>\n\n\n\n<li><strong>Data Consistency:<\/strong> Ensure both environments have access to the same data sources or have data synchronized. 
For example, if there\u2019s a database, you might keep it in one place (perhaps still in GCP) during the transition, or use a cross-cloud replication. If one environment had stale data, users could see inconsistent results when they switch. Ideally, the user experience is identical no matter which backend served them. (This typically means using a single DB or synchronized databases, and careful handling of any caches, etc.)<\/li>\n\n\n\n<li><strong>Session Management:<\/strong> As noted, if your application maintains session state in-memory (say in GKE pods), a user who bounces between clouds might lose their session. Solutions include using a shared session store (Redis, etc.) accessible from both, or enabling sticky session features. If using a load balancer, you could stick sessions to the first backend they hit (though that complicates the gradual migration since some users would never move). Another solution some adopt is migrating users in \u201cbatches\u201d \u2013 e.g., based on user hash or region, which a service mesh or gateway could do. In general, prefer stateless handling during the migration if possible.<\/li>\n\n\n\n<li><strong>Rollback Plan:<\/strong> For each stage, have a quick rollback plan. With weighted DNS, rollback = set AWS weight to 0 (or lower it). With LB, rollback = route 100% back to GCP. With mesh, rollback = flip the weight back. These can happen very fast (seconds) if automated or a simple config change. Also, ensure engineers are ready during changes to address any surprise (perhaps do changes during a low-traffic period initially, though with proper canarying you can even do it during normal hours).<\/li>\n\n\n\n<li><strong>Final Cutover and Cleanup:<\/strong> Once AWS is handling all traffic smoothly and you\u2019ve run like that for some time (to ensure stability), you can <strong>decommission<\/strong> the GCP side. 
This might involve deleting the weighted DNS policy (or removing the old IP), or removing the GCP backend from the LB, etc. Do this only after you\u2019re confident \u2013 you might choose to leave the dual setup running for a \u201cbake-in\u201d period (e.g., a week of 100% on AWS but GCP still on standby). That way, if you unexpectedly need to fail back, you can just reintroduce the weights. When fully done, turn off the GCP services to avoid incurring cost. Also raise DNS TTLs if you lowered them.<\/li>\n<\/ul>\n\n\n\n<p>By following these practices, you can achieve a <strong>zero-downtime migration<\/strong>. In fact, users should not even notice the transition if done correctly. Many companies have done cloud-to-cloud migrations in this fashion (weighted DNS or L7 splitting) without their users ever being aware of the backend move. The combination of careful traffic management and comprehensive monitoring is key to success (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=,premise%20environment\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>) (<a href=\"https:\/\/cloud.google.com\/load-balancing\/docs\/application-load-balancer#:~:text=using%20weight,legacy%20services%2C%20and%20similar%20processes\" target=\"_blank\" rel=\"noopener\">Application Load Balancer overview &nbsp;|&nbsp; Load Balancing &nbsp;|&nbsp; Google Cloud<\/a>).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Recommendation and Example Strategy<\/h2>\n\n\n\n<p>Considering the scenario (DNS in Google Cloud, services in Cloud Run\/GKE moving to EKS), a <strong>two-phase approach<\/strong> might work best:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Phase 1: Weighted DNS Cutover (Quick Win).<\/strong> Set up weighted DNS records for your prod, stage, uat domains to start introducing AWS. This leverages your existing Cloud DNS setup with minimal overhead. 
For example, in staging or UAT, you could begin sending a portion of traffic to EKS to test it under real load. This is straightforward to implement and requires no new components \u2013 ideal for early testing. Make sure the AWS environment\u2019s domain\/IP is configured in Cloud DNS with a small weight and gradually increase it (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=,premise%20environment\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>). Monitor results. This phase gives you confidence in AWS and is easy to roll back by adjusting DNS weights.<\/li>\n\n\n\n<li><strong>Phase 2: Consider Global Load Balancer for Prod (if needed).<\/strong> For production, where zero downtime and fast reactions are paramount, you might introduce the GCP global load balancer in front of prod traffic. This adds more control \u2013 for instance, if during prod migration a problem occurs, the LB will automatically fail back to GCP in milliseconds (due to health checks) rather than waiting for DNS. You could either switch prod DNS to the LB from the start (and then use LB splitting), or continue with weighted DNS but with very aggressive TTLs and perhaps script-based health check adjustments. The LB approach could be implemented in parallel: you can pilot it with one service or domain first. If implementing the LB is too time-consuming or not feasible, staying with <strong>Weighted DNS<\/strong> for prod is still a valid strategy (just make sure to have those health checks and low TTL).<\/li>\n<\/ul>\n\n\n\n<p>In either case, <strong>planning and testing are crucial<\/strong>. Test the weighted routing in a lower environment (e.g., use stage.example.com with 50\/50 weights and see how the traffic flows). Test failure scenarios (e.g., what happens if the AWS service is down \u2013 does DNS or LB correctly keep traffic on GCP?). 
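<\/p>\n\n\n\n<p>The staging experiment is essentially a one-liner with gcloud. A sketch, assuming the zone is named <code>stage-zone<\/code> and using placeholder addresses (verify the routing-policy flags against your installed SDK version):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># 50\/50 weighted round-robin A record for the staging domain
gcloud dns record-sets create stage.example.com. --zone=stage-zone --type=A --ttl=30 --routing-policy-type=WRR --routing-policy-data=\"50=203.0.113.10;50=198.51.100.20\"
</code><\/pre>\n\n\n\n<p>Re-running the analogous <code>gcloud dns record-sets update<\/code> command with new weights is how you shift the split later.<\/p>\n\n\n\n<p>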
Also, test the performance when some users are served from AWS \u2013 ensure your CDN (if any) and client-side logic work the same against both backends.<\/p>\n\n\n\n<p>For a concrete example, suppose <code>api.prod.example.com<\/code> currently points to a Cloud Run custom domain (which maps to a Google front-end). You want to introduce the AWS EKS service, which is exposed via an Amazon ALB (say <code>eks-prod-123.us-east-1.elb.amazonaws.com<\/code>). Here\u2019s how you might proceed:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Set Up the AWS Endpoint:<\/strong> Ensure the AWS ALB is up and has the EKS service registered. An ALB is addressed by a DNS name rather than static IPs (an NLB can have static IPs; an ALB usually cannot), so use a CNAME approach. Cloud DNS supports weighted CNAMEs as well \u2013 you would create a weighted routing policy on <code>api.prod.example.com<\/code> with two CNAME targets: one pointing to the Cloud Run domain and one to the ALB domain.<\/li>\n\n\n\n<li><strong>Lower TTL:<\/strong> Set the <code>api.prod.example.com<\/code> TTL to 30s (from perhaps 300 or 3600) at least 1 hour before starting.<\/li>\n\n\n\n<li><strong>Add AWS with 0 weight:<\/strong> Initially, add the AWS record with weight 0 (or a very small fraction). This ensures the record is in place but essentially nobody will receive it until you raise the weight \u2013 or start with a token 5% if you\u2019re confident.<\/li>\n\n\n\n<li><strong>Gradually Increase Weight:<\/strong> Over a period of hours or days, raise AWS to 10%, then 25%, 50%, etc. At each step, use metrics from both sides to verify system behavior.<\/li>\n\n\n\n<li><strong>100% and Monitor:<\/strong> Eventually set 100% AWS, 0% GCP. Keep GCP instances running but receiving no traffic (they\u2019re effectively on hot standby). 
After a stable period, you can remove the weighted policy (replace with a simple CNAME or A to AWS) or keep a failover record (primary AWS, secondary GCP) as a safety net.<\/li>\n\n\n\n<li><strong>Post-Cutover:<\/strong> Increase DNS TTL to normal (to improve cache efficiency). Decommission GCP resources if no longer needed.<\/li>\n<\/ol>\n\n\n\n<p>This approach would achieve the goal with essentially no downtime. Even the final step of going 100% AWS is not a \u201chard cut\u201d \u2013 by that point, most users were already on AWS; it\u2019s just the last portion.<\/p>\n\n\n\n<p>If absolute instantaneous failover is required, adding the global LB in step 3 could replace steps 3-5: you\u2019d point DNS to the LB, and let the LB handle the gradual routing. That might be more complex initially but gives more confidence for mission-critical prod services.<\/p>\n\n\n\n<p>Both methods (DNS vs LB) can even be combined: you could use weighted DNS to split between a GCP LB and an AWS LB if you wanted to double layer it. However, that\u2019s usually unnecessary.<\/p>\n\n\n\n<p><strong>Conclusion:<\/strong> Start with what is simplest and meets your needs. Weighted DNS is often sufficient for a controlled migration and is directly supported by Google Cloud DNS (<a href=\"https:\/\/cloud.google.com\/blog\/products\/networking\/dns-routing-policies-for-geo-location--weighted-round-robin#:~:text=Cloud%20DNS%20will%20support%20two,using%20the%20Cloud%20DNS%20APIs\" target=\"_blank\" rel=\"noopener\">DNS routing policies for geo-location &amp; weighted round robin | Google Cloud Blog<\/a>). If your use case demands tighter control (or you want to minimize reliance on DNS caching behavior), then introduce a global load balancer. In either case, careful incremental rollout and monitoring will ensure you achieve <strong>zero downtime<\/strong> and a successful migration to AWS EKS. 
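<\/p>\n\n\n\n<p>Before and after each weight change, it is also worth sampling what the zone actually answers. A small sketch \u2013 the name server shown is a placeholder, so substitute an NS record from your own Cloud DNS zone; querying the authoritative server directly sidesteps resolver caching:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Count which backend address is returned across 100 queries
for i in $(seq 1 100); do dig +short api.prod.example.com @ns-cloud-a1.googledomains.com.; done | sort | uniq -c
</code><\/pre>\n\n\n\n<p>The distribution of answers should roughly track the configured weights.<\/p>\n\n\n\n<p>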
The end result will be that users are smoothly transitioned to AWS with no disruption, and you can shut down the GCP services once confidence is high that AWS is running perfectly.<\/p>\n\n\n\n<p><strong>References:<\/strong> Weighted DNS routing and failover techniques (<a href=\"https:\/\/cloud.google.com\/blog\/products\/networking\/dns-routing-policies-for-geo-location--weighted-round-robin#:~:text=Cloud%20DNS%20will%20support%20two,using%20the%20Cloud%20DNS%20APIs\" target=\"_blank\" rel=\"noopener\">DNS routing policies for geo-location &amp; weighted round robin | Google Cloud Blog<\/a>) (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=,premise%20environment\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>), advanced load balancers and traffic splitting (<a href=\"https:\/\/cloud.google.com\/load-balancing\/docs\/application-load-balancer#:~:text=,legacy%20services%2C%20and%20similar%20processes\" target=\"_blank\" rel=\"noopener\">Application Load Balancer overview &nbsp;|&nbsp; Load Balancing &nbsp;|&nbsp; Google Cloud<\/a>) (<a href=\"https:\/\/cloud.google.com\/load-balancing\/docs\/https\/traffic-management-global#:~:text=match%20at%20L840%20Weighted%20traffic,to%20the%20individual%20backend%20service\" target=\"_blank\" rel=\"noopener\">Traffic management overview for global external Application Load Balancers &nbsp;|&nbsp; Load Balancing &nbsp;|&nbsp; Google Cloud<\/a>), and real-world zero-downtime migration practices (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=In%20ClearScale%E2%80%99s%20experience%2C%20a%20simple,currently%20uses%20the%20most%20providers\" target=\"_blank\" rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>) (<a href=\"https:\/\/blog.clearscale.com\/best-practices-of-application-migration\/#:~:text=,premise%20environment\" target=\"_blank\" 
rel=\"noopener\">Best Practices for Zero Downtime Migration to AWS | ClearScale<\/a>) were all considered in devising this plan. Following these best practices will help ensure a seamless hybrid operation and cutover. Good luck with your migration!<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview Migrating from Google Cloud (Cloud Run + GKE) to AWS EKS while serving users from both environments requires careful planning. The goal is to gradually shift traffic to AWS&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[2],"tags":[],"class_list":["post-49118","post","type-post","status-publish","format-standard","hentry","category-uncategorised"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/49118","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=49118"}],"version-history":[{"count":1,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/49118\/revisions"}],"predecessor-version":[{"id":49119,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/49118\/revisions\/49119"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=49118"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=49118"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=49118"}],"curies":[{"name":"wp
","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}