July 5, 2026
Building a Multi-Cluster Istio Service Mesh on EKS with External CA and mTLS
Modern Kubernetes platforms rarely live inside a single cluster.

By Hrishikesh Limaye
As platforms grow, workloads often run across multiple clusters, regions, and environments. Some services need to communicate across cluster boundaries, while platform teams still need consistent security, observability, traffic control, and identity.
This is where a service mesh becomes useful.
In this blog, I'll walk through a practical architecture for running Istio service mesh on Amazon EKS using:
- Multi-primary Istio service mesh
- Underlying VPC connectivity using Transit Gateway or VPC peering
- External CA integration for Istio
- cert-manager for CA certificate lifecycle
- Istiod as the workload certificate authority
- Envoy sidecars for workload traffic
- mTLS for service-to-service encryption
- Kiali for mesh traffic and mTLS visibility
The goal is to build a mesh where workloads across clusters can communicate securely without every application implementing its own networking, certificate, retry, telemetry, or encryption logic.
Why Service Mesh?
Without a service mesh, applications usually own too much networking complexity.
Each service may need to handle:
- TLS configuration
- Service discovery
- Retries and timeouts
- Load balancing
- Cross-cluster communication
- Observability
- Authentication and authorization
- Certificate rotation
This becomes harder as the number of services and clusters grows.
Istio helps by moving these concerns into the platform layer. Applications continue to send normal traffic, while Envoy sidecars and the Istio control plane handle mesh-level security, routing, discovery, and telemetry.
At a high level:
Application -> Envoy sidecar -> Mesh network -> Envoy sidecar -> Destination application
The application does not need to know whether the destination is in the same cluster or another cluster. The mesh handles that.
High-Level Architecture
The architecture uses a multi-primary Istio mesh.
In this model, each Kubernetes cluster runs its own Istio control plane. The clusters exchange remote secrets so each control plane can discover workloads and services in the other clusters.
The clusters are connected at the network layer using AWS Transit Gateway, VPC peering, or an equivalent private connectivity model.
This design gives each cluster local control-plane independence while still allowing secure cross-cluster service communication.
Why Multi-Primary Mesh?
Istio supports different multi-cluster models. In this design, the multi-primary model is useful because each cluster has its own control plane.
That gives several benefits:
- Each cluster can continue operating locally.
- Control-plane failure in one cluster does not fully break the other cluster.
- Service discovery can span clusters.
- Cross-cluster traffic can still use mTLS.
- Platform teams can scale mesh connectivity without centralizing everything into one control plane.
The key piece that connects clusters at the Istio layer is the remote secret.
A remote secret gives one Istio control plane the ability to discover services and endpoints from another cluster.
Conceptually:
Cluster A istiod knows about Cluster B
Cluster B istiod knows about Cluster A
This enables services in one cluster to discover and call services in another cluster through the mesh.
Network Prerequisite: Cluster-to-Cluster Reachability
Before configuring Istio multi-cluster service discovery, the Kubernetes clusters must be able to reach each other at the network layer.
Istio does not magically connect isolated VPCs. The service mesh can secure and manage traffic, but packets still need a route between the source cluster and the destination cluster.
In AWS, this usually means one of the following is already in place:
Option 1: AWS Transit Gateway connecting the VPCs
Option 2: VPC peering between the VPCs
Option 3: Another private connectivity model approved by the platform/network team
The exact choice depends on the organization's network topology.
VPC peering can work well for a small number of VPCs, but it becomes harder to manage as the number of clusters, accounts, and regions grows.
Transit Gateway is often a better fit for larger multi-account or multi-region environments because it provides a centralized network hub for routing between VPCs.
At minimum, validate the following before enabling multi-cluster mesh:
- Cluster A VPC can route to Cluster B VPC
- Cluster B VPC can route to Cluster A VPC
- Security groups allow required traffic
- Network ACLs do not block traffic
- Pod and service CIDR ranges do not conflict, or conflicts are handled by design
- DNS and service discovery assumptions are understood
[Figure: simplified network view]
Once this network foundation exists, Istio can build on top of it:
- AWS networking provides reachability.
- Istio provides identity, mTLS, discovery, routing, and observability.
Without this network layer, remote secrets may be configured correctly and service discovery may appear valid, but cross-cluster traffic will still fail because the clusters cannot actually reach each other.
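The CIDR item in the checklist above is easy to verify before any mesh work starts. Here is a minimal sketch using only the Python standard library. The range names and CIDR values are hypothetical and only there for illustration; in practice you would feed in the VPC, pod, and service ranges of your own clusters.

```python
import ipaddress
from itertools import combinations


def find_cidr_overlaps(cidrs):
    """Return (name_a, name_b) pairs whose CIDR ranges overlap.

    cidrs maps a descriptive name to a CIDR string.
    """
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical ranges, not taken from a real environment.
example_ranges = {
    "cluster-a-vpc": "10.0.0.0/16",
    "cluster-b-vpc": "10.1.0.0/16",
    "cluster-b-pods": "10.0.128.0/17",
}

for first, second in find_cidr_overlaps(example_ranges):
    print(f"overlap: {first} <-> {second}")
```

An empty result does not prove reachability. It only rules out one class of routing problem, so routes, security groups, and NACLs still need their own checks.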
Why External CA?
By default, Istio can generate and use its own self-signed root CA. Istiod then uses that CA to issue workload certificates for Envoy sidecars.
That works, but it creates operational concerns at scale:
- Every cluster may have its own independently managed Istio root CA.
- Root CA expiry must be monitored carefully.
- Certificate renewal becomes a platform responsibility.
- Trust consistency across clusters can become harder.
- Integrating with enterprise PKI or a private CA may be required.
To make the certificate model more production-friendly, this design uses cert-manager to provision the CA material used by Istio.
The important point is:
Istiod still signs workload certificates. cert-manager manages the CA certificate and key that Istiod uses.
So Istiod remains responsible for issuing workload certificates to sidecars, but the CA it uses is no longer the default self-signed CA that Istio generates for itself, nor one that someone created by hand during installation.
Certificate Model
In this approach, cert-manager creates or retrieves the CA certificate chain and private key needed by Istio. That material is stored in the expected Istio CA secret.
Istiod then uses that CA material to sign workload certificates.
In practical terms, Istio expects CA material in a Kubernetes secret, commonly named cacerts, in the istio-system namespace.
That secret typically contains files such as:
ca-cert.pem
ca-key.pem
root-cert.pem
cert-chain.pem
Istiod reads this secret and uses it as the signing authority for mesh workload certificates.
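To make the shape of that secret concrete, here is a small sketch that assembles a cacerts Secret manifest from four PEM strings. The function name and the placeholder PEM values are my own for illustration. How cert-manager's output is mapped onto these four files depends on your issuer setup, so treat this as a picture of the target format rather than the exact provisioning step.

```python
import base64
import json

# File names listed above as the expected contents of the cacerts secret.
CACERTS_KEYS = ("ca-cert.pem", "ca-key.pem", "root-cert.pem", "cert-chain.pem")


def build_cacerts_secret(pem_files, namespace="istio-system"):
    """Build a Kubernetes Secret manifest (as a dict) holding Istio CA material.

    pem_files maps each expected file name to its PEM text.
    """
    missing = [key for key in CACERTS_KEYS if not pem_files.get(key)]
    if missing:
        raise ValueError("missing CA material: " + ", ".join(missing))
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "cacerts", "namespace": namespace},
        "type": "Opaque",
        # Secret data values must be base64-encoded.
        "data": {
            key: base64.b64encode(pem_files[key].encode()).decode()
            for key in CACERTS_KEYS
        },
    }


# Placeholder values only; real PEM content comes from your CA tooling.
placeholder_pems = {key: f"<PEM for {key}>" for key in CACERTS_KEYS}
manifest = build_cacerts_secret(placeholder_pems)
print(json.dumps(manifest["metadata"]))
```

Because the manifest contains the CA private key, it should be generated inside the pipeline and never written to a repository or a build log.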
Deployment Flow
A production deployment should be automated through pipelines, Helm charts, and Terraform modules. The high-level flow looks like this:
1. Establish network connectivity between cluster VPCs using Transit Gateway or VPC peering
2. Validate routes, security groups, NACLs, DNS assumptions, and non-overlapping CIDRs
3. Create istio-system namespace
4. Install cert-manager
5. Configure cert-manager issuer or private CA integration
6. Use cert-manager to create/provision Istio CA certificate material
7. Store CA material in the Istio CA secret
8. Install Istio base components
9. Install Istio control plane
10. Create remote secrets between clusters
11. Enable sidecar injection for workloads
12. Validate cross-cluster traffic and mTLS
The important sequencing points are:
Network connectivity must exist before cross-cluster traffic can work. Istio CA material must be ready before Istio control plane installation.
If Istiod starts without the expected CA secret, it typically falls back to generating its own self-signed CA, so workload certificates are not issued from the intended CA chain.
Istio Control Plane With CA Secret
In this model, Istiod acts as the CA server for mesh workloads, but it uses the provided CA certificate and key.
A simplified view of the setup:
cert-manager -> creates CA cert/key -> stores in istio-system/cacerts
istiod -> loads cacerts -> signs workload certificates
Envoy sidecars -> receive workload certificates from istiod
This keeps the workload certificate flow native to Istio while giving the platform team better control over the CA lifecycle.
The key design is:
Use cert-manager for CA lifecycle. Use istiod for workload certificate issuance.
That avoids pushing every workload certificate request directly through cert-manager while still avoiding a manually managed Istio root CA.
Cross-Cluster Mesh Setup
After the Istio control plane is running in each cluster, the clusters need to know about each other.
That is usually done by creating remote secrets.
Conceptually:
Create remote secret for Cluster B
Apply it into Cluster A
Create remote secret for Cluster A
Apply it into Cluster B
[Figure: remote secret flow between Cluster A and Cluster B]
Once remote secrets are in place, Istio can discover services across clusters and route traffic accordingly.
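The exchange is symmetric, so it is worth generating the commands rather than typing them per cluster. The sketch below builds, for every ordered pair of clusters, an istioctl create-remote-secret command and the kubectl apply that receives its output. The cluster names and kubeconfig context names are hypothetical, and the flags reflect my understanding of the Istio multicluster install guide, so verify them against your Istio version before wiring this into a pipeline.

```python
from itertools import permutations


def remote_secret_commands(contexts):
    """Return (create, apply) command pairs for a full remote-secret exchange.

    contexts maps a cluster name to its kubeconfig context name.
    Each pair creates the secret from the source cluster and applies it
    into the target cluster.
    """
    plans = []
    for source, target in permutations(contexts, 2):
        create = [
            "istioctl", "create-remote-secret",
            f"--context={contexts[source]}",
            f"--name={source}",
        ]
        apply = ["kubectl", "apply", "-f", "-", f"--context={contexts[target]}"]
        plans.append((create, apply))
    return plans


# Hypothetical cluster and context names.
clusters = {"cluster-a": "ctx-a", "cluster-b": "ctx-b"}
for create, apply in remote_secret_commands(clusters):
    print(" ".join(create), "|", " ".join(apply))
```

With two clusters this yields two pairs. With three it yields six, which is one reason to automate the step early.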
For multi-cluster mTLS to work cleanly, clusters must also share a compatible trust model. That is why the CA design matters. If clusters do not trust the same root or compatible certificate chain, cross-cluster mTLS will fail.
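One quick sanity check for the shared trust model is to compare the root certificate that each cluster holds. The sketch below computes a SHA-256 fingerprint of each root-cert.pem and reports whether they all match. It assumes each input contains exactly one PEM certificate, and the function names are my own. Matching roots are a common way to get compatible trust, but this check does not cover setups that rely on distinct roots with trust bundles.

```python
import hashlib
import ssl


def root_fingerprint(root_cert_pem):
    """Return the SHA-256 fingerprint (hex) of a single PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(root_cert_pem.strip())
    return hashlib.sha256(der).hexdigest()


def clusters_share_root(root_pems):
    """Compare root certificates across clusters.

    root_pems maps a cluster name to the text of its root-cert.pem.
    Returns (all_match, fingerprints_by_cluster).
    """
    fingerprints = {name: root_fingerprint(pem) for name, pem in root_pems.items()}
    all_match = len(set(fingerprints.values())) == 1
    return all_match, fingerprints
```

The PEM text for each cluster can be taken from the root-cert.pem entry of its cacerts secret after base64-decoding it.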
Sidecar Injection
For a workload to participate in the mesh, it needs an Envoy sidecar.
There are two common approaches.
Namespace-level injection
kubectl label namespace app-namespace istio-injection=enabled --overwrite
After labeling the namespace, workloads need to be restarted so Istio can inject the sidecar.
Workload-level injection
template:
  metadata:
    annotations:
      sidecar.istio.io/inject: "true"
The right approach depends on how your platform manages namespace onboarding and default injection policy.
Cross-Cluster Traffic Validation
A simple way to validate cross-cluster traffic is to deploy test workloads into both clusters.
For example:
Cluster A:
  namespace: sample-a
  services: nginx, sleep
Cluster B:
  namespace: sample-b
  services: nginx, sleep
From a sleep pod in Cluster A, test:
curl nginx.sample-a.svc.cluster.local
curl nginx.sample-b.svc.cluster.local
Expected result:
Traffic should reach services in both local and remote clusters.
This confirms that service discovery and routing are working across the mesh. For the second request to resolve from Cluster A, the remote service name must be resolvable there, for example through Istio's DNS proxying or by creating the same namespace and Service in Cluster A.
Validating mTLS
Cross-cluster connectivity alone is not enough. We also need to validate that traffic is encrypted and only mesh workloads can communicate when strict mTLS is enabled.
A namespace-level PeerAuthentication policy can enforce strict mTLS:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: sample-a
spec:
  mtls:
    mode: STRICT
Apply the same policy to the second namespace.
Then test from two types of workloads:
1. Workload with Istio sidecar
2. Workload without Istio sidecar
Expected behavior:
Sidecar-enabled workload -> request succeeds
Non-mesh workload -> request fails
This proves that the service is accepting only mTLS traffic from mesh identities.
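The expected behavior above can be written down as a small lookup, which is handy when comparing test results during a rollout. This is a deliberately simplified model. It assumes the destination has a sidecar, that no AuthorizationPolicy or DestinationRule changes the outcome, and that sidecar-enabled clients use mTLS automatically. The function names and the sample observations are hypothetical.

```python
VALID_MODES = ("STRICT", "PERMISSIVE", "DISABLE")


def expected_result(client_has_sidecar, mtls_mode):
    """Return "succeeds" or "fails" for a request under the simplified model."""
    if mtls_mode not in VALID_MODES:
        raise ValueError(f"unknown mTLS mode: {mtls_mode}")
    # Only STRICT rejects plaintext traffic from workloads outside the mesh.
    if mtls_mode == "STRICT" and not client_has_sidecar:
        return "fails"
    return "succeeds"


def unexpected_results(observations, mtls_mode="STRICT"):
    """Return client names whose observed result differs from the expectation.

    observations is a list of (client_name, has_sidecar, observed_result).
    """
    return [
        name
        for name, has_sidecar, observed in observations
        if observed != expected_result(has_sidecar, mtls_mode)
    ]


# Hypothetical test run: the non-mesh client unexpectedly got through.
run = [("sleep-meshed", True, "succeeds"), ("sleep-plain", False, "succeeds")]
print(unexpected_results(run))
```

A non-mesh client that still succeeds under STRICT usually means the policy was applied to a different namespace than the one being tested.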
Observability With Kiali
Kiali is useful for visual verification.
In the Kiali graph view, you can enable:
- Security
- Traffic animation
- Namespace graph
- Service graph
When mTLS is active, Kiali shows a lock icon on traffic edges.
This is helpful during rollout because it gives a quick visual confirmation that traffic is flowing through the mesh and protected by mTLS.
Operational Checks
A few commands are useful for day-two operations.
Check Istio version:
istioctl version
Check sidecar connectivity:
istioctl proxy-status
Check remote clusters:
istioctl remote-clusters
Check Istio control plane pods:
kubectl get pods -n istio-system
Inspect workload certificates:
istioctl proxy-config secret deploy/<deployment-name> -n <namespace>
Check the Istio CA secret:
kubectl get secret cacerts -n istio-system
Check cert-manager resources:
kubectl get certificate -A
kubectl get issuer -A
kubectl get clusterissuer
These checks are useful when debugging cross-cluster discovery, certificate issues, sidecar injection, or mTLS failures.
Common Issues
A few common issue patterns are worth watching.
Workload is not part of the mesh
If the pod does not have an Envoy sidecar, it cannot participate in mesh identity or mTLS.
Check:
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].name}'
You should see both the application container and istio-proxy.
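When this check runs across many pods, it is easier to script it against the pod's JSON than to read container names by eye. The sketch below takes the output of kubectl get pod with -o json and looks for a container named istio-proxy. It also looks in initContainers, because as I understand it the proxy is placed there when Istio's native sidecar mode is enabled. The function name and the sample pod are my own.

```python
import json


def has_istio_sidecar(pod_json):
    """Return True if the pod spec contains a container named istio-proxy.

    pod_json is the text produced by `kubectl get pod <name> -o json`.
    """
    spec = json.loads(pod_json).get("spec", {})
    names = [c["name"] for c in spec.get("containers", [])]
    # Assumption: native sidecar mode lists the proxy under initContainers.
    names += [c["name"] for c in spec.get("initContainers", [])]
    return "istio-proxy" in names


# Hypothetical pod spec trimmed to the fields the check needs.
sample_pod = json.dumps(
    {"spec": {"containers": [{"name": "nginx"}, {"name": "istio-proxy"}]}}
)
print(has_istio_sidecar(sample_pod))
```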
Cross-cluster service is not resolving
Check remote secrets and remote cluster status:
istioctl remote-clusters
Also check istiod logs for service discovery issues.
Cross-cluster traffic is discovered but not reachable
This usually points to underlying network connectivity.
Check:
VPC routes
Transit Gateway attachments or VPC peering routes
Security groups
Network ACLs
Pod/service CIDR overlap
Firewall or inspection rules
Istio may know that a remote service exists, but AWS networking still has to allow traffic to reach it.
STRICT mTLS breaks traffic
This usually means the source workload is not in the mesh, the destination policy is too strict for the current rollout state, or sidecar injection was missed.
Istiod is not using the expected CA
Check whether the cacerts secret exists before Istio control plane installation:
kubectl get secret cacerts -n istio-system
Also inspect istiod logs:
kubectl logs deploy/istiod -n istio-system
If the CA secret is missing or malformed, workload certificates may not be issued from the intended CA chain.
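A "malformed" secret is hard to spot from kubectl get alone, since that only confirms the secret exists. The sketch below inspects the JSON form of the secret and reports entries that are missing, not valid base64, or not PEM-like. It assumes the four-file layout described earlier and does not validate the certificates themselves. The function name is hypothetical.

```python
import base64
import json

EXPECTED_KEYS = ("ca-cert.pem", "ca-key.pem", "root-cert.pem", "cert-chain.pem")


def check_cacerts(secret_json):
    """Return a list of problems found in a cacerts secret.

    secret_json is the text from
    `kubectl get secret cacerts -n istio-system -o json`.
    An empty list means the basic structure looks right.
    """
    data = json.loads(secret_json).get("data") or {}
    problems = []
    for key in EXPECTED_KEYS:
        value = data.get(key)
        if not value:
            problems.append(f"{key}: missing")
            continue
        try:
            text = base64.b64decode(value, validate=True).decode()
        except ValueError:  # covers invalid base64 and undecodable bytes
            problems.append(f"{key}: not valid base64 text")
            continue
        if "-----BEGIN" not in text:
            problems.append(f"{key}: no PEM block found")
    return problems
```

Running this in the pipeline before the Istio control plane install gives an early failure instead of a mesh that quietly signs from the wrong CA.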
Key Lessons Learned
A few lessons stand out from this design.
First, networking comes before mesh. Transit Gateway, VPC peering, or equivalent private connectivity must exist before cross-cluster Istio traffic can work.
Second, certificate material must be ready before Istio control plane installation. Istiod needs the expected CA secret when it starts.
Third, multi-primary mesh is a good fit when each cluster should retain local control-plane independence.
Fourth, sidecar injection strategy matters. Namespace-level injection is convenient, but explicit workload-level injection can be useful during controlled onboarding.
Fifth, strict mTLS should be rolled out carefully. It is powerful, but it will immediately expose workloads that are not properly injected into the mesh.
Sixth, Kiali is valuable for visual validation, but it should not be the only check. Combine it with curl tests, istioctl proxy-status, certificate inspection, and strict mTLS validation.
Finally, using cert-manager for CA lifecycle while keeping Istiod as the workload certificate signer gives a practical balance. The platform gets better CA lifecycle management without replacing Istio's native workload certificate issuance flow.
Conclusion
A multi-cluster Istio service mesh is not just about connecting Kubernetes clusters.
It is about creating a consistent platform layer for service identity, encrypted communication, discovery, traffic management, and observability.
With a multi-primary architecture, each EKS cluster can run its own Istio control plane while still participating in a shared mesh. With VPC connectivity underneath, the clusters can actually reach each other. With cert-manager managing the CA material and Istiod using that CA to sign workload certificates, the mesh gets a more controlled certificate model while preserving Istio's native certificate issuance flow.
The final architecture gives platform teams a strong foundation:
VPC-to-VPC network reachability
+ multi-cluster service discovery
+ cert-manager managed CA material
+ Istiod signed workload certificates
+ mTLS encryption
+ sidecar-based traffic control
+ Kiali-based visibility
+ operational validation with istioctl
That combination turns service-to-service communication into a platform capability instead of something every application team has to solve independently.