December 21, 2025
45 min read

Senior DevOps Engineer Production Systems Interview Questions

interview
career-advice
job-search
Milad Bonakdar

Author

Prepare for senior DevOps interviews with hands-on questions covering Kubernetes, Terraform state, GitOps, security, observability, incident response, and production-system trade-offs.


Introduction

Senior DevOps engineers architect scalable infrastructure, implement advanced automation, ensure security and compliance, and drive DevOps culture across the organization. The role demands deep expertise in container orchestration, infrastructure as code, cloud architecture, and site reliability engineering.

This guide covers essential interview questions for senior DevOps engineers, focusing on advanced concepts, production systems, and strategic thinking. Each question comes with a detailed explanation and practical examples.


Advanced Kubernetes

1. Explain the Kubernetes architecture and the role of its key components.

Answer: Kubernetes follows a control plane / worker node architecture:

Control plane components:

  • API Server: the front end of the Kubernetes control plane; handles all REST requests
  • etcd: distributed key-value store holding the cluster state
  • Scheduler: assigns Pods to nodes based on resource requirements
  • Controller Manager: runs the controller processes (replication, endpoints, etc.)
  • Cloud Controller Manager: integrates with cloud provider APIs

Node components:

  • kubelet: agent that makes sure containers are running in Pods
  • kube-proxy: maintains the network rules for Pod communication
  • Container Runtime: runs the containers (Docker, containerd, CRI-O)

How it works:

  1. A user submits a Deployment via kubectl
  2. The API Server validates it and stores it in etcd
  3. The Scheduler assigns the Pods to nodes
  4. The kubelet on each node creates the containers
  5. kube-proxy configures the networking
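All of these steps run through the same control-loop pattern. Here is a minimal Python sketch (illustrative only; not actual Kubernetes code) of how a controller reconciles desired state stored in etcd against observed state:

```python
# Illustrative reconciliation loop: diff desired state (from etcd) against
# observed state and emit the actions a controller would take.
def reconcile(desired: dict, observed: dict) -> list:
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))   # missing -> create
        elif observed[name] != spec:
            actions.append(("update", name, spec))   # drifted -> update
    for name in observed:
        if name not in desired:
            actions.append(("delete", name, None))   # orphaned -> delete
    return actions

desired = {"web-1": {"image": "nginx:1.25"}, "web-2": {"image": "nginx:1.25"}}
observed = {"web-1": {"image": "nginx:1.24"}}
print(reconcile(desired, observed))
```

Real controllers run this loop continuously against the API Server's watch stream; the key point for an interview is that Kubernetes converges on declared state rather than executing imperative scripts.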

Frequency: Very common Difficulty: Hard


2. How do you troubleshoot a Pod stuck in CrashLoopBackOff?

Answer: Take a systematic debugging approach:

# 1. Check Pod status and events
kubectl describe pod <pod-name>
# Look for: image pull errors, resource limits, failing health checks

# 2. Check the logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous  # logs from the previous container

# 3. Check resource constraints
kubectl top pod <pod-name>
kubectl describe node <node-name>

# 4. Check liveness/readiness probes
kubectl get pod <pod-name> -o yaml | grep -A 10 livenessProbe

# 5. Exec into the container (if it stays up long enough)
kubectl exec -it <pod-name> -- /bin/sh

# 6. Check the image
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'
docker pull <image>  # test locally

# 7. Check ConfigMaps/Secrets
kubectl get configmap
kubectl get secret

# 8. Review the Deployment/Pod spec
kubectl get deployment <deployment-name> -o yaml

Common causes:

  • Application crashes on startup
  • Missing environment variables
  • Misconfigured liveness probe
  • Insufficient resources (OOMKilled)
  • Image pull errors
  • Missing dependencies

Example fixes:

# Increase resource limits
resources:
  limits:
    memory: "512Mi"
    cpu: "500m"
  requests:
    memory: "256Mi"
    cpu: "250m"

# Tune the probe timing
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30  # give the application time to start
  periodSeconds: 10
  failureThreshold: 3
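As a sanity check on probe tuning, the worst-case time before the kubelet restarts a hung container follows directly from these parameters (a rough lower bound; timeoutSeconds would add to it):

```python
# Rough worst-case time before a liveness-probe restart, using the
# example values above (ignores timeoutSeconds for simplicity).
initial_delay = 30     # initialDelaySeconds
period = 10            # periodSeconds
failure_threshold = 3  # failureThreshold

time_to_restart = initial_delay + failure_threshold * period
print(f"{time_to_restart}s")  # 60s
```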

Frequency: Very common Difficulty: Medium


3. Explain Kubernetes networking: Services, Ingress, and Network Policies.

Answer: The Kubernetes networking layers:

Services: Service exposure types:

# ClusterIP (internal only)
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
    - port: 80
      targetPort: 8080

# NodePort (external access via the node IP)
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080

# LoadBalancer (cloud load balancer)
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080

Ingress: HTTP/HTTPS routing:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-v1
            port:
              number: 80
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: api-v2
            port:
              number: 80
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls

Network Policies: control Pod-to-Pod traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432

Frequency: Very common Difficulty: Hard


4. How do you implement autoscaling in Kubernetes?

Answer: Several autoscaling strategies:

Horizontal Pod Autoscaler (HPA):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
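Under the hood the HPA uses the documented formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A quick sketch using the values above:

```python
import math

# HPA scaling formula, clamped to the minReplicas/maxReplicas above.
def desired_replicas(current, metric, target, min_r=2, max_r=10):
    desired = math.ceil(current * metric / target)
    return max(min_r, min(max_r, desired))

print(desired_replicas(4, 90, 70))  # 6: scale up under load
print(desired_replicas(4, 20, 70))  # 2: clamped to minReplicas
```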

Vertical Pod Autoscaler (VPA):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  updatePolicy:
    updateMode: "Auto"  # or "Recreate", "Initial", "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi

Cluster Autoscaler: resizes the cluster automatically based on pending Pods:

# AWS example (simplified illustration; the real autoscaler is configured via command-line flags)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-config
data:
  min-nodes: "2"
  max-nodes: "10"
  scale-down-delay-after-add: "10m"
  scale-down-unneeded-time: "10m"

Frequency: Common Difficulty: Medium


Advanced Terraform

5. Explain Terraform state management and best practices.

Answer: Terraform state tracks your infrastructure and is critical to every operation.

Remote state configuration:

# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

State locking:

# DynamoDB table for state locking
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Best practices:

1. Never commit state files to Git

# .gitignore
*.tfstate
*.tfstate.*
.terraform/

2. Use workspaces for environment isolation

terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

terraform workspace select dev
terraform apply

3. Import existing resources

# Import an existing EC2 instance
terraform import aws_instance.web i-1234567890abcdef0

# Verify
terraform plan

4. State operations (use with caution)

# List resources in the state
terraform state list

# Show a specific resource
terraform state show aws_instance.web

# Move a resource within the state
terraform state mv aws_instance.old aws_instance.new

# Remove a resource from the state (without destroying it)
terraform state rm aws_instance.web

5. Back up the state before major changes

terraform state pull > backup.tfstate

Frequency: Very common Difficulty: Hard


6. How do you structure Terraform code for a large project?

Answer: Use a modular structure for maintainability:

Directory structure:

terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── prod/
├── modules/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── README.md
│   ├── eks/
│   ├── rds/
│   └── s3/
└── global/
    ├── iam/
    └── route53/

Module example:

# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(
    var.tags,
    {
      Name = "${var.environment}-vpc"
    }
  )
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = merge(
    var.tags,
    {
      Name = "${var.environment}-private-${count.index + 1}"
      Type = "private"
    }
  )
}

# modules/vpc/variables.tf
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
}

variable "private_subnet_cidrs" {
  description = "CIDR blocks for the private subnets"
  type        = list(string)
}

variable "availability_zones" {
  description = "Availability zones"
  type        = list(string)
}

variable "tags" {
  description = "Common tags"
  type        = map(string)
  default     = {}
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

Using the modules:

# environments/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  vpc_cidr             = "10.0.0.0/16"
  environment          = "prod"
  private_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  availability_zones   = ["us-east-1a", "us-east-1b", "us-east-1c"]

  tags = {
    Project   = "MyApp"
    ManagedBy = "Terraform"
  }
}

module "eks" {
  source = "../../modules/eks"

  cluster_name    = "prod-cluster"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnet_ids
  node_group_size = 3
}
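The subnet CIDRs passed to the VPC module above are derived from the VPC block; in Terraform that derivation is usually done with cidrsubnet("10.0.0.0/16", 8, i). The arithmetic can be checked with Python's ipaddress module:

```python
import ipaddress

# Split the 10.0.0.0/16 VPC block from the example into /24 subnets,
# mirroring cidrsubnet("10.0.0.0/16", 8, i) for i = 1..3.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = [str(s) for s in vpc.subnets(new_prefix=24)]
print(subnets[1:4])  # ['10.0.1.0/24', '10.0.2.0/24', '10.0.3.0/24']
```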

Frequency: Common Difficulty: Hard


Cloud Architecture

7. Design a highly available multi-region architecture on AWS.

Answer: A multi-region architecture for high availability:


Key components:

1. DNS and traffic management:

# Route 53 with health checks
resource "aws_route53_health_check" "primary" {
  fqdn              = "api.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "api" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  failover_routing_policy {
    type = "PRIMARY"
  }

  set_identifier = "primary"
  health_check_id = aws_route53_health_check.primary.id

  alias {
    name                   = aws_lb.primary.dns_name
    zone_id                = aws_lb.primary.zone_id
    evaluate_target_health = true
  }
}

2. Database replication:

# RDS with a cross-region read replica
resource "aws_db_instance" "primary" {
  identifier           = "prod-db-primary"
  engine               = "postgres"
  instance_class       = "db.r5.xlarge"
  multi_az             = true
  backup_retention_period = 7

  provider = aws.us-east-1
}

resource "aws_db_instance" "replica" {
  identifier             = "prod-db-replica"
  replicate_source_db    = aws_db_instance.primary.arn
  instance_class         = "db.r5.xlarge"
  auto_minor_version_upgrade = false

  provider = aws.us-west-2
}

3. Data replication:

# S3 cross-region replication
resource "aws_s3_bucket_replication_configuration" "replication" {
  bucket = aws_s3_bucket.source.id
  role   = aws_iam_role.replication.arn

  rule {
    id     = "replicate-all"
    status = "Enabled"

    destination {
      bucket        = aws_s3_bucket.destination.arn
      storage_class = "STANDARD"
    }
  }
}

Design principles:

  • Active-active or active-passive topology
  • Automatic failover via health checks
  • Data replication with minimal lag
  • Consistent deployments across regions
  • Monitoring and alerting in both regions
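One consequence of the health-check settings shown earlier (request_interval = 30, failure_threshold = 3) is a minimum detection time before failover can even begin, which is worth stating explicitly in an interview. This is an approximation; Route 53 runs a fleet of overlapping checkers, so real behavior varies:

```python
# Approximate minimum time for Route 53 to mark the primary unhealthy,
# given the health-check settings above.
request_interval = 30  # seconds between checks
failure_threshold = 3  # consecutive failures required

detection_time = failure_threshold * request_interval
print(f"~{detection_time}s before failover starts")
```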

Frequency: Common Difficulty: Hard


GitOps & CI/CD

8. Explain GitOps and how to implement it with ArgoCD.

Answer: GitOps uses Git as the single source of truth for declarative infrastructure and applications.

Principles:

  1. Declarative configuration in Git
  2. Automated synchronization
  3. Version control for every change
  4. Continuous reconciliation

ArgoCD implementation:

# Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/app-manifests
    targetRevision: main
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
    - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Directory structure:

app-manifests/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   ├── kustomization.yaml
    │   └── patches/
    ├── staging/
    └── production/
        ├── kustomization.yaml
        ├── replicas.yaml
        └── resources.yaml

Kustomization:

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

bases:
- ../../base

replicas:
- name: myapp
  count: 5

resources:
- ingress.yaml

patches:
- path: resources.yaml
  target:
    kind: Deployment
    name: myapp

Benefits:

  • Git as an audit trail
  • Easy rollbacks (git revert)
  • Declarative desired state
  • Automatic drift detection
  • Multi-cluster management

Frequency: Common Difficulty: Medium


Security & Compliance

9. How do you implement security best practices in Kubernetes?

Answer: Apply security in multiple layers:

1. Pod Security Standards:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

2. RBAC (Role-Based Access Control):

# Role for developers
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: developer
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "deployments", "services"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]

# RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: production
subjects:
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io

3. Network Policies:

# Default-deny all ingress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress

4. Secrets management:

# External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
  - secretKey: database-password
    remoteRef:
      key: prod/database
      property: password

5. Security Context:

apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}

6. Image scanning and admission control:

# Admission controller with OPA
apiVersion: v1
kind: ConfigMap
metadata:
  name: opa-policy
data:
  policy.rego: |
    package kubernetes.admission
    
    deny[msg] {
      input.request.kind.kind == "Pod"
      image := input.request.object.spec.containers[_].image
      not startswith(image, "registry.company.com/")
      msg := sprintf("Image %v is not from approved registry", [image])
    }

Frequency: Very common Difficulty: Hard


Observability & SRE

10. Design a comprehensive observability stack.

Answer: Build on the three pillars of observability: metrics, logs, and traces.

Architecture:


1. Metrics (Prometheus + Grafana):

# ServiceMonitor for application metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

2. Logs (Loki):

# Promtail configuration for log collection
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
data:
  promtail.yaml: |
    server:
      http_listen_port: 9080
    
    clients:
      - url: http://loki:3100/loki/api/v1/push
    
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace

3. Tracing (Jaeger):

# Application instrumentation
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Set up tracing
trace.set_tracer_provider(TracerProvider())
jaeger_exporter = JaegerExporter(
    agent_host_name="jaeger-agent",
    agent_port=6831,
)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(jaeger_exporter)
)

tracer = trace.get_tracer(__name__)

# Use it in your code
with tracer.start_as_current_span("process_request"):
    # your code
    pass

4. Alerting rules:

# PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
spec:
  groups:
  - name: app
    interval: 30s
    rules:
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{status=~"5.."}[5m]))
        /
        sum(rate(http_requests_total[5m]))
        > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High error rate detected"
        description: "Error rate is {{ $value | humanizePercentage }}"
    
    - alert: HighLatency
      expr: |
        histogram_quantile(0.95,
          rate(http_request_duration_seconds_bucket[5m])
        ) > 1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High latency detected"

5. SLO monitoring:

# SLO definition
apiVersion: sloth.slok.dev/v1
kind: PrometheusServiceLevel
metadata:
  name: api-availability
spec:
  service: "api"
  labels:
    team: "platform"
  slos:
    - name: "requests-availability"
      objective: 99.9
      description: "API requests should succeed"
      sli:
        events:
          errorQuery: sum(rate(http_requests_total{status=~"5.."}[{{.window}}]))
          totalQuery: sum(rate(http_requests_total[{{.window}}]))
      alerting:
        pageAlert:
          labels:
            severity: critical
        ticketAlert:
          labels:
            severity: warning
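The 99.9% objective above implies a concrete error budget, which teams usually quote in minutes. Assuming a 30-day window:

```python
# Error budget implied by a 99.9% availability SLO over a 30-day window.
objective = 99.9
window_minutes = 30 * 24 * 60  # 43,200 minutes

budget_minutes = window_minutes * (100 - objective) / 100
print(f"~{budget_minutes:.1f} minutes of downtime allowed per 30 days")  # ~43.2
```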

Frequency: Common Difficulty: Hard


Disaster Recovery

11. How do you implement disaster recovery for a Kubernetes cluster?

Answer: A comprehensive DR strategy:

1. Backup strategy:

# Velero backup schedule
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"  # daily at 2:00 AM
  template:
    includedNamespaces:
    - production
    - staging
    excludedResources:
    - events
    - events.events.k8s.io
    storageLocation: aws-s3
    volumeSnapshotLocations:
    - aws-ebs
    ttl: 720h  # 30 days

2. etcd backups:

#!/bin/bash
# Automated etcd backup script

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db

# Upload to S3
aws s3 cp /backup/etcd-snapshot-*.db s3://etcd-backups/

# Clean up old backups
find /backup -name "etcd-snapshot-*.db" -mtime +7 -delete

3. Restore procedure:

# Restore etcd from a snapshot
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir=/var/lib/etcd-restore \
  --initial-cluster=etcd-0=https://10.0.1.10:2380 \
  --initial-advertise-peer-urls=https://10.0.1.10:2380

# Restore applications with Velero
velero restore create --from-backup daily-backup-20231125
velero restore describe <restore-name>

4. Multi-region failover:

# Terraform for a multi-region setup
module "primary_cluster" {
  source = "./modules/eks"
  region = "us-east-1"
  # ... configuration
}

module "dr_cluster" {
  source = "./modules/eks"
  region = "us-west-2"
  # ... configuration
}

# Route 53 health check and failover
resource "aws_route53_health_check" "primary" {
  fqdn              = module.primary_cluster.endpoint
  port              = 443
  type              = "HTTPS"
  resource_path     = "/healthz"
  failure_threshold = 3
}

5. RTO/RPO objectives:

  • RTO (Recovery Time Objective): < 1 hour
  • RPO (Recovery Point Objective): < 15 minutes
  • Regular DR drills (monthly)
  • Documented runbooks
  • Automated failover wherever possible

Frequency: Common Difficulty: Hard


Service Mesh

12. Explain service mesh architecture and when to use it.

Answer: A service mesh provides a dedicated infrastructure layer for service-to-service communication. It pays off once many services need uniform mTLS, traffic shaping, and observability; for small deployments it adds unnecessary complexity.

Core components:


Istio implementation:

# Virtual Service for traffic routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 80
    - destination:
        host: reviews
        subset: v2
      weight: 20

# Destination Rule
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      connectionPool:
        tcp:
          maxConnections: 100
        http:
          http1MaxPendingRequests: 50
          http2MaxRequests: 100

Circuit breaking:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutiveErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
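The outlierDetection block above ejects a backend after 5 consecutive errors. A toy sketch of that decision (illustrative; Envoy's real implementation is considerably more involved):

```python
# Toy outlier detection: eject a host after `threshold` consecutive 5xx
# responses, mirroring consecutiveErrors: 5 in the DestinationRule above.
def should_eject(statuses, threshold=5):
    streak = 0
    for status in statuses:
        streak = streak + 1 if status >= 500 else 0
        if streak >= threshold:
            return True
    return False

print(should_eject([200, 500, 502, 503, 500, 500]))  # True: five 5xx in a row
print(should_eject([500, 500, 200, 500, 500, 500]))  # False: the streak reset
```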

Mutual TLS:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT

# Authorization Policy
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: frontend-policy
spec:
  selector:
    matchLabels:
      app: frontend
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/