Senior DevOps Engineer Production Systems Interview Questions

Milad Bonakdar
Author
Prepare for senior DevOps interviews with hands-on questions covering Kubernetes, Terraform state, GitOps, security, observability, incident response, and production-system trade-offs.
Introduction
A senior DevOps engineer is expected to architect scalable infrastructure, implement advanced automation, ensure security and compliance, and drive DevOps culture across the organization. The role demands deep expertise in container orchestration, infrastructure as code, cloud architecture, and site reliability engineering.
This guide covers essential interview questions for senior DevOps engineers, focusing on advanced concepts, production systems, and strategic thinking. Each question includes a detailed explanation and practical examples.
Advanced Kubernetes
1. Explain the Kubernetes architecture and the roles of its key components.
Answer: Kubernetes follows a control-plane / node architecture:
Control plane components:
- API Server: the front end of the Kubernetes control plane; handles all REST requests
- etcd: distributed key-value store holding the cluster state
- Scheduler: assigns Pods to nodes based on resource requirements
- Controller Manager: runs the controller processes (replication, endpoints, etc.)
- Cloud Controller Manager: integrates with cloud provider APIs
Node components:
- kubelet: agent that ensures containers are running in Pods
- kube-proxy: maintains network rules for Pod communication
- Container Runtime: runs the containers (Docker, containerd, CRI-O)
How it works:
- A user submits a Deployment via kubectl
- The API Server validates it and persists it in etcd
- The Scheduler assigns Pods to nodes
- The kubelet on each node creates the containers
- kube-proxy configures the networking
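On a kubeadm-style cluster (an assumption; managed offerings hide some of these components), most of the pieces above can be inspected directly:
# Control plane components typically run as static pods in kube-system
kubectl get pods -n kube-system -o wide
# Nodes, kubelet versions, and container runtimes
kubectl get nodes -o wide
# API server endpoint and core add-ons
kubectl cluster-info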
Frequency: Very common   Difficulty: Hard
2. How do you troubleshoot a Pod stuck in CrashLoopBackOff?
Answer: Use a systematic debugging approach:
# 1. Check Pod status and events
kubectl describe pod <pod-name>
# Look for: image pull errors, resource limits, failing health checks
# 2. Check the logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous  # logs from the previous container
# 3. Check resource constraints
kubectl top pod <pod-name>
kubectl describe node <node-name>
# 4. Check liveness/readiness probes
kubectl get pod <pod-name> -o yaml | grep -A 10 livenessProbe
# 5. Exec into the container (if it stays up briefly)
kubectl exec -it <pod-name> -- /bin/sh
# 6. Check the image
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'
docker pull <image>  # test the pull locally
# 7. Check ConfigMaps/Secrets
kubectl get configmap
kubectl get secret
# 8. Review the Deployment/Pod spec
kubectl get deployment <deployment-name> -o yaml
Common causes:
- The application crashes on startup
- Missing environment variables
- Misconfigured liveness probes
- Insufficient resources (OOMKilled)
- Image pull errors
- Missing dependencies
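For the OOMKilled case in particular, the last termination state is worth a direct look; a small sketch (assumes a single-container Pod, hence index 0):
# Reason and exit code of the previous container termination
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# Recent namespace events, oldest first
kubectl get events --sort-by=.lastTimestamp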
Example fixes:
# Increase resource limits
resources:
limits:
memory: "512Mi"
cpu: "500m"
requests:
memory: "256Mi"
cpu: "250m"
# Adjust probe timing
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30  # give the application time to start
periodSeconds: 10
failureThreshold: 3
Frequency: Very common   Difficulty: Medium
3. Explain Kubernetes networking: Services, Ingress, and Network Policies.
Answer: The Kubernetes networking layers:
Services: types of Service exposure:
# ClusterIP (internal only)
apiVersion: v1
kind: Service
metadata:
name: backend
spec:
type: ClusterIP
selector:
app: backend
ports:
- port: 80
targetPort: 8080
# NodePort (external access via a node IP)
spec:
type: NodePort
ports:
- port: 80
targetPort: 8080
nodePort: 30080
# LoadBalancer (cloud load balancer)
spec:
type: LoadBalancer
ports:
- port: 80
targetPort: 8080
Ingress: HTTP/HTTPS routing:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
rules:
- host: api.example.com
http:
paths:
- path: /v1
pathType: Prefix
backend:
service:
name: api-v1
port:
number: 80
- path: /v2
pathType: Prefix
backend:
service:
name: api-v2
port:
number: 80
tls:
- hosts:
- api.example.com
secretName: api-tls
Network Policies: control Pod-to-Pod communication:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: backend-policy
spec:
podSelector:
matchLabels:
app: backend
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: database
ports:
- protocol: TCP
port: 5432
Frequency: Very common   Difficulty: Hard
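A quick way to exercise Services and Network Policies like the ones above from inside the cluster is a throwaway client Pod (a sketch; the service and policy names assume the manifests above and the default namespace):
# DNS resolution and HTTP reachability of the ClusterIP service
kubectl run tmp --rm -it --image=busybox --restart=Never -- wget -qO- http://backend.default.svc.cluster.local
# Confirm which Pods the policy selects and what it allows
kubectl describe networkpolicy backend-policy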
4. How do you implement autoscaling in Kubernetes?
Answer: There are several autoscaling strategies:
Horizontal Pod Autoscaler (HPA):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
Vertical Pod Autoscaler (VPA):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: app-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: app
updatePolicy:
updateMode: "Auto"  # or "Recreate", "Initial", "Off"
resourcePolicy:
containerPolicies:
- containerName: app
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2
memory: 2Gi
Cluster Autoscaler: automatically resizes the cluster based on pending Pods:
# AWS example: the Cluster Autoscaler is configured through flags on its own
# Deployment rather than a standalone ConfigMap (illustrative container args;
# "my-node-group" is a placeholder)
command:
- ./cluster-autoscaler
- --cloud-provider=aws
- --nodes=2:10:my-node-group
- --scale-down-delay-after-add=10m
- --scale-down-unneeded-time=10m
Frequency: Common   Difficulty: Medium
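A fast way to sanity-check HPA behaviour before writing full manifests like the ones above (a sketch; the Deployment name is assumed):
# Imperative equivalent of the CPU target above
kubectl autoscale deployment app --cpu-percent=70 --min=2 --max=10
# Watch current vs. target utilisation and replica counts
kubectl get hpa app --watch
# VPA recommendations, if the VPA components are installed
kubectl describe vpa app-vpa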
Advanced Terraform
5. Explain Terraform state management and best practices.
Answer: Terraform state tracks your infrastructure and is critical to every operation.
Remote state configuration:
# backend.tf
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
}
}
State locking:
# DynamoDB table for state locking
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
Best practices:
1. Never commit state files to Git
# .gitignore
*.tfstate
*.tfstate.*
.terraform/
2. Use workspaces for environment isolation
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
terraform workspace select dev
terraform apply
3. Import existing resources
# Import an existing EC2 instance
terraform import aws_instance.web i-1234567890abcdef0
# Verify
terraform plan
4. State operations (use with care)
# List resources in the state
terraform state list
# Show a specific resource
terraform state show aws_instance.web
# Move a resource within the state
terraform state mv aws_instance.old aws_instance.new
# Remove a resource from state (without destroying it)
terraform state rm aws_instance.web
5. Back up the state before major changes
terraform state pull > backup.tfstate
Frequency: Very common   Difficulty: Hard
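If a lock is ever left behind (for example after a killed CI job), Terraform prints the lock ID and it can be cleared manually; a sketch worth double-checking before running:
# Only after confirming no other apply is in progress
terraform force-unlock <LOCK_ID>
# Pair any risky state operation with a fresh backup
terraform state pull > pre-change.tfstate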
6. How do you structure Terraform code for a large project?
Answer: Use a modular structure for maintainability:
Directory structure:
terraform/
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── backend.tf
│ ├── staging/
│ └── prod/
├── modules/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── README.md
│ ├── eks/
│ ├── rds/
│ └── s3/
└── global/
├── iam/
└── route53/
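With this layout, each environment is initialised and planned from its own directory against its own backend; a sketch of the day-to-day workflow:
cd environments/prod
terraform init
terraform fmt -recursive && terraform validate
terraform plan -out=tfplan
terraform apply tfplan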
Module example:
# modules/vpc/main.tf
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = merge(
var.tags,
{
Name = "${var.environment}-vpc"
}
)
}
resource "aws_subnet" "private" {
count = length(var.private_subnet_cidrs)
vpc_id = aws_vpc.main.id
cidr_block = var.private_subnet_cidrs[count.index]
availability_zone = var.availability_zones[count.index]
tags = merge(
var.tags,
{
Name = "${var.environment}-private-${count.index + 1}"
Type = "private"
}
)
}
# modules/vpc/variables.tf
variable "vpc_cidr" {
description = "VPC 的 CIDR 块"
type = string
}
variable "environment" {
description = "环境名称"
type = string
}
variable "private_subnet_cidrs" {
description = "私有子网的 CIDR 块"
type = list(string)
}
variable "availability_zones" {
description = "可用区"
type = list(string)
}
variable "tags" {
description = "通用标签"
type = map(string)
default = {}
}
# modules/vpc/outputs.tf
output "vpc_id" {
value = aws_vpc.main.id
}
output "private_subnet_ids" {
value = aws_subnet.private[*].id
}
Using the modules:
# environments/prod/main.tf
module "vpc" {
source = "../../modules/vpc"
vpc_cidr = "10.0.0.0/16"
environment = "prod"
private_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
tags = {
Project = "MyApp"
ManagedBy = "Terraform"
}
}
module "eks" {
source = "../../modules/eks"
cluster_name = "prod-cluster"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
node_group_size = 3
}
Frequency: Common   Difficulty: Hard
Cloud Architecture
7. Design a highly available multi-region architecture on AWS.
Answer: A multi-region architecture for high availability:
Key components:
1. DNS and traffic management:
# Route 53 with health checks
resource "aws_route53_health_check" "primary" {
fqdn = "api.example.com"
port = 443
type = "HTTPS"
resource_path = "/health"
failure_threshold = 3
request_interval = 30
}
resource "aws_route53_record" "api" {
zone_id = aws_route53_zone.main.zone_id
name = "api.example.com"
type = "A"
failover_routing_policy {
type = "PRIMARY"
}
set_identifier = "primary"
health_check_id = aws_route53_health_check.primary.id
alias {
name = aws_lb.primary.dns_name
zone_id = aws_lb.primary.zone_id
evaluate_target_health = true
}
}
2. Database replication:
# RDS with a cross-region read replica
resource "aws_db_instance" "primary" {
identifier = "prod-db-primary"
engine = "postgres"
instance_class = "db.r5.xlarge"
multi_az = true
backup_retention_period = 7
provider = aws.us-east-1
}
resource "aws_db_instance" "replica" {
identifier = "prod-db-replica"
replicate_source_db = aws_db_instance.primary.arn
instance_class = "db.r5.xlarge"
auto_minor_version_upgrade = false
provider = aws.us-west-2
}
3. Data replication:
# S3 cross-region replication
resource "aws_s3_bucket_replication_configuration" "replication" {
bucket = aws_s3_bucket.source.id
role = aws_iam_role.replication.arn
rule {
id = "replicate-all"
status = "Enabled"
destination {
bucket = aws_s3_bucket.destination.arn
storage_class = "STANDARD"
}
}
}
Design principles:
- Active-active or active-passive setup
- Automated failover driven by health checks
- Data replication with minimal lag
- Consistent deployments across regions
- Monitoring and alerting in both regions
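Two quick checks that the failover path is actually wired up (a sketch; the health check ID and domain are placeholders):
# Current status of the primary health check
aws route53 get-health-check-status --health-check-id <health-check-id>
# See which endpoint DNS currently resolves to
dig +short api.example.com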
Frequency: Common   Difficulty: Hard
GitOps & CI/CD
8. Explain GitOps and how to implement it with ArgoCD.
Answer: GitOps uses Git as the single source of truth for declarative infrastructure and applications.
Principles:
- Declarative configuration in Git
- Automated synchronization
- Version control for every change
- Continuous reconciliation
ArgoCD implementation:
# Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: myapp
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/org/app-manifests
targetRevision: main
path: k8s/overlays/production
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- CreateNamespace=true
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
Directory structure:
app-manifests/
├── base/
│ ├── deployment.yaml
│ ├── service.yaml
│ └── kustomization.yaml
└── overlays/
├── dev/
│ ├── kustomization.yaml
│ └── patches/
├── staging/
└── production/
├── kustomization.yaml
├── replicas.yaml
└── resources.yaml
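Overlays can be rendered locally before ArgoCD ever sees them, which catches most kustomize mistakes early (a sketch, run from the manifest repo root):
# Render the production overlay exactly as it will be applied
kubectl kustomize overlays/production
# Or with the standalone binary
kustomize build overlays/production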
Kustomization:
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
replicas:
- name: myapp
count: 5
resources:
- ingress.yaml
patches:
- path: resources.yaml
target:
kind: Deployment
name: myapp
Benefits:
- Git as the audit trail
- Easy rollbacks (git revert)
- Declarative desired state
- Automatic drift detection
- Multi-cluster management
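Day-to-day interaction usually goes through the argocd CLI; a short sketch (the app name matches the Application manifest above):
# Compare live state against Git
argocd app diff myapp
# Trigger a sync manually and check health/sync status
argocd app sync myapp
argocd app get myapp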
Frequency: Common   Difficulty: Medium
安全 & 合规
9. How do you implement security best practices in Kubernetes?
Answer: Apply security in multiple layers:
1. Pod Security Standards:
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
2. RBAC (Role-Based Access Control):
# Role for developers
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: production
name: developer
rules:
- apiGroups: ["", "apps"]
resources: ["pods", "deployments", "services"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get"]
# RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: developer-binding
namespace: production
subjects:
- kind: Group
name: developers
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: Role
name: developer
apiGroup: rbac.authorization.k8s.io
3. Network Policies:
# Deny all ingress by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-ingress
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
4. Secrets management:
# External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: app-secrets
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-secrets-manager
kind: SecretStore
target:
name: app-secrets
creationPolicy: Owner
data:
- secretKey: database-password
remoteRef:
key: prod/database
property: password
5. Security Context:
apiVersion: v1
kind: Pod
metadata:
name: secure-pod
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: myapp:1.0
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
6. Image scanning and admission control:
# Admission controller with OPA
apiVersion: v1
kind: ConfigMap
metadata:
name: opa-policy
data:
policy.rego: |
package kubernetes.admission
deny[msg] {
input.request.kind.kind == "Pod"
image := input.request.object.spec.containers[_].image
not startswith(image, "registry.company.com/")
msg := sprintf("Image %v is not from approved registry", [image])
}
Frequency: Very common   Difficulty: Hard
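Two verification habits that pair well with the controls above (a sketch; the user and group names are placeholders):
# Check what a member of the developers group can actually do
kubectl auth can-i delete deployments --as=jane --as-group=developers -n production
# Server-side dry run of enforcing the restricted level, to surface violating Pods
kubectl label --dry-run=server --overwrite ns production pod-security.kubernetes.io/enforce=restricted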
Observability & SRE
10. Design a comprehensive observability stack.
Answer: The three pillars of observability: metrics, logs, and traces.
Architecture:
1. Metrics (Prometheus + Grafana):
# ServiceMonitor for application metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: app-metrics
spec:
selector:
matchLabels:
app: myapp
endpoints:
- port: metrics
interval: 30s
path: /metrics
2. Logs (Loki):
# Promtail configuration for log collection
apiVersion: v1
kind: ConfigMap
metadata:
name: promtail-config
data:
promtail.yaml: |
server:
http_listen_port: 9080
clients:
- url: http://loki:3100/loki/api/v1/push
scrape_configs:
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app]
target_label: app
- source_labels: [__meta_kubernetes_namespace]
target_label: namespace
3. Traces (Jaeger):
# Application instrumentation
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# Set up tracing
trace.set_tracer_provider(TracerProvider())
jaeger_exporter = JaegerExporter(
agent_host_name="jaeger-agent",
agent_port=6831,
)
trace.get_tracer_provider().add_span_processor(
BatchSpanProcessor(jaeger_exporter)
)
tracer = trace.get_tracer(__name__)
# Use it in your code
with tracer.start_as_current_span("process_request"):
    # your code here
    pass
4. Alerting rules:
# PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: app-alerts
spec:
groups:
- name: app
interval: 30s
rules:
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
> 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "检测到高错误率"
description: "错误率为 {{ $value | humanizePercentage }}"
- alert: HighLatency
expr: |
histogram_quantile(0.95,
rate(http_request_duration_seconds_bucket[5m])
) > 1
for: 10m
labels:
severity: warning
annotations:
summary: "检测到高延迟"5. SLO 监控:
# SLO 定义
apiVersion: sloth.slok.dev/v1
kind: PrometheusServiceLevel
metadata:
name: api-availability
spec:
service: "api"
labels:
team: "platform"
slos:
- name: "requests-availability"
objective: 99.9
description: "API 请求应该成功"
sli:
events:
errorQuery: sum(rate(http_requests_total{status=~"5.."}[{{.window}}]))
totalQuery: sum(rate(http_requests_total[{{.window}}]))
alerting:
pageAlert:
labels:
severity: critical
ticketAlert:
labels:
severity: warning
Frequency: Common   Difficulty: Hard
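Alerting expressions are easy to validate before they ever reach Prometheus (a sketch; the exported rules file and service port name are assumptions):
# Static validation of the rule groups exported from the PrometheusRule
promtool check rules app-alerts.rules.yaml
# Spot-check that the app is actually exposing metrics
kubectl port-forward svc/myapp 9090:metrics &
curl -s localhost:9090/metrics | head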
Disaster Recovery
11. How do you implement disaster recovery for a Kubernetes cluster?
Answer: A comprehensive DR strategy:
1. Backup strategy:
# Velero backup schedule
apiVersion: velero.io/v1
kind: Schedule
metadata:
name: daily-backup
namespace: velero
spec:
schedule: "0 2 * * *" # 每天凌晨 2 点
template:
includedNamespaces:
- production
- staging
excludedResources:
- events
- events.events.k8s.io
storageLocation: aws-s3
volumeSnapshotLocations:
- aws-ebs
ttl: 720h  # 30 days
2. etcd backups:
#!/bin/bash
# Automated etcd backup script
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db
# Upload to S3
aws s3 cp /backup/etcd-snapshot-*.db s3://etcd-backups/
# Clean up old backups
find /backup -name "etcd-snapshot-*.db" -mtime +7 -delete
3. Restore procedure:
# Restore etcd from a snapshot
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
--data-dir=/var/lib/etcd-restore \
--initial-cluster=etcd-0=https://10.0.1.10:2380 \
--initial-advertise-peer-urls=https://10.0.1.10:2380
# Restore applications with Velero
velero restore create --from-backup daily-backup-20231125
velero restore describe <restore-name>
4. Multi-region failover:
# Terraform for a multi-region setup
module "primary_cluster" {
source = "./modules/eks"
region = "us-east-1"
# ... configuration
}
module "dr_cluster" {
source = "./modules/eks"
region = "us-west-2"
# ... configuration
}
# Route 53 health check and failover
resource "aws_route53_health_check" "primary" {
fqdn = module.primary_cluster.endpoint
port = 443
type = "HTTPS"
resource_path = "/healthz"
failure_threshold = 3
}
5. RTO/RPO objectives:
- RTO (Recovery Time Objective): < 1 hour
- RPO (Recovery Point Objective): < 15 minutes
- Regular DR drills (monthly)
- Documented runbooks
- Automated failover wherever possible
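Backups are only as good as the last time someone checked them, so schedules and completed backups are worth verifying routinely (a sketch; the backup name is a placeholder):
# Confirm the schedule exists and recent backups completed
velero schedule get
velero backup get
velero backup describe daily-backup-<timestamp> --details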
Frequency: Common   Difficulty: Hard
Service Mesh
12. Explain service mesh architecture and when to use it.
Answer: A service mesh provides an infrastructure layer for service-to-service communication.
Core components:
Istio implementation:
# VirtualService for traffic routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
- match:
- headers:
end-user:
exact: jason
route:
- destination:
host: reviews
subset: v2
- route:
- destination:
host: reviews
subset: v1
weight: 80
- destination:
host: reviews
subset: v2
weight: 20
# Destination Rule
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: reviews
spec:
host: reviews
trafficPolicy:
loadBalancer:
simple: LEAST_REQUEST
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 50
http2MaxRequests: 100
Circuit breaking:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: backend
spec:
host: backend
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 10
maxRequestsPerConnection: 2
outlierDetection:
consecutiveErrors: 5
interval: 30s
baseEjectionTime: 30s
maxEjectionPercent: 50
Mutual TLS:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: production
spec:
mtls:
mode: STRICT
# Authorization Policy
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: frontend-policy
spec:
selector:
matchLabels:
app: frontend
action: ALLOW
rules:
- from:
- source:
principals: ["cluster.local/

