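What am I trying to do?
I decided to deploy a Kubernetes cluster on AWS EC2 using the technologies named in the title. Since this is a practice run, I will use as few resources as possible. For anyone unfamiliar with these technologies, here is a brief explanation!
- Terraform: an open-source tool for managing infrastructure as code.
- Kubespray: an open-source automation tool that helps install and configure production-ready Kubernetes clusters.
- AWS: not a tool. Short for Amazon Web Services, it is the cloud computing platform operated by Amazon and the largest in the world.
Deploying the Kubernetes cluster
The directory structure is as follows.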
kubernetes-iac/
├── setup.sh
├── deploy.sh
└── terraform/
    ├── main.tf
    ├── variables.tf
    ├── terraform.tfvars
    └── outputs.tf
The .sh files are just collections of commands for running Terraform. I can't show all of their contents, so I will show only part of the results.
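Since the scripts themselves are not shown, here is a minimal sketch of what a wrapper like deploy.sh could look like. Everything in it is an assumption for illustration: the `tf` and `deploy` helper names, the `TF_DIR` variable, the `tfplan` file name, and the `apply` argument are hypothetical and not taken from the real script.

```shell
#!/bin/sh
# Hypothetical sketch of deploy.sh; the real script is not shown in this post.
set -eu

# Assumption: the script is run from kubernetes-iac/, next to terraform/.
TF_DIR="${TF_DIR:-terraform}"

# Run a Terraform subcommand against the terraform/ directory.
tf() {
  terraform -chdir="$TF_DIR" "$@"
}

# init -> plan (saved to a file) -> apply exactly that saved plan.
deploy() {
  tf init -input=false
  tf plan -input=false -out=tfplan
  tf apply -input=false tfplan
}

# Nothing runs unless the script is called as: ./deploy.sh apply
if [ "${1:-}" = "apply" ]; then
  deploy
fi
```

Applying a saved plan file means the changes that get applied are the ones that were reviewed, which is one reason to prefer it over a bare apply in a script.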
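Initializing Terraform
# Initialization is the mandatory first step before any other Terraform command can run in a project.
# It prepares the directory containing the Terraform configuration files (.tf files) as the working directory: the backend is configured and the provider plugins are downloaded.
# That directory is the root module, the entry point of the infrastructure configuration.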
terraform init
The result is as follows. Terraform has been successfully initialized!
Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 4.16"...
- Finding latest version of hashicorp/local...
- Installing hashicorp/aws v4.67.0...
- Installed hashicorp/aws v4.67.0 (signed by HashiCorp)
- Installing hashicorp/local v2.5.3...
- Installed hashicorp/local v2.5.3 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Reviewing the Terraform plan
# This command previews the changes Terraform would make to the infrastructure.
terraform plan
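Before applying anything, I need to check that the configuration files will do what I expect. The command above compares the configuration with the current state and prints every resource it would create, change, or destroy, without touching any real infrastructure.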
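For use inside a script, `terraform plan` accepts `-detailed-exitcode`, which reports the outcome through the exit status: 0 for no changes, 1 for an error, 2 for pending changes. The helper below is a small sketch of how that status could be interpreted; the function name `describe_plan_status` is hypothetical.

```shell
# Translate the exit status of `terraform plan -detailed-exitcode` into a message.
describe_plan_status() {
  case "$1" in
    0) echo "no changes" ;;
    1) echo "error" ;;
    2) echo "changes pending" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# Usage (needs terraform and valid AWS credentials):
#   terraform validate    # syntax and consistency check only, no remote calls
#   terraform plan -detailed-exitcode -out=tfplan; describe_plan_status "$?"
```

`terraform validate` is the pure configuration check; `terraform plan` goes further and works out the actual changes against the provider.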
The result is as follows. I removed as much sensitive information as I could, so please bear with the blanks!
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# aws_instance.master will be created
+ resource "aws_instance" "master" {
+ ami = ""
+ arn = (known after apply)
+ associate_public_ip_address = (known after apply)
+ availability_zone = (known after apply)
+ cpu_core_count = (known after apply)
+ cpu_threads_per_core = (known after apply)
+ disable_api_stop = (known after apply)
+ disable_api_termination = (known after apply)
+ ebs_optimized = (known after apply)
+ get_password_data = false
+ host_id = (known after apply)
+ host_resource_group_arn = (known after apply)
+ iam_instance_profile = (known after apply)
+ id = (known after apply)
+ instance_initiated_shutdown_behavior = (known after apply)
+ instance_state = (known after apply)
+ instance_type = ""
+ ipv6_address_count = (known after apply)
+ ipv6_addresses = (known after apply)
+ key_name = "kube-cluster-key"
+ monitoring = (known after apply)
+ outpost_arn = (known after apply)
+ password_data = (known after apply)
+ placement_group = (known after apply)
+ placement_partition_number = (known after apply)
+ primary_network_interface_id = (known after apply)
+ private_dns = (known after apply)
+ private_ip = (known after apply)
+ public_dns = (known after apply)
+ public_ip = (known after apply)
+ secondary_private_ips = (known after apply)
+ security_groups = (known after apply)
+ source_dest_check = true
+ subnet_id = (known after apply)
+ tags = {
+ "Name" = "kube-cluster-master"
+ "Role" = "master,etcd"
}
+ tags_all = {
+ "Name" = "kube-cluster-master"
+ "Role" = "master,etcd"
}
+ tenancy = (known after apply)
+ user_data = (known after apply)
+ user_data_base64 = (known after apply)
+ user_data_replace_on_change = false
+ vpc_security_group_ids = (known after apply)
+ capacity_reservation_specification (known after apply)
+ cpu_options (known after apply)
+ ebs_block_device (known after apply)
+ enclave_options (known after apply)
+ ephemeral_block_device (known after apply)
+ maintenance_options (known after apply)
+ metadata_options (known after apply)
+ network_interface (known after apply)
+ private_dns_name_options (known after apply)
+ root_block_device {
+ delete_on_termination = true
+ device_name = (known after apply)
+ encrypted = (known after apply)
+ iops = (known after apply)
+ kms_key_id = (known after apply)
+ throughput = (known after apply)
+ volume_id = (known after apply)
+ volume_size =
+ volume_type = ""
}
}
# aws_instance.worker[0] will be created
+ resource "aws_instance" "worker" {
+ ami = ""
+ arn = (known after apply)
+ associate_public_ip_address = (known after apply)
+ availability_zone = (known after apply)
+ cpu_core_count = (known after apply)
+ cpu_threads_per_core = (known after apply)
+ disable_api_stop = (known after apply)
+ disable_api_termination = (known after apply)
+ ebs_optimized = (known after apply)
+ get_password_data = false
+ host_id = (known after apply)
+ host_resource_group_arn = (known after apply)
+ iam_instance_profile = (known after apply)
+ id = (known after apply)
+ instance_initiated_shutdown_behavior = (known after apply)
+ instance_state = (known after apply)
+ instance_type = ""
+ ipv6_address_count = (known after apply)
+ ipv6_addresses = (known after apply)
+ key_name = ""
+ monitoring = (known after apply)
+ outpost_arn = (known after apply)
+ password_data = (known after apply)
+ placement_group = (known after apply)
+ placement_partition_number = (known after apply)
+ primary_network_interface_id = (known after apply)
+ private_dns = (known after apply)
+ private_ip = (known after apply)
+ public_dns = (known after apply)
+ public_ip = (known after apply)
+ secondary_private_ips = (known after apply)
+ security_groups = (known after apply)
+ source_dest_check = true
+ subnet_id = (known after apply)
+ tags = {
+ "Name" = "kube-cluster-worker-0"
+ "Role" = "worker"
}
+ tags_all = {
+ "Name" = "kube-cluster-worker-0"
+ "Role" = "worker"
}
+ tenancy = (known after apply)
+ user_data = (known after apply)
+ user_data_base64 = (known after apply)
+ user_data_replace_on_change = false
+ vpc_security_group_ids = (known after apply)
+ capacity_reservation_specification (known after apply)
+ cpu_options (known after apply)
+ ebs_block_device (known after apply)
+ enclave_options (known after apply)
+ ephemeral_block_device (known after apply)
+ maintenance_options (known after apply)
+ metadata_options (known after apply)
+ network_interface (known after apply)
+ private_dns_name_options (known after apply)
+ root_block_device {
+ delete_on_termination = true
+ device_name = (known after apply)
+ encrypted = (known after apply)
+ iops = (known after apply)
+ kms_key_id = (known after apply)
+ throughput = (known after apply)
+ volume_id = (known after apply)
+ volume_size =
+ volume_type = ""
}
}
# aws_internet_gateway.kubernetes_igw will be created
+ resource "aws_internet_gateway" "kubernetes_igw" {
+ arn = (known after apply)
+ id = (known after apply)
+ owner_id = (known after apply)
+ tags = {
+ "Name" = "kube-cluster-igw"
}
+ tags_all = {
+ "Name" = "kube-cluster-igw"
}
+ vpc_id = (known after apply)
}
# aws_key_pair.kubernetes_key will be created
+ resource "aws_key_pair" "kubernetes_key" {
+ arn = (known after apply)
+ fingerprint = (known after apply)
+ id = (known after apply)
+ key_name = "kube-cluster-key"
+ key_name_prefix = (known after apply)
+ key_pair_id = (known after apply)
+ key_type = (known after apply)
+ public_key = "공개 ssh 키"
+ tags_all = (known after apply)
}
# aws_route_table.kubernetes_rt will be created
+ resource "aws_route_table" "kubernetes_rt" {
+ arn = (known after apply)
+ id = (known after apply)
+ owner_id = (known after apply)
+ propagating_vgws = (known after apply)
+ route = [
+ {
+ cidr_block = "0.0.0.0/0"
+ gateway_id = (known after apply)
# (12 unchanged attributes hidden)
},
]
+ tags = {
+ "Name" = "kube-cluster-rt"
}
+ tags_all = {
+ "Name" = "kube-cluster-rt"
}
+ vpc_id = (known after apply)
}
# aws_route_table_association.kubernetes_rta will be created
+ resource "aws_route_table_association" "kubernetes_rta" {
+ id = (known after apply)
+ route_table_id = (known after apply)
+ subnet_id = (known after apply)
}
# aws_security_group.kubernetes_sg will be created
+ resource "aws_security_group" "kubernetes_sg" {
+ arn = (known after apply)
+ description = "Security group for Kubernetes cluster"
+ egress = [
+ {
+ cidr_blocks = [
+ "0.0.0.0/0",
]
+ from_port = 0
+ ipv6_cidr_blocks = []
+ prefix_list_ids = []
+ protocol = "-1"
+ security_groups = []
+ self = false
+ to_port = 0
# (1 unchanged attribute hidden)
},
]
+ id = (known after apply)
+ ingress = [
+ {
+ cidr_blocks = [
+ "0.0.0.0/0",
]
+ from_port = 22
+ ipv6_cidr_blocks = []
+ prefix_list_ids = []
+ protocol = "tcp"
+ security_groups = []
+ self = false
+ to_port = 22
# (1 unchanged attribute hidden)
},
+ {
+ cidr_blocks = [
+ "0.0.0.0/0",
]
+ from_port = 6443
+ ipv6_cidr_blocks = []
+ prefix_list_ids = []
+ protocol = "tcp"
+ security_groups = []
+ self = false
+ to_port = 6443
# (1 unchanged attribute hidden)
},
+ {
+ cidr_blocks = []
+ from_port = 0
+ ipv6_cidr_blocks = []
+ prefix_list_ids = []
+ protocol = "-1"
+ security_groups = []
+ self = true
+ to_port = 0
# (1 unchanged attribute hidden)
},
]
+ name = "kube-cluster-sg"
+ name_prefix = (known after apply)
+ owner_id = (known after apply)
+ revoke_rules_on_delete = false
+ tags = {
+ "Name" = "kube-cluster-sg"
}
+ tags_all = {
+ "Name" = "kube-cluster-sg"
}
+ vpc_id = (known after apply)
}
# aws_subnet.kubernetes_subnet will be created
+ resource "aws_subnet" "kubernetes_subnet" {
+ arn = (known after apply)
+ assign_ipv6_address_on_creation = false
+ availability_zone = "ap-northeast-2a"
+ availability_zone_id = (known after apply)
+ cidr_block = "10.0.1.0/24"
+ enable_dns64 = false
+ enable_resource_name_dns_a_record_on_launch = false
+ enable_resource_name_dns_aaaa_record_on_launch = false
+ id = (known after apply)
+ ipv6_cidr_block_association_id = (known after apply)
+ ipv6_native = false
+ map_public_ip_on_launch = true
+ owner_id = (known after apply)
+ private_dns_hostname_type_on_launch = (known after apply)
+ tags = {
+ "Name" = "kube-cluster-subnet"
}
+ tags_all = {
+ "Name" = "kube-cluster-subnet"
}
+ vpc_id = (known after apply)
}
# aws_vpc.kubernetes_vpc will be created
+ resource "aws_vpc" "kubernetes_vpc" {
+ arn = (known after apply)
+ cidr_block = "10.0.0.0/16"
+ default_network_acl_id = (known after apply)
+ default_route_table_id = (known after apply)
+ default_security_group_id = (known after apply)
+ dhcp_options_id = (known after apply)
+ enable_classiclink = (known after apply)
+ enable_classiclink_dns_support = (known after apply)
+ enable_dns_hostnames = true
+ enable_dns_support = true
+ enable_network_address_usage_metrics = (known after apply)
+ id = (known after apply)
+ instance_tenancy = "default"
+ ipv6_association_id = (known after apply)
+ ipv6_cidr_block = (known after apply)
+ ipv6_cidr_block_network_border_group = (known after apply)
+ main_route_table_id = (known after apply)
+ owner_id = (known after apply)
+ tags = {
+ "Name" = "kube-cluster-vpc"
}
+ tags_all = {
+ "Name" = "kube-cluster-vpc"
}
}
Plan: 10 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ master_private_ip = (known after apply)
+ master_public_ip = (known after apply)
+ worker_private_ips = [
+ (known after apply),
+ (known after apply),
]
+ worker_public_ips = [
+ (known after apply),
+ (known after apply),
]
"Plan: 10 to add, 0 to change, 0 to destroy."
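The log shows that Terraform plans to create 10 resources, with nothing to change or destroy.
Running Terraform
Now let's run Terraform. The following command creates exactly the infrastructure I defined; the -auto-approve flag skips the interactive confirmation prompt.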
terraform apply -auto-approve
I accidentally closed the terminal that held the output, so I can't show it...! Still, all of the infrastructure I defined was created successfully!
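Even when the apply output is lost, the result can be checked afterwards from the Terraform state. As a rough sketch, the hypothetical helper `state_has_resources` below reads the output of `terraform state list` and checks that the number of managed resources matches the count announced by the plan (10 here). Data sources are skipped because the plan summary does not count them.

```shell
# Read `terraform state list` output on stdin and succeed only if the number of
# managed resources equals the expected count (data sources are skipped).
state_has_resources() {
  expected="$1"
  actual="$(grep -v '^data\.' | grep -c . || true)"
  [ "$actual" -eq "$expected" ]
}

# Usage, from the terraform/ directory:
#   terraform state list | state_has_resources 10 && echo "all resources present"
#   terraform output master_public_ip
```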
Running Kubespray
Now I will use Kubespray to deploy a Kubernetes cluster onto the EC2 instances created with Terraform. For this process, refer to the Kubespray section of the official Kubernetes documentation!
https://kubernetes.io/ko/docs/setup/production-environment/tools/kubespray/
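Kubespray needs an Ansible inventory that lists the nodes. The sketch below shows one way such a file could be generated from the Terraform outputs; it is not the inventory actually used in this post. The group names follow Kubespray's sample inventory, the host names `master` and `worker-0` match the ones in the run log, and the function name `render_inventory` and the use of `jq` are assumptions.

```shell
# Print a minimal Kubespray inventory: first argument is the control plane IP,
# the remaining arguments are worker IPs.
render_inventory() {
  master_ip="$1"
  shift
  echo "[kube_control_plane]"
  echo "master ansible_host=${master_ip}"
  echo
  echo "[etcd]"
  echo "master"
  echo
  echo "[kube_node]"
  i=0
  for ip in "$@"; do
    echo "worker-${i} ansible_host=${ip}"
    i=$((i + 1))
  done
  echo
  echo "[k8s_cluster:children]"
  echo "kube_control_plane"
  echo "kube_node"
}

# Usage (assumes jq is installed; run where the Terraform state lives):
#   render_inventory "$(terraform output -raw master_public_ip)" \
#     $(terraform output -json worker_public_ips | jq -r '.[]') > inventory.ini
#   ansible-playbook -i inventory.ini --become --become-user=root cluster.yml
```

SSH user and key settings are left out on purpose, since they depend on the AMI and key pair in use.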
Wait, an error came up
The following error occurred while deploying with Kubespray.
All version string used in kubespray have been normalized to not use a leading 'v'.
This check will be dropped in the next minor release.
It looks like a stray v slipped in when I specified the version. It should be written as "1.31", but I had written "v1.31". Off to fix it!
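If the version string comes from somewhere else, such as a script variable or a copied release tag, the leading v can be stripped before it reaches Kubespray. This is a minimal sketch using plain shell parameter expansion. The function name `normalize_version` is hypothetical, and I am assuming the value ends up in Kubespray's `kube_version` variable.

```shell
# Remove a single leading "v" from a version string, if present.
normalize_version() {
  printf '%s\n' "${1#v}"
}

# Usage:
#   normalize_version v1.31   # prints 1.31
#   normalize_version 1.31    # prints 1.31 (unchanged)
```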
Result
Now let's look at the run result in the terminal.
...
...
...
...
...
RUNNING HANDLER [kubernetes/kubeadm : Kubeadm | reload systemd] ******************************************************************************************************************************************************************************************************************************
ok: [worker-0]
Thursday 15 May 2025 01:20:05 +0900 (0:00:00.963) 0:07:14.919 **********
LAY RECAP ***********************************************************************************************************************************************************************************************************************************************************************************
master : ok=644 changed=146 unreachable=0 failed=0 skipped=1001 rescued=0 ignored=5
worker-0 : ok=441 changed=91 unreachable=0 failed=0 skipped=632 rescued=0 ignored=0
Thursday 15 May 2025 01:20:41 +0900 (0:00:00.026) 0:07:50.971 **********
===============================================================================
download : Download_container | Download image if required --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 38.19s
kubernetes/kubeadm : Join to cluster if needed --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 16.24s
system_packages : Update package management cache (APT) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 15.50s
download : Download_file | Download item --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 13.93s
system_packages : Install packages requirements -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 11.37s
download : Download_file | Download item --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 11.15s
kubernetes/control-plane : Kubeadm | Initialize first control plane node ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 10.48s
download : Download_container | Download image if required ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 7.99s
network_plugin/calico : Calico | Create calico manifests ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 7.39s
download : Download_file | Download item ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 6.77s
etcd : Restart etcd ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 6.71s
download : Download_container | Download image if required ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.93s
kubernetes-apps/ansible : Kubernetes Apps | CoreDNS ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.49s
etcd : Configure | Check if etcd cluster is healthy ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.32s
etcd : Configure | Ensure etcd is running --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.20s
kubernetes/preinstall : Preinstall | wait for the apiserver to be running ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.08s
download : Download_container | Download image if required ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.80s
download : Download_container | Download image if required ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.23s
download : Download_container | Download image if required ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.15s
download : Download_container | Download image if required ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 3.96s
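Kubernetes cluster deployment complete!
To verify more precisely, I connect to the EC2 instance acting as the master node and check that the specified Kubernetes version is installed and that each node has joined the cluster.
The worker node is properly connected to the master node, and version 1.31 is confirmed too!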
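The check on the master node can be done with `kubectl get nodes`, whose STATUS and VERSION columns show whether each node joined and which version it runs. As a small sketch, the hypothetical helper `count_ready_nodes` below counts the nodes whose status is exactly Ready, which is handy when the check runs from a script.

```shell
# Count nodes whose STATUS column is exactly "Ready".
# Expects the output of `kubectl get nodes --no-headers` on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage on the control plane node:
#   kubectl get nodes -o wide
#   kubectl get nodes --no-headers | count_ready_nodes
```

Nodes reported as NotReady or Ready,SchedulingDisabled are deliberately not counted.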
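Thoughts
With CloudFormation, you can write a template on the spot with nothing to install, validate it online, and provision infrastructure right away. It also supports new AWS services almost immediately. Terraform has a clear advantage when working across multiple cloud platforms. Both are IaC tools, but they suit different situations.
Kubespray can be used to deploy Kubernetes in on-premises environments. Installing Kubernetes by hand involves a great deal of work: configuring master and worker nodes, setting up a network plugin, issuing certificates, configuring the kubelet, and more. It is only natural that tools automating this appeared. A month ago, when I was first learning Kubernetes, trying to configure every local VM by hand for practice made my head spin. So the second time, I looked for automation tools such as Vagrant first. Time is gold, so be sure to use them. Of course, understanding the components well is the basic premise.
I do wonder in which situations using Terraform and Kubespray together would be the best solution. Today most cloud platforms, in Korea and abroad, offer Kubernetes clusters as a managed service. AWS, for example, keeps releasing ways to optimize costs when using EKS. EKS Auto Mode is one of them, and by now it is practically the default option. Given that, I have a hard time imagining a case where this combination creates real synergy. Still, each tool on its own is worth knowing, so I will keep studying them.
I was struck that Terraform, Docker, and Kubernetes are all written in Go. I knew it performed well, but I did not know it was used in projects of this scale. When I have time, I would like to learn Go. If I understand the tools a little more deeply instead of only using them, I think I can build more real expertise.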