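Contents
- Main text
- Overview of how Kubernetes schedules pods
- kubelet pod creation: code walkthrough and diagrams
- About the kubelet
- How the kubelet creates and starts a pod
- Call graph of kubelet pod creation
- kubelet pod creation in detail
- How the kubelet calls the CRI
- Overall architecture of kubelet pod creation
- kubelet pod creation logs explained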
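Main text
This article walks through how the kubelet creates a pod, covering:
- Overview of how Kubernetes schedules pods
- kubelet pod creation: code walkthrough and diagrams (main focus)
- How the kubelet calls the CRI to create containers (main focus)
- A step-by-step trace of a real pod creation through the kubelet logs (main focus)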
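Overview of how Kubernetes schedules pods
Kubernetes (k8s below) offers three main ways to manage (create) pods:
- Declare a bare pod directly.
- Declare pods through a controller, such as a Deployment, ReplicationController, DaemonSet, or ReplicaSet.
- Use a static pod. This is less common: the pod manifest is placed in the corresponding kubernetes/manifest directory, typically to run k8s control-plane components such as the apiserver, controller-manager, and scheduler.
k8s recommends managing pods through a controller. This matches how k8s is designed to manage pods and makes its features easy to use, such as elastic scaling and automatic restart of failed pods. We also take a controller-managed pod as the example and briefly go through how k8s creates and schedules it, as shown in the figure below.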
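- The client asks the apiserver to create a ReplicaSet. After the request passes authentication, authorization, and admission control, the apiserver persists it to etcd.
- The ReplicaSet controller, run by the controller-manager, sees the new ReplicaSet through the list-watch mechanism. Using the label selector, it finds that the current state of the pods associated with this ReplicaSet differs from the desired state, so it reconciles by sending pod creation requests to the apiserver.
- The scheduler uses list-watch to find pods that are not yet bound to a node, runs its filtering (predicates) and scoring (priorities) algorithms to pick the node each pod will run on, and writes the result to etcd through the apiserver.
- The kubelet sees through list-watch that a new pod has been bound to its node and starts the pod creation flow.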
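The scheduler step above can be made concrete with a small sketch. This is only a toy model in Python, not the scheduler's real code: the `Node` and `Pod` classes, the resource fields, and the scoring rule (prefer the node with the most CPU left over) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_millis: int  # unallocated CPU, in millicores
    free_mem_mib: int     # unallocated memory, in MiB


@dataclass
class Pod:
    name: str
    cpu_millis: int
    mem_mib: int
    node_name: Optional[str] = None  # None means "not yet bound"


def filter_nodes(pod, nodes):
    """Filtering: keep only nodes with enough free resources for the pod."""
    return [n for n in nodes
            if n.free_cpu_millis >= pod.cpu_millis
            and n.free_mem_mib >= pod.mem_mib]


def score(pod, node):
    """Scoring (hypothetical rule): more CPU left over is better."""
    return node.free_cpu_millis - pod.cpu_millis


def schedule(pod, nodes):
    """Bind the pod to the best feasible node; return its name, or None."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None  # no node fits, the pod stays unbound
    best = max(feasible, key=lambda n: score(pod, n))
    pod.node_name = best.name
    return best.name


nodes = [
    Node("node-a", free_cpu_millis=500, free_mem_mib=1024),
    Node("node-b", free_cpu_millis=4000, free_mem_mib=8192),
    Node("node-c", free_cpu_millis=2000, free_mem_mib=512),
]
pod = Pod("web-1", cpu_millis=1000, mem_mib=1024)
print(schedule(pod, nodes))  # prints node-b
```

Here node-a is filtered out for lack of CPU and node-c for lack of memory, so only node-b is left to score. Once `node_name` is set, the pod counts as bound, which is the state the kubelet on that node reacts to.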
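kubelet pod creation: code walkthrough and diagrams
About the kubelet
The kubelet resembles a controller: it list-watches the relevant resources, or polls local pod state and events, to trigger actions that bring pods to the "desired state", and it reports the status of its node (the host) and of every pod on that node to the apiserver.
Unlike other controllers, the kubelet is an agent deployed on every node. It has to talk to the apiserver and also to the CRI (Container Runtime Interface) to manage the containers on the node. It therefore watches the apiserver for changes to local pods, and it also polls pod state continuously so that status is synced back to the apiserver promptly. Overall, the kubelet runs a loop that listens for messages from various producers, or for timer-triggered messages, and calls the matching consumer (a submodule) to act on each one. The sources include requests watched from the apiserver, events generated by the PLEG (Pod Lifecycle Event Generator), and periodic tasks.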
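The loop described above, several producers feeding one dispatcher, can be sketched as follows. This is a simplified Python model rather than the kubelet's Go code. The class, channel, and handler names are hypothetical, the sources are polled in a fixed priority order instead of being selected among concurrently, and the loop stops once every source is empty, whereas the real loop runs for the life of the process.

```python
from collections import deque


class MiniKubelet:
    """A toy model of the kubelet loop; every name here is hypothetical."""

    def __init__(self):
        self.config_ch = deque()  # pod changes watched from the apiserver
        self.pleg_ch = deque()    # pod lifecycle events from the PLEG
        self.sync_ch = deque()    # timer-triggered sync ticks
        self.handled = []         # record of (handler, pods) for inspection

    def handle_additions(self, pods):
        self.handled.append(("add", pods))

    def handle_updates(self, pods):
        self.handled.append(("update", pods))

    def handle_syncs(self, pods):
        self.handled.append(("sync", pods))

    def loop_iteration(self):
        """Consume one message from the first non-empty source.

        Returns False when every source is empty.
        """
        if self.config_ch:
            op, pods = self.config_ch.popleft()
            handlers = {
                "ADD": self.handle_additions,
                "UPDATE": self.handle_updates,
                # Deletion is graceful, so it is handled as an update.
                "DELETE": self.handle_updates,
            }
            handlers[op](pods)
        elif self.pleg_ch:
            self.handle_syncs([self.pleg_ch.popleft()])
        elif self.sync_ch:
            self.handle_syncs(self.sync_ch.popleft())
        else:
            return False
        return True

    def loop(self):
        while self.loop_iteration():
            pass


kubelet = MiniKubelet()
kubelet.config_ch.append(("ADD", ["web-1"]))
kubelet.pleg_ch.append("web-1")
kubelet.config_ch.append(("DELETE", ["web-1"]))
kubelet.loop()
print(kubelet.handled)
```

The point of the design is that producers never call handlers directly: they only enqueue messages, and a single loop decides which consumer runs next.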
How the kubelet creates and starts a pod
Call graph of kubelet pod creation
kubelet pod creation in detail
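- 1. The kubelet list-watches the pods in all namespaces that are bound to its node and pushes the results into the update channel. SyncLoop is the kubelet's main loop; it drives the recurring work of receiving, updating, and processing pod changes. Its syncLoopIteration method listens on several message sources and triggers the matching action for each. It receives the list-watched changes from the update channel and hands them to the corresponding handler: pod creation goes to HandlePodAdditions, and pod deletion goes to HandlePodUpdates (DELETE is treated as a UPDATE because of graceful deletion.)
- 2. HandlePodAdditions sorts the pods, checks them, and runs admission. It then calls dispatchWork, which assigns the operation on each pod to podWorkers for asynchronous handling (pod creation, deletion, update).
- 3. The asynchronous worker calls the kubelet's syncPod method (syncPod is the transaction script for the sync of a single pod.), which does the preparation needed before the pod is created:
a. If the pod's updateType is SyncPodKill, kill the pod immediately and return (the pod deletion path).
b. Run the pod admission checks to confirm the pod can run on this node.
c. Pass the updated status to the status manager, which reports pod status to the apiserver.
d. Check that the network plugin is ready.
e. Create and update the pod's cgroups configuration.
f. Create the directories for the pod: the pod directory and the volume directory.
g. Wait for the volumes in the pod spec to be attached and mounted.
h. Fetch the pull secrets from the apiserver.
i. Call the containerRuntime's SyncPod method to start creating containers.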
- 4. The containerRuntime's SyncPod does the following main work:
a. Create the sandbox
b. Create ephemeral containers
c. Create init containers
d. Create normal containers
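Creating the sandbox is the key step. The sandbox can be understood as the pod's runtime environment and the parent container of the pod's application containers. In k8s it is the pause container, which has to be created before any other container in the pod. The kubelet first generates the pod sandbox configuration, such as the DNS config, the pod hostname, sysctls, cgroups, and namespaces.
It then goes through the CRI (Container Runtime Interface) to have the underlying container runtime actually operate on containers, and after that the CNI plugin is called to set up networking for the container.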
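The ordering in step 4 can be sketched as below. This is a rough Python model, not the kubelet's implementation: the pod is a plain dictionary, the function names and config keys are hypothetical, and the cgroup parent layout is only modeled on the `/kubepods/burstable/pod<uid>` path that appears in the kubelet log later in this article. It also ignores waiting for init containers to finish.

```python
def generate_sandbox_config(pod):
    """Build a hypothetical sandbox config from the pod spec."""
    qos = pod.get("qos_class", "burstable")  # assumed default for the sketch
    return {
        "hostname": pod["name"],
        "dns_config": pod.get("dns_config", {}),
        "sysctls": pod.get("sysctls", {}),
        "cgroup_parent": f"/kubepods/{qos}/pod{pod['uid']}",
    }


def sync_pod(pod, existing_sandbox_id=None):
    """Return the actions needed to bring up the pod, in order."""
    actions = []
    # The sandbox comes first: every other container joins it.
    if existing_sandbox_id is None:
        config = generate_sandbox_config(pod)
        actions.append(("run_pod_sandbox", config["hostname"]))
    # Then ephemeral, init, and normal containers, in that order.
    phases = (
        ("ephemeral", "ephemeral_containers"),
        ("init", "init_containers"),
        ("normal", "containers"),
    )
    for phase, key in phases:
        for name in pod.get(key, []):
            actions.append((f"start_{phase}_container", name))
    return actions


pod = {
    "name": "web",
    "uid": "1234",
    "init_containers": ["setup"],
    "containers": ["app", "sidecar"],
}
for action in sync_pod(pod):
    print(action)
```

Passing an `existing_sandbox_id` skips the sandbox step, which models the case where the pause container is already running and only the containers need to be started.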
- 5. Now let's look at the steps of creating the sandbox, RunPodSandbox. (ds *dockerService) RunPodSandbox is a CRI implementation, so it lives in dockershim. dockershim is the CRI implementation built into the kubelet that connects the kubelet to docker; the name "shim" describes it well. The kubelet calls dockershim over gRPC to create and manage containers.
a. Call the docker API to pull the image for the sandbox (the kubelet's sandbox image: defaultSandboxImage = "k8s.gcr.io/pause:3.2").
b. Call docker to create the sandbox container.
c. Create the sandbox checkpoint.
d. Call docker to start the sandbox container.
e. Rewrite the resolv.conf file generated by docker.
f. Set up networking for the sandbox by calling the CNI plugin.
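To see which of these steps reach the docker daemon, here is a hedged sketch that lists the six steps next to the Docker Engine API request each would roughly correspond to. It is written in Python and is not dockershim code: the function name and the step labels are hypothetical, the image reference is assumed to carry a tag, and marking the checkpoint, resolv.conf, and CNI steps as local work on the node is our reading of the steps above rather than something confirmed here.

```python
from urllib.parse import urlencode

# defaultSandboxImage as quoted in step a.
SANDBOX_IMAGE = "k8s.gcr.io/pause:3.2"


def run_pod_sandbox_plan(sandbox_name, image=SANDBOX_IMAGE):
    """List the RunPodSandbox steps with the request each one implies."""
    # Assumes the reference ends in ":<tag>".
    repo, _, tag = image.rpartition(":")
    pull_query = urlencode({"fromImage": repo, "tag": tag})
    create_query = urlencode({"name": sandbox_name})
    return [
        ("pull sandbox image", "POST /images/create?" + pull_query),
        ("create sandbox container", "POST /containers/create?" + create_query),
        ("create sandbox checkpoint", "local: no docker call (assumption)"),
        ("start sandbox container", "POST /containers/{id}/start"),
        ("rewrite resolv.conf", "local: no docker call (assumption)"),
        ("set up networking", "local: invoke the CNI plugin"),
    ]


for step, request in run_pod_sandbox_plan("k8s_POD_web"):
    print(f"{step:28} {request}")
```

The `{id}` in the start request stands for the container ID returned by the create call, which this static plan does not have.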
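How the kubelet calls the CRI
Our container runtime is currently docker. docker does not support the CRI, so to drive docker, k8s ships a built-in dockershim. dockershim can be thought of as a CRI-compliant container runtime: the kubelet calls dockershim over gRPC, dockershim converts each request it receives into a REST API request and sends it to the docker daemon, and the docker daemon then assembles the request and calls the docker API to complete the creation, start, and other operations on the container.
Two points need explaining here:
1. Why does dockershim exist? There is a bit of history. Once k8s had reached a certain market share, it wanted to decouple from docker instead of depending on it, and it also wanted to support multiple container runtimes, so it defined the CRI. Any runtime that implements the CRI can be called by the kubelet directly to manage containers. docker did not support the CRI at first, so k8s settled on a workaround and developed dockershim (the docker "shim") to forward requests, which also decoupled k8s from docker. This indirection is cumbersome and costs performance, so dockershim was removed in Kubernetes 1.24, and from that version on we have to configure a container runtime ourselves.
2. docker also responded early: it split out containerd, which supports the CRI standard, and manages containers through containerd.
So, as the figure below shows, after the docker API is called to create a container, docker goes on to call docker-containerd to manage the creation. docker-containerd manages the container indirectly through docker-containerd-shim; one benefit is that our application containers keep running when docker is upgraded or restarted. Finally, docker-containerd-shim creates the container through runc. runc is docker's OCI-based implementation, formerly libcontainer, and is what actually creates the container.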
Overall architecture of kubelet pod creation
(container-runtime="docker", which is probably what most companies currently use)
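kubelet pod creation logs explained
Let's turn on debug logging on a real node and see what the kubelet does when it creates a pod.
Note: only the main log output is kept, and sensitive information has been redacted.
1. A new-pod creation event is received and written to the update channel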
I0921 18:10:00.486345 26075 config.go:414] Receiving a new pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
2. syncLoop: ADD event received
I0921 18:10:00.757557 26075 kubelet.go:2007] SyncLoop (ADD, "api"): opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)
3. Admission check: pod fit success
I0921 18:10:00.759786 26075 predicates.go:986] Pod: opslk1-xxx fit success. Node: xx.xx.10.9 has enough resources.
4. Handed over to syncPod, SyncPodType=create
I0921 18:10:00.759956 26075 kubelet.go:1498] syncPod "xxx-3995-11ed-80a8-48df37244930" updateType:{{ } types.SyncPodType=create)
5. Generate the pod status
I0921 18:10:00.760128 26075 kubelet_pods.go:1529] Generating status for "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
I0921 18:10:00.760148 26075 kubelet_pods.go:1494] pod waiting > 0, pending
I0921 18:10:00.760174 26075 kubelet.go:1603] apiPodStatus.Phase:Pending pod:"opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
6. Configure cgroupConfig: set CPU and memory
I0921 18:10:00.760200 26075 kubelet_resources.go:149] Newest cgroupConfig for pod:"opslk1-5sfjn_lktest01(739e1c1a-3175-11ed-aff8-48df37244926)"
are kubelet.cgroupResource{cpuShares:xxx, cpuQuota:xxx, memoryLimit:xxx, memoryLimitSwap:xxx}.
7. Wait for the pod's volumes to be attached and mounted
I0921 18:10:00.768211 26075 volume_manager.go:350] Waiting for volumes to attach and mount for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
8. Sync status to the apiserver: GET first, then PATCH
I0921 18:10:00.791361 26075 round_trippers.go:419] curl -k -v -XGET 'https://xxx/api/v1/namespaces/lktest01/pods/opslk1-xxx'
I0921 18:10:00.794250 26075 round_trippers.go:419] curl -k -v -XPATCH 'https://xxx/api/v1/namespaces/lktest01/pods/opslk1-xxx/status'
I0921 18:10:00.798998 26075 status_manager.go:506] Status for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)" updated successfully: (1, {Phase:Pending Conditions:[{Type:Initialized
9. Start reconciling against the desired state: Reconcile Pod "Ready" condition if necessary. Trigger sync pod for reconciliation.
I0921 18:10:00.799365 26075 kubelet.go:2020] SyncLoop (RECONCILE, "api"): "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
10. Mount the volume
I0921 18:10:02.177479 26075 operation_generator.go:506] MountVolume.WaitForAttach succeeded for volume "volume" DevicePath "/dev/mapper/docker-xxx_3995_11ed_80a8_48df37244930"
I0921 18:10:03.136754 26075 operation_generator.go:527] MountVolume.MountDevice succeeded for volume "volume" device mount path "/export/kubelet/pods/xxx-3995-11ed-80a8-48df37244930/volumes/kubernetes.io~lvm/volume"
I0921 18:10:03.136851 26075 operation_generator.go:567] MountVolume.SetUp succeeded for volume "volume" (UniqueName: "flexvolume-kubernetes.io/lvm/xxx_3995_11ed_80a8_48df37244930") pod "opslk1-xxx"
11. Volumes attached and mounted
I0921 18:10:03.168555 26075 volume_manager.go:384] All volumes are attached and mounted for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
12. Call the containerRuntime's SyncPod method to start creating containers
I0921 18:10:03.168568 26075 kuberuntime_manager.go:468] Syncing Pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)": &Pod{}
13. Create the sandbox container: Setting cgroup parent, RunPodSandbox, Calling network plugin cni to set up pod
I0921 18:10:03.168833 26075 kuberuntime_manager.go:398] No sandbox for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)" can be found. Need to start a new one"opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
I0921 18:10:03.168885 26075 kuberuntime_manager.go:605] SyncPod received new pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)", will create a sandbox for it
I0921 18:10:03.168891 26075 kuberuntime_manager.go:614] Stopping PodSandbox for "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)", will start new one
I0921 18:10:03.168901 26075 kuberuntime_manager.go:841] Stop app containers for pod:"opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)".
I0921 18:10:03.168913 26075 kuberuntime_manager.go:666] Creating sandbox for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
I0921 18:10:03.170818 26075 docker_service.go:460] Setting cgroup parent to: "/kubepods/burstable/podxxx-3995-11ed-80a8-48df37244930"
I0921 18:10:03.170827 26075 docker_sandbox.go:108] RunPodSandbox PodName:opslk1-xxx PodUID:xxx-3995-11ed-80a8-48df37244930 NameSpace:lktest01
I0921 18:10:04.297831 26075 plugins.go:377] Calling network plugin cni to set up pod "opslk1-xxx_lktest01"
I0921 18:10:04.298323 26075 manager.go:1011] Added container: "/kubepods/burstable/podxxx-3995-11ed-80a8-48df37244930/805dda102e017247685240c2f740295396edcb7071dfe211979215eac0870e0b"
I0921 18:10:04.298535 26075 container.go:448] Start housekeeping for container "/kubepods/burstable/podxxx-3995-11ed-80a8-48df37244930/805dda102e017247685240c2f740295396edcb7071dfe211979215eac0870e0b"
I0921 18:10:04.298693 26075 cni.go:337] Got netns path /proc/26876/ns/net
I0921 18:10:04.298701 26075 cni.go:338] Using podns path lktest01
I0921 18:10:04.298820 26075 cni.go:307] About to add CNI network cni-loopback (type=loopback)
I0921 18:10:04.301399 26075 cni.go:337] Got netns path /proc/26876/ns/net
I0921 18:10:04.301405 26075 cni.go:338] Using podns path lktest01
I0921 18:10:04.301466 26075 cni.go:307] About to add CNI network cni (type=cni)
I0921 18:10:04.392172 26075 kuberuntime_manager.go:680] Created PodSandbox "805dda102e017247685240c2f740295396edcb7071dfe211979215eac0870e0b" for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
I0921 18:10:04.396981 26075 kuberuntime_manager.go:699] Determined the ip "xx.xx.226.17" for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)" after sandbox changed
14. Create the normal containers
I0921 18:10:04.397114 26075 kuberuntime_manager.go:750] Creating container &Container{} in pod opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)
I0921 18:10:04.398859 26075 kuberuntime_container.go:108] Generating ref for container opslk: &v1.ObjectReference{Kind:"Pod", Namespace:"lktest01", Name:"opslk1-xxx"}
I0921 18:10:04.398883 26075 kuberuntime_container.go:117] To determine whether to restart the old container. Pod:opslk1-xxx_lktest01 PodIP: PodSandboxId: NameSpace:lktest01
I0921 18:10:04.398888 26075 kuberuntime_container.go:258] pod:opslk1-xxx default KeepRootDirForPod: true
I0921 18:10:04.398935 26075 server.go:471] Event(v1.ObjectReference{Kind:"Pod", Namespace:"lktest01", Name:"opslk1-xxx", UID:"xxx-3995-11ed-80a8-48df37244930", APIVersion:"v1", ResourceVersion:"19846024411", FieldPath:"spec.containers{opslk}"})
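When reading logs like these, it helps to split each line into its header fields and measure the time between two steps. The following is a small Python sketch under the assumption that every line follows the header layout visible above (severity letter, month and day, time with microseconds, a process or thread ID, then `file:line]` and the message). The regular expression and function names are ours, not part of any kubelet tooling.

```python
import re
from datetime import datetime

# Header layout assumed from the lines above:
#   <severity><mmdd> <hh:mm:ss.uuuuuu> <id> <file>:<line>] <message>
KLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<pid>\d+)\s+"
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_klog(line):
    """Split one log line into its header fields; None if it does not match."""
    m = KLOG_RE.match(line)
    return m.groupdict() if m else None


def elapsed_seconds(first, second):
    """Seconds between two parsed lines, assuming both are from the same day."""
    fmt = "%H:%M:%S.%f"
    t1 = datetime.strptime(first["time"], fmt)
    t2 = datetime.strptime(second["time"], fmt)
    return (t2 - t1).total_seconds()


received = parse_klog(
    'I0921 18:10:00.486345 26075 config.go:414] Receiving a new pod '
    '"opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"'
)
sandbox_ready = parse_klog(
    'I0921 18:10:04.392172 26075 kuberuntime_manager.go:680] Created PodSandbox '
    '"805dda102e017247685240c2f740295396edcb7071dfe211979215eac0870e0b" for pod '
    '"opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"'
)
print(received["file"], received["line"])  # prints config.go 414
print(f"{elapsed_seconds(received, sandbox_ready):.6f}")  # prints 3.905827
```

In this trace, about 3.9 seconds pass between receiving the pod and having its sandbox created, and the timestamps in steps 10 and 13 show that most of that time goes to mounting the volume and running the sandbox.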