Installing a GPU-enabled K8s node on an ESXi virtual machine

Last updated: the evening of March 29, 2024

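Preface

Installation steps

Create the virtual machine

Nothing special here: just create the virtual machine in your virtualization software.

Pay attention to the OS version you install: it should match the version the installation scripts later in this post were written for. I installed Ubuntu 20.04 Server.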

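Install K8s

I chose KubeSphere as the installation tool. It is feature-complete, has a graphical interface, and makes the installation straightforward.

Link to the official documentation

Install a single-node KubeSphere service

Download KubeKey

Use the following commands to install KubeKey: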

export KKZONE=cn
curl -sfL https://get-kk.kubesphere.io | VERSION=v3.0.13 sh -

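Install dependencies

Running ./kk create cluster --with-kubernetes v1.23.10 --with-kubesphere v3.4.1 prints a table showing which dependencies are missing. Install the missing items as described in the documentation.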

sudo apt install socat conntrack ebtables ipset 
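
Before re-running kk, a quick check can confirm that the four packages are actually in place. The helper below is only a minimal sketch: the function name missing_tools is my own and not part of KubeKey, and it simply checks whether each command can be found on PATH.

```shell
#!/bin/sh
# missing_tools: print every command from the argument list that is not on PATH.
# Hypothetical helper for illustration; it is not part of KubeKey.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Check the dependencies installed above; no output means nothing is missing.
missing_tools socat conntrack ebtables ipset
```

If the last line prints nothing, the dependency table shown by kk should no longer list these items as missing.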

Start the installation

./kk create cluster --with-kubernetes v1.23.10 --with-kubesphere v3.4.1

Make sure the environment variable KKZONE is set to cn.
The flags after the command select the versions to install: Kubernetes 1.23 (v1.23.10) and KubeSphere v3.4.1.

Check the installation result

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f

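This command follows the installer log; once the installation completes, the initial password is printed in the terminal.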
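
Besides reading the installer log, it can help to list the pods that have not come up yet. The filter below is a sketch under my own assumptions: the function name not_ready_pods is hypothetical, and it relies on the column layout of kubectl get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# not_ready_pods: read `kubectl get pods -A --no-headers` output on stdin and
# print namespace/name for every pod whose STATUS column (4th field) is
# neither Running nor Completed. Hypothetical helper for illustration.
not_ready_pods() {
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 }'
}

# Usage on the cluster node:
#   kubectl get pods -A --no-headers | not_ready_pods
```

An empty result means every pod is either running or has finished.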

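Install the NVIDIA GPU driver

Note: be sure to install the ESXi version of the GPU driver; otherwise the machine will hang after installation.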
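
After the driver is installed, a quick sanity check inside the VM shows whether the GPU is visible. This is a minimal sketch: check_gpu_driver is a name I made up, while nvidia-smi is the tool that ships with the NVIDIA driver.

```shell
#!/bin/sh
# check_gpu_driver: report the GPU name and driver version seen inside the VM.
# Hypothetical helper name; nvidia-smi itself is installed with the NVIDIA driver.
check_gpu_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver is not installed or not on PATH" >&2
        return 1
    fi
    # Print one CSV line per GPU, without the header row
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

# Usage inside the VM:
#   check_gpu_driver
```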

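Install Docker and the nvidia-docker module

Install the containerd GPU module

Starting with v1.20, Kubernetes deprecated Docker as its container runtime in favor of CRI runtimes such as containerd (dockershim was removed entirely in v1.24), so the GPU module for containerd has to be installed as well.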
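
The post does not list the commands for this step. One common route, which I am assuming here rather than taking from the original, is the NVIDIA Container Toolkit and its nvidia-ctk command. The function name configure_containerd_gpu is hypothetical, and the sketch assumes the NVIDIA Container Toolkit apt repository has already been added to the node.

```shell
#!/bin/sh
# configure_containerd_gpu: register the NVIDIA runtime with containerd.
# Assumption: the NVIDIA Container Toolkit apt repository is already configured
# on this node; that setup is not covered by the original post.
configure_containerd_gpu() {
    sudo apt-get install -y nvidia-container-toolkit || return 1
    # Add the nvidia runtime to the containerd config and make it the default runtime
    sudo nvidia-ctk runtime configure --runtime=containerd --set-as-default || return 1
    # Restart containerd so the new runtime configuration takes effect
    sudo systemctl restart containerd
}

# Usage on the GPU node:
#   configure_containerd_gpu
```

Making the NVIDIA runtime the default is what lets the device plugin pod see the GPU without any per-pod runtime setting.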

Install the NVIDIA device plugin on K8s

Once the modules above are installed, install the GPU device plugin:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      nodeSelector:
        gpu: "true"
      priorityClassName: "system-node-critical"
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.3
        name: nvidia-device-plugin-ctr
        env:
          - name: FAIL_ON_INIT_ERROR
            value: "false"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
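
Because the DaemonSet uses the nodeSelector gpu: "true", the plugin pod is only scheduled onto nodes that carry that label. The sketch below shows the labeling step as comments and prints a small test pod that requests one GPU. The function name, the pod name gpu-test, the manifest file name, and the sample image (taken from the NVIDIA device plugin README as far as I recall) are my assumptions, not part of the original setup.

```shell
#!/bin/sh
# Label the node so the DaemonSet's nodeSelector matches it, then apply the manifest.
# <node-name> is a placeholder and the file name is hypothetical:
#   kubectl label node <node-name> gpu=true
#   kubectl apply -f nvidia-device-plugin.yml

# gpu_test_pod_manifest: print a pod spec that requests one nvidia.com/gpu.
# Hypothetical helper; the image is NVIDIA's vectorAdd CUDA sample.
gpu_test_pod_manifest() {
    cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
EOF
}

# Usage once the device plugin pod is running:
#   gpu_test_pod_manifest | kubectl apply -f -
#   kubectl logs gpu-test
```

If the pod stays Pending, kubectl describe node <node-name> should show whether nvidia.com/gpu appears among the node's allocatable resources.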

Install the Harbor container registry

Use KubeSphere to install the Harbor container registry.

Reference configuration file

expose:
  type: nodePort
  tls:
    # Enabling HTTPS is recommended
    enabled: true
    certSource: auto
    auto:
      # The deployment node
      commonName: 10.180.193.11
    secret:
      secretName: ''
      notarySecretName: ''
  ingress:
    hosts:
      core: core.harbor.domain
      notary: notary.harbor.domain
    controller: default
    kubeVersionOverride: ''
    className: ''
    annotations:
      ingress.kubernetes.io/ssl-redirect: 'true'
      ingress.kubernetes.io/proxy-body-size: '0'
      nginx.ingress.kubernetes.io/ssl-redirect: 'true'
      nginx.ingress.kubernetes.io/proxy-body-size: '0'
    notary:
      annotations: {}
      labels: {}
    harbor:
      annotations: {}
      labels: {}
  clusterIP:
    name: harbor
    annotations: {}
    ports:
      httpPort: 80
      httpsPort: 443
      notaryPort: 4443
  nodePort:
    name: harbor
    ports:
      http:
        port: 80
        nodePort: 30002
      https:
        port: 443
        nodePort: 30003
      notary:
        port: 4443
        nodePort: 30004
  loadBalancer:
    name: harbor
    IP: ''
    ports:
      httpPort: 80
      httpsPort: 443
      notaryPort: 4443
    annotations: {}
    sourceRanges: []
# Note: use the HTTPS node port as the external port
externalURL: 'https://10.180.193.11:30003'
internalTLS:
  enabled: true
  certSource: auto
  trustCa: ''
  core:
    secretName: ''
    crt: ''
    key: ''
  jobservice:
    secretName: ''
    crt: ''
    key: ''
  registry:
    secretName: ''
    crt: ''
    key: ''
  portal:
    secretName: ''
    crt: ''
    key: ''
  chartmuseum:
    secretName: ''
    crt: ''
    key: ''
  trivy:
    secretName: ''
    crt: ''
    key: ''
ipFamily:
  ipv6:
    enabled: true
  ipv4:
    enabled: true
persistence:
  enabled: true
  resourcePolicy: keep
  persistentVolumeClaim:
    # Changed here to use a storageClass backed by the NFS node
    registry:
      existingClaim: ''
      storageClass: nfs-sc
      subPath: ''
      accessMode: ReadWriteOnce
      size: 5Gi
      annotations: {}
    chartmuseum:
      existingClaim: ''
      storageClass: nfs-sc
      subPath: ''
      accessMode: ReadWriteOnce
      size: 5Gi
      annotations: {}
    jobservice:
      existingClaim: ''
      storageClass: nfs-sc
      subPath: ''
      accessMode: ReadWriteOnce
      size: 1Gi
      annotations: {}
    database:
      existingClaim: ''
      storageClass: nfs-sc
      subPath: ''
      accessMode: ReadWriteOnce
      size: 1Gi
      annotations: {}
    redis:
      existingClaim: ''
      storageClass: nfs-sc
      subPath: ''
      accessMode: ReadWriteOnce
      size: 1Gi
      annotations: {}
    trivy:
      existingClaim: ''
      storageClass: nfs-sc
      subPath: ''
      accessMode: ReadWriteOnce
      size: 5Gi
      annotations: {}
  imageChartStorage:
    disableredirect: false
    type: filesystem
    filesystem:
      rootdirectory: /storage
    azure:
      accountname: accountname
      accountkey: base64encodedaccountkey
      container: containername
    gcs:
      bucket: bucketname
      encodedkey: base64-encoded-json-key-file
    s3:
      region: us-west-1
      bucket: bucketname
    swift:
      authurl: 'https://storage.myprovider.com/v3/auth'
      username: username
      password: password
      container: containername
    oss:
      accesskeyid: accesskeyid
      accesskeysecret: accesskeysecret
      region: regionname
      bucket: bucketname
imagePullPolicy: IfNotPresent
imagePullSecrets: null
updateStrategy:
  type: RollingUpdate
logLevel: info
harborAdminPassword: Harbor12345
caSecretName: ''
secretKey: not-a-secure-key
proxy:
  httpProxy: null
  httpsProxy: null
  noProxy: '127.0.0.1,localhost,.local,.internal'
  components:
    - core
    - jobservice
    - trivy
enableMigrateHelmHook: false
nginx:
  image:
    repository: goharbor/nginx-photon
    tag: v2.5.3
  serviceAccountName: ''
  automountServiceAccountToken: false
  replicas: 1
  revisionHistoryLimit: 10
  nodeSelector: {}
  tolerations: []
  affinity: {}
  podAnnotations: {}
  priorityClassName: null
portal:
  image:
    repository: goharbor/harbor-portal
    tag: v2.5.3
  serviceAccountName: ''
  automountServiceAccountToken: false
  replicas: 1
  revisionHistoryLimit: 10
  nodeSelector: {}
  tolerations: []
  affinity: {}
  podAnnotations: {}
  priorityClassName: null
core:
  image:
    repository: goharbor/harbor-core
    tag: v2.5.3
  serviceAccountName: ''
  automountServiceAccountToken: false
  replicas: 1
  revisionHistoryLimit: 10
  startupProbe:
    enabled: true
    initialDelaySeconds: 10
  nodeSelector: {}
  tolerations: []
  affinity: {}
  podAnnotations: {}
  secret: ''
  secretName: ''
  xsrfKey: ''
  priorityClassName: null
  artifactPullAsyncFlushDuration: null
jobservice:
  image:
    repository: goharbor/harbor-jobservice
    tag: v2.5.3
  replicas: 1
  revisionHistoryLimit: 10
  serviceAccountName: ''
  automountServiceAccountToken: false
  maxJobWorkers: 10
  jobLoggers:
    - file
  loggerSweeperDuration: 14
  nodeSelector: {}
  tolerations: []
  affinity: {}
  podAnnotations: {}
  secret: ''
  priorityClassName: null
registry:
  serviceAccountName: ''
  automountServiceAccountToken: false
  registry:
    image:
      repository: goharbor/registry-photon
      tag: v2.5.3
  controller:
    image:
      repository: goharbor/harbor-registryctl
      tag: v2.5.3
  replicas: 1
  revisionHistoryLimit: 10
  nodeSelector: {}
  tolerations: []
  affinity: {}
  podAnnotations: {}
  priorityClassName: null
  secret: ''
  relativeurls: false
  credentials:
    username: harbor_registry_user
    password: harbor_registry_password
  middleware:
    enabled: false
    type: cloudFront
    cloudFront:
      baseurl: example.cloudfront.net
      keypairid: KEYPAIRID
      duration: 3000s
      ipfilteredby: none
      privateKeySecret: my-secret
  upload_purging:
    enabled: true
    age: 168h
    interval: 24h
    dryrun: false
chartmuseum:
  enabled: true
  serviceAccountName: ''
  automountServiceAccountToken: false
  absoluteUrl: false
  image:
    repository: goharbor/chartmuseum-photon
    tag: v2.5.3
  replicas: 1
  revisionHistoryLimit: 10
  nodeSelector: {}
  tolerations: []
  affinity: {}
  podAnnotations: {}
  priorityClassName: null
  indexLimit: 0
trivy:
  enabled: true
  image:
    repository: goharbor/trivy-adapter-photon
    tag: v2.5.3
  serviceAccountName: ''
  automountServiceAccountToken: false
  replicas: 1
  debugMode: false
  vulnType: 'os,library'
  severity: 'UNKNOWN,LOW,MEDIUM,HIGH,CRITICAL'
  ignoreUnfixed: false
  insecure: false
  gitHubToken: ''
  skipUpdate: false
  offlineScan: false
  timeout: 5m0s
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: 1
      memory: 1Gi
  nodeSelector: {}
  tolerations: []
  affinity: {}
  podAnnotations: {}
  priorityClassName: null
notary:
  enabled: true
  server:
    serviceAccountName: ''
    automountServiceAccountToken: false
    image:
      repository: goharbor/notary-server-photon
      tag: v2.5.3
    replicas: 1
    nodeSelector: {}
    tolerations: []
    affinity: {}
    podAnnotations: {}
    priorityClassName: null
  signer:
    serviceAccountName: ''
    automountServiceAccountToken: false
    image:
      repository: goharbor/notary-signer-photon
      tag: v2.5.3
    replicas: 1
    nodeSelector: {}
    tolerations: []
    affinity: {}
    podAnnotations: {}
    priorityClassName: null
  secretName: ''
database:
  type: internal
  internal:
    serviceAccountName: ''
    automountServiceAccountToken: false
    image:
      repository: goharbor/harbor-db
      tag: v2.5.3
    password: changeit
    shmSizeLimit: 512Mi
    nodeSelector: {}
    tolerations: []
    affinity: {}
    priorityClassName: null
    initContainer:
      migrator: {}
      permissions: {}
  external:
    host: 192.168.0.1
    port: '5432'
    username: user
    password: password
    coreDatabase: registry
    notaryServerDatabase: notary_server
    notarySignerDatabase: notary_signer
    sslmode: disable
  maxIdleConns: 100
  maxOpenConns: 900
  podAnnotations: {}
redis:
  type: internal
  internal:
    serviceAccountName: ''
    automountServiceAccountToken: false
    image:
      repository: goharbor/redis-photon
      tag: v2.5.3
    nodeSelector: {}
    tolerations: []
    affinity: {}
    priorityClassName: null
  external:
    addr: '192.168.0.2:6379'
    sentinelMasterSet: ''
    coreDatabaseIndex: '0'
    jobserviceDatabaseIndex: '1'
    registryDatabaseIndex: '2'
    chartmuseumDatabaseIndex: '3'
    trivyAdapterIndex: '5'
    password: ''
  podAnnotations: {}
exporter:
  replicas: 1
  revisionHistoryLimit: 10
  podAnnotations: {}
  serviceAccountName: ''
  automountServiceAccountToken: false
  image:
    repository: goharbor/harbor-exporter
    tag: v2.5.3
  nodeSelector: {}
  tolerations: []
  affinity: {}
  cacheDuration: 23
  cacheCleanInterval: 14400
  priorityClassName: null
metrics:
  enabled: false
  core:
    path: /metrics
    port: 8001
  registry:
    path: /metrics
    port: 8001
  jobservice:
    path: /metrics
    port: 8001
  exporter:
    path: /metrics
    port: 8001
  serviceMonitor:
    enabled: false
    additionalLabels: {}
    interval: ''
    metricRelabelings: []
    relabelings: []
trace:
  enabled: false
  provider: jaeger
  sample_rate: 1
  jaeger:
    endpoint: 'http://hostname:14268/api/traces'
  otel:
    endpoint: 'hostname:4318'
    url_path: /v1/traces
    compression: false
    insecure: true
    timeout: 10s
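
Once Harbor is deployed with these values, a quick way to confirm the registry answers on the HTTPS node port is to call its ping endpoint. This is a minimal sketch: the variable HARBOR_ADDR and the function harbor_ping are names I introduced, the address comes from externalURL in the configuration above, and -k is used because certSource: auto produces a self-signed certificate.

```shell
#!/bin/sh
# Address of the Harbor HTTPS node port, taken from externalURL in the values above.
# HARBOR_ADDR is a hypothetical variable name; override it for your own node.
HARBOR_ADDR="${HARBOR_ADDR:-10.180.193.11:30003}"

# harbor_ping: call Harbor's ping API endpoint.
# -k skips certificate verification because the certificate is self-signed.
harbor_ping() {
    curl -sk "https://${HARBOR_ADDR}/api/v2.0/ping"
}

# Usage from a machine that can reach the node:
#   harbor_ping
#   docker login "$HARBOR_ADDR" -u admin
```

For docker login to succeed against a self-signed certificate, the Docker client has to trust the Harbor CA or list the address under insecure-registries.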

Installing a GPU-enabled K8s node on an ESXi virtual machine
https://www.liahnu.top/2024/03/15/使用ESXi虚拟机安装K8s带GPU的节点/
Author: liahnu
Published: March 15, 2024
License