Want to manage physical servers in a cloud environment the way you manage virtual machines? Want to bring Infrastructure as Code (IaC) and CI/CD pipelines to physical nodes? Through a set of practical, simple, and reproducible examples, let's demystify private cloud infrastructure and bare metal lifecycle management, starting from a minimal "standalone" deployment of OpenStack Ironic.
In our first article, we began exploring bare metal lifecycle management by introducing OpenStack Ironic in its "standalone" configuration.
We laid the groundwork for the lab environment, detailing the minimal setup of the bare metal service (bms) server, including installing Podman and configuring OpenStack Kolla Ansible. We covered the steps needed to prepare the environment, install the Ironic Python Agent (IPA), and perform the initial OpenStack deployment.
03
Bare Metal Management
- Configuring the BMCs of the servers to be managed;
- Node onboarding;
- Node inspection;
- Cleaning (optional);
- Deployment;
- Rescuing unreachable nodes.
3.1 BMC Configuration
For Ironic to manage the physical nodes, each BMC (Baseboard Management Controller) must be reachable from the bms server over the OOB/IB network 172.19.74.0/24 (for this lab, a single LAN 192.168.0.0/24 carrying all the services is also acceptable), and the corresponding access credentials must be available.
PXE boot must be enabled in the BIOS for the Ethernet card dedicated to operating system installation and in-band management.
On the servers, the BMC MAC address and access credentials are listed on a dedicated label.
Let's collect the information about the servers to be managed in one convenient place:
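As a minimal sketch, a plain text file will do; the addresses, credentials, and MAC address below are the ones used in the examples that follow (server02's PXE MAC is left elided):
kolla@bms:~$ cat ~/servers.txt
# node      bmc_proto  bmc_address     credentials        pxe_nic_mac
server01    redfish    172.19.74.201   ironic/baremetal   9c:b6:54:b2:b0:ca
server02    ipmi       172.19.74.202   ironic/baremetal   ...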
To check whether the BMCs are reachable via the IPMI and/or Redfish protocols, let's install the corresponding command-line tools:
kolla@bms:~$ sudo apt install ipmitool redfishtool
...
kolla@bms:~$ redfishtool -r 172.19.74.201 -u ironic -p baremetal Systems
{
"@odata.context": "/redfish/v1/$metadata#Systems",
"@odata.id": "/redfish/v1/Systems/",
"@odata.type": "#ComputerSystemCollection.ComputerSystemCollection",
"Description": "Computer Systems view",
"Members": [
{
"@odata.id": "/redfish/v1/Systems/1/"
}
],
"Members@odata.count": 1,
"Name": "Computer Systems"
}
kolla@bms:~$ ipmitool -C 3 -I lanplus -H 172.19.74.202 \
-U ironic -P baremetal chassis status
System Power : off
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : previous
Last Power Event :
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
Front Panel Control : none
Note: on some BMCs, the IPMI and Redfish protocols may need to be explicitly enabled in the BIOS settings.
The enrollment phase takes place when a new server comes online and is taken under management by Ironic with the openstack baremetal node create --driver ... command.
The information required to proceed is:
- the BMC type (e.g., ipmi, redfish, ilo, idrac, etc.);
- the BMC IP address and the corresponding credentials (out-of-band);
- the MAC address of the NIC that will be used for inspection, installation, and possibly rescue (in-band).
kolla@bms:~$ openstack baremetal node create \
--name server01 --driver redfish \
--driver-info redfish_address=https://172.19.74.201 \
--driver-info redfish_verify_ca=False \
--driver-info redfish_username=ironic \
--driver-info redfish_password=baremetal \
-f value -c uuid
5113ab44-faf2-4f76-bb41-ea8b14b7aa92
kolla@bms:~$ openstack baremetal node create \
--name server02 --driver ipmi \
--driver-info ipmi_address=172.19.74.202 \
--driver-info ipmi_username=ironic \
--driver-info ipmi_password=baremetal \
-f value -c uuid
5294ec18-0dae-449e-913e-24a0638af8e5
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | None | enroll | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | None | enroll | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
To manage these servers, they must be declared manageable with the openstack baremetal node manage ... command.
Ironic will query the BMC using the specified hardware driver (redfish in this case), transition the state from enroll to verifying and finally to manageable, and populate some server properties, which can be queried with openstack baremetal node show -f value -c properties:
kolla@bms:~$ openstack baremetal node manage server01
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | power off | verifying | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | None | enroll | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | power off | manageable | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | None | enroll | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
kolla@bms:~$ openstack baremetal node show server01 -f value -c properties
{'vendor': 'HPE'}
Note: unfortunately, in this case the "canonical" name (e.g., server01) cannot be used; you must explicitly use the corresponding UUID.
3.2 Inspection
The "inspection" process in OpenStack Ironic automatically surveys the hardware characteristics of a bare metal server.
It lets us gather a variety of useful information before the actual installation, such as disks, CPU type, RAM, and NICs. This data can then be used to guide and customize the installation itself.
kolla@bms:~$ time openstack baremetal node inspect server01 --wait
Waiting for provision state manageable on node(s) server01
real 6m26.435s
user 0m1.213s
sys 0m0.073s
With the node inventory generated in this way, you can examine the node using the openstack baremetal node inventory save command, which returns a JSON collection. For easier interpretation, the jq or yq commands are recommended:
kolla@bms:~$ openstack baremetal node inventory save server01 | \
yq -r 'keys[]'
inventory
plugin_data
kolla@bms:~$ openstack baremetal node inventory save server01 | \
yq -r '.inventory|keys[]'
bmc_address
bmc_mac
bmc_v6address
boot
cpu
disks
hostname
interfaces
memory
system_vendor
kolla@bms:~$ openstack baremetal node inventory save server01 | \
yq -y '.inventory.system_vendor'
product_name: ProLiant ML350 Gen9 (776971-035)
serial_number: CZ24421875
manufacturer: HP
firmware:
vendor: HP
version: P92
build_date: 08/29/2024
kolla@bms:~$ openstack baremetal node inventory save server01 | \
yq -y '.inventory.boot'
current_boot_mode: uefi
pxe_interface: 9c:b6:54:b2:b0:ca
To list the NICs and their characteristics, we can filter the output with yq -y '.inventory.interfaces', or even keep only the NICs with an active link (has_carrier==true):
kolla@bms:~$ openstack baremetal node inventory save server01 | \
yq -y '.inventory.interfaces|map_values(select(.has_carrier==true))'
- name: eno1
mac_address: 9c:b6:54:b2:b0:ca
ipv4_address: 172.19.74.193
ipv6_address: fe80::9eb6:54ff:feb2:b0ca%eno1
has_carrier: true
lldp: null
vendor: '0x14e4'
product: '0x1657'
client_id: null
biosdevname: null
speed_mbps: 1000
pci_address: '0000:02:00.0'
driver: tg3
- name: eno2
mac_address: 9c:b6:54:b2:b0:cb
ipv4_address: 192.168.0.195
ipv6_address: fe80::9eb6:54ff:feb2:b0cb%eno2
has_carrier: true
lldp: null
vendor: '0x14e4'
product: '0x1657'
client_id: null
biosdevname: null
speed_mbps: 1000
pci_address: '0000:02:00.1'
driver: tg3
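The same inventory also describes the disks; as a sketch, filtering on fields the IPA inventory exposes (name, size, serial) helps shortlist candidates:
kolla@bms:~$ openstack baremetal node inventory save server01 | \
yq -r '.inventory.disks[] | [.name, .size, .serial] | @tsv'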
With this information, you can choose the best candidate disk for installing the operating system. To target disk /dev/sdc, the roughly 30 GB SanDisk Ultra Fit with serial number 4C530001220528100484, you can set the root_device property:
kolla@bms:~$ openstack baremetal node set server01 \
--property root_device='{"serial": "4C530001220528100484"}'
kolla@bms:~$ openstack baremetal node show server01 -f json | \
yq -y '.properties'
vendor: HPE
cpu_arch: x86_64
root_device:
serial: 4C530001220528100484
3.3 Cleaning (Optional)
If the hardware is not brand new, the disks may contain sensitive data or, worse, data that conflicts with the new deployment (for example, an operating system previously installed on a disk other than the target one).
To guard against these situations, it is useful to rely on the node cleaning feature that Ironic provides through the agent, via the openstack baremetal node clean command:
kolla@bms:~$ openstack baremetal node clean server01 \
--clean-steps '[{"interface": "deploy", "step": "erase_devices_metadata"}]'
The mandatory --clean-steps option specifies the operations to perform through one of the following interfaces: power, management, deploy, firmware, bios, and raid.
For the deploy interface, handled by the Ironic Python Agent (IPA), the available steps are:
- erase_devices: securely erases the data on the disks;
- erase_devices_metadata: erases only the disk metadata;
- erase_devices_express: hardware-assisted data erasure, currently supported only on NVMe.
Note: for all the other "cleaning" options that we are skipping for now (e.g., firmware upgrades, BIOS reset, RAID setup, etc.), refer to the node cleaning section of the Ironic documentation.
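While a clean runs, the node passes through the cleaning and clean wait states and, on success, returns to manageable; a simple way to follow progress is to poll the provisioning state:
kolla@bms:~$ openstack baremetal node show server01 -f value -c provision_state
cleaning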
3.4 Deployment
Now that we have added, analyzed, and configured the systems to be managed (openstack baremetal node [create|manage|inspect|clean]), we have a list of nodes in the manageable state:
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | power off | manageable | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | power off | manageable | False |
| f0d3f50a-d543-4e3d-9b9a-ba719373ab58 | server03 | None | power off | manageable | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
To make a physical node eligible for deployment, you need to declare it available.
Any node in the manageable state can become available via the openstack baremetal node provide command; the reverse operation is done with openstack baremetal node manage:
kolla@bms:~$ openstack baremetal node provide server01
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | power off | available | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | power off | manageable | False |
| f0d3f50a-d543-4e3d-9b9a-ba719373ab58 | server03 | None | power off | manageable | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
kolla@bms:~$ openstack baremetal node validate server01
+------------+--------+------------------------------------------------------------------------------------------------+
| Interface | Result | Reason |
+------------+--------+------------------------------------------------------------------------------------------------+
| bios | False | Driver redfish does not support bios (disabled or not implemented). |
| boot | True | |
| console | False | Driver redfish does not support console (disabled or not implemented). |
| deploy | False | Node 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 failed to validate deploy image info. Some |
| | | parameters were missing. Missing are: ['instance_info.image_source'] |
| firmware | False | Driver redfish does not support firmware (disabled or not implemented). |
| inspect | True | |
| management | True | |
| network | True | |
| power | True | |
| raid | True | |
| rescue | False | Node 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 is missing 'instance_info/rescue_password'. It is |
| | | required for rescuing node. |
| storage | True | |
+------------+--------+------------------------------------------------------------------------------------------------+
We can run a final check on the node to be installed with the openstack baremetal node validate command (output shown above). The results show that some areas are not managed (value False):
- bios: in this case, the feature has been explicitly disabled with enabled_bios_interfaces = no-bios;
- console: this feature is also disabled (enabled_console_interfaces = no-console);
- deploy: the image_source parameter, which specifies where to fetch the operating system image to be installed, is missing;
- firmware: the feature is disabled (enabled_firmware_interfaces = no-firmware);
- rescue: the rescue_password parameter is missing.
In the absence of a surrounding cloud environment, Ironic needs to be told where to fetch the image to install, either directly from a file or via a URL. The image_source parameter in the --instance-info section of the openstack baremetal node set command serves exactly this purpose.
Although a URL on the Internet could be used, it is better to use a local repository to optimize bandwidth. In this case, we will use the web server in the ironic_http container together with the persistent ironic volume mounted into it:
kolla@bms:~$ sudo podman volume inspect ironic
[
{
"Name": "ironic",
"Driver": "local",
"Mountpoint": "/var/lib/containers/storage/volumes/ironic/_data",
"CreatedAt": "2025-04-02T18:28:15.831223385+02:00",
"Labels": {},
"Scope": "local",
"Options": {},
"MountCount": 0,
"NeedsCopyUp": true
}
]
kolla@bms:~$ sudo podman container inspect ironic_http | \
yq '.[]|.Mounts[]|select(.Name=="ironic")'
{
"Type": "volume",
"Name": "ironic",
"Source": "/var/lib/containers/storage/volumes/ironic/_data",
"Destination": "/var/lib/ironic",
"Driver": "local",
"Mode": "",
"Options": [
"nosuid",
"nodev",
"rbind"
],
"RW": true,
"Propagation": "rprivate"
}
kolla@bms:~$ sudo cat /etc/kolla/ironic-http/httpd.conf
Listen 172.19.74.1:8089
TraceEnable off
<VirtualHost 172.19.74.1:8089>
    LogLevel warn
    ErrorLog "/var/log/kolla/ironic/ironic-http-error.log"
    LogFormat "%h %l %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\"" logformat
    CustomLog "/var/log/kolla/ironic/ironic-http-access.log" logformat
    DocumentRoot "/var/lib/ironic/httpboot"
    <Directory "/var/lib/ironic/httpboot">
        Options FollowSymLinks
        AllowOverride None
        Require all granted
    </Directory>
</VirtualHost>
Since the volume /var/lib/containers/storage/volumes/ironic/_data is mounted at /var/lib/ironic inside the ironic_http container, and the /var/lib/ironic/httpboot directory (DocumentRoot) is exposed at http://172.19.74.1:8089 (Listen), we can copy operating system images in raw or qcow2 format into /var/lib/containers/storage/volumes/ironic/_data/httpboot/ and reference them as http://172.19.74.1:8089/<filename>:
kolla@bms:~$ sudo wget \
-P /var/lib/containers/storage/volumes/ironic/_data/httpboot/ \
https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-nocloud-amd64.qcow2
...
debian-12-nocloud-amd64.qcow2 100%[========================================>] 398.39M
kolla@bms:~$ sudo wget \
-P /var/lib/containers/storage/volumes/ironic/_data/httpboot/ \
https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2
...
debian-12-generic-amd64.qcow2 100%[========================================>] 422.45M
kolla@bms:~$ sudo podman exec --workdir=/var/lib/ironic/httpboot \
ironic_http sh -c 'sha256sum *.qcow2 >CHECKSUM'
kolla@bms:~$ sudo podman exec ironic_http ls -la /var/lib/ironic/httpboot
total 1202312
drwxr-xr-x 3 ironic ironic 4096 May 28 14:34 .
drwxr-xr-x 8 ironic ironic 162 Apr 14 15:17 ..
-rw-r--r-- 1 ironic ironic 1004 Apr 2 18:28 boot.ipxe
-rw-r--r-- 1 root root 192 May 28 14:27 CHECKSUM
-rw-r--r-- 1 ironic ironic 4843992 Apr 11 16:18 cp042886.exe
-rw-r--r-- 1 root root 442970624 May 19 23:34 debian-12-generic-amd64.qcow2
-rw-r--r-- 1 root root 417746432 May 19 23:50 debian-12-nocloud-amd64.qcow2
-rw-r--r-- 1 ironic ironic 8388608 May 27 17:37 esp.img
-rw-r--r-- 1 root root 533 May 27 17:40 inspector.ipxe
-rw-r--r-- 1 root root 348999425 May 27 17:40 ironic-agent.initramfs
-rw-r--r-- 1 root root 8193984 May 27 17:40 ironic-agent.kernel
drwxr-xr-x 2 ironic ironic 6 May 28 14:34 pxelinux.cfg
kolla@bms:~$ curl -I http://172.19.74.1:8089/debian-12-nocloud-amd64.qcow2
HTTP/1.1 200 OK
Date: Wed, 28 May 2025 10:46:31 GMT
Server: Apache/2.4.62 (Debian)
Last-Modified: Mon, 19 May 2025 21:50:09 GMT
ETag: "18e64e00-635841d9c4640"
Accept-Ranges: bytes
Content-Length: 417746432
Note: creating the CHECKSUM file is necessary to allow Ironic to verify that the images have not been tampered with in transit from the source server to the bare metal node.
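As a sanity check, the checksums can also be verified locally inside the container; this is worth repeating whenever the images are replaced:
kolla@bms:~$ sudo podman exec --workdir=/var/lib/ironic/httpboot \
ironic_http sha256sum -c CHECKSUM
debian-12-generic-amd64.qcow2: OK
debian-12-nocloud-amd64.qcow2: OK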
We are ready for a first test deployment.
3.4.1 Test
As a first test, to verify that the process works, we can use the Debian Cloud image known as "nocloud", referenced as `debian-12-nocloud-amd64.qcow2`. Unfortunately, this image does not include cloud-init; it only allows passwordless root access directly from the console, since the SSH service is not enabled.
kolla@bms:~$ openstack baremetal node set server01 \
--instance-info image_source=http://172.19.74.1:8089/debian-12-nocloud-amd64.qcow2 \
--instance-info image_checksum=http://172.19.74.1:8089/CHECKSUM
kolla@bms:~$ openstack baremetal node deploy server01
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | power on | wait call-back | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | power off | manageable | False |
| f0d3f50a-d543-4e3d-9b9a-ba719373ab58 | server03 | None | power off | manageable | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | power on | deploying | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | power off | manageable | False |
| f0d3f50a-d543-4e3d-9b9a-ba719373ab58 | server03 | None | power off | manageable | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | power on | active | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | power off | manageable | False |
| f0d3f50a-d543-4e3d-9b9a-ba719373ab58 | server03 | None | power off | manageable | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
Once the process completes and the node's Provisioning State becomes active, we can verify the result directly from the server's console or through its BMC.
To reclaim this test instance:
kolla@bms:~$ openstack baremetal node undeploy server01
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | power on | deleting | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | power off | manageable | False |
| f0d3f50a-d543-4e3d-9b9a-ba719373ab58 | server03 | None | power off | manageable | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
kolla@bms:~$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 5113ab44-faf2-4f76-bb41-ea8b14b7aa92 | server01 | None | power off | available | False |
| 5294ec18-0dae-449e-913e-24a0638af8e5 | server02 | None | power off | manageable | False |
| f0d3f50a-d543-4e3d-9b9a-ba719373ab58 | server03 | None | power off | manageable | False |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
The node is powered off, and its Provisioning State transitions from active to deleting and then back to available.
3.4.2 cloud-init
Static images like the previous debian-12-nocloud offer no configuration flexibility. Unless an image has been purpose-built for your needs (partitioning, networking, users, software, etc.), its usefulness is often limited.
We can reach our goal by using an image that includes cloud-init:
"Cloud images are operating system templates and every instance starts out as an identical clone of every other instance. It is the user data that gives every cloud instance its personality, and cloud-init is the tool that applies user data to your instances automatically."
To provide personalized instructions to the bare metal node, we will use a portion of the node's own disk known as the Config Drive. During deployment, the necessary information is downloaded onto this Config Drive as structured files in a filesystem.
Suppose we want to install node server01 with the following requirements:
- an ironic user, pre-provisioned with a public SSH key and sudo rights to run root commands;
- software to be installed automatically: git and podman;
- the network interface on the LAN configured with the static IP address 192.168.0.11/24, a default gateway, and public DNS.
We can proceed as follows:
kolla@bms:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/kolla/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
...
kolla@bms:~$ RSA_PUB=$(cat ~/.ssh/id_rsa.pub)
kolla@bms:~$ NODE=server01
kolla@bms:~$ mkdir -p ~/ConfigDrive/$NODE/openstack/latest
kolla@bms:~$ cat >~/ConfigDrive/$NODE/openstack/latest/meta_data.json <<EOF
{
"uuid": "$(openstack baremetal node show $NODE -f value -c uuid)",
"hostname": "$NODE"
}
EOF
kolla@bms:~$ cat >~/ConfigDrive/$NODE/openstack/latest/user_data <<EOF
#cloud-config
package_update: true
packages:
- git
- podman
users:
- name: ironic
shell: /bin/bash
sudo: ["ALL=(ALL) NOPASSWD:ALL"]
ssh_authorized_keys:
- '$RSA_PUB'
EOF
kolla@bms:~$ cat >~/ConfigDrive/$NODE/openstack/latest/network_data.json <<EOF
{
"links": [
{
"id": "oob0",
"type": "phy",
"ethernet_mac_address": "9c:b6:54:b2:b0:ca"
},
{
"id": "lan0",
"type": "phy",
"ethernet_mac_address": "9c:b6:54:b2:b0:cb"
}
],
"networks": [
{
"id": "oob",
"type": "ipv4_dhcp",
"link": "oob0",
"network_id": "oob"
},
{
"id": "lan",
"type": "ipv4",
"link": "lan0",
"ip_address": "192.168.0.11/24",
"gateway": "192.168.0.254",
"network_id": "lan"
}
],
"services": [
{
"type": "dns",
"address": "8.8.8.8"
},
{
"type": "dns",
"address": "8.8.4.4"
}
]
}
EOF
Note: for more information on the structure of the network_data.json file, refer to its schema: https://opendev.org/openstack/nova/src/branch/master/doc/api_schemas/network_data.json
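Before deploying, it is also worth making sure the generated files are at least syntactically valid JSON; a quick sketch:
kolla@bms:~$ python3 -m json.tool \
~/ConfigDrive/$NODE/openstack/latest/network_data.json >/dev/null && echo OK
OK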
To run a new deployment that takes advantage of cloud-init, we change the image_source reference to debian-12-generic-amd64.qcow2 and specify the directory containing the files needed to generate the Config Drive:
kolla@bms:~$ openstack baremetal node set server01 \
--instance-info image_source=http://172.19.74.1:8089/debian-12-generic-amd64.qcow2 \
--instance-info image_checksum=http://172.19.74.1:8089/CHECKSUM
kolla@bms:~$ time openstack baremetal node deploy server01 \
--config-drive=$HOME/ConfigDrive/server01 --wait
Waiting for provision state active on node(s) server01
real 12m3.542s
user 0m0.863s
sys 0m0.085s
Note: if we have not previously removed the deployment with the openstack baremetal node undeploy command, it is still possible to reset the image_source variable with openstack baremetal node set --instance-info image_source=... and then request a rebuild: openstack baremetal node rebuild --config-drive= .
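As a sketch, assuming the Config Drive directory created above, switching the image and rebuilding would look like this:
kolla@bms:~$ openstack baremetal node set server01 \
--instance-info image_source=http://172.19.74.1:8089/debian-12-nocloud-amd64.qcow2
kolla@bms:~$ openstack baremetal node rebuild server01 \
--config-drive=$HOME/ConfigDrive/server01 --wait
Waiting for provision state active on node(s) server01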
3.5 Rescuing Unreachable Nodes (rescue)
Sometimes things go wrong… The IPA (Ironic Python Agent) loads, the operating system appears to have been written to disk correctly, and the Config Drive seems to be in place. Yet the node simply will not boot; or it boots, but the NICs are unconfigured, the administrator credentials do not seem to work, an error message flashes by on the server's monitor… In short, all sorts of things can go wrong.
To use the IPA as a rescue system, the node must be in the active state, and rescue can be invoked with the following commands:
kolla@bms:~$ openstack baremetal node show server01 -c provision_state
+-----------------+--------+
| Field | Value |
+-----------------+--------+
| provision_state | active |
+-----------------+--------+
kolla@bms:~$ time baremetal node rescue server01 \
--rescue-password= --wait
Waiting for provision state rescue on node(s) server01
real 5m42.284s
user 0m0.745s
sys 0m0.045s
kolla@bms:~$ openstack baremetal node show server01 -c provision_state
+-----------------+--------+
| Field | Value |
+-----------------+--------+
| provision_state | rescue |
+-----------------+--------+
From the logs of Ironic's DHCP server (a container running dnsmasq), we can identify the IP address that was assigned, based on the MAC address of the NIC used for deployment:
kolla@bms:~$ openstack baremetal port list -f value -c address \
--node $(openstack baremetal node show server01 -f value -c uuid)
9c:b6:54:b2:b0:ca
kolla@bms:~$ sudo grep -i 9c:b6:54:b2:b0:ca /var/log/kolla/ironic/dnsmasq.log | tail -1
May 31 11:13:58 dnsmasq-dhcp[2]: DHCPACK(enp2s0) 172.19.74.192 9c:b6:54:b2:b0:ca
Finally, connect as the rescue user with the previously defined password to analyze the disk contents, the system and cloud-init logs, and so on:
kolla@bms:~$ ssh rescue@172.19.74.192
...
[rescue@localhost ~]$ sudo -i
[root@localhost ~]# lsblk -o+fstype,label
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS FSTYPE LABEL
sda 8:0 0 3.3T 0 disk
sdb 8:16 1 28.6G 0 disk
|-sdb1 8:17 1 28.5G 0 part ext4
|-sdb2 8:18 1 64.4M 0 part iso9660 config-2
|-sdb14 8:30 1 3M 0 part
`-sdb15 8:31 1 124M 0 part vfat
sdc 8:32 0 838.3G 0 disk
sr0 11:0 1 1024M 0 rom
[root@localhost ~]# mount /dev/sdb1 /mnt
[root@localhost ~]# mount --bind /dev /mnt/dev
[root@localhost ~]# mount --bind /sys /mnt/sys
[root@localhost ~]# mount --bind /proc /mnt/proc
[root@localhost ~]# chroot /mnt
root@localhost:/# journalctl --list-boots
IDX BOOT ID FIRST ENTRY LAST ENTRY
0 34c9893dd3ab4a928207f7da41ec1226 Tue 2025-06-03 10:31:39 UTC Tue 2025-06-03 12:17:50 UTC
root@localhost:/# journalctl -b 34c9893dd3ab4a928207f7da41ec1226
...
root@localhost:/# less /var/log/cloud-init.log
...
To check how the Config Drive was written:
[root@localhost ~]# mount -r LABEL=config-2 /media
[root@localhost ~]# find /media
/media
/media/openstack
/media/openstack/latest
/media/openstack/latest/meta_data.json
/media/openstack/latest/network_data.json
/media/openstack/latest/user_data
[root@localhost ~]# cat /media/openstack/latest/meta_data.json
{
"uuid": "5113ab44-faf2-4f76-bb41-ea8b14b7aa92",
"hostname": "server01"
}
[root@localhost ~]# cat /media/openstack/latest/user_data
#cloud-config
package_update: true
packages:
- git
- podman
users:
- name: ironic
shell: /bin/bash
sudo: ["ALL=(ALL) NOPASSWD:ALL"]
ssh_authorized_keys:
- 'ssh-rsa ...'
[root@localhost ~]# umount /media
Note: to exit the rescue state, do not reboot the system from inside (e.g., reboot, shutdown -r now, etc.); that would only boot the operating system on disk without changing the state in Ironic's database. Instead, use the openstack baremetal node unrescue command from the management server.
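As a sketch, the unrescue operation returns the node to the active state (like the other provisioning commands, it accepts --wait):
kolla@bms:~$ openstack baremetal node unrescue server01 --wait
Waiting for provision state active on node(s) server01
kolla@bms:~$ openstack baremetal node show server01 -c provision_state
+-----------------+--------+
| Field | Value |
+-----------------+--------+
| provision_state | active |
+-----------------+--------+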
04
Automation and Infrastructure as Code
Infrastructure as Code (IaC) is an approach to IT infrastructure management in which servers, networks, databases, and other components are defined and configured through code rather than created and managed by hand. The infrastructure effectively becomes a set of declarative, versionable, and automatable files, making it possible to build complex environments in a repeatable, reliable, and scalable way.
For demonstration purposes, we will use two of the most popular tools: Terraform and Ansible.
4.1 Terraform/OpenTofu
In this chapter, the main goal is to reproduce, in the simplest and most direct way, the actions we have performed so far on the command line to install the Debian 12 cloud image, this time using Terraform/OpenTofu.
Developed by HashiCorp, Terraform is one of the most widely used tools for orchestrating cloud resources (AWS, Azure, GCP, OpenStack) and "traditional" environments (VMware, Proxmox, etc.). Its language makes it possible to describe the resources an infrastructure needs in a clear, repeatable way: servers, networks, DNS, firewalls, storage, and more.
OpenTofu, born as a fork of Terraform after the license change by HashiCorp (recently acquired by IBM), remains fully compatible. Governed by the open source community, its mission is to keep IaC open, accessible, and independent of commercial interests.
To install Terraform, follow the instructions at:
https://developer.hashicorp.com/terraform/install;
to install OpenTofu, run the command apt install -y tofu.
Let's start by searching their respective registries for an OpenStack Ironic provider:
- Terraform: https://registry.terraform.io/browse/providers
- OpenTofu: https://search.opentofu.org/
We find a generic provider, published by Appkins Org and derived from the OpenShift project's Terraform provider for Ironic: appkins-org/ironic.
Next, in a directory created specifically to host the Terraform/OpenTofu files, create a file named main.tf with the following content:
terraform {
required_providers {
ironic = {
source = "appkins-org/ironic"
version = "0.6.1"
}
}
}
provider "ironic" {
url = "http://172.19.74.1:6385/v1"
auth_strategy = "noauth"
microversion = "1.96"
timeout = 900
}
resource "ironic_node_v1" "server01" {
name = "server01"
inspect = true # run inspection
clean = false # do not clean the node
available = true # make the node 'available'
ports = [
{
"address" = "9c:b6:54:b2:b0:ca"
"pxe_enabled" = "true"
},
]
driver = "redfish"
driver_info = {
"redfish_address" = "https://172.19.74.201"
"redfish_verify_ca" = "False"
"redfish_username" = "ironic"
"redfish_password" = "baremetal"
}
}
resource "ironic_deployment" "server01" {
node_uuid = ironic_node_v1.server01.id
instance_info = {
image_source = "http://172.19.74.1:8089/debian-12-generic-amd64.qcow2"
image_checksum = "http://172.19.74.1:8089/CHECKSUM"
}
metadata = {
uuid = ironic_node_v1.server01.id
hostname = ironic_node_v1.server01.name
}
user_data = <<-EOT
#cloud-config
package_update: true
packages:
- git
- podman
users:
- name: ironic
shell: /bin/bash
sudo: ["ALL=(ALL) NOPASSWD:ALL"]
ssh_authorized_keys:
- 'ssh-rsa ...'
EOT
}
In this file, besides specifying the appkins-org/ironic provider and where to find it, two resources are defined. The first, of type ironic_node_v1, represents the enrollment phase combined with inspection. The second, of type ironic_deployment, handles the deployment phase and the creation of the associated Config Drive.
To download the provider and its dependencies, simply invoke the Terraform/OpenTofu init procedure:
kolla@bms:~/TF$ terraform init
- or -
kolla@bms:~/TF$ tofu init
Initializing the backend...
Initializing provider plugins...
- Finding appkins-org/ironic versions matching "0.6.1"...
- Installing appkins-org/ironic v0.6.1...
- Installed appkins-org/ironic v0.6.1...
...
Terraform/OpenTofu has been successfully initialized!
To perform the actual deployment, use the apply procedure instead:
kolla@bms:~/TF$ terraform apply
- or -
kolla@bms:~/TF$ tofu apply
...
Plan: 2 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
...
ironic_deployment.server01: Creation complete after 10m32s [id=12016c80-be15-48ed-9b55-af00911c66b5]
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
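The counterpart of apply is the destroy procedure, which with this provider should undeploy the node and remove it from Ironic, mirroring openstack baremetal node [undeploy|delete]; a sketch:
kolla@bms:~/TF$ terraform destroy
- or -
kolla@bms:~/TF$ tofu destroy
...
Destroy complete! Resources: 2 destroyed.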
4.2 Ansible
In this chapter, the goal is a more complex deployment that covers not only operating system configuration but also the installation of a container orchestration platform, namely an all-in-one (single-node) Kubernetes setup.
Ansible is an open source automation platform known for its simplicity and power. Unlike other automation tools, Ansible is agentless, meaning it requires no additional software to be installed on the nodes it manages.
That said, other automation systems may be just as effective, or even better suited, for your environment, such as Chef, Juju, Puppet, or SaltStack, to name a few.
4.2.1 Ansible Playbook
The structure of the playbook will look like this:
kolla@bms:~$ tree Ansible
Ansible
├── config.yml
├── deploy.yml
├── group_vars
│ └── all.yml
└── roles
├── ironic
│ └── tasks
│ └── main.yml
└── k8s-aio
├── defaults
│ └── main.yml
└── tasks
├── configure.yml
├── install.yml
├── main.yml
└── prepare.yml
8 directories, 9 files
In the config.yml file, we can create a data structure representing the list of nodes to configure (nodes: []), staying as close as possible to the form Ironic expects (bmc, root_device, instance_info, network_data, etc.) while adding some extra information such as ansible_host and ansible_roles:
# config.yml
---
user_data: |
#cloud-config
users:
- name: {{ default_user }}
lock_passwd: false
sudo: ['ALL=(ALL) NOPASSWD:ALL']
ssh_authorized_keys:
- '{{ public_key }}'
mounts:
- ['swap', null]
nodes:
- name: server01
 ...
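Once config.yml describes every node, running the automation is the usual Ansible workflow. As a sketch, assuming deploy.yml loads config.yml and applies the ironic role against the standalone Ironic API (the invocation details here are an assumption, not taken from the original playbooks):
kolla@bms:~$ cd ~/Ansible
kolla@bms:~/Ansible$ ansible-playbook deploy.yml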