Docker Source Code Analysis (4): How Docker Daemon Initializes Container Networking

Prerequisites
Docker Daemon source version: v26.1.4
Docker Daemon project: https://github.com/moby/moby
Docker Client command: docker run --name nginx -d -p 80:80 -p 443:443 nginx:latest
Third-party dependencies:

- vishvananda/netlink
  - Purpose: a Go library for manipulating network devices
  - Project: https://github.com/vishvananda/netlink
- etcd-io/bbolt
  - Purpose: a KV database used to manage container-related configuration
  - Project: https://github.com/etcd-io/bbolt
The key directories are structured as follows:

```
moby/
├── api            # REST API routes and handlers
├── cmd/dockerd/   # dockerd entry point
├── container      # container data structures
├── daemon         # daemon core logic (containers, images, networking)
├── distribution   # image pull and push logic
├── layer          # image layer management
├── libcontainerd  # gRPC client for containerd
├── libnetwork     # core networking logic
├── runconfig      # container run configuration parsing
└── volume         # volume management
```
Network Driver Registration
The Network Driver is the engine that actually implements container network virtualization. It defines how a container is attached to a network, how containers communicate with each other, and how they reach the outside world. When Docker Daemon initializes a container's network, the concrete work — creating and linking virtual interfaces, assigning IP and MAC addresses, generating default routing rules, and storing, querying, and deleting interface state — is ultimately carried out by the Driver of the corresponding network mode. Let's walk through how the Network Driver for the default Bridge mode is registered. Registration starts when the Daemon service itself is initialized; the entry point is:
daemon/daemon.go
```go
// line 803
func NewDaemon(ctx context.Context, config *config.Config, pluginStore *plugin.Store, authzMiddleware *authorization.Middleware) (daemon *Daemon, err error) {
	...
	// line 1220
	if err := d.restore(cfgStore); err != nil {
		return nil, err
	}
	...
}
```
The d.restore() call here is very important: its job is to load every container already stored on the local machine. Containers are stored by default under /var/lib/docker/containers, one directory per container ID. Each directory contains two important configuration files, config.v2.json and hostconfig.json, both generated during container creation; the details are covered in the previous article, Docker Source Code Analysis (3): How Docker Daemon Handles Requests and Executes Tasks.
daemon/daemon.go
```go
// line 251
func (daemon *Daemon) restore(cfg *configStore) error {
	...
	// line 257: read the container storage directory
	dir, err := os.ReadDir(daemon.repository)
	...
	for _, v := range dir {
		group.Add(1)
		go func(id string) {
			...
			// load each container by its ID
			c, err := daemon.load(id)
			...
		}(v.Name())
	}
	...
	// line 574: initialize the network controller
	if err = daemon.initNetworkController(&cfg.Config, activeSandboxes); ...
}
```
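As a concrete illustration of the on-disk layout that restore() walks, here is a minimal, hypothetical sketch (not daemon code) that lists the container directories under /var/lib/docker/containers and decodes each config.v2.json into a generic map, so no exact schema is assumed:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	repo := "/var/lib/docker/containers" // default container store
	entries, err := os.ReadDir(repo)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		if !e.IsDir() {
			continue // each container lives in a directory named by its ID
		}
		raw, err := os.ReadFile(filepath.Join(repo, e.Name(), "config.v2.json"))
		if err != nil {
			continue
		}
		// decode into a generic map to avoid assuming the exact schema
		var cfg map[string]interface{}
		if err := json.Unmarshal(raw, &cfg); err != nil {
			continue
		}
		fmt.Printf("container %.12s name=%v\n", e.Name(), cfg["Name"])
	}
}
```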
Docker allows containers to use the Container network mode, in which a container's network configuration depends on another running container. As the restore() code shows, the network controller is initialized only after all containers have been loaded; this ordering avoids creating redundant Sandboxes (a key concept in container networking, covered later). The network controller is initialized as follows:
daemon/daemon_unix.go
```go
// line 835
func (daemon *Daemon) initNetworkController(cfg *config.Config, activeSandboxes map[string]interface{}) error {
	...
	// line 841: initialize the Network Controller
	daemon.netController, err = libnetwork.New(netOptions...)
	...
}
```
The function to focus on is libnetwork.New(), and inside it two calls matter: c.initStores() and registerNetworkDrivers().
libnetwork/controller.go
```go
// line 107
func New(cfgOptions ...config.Option) (*Controller, error) {
	...
	// line 119: initialize the KV store for network state
	if err := c.initStores(); err != nil {
		return nil, err
	}
	...
	// line 131: register the Network Drivers for each network mode
	if err := registerNetworkDrivers(&c.drvRegistry, c.makeDriverConfig); err != nil {
		return nil, err
	}
	...
}
```
c.initStores() sets up a local KV database for Docker Daemon dedicated to storing network configuration. The objects stored there are Sandboxes, which correspond one-to-one to containers; this correspondence is what binds a container to its network state. The definition and storage of Sandboxes are covered later. For now, let's continue with the Network Driver registration logic, which lives in registerNetworkDrivers().
libnetwork/drivers_linux.go
```go
// line 15
func registerNetworkDrivers(r driverapi.Registerer, driverConfig func(string) map[string]interface{}) error {
	...
	for _, nr := range []struct {
		ntype    string
		register func(driverapi.Registerer, map[string]interface{}) error
	}{
		{ntype: bridge.NetworkType, register: bridge.Register},
		{ntype: host.NetworkType, register: noConfig(host.Register)},
		{ntype: ipvlan.NetworkType, register: ipvlan.Register},
		{ntype: macvlan.NetworkType, register: macvlan.Register},
		{ntype: null.NetworkType, register: noConfig(null.Register)},
		{ntype: overlay.NetworkType, register: overlay.Register},
	} {
		if err := nr.register(r, driverConfig(nr.ntype)); err != nil {
			return fmt.Errorf("failed to register %q driver: %w", nr.ntype, err)
		}
	}
	return nil
}
```
The registration function itself is also worth a look:
libnetwork/drivers/bridge/bridge_linux.go
```go
// line 165
func Register(r driverapi.Registerer, config map[string]interface{}) error {
	d := newDriver()
	if err := d.configure(config); err != nil {
		return err
	}
	return r.RegisterDriver(NetworkType, d, driverapi.Capability{
		DataScope:         scope.Local,
		ConnectivityScope: scope.Local,
	})
}
```
Also note d.configure(), which initializes the various parameters of the Bridge Network Driver:
libnetwork/drivers/bridge/bridge_linux.go
```go
// line 395
func (d *driver) configure(option map[string]interface{}) error {
	...
	// line 432: IPv4 iptables support is enabled by default; it can be
	// disabled via the Docker Daemon configuration
	if config.EnableIPTables {
		// remove stale chains left over from a previous run
		removeIPChains(iptables.IPv4)
		// create the nat, filter, and DOCKER-ISOLATION-STAGE chains;
		// Docker-related rules will be written into these chains later
		natChain, filterChain, isolationChain1, isolationChain2, err = setupIPChains(config, iptables.IPv4)
		if err != nil {
			return err
		}
		...
	}
	...
	// line 474
	d.Lock()
	d.natChain = natChain
	d.filterChain = filterChain
	d.isolationChain1 = isolationChain1
	d.isolationChain2 = isolationChain2
	d.natChainV6 = natChainV6
	d.filterChainV6 = filterChainV6
	d.isolationChain1V6 = isolationChain1V6
	d.isolationChain2V6 = isolationChain2V6
	d.config = config
	d.Unlock()
	return d.initStore(option)
}
```
The main thing to notice above is the creation of Docker's default iptables chains. iptables support is enabled by default in Docker Daemon, and setupIPChains() is called here to create the chains of each type:
libnetwork/drivers/bridge/setup_ip_tables_linux.go
```go
// line 35
func setupIPChains(config configuration, version iptables.IPVersion) (natChain *iptables.ChainInfo, filterChain *iptables.ChainInfo, isolationChain1 *iptables.ChainInfo, isolationChain2 *iptables.ChainInfo, retErr error) {
	...
	// line 45: create the DOCKER chain in the nat table
	natChain, err := iptable.NewChain(DockerChain, iptables.Nat, hairpinMode)
	...
	// line 57: create the DOCKER chain in the filter table
	filterChain, err = iptable.NewChain(DockerChain, iptables.Filter, false)
	...
	// line 69: create the isolationChain1 chain
	isolationChain1, err = iptable.NewChain(IsolationChain1, iptables.Filter, false)
	...
	// line 81: create the isolationChain2 chain
	isolationChain2, err = iptable.NewChain(IsolationChain2, iptables.Filter, false)
}
```
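For intuition, the effect of these NewChain() calls is roughly equivalent to creating custom chains with the iptables CLI. Below is a hedged sketch (shelling out to iptables instead of using libnetwork's wrapper; requires root, and the existence probe via `-L` is a simplification):

```go
package main

import (
	"fmt"
	"os/exec"
)

// newChain creates chain `name` in `table` if it does not already exist,
// mirroring what setupIPChains does for DOCKER and DOCKER-ISOLATION-STAGE-1/2.
func newChain(table, name string) error {
	// `iptables -t <table> -L <name>` fails if the chain is absent
	if err := exec.Command("iptables", "-t", table, "-L", name).Run(); err == nil {
		return nil // chain already present
	}
	// `iptables -t <table> -N <name>` creates the chain
	if out, err := exec.Command("iptables", "-t", table, "-N", name).CombinedOutput(); err != nil {
		return fmt.Errorf("creating chain %s/%s: %v: %s", table, name, err, out)
	}
	return nil
}

func main() {
	for _, c := range []struct{ table, name string }{
		{"nat", "DOCKER"},
		{"filter", "DOCKER"},
		{"filter", "DOCKER-ISOLATION-STAGE-1"},
		{"filter", "DOCKER-ISOLATION-STAGE-2"},
	} {
		if err := newChain(c.table, c.name); err != nil {
			panic(err)
		}
	}
}
```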
Docker Network Initialization
While registering the Network Drivers, Docker also initializes its own networking, including creating the default bridge docker0 and writing default rules. The entry point is likewise in the Network Controller initialization:
daemon/daemon_unix.go
```go
// line 835
func (daemon *Daemon) initNetworkController(cfg *config.Config, activeSandboxes map[string]interface{}) error {
	...
	// line 846: configure Docker networking
	if len(activeSandboxes) > 0 {
		log.G(context.TODO()).Info("there are running containers, updated network configuration will not take affect")
	} else if err := configureNetworking(daemon.netController, cfg); err != nil {
		return err
	}
	...
}
```
As the entry point above shows, if there are containers still running on the host, changes to the Docker network configuration do not take effect; this protects the connectivity of the running containers. The configuration logic is:
daemon/daemon_unix.go
```go
// line 857
func configureNetworking(controller *libnetwork.Controller, conf *config.Config) error {
	...
	// line 873
	if n, err := controller.NetworkByName(network.NetworkBridge); err == nil {
		if err = n.Delete(); err != nil {
			return errors.Wrapf(err, `could not delete the default %q network`, network.NetworkBridge)
		}
		if len(conf.NetworkConfig.DefaultAddressPools.Value()) > 0 && !conf.LiveRestoreEnabled {
			removeDefaultBridgeInterface()
		}
	}

	if !conf.DisableBridge {
		// Initialize default driver "bridge"
		if err := initBridgeDriver(controller, conf.BridgeConfig); err != nil {
			return err
		}
	} else {
		removeDefaultBridgeInterface()
	}
	return nil
}
```
The code above shows that on its first startup, Docker Daemon checks whether the default Docker network already exists and deletes it if it does; only then is the Docker network initialized. This prevents stale leftover state from affecting container connectivity. The default network is initialized as follows:
daemon/daemon_unix.go
```go
// line 921
func initBridgeDriver(controller *libnetwork.Controller, cfg config.BridgeConfig) error {
	// line 922: the bridge name defaults to docker0 and can be changed
	// in the Docker Daemon configuration file
	bridgeName := bridge.DefaultBridgeName
	if cfg.Iface != "" {
		bridgeName = cfg.Iface
	}
	...
	// line 1053: create the network
	_, err = controller.NewNetwork("bridge", network.NetworkBridge, "",
		libnetwork.NetworkOptionEnableIPv6(cfg.EnableIPv6),
		libnetwork.NetworkOptionDriverOpts(netOption),
		libnetwork.NetworkOptionIpam("default", "", v4Conf, v6Conf, nil),
		libnetwork.NetworkOptionDeferIPv6Alloc(deferIPv6Alloc))
	if err != nil {
		return fmt.Errorf(`error creating default %q network: %v`, network.NetworkBridge, err)
	}
	return nil
}
```
Here you can see that when initializing the Docker Daemon network, the network mode is set to bridge, along with whether IPv6 is enabled, the IPAM configuration, and so on. The actual network creation logic is:
libnetwork/controller.go
```go
// line 464
func (c *Controller) NewNetwork(networkType, name string, id string, options ...NetworkOption) (_ *Network, retErr error) {
	...
	// line 617: create the Docker network
	if err := c.addNetwork(nw); err != nil {
	}
	...
}

// line 798
func (c *Controller) addNetwork(n *Network) error {
	// line 799: look up the Network Driver
	d, err := n.driver(true)
	...
	// line 805
	if err := d.CreateNetwork(n.id, n.generic, n, n.getIPData(4), n.getIPData(6)); err != nil {
		return err
	}
	...
}
```
Since the network mode is bridge, and the Bridge Network Driver was registered earlier, the d.CreateNetwork() call here dispatches to the Bridge implementation:
libnetwork/drivers/bridge/bridge_linux.go
```go
// line 634
func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo driverapi.NetworkInfo, ipV4Data, ipV6Data []driverapi.IPAMData) error {
	...
	// line 673
	if err = d.createNetwork(config); err != nil {
		return err
	}
	...
}
```
The key function is d.createNetwork(), which does the real work:
libnetwork/drivers/bridge/bridge_linux.go
```go
// line 694
func (d *driver) createNetwork(config *networkConfiguration) (err error) {
	...
	// line 752: create the docker0 virtual bridge
	bridgeAlreadyExists := bridgeIface.exists()
	if !bridgeAlreadyExists {
		// create the docker0 bridge device and set its MAC address
		bridgeSetup.queueStep(setupDevice)
		...
	}
	...
	// line 777: configure the Docker Daemon network step by step,
	// including iptables, firewall, and gateway settings
	for _, step := range []struct {
		Condition bool
		Fn        setupStep
	}{
		...
		// line 798: program the iptables rules
		{d.config.EnableIPTables, network.setupIP4Tables},
		...
	}
	...
	// line 827: bring up docker0
	bridgeSetup.queueStep(setupDeviceUp)
	return bridgeSetup.apply()
}
```
In the code above, setupDevice calls the vishvananda/netlink library to create the default bridge docker0 and set its MAC address. Then network.setupIP4Tables adds several default rules to Docker's iptables chains:
libnetwork/drivers/bridge/setup_ip_tables_linux.go
```go
// line 141
func (n *bridgeNetwork) setupIPTables(ipVersion iptables.IPVersion, maskedAddr *net.IPNet, config *networkConfiguration, i *bridgeInterface) error {
	...
	// line 154
	if config.Internal {
		...
	} else {
		// line 168: fetch the nat and filter chains
		natChain, filterChain, _, _, err := n.getDriverChains(ipVersion)
		...
		// line 173: program the nat chain rules in iptables
		err = iptable.ProgramChain(natChain, config.BridgeName, hairpinMode, true)
		...
		// line 178: program the filter chain rules in iptables
		err = iptable.ProgramChain(filterChain, config.BridgeName, hairpinMode, true)
		...
	}
}
```
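To make the netlink usage concrete, here is a minimal sketch of what setupDevice and setupDeviceUp amount to, using vishvananda/netlink directly. It must run as root, and the bridge name and address are illustrative, not taken from daemon configuration:

```go
package main

import "github.com/vishvananda/netlink"

func main() {
	// create a bridge device, analogous to docker0
	br := &netlink.Bridge{LinkAttrs: netlink.LinkAttrs{Name: "demo0"}}
	if err := netlink.LinkAdd(br); err != nil {
		panic(err)
	}
	// assign the gateway address containers will use as their default route
	addr, err := netlink.ParseAddr("172.30.0.1/16")
	if err != nil {
		panic(err)
	}
	if err := netlink.AddrAdd(br, addr); err != nil {
		panic(err)
	}
	// bring the bridge up, analogous to setupDeviceUp
	if err := netlink.LinkSetUp(br); err != nil {
		panic(err)
	}
}
```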
At this point the Network Drivers have been registered with Docker Daemon, the default iptables chains have been initialized, and some default rules have been added. Later, when a container starts, Docker Daemon selects the Driver matching the container's network mode to perform the concrete network configuration.
Sandbox
In Docker Daemon, the Sandbox is a core abstraction. It is not a standalone program but a collection of resources and rules that provide a container with an isolated runtime environment, mapping one-to-one to a Container. Its goal is to keep the application inside the container safely isolated from the outside (the host and other containers). Each Sandbox is also persisted as an entry in the local KV database, where Docker Daemon can query and update it.
Store Initialization
Store initialization also happens within the network initialization flow; let's jump straight to its entry point:
libnetwork/controller.go
```go
// line 107
func New(cfgOptions ...config.Option) (*Controller, error) {
	...
	// line 119: initialize the KV store for network state
	if err := c.initStores(); err != nil {
		return nil, err
	}
	...
}
```
libnetwork/store.go
```go
// line 13
func (c *Controller) initStores() error {
	...
	c.store, err = datastore.New(c.cfg.Scope)
	...
}
```
libnetwork/datastore/datastore.go
```go
// line 80
func DefaultScope(dataDir string) ScopeCfg {
	var dbpath string
	if dataDir == "" {
		dbpath = defaultPrefix + "/local-kv.db"
	} else {
		dbpath = dataDir + "/network/files/local-kv.db"
	}
	return ScopeCfg{
		Client: ScopeClientCfg{
			Provider: string(store.BOLTDB),
			Address:  dbpath,
			Config: &store.Config{
				Bucket:            "libnetwork",
				ConnectionTimeout: time.Minute,
			},
		},
	}
}

// line 140
func New(cfg ScopeCfg) (*Store, error) {
	if cfg.Client.Provider == "" || cfg.Client.Address == "" {
		cfg = DefaultScope("")
	}
	return newClient(cfg.Client.Provider, cfg.Client.Address, cfg.Client.Config)
}
```
The KV database used here is etcd-io/bbolt. With that, the KV database is initialized; by default its data lives at /var/lib/docker/network/files/local-kv.db, and from here on the containers' network state is stored there in the form of Sandboxes.
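As a quick way to see what ends up in that store, here is a hedged sketch that opens local-kv.db read-only with etcd-io/bbolt and lists the keys in the "libnetwork" bucket (the bucket name comes from DefaultScope above; run it against a copy of the file, since a running daemon keeps it locked):

```go
package main

import (
	"fmt"
	"time"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("/var/lib/docker/network/files/local-kv.db", 0o600,
		&bolt.Options{ReadOnly: true, Timeout: time.Second})
	if err != nil {
		panic(err)
	}
	defer db.Close()

	err = db.View(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("libnetwork"))
		if b == nil {
			return fmt.Errorf("bucket libnetwork not found")
		}
		// keys are datastore paths (e.g. sandbox entries); values are JSON blobs
		return b.ForEach(func(k, v []byte) error {
			fmt.Printf("%s (%d bytes)\n", k, len(v))
			return nil
		})
	})
	if err != nil {
		panic(err)
	}
}
```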
Creation and Update
Each container has exactly one Sandbox. The Sandbox is created while the container's network is being allocated; if the container's network state changes later, the Sandbox is updated. The update logic is rather scattered, so only creation is described here:
daemon/container_operations.go
```go
// line 683
func (daemon *Daemon) connectToNetwork(cfg *config.Config, container *container.Container, idOrName string, endpointConfig *network.EndpointSettings, updateSettings bool) (retErr error) {
	...
	// line 757
	if sb == nil {
		// build the options for creating the Sandbox
		sbOptions, err := daemon.buildSandboxOptions(cfg, container)
		if err != nil {
			return err
		}
		// create the Sandbox
		sb, err = daemon.netController.NewSandbox(container.ID, sbOptions...)
		if err != nil {
			return err
		}
		...
	}
	...
}
```
libnetwork/controller.go
```go
// line 870
func (c *Controller) NewSandbox(containerID string, options ...SandboxOption) (_ *Sandbox, retErr error) {
	...
	// line 898: build the Sandbox
	if sb == nil {
		// TODO(thaJeztah): given that a "containerID" must be unique in the list of sandboxes, is there any reason we're not using containerID as sandbox ID on non-Windows?
		sandboxID := containerID
		if runtime.GOOS != "windows" {
			sandboxID = stringid.GenerateRandomID()
		}
		sb = &Sandbox{
			id:                 sandboxID,
			containerID:        containerID,
			endpoints:          []*Endpoint{},
			epPriority:         map[string]int{},
			populatedEndpoints: map[string]struct{}{},
			config:             containerConfig{},
			controller:         c,
			extDNS:             []extDNSEntry{},
		}
	}
	...
	// line 962: persist/update the Sandbox state
	if err := sb.storeUpdate(); err != nil {
		return nil, fmt.Errorf("failed to update the store state of sandbox: %v", err)
	}
	return sb, nil
}
```
This completes the creation of the Sandbox and its binding to the container; from now on, changes in the container's network state are written back to the Sandbox.
Container Network Initialization
Creating the veth pair
Before sending the start-container request to containerd, Docker Daemon first initializes the container's network. The entry point is initializeNetworking(); how it gets called is covered in the previous article, Docker Source Code Analysis (3): How Docker Daemon Handles Requests and Executes Tasks. The network initialization logic is:
daemon/container_operations.go
```go
// line 924
func (daemon *Daemon) initializeNetworking(cfg *config.Config, container *container.Container) error {
	// line 925: the container uses the Container network mode
	if container.HostConfig.NetworkMode.IsContainer() {
	}
	...
	// line 950: allocate the network
	if err := daemon.allocateNetwork(cfg, container); err != nil {
		return err
	}
	// build the container's hostname file
	return container.BuildHostnameFile()
}
```
daemon/container_operations.go
```go
// line 479
func (daemon *Daemon) allocateNetwork(cfg *config.Config, container *container.Container) (retErr error) {
	...
	// line 505
	defaultNetName := runconfig.DefaultDaemonNetworkMode().NetworkName()
	if nConf, ok := container.NetworkSettings.Networks[defaultNetName]; ok {
		cleanOperationalData(nConf)
		// allocate the network and connect the container to the default bridge
		if err := daemon.connectToNetwork(cfg, container, defaultNetName, nConf, updateSettings); err != nil {
			return err
		}
	}
	...
}
```
Within connectToNetwork(), n.CreateEndpoint() is called to create the container's virtual NIC.
daemon/container_operations.go
```go
// line 683
func (daemon *Daemon) connectToNetwork(cfg *config.Config, container *container.Container, idOrName string, endpointConfig *network.EndpointSettings, updateSettings bool) (retErr error) {
	...
	// line 732: build the options for creating the endpoint
	createOptions, err := buildCreateEndpointOptions(container, n, endpointConfig, sb, ipAddresses(cfg.DNS))
	...
	// line 738: create the endpoint (virtual NIC)
	ep, err := n.CreateEndpoint(endpointName, createOptions...)
	...
}
```
Two calls in connectToNetwork() deserve attention: buildCreateEndpointOptions() and n.CreateEndpoint(). buildCreateEndpointOptions() assembles the option parameters for the container's network, such as IPAM and DNS; n.CreateEndpoint() then creates the virtual NIC according to the specified options.
libnetwork/network.go
```go
// line 1124
func (n *Network) CreateEndpoint(name string, options ...EndpointOption) (*Endpoint, error) {
	...
	// line 1141: create the endpoint
	return n.createEndpoint(name, options...)
}
```
The concrete logic for creating and configuring the virtual NIC:
libnetwork/network.go
```go
// line 1144
func (n *Network) createEndpoint(name string, options ...EndpointOption) (*Endpoint, error) {
	...
	// line 1147: initialize the basic endpoint data
	ep := &Endpoint{name: name, generic: make(map[string]interface{}), iface: &EndpointInterface{}}
	// generate a random ID
	ep.id = stringid.GenerateRandomID()
	...
	// line 1160: apply the option functions
	ep.processOptions(options...)
	...
	// line 1189: assign an IPv4 address
	if err = ep.assignAddress(ipam, true, n.enableIPv6 && !n.postIPv6); err != nil {
		return nil, err
	}
	...
	// line 1198: create the virtual NIC
	if err = n.addEndpoint(ep); err != nil {
		return nil, err
	}
	...
	// line 1211: persist the endpoint state
	if err = n.getController().updateToStore(ep); err != nil {
		return nil, err
	}
}
```
The virtual NIC itself is created here:
libnetwork/network.go
```go
// line 1107
func (n *Network) addEndpoint(ep *Endpoint) error {
	// line 1108: look up the Network Driver
	d, err := n.driver(true)
	...
	// line 1113: have the Network Driver create the virtual NIC
	err = d.CreateEndpoint(n.id, ep.id, ep.Iface(), ep.generic)
	...
}
```
d.CreateEndpoint() dispatches to the Driver of the corresponding network mode, which was registered back when Docker Daemon initialized. Containers use the Bridge network mode by default, so the Bridge Network Driver is the one creating the virtual NIC:
libnetwork/drivers/bridge/bridge_linux.go
```go
// line 929: the Network Driver creates the virtual NIC
func (d *driver) CreateEndpoint(nid, eid string, ifInfo driverapi.InterfaceInfo, epOptions map[string]interface{}) error {
	...
	// line 988: generate a name for the host end of the veth pair
	hostIfName, err := netutils.GenerateIfaceName(d.nlh, vethPrefix, vethLen)
	if err != nil {
		return err
	}

	// line 994: generate a name for the container end of the veth pair
	containerIfName, err := netutils.GenerateIfaceName(d.nlh, vethPrefix, vethLen)
	if err != nil {
		return err
	}

	// line 1000: create the veth pair
	veth := &netlink.Veth{
		LinkAttrs: netlink.LinkAttrs{Name: hostIfName, TxQLen: 0},
		PeerName:  containerIfName,
	}
	if err = d.nlh.LinkAdd(veth); err != nil {
		return types.InternalErrorf("failed to add the host (%s) <=> sandbox (%s) pair interfaces: %v", hostIfName, containerIfName, err)
	}
	...
	// line 1051: attach the host end of the veth pair to docker0
	if err = addToBridge(d.nlh, hostIfName, config.BridgeName); err != nil {
		return fmt.Errorf("adding interface %s to bridge %s failed: %v", hostIfName, config.BridgeName, err)
	}
	...
	// line 1069: set the MAC address of the container end of the veth pair
	if endpoint.macAddress == nil {
		endpoint.macAddress = electMacAddress(epConfig, endpoint.addr.IP)
		if err = ifInfo.SetMacAddress(endpoint.macAddress); err != nil {
			return err
		}
	}

	// line 1077: bring up the host end of the veth pair
	if err = d.nlh.LinkSetUp(host); err != nil {
		return fmt.Errorf("could not set link up for host interface %s: %v", hostIfName, err)
	}
	...
	// line 1105: persist the endpoint state
	if err = d.storeUpdate(endpoint); err != nil {
		return fmt.Errorf("failed to save bridge endpoint %.7s to store: %v", endpoint.id, err)
	}
	return nil
}
```
Attaching to docker0
Attaching one end of the veth pair to docker0 is simple: on the host side, the netlink library is used to set that end's master to docker0:
libnetwork/drivers/bridge/bridge_linux.go
```go
// line 929: the Network Driver creates the virtual NIC
func (d *driver) CreateEndpoint(nid, eid string, ifInfo driverapi.InterfaceInfo, epOptions map[string]interface{}) error {
	...
	// line 1051: attach the host end of the veth pair to docker0
	if err = addToBridge(d.nlh, hostIfName, config.BridgeName); err != nil {
		return fmt.Errorf("adding interface %s to bridge %s failed: %v", hostIfName, config.BridgeName, err)
	}
	...
}

// line 908
func addToBridge(nlh *netlink.Handle, ifaceName, bridgeName string) error {
	lnk, err := nlh.LinkByName(ifaceName)
	if err != nil {
		return fmt.Errorf("could not find interface %s: %v", ifaceName, err)
	}
	if err := nlh.LinkSetMaster(lnk, &netlink.Bridge{LinkAttrs: netlink.LinkAttrs{Name: bridgeName}}); err != nil {
		log.G(context.TODO()).WithError(err).Errorf("Failed to add %s to bridge via netlink", ifaceName)
		return err
	}
	return nil
}
```
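Here is a standalone sketch of the same two steps — create the veth pair, then enslave the host end to a bridge — using vishvananda/netlink directly. The interface and bridge names are illustrative (reusing the demo0 bridge from the earlier sketch), and root is required:

```go
package main

import "github.com/vishvananda/netlink"

func main() {
	// create a veth pair: vethhost0 stays on the host,
	// vethcont0 will later move into the container's namespace
	veth := &netlink.Veth{
		LinkAttrs: netlink.LinkAttrs{Name: "vethhost0", TxQLen: 0},
		PeerName:  "vethcont0",
	}
	if err := netlink.LinkAdd(veth); err != nil {
		panic(err)
	}
	// enslave the host end to the bridge by name, exactly what addToBridge does
	if err := netlink.LinkSetMaster(veth, &netlink.Bridge{
		LinkAttrs: netlink.LinkAttrs{Name: "demo0"},
	}); err != nil {
		panic(err)
	}
	// bring up the host end
	if err := netlink.LinkSetUp(veth); err != nil {
		panic(err)
	}
}
```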
Attaching to the Sandbox
Attaching the other end of the veth pair to the Sandbox must happen after the Sandbox has been created. "Attaching to the Sandbox" really means moving the interface into the Sandbox's network namespace. The logic is as follows:
daemon/container_operations.go
```go
// line 683
func (daemon *Daemon) connectToNetwork(cfg *config.Config, container *container.Container, idOrName string, endpointConfig *network.EndpointSettings, updateSettings bool) (retErr error) {
	...
	// line 775
	if err := ep.Join(sb, joinOptions...); err != nil {
		return err
	}
	...
}
```
This calls into libnetwork's Join(), which attaches the endpoint to the Sandbox and populates the Sandbox with the network resources allocated to it:
libnetwork/endpoint.go
```go
// line 459
func (ep *Endpoint) Join(sb *Sandbox, options ...EndpointOption) error {
	if sb == nil || sb.ID() == "" || sb.Key() == "" {
		return types.InvalidParameterErrorf("invalid Sandbox passed to endpoint join: %v", sb)
	}
	sb.joinLeaveStart()
	defer sb.joinLeaveEnd()
	return ep.sbJoin(sb, options...)
}

// line 470
func (ep *Endpoint) sbJoin(sb *Sandbox, options ...EndpointOption) (err error) {
	...
	// line 543
	if err = sb.populateNetworkResources(ep); err != nil {
		return err
	}
	...
}
```
Which then calls into the Sandbox:
libnetwork/sandbox_linux.go
```go
// line 286
func (sb *Sandbox) populateNetworkResources(ep *Endpoint) error {
	...
	// line 305
	if i != nil && i.srcName != "" {
		// line 319
		if err := sb.osSbox.AddInterface(i.srcName, i.dstPrefix, ifaceOptions...); err != nil {
			return fmt.Errorf("failed to add interface %s to sandbox: %v", i.srcName, err)
		}
	}
	...
}
```
AddInterface() attaches an already-existing virtual NIC to the Sandbox:
libnetwork/osl/interface_linux.go
```go
// line 162
func (n *Namespace) AddInterface(srcName, dstPrefix string, options ...IfaceOption) error {
	...
	// line 215: look up the interface by name
	iface, err := nlh.LinkByName(i.srcName)
	...
	// line 221: bring the interface down before configuring it
	if err := nlh.LinkSetDown(iface); err != nil {
		return fmt.Errorf("failed to set link down: %v", err)
	}

	// line 226: configure the interface
	if err := configureInterface(nlh, iface, i); err != nil {
	}
	...
	// line 242: retry bringing the interface up
	for err = nlh.LinkSetUp(iface); err != nil && cnt < 3; cnt++ {
		...
	}
	...
}
```
AddInterface() is straightforward: take the interface down, configure it, then bring it back up:
libnetwork/osl/interface_linux.go
```go
// line 315
func configureInterface(nlh *netlink.Handle, iface netlink.Link, i *Interface) error {
	ifaceName := iface.Attrs().Name
	ifaceConfigurators := []struct {
		Fn         func(*netlink.Handle, netlink.Link, *Interface) error
		ErrMessage string
	}{
		{setInterfaceName, fmt.Sprintf("error renaming interface %q to %q", ifaceName, i.DstName())},
		{setInterfaceMAC, fmt.Sprintf("error setting interface %q MAC to %q", ifaceName, i.MacAddress())},
		{setInterfaceIP, fmt.Sprintf("error setting interface %q IP to %v", ifaceName, i.Address())},
		{setInterfaceIPv6, fmt.Sprintf("error setting interface %q IPv6 to %v", ifaceName, i.AddressIPv6())},
		{setInterfaceMaster, fmt.Sprintf("error setting interface %q master to %q", ifaceName, i.DstMaster())},
		{setInterfaceLinkLocalIPs, fmt.Sprintf("error setting interface %q link local IPs to %v", ifaceName, i.LinkLocalAddresses())},
	}
	for _, config := range ifaceConfigurators {
		if err := config.Fn(nlh, iface, i); err != nil {
			return fmt.Errorf("%s: %v", config.ErrMessage, err)
		}
	}
	return nil
}
```
The configuration steps above cover the interface's name, IP and MAC addresses, and its master device. Note that the move into the Sandbox's network namespace happens earlier in AddInterface(), which operates through a netlink handle scoped to that namespace; setInterfaceMaster itself only attaches the interface to a master bridge device:
libnetwork/osl/interface_linux.go
```go
// line 337
func setInterfaceMaster(nlh *netlink.Handle, iface netlink.Link, i *Interface) error {
	if i.DstMaster() == "" {
		return nil
	}
	// attach the interface to the master device (a bridge) named by DstMaster
	return nlh.LinkSetMaster(iface, &netlink.Bridge{
		LinkAttrs: netlink.LinkAttrs{Name: i.DstMaster()},
	})
}
```
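For completeness, the namespace move itself looks roughly like the following with vishvananda/netlink and vishvananda/netns. This is a hedged sketch, not libnetwork code; the namespace path and interface name are illustrative (libnetwork keeps sandbox namespaces under /var/run/docker/netns):

```go
package main

import (
	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

func main() {
	// open the sandbox's network namespace by path (illustrative path)
	ns, err := netns.GetFromPath("/var/run/docker/netns/example")
	if err != nil {
		panic(err)
	}
	defer ns.Close()

	// find the container end of the veth pair while it is still on the host side
	link, err := netlink.LinkByName("vethcont0")
	if err != nil {
		panic(err)
	}
	// move it into the sandbox's namespace; afterwards the interface is
	// no longer visible in the host namespace
	if err := netlink.LinkSetNsFd(link, int(ns)); err != nil {
		panic(err)
	}
}
```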
All of the code above boils down to creating the veth pair (again via the vishvananda/netlink library), attaching one end to the Sandbox as the container side, attaching the other end, the host side, to docker0 (the default bridge of the Bridge network mode), then bringing the pair up and updating the network KV database. That concludes the creation and configuration of the container's virtual NIC; next, let's look at container port mapping.
Port Mapping
An earlier article in this Docker networking series described the principle behind container port mapping: iptables rules on the host are modified so that traffic from the host and from outside can reach the container. Let's look at the implementation. The port-mapping rules are programmed at container start, after the container's network has been configured; the entry point is:
daemon/container_operations.go
```go
// line 683
func (daemon *Daemon) connectToNetwork(cfg *config.Config, container *container.Container, idOrName string, endpointConfig *network.EndpointSettings, updateSettings bool) (retErr error) {
	...
	// line 738: create the endpoint (virtual NIC)
	ep, err := n.CreateEndpoint(endpointName, createOptions...)
	...
	// line 757: create the container's Sandbox
	if sb == nil {
		// line 758: build the options for creating the Sandbox
		sbOptions, err := daemon.buildSandboxOptions(cfg, container)
		...
	}
	...
	// line 775: wire up the network
	if err := ep.Join(sb, joinOptions...); err != nil {
		return err
	}
	...
}
```
Reading the Ports
In daemon.buildSandboxOptions(), the ports to be mapped are read from the container's HostConfig:
daemon/container_operations.go
```go
// line 45
func (daemon *Daemon) buildSandboxOptions(cfg *config.Config, container *container.Container) ([]libnetwork.SandboxOption, error) {
	...
	// line 97: collect the published ports
	bindings := make(nat.PortMap)
	if container.HostConfig.PortBindings != nil {
		for p, b := range container.HostConfig.PortBindings {
			bindings[p] = []nat.PortBinding{}
			for _, bb := range b {
				bindings[p] = append(bindings[p], nat.PortBinding{
					HostIP:   bb.HostIP,
					HostPort: bb.HostPort,
				})
			}
		}
	}
	...
	// line 111: collect the exposed ports
	ports := make([]nat.Port, 0, len(container.Config.ExposedPorts))
	for p := range container.Config.ExposedPorts {
		ports = append(ports, p)
	}
	...
	// line 117
	var (
		publishedPorts []types.PortBinding
		exposedPorts   []types.TransportPort
	)
	for _, port := range ports {
		// normalize all published and exposed ports
		...
	}

	// line 155: pass the port mappings as Sandbox options
	sboxOptions = append(sboxOptions,
		libnetwork.OptionPortMapping(publishedPorts),
		libnetwork.OptionExposedPorts(exposedPorts))
	...
}
```
After this normalization, all ports to be mapped are passed along as Sandbox options.
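For reference, the shapes involved come from the github.com/docker/go-connections/nat package. Here is a small sketch of what the bindings for our `-p 80:80 -p 443:443` command look like (hand-built here for illustration; the daemon reads them from HostConfig):

```go
package main

import (
	"fmt"

	"github.com/docker/go-connections/nat"
)

func main() {
	// equivalent of `-p 80:80 -p 443:443`: a map from container port/proto
	// to the host-side bindings
	bindings := nat.PortMap{
		"80/tcp":  []nat.PortBinding{{HostIP: "0.0.0.0", HostPort: "80"}},
		"443/tcp": []nat.PortBinding{{HostIP: "0.0.0.0", HostPort: "443"}},
	}
	for port, bs := range bindings {
		for _, b := range bs {
			fmt.Printf("container %s/%s <- host %s:%s\n",
				port.Port(), port.Proto(), b.HostIP, b.HostPort)
		}
	}
}
```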
Programming the iptables Rules
Based on the published and exposed ports, rules are inserted into the corresponding iptables chains. The entry point is:
daemon/container_operations.go
```go
// line 683
func (daemon *Daemon) connectToNetwork(cfg *config.Config, container *container.Container, idOrName string, endpointConfig *network.EndpointSettings, updateSettings bool) (retErr error) {
	...
	// line 775: wire up the network
	if err := ep.Join(sb, joinOptions...); err != nil {
		return err
	}
	...
}
```
Which calls further down:
libnetwork/endpoint.go
```go
// line 459
func (ep *Endpoint) Join(sb *Sandbox, options ...EndpointOption) error {
	...
	return ep.sbJoin(sb, options...)
}

// line 470
func (ep *Endpoint) sbJoin(sb *Sandbox, options ...EndpointOption) (err error) {
	...
	// line 584
	if moveExtConn {
		...
		// line 609
		if !n.internal {
			log.G(context.TODO()).Debugf("Programming external connectivity on endpoint %s (%s)", ep.Name(), ep.ID())
			if err = d.ProgramExternalConnectivity(n.ID(), ep.ID(), sb.Labels()); err != nil {
				return types.InternalErrorf("driver failed programming external connectivity on endpoint %s (%s): %v", ep.Name(), ep.ID(), err)
			}
		}
	}
	...
}
```
This in turn calls the Bridge Network Driver's ProgramExternalConnectivity():
libnetwork/drivers/bridge/bridge_linux.go
```go
// line 1299
func (d *driver) ProgramExternalConnectivity(nid, eid string, options map[string]interface{}) error {
	...
	// line 1320
	endpoint.portMapping, err = network.allocatePorts(endpoint, network.config.DefaultBindingIP, d.config.EnableUserlandProxy)
	if err != nil {
		return err
	}
	...
}
```
libnetwork/drivers/bridge/port_mapping_linux.go
```go
// line 16
func (n *bridgeNetwork) allocatePorts(ep *bridgeEndpoint, reqDefBindIP net.IP, ulPxyEnabled bool) ([]types.PortBinding, error) {
	...
	pb, err := n.allocatePortsInternal(ep.extConnConfig.PortBindings, ep.addr.IP, containerIPv6, defHostIP, ulPxyEnabled)
	...
}

// line 38
func (n *bridgeNetwork) allocatePortsInternal(bindings []types.PortBinding, containerIPv4, containerIPv6, defHostIP net.IP, ulPxyEnabled bool) ([]types.PortBinding, error) {
	...
	// line 40
	for _, c := range bindings {
		...
		// line 44: validate the IPv4 port binding
		if ok := n.validatePortBindingIPv4(&bIPv4, containerIPv4, defHostIP); ok {
			// line 45: allocate the port
			if err := n.allocatePort(&bIPv4, ulPxyEnabled); err != nil {
				...
			}
			bs = append(bs, bIPv4)
		}
	}
	return bs, nil
}

// line 131
func (n *bridgeNetwork) allocatePort(bnd *types.PortBinding, ulPxyEnabled bool) error {
	...
	// line 155: retry allocating a host port for the published port
	for i := 0; i < maxAllocatePortAttempts; i++ {
		if host, err = portmapper.MapRange(container, bnd.HostIP, int(bnd.HostPort), int(bnd.HostPortEnd), ulPxyEnabled); err == nil {
			break
		}
		...
	}
	...
}
```
libnetwork/portmapper/mapper.go
```go
// line 55
func (pm *PortMapper) MapRange(container net.Addr, hostIP net.IP, hostPortStart, hostPortEnd int, useProxy bool) (host net.Addr, retErr error) {
	...
	// line 173
	if err := pm.AppendForwardingTableEntry(m.proto, hostIP, allocatedHostPort, containerIP.String(), containerPort); err != nil {
		return nil, err
	}
	...
}
```
libnetwork/portmapper/mapper_linux.go
```go
// line 32
func (pm *PortMapper) AppendForwardingTableEntry(proto string, sourceIP net.IP, sourcePort int, containerIP string, containerPort int) error {
	return pm.forward(iptables.Append, proto, sourceIP, sourcePort, containerIP, containerPort)
}

// line 41
func (pm *PortMapper) forward(action iptables.Action, proto string, sourceIP net.IP, sourcePort int, containerIP string, containerPort int) error {
	if pm.chain == nil {
		return nil
	}
	return pm.chain.Forward(action, sourceIP, sourcePort, proto, containerIP, containerPort, pm.bridgeName)
}
```
libnetwork/iptables/iptables.go
```go
// line 314
func (c *ChainInfo) Forward(action Action, ip net.IP, port int, proto, destAddr string, destPort int, bridgeName string) error {
	...
	// line 335
	if err := iptable.ProgramRule(Nat, c.Name, action, args); err != nil {
		return err
	}
	...
	// line 347
	if err := iptable.ProgramRule(Filter, c.Name, action, args); err != nil {
		return err
	}
	...
	// line 359
	if err := iptable.ProgramRule(Nat, "POSTROUTING", action, args); err != nil {
		return err
	}
	...
}
```
After all of the above, Docker Daemon has allocated a veth pair for the container, with one end attached to the default bridge docker0 and the other end inside the container, and has written iptables rules for the container ports the user published, so that traffic from the host and from outside can reach the container. With that, the container's network initialization is complete.
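To tie this together, the three ProgramRule calls in Forward() correspond roughly to the following rules for one published port; the exact arguments vary by version and configuration, and the container IP here is an illustrative placeholder. A sketch that just prints the approximate rules:

```go
package main

import "fmt"

func main() {
	const (
		bridge        = "docker0"
		containerIP   = "172.17.0.2" // illustrative; assigned by IPAM in reality
		hostPort      = "80"
		containerPort = "80"
	)
	// approximate rules programmed by ChainInfo.Forward for one published port
	rules := []string{
		// nat DOCKER chain: DNAT traffic arriving on the host port to the container
		fmt.Sprintf("iptables -t nat -A DOCKER ! -i %s -p tcp --dport %s -j DNAT --to-destination %s:%s",
			bridge, hostPort, containerIP, containerPort),
		// filter DOCKER chain: accept the forwarded traffic to the container
		fmt.Sprintf("iptables -t filter -A DOCKER ! -i %s -o %s -p tcp -d %s --dport %s -j ACCEPT",
			bridge, bridge, containerIP, containerPort),
		// nat POSTROUTING: masquerade hairpin traffic from the container to itself
		fmt.Sprintf("iptables -t nat -A POSTROUTING -p tcp -s %s -d %s --dport %s -j MASQUERADE",
			containerIP, containerIP, containerPort),
	}
	for _, r := range rules {
		fmt.Println(r)
	}
}
```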
夜雨聆风
