Feature/multiserver plugin #3421
base: dev
Changes from all commits
cac59e3
c0c2a1a
cdb71d5
c9f2a6f
411ccb6
a0b83a1
fa663fd
17089b7
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,317 @@ | ||
| (plugin-multiserver)= | ||
| # Splitting Topologies Across Multiple Servers | ||
|
|
||
| The *multiserver* plugin distributes a single *netlab* topology across multiple physical servers. It assigns nodes to servers, classifies links as local or cross-server, and generates a self-contained containerlab configuration directory for each server with VXLAN-based interconnects. | ||
|
|
||
| ```eval_rst | ||
| .. contents:: Table of Contents | ||
|    :depth: 2 | ||
|    :local: | ||
|    :backlinks: none | ||
| ``` | ||
|
|
||
| ```{warning} | ||
| * All physical servers must have direct IP reachability (e.g. over a management network or dedicated interconnect). | ||
| ``` | ||
|
|
||
| ## Using the Plugin | ||
|
|
||
| * Add `plugin: [ multiserver ]` to the lab topology. | ||
| * Define target servers in the **multiserver.servers** dictionary. | ||
| * Choose an assignment mode (`explicit` or `auto`) with **multiserver.assignment**. | ||
|
|
||
| The plugin runs during `netlab create` and generates self-contained per-server directories (e.g. `server-srv1/`, `server-srv2/`) with tailored `clab.yml` files, node configs, and VXLAN scripts ready for deployment. | ||
|
|
||
| ## Configuring Plugin Parameters | ||
|
|
||
| The plugin is configured with the **multiserver** topology-level dictionary that has these parameters: | ||
|
|
||
| | Parameter | Type | Meaning | | ||
| |-----------|------|---------| | ||
| | **assignment** | string | How to assign nodes to servers: `explicit` (default) or `auto` | | ||
| | **servers** | dictionary | Target physical servers, keyed by server name | | ||
| | **vxlan** | dictionary | Global settings for VXLAN tunnels | | ||
| | **replicate** | list | Nodes or groups that must be duplicated on all servers | | ||
| | **output_dir** | string | Template for per-server directory names (default: `server-{server_name}`); supports `{server_name}`, `{server_id}`, and `{name}` (topology name) | | ||
| | **copy_dirs** | list | Subdirectories copied into every server directory (default: `[group_vars, templates]`); overrides the default list | | ||
| | **copy_files** | list | Top-level files copied into every server directory (default: `[ansible.cfg]`); overrides the default list | | ||
| | **extra_copy_dirs** | list | Additional subdirectories to copy on top of **copy_dirs** | | ||
| | **extra_copy_files** | list | Additional top-level files to copy on top of **copy_files** | | ||
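The placeholders in **output_dir** look like Python `str.format` fields. The sketch below is only an illustration and assumes plain `str.format` expansion. The helper `server_dir` is hypothetical and is not the plugin's code. It shows how the default and a custom template could resolve:

```python
def server_dir(template: str, server_name: str, server_id: int, topo_name: str) -> str:
    """Expand an output_dir template, assuming str.format-style placeholders."""
    # str.format ignores keyword arguments the template does not use
    return template.format(server_name=server_name, server_id=server_id, name=topo_name)


# Default template from the parameter table
print(server_dir("server-{server_name}", "srv1", 1, "lab"))  # server-srv1
# A custom template combining the topology name and the numeric server ID
print(server_dir("{name}-{server_id}", "dc-east", 2, "fabric"))  # fabric-2
```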
|
|
||
| (multiserver-servers)= | ||
| ### Server Parameters | ||
|
|
||
| The **multiserver.servers** dictionary is keyed by server name (e.g. `srv1`, `dc-east`). The name is used in per-server directory names and log messages. Because **servers** is a dictionary, each server name is unique. Each entry supports these parameters: | ||
|
|
||
| | Parameter | Type | Meaning | | ||
| |-----------|------|---------| | ||
| | **id** | integer | Numeric identifier used for VXLAN bookkeeping; auto-assigned if omitted | | ||
| | **host** | string | IP address or hostname of the remote server | | ||
| | **groups** | list | *netlab* groups whose members are assigned to this server | | ||
| | **members** | list | Individual node names assigned to this server | | ||
| | **vxlan_dev** | string | Physical interface to bind VXLAN tunnels to on this server | | ||
| | **weight** | integer | Relative capacity for auto-assignment (default: `1`); a server with `weight: 2` absorbs twice as many nodes before being considered as loaded as a server with `weight: 1` | | ||
|
|
||
| (multiserver-vxlan)= | ||
| ### VXLAN Parameters | ||
|
|
||
| Global VXLAN settings are specified in the **multiserver.vxlan** dictionary: | ||
|
|
||
| | Parameter | Type | Meaning | | ||
| |-----------|------|---------| | ||
| | **vni_base** | integer | Starting VNI for cross-server links (default: `10000`) | | ||
| | **dstport** | integer | UDP destination port for VXLAN traffic (default: `4789`) | | ||
| | **dev** | string | Default physical interface to bind VXLAN tunnels to (default: `ens33`) | ||
|
|
||
| By default, VXLAN tunnels bind to the interface set in **multiserver.vxlan.dev**, which falls back to `ens33` when it is not configured. If your physical servers use different interface names, override the interface for individual servers with the **vxlan_dev** parameter of the corresponding **multiserver.servers** entry. | ||
|
|
||
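The interface lookup order described above fits in a short function. This is an illustration under assumptions: `vxlan_dev_for` is a hypothetical name, and the server and **vxlan** settings are plain Python dicts that mirror the YAML.

```python
DEFAULT_VXLAN_DEV = "ens33"  # documented fallback for multiserver.vxlan.dev


def vxlan_dev_for(server: dict, vxlan: dict) -> str:
    """Pick the interface VXLAN tunnels bind to on one server.

    Precedence: per-server vxlan_dev, then multiserver.vxlan.dev, then ens33.
    """
    return server.get("vxlan_dev") or vxlan.get("dev") or DEFAULT_VXLAN_DEV


multiserver = {
    "vxlan": {"dev": "ens33"},
    "servers": {
        "spine-host": {"host": "192.168.168.128"},
        "leaf-host": {"host": "192.168.168.129", "vxlan_dev": "eth0"},
    },
}
for name, srv in multiserver["servers"].items():
    print(name, vxlan_dev_for(srv, multiserver["vxlan"]))
# spine-host ens33
# leaf-host eth0
```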
| (multiserver-assignment)= | ||
| ## Assignment Modes | ||
|
|
||
| ### Explicit Assignment (Default) | ||
|
|
||
| In `explicit` mode, every node must be mapped to a server using the **groups** or **members** attributes of a [server entry](multiserver-servers). Any unassigned node (excluding [replicated nodes](multiserver-replicate)) results in an error. | ||
|
|
||
| ```yaml | ||
| plugin: [ multiserver ] | ||
|
|
||
| multiserver: | ||
|   assignment: explicit | ||
|   servers: | ||
|     srv1: | ||
|       host: 192.168.168.128 | ||
|       groups: [ core ] | ||
|       members: [ edge-node ] | ||
|     srv2: | ||
|       host: 192.168.168.129 | ||
|       groups: [ spines, leaves ] | ||
| ``` | ||
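Explicit assignment turns **groups** and **members** into a node-to-server map and rejects unassigned nodes. The sketch below shows one way to do that. `assign_explicit` is a hypothetical helper that works on plain dicts, and the group member names (`c1`, `s1`, ...) are made up. When a node is listed under two servers, the sketch keeps the first server that claims it. That is an assumption, because the plugin's behavior in that case is not described here.

```python
def assign_explicit(nodes, groups, servers, replicate=()):
    """Map nodes to servers using each server's groups/members lists.

    nodes: all node names; groups: {group: [members]}
    servers: {server: {"groups": [...], "members": [...]}}
    replicate: nodes instantiated on every server (exempt from the check)
    """
    placement = {}
    for srv_name, srv in servers.items():
        for grp in srv.get("groups", []):
            for node in groups.get(grp, []):
                placement.setdefault(node, srv_name)  # first server wins (assumption)
        for node in srv.get("members", []):
            placement.setdefault(node, srv_name)

    missing = [n for n in nodes if n not in placement and n not in replicate]
    if missing:
        raise ValueError("nodes not assigned to any server: " + ", ".join(missing))
    return placement


groups = {"core": ["c1"], "spines": ["s1"], "leaves": ["l1", "l2"]}
servers = {
    "srv1": {"groups": ["core"], "members": ["edge-node"]},
    "srv2": {"groups": ["spines", "leaves"]},
}
print(assign_explicit(["c1", "edge-node", "s1", "l1", "l2"], groups, servers))
```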
|
|
||
| ### Automatic Assignment | ||
|
|
||
| In `auto` mode, nodes that are not explicitly pinned to a server are distributed automatically using a greedy balancing algorithm: | ||
|
|
||
| 1. Nodes belonging to a *netlab* group are kept together — the entire group is placed on the server with the lowest current load. Larger groups are placed first for better balance. | ||
| 2. Remaining ungrouped nodes are assigned one at a time to the least-loaded server. | ||
|
|
||
| **Load** is defined as `(assigned node count) / weight`, where **weight** defaults to `1`. Nodes already pinned via **groups** or **members** attributes count toward server load, so the algorithm balances around any explicit assignments. | ||
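The two steps and the load formula above can be sketched as a greedy loop. This is a hypothetical illustration: `assign_auto` is not the plugin's function, and the sketch makes three assumptions:

- each group has already been reduced to the members it actually claims;
- ties go to the first server in definition order;
- pinned nodes are passed in as an existing placement.

```python
def assign_auto(groups, ungrouped, weights, pinned=None):
    """Greedy placement sketch for auto mode.

    groups:    {group: [members]}, disjoint member lists, in definition order
    ungrouped: remaining nodes that belong to no group
    weights:   {server: weight}; load = assigned node count / weight
    pinned:    {node: server} for nodes already placed via groups/members
    """
    placement = dict(pinned or {})
    count = {srv: 0 for srv in weights}
    for srv in placement.values():  # pinned nodes count toward load
        count[srv] += 1

    def least_loaded():
        # min() returns the first server with the lowest load, so ties follow dict order
        return min(weights, key=lambda s: count[s] / weights[s])

    # Step 1: whole groups, larger groups first (sorted() stays stable with reverse=True)
    for grp in sorted(groups, key=lambda g: len(groups[g]), reverse=True):
        srv = least_loaded()
        for node in groups[grp]:
            placement[node] = srv
        count[srv] += len(groups[grp])

    # Step 2: remaining ungrouped nodes, one at a time
    for node in ungrouped:
        srv = least_loaded()
        placement[node] = srv
        count[srv] += 1
    return placement


# Six ungrouped nodes on servers with weights 1 and 2
result = assign_auto({}, [f"n{i}" for i in range(1, 7)], {"srv1": 1, "srv2": 2})
print(sorted(n for n, s in result.items() if s == "srv1"))  # ['n1', 'n4']
```

With weights 1 and 2, the six nodes split 2:4, which matches the "roughly twice as many nodes" behavior described for **weight**.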
|
|
||
| ```yaml | ||
| plugin: [ multiserver ] | ||
|
|
||
| multiserver: | ||
|   assignment: auto | ||
|   servers: | ||
|     srv1: | ||
|       host: 192.168.168.128 | ||
|     srv2: | ||
|       host: 192.168.168.129 | ||
| ``` | ||
|
|
||
| Use **weight** to account for servers with different capacities. A server with `weight: 2` is treated as twice as capable and absorbs proportionally more nodes before being considered equally loaded: | ||
|
|
||
| ```yaml | ||
| multiserver: | ||
|   assignment: auto | ||
|   servers: | ||
|     srv1: | ||
|       host: 192.168.168.128 | ||
|       weight: 1 # smaller server | ||
|     srv2: | ||
|       host: 192.168.168.129 | ||
|       weight: 2 # larger server, gets roughly twice as many nodes | ||
| ``` | ||
|
|
||
| ```{tip} | ||
| You can pin specific nodes or groups to a server in `auto` mode using **groups** and **members** attributes. Only unassigned nodes are auto-distributed. | ||
| ``` | ||
|
|
||
| #### Group Granularity | ||
|
|
||
| Because auto mode keeps entire groups together on a single server, the granularity of your groups directly affects how evenly nodes are distributed. Define groups at the smallest unit you want to keep on one server. | ||
|
|
||
| For example, consider a topology with two sites, each containing five nodes: | ||
|
|
||
| ```yaml | ||
| # BAD: one large group — all 10 nodes land on one server | ||
| groups: | ||
|   sites: | ||
|     members: [ site1-r1, site1-r2, site1-r3, site1-r4, site1-r5, | ||
|                site2-r1, site2-r2, site2-r3, site2-r4, site2-r5 ] | ||
| ``` | ||
|
|
||
| ```yaml | ||
| # GOOD: per-site groups — one site per server | ||
| groups: | ||
|   site1: | ||
|     members: [ site1-r1, site1-r2, site1-r3, site1-r4, site1-r5 ] | ||
|   site2: | ||
|     members: [ site2-r1, site2-r2, site2-r3, site2-r4, site2-r5 ] | ||
|   sites: | ||
|     members: [ site1-r1, site1-r2, site1-r3, site1-r4, site1-r5, | ||
|                site2-r1, site2-r2, site2-r3, site2-r4, site2-r5 ] | ||
| ``` | ||
Owner
You might want to point out in a comment that it's even better to use
|
|
||
| In the second example the parent `sites` group can still be used for Ansible targeting or shared configuration — it does not affect placement because the child groups (`site1`, `site2`) claim their members first during assignment. | ||
|
|
||
| ```{note} | ||
| Groups are processed in definition order. Child groups defined **before** a parent group will claim their members first, making the parent group a no-op for assignment. Always define fine-grained groups before aggregate groups in your topology. | ||
| ``` | ||
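A short sketch can show how the claiming rule in the note plays out. `claim_members` is a hypothetical helper that operates on a plain dict of groups. Python dicts preserve insertion order, so the sketch uses that order as a stand-in for definition order in the topology file.

```python
def claim_members(groups: dict) -> dict:
    """Give each node to the first group (in definition order) that lists it.

    Returns {group: [claimed members]}; a group that claims nothing is a
    no-op for placement but can still be used for other purposes.
    """
    claimed = set()
    result = {}
    for grp, members in groups.items():  # dict order stands in for definition order
        mine = [n for n in members if n not in claimed]
        claimed.update(mine)
        result[grp] = mine
    return result


site1 = [f"site1-r{i}" for i in range(1, 6)]
site2 = [f"site2-r{i}" for i in range(1, 6)]

# Child groups first (the GOOD layout): the parent group claims nothing
print(claim_members({"site1": site1, "site2": site2, "sites": site1 + site2})["sites"])  # []
# Parent group first: it claims all ten nodes and the child groups end up empty
print(claim_members({"sites": site1 + site2, "site1": site1, "site2": site2})["site1"])  # []
```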
|
|
||
| (multiserver-replicate)= | ||
| ### Replicated Nodes | ||
|
|
||
| Nodes listed in **multiserver.replicate** are instantiated on every server. This is useful for infrastructure services that need local access on each physical host — for example, monitoring collectors, route reflectors, or DNS resolvers. | ||
|
Owner
I wonder how well the inevitably overlapping IP addresses work. Also, I don't think route reflectors are a good example (I can easily see how that would result in split routing). It might be best to move this section to the end of the document and use your specific example, including an explanation of how the overlapping IP addresses are resolved as you're effectively deploying an implicit anycast service. On a second thought, maybe that's a better way to go -- require an explicit anycast service? I'm fine with whatever you decide is best, and this is not a showstopper. It's just that I can see too many unexpected consequences, so there should be a large enough "THERE BE DRAGONS" sign attached to this concept ;)
||
|
|
||
| Links connecting to replicated nodes are always treated as local, so traffic between a replicated node and its neighbors never crosses the VXLAN overlay. | ||
|
|
||
| ```yaml | ||
| multiserver: | ||
|   assignment: auto | ||
|   servers: | ||
|     srv1: | ||
|       host: 192.168.168.128 | ||
|     srv2: | ||
|       host: 192.168.168.129 | ||
|   replicate: [ prometheus, grafana ] | ||
| ``` | ||
|
|
||
| ## Complete Example | ||
|
|
||
| A minimal two-server topology with explicit assignment: | ||
|
|
||
| ```yaml | ||
| plugin: [ multiserver ] | ||
|
|
||
| provider: clab | ||
|
|
||
| groups: | ||
|   spines: | ||
|     members: [ s1, s2 ] | ||
|   leaves: | ||
|     members: [ l1, l2 ] | ||
|
|
||
| nodes: | ||
|   s1: | ||
|     device: srlinux | ||
|   s2: | ||
|     device: srlinux | ||
|   l1: | ||
|     device: srlinux | ||
|   l2: | ||
|     device: srlinux | ||
|
|
||
| links: | ||
| - s1-l1 | ||
| - s1-l2 | ||
| - s2-l1 | ||
| - s2-l2 | ||
|
|
||
| multiserver: | ||
|   assignment: explicit | ||
|   servers: | ||
|     spine-host: | ||
|       host: 192.168.168.128 | ||
|       groups: [ spines ] | ||
|       vxlan_dev: ens33 # Override per-server (optional) | ||
|     leaf-host: | ||
|       host: 192.168.168.129 | ||
|       groups: [ leaves ] | ||
|       vxlan_dev: eth0 # Override per-server (optional) | ||
|   vxlan: | ||
|     vni_base: 10000 | ||
|     dev: ens33 # Global default interface | ||
| ``` | ||
|
|
||
| This places spines on `spine-host` and leaves on `leaf-host`. All four links cross servers and are provisioned as containerlab native VXLAN endpoints. | ||
|
|
||
| ## Behind the Scenes | ||
|
|
||
| When the plugin processes the topology, it classifies links into three categories: | ||
|
|
||
| * **Local links** connecting nodes on the same server remain as regular containerlab veth pairs or bridges. | ||
| * **Cross-server point-to-point links** are provisioned via containerlab's native VXLAN link endpoints (`type: vxlan` in `clab.yml`). | ||
| * **Cross-server multi-access links** use a local Linux bridge on each server, interconnected via host-level VXLAN tunnels configured by generated setup scripts. | ||
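The three link categories above can be captured in a short classification function. This is an illustration, not the plugin's code, and it rests on three assumptions:

- `classify_link` is a hypothetical name;
- any link with exactly two endpoint nodes is treated as point-to-point;
- the replicated-node rule from the replicated nodes section is applied before server placement is considered.

```python
def classify_link(endpoints, placement, replicate=()):
    """Classify a link as local, cross-server point-to-point, or cross-server multi-access.

    endpoints: names of the nodes attached to the link
    placement: {node: server}; replicate: nodes instantiated on every server
    """
    if any(n in replicate for n in endpoints):
        return "local"  # links to replicated nodes never cross the VXLAN overlay
    if len({placement[n] for n in endpoints}) == 1:
        return "local"  # veth pair or bridge on a single server
    if len(endpoints) == 2:
        return "vxlan-p2p"  # containerlab native VXLAN endpoint (type: vxlan)
    return "vxlan-multiaccess"  # per-server Linux bridge plus host-level VXLAN tunnels


placement = {"s1": "spine-host", "s2": "spine-host", "l1": "leaf-host", "l2": "leaf-host"}
for link in (["s1", "s2"], ["s1", "l1"], ["s1", "l1", "l2"]):
    print("-".join(link), classify_link(link, placement))
```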
|
Owner
Here's a crazy idea: what if you implemented cross-server multi-access links with Linux bridge nodes (https://netlab.tools/node/roles/#implementing-multi-access-links-with-bridge-nodes) -- when analyzing the topology, you could add necessary bridges to nodes and the bridge attribute to links, totally removing the need for extra provisioning scripts. OTOH, while this would make the end result simpler, you would need a very careful orchestration of steps between this plugin and the bridge code (https://github.com/ipspace/netlab/blob/dev/netsim/roles/bridge.py). You'd have to create the bridge nodes before the Worst case, I could add another plugin hook ;)
Collaborator
This is starting to look more and more like a core feature, not a plugin. Perhaps we should add the notion of 'servers' to the topology, and add vxlan support for containerlab links?
||
|
|
||
| Each per-server directory is self-contained and includes: | ||
|
|
||
| * A tailored `clab.yml` with only the relevant nodes and cross-server VXLAN interfaces | ||
| * A filtered `netlab.snapshot.pickle` for use with `netlab up --snapshot` | ||
| * A filtered `hosts.yml` containing only the nodes assigned to that server, so `netlab initial` does not attempt to configure nodes on other servers | ||
| * Copies of `node_files/` and `host_vars/` for only the nodes on that server | ||
| * Copies of the directories and files listed in **multiserver.copy_dirs** and **multiserver.copy_files** | ||
| * Per-server `vxlan-setup.sh` and `vxlan-teardown.sh` scripts (when multi-access VXLAN tunnels are needed), registered in that server's snapshot as [CLI hooks](dev-cli-hooks) (`netlab.up.post_start_clab` / `netlab.down.pre_stop_clab`) so `netlab up` and `netlab down` run them automatically on the remote host | ||
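Per-server `hosts.yml` filtering can be pictured as pruning an inventory tree. The sketch below assumes the usual Ansible YAML inventory layout (groups with `hosts`, `children`, and `vars` keys), already loaded into Python dicts. `filter_inventory` is a hypothetical helper, and netlab's actual inventory structure may differ in detail.

```python
def filter_inventory(group: dict, keep: set) -> dict:
    """Return a copy of an inventory group that only lists hosts in `keep`.

    Keys other than 'hosts' and 'children' (for example 'vars') are copied as-is.
    """
    out = {k: v for k, v in group.items() if k not in ("hosts", "children")}
    if "hosts" in group:
        out["hosts"] = {h: v for h, v in (group["hosts"] or {}).items() if h in keep}
    if "children" in group:
        out["children"] = {
            name: filter_inventory(child or {}, keep)
            for name, child in (group["children"] or {}).items()
        }
    return out


inventory = {
    "all": {
        "hosts": {"s1": {"ansible_host": "s1"}, "l1": {"ansible_host": "l1"}},
        "children": {"spines": {"hosts": {"s1": None}}, "leaves": {"hosts": {"l1": None}}},
    }
}
spine_host = {"all": filter_inventory(inventory["all"], {"s1"})}
print(spine_host)
```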
|
|
||
| (multiserver-deployment)= | ||
| ## Deployment Workflow | ||
|
|
||
| ```{note} | ||
| The plugin does **not** orchestrate remote servers. It runs only on the control node during `netlab create`, where it generates a self-contained directory per server. It never opens SSH connections, runs commands remotely, or copies files to other hosts. You copy each directory to its server yourself (Step 2), and `netlab` then runs **independently on each server** (Step 3); the per-server VXLAN CLI hooks fire locally on that server, not from the control node. | ||
| ``` | ||
Owner
A minor detail: You might want to use the "pre_probe" netlab up hook on the master node to abort the "netlab up" process. Set a flag in the topology defaults (or some such) after writing the worker pickles and check it in the pre_probe hook.
|
|
||
| **Step 1: Generate configurations** on your workstation: | ||
|
|
||
| ```bash | ||
| netlab create topology.yml | ||
| ``` | ||
|
|
||
| The plugin automatically copies all required files into each server directory — no extra bundling step is needed. | ||
|
|
||
| **Step 2: Copy server directories to remote hosts** (e.g. via rsync): | ||
|
|
||
| ```bash | ||
| rsync -avz server-spine-host/ user@192.168.168.128:~/lab/server-spine-host/ | ||
| rsync -avz server-leaf-host/ user@192.168.168.129:~/lab/server-leaf-host/ | ||
| ``` | ||
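With more than a couple of servers, you can generate the rsync commands from the same server dictionary. The helper below is hypothetical. It assumes three things:

- the default **output_dir** template;
- the same remote user on every server;
- the `~/lab` target directory used above.

```python
def rsync_commands(servers: dict, user: str, remote_base: str = "~/lab") -> list:
    """Build one rsync command per server for the default server-{server_name} layout."""
    return [
        f"rsync -avz server-{name}/ {user}@{srv['host']}:{remote_base}/server-{name}/"
        for name, srv in servers.items()
    ]


servers = {
    "spine-host": {"host": "192.168.168.128"},
    "leaf-host": {"host": "192.168.168.129"},
}
for cmd in rsync_commands(servers, "user"):
    print(cmd)
```

For this server dictionary the printed lines are identical to the two commands above.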
|
|
||
| **Step 3: Deploy on each server** by running this command on every remote host: | ||
|
|
||
| ```bash | ||
| sudo netlab up --snapshot -vv | ||
| ``` | ||
|
|
||
| When multi-access VXLAN tunnels are present, `netlab up` runs `vxlan-setup.sh` automatically via a [CLI hook](dev-cli-hooks) registered by the plugin. | ||
|
|
||
| ```{important} | ||
| **Why is `--snapshot` required on remote servers?** | ||
|
|
||
| You must run `sudo netlab up --snapshot` on remote servers to load the topology from the pre-generated snapshot (`netlab.snapshot.pickle`) instead of the original `topology.yml`. | ||
|
|
||
| Running with `topology.yml` directly on remote servers will fail because: | ||
| 1. **Consistency**: *netlab* dynamically allocates IP addresses, interface IDs, and VXLAN VNIs. Running `netlab create` independently on each host would produce mismatched allocations. | ||
| 2. **Recursion**: Running `netlab create` on `topology.yml` on a remote host would execute the `multiserver` plugin again, splitting the topology recursively and generating nested server subdirectories. | ||
| ``` | ||
Owner
Usually we don't need "sudo" with "netlab up", unless you need it to set up VXLAN tunnels, in which case you could add sudo commands to that script. However, then you'd need to check whether sudo is installed (see the installation scripts). The "use bridge nodes and let containerlab deal with it" idea sounds more and more promising ;))
|
|
||
| **Teardown** on each server: | ||
|
|
||
| ```bash | ||
| sudo netlab down | ||
| ``` | ||
Owner
Again, "sudo" is usually not needed.
|
|
||
| When multi-access VXLAN tunnels are present, `netlab down` runs `vxlan-teardown.sh` automatically via a CLI hook registered by the plugin. | ||
|
|
||
| ## Customising What Gets Copied | ||
|
|
||
| By default, the plugin copies `group_vars/` and `templates/` subdirectories, plus `ansible.cfg`, into every server directory. To add extra items on top of the defaults, use **extra_copy_dirs** and **extra_copy_files**: | ||
|
|
||
| ```yaml | ||
| multiserver: | ||
|   extra_copy_dirs: [ monitoring ] | ||
|   extra_copy_files: [ netlab.lock ] | ||
| ``` | ||
|
|
||
| To replace the defaults entirely, use **copy_dirs** and **copy_files**: | ||
|
|
||
| ```yaml | ||
| multiserver: | ||
|   copy_dirs: [ group_vars, templates, monitoring ] | ||
|   copy_files: [ ansible.cfg, netlab.lock ] | ||
| ``` | ||
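A few lines of code can summarize how the default lists, the replacement lists, and the `extra_` lists interact. This is a hypothetical sketch. It assumes the `extra_` lists are appended to whichever base list applies and that duplicates are skipped; neither behavior is stated above.

```python
DEFAULT_COPY_DIRS = ["group_vars", "templates"]
DEFAULT_COPY_FILES = ["ansible.cfg"]


def effective_copy_lists(ms: dict) -> tuple:
    """Return (dirs, files) copied into each server directory for a multiserver dict."""
    dirs = list(ms.get("copy_dirs", DEFAULT_COPY_DIRS))  # copy_dirs replaces the default
    files = list(ms.get("copy_files", DEFAULT_COPY_FILES))
    dirs += [d for d in ms.get("extra_copy_dirs", []) if d not in dirs]  # extras are appended
    files += [f for f in ms.get("extra_copy_files", []) if f not in files]
    return dirs, files


print(effective_copy_lists({"extra_copy_dirs": ["monitoring"], "extra_copy_files": ["netlab.lock"]}))
```

Under these assumptions, the two YAML examples above produce the same result: `group_vars`, `templates`, and `monitoring`, plus `ansible.cfg` and `netlab.lock`.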
|
|
||
| The Ansible inventory (`hosts.yml`) is always written into each server directory and is automatically filtered to contain only the nodes assigned to that server. | ||
|
|
||
| ## Limitations | ||
|
|
||
| * Only the **containerlab** provider is supported. Topologies using the libvirt or VirtualBox provider cannot be split across servers. | ||
|
Owner
BTW, I'm perfectly fine with this. Totally out-of-scope, but I think we should focus primarily on containerlab and pester vrnetlab maintainers where needed to make the containers start faster.
||
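| * Cross-server VXLAN tunnels use a flat VNI space starting at **vni_base**, and the largest valid VNI is 16777215 (24 bits). Validation fails when the allocated VNIs would exceed that value, so a topology can have at most 16777216 minus **vni_base** cross-server links (about 16.77 million with the default **vni_base** of 10000). | ||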
|
Owner
Wow. Made my day 😂😂
||
| * All physical servers must have direct IP reachability — the plugin does not support NAT traversal or relay hosts between servers. | ||
|
Owner
We need a better term for the workers, as they might be VMs.
||
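The VNI limit in the list above follows from allocating one VNI per cross-server link in a contiguous block. Here is a minimal sketch that assumes exactly one VNI per link. The helper `allocate_vnis` and the link names are hypothetical.

```python
VNI_MAX = 2**24 - 1  # 16777215, the largest 24-bit VNI


def allocate_vnis(links: list, vni_base: int = 10000) -> dict:
    """Assign consecutive VNIs starting at vni_base, one per cross-server link."""
    last = vni_base + len(links) - 1
    if last > VNI_MAX:
        raise ValueError(f"VNI range {vni_base}-{last} exceeds the 24-bit maximum {VNI_MAX}")
    return {link: vni_base + i for i, link in enumerate(links)}


print(allocate_vnis(["s1-l1", "s1-l2", "s2-l1", "s2-l2"]))
# With the default base, at most VNI_MAX - 10000 + 1 links fit
print(VNI_MAX - 10000 + 1)  # 16767216
```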
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -996,6 +996,44 @@ def must_be_node_id(value: typing.Any) -> dict: | |
|
|
||
|   return { '_valid': True } | ||
|
|
||
| @type_test() | ||
| def must_be_group_id(value: typing.Any) -> dict: | ||
|   if not isinstance(value,str):                   # Otherwise it must be a string | ||
|     return { '_type': 'valid group name (a string)' } | ||
|
|
||
|   topology = global_vars.get_topology()           # Try to get current lab topology | ||
|   if topology is None:                            # pragma: no-cover | ||
|     log.fatal('Calling group_id validation before the topology has been initialized') | ||
|
|
||
|   if value not in topology.get('groups',{}): | ||
|     return { | ||
|       '_type': "group", | ||
|       '_value': f"valid group name (found {value})", | ||
|       '_hint_id': "groups", | ||
|       '_hint': "Valid group names are "+", ".join(list(topology.get('groups',{}))) | ||
|     } | ||
|
|
||
|   return { '_valid': True } | ||
|
|
||
| @type_test() | ||
| def must_be_node_or_group(value: typing.Any) -> dict: | ||
|   if not isinstance(value,str):                   # Otherwise it must be a string | ||
|     return { '_type': 'valid node or group name (a string)' } | ||
|
|
||
|   topology = global_vars.get_topology()           # Try to get current lab topology | ||
|   if topology is None:                            # pragma: no-cover | ||
|     log.fatal('Calling node_or_group validation before the topology has been initialized') | ||
|
|
||
|   if value not in topology.nodes and value not in topology.get('groups',{}): | ||
|
Owner
Personal preference: when fetching a value multiple times, I would fetch it into a local variable and then use that variable to enforce consistency.
||
|     return { | ||
|       '_type': "node or group", | ||
|       '_value': f"valid node or group name (found {value})", | ||
|       '_hint_id': "node_or_group", | ||
|       '_hint': "Valid node or group names are "+", ".join(list(topology.nodes) + list(topology.get('groups',{}))) | ||
|     } | ||
|
|
||
|   return { '_valid': True } | ||
|
|
||
| @type_test() | ||
| def must_be_r_proto(value: typing.Any) -> dict: | ||
|   if not isinstance(value,str): | ||
|
|
||
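A standalone sketch can show the owner's suggestion for `must_be_node_or_group` in the diff above: fetch the node and group names once into local variables. To run on its own, the sketch replaces netlab's `@type_test()` decorator, `global_vars.get_topology()`, and the Box topology with a plain dict argument. `check_node_or_group` is a hypothetical name; the returned dictionaries mirror the ones in the diff.

```python
def check_node_or_group(value, topology: dict) -> dict:
    """Validate that value names a node or a group in a plain-dict topology.

    Node and group names are read once into local variables, so the membership
    test and the hint text are built from the same data.
    """
    if not isinstance(value, str):
        return {"_type": "valid node or group name (a string)"}

    nodes = list(topology.get("nodes", {}))
    groups = list(topology.get("groups", {}))
    if value not in nodes and value not in groups:
        return {
            "_type": "node or group",
            "_value": f"valid node or group name (found {value})",
            "_hint_id": "node_or_group",
            "_hint": "Valid node or group names are " + ", ".join(nodes + groups),
        }
    return {"_valid": True}


topology = {"nodes": {"s1": {}, "l1": {}}, "groups": {"spines": {"members": ["s1"]}}}
print(check_node_or_group("spines", topology))  # {'_valid': True}
print(check_node_or_group("x", topology)["_hint"])  # Valid node or group names are s1, l1, spines
```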
I think it would make more sense to make this `eth0` (that's what you get on less-opinionated distros ;) or leave it undefined but make it a required attribute so the user is forced to define it. `ens33` is oddly specific.