@@ -114,11 +114,11 @@ would be sub-port 0 on port 1 on switch 1.
 Switch ID
 ^^^^^^^^^

-The switchdev driver must implement the switchdev op switchdev_port_attr_get for
-SWITCHDEV_ATTR_PORT_PARENT_ID for each port netdev, returning the same physical ID
-for each port of a switch. The ID must be unique between switches on the same
-system. The ID does not need to be unique between switches on different
-systems.
+The switchdev driver must implement the switchdev op switchdev_port_attr_get
+for SWITCHDEV_ATTR_PORT_PARENT_ID for each port netdev, returning the same
+physical ID for each port of a switch. The ID must be unique between switches
+on the same system. The ID does not need to be unique between switches on
+different systems.

 The switch ID is used to locate ports on a switch and to know if aggregated
 ports belong to the same switch.
@@ -142,7 +142,7 @@ The port netdevs representing the physical switch ports can be organized into
 higher-level switching constructs. The default construct is a standalone
 router port, used to offload L3 forwarding. Two or more ports can be bonded
 together to form a LAG. Two or more ports (or LAGs) can be bridged to bridge
-to L2 networks. VLANs can be applied to sub-divide L2 networks. L2-over-L3
+L2 networks. VLANs can be applied to sub-divide L2 networks. L2-over-L3
 tunnels can be built on ports. These constructs are built using standard Linux
 tools such as the bridge driver, the bonding/team drivers, and netlink-based
 tools such as iproute2.
@@ -177,6 +177,10 @@ entries are installed, for example, using iproute2 bridge cmd:

 	bridge fdb add ADDR dev DEV [vlan VID] [self]

+The driver should use the helper switchdev_port_fdb_xxx ops for ndo_fdb_xxx
+ops, and handle add/delete/dump of SWITCHDEV_OBJ_PORT_FDB object using
+switchdev_port_obj_xxx ops.
+
 XXX: what should be done if offloading this rule to hardware fails (for
 example, due to full capacity in hardware tables) ?

@@ -194,11 +198,11 @@ in turn, will notify the bridge driver using the switchdev notifier call:

 	err = call_switchdev_notifiers(val, dev, info);

-Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when forgetting, and
-info points to a struct switchdev_notifier_fdb_info. On SWITCHDEV_FDB_ADD, the bridge
-driver will install the FDB entry into the bridge's FDB and mark the entry as
-NTF_EXT_LEARNED. The iproute2 bridge command will label these entries
-"offload":
+Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
+forgetting, and info points to a struct switchdev_notifier_fdb_info. On
+SWITCHDEV_FDB_ADD, the bridge driver will install the FDB entry into the
+bridge's FDB and mark the entry as NTF_EXT_LEARNED. The iproute2 bridge
+command will label these entries "offload":

 	$ bridge fdb
 	52:54:00:12:35:01 dev sw1p1 master br0 permanent
@@ -229,18 +233,18 @@ the bridge's FDB. It's possible, but not optimal, to enable learning on the
 device port and on the bridge port, and disable learning_sync.

 To support learning and learning_sync port attributes, the driver implements
-switchdev op switchdev_port_attr_get/set for SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS. The driver
-should initialize the attributes to the hardware defaults.
+switchdev op switchdev_port_attr_get/set for SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS.
+The driver should initialize the attributes to the hardware defaults.

 FDB Ageing
 ^^^^^^^^^^

 There are two FDB ageing models supported: 1) ageing by the device, and 2)
 ageing by the kernel. Ageing by the device is preferred if many FDB entries
-are supported. The driver calls call_switchdev_notifiers(SWITCHDEV_FDB_DEL, ...) to
-age out the FDB entry. In this model, ageing by the kernel should be turned
-off. XXX: how to turn off ageing in kernel on a per-port basis or otherwise
-prevent the kernel from ageing out the FDB entry?
+are supported. The driver calls call_switchdev_notifiers(SWITCHDEV_FDB_DEL,
+...) to age out the FDB entry. In this model, ageing by the kernel should be
+turned off. XXX: how to turn off ageing in kernel on a per-port basis or
+otherwise prevent the kernel from ageing out the FDB entry?

 In the kernel ageing model, the standard bridge ageing mechanism is used to age
 out stale FDB entries. To keep an FDB entry "alive", the driver should refresh
@@ -262,8 +266,8 @@ STP State Change on Port

 Internally or with a third-party STP protocol implementation (e.g. mstpd), the
 bridge driver maintains the STP state for ports, and will notify the switch
-driver of STP state change on a port using the switchdev op switchdev_attr_port_set for
-SWITCHDEV_ATTR_PORT_STP_UPDATE.
+driver of STP state change on a port using the switchdev op
+switchdev_port_attr_set for SWITCHDEV_ATTR_PORT_STP_UPDATE.

 State is one of BR_STATE_*. The switch driver can use STP state updates to
 update ingress packet filter list for the port. For example, if port is
@@ -296,33 +300,38 @@ IGMP Snooping
 XXX: complete this section


-L3 routing
-----------
+L3 Routing Offload
+------------------

 Offloading L3 routing requires that device be programmed with FIB entries from
 the kernel, with the device doing the FIB lookup and forwarding. The device
 does a longest prefix match (LPM) on FIB entries matching route prefix and
-forwards the packet to the matching FIB entry's nexthop(s) egress ports. To
-program the device, the switchdev driver is called with add/delete ops for IPv4
-and IPv6 FIB entries. For IPv4, the driver implements switchdev ops:
-
-	int (*switchdev_fib_ipv4_add)(struct net_device *dev,
-				      __be32 dst, int dst_len,
-				      struct fib_info *fi,
-				      u8 tos, u8 type,
-				      u32 nlflags, u32 tb_id);
-
-	int (*switchdev_fib_ipv4_del)(struct net_device *dev,
-				      __be32 dst, int dst_len,
-				      struct fib_info *fi,
-				      u8 tos, u8 type,
-				      u32 tb_id);
-
-to add/delete IPv4 dst/dest_len prefix on table tb_id. The *fi structure holds
-details on the route and route's nexthops. *dev is one of the port netdevs
-mentioned in the routes next hop list. If the output port netdevs referenced
-in the route's nexthop list don't all have the same switch ID, the driver is
-not called to add/delete the FIB entry.
+forwards the packet to the matching FIB entry's nexthop(s) egress ports.
+
+To program the device, the driver implements support for
+SWITCHDEV_OBJ_IPV[4|6]_FIB object using switchdev_port_obj_xxx ops.
+switchdev_port_obj_add is used for both adding a new FIB entry to the device
+and modifying an existing entry on the device.
+
+XXX: Currently, only SWITCHDEV_OBJ_IPV4_FIB objects are supported.
+
+SWITCHDEV_OBJ_IPV4_FIB object passes:
+
+	struct switchdev_obj_ipv4_fib {		/* IPV4_FIB */
+		u32 dst;
+		int dst_len;
+		struct fib_info *fi;
+		u8 tos;
+		u8 type;
+		u32 nlflags;
+		u32 tb_id;
+	} ipv4_fib;
+
+to add/modify/delete IPv4 dst/dst_len prefix on table tb_id. The *fi
+structure holds details on the route and route's nexthops. *dev is one of the
+port netdevs mentioned in the route's nexthop list. If the output port netdevs
+referenced in the route's nexthop list don't all have the same switch ID, the
+driver is not called to add/modify/delete the FIB entry.

 Routes offloaded to the device are labeled with "offload" in the ip route
 listing:
@@ -340,7 +349,7 @@ listing:
 	12.0.0.4 via 11.0.0.9 dev sw1p2 proto zebra metric 20 offload
 	192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.15

-XXX: add/del IPv6 FIB API
+XXX: add/mod/del IPv6 FIB API

 Nexthop Resolution
 ^^^^^^^^^^^^^^^^^^