Przeglądaj źródła

Merge branch 'bonding'

Veaceslav Falico says:

====================
bonding: add an option to rely on unvalidated arp packets

v4 -> v5:
Again per Nik's advise correct the bond_opts restrictions for arp_validate
- set it the same as arp_interval.

v3 -> v4:
Per Nikolay's advise, remove the new bond_opts restriction on modes setting
for arp_validate.

v2 -> v3:
Per Jay's advise, use the 'filter' keyword instead of 'arp' one, and use
his text for documentation. Also, rebase on the latest net-next. Sorry for
the delay, didn't manage to send it before net-next was closed.

v1 -> v2:
Don't remove the 'all traffic' functionality - rather, add new arp_validate
options to specify that we want *only* unvalidated arps.

Currently, if arp_validate is off (0), slave_last_rx() returns the
slave->dev->last_rx, which is always updated on *any* packet received by
slave, and not only arps. This means that, if the validation of arps is
off, we're treating *any* incoming packet as a proof of slave being up, and
not only arps.

This might seem logical at the first glance, however it can cause a lot of
troubles and false-positives, one example would be:

The arp_ip_target is NOT accessible, however someone in the broadcast domain
spams with any broadcast traffic. This way bonding will be tricked that the
slave is still up (as in - can access arp_ip_target), while it's not.

The net_device->last_rx is already used in a lot of drivers (even though the
comment states to NOT do it :)), and it's also ugly to modify it from bonding.

However, some loadbalance setups might rely on the fact that even non-arp
traffic is a sign of slave being up - and we definitely can't break anyones
config - so an extension to arp_validate is needed.

So, to fix this, add an option for the user to specify if he wants to
filter out non-arp traffic on unvalidated slaves, remove the last_rx from
bonding, *always* call bond_arp_rcv() in slave's rx_handler (which is
bond_handle_frame), and if we spot an arp there with this option on - update
the slave->last_arp_rx - and use it instead of net_device->last_rx. Finally,
rename last_arp_rx to last_rx to reflect the changes.

Also rename slave->jiffies to ->last_link_up, to reflect better its
meaning, add the new option's documentation and update the arp_validate one
to be a bit more descriptive.
====================

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller 11 lat temu
rodzic
commit
82f148e992

+ 66 - 30
Documentation/networking/bonding.txt

@@ -270,16 +270,15 @@ arp_ip_target
 arp_validate
 
 	Specifies whether or not ARP probes and replies should be
-	validated in the active-backup mode.  This causes the ARP
-	monitor to examine the incoming ARP requests and replies, and
-	only consider a slave to be up if it is receiving the
-	appropriate ARP traffic.
+	validated in any mode that supports arp monitoring, or whether
+	non-ARP traffic should be filtered (disregarded) for link
+	monitoring purposes.
 
 	Possible values are:
 
 	none or 0
 
-		No validation is performed.  This is the default.
+		No validation or filtering is performed.
 
 	active or 1
 
@@ -293,31 +292,68 @@ arp_validate
 
 		Validation is performed for all slaves.
 
-	For the active slave, the validation checks ARP replies to
-	confirm that they were generated by an arp_ip_target.  Since
-	backup slaves do not typically receive these replies, the
-	validation performed for backup slaves is on the ARP request
-	sent out via the active slave.  It is possible that some
-	switch or network configurations may result in situations
-	wherein the backup slaves do not receive the ARP requests; in
-	such a situation, validation of backup slaves must be
-	disabled.
-
-	The validation of ARP requests on backup slaves is mainly
-	helping bonding to decide which slaves are more likely to
-	work in case of the active slave failure, it doesn't really
-	guarantee that the backup slave will work if it's selected
-	as the next active slave.
-
-	This option is useful in network configurations in which
-	multiple bonding hosts are concurrently issuing ARPs to one or
-	more targets beyond a common switch.  Should the link between
-	the switch and target fail (but not the switch itself), the
-	probe traffic generated by the multiple bonding instances will
-	fool the standard ARP monitor into considering the links as
-	still up.  Use of the arp_validate option can resolve this, as
-	the ARP monitor will only consider ARP requests and replies
-	associated with its own instance of bonding.
+	filter or 4
+
+		Filtering is applied to all slaves. No validation is
+		performed.
+
+	filter_active or 5
+
+		Filtering is applied to all slaves, validation is performed
+		only for the active slave.
+
+	filter_backup or 6
+
+		Filtering is applied to all slaves, validation is performed
+		only for backup slaves.
+
+	Validation:
+
+	Enabling validation causes the ARP monitor to examine the incoming
+	ARP requests and replies, and only consider a slave to be up if it
+	is receiving the appropriate ARP traffic.
+
+	For an active slave, the validation checks ARP replies to confirm
+	that they were generated by an arp_ip_target.  Since backup slaves
+	do not typically receive these replies, the validation performed
+	for backup slaves is on the broadcast ARP request sent out via the
+	active slave.  It is possible that some switch or network
+	configurations may result in situations wherein the backup slaves
+	do not receive the ARP requests; in such a situation, validation
+	of backup slaves must be disabled.
+
+	The validation of ARP requests on backup slaves is mainly helping
+	bonding to decide which slaves are more likely to work in case of
+	the active slave failure, it doesn't really guarantee that the
+	backup slave will work if it's selected as the next active slave.
+
+	Validation is useful in network configurations in which multiple
+	bonding hosts are concurrently issuing ARPs to one or more targets
+	beyond a common switch.  Should the link between the switch and
+	target fail (but not the switch itself), the probe traffic
+	generated by the multiple bonding instances will fool the standard
+	ARP monitor into considering the links as still up.  Use of
+	validation can resolve this, as the ARP monitor will only consider
+	ARP requests and replies associated with its own instance of
+	bonding.
+
+	Filtering:
+
+	Enabling filtering causes the ARP monitor to only use incoming ARP
+	packets for link availability purposes.  Arriving packets that are
+	not ARPs are delivered normally, but do not count when determining
+	if a slave is available.
+
+	Filtering operates by only considering the reception of ARP
+	packets (any ARP packet, regardless of source or destination) when
+	determining if a slave has received traffic for link availability
+	purposes.
+
+	Filtering is useful in network configurations in which significant
+	levels of third party broadcast traffic would fool the standard
+	ARP monitor into considering the links as still up.  Use of
+	filtering can resolve this, as only ARP traffic is considered for
+	link availability purposes.
 
 	This option was added in bonding version 3.1.0.
 

+ 24 - 32
drivers/net/bonding/bond_main.c

@@ -798,7 +798,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 		return;
 
 	if (new_active) {
-		new_active->jiffies = jiffies;
+		new_active->last_link_up = jiffies;
 
 		if (new_active->link == BOND_LINK_BACK) {
 			if (USES_PRIMARY(bond->params.mode)) {
@@ -1115,9 +1115,6 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 	slave = bond_slave_get_rcu(skb->dev);
 	bond = slave->bond;
 
-	if (bond->params.arp_interval)
-		slave->dev->last_rx = jiffies;
-
 	recv_probe = ACCESS_ONCE(bond->recv_probe);
 	if (recv_probe) {
 		ret = recv_probe(skb, bond, slave);
@@ -1400,10 +1397,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 
 	bond_update_speed_duplex(new_slave);
 
-	new_slave->last_arp_rx = jiffies -
+	new_slave->last_rx = jiffies -
 		(msecs_to_jiffies(bond->params.arp_interval) + 1);
 	for (i = 0; i < BOND_MAX_ARP_TARGETS; i++)
-		new_slave->target_last_arp_rx[i] = new_slave->last_arp_rx;
+		new_slave->target_last_arp_rx[i] = new_slave->last_rx;
 
 	if (bond->params.miimon && !bond->params.use_carrier) {
 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
@@ -1447,7 +1444,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	}
 
 	if (new_slave->link != BOND_LINK_DOWN)
-		new_slave->jiffies = jiffies;
+		new_slave->last_link_up = jiffies;
 	pr_debug("Initial state of slave_dev is BOND_LINK_%s\n",
 		 new_slave->link == BOND_LINK_DOWN ? "DOWN" :
 		 (new_slave->link == BOND_LINK_UP ? "UP" : "BACK"));
@@ -1894,7 +1891,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 				 * recovered before downdelay expired
 				 */
 				slave->link = BOND_LINK_UP;
-				slave->jiffies = jiffies;
+				slave->last_link_up = jiffies;
 				pr_info("%s: link status up again after %d ms for interface %s\n",
 					bond->dev->name,
 					(bond->params.downdelay - slave->delay) *
@@ -1969,7 +1966,7 @@ static void bond_miimon_commit(struct bonding *bond)
 
 		case BOND_LINK_UP:
 			slave->link = BOND_LINK_UP;
-			slave->jiffies = jiffies;
+			slave->last_link_up = jiffies;
 
 			if (bond->params.mode == BOND_MODE_8023AD) {
 				/* prevent it from being the active one */
@@ -2245,7 +2242,7 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 		pr_debug("bva: sip %pI4 not found in targets\n", &sip);
 		return;
 	}
-	slave->last_arp_rx = jiffies;
+	slave->last_rx = jiffies;
 	slave->target_last_arp_rx[i] = jiffies;
 }
 
@@ -2255,15 +2252,16 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 	struct arphdr *arp = (struct arphdr *)skb->data;
 	unsigned char *arp_ptr;
 	__be32 sip, tip;
-	int alen;
+	int alen, is_arp = skb->protocol == __cpu_to_be16(ETH_P_ARP);
 
-	if (skb->protocol != __cpu_to_be16(ETH_P_ARP))
+	if (!slave_do_arp_validate(bond, slave)) {
+		if ((slave_do_arp_validate_only(bond, slave) && is_arp) ||
+		    !slave_do_arp_validate_only(bond, slave))
+			slave->last_rx = jiffies;
 		return RX_HANDLER_ANOTHER;
-
-	read_lock(&bond->lock);
-
-	if (!slave_do_arp_validate(bond, slave))
-		goto out_unlock;
+	} else if (!is_arp) {
+		return RX_HANDLER_ANOTHER;
+	}
 
 	alen = arp_hdr_len(bond->dev);
 
@@ -2314,11 +2312,10 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 		bond_validate_arp(bond, slave, sip, tip);
 	else if (bond->curr_active_slave &&
 		 time_after(slave_last_rx(bond, bond->curr_active_slave),
-			    bond->curr_active_slave->jiffies))
+			    bond->curr_active_slave->last_link_up))
 		bond_validate_arp(bond, slave, tip, sip);
 
 out_unlock:
-	read_unlock(&bond->lock);
 	if (arp != (struct arphdr *)skb->data)
 		kfree(arp);
 	return RX_HANDLER_ANOTHER;
@@ -2361,9 +2358,9 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
 	oldcurrent = ACCESS_ONCE(bond->curr_active_slave);
 	/* see if any of the previous devices are up now (i.e. they have
 	 * xmt and rcv traffic). the curr_active_slave does not come into
-	 * the picture unless it is null. also, slave->jiffies is not needed
-	 * here because we send an arp on each slave and give a slave as
-	 * long as it needs to get the tx/rx within the delta.
+	 * the picture unless it is null. also, slave->last_link_up is not
+	 * needed here because we send an arp on each slave and give a slave
+	 * as long as it needs to get the tx/rx within the delta.
 	 * TODO: what about up/down delay in arp mode? it wasn't here before
 	 *       so it can wait
 	 */
@@ -2372,7 +2369,7 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
 
 		if (slave->link != BOND_LINK_UP) {
 			if (bond_time_in_interval(bond, trans_start, 1) &&
-			    bond_time_in_interval(bond, slave->dev->last_rx, 1)) {
+			    bond_time_in_interval(bond, slave->last_rx, 1)) {
 
 				slave->link  = BOND_LINK_UP;
 				slave_state_changed = 1;
@@ -2401,7 +2398,7 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
 			 * if we don't know our ip yet
 			 */
 			if (!bond_time_in_interval(bond, trans_start, 2) ||
-			    !bond_time_in_interval(bond, slave->dev->last_rx, 2)) {
+			    !bond_time_in_interval(bond, slave->last_rx, 2)) {
 
 				slave->link  = BOND_LINK_DOWN;
 				slave_state_changed = 1;
@@ -2489,7 +2486,7 @@ static int bond_ab_arp_inspect(struct bonding *bond)
 		 * active.  This avoids bouncing, as the last receive
 		 * times need a full ARP monitor cycle to be updated.
 		 */
-		if (bond_time_in_interval(bond, slave->jiffies, 2))
+		if (bond_time_in_interval(bond, slave->last_link_up, 2))
 			continue;
 
 		/*
@@ -2690,7 +2687,7 @@ static bool bond_ab_arp_probe(struct bonding *bond)
 	new_slave->link = BOND_LINK_BACK;
 	bond_set_slave_active_flags(new_slave);
 	bond_arp_send_all(bond, new_slave);
-	new_slave->jiffies = jiffies;
+	new_slave->last_link_up = jiffies;
 	rcu_assign_pointer(bond->current_arp_slave, new_slave);
 	rtnl_unlock();
 
@@ -3060,8 +3057,7 @@ static int bond_open(struct net_device *bond_dev)
 
 	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
 		queue_delayed_work(bond->wq, &bond->arp_work, 0);
-		if (bond->params.arp_validate)
-			bond->recv_probe = bond_arp_rcv;
+		bond->recv_probe = bond_arp_rcv;
 	}
 
 	if (bond->params.mode == BOND_MODE_8023AD) {
@@ -4186,10 +4182,6 @@ static int bond_check_params(struct bond_params *params)
 	}
 
 	if (arp_validate) {
-		if (bond_mode != BOND_MODE_ACTIVEBACKUP) {
-			pr_err("arp_validate only supported in active-backup mode\n");
-			return -EINVAL;
-		}
 		if (!arp_interval) {
 			pr_err("arp_validate requires arp_interval\n");
 			return -EINVAL;

+ 11 - 8
drivers/net/bonding/bond_options.c

@@ -47,11 +47,14 @@ static struct bond_opt_value bond_xmit_hashtype_tbl[] = {
 };
 
 static struct bond_opt_value bond_arp_validate_tbl[] = {
-	{ "none",   BOND_ARP_VALIDATE_NONE,   BOND_VALFLAG_DEFAULT},
-	{ "active", BOND_ARP_VALIDATE_ACTIVE, 0},
-	{ "backup", BOND_ARP_VALIDATE_BACKUP, 0},
-	{ "all",    BOND_ARP_VALIDATE_ALL,    0},
-	{ NULL,     -1,                       0},
+	{ "none",		BOND_ARP_VALIDATE_NONE,		BOND_VALFLAG_DEFAULT},
+	{ "active",		BOND_ARP_VALIDATE_ACTIVE,	0},
+	{ "backup",		BOND_ARP_VALIDATE_BACKUP,	0},
+	{ "all",		BOND_ARP_VALIDATE_ALL,		0},
+	{ "filter",		BOND_ARP_FILTER,		0},
+	{ "filter_active",	BOND_ARP_FILTER_ACTIVE,		0},
+	{ "filter_backup",	BOND_ARP_FILTER_BACKUP,		0},
+	{ NULL,			-1,				0},
 };
 
 static struct bond_opt_value bond_arp_all_targets_tbl[] = {
@@ -151,7 +154,8 @@ static struct bond_option bond_opts[] = {
 		.id = BOND_OPT_ARP_VALIDATE,
 		.name = "arp_validate",
 		.desc = "validate src/dst of ARP probes",
-		.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_ACTIVEBACKUP)),
+		.unsuppmodes = BIT(BOND_MODE_8023AD) | BIT(BOND_MODE_TLB) |
+			       BIT(BOND_MODE_ALB),
 		.values = bond_arp_validate_tbl,
 		.set = bond_option_arp_validate_set
 	},
@@ -809,8 +813,7 @@ int bond_option_arp_interval_set(struct bonding *bond,
 			cancel_delayed_work_sync(&bond->arp_work);
 		} else {
 			/* arp_validate can be set only in active-backup mode */
-			if (bond->params.arp_validate)
-				bond->recv_probe = bond_arp_rcv;
+			bond->recv_probe = bond_arp_rcv;
 			cancel_delayed_work_sync(&bond->mii_work);
 			queue_delayed_work(bond->wq, &bond->arp_work, 0);
 		}

+ 17 - 9
drivers/net/bonding/bonding.h

@@ -188,8 +188,9 @@ struct slave {
 	struct net_device *dev; /* first - useful for panic debug */
 	struct bonding *bond; /* our master */
 	int    delay;
-	unsigned long jiffies;
-	unsigned long last_arp_rx;
+	/* all three in jiffies */
+	unsigned long last_link_up;
+	unsigned long last_rx;
 	unsigned long target_last_arp_rx[BOND_MAX_ARP_TARGETS];
 	s8     link;    /* one of BOND_LINK_XXXX */
 	s8     new_link;
@@ -342,6 +343,11 @@ static inline bool bond_is_active_slave(struct slave *slave)
 #define BOND_ARP_VALIDATE_BACKUP	(1 << BOND_STATE_BACKUP)
 #define BOND_ARP_VALIDATE_ALL		(BOND_ARP_VALIDATE_ACTIVE | \
 					 BOND_ARP_VALIDATE_BACKUP)
+#define BOND_ARP_FILTER			(BOND_ARP_VALIDATE_ALL + 1)
+#define BOND_ARP_FILTER_ACTIVE		(BOND_ARP_VALIDATE_ACTIVE | \
+					 BOND_ARP_FILTER)
+#define BOND_ARP_FILTER_BACKUP		(BOND_ARP_VALIDATE_BACKUP | \
+					 BOND_ARP_FILTER)
 
 static inline int slave_do_arp_validate(struct bonding *bond,
 					struct slave *slave)
@@ -349,6 +355,12 @@ static inline int slave_do_arp_validate(struct bonding *bond,
 	return bond->params.arp_validate & (1 << bond_slave_state(slave));
 }
 
+static inline int slave_do_arp_validate_only(struct bonding *bond,
+					     struct slave *slave)
+{
+	return bond->params.arp_validate & BOND_ARP_FILTER;
+}
+
 /* Get the oldest arp which we've received on this slave for bond's
  * arp_targets.
  */
@@ -368,14 +380,10 @@ static inline unsigned long slave_oldest_target_arp_rx(struct bonding *bond,
 static inline unsigned long slave_last_rx(struct bonding *bond,
 					struct slave *slave)
 {
-	if (slave_do_arp_validate(bond, slave)) {
-		if (bond->params.arp_all_targets == BOND_ARP_TARGETS_ALL)
-			return slave_oldest_target_arp_rx(bond, slave);
-		else
-			return slave->last_arp_rx;
-	}
+	if (bond->params.arp_all_targets == BOND_ARP_TARGETS_ALL)
+		return slave_oldest_target_arp_rx(bond, slave);
 
-	return slave->dev->last_rx;
+	return slave->last_rx;
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER

+ 1 - 7
include/linux/netdevice.h

@@ -1312,13 +1312,7 @@ struct net_device {
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
  */
-	unsigned long		last_rx;	/* Time of last Rx
-						 * This should not be set in
-						 * drivers, unless really needed,
-						 * because network stack (bonding)
-						 * use it if/when necessary, to
-						 * avoid dirtying this cache line.
-						 */
+	unsigned long		last_rx;	/* Time of last Rx */
 
 	/* Interface address info used in eth_type_trans() */
 	unsigned char		*dev_addr;	/* hw address, (before bcast