high volume reliability in libnl

Thomas Graf tgraf at infradead.org
Tue Dec 13 09:28:42 EST 2011


On Mon, Dec 12, 2011 at 04:28:12PM -0500, Brett Ciphery wrote:
> I see mentioned in the netlink man page that it is not a reliable
> protocol and if receiving messages from the kernel, it should detect an
> ENOBUFS error and resynchronize.

Right. However, the only situation in which a netlink message can be
lost is if the kernel runs out of memory.

The kernel used to return a message of type NLMSG_OVERRUN if a message
was lost, but this is no longer the case. The overrun case is now also
reported through socket errors.

Said socket error (ENOBUFS) is also set if a notification can't be sent
due to memory pressure. In that case, the kernel walks all listeners of
the notification group and attaches a socket error to each one to
indicate that a message was lost.
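
As a rough sketch of what detecting this looks like on a raw netlink
socket: the receive call fails with errno set to ENOBUFS, and the
application has to treat its cached state as stale. This is not libnl
code. The names notif_status, classify_recv_errno and read_notification
are made up for illustration, and what "resynchronize" means (usually
requesting a fresh dump) is left to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical result codes for one receive attempt. */
enum notif_status {
    NOTIF_OK,      /* a datagram was read */
    NOTIF_RETRY,   /* transient condition, call again */
    NOTIF_RESYNC,  /* at least one message was lost, state is stale */
    NOTIF_FATAL    /* unexpected error */
};

/* Map an errno value from recv() to the action the caller should take. */
static enum notif_status classify_recv_errno(int err)
{
    switch (err) {
    case EINTR:
    case EAGAIN:
        return NOTIF_RETRY;
    case ENOBUFS:
        /* The kernel flagged a lost message on this socket. */
        return NOTIF_RESYNC;
    default:
        return NOTIF_FATAL;
    }
}

/* Read one datagram from fd; on success store its length in *out_len. */
static enum notif_status read_notification(int fd, void *buf, size_t len,
                                           ssize_t *out_len)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n < 0)
        return classify_recv_errno(errno);

    *out_len = n;
    return NOTIF_OK;
}
```

A caller would loop on read_notification() and, on NOTIF_RESYNC, discard
whatever it has cached and query the kernel for the current state again
before it continues to process notifications.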

This behaviour can be disabled by setting the NETLINK_NO_ENOBUFS socket
option on the socket, in which case the socket will not be notified.

> Is this limitation handled within
> libnl?  I've been poking around the git tree (head 4a7791e) and it seems
> to return an error in this case (nl.c:494), which calling functions
> don't _appear_ to handle and resubmit.

Right. This is currently not handled by libnl automatically.

> Could you confirm my suspicion?  I may submit a patch in the near future
> to handle this.

Confirmed. Feel free to propose such behaviour and I'll be happy to
include such a patch.


