problem with default local port(nl_pid) when netlink used both via libnl and directly in same application

Laine Stump laine at redhat.com
Mon May 7 11:47:53 EDT 2012


On 05/07/2012 10:39 AM, Brett Ciphery wrote:
> [Re: problem with default local port(nl_pid) when netlink used both via libnl and directly in same application] On 07/05/2012 (Mon 08:53) Thomas Graf wrote:
>
>> On Mon, May 07, 2012 at 05:05:32AM -0400, Laine Stump wrote:
>>> I've just diagnosed a problem in libvirt that traces back to libnl's
>>> unilateral decision to use getpid() of the calling process as the
>>> default "local port" (nl_pid) for the first netlink socket it creates
>>> for each process.
>>>
>>> The problem is that this is also the default value used if a piece of
>>> code running in that process uses direct system calls to create/bind a
>>> netlink socket. In our example, this was the result of calling glibc's
>>> getaddrinfo() function, so we weren't even aware that it was happening.
>>> Even though getaddrinfo() only keeps its netlink socket connected for a
>>> short period, if that is running in a separate thread from the thread
>>> that calls nl_handle_alloc()/nl_connect(), the result will be that the
>>> bind() in nl_connect() fails with EADDRINUSE.
>>>
> Hey,
>
> I'll add one more situation where this symptom might pop up...
>
> If nl_socket_alloc() does run first and thus its bind() is successful

To be more specific, nl_socket_alloc() only decides what port to use;
it's nl_connect() that actually calls bind(), and so must be called
before any other direct bind() of a netlink socket.
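
For illustration, here is a minimal sketch of the collision using raw
system calls rather than libnl. The helper names netlink_bind_port() and
second_bind_result() are made up for this example and are not libnl API;
the sketch assumes Linux and uses NETLINK_ROUTE. It binds one socket to
getpid() and then tries to bind a second one to the same port, which is
roughly what happens when libnl and a direct netlink user both default
to the process pid.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: open a NETLINK_ROUTE socket and bind it to the
 * given local port (nl_pid). Returns the fd, or -errno on failure. */
static int netlink_bind_port(unsigned int port)
{
    struct sockaddr_nl local;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -errno;

    memset(&local, 0, sizeof(local));
    local.nl_family = AF_NETLINK;
    local.nl_pid = port;    /* 0 asks the kernel to assign a free port */

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

/* Hypothetical demo: hold one socket bound to getpid(), then try to
 * bind a second socket to the same port and return that result. */
static int second_bind_result(void)
{
    int first = netlink_bind_port((unsigned int)getpid());
    int second;

    if (first < 0)
        return first;

    second = netlink_bind_port((unsigned int)getpid());
    if (second >= 0)
        close(second);
    close(first);
    return second;
}
```

On Linux I would expect second_bind_result() to return -EADDRINUSE,
while netlink_bind_port(0) should still succeed with the first socket
open, because the kernel then picks an unused port itself.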

> with the process pid, if that process then forks it will inherit this
> fd.  If later nl_socket_free() is called and subsequently
> nl_socket_alloc(), the inherited fd will still be open in the other
> process and this will cause nl_socket_alloc() to produce an EADDRINUSE.
>
> A workaround is to close these fd references in the new process but it
> would be quite useful if nl_socket_alloc() was more robust -- of course
> no easy task given backwards compatibility.

I did notice during my investigation that nl_connect() will automatically
set SOCK_CLOEXEC in the call to socket(2) (if it's defined on the
platform), so this will be taken care of for you in any child that goes
on to call exec(). (hopefully nobody will ever need a libnl-created
socket to be maintained across an exec)

I'm not sure what platforms, if any, don't have SOCK_CLOEXEC, but
fortunately all the platforms I'm concerned about do :-)
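
As a rough sketch of what that looks like at the system call level
(this is not libnl's actual source; netlink_socket_cloexec() and
fd_is_cloexec() are hypothetical names for this example), the socket can
be created with SOCK_CLOEXEC where it is defined, falling back to
fcntl(2) where it is not. Note that close-on-exec only takes effect at
exec(): a child that forks and never execs still holds the inherited fd
and has to close it itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: create a NETLINK_ROUTE socket that is closed
 * automatically on exec(). Returns the fd, or -errno on failure. */
static int netlink_socket_cloexec(void)
{
    int type = SOCK_RAW;
    int fd;

#ifdef SOCK_CLOEXEC
    type |= SOCK_CLOEXEC;   /* set the flag atomically at creation */
#endif

    fd = socket(AF_NETLINK, type, NETLINK_ROUTE);
    if (fd < 0)
        return -errno;

#ifndef SOCK_CLOEXEC
    /* Fallback for platforms without SOCK_CLOEXEC: set the flag after
     * the fact. This leaves a small window in which another thread
     * could fork+exec and leak the fd. */
    {
        int flags = fcntl(fd, F_GETFD);
        if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
            int err = errno;
            close(fd);
            return -err;
        }
    }
#endif

    return fd;
}

/* Hypothetical helper: report whether FD_CLOEXEC is set on fd. */
static int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

Calling fd_is_cloexec() on the returned fd is a quick way to confirm the
flag is actually set on a given platform.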


