[Bug report] Unable to remove 'default' routes using the CLI tools

Roopa Prabhu roopa at cumulusnetworks.com
Tue Mar 19 16:37:21 EDT 2013


Problem:
Tony's test case uses nl-route-add, which does not set any exclusive
flags, so the kernel appends the route. (I can reproduce the problem
with similar routes added via 'ip route append'.)

For such routes (no NLM_F_EXCL, or NLM_F_APPEND set), the kernel also
checks for a next-hop match. However, libnl does not check the next hop
during lookup, so it ends up updating (adding/removing) the existing
object instead.

Solution:
Fix libnl to include additional attributes in the hash key depending on
the message type and message flags.
I.e., in this case, use the next-hop info during lookup when
(msgtype == RTM_NEWROUTE && (msgflags == NLM_F_CREATE ||
(msgflags & NLM_F_APPEND) || <more conditions if needed>))

I can submit a libnl patch for this. However, the problem is that in
this particular case (looking at the latest kernel), the kernel does not
send the required msgflags in the RTM_NEWROUTE notification message.
I will submit a kernel patch to fix this; the fix would then be
available in the next kernel release.

So, for the current problem, a few things to note:

- It's unclear why this worked before the hash implementation went
in, because even the linear search algorithm used before compared
similar attributes (ROUTE_ATTR_FAMILY | ROUTE_ATTR_TOS |
ROUTE_ATTR_TABLE | ROUTE_ATTR_DST).
I am able to reproduce it even without the hash implementation.


- If there is any problem with the hash implementation, I am willing to
submit a patch to make the hash implementation optional, settable via an
API, until I fix the real issue. However, for now that does not seem to
be the case, unless somebody confirms otherwise.

- In this particular test case, multiple default IPv4 routes via the
same interface result in the kernel picking the first one. It's unclear
where this is really useful, so it does not seem like a critical bug.

- We should probably add the NLM_F_EXCL flag in nl-route-add.


Thanks,
Roopa


On 3/19/13 7:20 AM, Roopa Prabhu wrote:
> On 3/19/13 4:38 AM, Roopa Prabhu wrote:
>> Hi Tony,
>>
>> It looks like the default routes that you are trying to add end up
>> having the same key attributes (ROUTE_ATTR_FAMILY | ROUTE_ATTR_TOS |
>> ROUTE_ATTR_TABLE | ROUTE_ATTR_DST | ROUTE_ATTR_PRIO),
>> so they hash to the same route. Your old default route ends up
>> replacing the ones you added.
>>
>> One thing that comes to mind is that we could store the new routes in
>> the same bucket instead of knocking off the old routes. But I want to
>> be sure that route replacement by the kernel also works in this case.
>> I'm still looking at it.
>>
>> Have you been able to add these routes successfully using 'ip route'?
>> (I had some problems.)
>
> This works with 'ip route append'. Unlike nl-route-add, 'ip route add'
> adds the NLM_F_EXCL flag.
>
>> What command line are you using?
>>
>>
>> Thanks,
>> Roopa
>>
>>
>>
>>
>> On 3/18/13 3:00 PM, Roopa Prabhu wrote:
>>> Thanks, Tony, for the commit ID.
>>>
>>> I will try to reproduce your problem today.
>>>
>>> On 3/18/13 10:32 AM, Tony Cheneau wrote:
>>>> Hello,
>>>>
>>>> I just ran a git-bisect and can now affirm that the issue was
>>>> introduced with commit a2207c7beb80050671d209650aaaeba429658e49 (just
>>>> before version 3.2.15). I attached a patch that works around the issue
>>>> for now, but I don't believe it should be applied to the tree, as it
>>>> does not fix the real problem. Unfortunately, I don't have the time or
>>>> the in-depth knowledge of the code to actually fix the issue properly.
>>>>
>>>> Please note that I was only interested in the "route" submodule, so it
>>>> is the only one I tested. It could also affect other submodules
>>>> (given that previous commits seem to affect similar portions of the
>>>> code in those other submodules).
>>>>
>>>> I'd be interested to know if I'm the only one having this issue.
>>>>
>>>> Regards,
>>>> Tony
>>>>
>>>> P.S.: I cc'ed Roopa because we had a private email exchange
>>>> about this topic.
>>>>
>>>> On 2013-02-05 23:43, Tony Cheneau wrote:
>>>>> Hello,
>>>>>
>>>>> For my own purposes, I'm building a bunch of Python wrappers around
>>>>> libraries exposed by CLI tools (that is, the functions exposed in the
>>>>> netlink/cli directory). To be more precise, I'm using Cython to glue
>>>>> the libnl C code within a Python wrapper. While performing my tests, I
>>>>> found an odd behavior when playing with default routes: I could not
>>>>> always remove them. To me, it qualifies as a bug, but it might be
>>>>> intended behavior. This issue is also present with the nl-route-delete
>>>>> binary, so I'll use this tool for my bug report. Finally, I should add
>>>>> that I'm using the current git tree tip.
>>>>>
>>>>> So, the first step is to add multiple default routes (lines starting
>>>>> with # indicate the commands I entered):
>>>>> # ./nl-route-add -d default -n via=10.0.0.3,dev=eth0
>>>>> Added inet default via 10.0.0.3 dev eth0
>>>>> # ./nl-route-add -d default -n via=10.0.0.2,dev=eth0
>>>>> Added inet default via 10.0.0.2 dev eth0
>>>>> # ./nl-route-add -d default -n via=10.0.0.1,dev=eth0
>>>>> Added inet default via 10.0.0.1 dev eth0
>>>>>
>>>>> Then I try listing current routes:
>>>>> # ./nl-route-list|grep default
>>>>> inet default table main type unicast via 10.0.20.200 dev eth0
>>>>> inet6 default table unspec type unreachable via dev lo
>>>>>
>>>>> I find it odd that the routes do not show up. Only the
>>>>> initial route to my default gateway shows up (it was set before I ran
>>>>> any of the previous commands).
>>>>> However, if I try adding a route again, I get the following
>>>>> message (which would indicate that the previous commands went through):
>>>>> # ./nl-route-add -d default -n via=10.0.0.1,dev=eth0
>>>>> Error: Unable to add route: Object exists
>>>>>
>>>>> Now, this is the part that motivated me to write this bug report: if
>>>>> I try to remove the default routes using the symmetrical command
>>>>> "nl-route-delete", I get the following:
>>>>> # ./nl-route-delete -d default -n via=10.0.0.3,dev=eth0
>>>>> Deleted 0 routes
>>>>> # ./nl-route-delete -d default -n via=10.0.0.2,dev=eth0
>>>>> Deleted 0 routes
>>>>> # ./nl-route-delete -d default -n via=10.0.0.1,dev=eth0
>>>>> Deleted 0 routes
>>>>>
>>>>> However, if I remove the next-hop selector, it works (I omitted the
>>>>> part where I removed my default gateway, because I lost the connection
>>>>> and could not retrieve the output):
>>>>> # ./nl-route-delete -d default
>>>>> Deleted inet default table main type unicast via 10.0.0.3 dev eth0
>>>>> Deleted 1 routes
>>>>> # ./nl-route-delete -d default
>>>>> Deleted inet default table main type unicast via 10.0.0.2 dev eth0
>>>>> Deleted 1 routes
>>>>> # ./nl-route-delete -d default
>>>>> Deleted inet default table main type unicast via 10.0.0.1 dev eth0
>>>>> Deleted 1 routes
>>>>>
>>>>> It also works if I reverse the ordering when removing routes (when no
>>>>> default gateway is set):
>>>>> # ./nl-route-add -d default -n via=10.0.0.3,dev=eth0
>>>>> Added inet default via 10.0.0.3 dev eth0
>>>>> # ./nl-route-add -d default -n via=10.0.0.2,dev=eth0
>>>>> Added inet default via 10.0.0.2 dev eth0
>>>>> # ./nl-route-add -d default -n via=10.0.0.1,dev=eth0
>>>>> Added inet default via 10.0.0.1 dev eth0
>>>>> # ./nl-route-delete -d default -n via=10.0.0.1,dev=eth0
>>>>> Deleted 0 routes
>>>>> # ./nl-route-delete -d default -n via=10.0.0.3
>>>>> Deleted inet default table main type unicast via 10.0.0.3 dev eth0
>>>>> Deleted 1 routes
>>>>> # ./nl-route-delete -d default -n via=10.0.0.2
>>>>> Deleted inet default table main type unicast via 10.0.0.2 dev eth0
>>>>> Deleted 1 routes
>>>>> # ./nl-route-delete -d default -n via=10.0.0.1
>>>>> Deleted inet default table main type unicast via 10.0.0.1 dev eth0
>>>>> Deleted 1 routes
>>>>>
>>>>> The same phenomenon seems to occur when using IPv6 addresses.
>>>>>
>>>>> (As a side note, I'm aware of the various Python
>>>>> bindings existing for libnl, but, to the best of my knowledge, none of
>>>>> them seems to control the netlink "route" subsystem that I need.)
>>>>>
>>>>> Regards,
>>>>> Tony Cheneau
>>>>>
>>>>> _______________________________________________
>>>>> libnl mailing list
>>>>> libnl at lists.infradead.org
>>>>> http://lists.infradead.org/mailman/listinfo/libnl
>>>
>>
>>
>
>



