tx watch dog timeout on resume kills device
Dan Williams
dcbw at redhat.com
Thu Apr 21 11:41:15 EDT 2011
On Tue, 2011-04-19 at 19:19 +0100, Daniel Drake wrote:
> Hi,
>
> At http://dev.laptop.org/ticket/10748 we're seeing libertas sd8686
> dying occasionally during resume.
>
> [ 885.737199] Restarting tasks ... done.
> [ 891.020099] libertas: tx watch dog timeout
> [ 894.030042] libertas: command 0x000b timed out
> [ 894.034676] libertas: Timeout submitting command 0x000b
> [ 894.040554] libertas: PREP_CMD: command 0x000b failed: -11
> [ 896.010255] libertas: tx watch dog timeout
> [ 899.020103] libertas: command 0x001f timed out
> [ 899.024664] libertas: Timeout submitting command 0x001f
> [ 899.030530] ------------[ cut here ]------------
> [ 899.035468] WARNING: at lib/list_debug.c:30 __list_add+0x44/0x5a()
>
> (the list corruption triggered by this failure must be another issue)
>
> I'm still trying to figure out if there is some conflict in command
> sequencing with the 0x1f GET_RSSI command submitted upon the timeout,
> and 0xb, which seems to be submitted by lbs_get_wireless_stats.
> (Unfortunately, enabling debug messages seems to avoid the issue.)
>
> We're also on 2.6.35; newer kernels don't submit the GET_RSSI command
> so we'll be sure to test the latest code as well.
>
> In the meantime, lbs_tx_timeout() seems a bit suspect. It would be
> good to get some eyes on it.
>
> I don't understand what this does:
> dev->trans_start = jiffies; /* prevent tx timeout */

Yeah, I have no idea what's going on there; that code has been there for
a while, I think. This part might be due to a rewrite. The tx_feedback
stuff is purely for radiotap, which we used to use for monitoring and
other stuff. That code was always a bit questionable to me; perhaps
there are better ways of doing this now? I haven't looked at what other
drivers do.

But the core issue is: what should we do when the card fails to TX a
frame within the timeout? Is the card really dead? Or does it just
need more time?

Dan
> And the work done by lbs_send_tx_feedback() seems odd (we RX a
> being-transmitted packet? Can't see any other driver that does this)
>
> Is calling lbs_host_to_card_done() here likely to screw with any
> pending commands?
>
> Finally, how are TX timeouts detected by the network layer? I guess it
> could be confused because of time elapsed during suspend? It seems
> suspect that we receive a timeout immediately upon resume.
>
> Thanks,
> Daniel
>
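On the questions of how the network layer detects a TX timeout and what
the trans_start assignment does, here is a rough userspace model of the
check as I understand the core watchdog to perform it: the timeout fires
only when the queue is stopped and the last transmit start is older than
watchdog_timeo. Everything named model_* is invented for illustration and
is not kernel or libertas code; the 500-jiffy timeout is an assumed value,
not one taken from the driver. The model does not answer whether time
spent in suspend counts toward the age of trans_start.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few net_device fields the check reads. */
struct model_netdev {
    unsigned long trans_start;    /* jiffies at the last transmit start */
    unsigned long watchdog_timeo; /* allowed stall, in jiffies (assumed) */
    bool queue_stopped;           /* driver has stopped the TX queue */
};

/* Wrap-safe "a is after b", the same idea as the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Timeout condition: queue stopped AND the deadline has passed. */
static bool model_tx_timed_out(const struct model_netdev *dev,
                               unsigned long now)
{
    return dev->queue_stopped &&
           model_time_after(now, dev->trans_start + dev->watchdog_timeo);
}

/* Walk through the cases discussed in the thread. */
static void model_demo(void)
{
    struct model_netdev dev = {
        .trans_start = 1000,
        .watchdog_timeo = 500,
        .queue_stopped = true,
    };

    assert(!model_tx_timed_out(&dev, 1400)); /* still inside the window */
    assert(model_tx_timed_out(&dev, 1501));  /* stopped for too long */

    dev.trans_start = 1501;                  /* the "prevent tx timeout" reset */
    assert(!model_tx_timed_out(&dev, 1501)); /* deadline pushed out again */
    assert(!model_tx_timed_out(&dev, 2001));
    assert(model_tx_timed_out(&dev, 2002));

    dev.queue_stopped = false;               /* a running queue never times out */
    assert(!model_tx_timed_out(&dev, 5000));
}
```

In this model, writing the current time into trans_start only restarts
the clock: if the queue stays stopped, the timeout comes back one
watchdog_timeo later, which would be consistent with the repeated
"tx watch dog timeout" lines in the log above.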