BUG() in if_sdio_handle_cmd()

Dan Williams dcbw at redhat.com
Fri Dec 25 14:33:13 EST 2009


On Fri, 2009-12-18 at 22:31 -0800, Deepak Saxena wrote:
> 
> We (OLPC) are working on stress testing of the suspend/resume path and
> in the process hit the "BUG_ON(priv->resp_len[i])" in if_sdio_handle_cmd().
> This can easily be reproduced by a suspend/resume loop while
> running a ping from a host to the unit under test. From my reading
> of this code, this looks like some sort of race where we're getting a
> response before we have completed processing the previous one (as
> far as I can tell, the driver can only handle one response at a
> time). We first saw this issue when the host -> libertas ping
> was a flood but we can also reproduce it using a non-flood ping.

The driver only handles one command at a time because the firmware spec
also says that the firmware only handles one command at a time.  But
this would indicate a problem with the driver lock somewhere, since
priv->resp_len is supposed to be protected by the driver lock.
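
To make the invariant concrete, here is a rough user-space sketch (not
the driver code). A pthread mutex stands in for the driver lock, and the
store function returns -1 at the point where the driver would hit
BUG_ON(priv->resp_len[i]). Only resp_len and that check come from the
report; model_priv, model_store_response, model_consume_response and
the buffer sizes are made-up names for illustration.

```c
/* User-space model only. Apart from resp_len, every name here is
 * hypothetical and not taken from the libertas source. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define RESP_SLOTS    2
#define RESP_BUF_SIZE 64

struct model_priv {
    pthread_mutex_t driver_lock;   /* stands in for the driver lock */
    unsigned char resp_buf[RESP_SLOTS][RESP_BUF_SIZE];
    size_t resp_len[RESP_SLOTS];   /* 0 means the slot is free */
};

void model_init(struct model_priv *priv)
{
    assert(priv != NULL);
    memset(priv, 0, sizeof(*priv));
    pthread_mutex_init(&priv->driver_lock, NULL);
}

/* Producer side: store a response in slot i.
 * Returns 0 on success, -1 if the arguments are invalid or the slot
 * still holds an unconsumed response (the BUG_ON condition). */
int model_store_response(struct model_priv *priv, int i,
                         const void *data, size_t size)
{
    int ret = -1;

    if (i < 0 || i >= RESP_SLOTS || size == 0 || size > RESP_BUF_SIZE)
        return -1;

    pthread_mutex_lock(&priv->driver_lock);
    if (priv->resp_len[i] == 0) {
        memcpy(priv->resp_buf[i], data, size);
        priv->resp_len[i] = size;
        ret = 0;
    }
    pthread_mutex_unlock(&priv->driver_lock);
    return ret;
}

/* Consumer side: copy the response out of slot i and free the slot.
 * out must have room for RESP_BUF_SIZE bytes.
 * Returns the number of bytes copied, 0 if the slot was empty. */
size_t model_consume_response(struct model_priv *priv, int i, void *out)
{
    size_t len;

    if (i < 0 || i >= RESP_SLOTS)
        return 0;

    pthread_mutex_lock(&priv->driver_lock);
    len = priv->resp_len[i];
    if (len != 0) {
        memcpy(out, priv->resp_buf[i], len);
        priv->resp_len[i] = 0;
    }
    pthread_mutex_unlock(&priv->driver_lock);
    return len;
}
```

Storing into the same slot twice without a consume in between makes the
second call return -1. In this model that can only happen if a second
response really arrives before the first one is consumed; if some path
touched resp_len without holding the lock, the check could also fire
spuriously or be missed.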

> If we change the debug level to 0x404000, this BUG() goes away,
> though we then start seeing other issues (tx timeouts and
> warnings due to list corruption when queueing commands). 
> If we set the debug level to either 0x400000 or 0x4000 we
> do hit the bug.
> 
> A bug like this was reported last year by Jeff Sutherland 
> (http://lists.infradead.org/pipermail/libertas-dev/2008-October/001994.html)
> and it appears that there was no resolution in terms of a fix to the driver
> or firmware (at least not one that is publicly posted). Was a
> solution found and not posted or is this still an unresolved
> issue? 

I don't think a solution was found because I couldn't really reproduce
it, but that could be due to a lack of heavy stress-testing of the driver
itself.  The SDIO code is also somewhat less stress-tested than the USB
code, obviously, since it's quite a bit newer and hasn't had millions of
deployed units :(

Can you take a quick look at the locking code around priv->resp_len and
see if anything jumps out at you?  The driver's locking and threading
structure could use a bit of a cleanup surely.

Dan


> The OLPC bug can be seen at http://dev.laptop.org/ticket/9836.
> 
> [  164.040152] Kernel BUG at cb9c970f [verbose debug info unavailable]
> [  164.040152] invalid opcode: 0000 [#1] PREEMPT
> [  164.040152] last sysfs file: /sys/power/state
> [  164.040152] Modules linked in: fuse uinput videobuf_dma_contig videobuf_core mousedev psmouse serio_raw libertas_sdio libertas lib80211 [last unloaded: scsi_wait_scan]
> [  164.040152]
> [  164.040152] Pid: 2649, comm: ksdioirqd/mmc1 Not tainted (2.6.31.6 #1) XO
> [  164.040152] EIP: 0060:[<cb9c970f>] EFLAGS: 00010002 CPU: 0
> [  164.040152] EIP is at if_sdio_interrupt+0x3f7/0x7c8 [libertas_sdio]
> [  164.040152] EAX: 0000075d EBX: caf40000 ECX: ca78c340 EDX: 00000001
> [  164.040152] ESI: ca78c340 EDI: 0000004c EBP: ca3bdf78 ESP: ca3bdf50
> [  164.040152]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [  164.040152] Process ksdioirqd/mmc1 (pid: 2649, ti=ca3bd000 task=cad30500 task.ti=ca3bd000)
> [  164.040152] Stack:
> [  164.040152]  00000000 00000286 ca78c340 0000004c 00000000 00000000 00000000 00000000
> [  164.040152] <0> c99a1318 7fffffff ca3bdfac b0674543 c99bc080 00000000 c99bc214 00000001
> [  164.040152] <0> c99a1318 00000001 00000001 02000000 cac83db4 c99bc080 b067445f ca3bdfe0
> [  164.040152] Call Trace:
> [  164.040152]  [<b0674543>] ? sdio_irq_thread+0xe4/0x1d1
> [  164.040152]  [<b067445f>] ? sdio_irq_thread+0x0/0x1d1
> [  164.040152]  [<b0432849>] ? kthread+0x6d/0x72
> [  164.040152]  [<b04327dc>] ? kthread+0x0/0x72
> [  164.040152]  [<b0403103>] ? kernel_thread_helper+0x7/0x10
> [  164.040152] Code: 98 1d 00 00 e8 46 1c d9 e4 31 d2 8b 4d e0 89 45 dc 8b 45 e0 80 b8 68 0b 00 00 00 0f 94 c2 8d 82 5c 07 00 00 83 7c 81 0c 00 74 04 <0f> 0b eb fe 8b 75 e0 8b 4d e4 89 4c 86 0c 69 c2 08 09 00 00 8b
> [  164.040152] EIP: [<cb9c970f>] if_sdio_interrupt+0x3f7/0x7c8 [libertas_sdio] SS:ESP 0068:ca3bdf50