Skip to content

Commit 273e129

Browse files
FWei-HWgregkh
authored andcommitted
scsi: fix race between simultaneous decrements of ->host_failed
commit 72d8c36ec364c82bf1bf0c64dfa1041cfaf139f7 upstream. sas_ata_strategy_handler() adds the works of the ata error handler to system_unbound_wq. This workqueue asynchronously runs work items, so the ata error handler will be performed concurrently on different CPUs. In this case, ->host_failed will be decreased simultaneously in scsi_eh_finish_cmd() on different CPUs, and become abnormal. It will lead to permanently inequality between ->host_failed and ->host_busy, and scsi error handler thread won't start running. IO errors after that won't be handled. Since all scmds must have been handled in the strategy handler, just remove the decrement in scsi_eh_finish_cmd() and zero ->host_busy after the strategy handler to fix this race. Fixes: 50824d6 ("[SCSI] libsas: async ata-eh") Signed-off-by: Wei Fang <fangwei1@huawei.com> Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1 parent cf2a2c6 commit 273e129

3 files changed

Lines changed: 10 additions & 4 deletions

File tree

Documentation/scsi/scsi_eh.txt

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -263,19 +263,23 @@ scmd->allowed.
263263

264264
3. scmd recovered
265265
ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
266-
- shost->host_failed--
267266
- clear scmd->eh_eflags
268267
- scsi_setup_cmd_retry()
269268
- move from local eh_work_q to local eh_done_q
270269
LOCKING: none
270+
CONCURRENCY: at most one thread per separate eh_work_q to
271+
keep queue manipulation lockless
271272

272273
4. EH completes
273274
ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
274-
layer of failure.
275+
layer of failure. May be called concurrently but must have
276+
a no more than one thread per separate eh_work_q to
277+
manipulate the queue locklessly
275278
- scmd is removed from eh_done_q and scmd->eh_entry is cleared
276279
- if retry is necessary, scmd is requeued using
277280
scsi_queue_insert()
278281
- otherwise, scsi_finish_command() is invoked for scmd
282+
- zero shost->host_failed
279283
LOCKING: queue or finish function performs appropriate locking
280284

281285

drivers/ata/libata-eh.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -606,7 +606,7 @@ void ata_scsi_error(struct Scsi_Host *host)
606606
ata_scsi_port_error_handler(host, ap);
607607

608608
/* finish or retry handled scmd's and clean up */
609-
WARN_ON(host->host_failed || !list_empty(&eh_work_q));
609+
WARN_ON(!list_empty(&eh_work_q));
610610

611611
DPRINTK("EXIT\n");
612612
}

drivers/scsi/scsi_error.c

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1127,7 +1127,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
11271127
*/
11281128
void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
11291129
{
1130-
scmd->device->host->host_failed--;
11311130
scmd->eh_eflags = 0;
11321131
list_move_tail(&scmd->eh_entry, done_q);
11331132
}
@@ -2226,6 +2225,9 @@ int scsi_error_handler(void *data)
22262225
else
22272226
scsi_unjam_host(shost);
22282227

2228+
/* All scmds have been handled */
2229+
shost->host_failed = 0;
2230+
22292231
/*
22302232
* Note - if the above fails completely, the action is to take
22312233
* individual devices offline and flush the queue of any

0 commit comments

Comments
 (0)