--- zzzz-none-000/linux-3.10.107/Documentation/scsi/scsi_eh.txt 2017-06-27 09:49:32.000000000 +0000
+++ scorpion-7490-727/linux-3.10.107/Documentation/scsi/scsi_eh.txt 2021-02-04 17:41:59.000000000 +0000
@@ -42,20 +42,14 @@
 
  Once LLDD gets hold of a scmd, either the LLDD will complete the
 command by calling scsi_done callback passed from midlayer when
-invoking hostt->queuecommand() or SCSI midlayer will time it out.
+invoking hostt->queuecommand() or the block layer will time it out.
 
 
 [1-2-1] Completing a scmd w/ scsi_done
 
  For all non-EH commands, scsi_done() is the completion callback. It
-does the following.
-
- 1. Delete timeout timer. If it fails, it means that timeout timer
-    has expired and is going to finish the command. Just return.
-
- 2. Link scmd to per-cpu scsi_done_q using scmd->en_entry
-
- 3. Raise SCSI_SOFTIRQ
+just calls blk_complete_request() to delete the block layer timer and
+raise SCSI_SOFTIRQ
 
  SCSI_SOFTIRQ handler scsi_softirq calls scsi_decide_disposition() to
 determine what to do with the command. scsi_decide_disposition()
@@ -64,10 +58,12 @@
 
     - SUCCESS
         scsi_finish_command() is invoked for the command. The
-        function does some maintenance choirs and notify completion by
-        calling scmd->done() callback, which, for fs requests, would
-        be HLD completion callback - sd:sd_rw_intr, sr:rw_intr,
-        st:st_intr.
+        function does some maintenance chores and then calls
+        scsi_io_completion() to finish the I/O.
+        scsi_io_completion() then notifies the block layer on
+        the completed request by calling blk_end_request and
+        friends or figures out what to do with the remainder
+        of the data in case of an error.
 
     - NEEDS_RETRY
     - ADD_TO_MLQUEUE
@@ -86,33 +82,45 @@
  1. invokes optional hostt->eh_timed_out() callback. Return value can
     be one of
 
-        - EH_HANDLED
-                This indicates that eh_timed_out() dealt with the timeout. The
-                scmd is passed to __scsi_done() and thus linked into per-cpu
-                scsi_done_q. Normal command completion described in [1-2-1]
-                follows.
+        - BLK_EH_HANDLED
+                This indicates that eh_timed_out() dealt with the timeout.
+                The command is passed back to the block layer and completed
+                via __blk_complete_requests().
+
+                *NOTE* After returning BLK_EH_HANDLED the SCSI layer is
+                assumed to be finished with the command, and no other
+                functions from the SCSI layer will be called. So this
+                should typically only be returned if the eh_timed_out()
+                handler raced with normal completion.
 
-        - EH_RESET_TIMER
+        - BLK_EH_RESET_TIMER
                 This indicates that more time is required to finish the
                 command. Timer is restarted. This action is counted as a
                 retry and only allowed scmd->allowed + 1(!) times. Once the
-                limit is reached, action for EH_NOT_HANDLED is taken instead.
+                limit is reached, action for BLK_EH_NOT_HANDLED is taken instead.
 
-                *NOTE* This action is racy as the LLDD could finish the scmd
-                after the timeout has expired but before it's added back. In
-                such cases, scsi_done() would think that timeout has occurred
-                and return without doing anything. We lose completion and the
-                command will time out again.
-
-        - EH_NOT_HANDLED
-                This is the same as when eh_timed_out() callback doesn't exist.
+        - BLK_EH_NOT_HANDLED
+                eh_timed_out() callback did not handle the command.
                 Step #2 is taken.
 
+ 2. If the host supports asynchronous completion (as indicated by the
+    no_async_abort setting in the host template) scsi_abort_command()
+    is invoked to schedule an asynchrous abort. If that fails
+    Step #3 is taken.
+
  2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
     command. See [1-3] for more information.
 
+[1-3] Asynchronous command aborts
+
+ After a timeout occurs a command abort is scheduled from
+ scsi_abort_command(). If the abort is successful the command
+ will either be retried (if the number of retries is not exhausted)
+ or terminated with DID_TIME_OUT.
+ Otherwise scsi_eh_scmd_add() is invoked for the command.
+ See [1-4] for more information.
 
-[1-3] How EH takes over
+[1-4] How EH takes over
 
  scmds enter EH via scsi_eh_scmd_add(), which does the following.
 
@@ -164,7 +172,7 @@
 
  - eh_strategy_handler() callback
         This is one big callback which should perform whole error
-        handling. As such, it should do all choirs SCSI midlayer
+        handling. As such, it should do all chores the SCSI midlayer
         performs during recovery. This will be discussed in [2-2].
 
  Once recovery is complete, SCSI EH resumes normal operation by
@@ -324,7 +332,8 @@
 
     <>
 
-        This action is taken for each timed out command.
+        This action is taken for each timed out command when
+        no_async_abort is enabled in the host template.
         hostt->eh_abort_handler() is invoked for each scmd. The
         handler returns SUCCESS if it has succeeded to make LLDD and
         all related hardware forget about the scmd.
@@ -423,7 +432,7 @@
 scsi_unjam_host() and it is responsible for whole recovery process.
 On completion, the handler should have made lower layers forget about
 all failed scmds and either ready for new commands or offline. Also,
-it should perform SCSI EH maintenance chores to maintain integrity of
+it should perform SCSI EH maintenance chores to maintain integrity of
 SCSI midlayer. IOW, of the steps described in [2-1-2], all steps
 except for #1 must be implemented by eh_strategy_handler().
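
The timeout path that the patched text describes (eh_timed_out() return
value, then an asynchronous abort, then scsi_eh_scmd_add()) can be
sketched as a small standalone C model. This is only an illustration
under stated assumptions, not kernel code: every sketch_* type, field and
function below is hypothetical, the retry counter is assumed to start at
zero, and the mapping from the host template's no_async_abort setting to
the async_abort_enabled field is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BLK_EH_*; NOT the kernel definitions. */
enum sketch_eh_ret {
    SKETCH_EH_NOT_HANDLED,
    SKETCH_EH_HANDLED,
    SKETCH_EH_RESET_TIMER
};

/* What the model decides to do with a timed out command. */
enum sketch_action {
    SKETCH_COMPLETE,      /* handed back to the block layer and completed */
    SKETCH_TIMER_RESTART, /* timer restarted, counted as a retry */
    SKETCH_ASYNC_ABORT,   /* asynchronous abort scheduled */
    SKETCH_ENTER_EH       /* command handed to EH (scsi_eh_scmd_add() step) */
};

struct sketch_cmd {
    int retries; /* timer restarts used so far (assumed to start at 0) */
    int allowed; /* models scmd->allowed */
};

struct sketch_host {
    bool has_timed_out_cb;            /* optional eh_timed_out() exists */
    enum sketch_eh_ret timed_out_ret; /* value that callback would return */
    bool async_abort_enabled;         /* assumption: host uses async aborts */
    bool abort_schedule_ok;           /* assumption: scheduling succeeds */
};

enum sketch_action sketch_times_out(const struct sketch_host *host,
                                    struct sketch_cmd *cmd)
{
    enum sketch_eh_ret ret = SKETCH_EH_NOT_HANDLED;

    assert(host != NULL && cmd != NULL);

    /* Step 1: consult the optional callback. */
    if (host->has_timed_out_cb)
        ret = host->timed_out_ret;

    if (ret == SKETCH_EH_HANDLED)
        return SKETCH_COMPLETE;

    if (ret == SKETCH_EH_RESET_TIMER) {
        /* Only allowed "allowed + 1" times in total. */
        if (cmd->retries <= cmd->allowed) {
            cmd->retries++;
            return SKETCH_TIMER_RESTART;
        }
        /* Limit reached: treated like the not-handled case. */
    }

    /* Next step: try to schedule an asynchronous abort. */
    if (host->async_abort_enabled && host->abort_schedule_ok)
        return SKETCH_ASYNC_ABORT;

    /* Last step: fall back to full error handling. */
    return SKETCH_ENTER_EH;
}
```

With allowed set to 1, a host whose callback always asks for a timer
reset gets two restarts from this model before the command moves on to
the abort or EH step, which matches the "scmd->allowed + 1" wording in
the text.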