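MS KB Q186775, "Tips for Windows NT Driver Developers -- Things to Avoid" (Chinese translation available), lists many pitfalls to watch for in driver development. Microsoft only lists these points and does not explain the reasons behind them, so the reasons have to be pieced together from material found online plus one's own analysis. On osronline I saw someone ask about item 20, "Never call IoCompleteRequest while holding a spin lock. It can deadlock your system.", so I am recording the discussion here.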

Problem description:

"I've done exactly this inside my DPC to complete pending IRPs which I store in a list (which is protected by a spinlock). It's not too much of a problem to rewrite, but I'd still like to know what could possibly happen."

Compared with Reply 2, Reply 1's explanation is less convincing, but it still has some reference value.

Reply 1:

"There are two issues here. One: You're holding a spinlock, most commonly your IRP queue spinlock, and you call IoCR. (Or IoCallDriver, which could result in an immediate IoCR.) Completion of the IRP could cause an upper layer's IO completion routine to send you another IRP, before your call to IoCompleteRequest returns -- i.e., before you can drop the spinlock in that path. Your dispatch routine for the new IRP has no idea that it's been called "within" your call to IoCR, so tries to acquire the same spinlock to synch with your IRP state info... deadlock. This is difficult (no, not impossible) to cover via "I know what I'm doing". (One way to cover it, of course, is to build your own "VMS-style spinlock" that can tolerate multiple acquisitions by the same CPU. :-)

"Two - and this is much softer - the doc actually goes much farther, claiming you should never *call out of the driver* while holding ANY spinlock (my emphasis; what the replier means is that you should not call out of your own driver while still holding a spin lock). Now before you say "I don't want to hear it", I agree, this is obviously bogus -- the counterexample of KeReleaseSpinLock comes immediately to mind! The real issue is that they don't want us doing things that can take an unknown, unpredictable, possibly-a-lot-longer-than-expected, etc., period of time, while we hold a spinlock. This seems to me to be a valid concern, and IoCompleteRequest, cruising as it does through any number of drivers' I/O completion routines, seems to me to be in that category."

Reply 2:

"IoCompleteRequest will call some arbitrary completion routines of the upper drivers. These routines can be complex, and calling complex code while holding a spinlock is a bad idea. Also - these routines can decide to submit some IRP (maybe the same IRP) to your driver's dispatch entry points. If these entry points also acquire the same spinlock - you will have a deadlock on the SMP kernel."

This problem most commonly shows up in IRP cancel routines, as in the following example:

VOID PtDriverCancelIRP(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp )
{
    UNREFERENCED_PARAMETER(DeviceObject);

    KdPrint(( "[WENZ] User Message Cancel Irp....\n" ));

    if ( Irp == PeddingIRP)
        PeddingIRP = NULL;

    Irp->IoStatus.Status = STATUS_CANCELLED;
    Irp->IoStatus.Information = 0;
    // BUG: the cancel spin lock is still held here;
    // IoReleaseCancelSpinLock(Irp->CancelIrql) is never called
    IoCompleteRequest(Irp,IO_NO_INCREMENT);
}

...

NTSTATUS
DeviceControl( PDEVICE_OBJECT DeviceObject, PIRP Irp )
{
    ...
    switch ( irpSp->Parameters.DeviceIoControl.IoControlCode )
    {
    ...
    case IOCTL_NOTIFY_STATE:
        Irp->IoStatus.Information = 0;
        Irp->IoStatus.Status = STATUS_PENDING;
        IoMarkIrpPending(Irp);
        PeddingIRP = Irp;
        IoSetCancelRoutine(Irp, PtDriverCancelIRP);

        return STATUS_PENDING;
    ...
    }
    ...
}

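This code can trigger the bug check DRIVER_RETURNED_HOLDING_CANCEL_LOCK. The I/O manager calls a cancel routine with the system-wide cancel spin lock already held, and the routine must release it as soon as possible by calling IoReleaseCancelSpinLock(Irp->CancelIrql). The routine above never releases it, and on top of that calls IoCompleteRequest while still holding it. It should be changed as follows: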

VOID PtDriverCancelIRP(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp )
{
    UNREFERENCED_PARAMETER(DeviceObject);

    KdPrint(( "[WENZ] User Message Cancel Irp....\n" ));

    if ( Irp == PeddingIRP)
        PeddingIRP = NULL;

    // release the cancel spin lock first, restoring the IRQL the I/O manager
    // saved in Irp->CancelIrql, and only then complete the IRP
    IoReleaseCancelSpinLock(Irp->CancelIrql);

    Irp->IoStatus.Status = STATUS_CANCELLED;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp,IO_NO_INCREMENT);
}