The idea is to take some real bugs which have been present at some point in Python's asyncio lock and model the problems and the fixes iteratively.
step1.tla models this old version of Python's asyncio lock. The expectation was to get one of the issues mentioned in https://bugs.python.org/issue27585 but instead we get the following error trace:
1: Init
/\ waiters = <<>>
/\ lockState = unlocked
/\ taskStates = (Task1 :> running @@ Task2 :> running @@ Task3 :> running)
2: Cancel
/\ waiters = <<>>
/\ lockState = unlocked
/\ taskStates = (Task1 :> running @@ Task2 :> canceled @@ Task3 :> running)
3: Acquire
/\ waiters = <<>>
/\ lockState = locked
/\ taskStates = (Task1 :> acquired @@ Task2 :> canceled @@ Task3 :> running)
4: Cancel
/\ waiters = <<>>
/\ lockState = locked
/\ taskStates = (Task1 :> canceled @@ Task2 :> canceled @@ Task3 :> running)
5: Acquire
/\ waiters = <<Task3>>
/\ lockState = locked
/\ taskStates = (Task1 :> canceled @@ Task2 :> canceled @@ Task3 :> waiting)
6: Stuttering
What this shows is that we can get a deadlock by cancelling the task which holds the lock. This is something to be aware of, but from the asyncio library's perspective, there is nothing which can be done. Libraries using structured concurrency approaches like trio seem like a natural fit to handle cases like this (python-trio/trio#182).
The bug we where originally interested in happens when a waiting task is
canceled. To get to this issue, in step2.tla we restrict the
Cancel
action to only cancel tasks which are in the waiting
state:
Cancel(task) ==
+ \* Only cancel waiting tasks.
+ /\ taskStates[task] = waiting
/\ taskStates' = [taskStates EXCEPT ![task] = canceled]
/\ UNCHANGED <<waiters, lockState>>
This works, the error trace shows the problem:
1: Init
/\ waiters = <<>>
/\ lockState = unlocked
/\ taskStates = (Task1 :> running @@ Task2 :> running @@ Task3 :> running)
2: Acquire
/\ waiters = <<>>
/\ lockState = locked
/\ taskStates = (Task1 :> running @@ Task2 :> acquired @@ Task3 :> running)
3: Acquire
/\ waiters = <<Task1>>
/\ lockState = locked
/\ taskStates = (Task1 :> waiting @@ Task2 :> acquired @@ Task3 :> running)
4: Cancel
/\ waiters = <<Task1>>
/\ lockState = locked
/\ taskStates = (Task1 :> canceled @@ Task2 :> acquired @@ Task3 :> running)
5: Release
/\ waiters = <<Task1>>
/\ lockState = unlocked
/\ taskStates = (Task1 :> canceled @@ Task2 :> done @@ Task3 :> running)
6: Acquire
/\ waiters = <<Task1, Task3>>
/\ lockState = unlocked
/\ taskStates = (Task1 :> canceled @@ Task2 :> done @@ Task3 :> waiting)
7: Stuttering
The bug is that the lock can't be acquired when there are waiters in the queue, even if they have been all canceled. The fix is to simply account for this case python/asyncio#393:
From bbe53f381fbade4f8c1d04b2a463270710f1392a Mon Sep 17 00:00:00 2001
From: Guido van Rossum <guido@python.org>
Date: Thu, 4 Aug 2016 09:12:07 -0700
Subject: [PATCH] Avoid deadlock when a cancelled future is in self._waiters.
---
asyncio/locks.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/asyncio/locks.py b/asyncio/locks.py
index 741aaf27..deefc938 100644
--- a/asyncio/locks.py
+++ b/asyncio/locks.py
@@ -166,7 +166,7 @@ def acquire(self):
This method blocks until the lock is unlocked, then sets it to
locked and returns True.
"""
- if not self._waiters and not self._locked:
+ if not self._locked and all(w.cancelled() for w in self._waiters):
self._locked = True
return True
We do the same in step3.tla
Acquire(task) ==
\* If a waiting task was woken up, it will be next. Guard this action so that no
\* other task can acquire the lock in this case.
/\ \A t \in Tasks: taskStates[t] /= wakeup
/\ taskStates[task] = running
- /\ IF waiters = <<>> /\ lockState = unlocked THEN
+ /\ IF lockState = unlocked /\ \A t \in Range(waiters): taskStates[t] = canceled THEN
/\ taskStates' = [taskStates EXCEPT ![task] = acquired]
/\ lockState' = locked
/\ UNCHANGED waiters
This seems to fix the issue, we don't get an error anymore.
There is another bug mentioned in https://bugs.python.org/issue43991, which
happens when a task is canceled immediately after it was woken up. In this case
the lock is not released after release
was called and no other task is woken
up, which means all other waiting tasks will wait forever.
The issue is easy to reproduce with a small change in step4.tla:
Cancel(task) ==
- \* Only cancel waiting tasks.
- /\ taskStates[task] = waiting
+ \* Cancel waiting tasks and tasks just after they have been woken up.
+ /\ taskStates[task] \in {waiting, wakeup}
/\ taskStates' = [taskStates EXCEPT ![task] = canceled]
/\ UNCHANGED <<waiters, lockState>>
Trace:
1: Init
/\ waiters = <<>>
/\ taskStates = (Task1 :> running @@ Task2 :> running @@ Task3 :> running)
/\ lockState = unlocked
2: Acquire
/\ waiters = <<>>
/\ taskStates = (Task1 :> running @@ Task2 :> running @@ Task3 :> acquired)
/\ lockState = locked
3: Acquire
/\ waiters = <<Task2>>
/\ taskStates = (Task1 :> running @@ Task2 :> waiting @@ Task3 :> acquired)
/\ lockState = locked
4: Acquire
/\ waiters = <<Task2, Task1>>
/\ taskStates = (Task1 :> waiting @@ Task2 :> waiting @@ Task3 :> acquired)
/\ lockState = locked
5: Release
/\ waiters = <<Task2, Task1>>
/\ taskStates = (Task1 :> waiting @@ Task2 :> wakeup @@ Task3 :> done)
/\ lockState = unlocked
6: Cancel
/\ waiters = <<Task2, Task1>>
/\ taskStates = (Task1 :> waiting @@ Task2 :> canceled @@ Task3 :> done)
/\ lockState = unlocked
7: Stuttering
The fix is again very simple, but to model it we would have to change our spec so that we know what state the task was in when it was canceled. One way would be to use structures for tasks like
[state |-> waiting, canceled |-> TRUE]