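Preface
This article mainly covers the kernel implementation of FreeBSD locks. It is just a set of informal notes from a few years ago; I hope it is helpful. Lock implementation is tied to process scheduling, so anyone interested should analyze the two together. Once you connect these pieces, you find that an operating system is a philosophical system rather than an insurmountable chasm. The reason China has no mature operating system of its own is only that there has been no 0-to-N accumulation in this area, and no soil in which a domestic operating system could grow.
Main text
propagate_priority is called from turnstile_wait (in most cases where propagate_priority is reached, the thread holding the lock is itself blocked). Internally it looks up who holds the lock (a blocking lock: mutex, rw, rm). If the holder turns out to be asleep, that is, waiting on a sleep lock (sx, lockmgr) or in sleep(), it panics immediately, reporting that a sleeping thread owns a non-sleepable lock (the lock here being the blocking lock above).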
Implementation of propagate_priority(td) (excerpt):
ts = td->td_blocked; /* the turnstile the current thread is blocked on */
pri = td->td_priority;
for (;;){
td = ts->ts_owner; /* the thread that currently owns this turnstile's lock */
/*
* If this thread already has higher priority than the
* thread that is being blocked, we are finished.
*/
if (td->td_priority <= pri) {
thread_unlock(td);
return;
}
/*
* Bump this thread's priority.
*/
sched_lend_prio(td, pri); /* raise the priority of the thread holding this lock */
/*
* If lock holder is actually running or on the run queue
* then we are done.
*/
if (TD_IS_RUNNING(td) || TD_ON_RUNQ(td)) { /* the thread is running or on the run queue: leave the function */
MPASS(td->td_blocked == NULL);
thread_unlock(td);
return;
}
/*
* Pick up the lock that td is blocked on.
*/
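ts = td->td_blocked; /* the turnstile that the current lock owner is itself blocked on.
 * This acts like the "next" pointer of a linked list: the priority of the
 * thread that called propagate_priority is passed up along each thread's
 * td_blocked (which points to a turnstile) as far as possible. If the
 * calling thread has a high priority, every thread standing in its way
 * gets boosted. */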
/* Resort td on the list if needed. */
if (!turnstile_adjust_thread(ts, td)) { /* sched_lend_prio() above has raised this thread's priority, and the thread is itself blocked, so it must be re-sorted on the turnstile it is blocked on */
mtx_unlock_spin(&ts->ts_lock);
return;
}
}
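The idea behind propagate_priority: start from the turnstile of the lock in question. If the thread holding the lock (ts_owner) has a lower priority, lend it ours; if the holder already has a high enough priority, there is no need to go on and the function returns.
After lending the priority, if the holder is running or on the run queue, the loan is all that is needed. If it is not on the run queue, the holder must itself be blocked on some other turnstile (it cannot be on a sleep queue, because a thread holding a blocking mutex must not go to sleep waiting for a sleep lock).
In that case we find the turnstile the holder is blocked on and go round the for loop again from the top. In short, a thread lends its priority to every thread standing in its way.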
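The chain walk described above can be sketched as a small user-space model. This is only an illustration, not the kernel code: struct model_thread, struct model_turnstile and model_propagate_priority are hypothetical names, all locking and the sleeping-thread panic are left out, and a plain assignment stands in for sched_lend_prio(). As in FreeBSD, a lower number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct thread and struct turnstile. */
struct model_turnstile;

struct model_thread {
	int prio;                        /* lower value = higher priority */
	int runnable;                    /* 1 if running or on a run queue */
	struct model_turnstile *blocked; /* turnstile this thread waits on */
};

struct model_turnstile {
	struct model_thread *owner;      /* thread holding the lock */
};

/*
 * Lend td's priority to each lock owner along the td->blocked chain.
 * Returns the number of threads whose priority was raised.
 */
static int
model_propagate_priority(struct model_thread *td)
{
	struct model_turnstile *ts = td->blocked;
	int pri = td->prio;
	int bumped = 0;

	while (ts != NULL) {
		td = ts->owner;
		if (td == NULL)          /* no recorded owner: stop */
			break;
		if (td->prio <= pri)     /* owner is already at least as urgent */
			break;
		td->prio = pri;          /* stands in for sched_lend_prio() */
		bumped++;
		if (td->runnable)        /* owner can run: the loan is enough */
			break;
		ts = td->blocked;        /* owner is blocked too: follow the chain */
	}
	return bumped;
}
```

With thread A (priority 50) blocked on a lock owned by B (150), and B blocked on a lock owned by a runnable C (200), calling model_propagate_priority on A leaves both B and C at priority 50.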
The sleepable flag is set only on sleep locks (those listed above).
sched_bind() calls sched_pin(), which does curthread->td_pinned++.
The scheduler then checks whether a thread may be migrated:
#define THREAD_CAN_MIGRATE(td) ((td)->td_pinned == 0)
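A minimal sketch of how a nested pin counter behaves, assuming the counter is simply incremented and decremented. The names pin_thread, pin_thread_pin, pin_thread_unpin and PIN_CAN_MIGRATE are hypothetical and only mirror the td_pinned check above.

```c
#include <assert.h>

/* Hypothetical thread with only the pin counter. */
struct pin_thread {
	int td_pinned;
};

/* Same test as THREAD_CAN_MIGRATE above, on the model struct. */
#define PIN_CAN_MIGRATE(td)	((td)->td_pinned == 0)

/* Pin: each call adds one level of pinning. */
static void
pin_thread_pin(struct pin_thread *td)
{
	td->td_pinned++;
}

/* Unpin: removes one level; the thread may migrate again at zero. */
static void
pin_thread_unpin(struct pin_thread *td)
{
	assert(td->td_pinned > 0);
	td->td_pinned--;
}
```

Because it is a counter rather than a flag, pins nest: a thread pinned twice only becomes migratable again after two unpins.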
Turnstile:
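The ts_pending list is manipulated by turnstile_signal or turnstile_broadcast; it is a transitional list that holds threads before they are put on the run queue.
td_blocked records which turnstile the thread is blocked on. It is set in turnstile_wait and cleared in turnstile_unpend.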
turnstile_setowner:
ts->ts_owner = owner;
LIST_INSERT_HEAD(&owner->td_contested, ts, ts_link); /* this shows that td_contested heads a list whose entries are turnstiles */
Three kinds of turnstile lists (the entries are turnstiles):
/*
 * There are three different lists of turnstiles as follows. The list
 * connected by ts_link entries is a per-thread list of all the turnstiles
 * attached to locks that we (the owner thread) own. This is used to fixup
 * our priority when a lock is released. The other two lists use the
 * ts_hash entries. The first of these two is the turnstile chain list that
 * a turnstile is on when it is attached to a lock. The second list to use
 * ts_hash is the free list hung off of a turnstile that is attached to a
 * lock.
 */
The lists inside a turnstile (the entries are threads):
/*
 * Each turnstile contains three lists of threads. The two ts_blocked lists
 * are linked list of threads blocked on the turnstile's lock. One list is
 * for exclusive waiters, and the other is for shared waiters. The
 * ts_pending list is a linked list of threads previously awakened by
 * turnstile_signal() or turnstile_wait() that are waiting to be put on
 * the run queue.
 */
turnstile_trywait:
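Hash the lock to find its turnstile chain tc, then walk the list on tc (linked through ts_hash) to see whether the chain already holds a turnstile for this lock; if it does, return it.
If it does not, the chain has no turnstile for this lock yet, so return the current thread's own turnstile, recording ts->ts_lockobj = lock before returning.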
turnstile_wait:
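Here the turnstile chosen by the function above becomes the target turnstile: the thread passed in is put on its ts_blocked list (a list ordered by priority), and the thread's td_blocked is pointed at this ts.
ts_hash entries are added and removed in these places:
1. turnstile_wait: if the lock does not have a turnstile yet, the current thread's turnstile becomes the lock's turnstile and is inserted at the head of the turnstile chain.
If the lock already has a turnstile, the current thread is inserted into that turnstile's ts_blocked queue (through the thread's td_lockq), and the thread's own turnstile, now unneeded, is put on the free list of the lock's turnstile (at turnstile_broadcast time, turnstile structures are taken off this free list and handed back to the threads).
2. turnstile_signal, and turnstile_broadcast in the same way:
both call LIST_REMOVE to take a turnstile off the list it is currently on (n-1 of the removals come off the ts->ts_free list; the last one should be the removal of the lock's turnstile from the turnstile chain).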
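The lend-and-return of turnstiles described above can be modelled roughly as follows. This is a sketch under simplifying assumptions, not the kernel code: don_turnstile, don_thread, don_lock, don_block and don_wakeup are hypothetical names, the hash chain is reduced to a single pointer in the lock, and the free list is a hand-rolled singly linked list instead of the queue(3) macros.

```c
#include <assert.h>
#include <stddef.h>

struct don_turnstile {
	struct don_turnstile *next_free; /* link while on a free list */
	struct don_turnstile *free_list; /* spares donated by later waiters */
	int waiters;                     /* threads blocked on the lock */
};

struct don_thread {
	struct don_turnstile *own;       /* NULL while the thread is blocked */
};

struct don_lock {
	struct don_turnstile *ts;        /* NULL when nobody is blocked */
};

/* Block td on lk, donating td's turnstile. */
static void
don_block(struct don_lock *lk, struct don_thread *td)
{
	struct don_turnstile *mine = td->own;

	assert(mine != NULL);
	td->own = NULL;
	if (lk->ts == NULL) {
		/* First waiter: its turnstile becomes the lock's turnstile. */
		mine->free_list = NULL;
		mine->waiters = 0;
		lk->ts = mine;
	} else {
		/* Later waiter: its turnstile is parked on the free list. */
		mine->next_free = lk->ts->free_list;
		lk->ts->free_list = mine;
	}
	lk->ts->waiters++;
}

/* Wake td, handing it back some turnstile (not necessarily its own). */
static void
don_wakeup(struct don_lock *lk, struct don_thread *td)
{
	struct don_turnstile *ts = lk->ts;

	assert(ts != NULL && ts->waiters > 0 && td->own == NULL);
	ts->waiters--;
	if (ts->free_list != NULL) {
		/* One of the first n-1 wakeups: take a spare. */
		td->own = ts->free_list;
		ts->free_list = td->own->next_free;
		td->own->next_free = NULL;
	} else {
		/* Last wakeup: detach the lock's turnstile itself. */
		td->own = ts;
		lk->ts = NULL;
	}
}
```

In this model a woken thread may end up with a different turnstile structure from the one it donated; what matters is only that every thread leaves with exactly one.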
td_lockq is basically what links together all the threads blocked on the same turnstile.
ts_link:
Through ts_link, the turnstile is placed on the list headed by td_contested in the thread that holds the lock.
|<--------------------------------ts_owner
| |
thread -> td_contested----------------->turnstile for lock1-----(ts_link)------>turnstile for lock2-----------turn for lock3--------
|
( |-->ts_blocked list, linking all the threads blocked on this lock)
|
thread X wait for lock1
|
( | -->use td_lockq)
|
thread Y wait for lock1
/*
 * Adjust the thread's position on a turnstile after its priority has been
 * changed.
*/
A<===>turnstile_adjust_thread(ts,td):
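This function is called when a thread's priority changes. Suppose the priority of thread Y in the diagram above changes (possible paths: sched_prio ---> turnstile_adjust ---> A;
turnstile_wait ---> propagate_priority ---> A).
The function uses td_lockq to find the right position: if Y's priority is now higher than thread X's, Y should be moved in front of X.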
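A minimal sketch of the re-sort step, assuming the blocked queue is kept in priority order with the most urgent thread first. It uses an array of pointers instead of the td_lockq linkage, handles only the case where a priority was raised (the lending case), and the names adj_thread and adj_resort are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter; lower prio value = higher priority. */
struct adj_thread {
	int prio;
};

/*
 * q[] is sorted by ascending prio value except possibly q[idx], whose
 * priority was just raised. Slide it toward the head until the order
 * holds again and return its new index.
 */
static size_t
adj_resort(struct adj_thread **q, size_t idx)
{
	struct adj_thread *td = q[idx];

	while (idx > 0 && q[idx - 1]->prio > td->prio) {
		q[idx] = q[idx - 1]; /* shift the less urgent thread back */
		idx--;
	}
	q[idx] = td;
	return idx;
}
```

With X at priority 100 ahead of Y at 120, lending Y a priority of 80 and calling adj_resort moves Y in front of X. Threads of equal priority keep their relative order, since the loop only passes strictly less urgent threads.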
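About read/write locks:
When a read lock is released: because read locks are shared, no read request blocks merely because other readers hold the lock; multiple readers show up only in the count. So if there is more than one reader, releasing is simple: just decrement the count (even if a writer is waiting, it has to wait until all read locks are released before anything else happens). When the last read lock is released and a writer is waiting, the lock does have a turnstile, so turnstile_broadcast is needed next (if no writer is waiting, the lock is simply set to RW_UNLOCKED).
When a write lock is released, the first step is to check for waiters: if there are none, set the lock to RW_UNLOCKED directly; if there are, a turnstile must exist (whether readers or writers are waiting, something is blocked, so there is a turnstile), so it goes straight to turnstile_broadcast.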
_rw_init_flags:
The read/write lock is initialized to:
rw->rw_lock = RW_UNLOCKED;
#define RW_UNLOCKED RW_READERS_LOCK(0)
#define RW_READERS_LOCK(x) ((x) << RW_READERS_SHIFT | RW_LOCK_READ)
/*
 * The rw_lock field consists of several fields. The low bit (bit 0) indicates
 * if the lock is locked with a read (shared) or write (exclusive) lock.
 * A value of 0 indicates a write lock, and a value of 1 indicates a read
 * lock. Bit 1 is a boolean indicating if there are any threads waiting
 * for a read lock. Bit 2 is a boolean indicating if there are any threads
 * waiting for a write lock. The rest of the variable's definition is
 * dependent on the value of the first bit. For a write lock, it is a
 * pointer to the thread holding the lock, similar to the mtx_lock field of
 * mutexes. For read locks, it is a count of read locks that are held.
 *
 * When the lock is not locked by any thread, it is encoded as a read lock
 * with zero waiters.
 */
#define RW_LOCK_READ 0x01
#define RW_LOCK_READ_WAITERS 0x02
#define RW_LOCK_WRITE_WAITERS 0x04
/* Try to obtain a write lock once. */
#define _rw_write_lock(rw, tid) \
atomic_cmpset_acq_ptr(&(rw)->rw_lock, RW_UNLOCKED, (tid)) /* a read lock with zero readers means nobody holds the lock (see the initialization above) */
/* Release a write lock quickly if there are no waiters. */
#define _rw_write_unlock(rw, tid) \
atomic_cmpset_rel_ptr(&(rw)->rw_lock, (tid), RW_UNLOCKED)
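To make the encoding of the rw_lock word concrete, here is a small decoding sketch. The three flag values and the RW_READERS_LOCK / RW_UNLOCKED macros are copied from the excerpt above; RW_READERS_SHIFT is not shown there, so the value 4 used here is an assumption, and the model_rw_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define RW_LOCK_READ		0x01
#define RW_LOCK_READ_WAITERS	0x02
#define RW_LOCK_WRITE_WAITERS	0x04
#define RW_READERS_SHIFT	4	/* assumption: not in the excerpt */
#define RW_READERS_LOCK(x) \
	((uintptr_t)(x) << RW_READERS_SHIFT | RW_LOCK_READ)
#define RW_UNLOCKED		RW_READERS_LOCK(0)

/* Bit 0 set: the word is a reader count, not an owner pointer. */
static int
model_rw_is_read_mode(uintptr_t v)
{
	return (v & RW_LOCK_READ) != 0;
}

/* Number of read locks held; only meaningful in read mode. */
static uintptr_t
model_rw_reader_count(uintptr_t v)
{
	return v >> RW_READERS_SHIFT;
}

/* Unlocked is "read mode, zero readers, no waiter flags". */
static int
model_rw_is_unlocked(uintptr_t v)
{
	return v == RW_UNLOCKED;
}
```

Under these assumptions RW_UNLOCKED is 0x01, and three readers with a waiting writer encode as 0x35. A write-locked word holds the owner's thread pointer, whose low bits are zero because of alignment, so bit 0 is clear.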
__rw_rlock :
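From the statement "if (RW_CAN_READ(v)) {" in this function we can conclude:
if the rwlock is first read-locked, then a writer starts waiting, and then another thread comes along wanting a read lock, the new reader has to stand aside (it cannot take the read lock: someone wants to write and must be given way, otherwise writers could easily starve). Seen this way, write locks take priority over read locks.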
.....
if (!(v & RW_LOCK_READ_WAITERS)) { /* a writer holds the lock or is waiting for it, so mark the lock as having waiting readers */
if (!atomic_cmpset_ptr(&rw->rw_lock, v,
v | RW_LOCK_READ_WAITERS)) {
turnstile_cancel(ts);
continue;
}
if (LOCK_LOG_TEST(&rw->lock_object, 0))
CTR2(KTR_LOCK, "%s: %p set read waiters flag",
__func__, rw);
}
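The admission rule for new readers can be sketched as a single bit test. This is a simplified model rather than the kernel macro: model_rw_can_read is a hypothetical name, RW_READERS_SHIFT is assumed to be 4, and any other flag bits the real RW_CAN_READ may examine are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define RW_LOCK_READ		0x01
#define RW_LOCK_READ_WAITERS	0x02
#define RW_LOCK_WRITE_WAITERS	0x04
#define RW_READERS_SHIFT	4	/* assumption */
#define RW_READERS_LOCK(x) \
	((uintptr_t)(x) << RW_READERS_SHIFT | RW_LOCK_READ)
#define RW_UNLOCKED		RW_READERS_LOCK(0)

/*
 * A new reader may enter only if the word is in read mode (which
 * includes the unlocked state) and no writer is waiting.
 */
static int
model_rw_can_read(uintptr_t v)
{
	return (v & (RW_LOCK_READ | RW_LOCK_WRITE_WAITERS)) == RW_LOCK_READ;
}
```

So an unlocked lock or one with two readers admits another reader, while the same two readers plus a waiting writer, or a write-locked word, turn the new reader away.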
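_rw_runlock_cookie: when a read lock is released, 1. check whether other readers remain; if not, 2. check whether there are waiters; if there are none, release the lock right away by setting it to RW_UNLOCKED; if there are waiters, 3. check whether readers or writers are waiting: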
x = RW_UNLOCKED;
if (v & RW_LOCK_WRITE_WAITERS) { /* choose which queue's thread(s) to wake; this also shows that waiting writers take priority over waiting readers */
queue = TS_EXCLUSIVE_QUEUE;
x |= (v & RW_LOCK_READ_WAITERS);
} else
queue = TS_SHARED_QUEUE;
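The queue selection above can be wrapped in a standalone function to see what the new lock word looks like. This is a sketch: model_runlock_last and the MODEL_TS_* enumerators are hypothetical names, and RW_READERS_SHIFT is again assumed to be 4.

```c
#include <assert.h>
#include <stdint.h>

#define RW_LOCK_READ		0x01
#define RW_LOCK_READ_WAITERS	0x02
#define RW_LOCK_WRITE_WAITERS	0x04
#define RW_READERS_SHIFT	4	/* assumption */
#define RW_READERS_LOCK(x) \
	((uintptr_t)(x) << RW_READERS_SHIFT | RW_LOCK_READ)
#define RW_UNLOCKED		RW_READERS_LOCK(0)

enum model_queue {
	MODEL_TS_EXCLUSIVE_QUEUE,	/* waiting writers */
	MODEL_TS_SHARED_QUEUE		/* waiting readers */
};

/*
 * Last reader leaving a lock that has waiters: pick the queue to wake
 * and compute the lock word to store. Writers are preferred; when they
 * are chosen, the read-waiters flag is carried over so that the waiting
 * readers are not forgotten.
 */
static uintptr_t
model_runlock_last(uintptr_t v, enum model_queue *queue)
{
	uintptr_t x = RW_UNLOCKED;

	if (v & RW_LOCK_WRITE_WAITERS) {
		*queue = MODEL_TS_EXCLUSIVE_QUEUE;
		x |= (v & RW_LOCK_READ_WAITERS);
	} else
		*queue = MODEL_TS_SHARED_QUEUE;
	return x;
}
```

With both kinds of waiters present, the writers' queue is chosen and the new word is RW_UNLOCKED with the read-waiters bit still set (0x03 under the assumed shift); with only readers waiting, the shared queue is chosen and the word becomes plain RW_UNLOCKED.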
__rw_wunlock_hard:
Reaching this function means that someone must be waiting, because without waiters the direct call to _rw_write_unlock would already have succeeded.
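Why waiters force the slow path can be seen in a user-space model of the two compare-and-set macros. This sketch uses C11 <stdatomic.h> in place of the kernel's atomic_cmpset_acq_ptr / atomic_cmpset_rel_ptr; the function names are hypothetical, the thread id is just an aligned integer, and RW_READERS_SHIFT is assumed to be 4.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RW_LOCK_READ		0x01
#define RW_LOCK_WRITE_WAITERS	0x04
#define RW_READERS_SHIFT	4	/* assumption */
#define RW_READERS_LOCK(x) \
	((uintptr_t)(x) << RW_READERS_SHIFT | RW_LOCK_READ)
#define RW_UNLOCKED		RW_READERS_LOCK(0)

/* Models _rw_write_lock: succeeds only from the unlocked state. */
static int
model_write_trylock(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t expected = RW_UNLOCKED;

	return atomic_compare_exchange_strong_explicit(lock, &expected,
	    tid, memory_order_acquire, memory_order_relaxed);
}

/*
 * Models _rw_write_unlock: succeeds only if the word is exactly tid,
 * that is, no waiter flag has been OR-ed in.
 */
static int
model_write_unlock_fast(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t expected = tid;

	return atomic_compare_exchange_strong_explicit(lock, &expected,
	    RW_UNLOCKED, memory_order_release, memory_order_relaxed);
}
```

Once a waiter has set a flag bit in the word, the word no longer equals tid, the fast unlock fails, and the caller has to fall back to the slow path that deals with the turnstile.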