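A watchdog is exactly what the name suggests. At a company where I interned, the watchdog monitored a process and restarted it whenever it died.
The Android Watchdog works on a similar principle: it posts a message to each monitored thread and measures how long the reply takes. On timeout it kills system_server, which takes zygote down with it, and init then restarts zygote. Only the Android framework is restarted and the kernel is untouched, so recovery is relatively fast.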
A borrowed diagram:
Now for the code:
1. Startup in SystemServer:
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
Watchdog.getInstance().start();
2. getInstance() is a singleton accessor; the first call invokes the private Watchdog constructor:
private Watchdog() {
    super("watchdog");
    // Initialize handler checkers for each common thread we want to check. Note
    // that we are not currently checking the background thread, since it can
    // potentially hold longer running operations with no guarantees about the timeliness
    // of operations there.

    // The shared foreground thread is the main checker. It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread. We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));

    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());

    mOpenFdMonitor = OpenFdMonitor.create();

    // See the notes on DEFAULT_TIMEOUT.
    assert DB ||
            DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
}
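The constructor adds checkers for the foreground thread, main thread, UI thread, I/O thread, and display thread to the mHandlerCheckers list. The foreground-thread checker is also kept as mMonitorChecker, which is where monitors registered through addMonitor() end up; the first one is BinderThreadMonitor. An OpenFdMonitor for file descriptors is created as well.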
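To make the singleton-plus-constructor shape concrete, here is a minimal plain-Java sketch. It is not the AOSP code: SingletonSketch and checkerNames() are hypothetical names, the checkers are reduced to their labels, and the synchronized keyword on getInstance() is my own addition for thread safety.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Watchdog: one shared instance, built lazily.
class SingletonSketch {
    private static SingletonSketch instance; // created on first use

    // Labels only; the real list holds HandlerChecker objects.
    private final List<String> checkerNames = new ArrayList<>();

    // Private constructor: callers must go through getInstance().
    private SingletonSketch() {
        // Same labels and order as the constructor shown above.
        checkerNames.add("foreground thread");
        checkerNames.add("main thread");
        checkerNames.add("ui thread");
        checkerNames.add("i/o thread");
        checkerNames.add("display thread");
    }

    // synchronized is an assumption of this sketch, not taken from the source.
    static synchronized SingletonSketch getInstance() {
        if (instance == null) {
            instance = new SingletonSketch();
        }
        return instance;
    }

    List<String> checkerNames() {
        return Collections.unmodifiableList(checkerNames);
    }

    public static void main(String[] args) {
        SingletonSketch first = getInstance();
        SingletonSketch second = getInstance();
        System.out.println("same instance: " + (first == second));
        System.out.println("checkers: " + first.checkerNames());
    }
}
```

Every call after the first returns the same object, so the checker list is built exactly once.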
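3. What Watchdog monitors
Watchdog offers two kinds of checks. One uses the monitor() callback to detect a deadlock or stall inside a service's critical section. The other posts a message to detect whether a service's handler thread is blocked. AMS, for example, is covered both ways: it registers itself as a monitor, and it runs in system_server, whose threads are checked by message.
addMonitor(): registers a monitor() callback
addThread(): registers a handler thread to be checked by message
Monitor-based checking is opt-in: a service takes part by implementing Watchdog's Monitor interface itself.
When the watchdog fires, the log shows one of two messages, "Blocked in handler on ..." or "Blocked in monitor ...", one for each kind of blockage.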
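The monitor() style of check can be sketched in plain Java as follows. This is an illustration under assumptions, not the framework code: SketchMonitor, SketchService, and monitorsReturnWithin() are hypothetical names, and a throwaway probe thread with a timeout stands in for the foreground-thread checker.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class MonitorSketch {
    // Hypothetical stand-in for the Watchdog.Monitor interface.
    interface SketchMonitor {
        void monitor();
    }

    // Hypothetical service whose critical sections are guarded by one lock.
    static class SketchService implements SketchMonitor {
        final Object lock = new Object();

        @Override
        public void monitor() {
            // Take and immediately release the lock. If another thread is stuck
            // while holding it, this call blocks, and that is the signal.
            synchronized (lock) { }
        }
    }

    // Runs all monitors on a probe thread; true if they all returned in time.
    static boolean monitorsReturnWithin(List<SketchMonitor> monitors, long timeoutMs) {
        Thread probe = new Thread(() -> {
            for (SketchMonitor m : monitors) {
                m.monitor();
            }
        });
        probe.setDaemon(true);
        probe.start();
        try {
            probe.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !probe.isAlive();
    }

    // Returns {result with the lock free, result with the lock held elsewhere}.
    static boolean[] demo() {
        SketchService service = new SketchService();
        List<SketchMonitor> monitors = Collections.singletonList(service);
        boolean whenFree = monitorsReturnWithin(monitors, 1000);

        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service.lock) {
                held.countDown();
                try {
                    release.await(); // simulate a thread stuck inside the critical section
                } catch (InterruptedException ignored) {
                }
            }
        });
        holder.setDaemon(true);
        holder.start();
        try {
            held.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean whenHeld = monitorsReturnWithin(monitors, 200);
        release.countDown(); // let the holder and the probe finish
        return new boolean[] {whenFree, whenHeld};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("lock free: " + r[0] + ", lock held: " + r[1]);
    }
}
```

The service does no bookkeeping of its own here: being able to take its lock is the whole health signal, which is why a monitor() body can be a single empty synchronized block.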
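4. How Watchdog works
Watchdog is a Thread, so start() ends up executing run(). The run() method is fairly long.
It first enters an infinite loop and calls
scheduleCheckLocked() on every HandlerChecker to start a round of checks.
Inside that function:
1. If the checker has no monitors and the thread's looper is polling (idle, waiting for messages), the thread cannot be blocked, so mCompleted is set to true and the function returns.
2. If mCompleted is false, a check is already in flight, so the function returns.
3. Otherwise it sets mCompleted to false and calls postAtFrontOfQueue(this) to start a check.
After postAtFrontOfQueue(), a thread that is not blocked picks up the message quickly and runs the checker's run() method, which in turn calls monitor() on each registered service. A monitor() implementation simply tries to acquire the service's lock: if it gets the lock, nothing is blocked.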
@Override
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }

    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
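The message-based check can be sketched outside Android as well. In the example below a single-thread ExecutorService stands in for the watched Handler thread; that is a substitution, and execute() appends to the back of the queue whereas postAtFrontOfQueue() jumps to the front. Checker, scheduleCheck(), and waitFor() are hypothetical names chosen for this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CheckerSketch {
    // Hypothetical analogue of HandlerChecker; the executor is the watched thread.
    static class Checker implements Runnable {
        private final ExecutorService watched;
        private boolean completed = true;

        Checker(ExecutorService watched) {
            this.watched = watched;
        }

        synchronized void scheduleCheck() {
            if (!completed) {
                return; // a check is already in flight
            }
            completed = false;
            watched.execute(this); // stand-in for postAtFrontOfQueue(this)
        }

        @Override
        public void run() {
            // Reaching this line proves the watched thread still drains its queue.
            synchronized (this) {
                completed = true;
            }
        }

        synchronized boolean isCompleted() {
            return completed;
        }
    }

    // Polls until the check completes or the timeout expires.
    static boolean waitFor(Checker checker, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (checker.isCompleted()) {
                return true;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return checker.isCompleted();
    }

    // Returns {healthy check completed, blocked check timed out, check completed after release}.
    static boolean[] demo() {
        ExecutorService watched = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "watched");
            t.setDaemon(true);
            return t;
        });
        Checker checker = new Checker(watched);

        checker.scheduleCheck();
        boolean healthy = waitFor(checker, 1000);

        // Occupy the watched thread so the next check cannot run.
        CountDownLatch release = new CountDownLatch(1);
        watched.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException ignored) {
            }
        });
        checker.scheduleCheck();
        boolean timedOut = !waitFor(checker, 200);

        release.countDown();
        boolean recovered = waitFor(checker, 1000);
        watched.shutdown();
        return new boolean[] {healthy, timedOut, recovered};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("healthy=" + r[0] + " timedOut=" + r[1] + " recovered=" + r[2]);
    }
}
```

The completed flag plays the role of mCompleted: it is cleared when a check is posted and set only by code that runs on the watched thread, so a flag that stays false means the thread never got around to the message.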
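5. Reporting the failure
In each round, evaluateCheckerCompletionLocked() works out a state from how long the checkers have been waiting:
COMPLETED: nothing is blocked.
WAITING: 0 to 30 seconds have passed; keep waiting.
WAITED_HALF: 30 to 60 seconds have passed, more than half the timeout; start dumping the AMS stack trace.
OVERDUE: 60 seconds have passed; something is blocked.
Watchdog then collects the blocked services and threads, and writes the log and a DropBox entry.
Finally it kills the process: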
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
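The timing thresholds in this section can be written down as a small pure function. This is a sketch under assumptions: the 60-second timeout comes from the text, the state names follow the ones used in the framework as far as I know, and stateOf() and worstOf() are hypothetical helpers rather than the framework's methods.

```java
import java.util.Arrays;
import java.util.List;

class CompletionStateSketch {
    // Ordered so that a worse state has a larger ordinal.
    enum State { COMPLETED, WAITING, WAITED_HALF, OVERDUE }

    static final long TIMEOUT_MS = 60_000; // the 60 s threshold from the text

    // State of one checker, given whether it finished and how long it has waited.
    static State stateOf(boolean completed, long elapsedMs) {
        if (completed) {
            return State.COMPLETED;
        }
        if (elapsedMs < TIMEOUT_MS / 2) {
            return State.WAITING;      // under 30 s: keep waiting
        }
        if (elapsedMs < TIMEOUT_MS) {
            return State.WAITED_HALF;  // 30 s to 60 s: time to dump stack traces
        }
        return State.OVERDUE;          // 60 s or more: treated as blocked
    }

    // The overall state is the worst state among all checkers.
    static State worstOf(List<State> states) {
        State worst = State.COMPLETED;
        for (State s : states) {
            if (s.ordinal() > worst.ordinal()) {
                worst = s;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        List<State> states = Arrays.asList(
                stateOf(true, 0),
                stateOf(false, 10_000),
                stateOf(false, 45_000));
        System.out.println(states + " -> " + worstOf(states));
    }
}
```

Taking the worst state across checkers means a single stuck thread is enough to move the whole round to WAITED_HALF or OVERDUE, even if every other thread answered immediately.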
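6. Rebooting on broadcast
init() also calls registerReceiver() to register a BroadcastReceiver for reboot requests. When the reboot broadcast arrives, RebootRequestReceiver's onReceive() runs and calls rebootSystem() to reboot the device. This lets other modules (CTS, for example) reboot the system by sending a broadcast. So one more important job of Watchdog is to receive that broadcast and reboot the system.
Another borrowed diagram: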