Java Watchdog Lock: The Java Watchdog Mechanism

Reposted

技术领航舵手 2023-11-14 09:58:16


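A watchdog is exactly what the name suggests. At the company where I interned, the watchdog was a monitoring process: if a monitored process died, the watchdog restarted it.

The Android watchdog works on a similar principle. It posts a message to each monitored thread and measures how long the response takes. On timeout, it kills the system_server process it runs in; zygote then exits as well, and init restarts zygote. So only the Android framework restarts, the kernel is unaffected, and the restart is comparatively fast.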

Borrowing a diagram:

[Image: Java watchdog mechanism, restart]

Let's dig into the code:

1. Startup in SystemServer:

final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
Watchdog.getInstance().start();

2. getInstance() implements the singleton pattern; the first call invokes the Watchdog constructor:

private Watchdog() {
    super("watchdog");
    // Initialize handler checkers for each common thread we want to check.  Note
    // that we are not currently checking the background thread, since it can
    // potentially hold longer running operations with no guarantees about the timeliness
    // of operations there.

    // The shared foreground thread is the main checker.  It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread.  We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));

    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());

    mOpenFdMonitor = OpenFdMonitor.create();

    // See the notes on DEFAULT_TIMEOUT.
    assert DB ||
            DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
}

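The constructor adds a HandlerChecker for the foreground thread, main thread, UI thread, I/O thread, and display thread to the mHandlerCheckers list. The foreground-thread checker is also kept as mMonitorChecker, which is where monitor checks are dispatched. Finally, the constructor registers a BinderThreadMonitor through addMonitor() and creates the open file descriptor monitor (mOpenFdMonitor).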

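3. What the watchdog monitors

Watchdog offers two kinds of monitoring. One uses the monitor() callback to detect deadlock or blocking in a service's critical sections; the other posts a message to detect whether a service's handler thread is blocked. For example, AMS as a service is checked through monitor(), while the system_server threads it runs on are checked by posting messages.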

addMonitor()

addThread()

A service opts into monitor-based checking by implementing the Watchdog Monitor interface itself.
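As a rough illustration of the monitor approach, the sketch below shows a service that opts in by implementing a monitor callback. The `Monitor` interface is a local stand-in modeled on the framework's interface, and `DemoService` and `MonitorDemo` are hypothetical names; none of this is the framework code itself.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the framework's Monitor interface (assumption: a single
// no-argument monitor() method, as called in the article's code).
interface Monitor {
    void monitor();
}

// Hypothetical service that guards its state with one lock.
class DemoService implements Monitor {
    private final Object lock = new Object();
    private int requests;

    void handleRequest() {
        synchronized (lock) {
            requests++;
        }
    }

    int requestCount() {
        synchronized (lock) {
            return requests;
        }
    }

    // Returns only once the service lock can be acquired. If another thread
    // holds the lock forever, this call never returns and the caller can
    // conclude that the service is blocked.
    @Override
    public void monitor() {
        synchronized (lock) {
            // Nothing to do: acquiring the lock is the whole check.
        }
    }
}

class MonitorDemo {
    public static void main(String[] args) {
        List<Monitor> monitors = new ArrayList<>();
        DemoService service = new DemoService();
        monitors.add(service);      // plays the role of addMonitor()
        service.handleRequest();
        for (Monitor m : monitors) {
            m.monitor();            // returns promptly because the lock is free
        }
        System.out.println("all monitors returned");
    }
}
```

The check carries no payload on purpose: the only question is whether the lock can still be taken.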

When the watchdog fires, the restart log carries one of two messages, "Blocked in handler on ......" or "Blocked in monitor", which correspond to the two kinds of blocking.

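4. How the watchdog works

Watchdog is a Thread, so start() ends up invoking run(). The run() function is fairly long.

It first enters an infinite loop and calls

scheduleCheckLocked() on each HandlerChecker to start a check.

Inside this function: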

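1. If there are no monitors and the thread's looper is polling (idle, waiting for messages), mCompleted is set to true and the function returns; the thread cannot be blocked in that state.
2. If mCompleted is false, a check is already in flight, so the function returns.
3. Otherwise, mCompleted is reset to false, the start time is recorded, and postAtFrontOfQueue(this) starts the check.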
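The in-flight logic of steps 2 and 3 can be sketched in plain Java. This is a simplified stand-in, not the framework code: a single-thread `ExecutorService` replaces the Android Handler and Looper, so the check is queued at the back rather than the front, and the polling shortcut of step 1 is left out because an executor has no equivalent. `SimpleChecker` and `SimpleCheckerDemo` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified stand-in for a handler checker.
class SimpleChecker implements Runnable {
    private final ExecutorService target;
    private boolean completed = true;   // true means no check is in flight
    private long startTimeMs;

    SimpleChecker(ExecutorService target) {
        this.target = target;
    }

    synchronized void scheduleCheck() {
        if (!completed) {
            return;                     // a check is already in flight
        }
        completed = false;
        startTimeMs = System.currentTimeMillis();
        target.execute(this);           // the target thread will call run()
    }

    // Runs on the target thread; getting here proves the thread is still
    // draining its queue.
    @Override
    public synchronized void run() {
        completed = true;
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    synchronized long elapsedMs() {
        return System.currentTimeMillis() - startTimeMs;
    }
}

class SimpleCheckerDemo {
    // Returns {completed while the worker was stuck, completed after release}.
    static boolean[] runDemo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        worker.execute(() -> {          // occupy the worker thread
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        SimpleChecker checker = new SimpleChecker(worker);
        checker.scheduleCheck();
        checker.scheduleCheck();        // no-op: the first check is in flight
        boolean whileStuck = checker.isCompleted();
        release.countDown();            // unblock the worker
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { whileStuck, checker.isCompleted() };
    }

    public static void main(String[] args) {
        boolean[] r = runDemo();
        System.out.println("while stuck: " + r[0] + ", after release: " + r[1]);
    }
}
```

While the worker is stuck the flag stays false; once the worker is released it runs the queued check and the flag flips back to true.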

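After postAtFrontOfQueue(this), a thread that is not blocked picks up the message quickly, which means the HandlerChecker's run() gets called on that thread. run() then calls monitor() on each registered service in turn. A monitor() implementation simply tries to take the service's lock: if it can acquire the lock, the service is not blocked.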

@Override
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }

    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}

5. Reporting the failure

On each pass of the loop, evaluateCheckerCompletionLocked() is called to work out how long the checkers have been waiting:

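COMPLETED means nothing is blocked.

WAITING means 0 to 30 seconds have elapsed; keep waiting.

WAITED_HALF means 30 to 59 seconds have elapsed, more than half the timeout; start dumping stack traces through AMS.

At 60 seconds the check is overdue: something is blocked.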
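The thresholds above can be written as a small pure function. This is a sketch under stated assumptions: the 60-second limit comes from the text, while `CompletionState`, `CompletionEvaluator`, `stateOf`, and `worstOf` are hypothetical names, and taking the most severe state across all checkers is an assumption about how the per-checker results are combined.

```java
import java.util.Arrays;
import java.util.List;

// Declared from least to most severe so that ordinal order reflects severity.
enum CompletionState { COMPLETED, WAITING, WAITED_HALF, OVERDUE }

class CompletionEvaluator {
    // 60-second limit taken from the text above.
    static final long TIMEOUT_MS = 60_000L;

    // Classifies one checker from its completed flag and elapsed wait time.
    static CompletionState stateOf(boolean completed, long elapsedMs) {
        if (completed) {
            return CompletionState.COMPLETED;
        }
        if (elapsedMs < TIMEOUT_MS / 2) {
            return CompletionState.WAITING;       // under 30 s: keep waiting
        }
        if (elapsedMs < TIMEOUT_MS) {
            return CompletionState.WAITED_HALF;   // 30 s up to 60 s
        }
        return CompletionState.OVERDUE;           // 60 s or more
    }

    // Assumption: the overall state is the most severe state of any checker.
    static CompletionState worstOf(List<CompletionState> states) {
        CompletionState worst = CompletionState.COMPLETED;
        for (CompletionState s : states) {
            if (s.compareTo(worst) > 0) {
                worst = s;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println(stateOf(false, 10_000));   // prints WAITING
        System.out.println(stateOf(false, 45_000));   // prints WAITED_HALF
        System.out.println(stateOf(false, 60_000));   // prints OVERDUE
        System.out.println(worstOf(Arrays.asList(
                CompletionState.COMPLETED, CompletionState.WAITED_HALF)));  // prints WAITED_HALF
    }
}
```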

The watchdog then collects the blocked services and threads, and writes logs and a DropBox entry.

Finally, it kills the process:

Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
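To tie the pieces together, here is a minimal end-to-end sketch of a watchdog check in plain Java. It rests on several assumptions: a single-thread executor stands in for the monitored handler thread, the timeout is a few hundred milliseconds instead of 60 seconds, the wait is a simple polling sleep, and the check only reports the result instead of killing the process. `MiniWatchdog` and `MiniWatchdogDemo` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class MiniWatchdog {
    private final ExecutorService target;
    private final long timeoutMs;

    MiniWatchdog(ExecutorService target, long timeoutMs) {
        this.target = target;
        this.timeoutMs = timeoutMs;
    }

    // Performs one check. Returns true if the target thread answered in time,
    // false if it stayed silent for the full timeout. A real watchdog would
    // log diagnostics and kill its process in the second case.
    boolean checkOnce() {
        AtomicBoolean completed = new AtomicBoolean(false);
        long startNs = System.nanoTime();
        target.execute(() -> completed.set(true));
        while (!completed.get()) {
            long elapsedMs = (System.nanoTime() - startNs) / 1_000_000L;
            if (elapsedMs >= timeoutMs) {
                return false;           // overdue
            }
            try {
                Thread.sleep(10);       // poll again shortly
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}

class MiniWatchdogDemo {
    // Returns {result with a healthy worker, result with a stuck worker}.
    static boolean[] runDemo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        MiniWatchdog watchdog = new MiniWatchdog(worker, 300);
        boolean healthy = watchdog.checkOnce();

        CountDownLatch release = new CountDownLatch(1);
        worker.execute(() -> {          // simulate a thread stuck on a lock
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boolean stuck = watchdog.checkOnce();

        release.countDown();            // let the worker finish and exit
        worker.shutdown();
        return new boolean[] { healthy, stuck };
    }

    public static void main(String[] args) {
        boolean[] r = runDemo();
        System.out.println("healthy worker passed: " + r[0]);
        System.out.println("stuck worker passed: " + r[1]);
    }
}
```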

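6. Rebooting on broadcast

init() also calls registerReceiver() to register a BroadcastReceiver for system reboot requests. When the reboot broadcast arrives, RebootRequestReceiver's onReceive() runs and in turn calls rebootSystem() to reboot the system. This lets other modules (such as CTS) reboot the system by sending a broadcast. So another important job of the watchdog is to receive this broadcast and reboot the system.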
