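Background

If you work as an Android system (framework) engineer, you will very likely need to monitor jank across every app on the device, so that you can help app teams fix it and improve the user experience. The main reason is that app developers can rarely catch jank on every single page themselves; with support from the system, the Activities that drop frames can be identified.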

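Background knowledge

Recommended reading:
"Android屏幕刷新机制" (the Android screen refresh mechanism)
"理解Android硬件加速原理的小白文" (a beginner's guide to how Android hardware acceleration works)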

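A rough outline of how one UI frame is drawn

1. Whether the trigger is resume, invalidate, or any other API that refreshes the UI, every path ends up in ViewRootImpl.scheduleTraversals: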

void scheduleTraversals() {
        if (!mTraversalScheduled) {
            mTraversalScheduled = true;
            mTraversalBarrier = mHandler.getLooper().getQueue().postSyncBarrier();
            mChoreographer.postCallback(
                    Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
            if (!mUnbufferedInputDispatch) {
                scheduleConsumeBatchedInput();
            }
            notifyRendererOfFramePending();
            pokeDrawLockIfNeeded();
        }
    }
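
The mTraversalScheduled guard means that repeated refresh requests within one frame collapse into a single traversal. The following is a minimal plain-Java sketch of that idea only. TraversalCoalescer, onFrame, and traversalCount are hypothetical names for illustration, and the sync barrier and input batching that the real ViewRootImpl handles are left out.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the mTraversalScheduled guard; not the real ViewRootImpl. */
class TraversalCoalescer {
    private final Queue<Runnable> nextFrameCallbacks = new ArrayDeque<>();
    private boolean traversalScheduled = false;
    int traversalCount = 0;

    /** Called by every refresh request; only the first call per frame posts a callback. */
    void scheduleTraversals() {
        if (!traversalScheduled) {
            traversalScheduled = true;
            nextFrameCallbacks.add(this::doTraversal);
        }
    }

    private void doTraversal() {
        traversalScheduled = false; // allow scheduling again for the following frame
        traversalCount++;
    }

    /** Simulates the arrival of a frame signal: run and remove the pending callbacks. */
    void onFrame() {
        Runnable r;
        while ((r = nextFrameCallbacks.poll()) != null) {
            r.run();
        }
    }
}
```

Calling scheduleTraversals() three times before one onFrame() results in a single traversal, which is the behavior the flag is there to guarantee.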

2. Inside scheduleTraversals, the most important call is Choreographer.postCallback. According to its doc comment, the callback runs when the next frame signal arrives:

/**
     * Posts a callback to run on the next frame.
     * <p>
     * The callback runs once then is automatically removed.
     * </p>
     */
    @TestApi
    public void postCallback(int callbackType, Runnable action, Object token) {
        postCallbackDelayed(callbackType, action, token, 0);
    }

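3. mTraversalRunnable is what runs once the frame signal arrives: it calls doTraversal(), which (through performTraversals()) runs performMeasure(), performLayout(), and performDraw() in order.
4. Under what conditions do the callbacks added through Choreographer.postCallback actually run? See Choreographer.java: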

public void postCallback(int callbackType, Runnable action, Object token) {
        postCallbackDelayed(callbackType, action, token, 0);
    }

    private void scheduleVsyncLocked() {
        mDisplayEventReceiver.scheduleVsync();
    }

The call is passed down through several layers and finally reaches mDisplayEventReceiver.scheduleVsync(). Here it is in DisplayEventReceiver.java:

/**
     * Schedules a single vertical sync pulse to be delivered when the next
     * display frame begins.
     */
    public void scheduleVsync() {
        if (mReceiverPtr == 0) {
            Log.w(TAG, "Attempted to schedule a vertical sync pulse but the display event "
                    + "receiver has already been disposed.");
        } else {
            nativeScheduleVsync(mReceiverPtr);
        }
    }

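According to the doc comment, this method requests, on behalf of this DisplayEventReceiver instance, a single vsync pulse (think of it as a one-shot callback) to be delivered when the next display frame begins.
5. Because nativeScheduleVsync is registered against this DisplayEventReceiver, its onVsync method is called back when the next frame signal arrives: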

/**
     * Called when a vertical sync pulse is received.
     * The recipient should render a frame and then call {@link #scheduleVsync}
     * to schedule the next vertical sync pulse.
     *
     * @param timestampNanos The timestamp of the pulse, in the {@link System#nanoTime()}
     * timebase.
     * @param builtInDisplayId The surface flinger built-in display id such as
     * {@link SurfaceControl#BUILT_IN_DISPLAY_ID_MAIN}.
     * @param frame The frame number.  Increases by one for each vertical sync interval.
     */
    public void onVsync(long timestampNanos, int builtInDisplayId, int frame) {
    }

6. Choreographer instantiates a FrameDisplayEventReceiver, which overrides onVsync and eventually calls Choreographer.doFrame():

private void postCallbackDelayedInternal(int callbackType,
            Object action, Object token, long delayMillis) {
        synchronized (mLock) {
            final long now = SystemClock.uptimeMillis();
            final long dueTime = now + delayMillis;
            mCallbackQueues[callbackType].addCallbackLocked(dueTime, action, token);
        }
    }

    void doFrame(long frameTimeNanos, int frame) {
        ......
        try {
            Trace.traceBegin(Trace.TRACE_TAG_VIEW, "Choreographer#doFrame");
            AnimationUtils.lockAnimationClock(frameTimeNanos / TimeUtils.NANOS_PER_MS);

            mFrameInfo.markInputHandlingStart();
            doCallbacks(Choreographer.CALLBACK_INPUT, frameTimeNanos);

            mFrameInfo.markAnimationsStart();
            doCallbacks(Choreographer.CALLBACK_ANIMATION, frameTimeNanos);

            mFrameInfo.markPerformTraversalsStart();
            doCallbacks(Choreographer.CALLBACK_TRAVERSAL, frameTimeNanos);

            doCallbacks(Choreographer.CALLBACK_COMMIT, frameTimeNanos);
        } finally {
            AnimationUtils.unlockAnimationClock();
            Trace.traceEnd(Trace.TRACE_TAG_VIEW);
        }
    }

The key collection here is mCallbackQueues. postCallback adds the incoming runnable to it; doFrame calls doCallbacks, which takes the due runnables out of mCallbackQueues and runs them. This is how ViewRootImpl's doTraversal() finally gets executed.
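
To make the queue mechanics concrete, here is a small plain-Java model of per-type callback queues that are drained in a fixed order on each frame. MiniChoreographer is a hypothetical class, the constant values are illustrative rather than copied from AOSP, and due-time handling (dueTime, delayMillis) is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of Choreographer's per-type callback queues; names are illustrative. */
class MiniChoreographer {
    static final int CALLBACK_INPUT = 0;
    static final int CALLBACK_ANIMATION = 1;
    static final int CALLBACK_TRAVERSAL = 2;
    static final int CALLBACK_COMMIT = 3;

    private final List<List<Runnable>> queues = new ArrayList<>();

    MiniChoreographer() {
        for (int i = 0; i <= CALLBACK_COMMIT; i++) {
            queues.add(new ArrayList<>());
        }
    }

    /** Adds a callback to the queue for its type; it will run on the next frame. */
    void postCallback(int callbackType, Runnable action) {
        queues.get(callbackType).add(action);
    }

    /** Runs each queue in type order; every callback runs once and is then removed. */
    void doFrame() {
        for (int type = 0; type <= CALLBACK_COMMIT; type++) {
            List<Runnable> due = new ArrayList<>(queues.get(type));
            queues.get(type).clear();
            for (Runnable r : due) {
                r.run();
            }
        }
    }
}
```

Whatever order the callbacks are posted in, a frame runs input first, then animation, then traversal, then commit, and a second doFrame() with nothing newly posted does no work.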

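The Draw stage under hardware acceleration (GPU rendering)

1. The CPU does the computation: measure and layout both run on the main thread. Each View is abstracted as a RenderNode and handed over for GPU rendering.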
ViewRootImpl.java

private boolean draw(boolean fullRedrawNeeded) {
        ....
        // Key point 1: is hardware acceleration enabled?
        if (mAttachInfo.mThreadedRenderer != null && mAttachInfo.mThreadedRenderer.isEnabled()) {
            ....
            // Key point 2: hardware-accelerated drawing
            mAttachInfo.mThreadedRenderer.draw(mView, mAttachInfo, this, callback);
            ....
        }

GPU rendering is driven by ThreadedRenderer.
2. The draw method in ThreadedRenderer.java:

/**
     * Draws the specified view
     */
    void draw(View view, AttachInfo attachInfo, DrawCallbacks callbacks,
            FrameDrawingCallback frameDrawingCallback) {
        attachInfo.mIgnoreDirtyState = true;

        final Choreographer choreographer = attachInfo.mViewRootImpl.mChoreographer;
        choreographer.mFrameInfo.markDrawStart();

        updateRootDisplayList(view, callbacks);
        ....
        final long[] frameInfo = choreographer.mFrameInfo.mFrameInfo;
        if (frameDrawingCallback != null) {
            nSetFrameCallback(mNativeProxy, frameDrawingCallback);
        }
        int syncResult = nSyncAndDrawFrame(mNativeProxy, frameInfo, frameInfo.length);
        if ((syncResult & SYNC_LOST_SURFACE_REWARD_IF_FOUND) != 0) {
            setEnabled(false);
            attachInfo.mViewRootImpl.mSurface.release();
            // Invalidate since we failed to draw. This should fetch a Surface
            // if it is still needed or do nothing if we are no longer drawing
            attachInfo.mViewRootImpl.invalidate();
        }
        if ((syncResult & SYNC_INVALIDATE_REQUIRED) != 0) {
            attachInfo.mViewRootImpl.invalidate();
        }
    }

    private static native int nSyncAndDrawFrame(long nativeProxy, long[] frameInfo, int size);

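The native method nSyncAndDrawFrame hands the frame over to the native layer for drawing. Depending on the flags in the returned syncResult (the surface was lost, or another invalidate is required), ViewRootImpl.invalidate() is then called to schedule a refresh.

How the native layer takes over the drawing

1. nSyncAndDrawFrame is implemented in frameworks/base/core/jni/android_view_ThreadedRenderer.cpp: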

static int android_view_ThreadedRenderer_syncAndDrawFrame(JNIEnv* env, jobject clazz,
        jlong proxyPtr, jlongArray frameInfo, jint frameInfoSize) {
    LOG_ALWAYS_FATAL_IF(frameInfoSize != UI_THREAD_FRAME_INFO_SIZE,
            "Mismatched size expectations, given %d expected %d",
            frameInfoSize, UI_THREAD_FRAME_INFO_SIZE);
    RenderProxy* proxy = reinterpret_cast<RenderProxy*>(proxyPtr);
    env->GetLongArrayRegion(frameInfo, 0, frameInfoSize, proxy->frameInfo());
    return proxy->syncAndDrawFrame();
}

2. RenderProxy::syncAndDrawFrame, in
frameworks/base/libs/hwui/renderthread/RenderProxy.cpp

int RenderProxy::syncAndDrawFrame() {
    return mDrawFrameTask.drawFrame();
}

3. frameworks/base/libs/hwui/renderthread/DrawFrameTask.cpp

int DrawFrameTask::drawFrame() {
    LOG_ALWAYS_FATAL_IF(!mContext, "Cannot drawFrame with no CanvasContext!");

    mSyncResult = SyncResult::OK;
    mSyncQueued = systemTime(CLOCK_MONOTONIC);
    postAndWait();

    return mSyncResult;
}

void DrawFrameTask::postAndWait() {
    AutoMutex _lock(mLock);
    mRenderThread->queue().post([this]() { run(); });
    mSignal.wait(mLock);
}

void DrawFrameTask::run() {
    ......
    context->draw();
    ......
}
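
The post-and-wait handoff in drawFrame() can be pictured with a rough plain-Java analogue: the caller queues work on a single render thread and blocks until that thread signals. PostAndWaitDemo is a hypothetical class, the single-thread executor and latch merely stand in for the RenderThread queue and mSignal, and because the excerpt elides where run() signals the caller, the sketch simply signals at the end of the task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Rough Java analogue of DrawFrameTask's post-and-wait handoff; not AOSP code. */
class PostAndWaitDemo {
    private final ExecutorService renderThread = Executors.newSingleThreadExecutor();
    private volatile int syncResult = -1;

    /** Called on the "UI thread": queues work on the render thread and blocks until signaled. */
    int drawFrame() {
        CountDownLatch done = new CountDownLatch(1);
        renderThread.execute(() -> {
            syncResult = 0;   // stand-in for the sync + draw work
            done.countDown(); // wake the waiting caller
        });
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the render thread", e);
        }
        return syncResult;
    }

    /** Stops the render thread so the JVM can exit. */
    void shutdown() {
        renderThread.shutdown();
    }
}
```

The point for jank monitoring is that time spent blocked in this handoff is charged to the UI thread's frame, which is why the render-thread timestamps matter as much as the main-thread ones.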

4. context is a CanvasContext; see frameworks/base/libs/hwui/renderthread/CanvasContext.cpp:

void CanvasContext::draw() {
    SkRect dirty;
    mDamageAccumulator.finish(&dirty);

    if (dirty.isEmpty() && Properties::skipEmptyFrames && !surfaceRequiresRedraw()) {
        mCurrentFrameInfo->addFlag(FrameInfoFlags::SkippedFrame);
        return;
    }

    mCurrentFrameInfo->markIssueDrawCommandsStart();

    Frame frame = mRenderPipeline->getFrame();
    setPresentTime();

    SkRect windowDirty = computeDirtyRect(frame, &dirty);

    bool drew = mRenderPipeline->draw(frame, windowDirty, dirty, mLightGeometry, &mLayerUpdateQueue,
                                      mContentDrawBounds, mOpaque, mLightInfo, mRenderNodes,
                                      &(profiler()));

    int64_t frameCompleteNr = mFrameCompleteCallbacks.size() ? getFrameNumber() : -1;

    waitOnFences();

    bool requireSwap = false;
    bool didSwap =
            mRenderPipeline->swapBuffers(frame, drew, windowDirty, mCurrentFrameInfo, &requireSwap);

    mIsDirty = false;

    if (requireSwap) {
        if (!didSwap) {  // some error happened
            setSurface(nullptr);
        }
        SwapHistory& swap = mSwapHistory.next();
        swap.damage = windowDirty;
        swap.swapCompletedTime = systemTime(CLOCK_MONOTONIC);
        swap.vsyncTime = mRenderThread.timeLord().latestVsync();
        if (mNativeSurface.get()) {
            int durationUs;
            nsecs_t dequeueStart = mNativeSurface->getLastDequeueStartTime();
            if (dequeueStart < mCurrentFrameInfo->get(FrameInfoIndex::SyncStart)) {
                // Ignoring dequeue duration as it happened prior to frame render start
                // and thus is not part of the frame.
                swap.dequeueDuration = 0;
            } else {
                mNativeSurface->query(NATIVE_WINDOW_LAST_DEQUEUE_DURATION, &durationUs);
                swap.dequeueDuration = us2ns(durationUs);
            }
            mNativeSurface->query(NATIVE_WINDOW_LAST_QUEUE_DURATION, &durationUs);
            swap.queueDuration = us2ns(durationUs);
        } else {
            swap.dequeueDuration = 0;
            swap.queueDuration = 0;
        }
        mCurrentFrameInfo->set(FrameInfoIndex::DequeueBufferDuration) = swap.dequeueDuration;
        mCurrentFrameInfo->set(FrameInfoIndex::QueueBufferDuration) = swap.queueDuration;
        mHaveNewSurface = false;
        mFrameNumber = -1;
    } else {
        mCurrentFrameInfo->set(FrameInfoIndex::DequeueBufferDuration) = 0;
        mCurrentFrameInfo->set(FrameInfoIndex::QueueBufferDuration) = 0;
    }


#if LOG_FRAMETIME_MMA
    float thisFrame = mCurrentFrameInfo->duration(FrameInfoIndex::IssueDrawCommandsStart,
                                                  FrameInfoIndex::FrameCompleted) /
                      NANOS_PER_MILLIS_F;
    if (sFrameCount) {
        sBenchMma = ((9 * sBenchMma) + thisFrame) / 10;
    } else {
        sBenchMma = thisFrame;
    }
    if (++sFrameCount == 10) {
        sFrameCount = 1;
        ALOGD("Average frame time: %.4f", sBenchMma);
    }
#endif

    if (didSwap) {
        for (auto& func : mFrameCompleteCallbacks) {
            std::invoke(func, frameCompleteNr);
        }
        mFrameCompleteCallbacks.clear();
    }

    mJankTracker.finishFrame(*mCurrentFrameInfo);
    if (CC_UNLIKELY(mFrameMetricsReporter.get() != nullptr)) {
        mFrameMetricsReporter->reportFrameMetrics(mCurrentFrameInfo->data());
    }

    GpuMemoryTracker::onFrameCompleted();
}

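In the end, the natively rendered result is stored in shared memory (the graphics buffer), and swapBuffers hands it over through the Surface so that it can be displayed.

With the drawing flow mapped out, how do we monitor jank?

You may have noticed that in all three major stages above, one object keeps recording timestamps, for example:
mFrameInfo.markPerformTraversalsStart();
mFrameInfo.markInputHandlingStart();
mFrameInfo.markAnimationsStart();
and so on.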

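The FrameInfo object is passed along the whole way, from the Java layer down to the native CanvasContext, and every step marks a timestamp on it.
So jank monitoring can read the time of each step from FrameInfo; if the total exceeds one frame interval, the frame was janky. The overall approach:
1. At the end of CanvasContext::draw(), where drawing finishes, pass the FrameInfo to a custom service.
2. The custom service processes the timing details and decides whether frames were dropped.
3. The custom service either bridges to the app over AIDL or stores the dropped-frame data itself.
4. Aggregate the processed data and upload it to a server over the network for analysis.
5. CanvasContext can also provide information about the Activity currently being drawn, so dropped frames can be tied to a window.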
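
Step 2 above, deciding whether frames were dropped, might look roughly like the sketch below. JankDetector is a hypothetical helper, not AOSP code. It assumes a 60 Hz display (about 16.67 ms per frame) and uses FrameCompleted minus IntendedVsync as the frame duration, with both timestamps in nanoseconds.

```java
/** Hypothetical helper for the custom service; not part of AOSP. */
class JankDetector {
    /** Assumed refresh interval of a 60 Hz display, in nanoseconds. */
    static final long FRAME_INTERVAL_NS = 16_666_667L;

    /** Total frame time: from the intended vsync to completion of the buffer swap. */
    static long frameDurationNs(long intendedVsyncNs, long frameCompletedNs) {
        return frameCompletedNs - intendedVsyncNs;
    }

    /** Whole refresh intervals by which the frame overran; 0 means it was on time. */
    static int droppedFrames(long intendedVsyncNs, long frameCompletedNs) {
        long duration = frameDurationNs(intendedVsyncNs, frameCompletedNs);
        if (duration <= FRAME_INTERVAL_NS) {
            return 0;
        }
        // Equivalent to ceil(duration / interval) - 1 for positive durations.
        return (int) ((duration - 1) / FRAME_INTERVAL_NS);
    }
}
```

Under these assumptions a frame that takes 20 ms counts as one dropped frame and a frame that takes 50 ms counts as two. On devices with a different refresh rate the interval would need to come from the display rather than from a constant.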

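FrameInfo fields explained

What each timestamp means:

  • IntendedVsync: the time of the app_vsync
  • Vsync: the time at which handling of the vsync event started
  • OldestInputEvent: the timestamp of the oldest input event in the batch handled for this frame
  • NewestInputEvent: the timestamp of the newest input event in that batch
  • HandleInputStart: the time the main thread started handling input events
  • AnimationStart: the time the main thread started handling animations
  • PerformTraversalsStart: the time the main thread started traversing the view tree
  • DrawStart: the time the main thread started executing draw
  • SyncQueued: the time the sync task was queued to the RenderThread
  • SyncStart: the time the RenderThread started syncing data from the main thread
  • IssueDrawCommandsStart: the time the RenderThread started drawing
  • SwapBuffers: the time the RenderThread started swapping buffers
  • FrameCompleted: the time the RenderThread finished swapping buffers
  • DequeueBufferDuration: the time the RenderThread spent in dequeueBuffer during the buffer swap
  • QueueBufferDuration: the time the RenderThread spent in queueBuffer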

What the intervals between the key timestamps mean (MQS uses these intervals to judge whether the jank is caused by the app itself):

  • IntendedVsync - Vsync: main thread delay time
  • HandleInputStart - AnimationStart: handle_input_time
  • AnimationStart - PerformTraversalsStart: handle_animation_time
  • PerformTraversalsStart - DrawStart: handle_traversal_time
  • SyncStart - IssueDrawCommandsStart: bitmap_uploads_time
  • IssueDrawCommandsStart - SwapBuffers: issue_draw_commands_time

Jank detection uses the difference FrameCompleted - IntendedVsync as the duration of one frame.
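
As a sketch of how the intervals listed above could be computed from a timestamp array: FrameBreakdown and its Field enum are hypothetical, and the enum order only follows the field list in this article (with the fields not needed here omitted), so it is not the index layout AOSP uses for its frame info array. Each interval is taken as the later timestamp minus the earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical breakdown helper; the enum order is NOT AOSP's array layout. */
class FrameBreakdown {
    enum Field {
        INTENDED_VSYNC, VSYNC, HANDLE_INPUT_START, ANIMATION_START,
        PERFORM_TRAVERSALS_START, DRAW_START, SYNC_START,
        ISSUE_DRAW_COMMANDS_START, SWAP_BUFFERS, FRAME_COMPLETED
    }

    private static long span(long[] t, Field from, Field to) {
        return t[to.ordinal()] - t[from.ordinal()];
    }

    /** Returns the named intervals, in pipeline order, in the unit of the input. */
    static Map<String, Long> intervals(long[] t) {
        Map<String, Long> out = new LinkedHashMap<>();
        out.put("main_thread_delay_time", span(t, Field.INTENDED_VSYNC, Field.VSYNC));
        out.put("handle_input_time", span(t, Field.HANDLE_INPUT_START, Field.ANIMATION_START));
        out.put("handle_animation_time",
                span(t, Field.ANIMATION_START, Field.PERFORM_TRAVERSALS_START));
        out.put("handle_traversal_time", span(t, Field.PERFORM_TRAVERSALS_START, Field.DRAW_START));
        out.put("bitmap_uploads_time", span(t, Field.SYNC_START, Field.ISSUE_DRAW_COMMANDS_START));
        out.put("issue_draw_commands_time",
                span(t, Field.ISSUE_DRAW_COMMANDS_START, Field.SWAP_BUFFERS));
        return out;
    }

    /** Name of the longest interval, as a first hint at where the time went. */
    static String slowestStage(long[] t) {
        String worst = null;
        long max = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : intervals(t).entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                worst = e.getKey();
            }
        }
        return worst;
    }
}
```

If a janky frame's longest interval is handle_traversal_time, the app's own measure and layout work is the likely cause. A large main thread delay instead suggests that the main thread was busy before the frame even started.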

How to optimize

See "Android性能优化典范" (Android Performance Patterns).