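Background
If you are an Android system engineer, you will probably need to monitor jank across every app on the device, so that you can help app teams fix it and improve the user experience. The main reason is that app developers can rarely catch jank on every page themselves; with support from the system, the Activities that jank can be identified.
Background knowledge
Recommended reading: "The Android Screen Refresh Mechanism" and "A Beginner's Guide to Understanding How Android Hardware Acceleration Works"
A rough description of how one UI frame is drawn
1. Whether the refresh is triggered by resume, invalidate, or any other UI-refreshing API, the call ends up in ViewRootImpl.scheduleTraversals: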
void scheduleTraversals() {
if (!mTraversalScheduled) {
mTraversalScheduled = true;
mTraversalBarrier = mHandler.getLooper().getQueue().postSyncBarrier();
mChoreographer.postCallback(
Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
if (!mUnbufferedInputDispatch) {
scheduleConsumeBatchedInput();
}
notifyRendererOfFramePending();
pokeDrawLockIfNeeded();
}
}
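To see what the mTraversalScheduled flag in the listing above buys us, here is a minimal plain-Java sketch, not framework code. TraversalScheduler, onFrame, and traversalCount are made-up names for illustration; onFrame stands in for the frame signal arriving, and the sync barrier is left out entirely.

```java
/** Toy model of the scheduling guard; all names here are hypothetical. */
final class TraversalScheduler {
    private boolean traversalScheduled;
    private Runnable pending;   // stands in for the callback posted to Choreographer
    int traversalCount;         // how many times a traversal actually ran

    /** Many callers may ask for a refresh; only the first one per frame posts a callback. */
    void scheduleTraversals() {
        if (!traversalScheduled) {
            traversalScheduled = true;
            pending = this::doTraversal;
        }
    }

    /** Stand-in for the frame signal: runs the pending callback once, then forgets it. */
    void onFrame() {
        Runnable callback = pending;
        pending = null;
        if (callback != null) {
            callback.run();
        }
    }

    private void doTraversal() {
        traversalScheduled = false; // allow the next frame to be scheduled
        traversalCount++;           // measure, layout, and draw would happen here
    }
}
```

Calling scheduleTraversals() three times before a frame and then onFrame() once runs a single traversal, which is the point of the guard: repeated invalidations within one frame are coalesced.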
2. Inside scheduleTraversals, the key call is Choreographer.postCallback. According to its Javadoc, the callback runs when the next frame signal arrives:
/**
* Posts a callback to run on the next frame.
* <p>
* The callback runs once then is automatically removed.
* </p>
*/
@TestApi
public void postCallback(int callbackType, Runnable action, Object token) {
postCallbackDelayed(callbackType, action, token, 0);
}
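3. mTraversalRunnable is what runs once the frame signal arrives. It calls doTraversal(), which in turn runs performMeasure(), performLayout(), and performDraw().
4. Under what condition do the callbacks added through Choreographer.postCallback run? See Choreographer.java: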
public void postCallback(int callbackType, Runnable action, Object token) {
postCallbackDelayed(callbackType, action, token, 0);
}
private void scheduleVsyncLocked() {
mDisplayEventReceiver.scheduleVsync();
}
The call is passed down through several layers and finally reaches mDisplayEventReceiver.scheduleVsync(). Here is DisplayEventReceiver.java:
/**
* Schedules a single vertical sync pulse to be delivered when the next
* display frame begins.
*/
public void scheduleVsync() {
if (mReceiverPtr == 0) {
Log.w(TAG, "Attempted to schedule a vertical sync pulse but the display event "
+ "receiver has already been disposed.");
} else {
nativeScheduleVsync(mReceiverPtr);
}
}
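According to the Javadoc, this method schedules a single vsync pulse for this DisplayEventReceiver instance (think of it as a one-shot callback), delivered when the next display frame begins.
5. Because nativeScheduleVsync is registered against this DisplayEventReceiver, the onVsync method is called back when the next frame signal arrives: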
/**
* Called when a vertical sync pulse is received.
* The recipient should render a frame and then call {@link #scheduleVsync}
* to schedule the next vertical sync pulse.
*
* @param timestampNanos The timestamp of the pulse, in the {@link System#nanoTime()}
* timebase.
* @param builtInDisplayId The surface flinger built-in display id such as
* {@link SurfaceControl#BUILT_IN_DISPLAY_ID_MAIN}.
* @param frame The frame number. Increases by one for each vertical sync interval.
*/
public void onVsync(long timestampNanos, int builtInDisplayId, int frame) {
}
6. Choreographer instantiates a FrameDisplayEventReceiver internally. It overrides onVsync and eventually calls Choreographer.doFrame():
private void postCallbackDelayedInternal(int callbackType,
Object action, Object token, long delayMillis) {
synchronized (mLock) {
final long now = SystemClock.uptimeMillis();
final long dueTime = now + delayMillis;
mCallbackQueues[callbackType].addCallbackLocked(dueTime, action, token);
}
}
void doFrame(long frameTimeNanos, int frame) {
......
try {
Trace.traceBegin(Trace.TRACE_TAG_VIEW, "Choreographer#doFrame");
AnimationUtils.lockAnimationClock(frameTimeNanos / TimeUtils.NANOS_PER_MS);
mFrameInfo.markInputHandlingStart();
doCallbacks(Choreographer.CALLBACK_INPUT, frameTimeNanos);
mFrameInfo.markAnimationsStart();
doCallbacks(Choreographer.CALLBACK_ANIMATION, frameTimeNanos);
mFrameInfo.markPerformTraversalsStart();
doCallbacks(Choreographer.CALLBACK_TRAVERSAL, frameTimeNanos);
doCallbacks(Choreographer.CALLBACK_COMMIT, frameTimeNanos);
} finally {
AnimationUtils.unlockAnimationClock();
Trace.traceEnd(Trace.TRACE_TAG_VIEW);
}
}
The key collection is mCallbackQueues. postCallback adds the supplied Runnable to it; doFrame calls doCallbacks, which takes the Runnables out of mCallbackQueues and runs them. That is how ViewRootImpl's doTraversal() finally gets executed.
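The post-then-run flow through mCallbackQueues can be sketched in plain Java. This is a toy model under stated assumptions, not the framework class: MiniChoreographer and isVsyncRequested are invented names, due times and tokens are ignored, and doFrame() is called by hand instead of by a real vsync.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of per-type callback queues; hypothetical, not android.view.Choreographer. */
final class MiniChoreographer {
    static final int CALLBACK_INPUT = 0;
    static final int CALLBACK_ANIMATION = 1;
    static final int CALLBACK_TRAVERSAL = 2;
    static final int CALLBACK_COMMIT = 3;

    private final List<Queue<Runnable>> queues = new ArrayList<>();
    private boolean vsyncRequested;

    MiniChoreographer() {
        for (int type = CALLBACK_INPUT; type <= CALLBACK_COMMIT; type++) {
            queues.add(new ArrayDeque<>());
        }
    }

    /** Queues the action under its type and asks for the next frame signal. */
    void postCallback(int callbackType, Runnable action) {
        queues.get(callbackType).add(action);
        vsyncRequested = true; // stands in for scheduleVsync()
    }

    boolean isVsyncRequested() {
        return vsyncRequested;
    }

    /** Stand-in for onVsync -> doFrame: runs callbacks in input, animation, traversal, commit order. */
    void doFrame() {
        vsyncRequested = false;
        for (Queue<Runnable> queue : queues) {
            // Snapshot and clear first: each callback runs once, and anything posted
            // to this same queue while it is running waits for the next frame.
            List<Runnable> due = new ArrayList<>(queue);
            queue.clear();
            for (Runnable action : due) {
                action.run();
            }
        }
    }
}
```

Posting a traversal callback first and an input callback second still runs input before traversal, because the order comes from the queue type and not from the posting order.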
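How draw runs on the GPU under hardware acceleration
1. The CPU does the computation: measure and layout both run on the main thread. Each View is abstracted into a RenderNode and handed off so that the GPU can draw it.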
ViewRootImpl.java
private boolean draw(boolean fullRedrawNeeded) {
....
// Key point 1: is hardware acceleration enabled?
if (mAttachInfo.mThreadedRenderer != null && mAttachInfo.mThreadedRenderer.isEnabled()) {
....
// Key point 2: hardware-accelerated drawing
mAttachInfo.mThreadedRenderer.draw(mView, mAttachInfo, this, callback);
....
}
GPU drawing is done through ThreadedRenderer.
2. The draw method in ThreadedRenderer.java:
/**
* Draws the specified view
*/
void draw(View view, AttachInfo attachInfo, DrawCallbacks callbacks,
FrameDrawingCallback frameDrawingCallback) {
attachInfo.mIgnoreDirtyState = true;
final Choreographer choreographer = attachInfo.mViewRootImpl.mChoreographer;
choreographer.mFrameInfo.markDrawStart();
updateRootDisplayList(view, callbacks);
....
final long[] frameInfo = choreographer.mFrameInfo.mFrameInfo;
if (frameDrawingCallback != null) {
nSetFrameCallback(mNativeProxy, frameDrawingCallback);
}
int syncResult = nSyncAndDrawFrame(mNativeProxy, frameInfo, frameInfo.length);
if ((syncResult & SYNC_LOST_SURFACE_REWARD_IF_FOUND) != 0) {
setEnabled(false);
attachInfo.mViewRootImpl.mSurface.release();
// Invalidate since we failed to draw. This should fetch a Surface
// if it is still needed or do nothing if we are no longer drawing
attachInfo.mViewRootImpl.invalidate();
}
if ((syncResult & SYNC_INVALIDATE_REQUIRED) != 0) {
attachInfo.mViewRootImpl.invalidate();
}
}
private static native int nSyncAndDrawFrame(long nativeProxy, long[] frameInfo, int size);
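Drawing is handed to the native layer through the native method nSyncAndDrawFrame. If the returned syncResult has SYNC_LOST_SURFACE_REWARD_IF_FOUND or SYNC_INVALIDATE_REQUIRED set, ViewRootImpl.invalidate() is called to request another refresh.
How the native layer takes over the drawing
1. nSyncAndDrawFrame is implemented in frameworks/base/core/jni/android_view_ThreadedRenderer.cpp: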
static int android_view_ThreadedRenderer_syncAndDrawFrame(JNIEnv* env, jobject clazz,
jlong proxyPtr, jlongArray frameInfo, jint frameInfoSize) {
LOG_ALWAYS_FATAL_IF(frameInfoSize != UI_THREAD_FRAME_INFO_SIZE,
"Mismatched size expectations, given %d expected %d",
frameInfoSize, UI_THREAD_FRAME_INFO_SIZE);
RenderProxy* proxy = reinterpret_cast<RenderProxy*>(proxyPtr);
env->GetLongArrayRegion(frameInfo, 0, frameInfoSize, proxy->frameInfo());
return proxy->syncAndDrawFrame();
}
2. RenderProxy::syncAndDrawFrame, in frameworks/base/libs/hwui/renderthread/RenderProxy.cpp:
int RenderProxy::syncAndDrawFrame() {
return mDrawFrameTask.drawFrame();
}
3. DrawFrameTask, in frameworks/base/libs/hwui/renderthread/DrawFrameTask.cpp:
int DrawFrameTask::drawFrame() {
LOG_ALWAYS_FATAL_IF(!mContext, "Cannot drawFrame with no CanvasContext!");
mSyncResult = SyncResult::OK;
mSyncQueued = systemTime(CLOCK_MONOTONIC);
postAndWait();
return mSyncResult;
}
void DrawFrameTask::postAndWait() {
AutoMutex _lock(mLock);
mRenderThread->queue().post([this]() { run(); });
mSignal.wait(mLock);
}
void DrawFrameTask::run() {
......
context->draw();
......
}
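The postAndWait() handoff above, where the calling thread posts work to the render thread and then blocks until it is signalled, can be imitated in plain Java. This is only a sketch: PostAndWaitSketch and the log strings are invented, a CountDownLatch stands in for the mutex and condition variable, and the point at which the real run() signals lies in the elided part of the listing, so the sketch simply signals after a stand-in "sync" step.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical imitation of a post-and-wait handoff between two threads. */
final class PostAndWaitSketch {

    /** Posts one task to the render thread and blocks the caller until it signals back. */
    static List<String> drawFrame(ExecutorService renderThread) {
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch signal = new CountDownLatch(1);

        log.add("ui: post");
        renderThread.execute(() -> {
            log.add("render: sync");
            signal.countDown();            // plays the role of signalling the waiting thread
        });

        try {
            signal.await();                // plays the role of mSignal.wait(mLock)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the render thread", e);
        }
        log.add("ui: resumed");
        return log;
    }

    /** Runs one handoff on a dedicated single-thread executor and returns the event log. */
    static List<String> demo() {
        ExecutorService renderThread = Executors.newSingleThreadExecutor();
        try {
            return drawFrame(renderThread);
        } finally {
            renderThread.shutdown();
        }
    }
}
```

The consequence for jank monitoring is that any time the render thread takes before signalling is time the calling thread spends blocked, so it counts against the frame as well.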
4. context is a CanvasContext, defined in frameworks/base/libs/hwui/renderthread/CanvasContext.cpp:
void CanvasContext::draw() {
SkRect dirty;
mDamageAccumulator.finish(&dirty);
if (dirty.isEmpty() && Properties::skipEmptyFrames && !surfaceRequiresRedraw()) {
mCurrentFrameInfo->addFlag(FrameInfoFlags::SkippedFrame);
return;
}
mCurrentFrameInfo->markIssueDrawCommandsStart();
Frame frame = mRenderPipeline->getFrame();
setPresentTime();
SkRect windowDirty = computeDirtyRect(frame, &dirty);
bool drew = mRenderPipeline->draw(frame, windowDirty, dirty, mLightGeometry, &mLayerUpdateQueue,
mContentDrawBounds, mOpaque, mLightInfo, mRenderNodes,
&(profiler()));
int64_t frameCompleteNr = mFrameCompleteCallbacks.size() ? getFrameNumber() : -1;
waitOnFences();
bool requireSwap = false;
bool didSwap =
mRenderPipeline->swapBuffers(frame, drew, windowDirty, mCurrentFrameInfo, &requireSwap);
mIsDirty = false;
if (requireSwap) {
if (!didSwap) { // some error happened
setSurface(nullptr);
}
SwapHistory& swap = mSwapHistory.next();
swap.damage = windowDirty;
swap.swapCompletedTime = systemTime(CLOCK_MONOTONIC);
swap.vsyncTime = mRenderThread.timeLord().latestVsync();
if (mNativeSurface.get()) {
int durationUs;
nsecs_t dequeueStart = mNativeSurface->getLastDequeueStartTime();
if (dequeueStart < mCurrentFrameInfo->get(FrameInfoIndex::SyncStart)) {
// Ignoring dequeue duration as it happened prior to frame render start
// and thus is not part of the frame.
swap.dequeueDuration = 0;
} else {
mNativeSurface->query(NATIVE_WINDOW_LAST_DEQUEUE_DURATION, &durationUs);
swap.dequeueDuration = us2ns(durationUs);
}
mNativeSurface->query(NATIVE_WINDOW_LAST_QUEUE_DURATION, &durationUs);
swap.queueDuration = us2ns(durationUs);
} else {
swap.dequeueDuration = 0;
swap.queueDuration = 0;
}
mCurrentFrameInfo->set(FrameInfoIndex::DequeueBufferDuration) = swap.dequeueDuration;
mCurrentFrameInfo->set(FrameInfoIndex::QueueBufferDuration) = swap.queueDuration;
mHaveNewSurface = false;
mFrameNumber = -1;
} else {
mCurrentFrameInfo->set(FrameInfoIndex::DequeueBufferDuration) = 0;
mCurrentFrameInfo->set(FrameInfoIndex::QueueBufferDuration) = 0;
}
#if LOG_FRAMETIME_MMA
float thisFrame = mCurrentFrameInfo->duration(FrameInfoIndex::IssueDrawCommandsStart,
FrameInfoIndex::FrameCompleted) /
NANOS_PER_MILLIS_F;
if (sFrameCount) {
sBenchMma = ((9 * sBenchMma) + thisFrame) / 10;
} else {
sBenchMma = thisFrame;
}
if (++sFrameCount == 10) {
sFrameCount = 1;
ALOGD("Average frame time: %.4f", sBenchMma);
}
#endif
if (didSwap) {
for (auto& func : mFrameCompleteCallbacks) {
std::invoke(func, frameCompleteNr);
}
mFrameCompleteCallbacks.clear();
}
mJankTracker.finishFrame(*mCurrentFrameInfo);
if (CC_UNLIKELY(mFrameMetricsReporter.get() != nullptr)) {
mFrameMetricsReporter->reportFrameMetrics(mCurrentFrameInfo->data());
}
GpuMemoryTracker::onFrameCompleted();
}
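Once native drawing finishes, the result sits in shared memory, and swapBuffers hands it over through the Surface so that the frame can be shown.
With the drawing pipeline covered, how do we monitor jank?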
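You may have noticed that across all three stages above, one object keeps recording timestamps, for example:
mFrameInfo.markInputHandlingStart();
mFrameInfo.markAnimationsStart();
mFrameInfo.markPerformTraversalsStart();
and so on.
The FrameInfo object is passed along the whole way, from the Java layer down to the native CanvasContext. Every step marks a timestamp in it.
Jank monitoring can therefore read each step's timestamp from FrameInfo: if a frame takes longer than one frame interval (about 16.7 ms at 60 Hz), it has janked. The rough approach is: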
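1. At the end of CanvasContext.draw(), the final point of the drawing pipeline, pass the FrameInfo to a custom service.
2. The custom service works through the timing details and decides whether frames were dropped.
3. The custom service either bridges to the app over AIDL or stores the dropped-frame data itself.
4. Aggregate the processed data and upload it to a server over the network for analysis.
5. CanvasContext can also provide information about the Activity currently being drawn, so dropped frames can be tied to a specific window.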
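Steps 2 and 5 above can be sketched as follows. This is a hedged illustration, not the actual service: JankMonitor, droppedFrames, record, and droppedFor are invented names, the 60 Hz frame period is an assumption (devices may run at 90 or 120 Hz), and the window name is just a string key supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical dropped-frame counter keyed by window name. */
final class JankMonitor {
    /** Assumed refresh period for a 60 Hz display, in nanoseconds. */
    static final long FRAME_PERIOD_NANOS = 16_666_667L;

    private final Map<String, Integer> droppedByWindow = new HashMap<>();

    /** Number of frame deadlines missed by a frame that took durationNanos in total. */
    static int droppedFrames(long durationNanos) {
        if (durationNanos <= FRAME_PERIOD_NANOS) {
            return 0;
        }
        // ceil(duration / period) - 1: a frame that needs N periods misses N - 1 deadlines.
        return (int) ((durationNanos + FRAME_PERIOD_NANOS - 1) / FRAME_PERIOD_NANOS - 1);
    }

    /** Records one frame for a window and returns how many frames it dropped. */
    int record(String window, long intendedVsyncNanos, long frameCompletedNanos) {
        int dropped = droppedFrames(frameCompletedNanos - intendedVsyncNanos);
        if (dropped > 0) {
            droppedByWindow.merge(window, dropped, Integer::sum);
        }
        return dropped;
    }

    /** Total dropped frames seen so far for the given window. */
    int droppedFor(String window) {
        return droppedByWindow.getOrDefault(window, 0);
    }
}
```

For example, a 20 ms frame counts as one dropped frame and a 50 ms frame as two, so a window that produced both ends up with a total of three. Keeping only per-window totals keeps the stored data small; a real service would probably also want timestamps and per-stage durations.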
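FrameInfo fields explained
What each timestamp means:
- IntendedVsync: the time of app_vsync
- Vsync: when handling of the vsync event started
- OldestInputEvent: the time of the oldest input event in the batch being handled
- NewestInputEvent: the time of the newest input event in the batch
- HandleInputStart: when the main thread started handling input events
- AnimationStart: when the main thread started handling animations
- PerformTraversalsStart: when the main thread started traversing the view hierarchy
- DrawStart: when the main thread started executing draw
- SyncQueued: when the sync request was queued to the RenderThread
- SyncStart: when the RenderThread started syncing data from the main thread
- IssueDrawCommandsStart: when the RenderThread started drawing
- SwapBuffers: when the RenderThread started swapping buffers
- FrameCompleted: when the RenderThread finished swapping buffers
- DequeueBufferDuration: time the RenderThread spent in dequeueBuffer during the buffer swap
- QueueBufferDuration: time the RenderThread spent in queueBuffer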
What the intervals between key timestamps mean (MQS uses these intervals to decide whether the app itself caused the jank):
- IntendedVsync to Vsync: main thread delay time
- HandleInputStart to AnimationStart: handle_input_time
- AnimationStart to PerformTraversalsStart: handle_animation_time
- PerformTraversalsStart to DrawStart: handle_traversal_time
- SyncStart to IssueDrawCommandsStart: bitmap_uploads_time
- IssueDrawCommandsStart to SwapBuffers: issue_draw_commands_time
Jank detection uses the difference FrameCompleted - IntendedVsync as the duration of one frame.
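The intervals and the total duration can be computed from a timestamp array roughly like this. It is a sketch under assumptions: FrameBreakdown is an invented class, and the array indices are a compact hypothetical layout chosen for the example, not the real FrameInfo layout, which contains more fields and differs between Android versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-stage breakdown of one frame's timestamps. */
final class FrameBreakdown {
    // Made-up indices for this example only; the real FrameInfo array is laid out differently.
    static final int INTENDED_VSYNC = 0;
    static final int VSYNC = 1;
    static final int HANDLE_INPUT_START = 2;
    static final int ANIMATION_START = 3;
    static final int PERFORM_TRAVERSALS_START = 4;
    static final int DRAW_START = 5;
    static final int SYNC_START = 6;
    static final int ISSUE_DRAW_COMMANDS_START = 7;
    static final int SWAP_BUFFERS = 8;
    static final int FRAME_COMPLETED = 9;
    static final int SIZE = 10;

    /** Returns the named intervals, in pipeline order, in the same unit as the input. */
    static Map<String, Long> intervals(long[] t) {
        if (t.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " timestamps, got " + t.length);
        }
        Map<String, Long> result = new LinkedHashMap<>();
        result.put("main_thread_delay_time", t[VSYNC] - t[INTENDED_VSYNC]);
        result.put("handle_input_time", t[ANIMATION_START] - t[HANDLE_INPUT_START]);
        result.put("handle_animation_time", t[PERFORM_TRAVERSALS_START] - t[ANIMATION_START]);
        result.put("handle_traversal_time", t[DRAW_START] - t[PERFORM_TRAVERSALS_START]);
        result.put("bitmap_uploads_time", t[ISSUE_DRAW_COMMANDS_START] - t[SYNC_START]);
        result.put("issue_draw_commands_time", t[SWAP_BUFFERS] - t[ISSUE_DRAW_COMMANDS_START]);
        return result;
    }

    /** Frame duration as defined above: FrameCompleted - IntendedVsync. */
    static long totalDuration(long[] t) {
        return t[FRAME_COMPLETED] - t[INTENDED_VSYNC];
    }

    /** Name of the longest interval, a first hint at where the time went. */
    static String slowestStage(long[] t) {
        String worst = null;
        long worstValue = Long.MIN_VALUE;
        for (Map.Entry<String, Long> entry : intervals(t).entrySet()) {
            if (entry.getValue() > worstValue) {
                worstValue = entry.getValue();
                worst = entry.getKey();
            }
        }
        return worst;
    }
}
```

With the timestamps {0, 1, 2, 3, 5, 30, 31, 32, 35, 36} in milliseconds, the total duration is 36 and the slowest stage is handle_traversal_time at 25, which would point at measure and layout work on the main thread.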
How to optimize