[TOC]
文章参考:http://gityuan.com/2016/07/02/android-anr/
文章参考:https://mp.weixin.qq.com/s/vXNrg_wYM7sgPztInWTBHw
概述 首先,ANR(Application Not responding)是指应用程序未响应,Android系统对于一些事件需要在一定的时间范围内完成,如果超过预定时间能未能得到有效响应或者响应时间过长,都会造成ANR。ANR由消息处理机制保证,Android在系统层实现了一套精密的机制来发现ANR,核心原理是消息调度和超时处理。
其次,ANR机制主体实现在系统层。所有与ANR相关的消息,都会经过系统进程(system_server)调度,然后派发到应用进程完成对消息的实际处理,同时,系统进程设计了不同的超时限制来跟踪消息的处理。 一旦应用程序处理消息不当,超时限制就起作用了,它收集一些系统状态,譬如CPU/IO使用情况、进程函数调用栈,并且报告用户有进程无响应了(ANR对话框)。
然后,ANR问题本质是一个性能问题。ANR机制实际上对应用程序主线程的限制,要求主线程在限定的时间内处理完一些最常见的操作(启动服务、处理广播、处理输入), 如果处理超时,则认为主线程已经失去了响应其他操作的能力。主线程中的耗时操作,譬如密集CPU运算、大量IO、复杂界面布局等,都会降低应用程序的响应能力。
相关源码解析:
ANR触发场景 造成ANR的原因很多,都是因为在主线程执行任务太久阻塞了界面更新导致的,主要有以下几类:
InputDispatching Timeout :输入事件分发超时5s特定时间内无响应;
BroadcastQueue Timeout :前台广播在10s内、后台广播在60秒无法处理完成,也就是BroadCastReceiver的特定的时间内没有完成广播处理。 要注意的是,只有串行的广播才有超时机制,并行的广播并不会超时,也就是说,如果广播是动态注册的,直接调用sendBroadcast触发,如果主线程Looper中排在后面的Message不会触发超时机制,那么即使这个广播是前台广播,系统也永远不会弹出框提示用户超时了。
Service Timeout :小概率类型,前台服务在20s内、后台服务在200秒内未执行完成;
ContentProvider Timeout :内容提供者,在publish之后超过10s;
这些ANR触发的时间,我们可以在ActivityManagerService里面都能找到对应的定义
1 2 3 4 5 6 7 static final int BROADCAST_FG_TIMEOUT = 10 *1000 ; static final int BROADCAST_BG_TIMEOUT = 60 *1000 ; static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10 *1000 ;
ANR的源码分析 1、Service的ANR源码 我们通过context.startService(intent)来启动service最终都会调用到ContextImpl里面去,最终通过AMS来发起一次跨进程通信,最终调用到system_server进程中去启动service,这里不再废话,直接列出流程。
Service Timeout是位于”ActivityManager”线程中的AMS.MainHandler收到SERVICE_TIMEOUT_MSG
消息时触发。
具体流程如下:
A进程中调用 context.startService(intent)。
最终调用到system_server进程的AMS的startService()中。
最后会调用到system_server进程的ActiveService的realStartServiceLocked()中。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 final class MainHandler extends Handler { public MainHandler (Looper looper) { super (looper, null , true ); } @Override public void handleMessage (Message msg) { switch (msg.what) { case GC_BACKGROUND_PROCESSES_MSG: { synchronized (ActivityManagerService.this ) { performAppGcsIfAppropriateLocked(); } } break ; case SERVICE_TIMEOUT_MSG: { mServices.serviceTimeout((ProcessRecord)msg.obj); } break ; case SERVICE_FOREGROUND_TIMEOUT_MSG: { mServices.serviceForegroundTimeout((ServiceRecord)msg.obj); } break ; case CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG: { ProcessRecord app = (ProcessRecord)msg.obj; synchronized (ActivityManagerService.this ) { processContentProviderPublishTimedOutLocked(app); } } break ; case KILL_APPLICATION_MSG: { synchronized (ActivityManagerService.this ) { final int appId = msg.arg1; final int userId = msg.arg2; Bundle bundle = (Bundle)msg.obj; String pkg = bundle.getString("pkg" ); String reason = bundle.getString("reason" ); forceStopPackageLocked(pkg, appId, false , false , true , false , false , userId, reason); } } break ; } }
对于Service有两类:
对于前台服务,则超时为SERVICE_TIMEOUT = 20s;
对于后台服务,则超时为SERVICE_BACKGROUND_TIMEOUT = 200s
1 2 3 4 5 6 7 8 static final int SERVICE_TIMEOUT = 20 *1000 ; static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10 ;
我们知道,在Service启动流程中. 其中在Service进程attach到system_server进程的过程中会调用realStartServiceLocked()
方法。在这个方法里面,就会发送delay消息来等待超时。
超时消息的发送 ActiveServices#realStartServiceLocked 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 private final void realStartServiceLocked (ServiceRecord r, ProcessRecord app, boolean execInFg) throws RemoteException { bumpServiceExecutingLocked(r, execInFg, "create" ); try { app.thread.scheduleCreateService(r, r.serviceInfo, mAm.compatibilityInfoForPackage(r.serviceInfo.applicationInfo), app.getReportedProcState()); r.postNotification(); created = true ; } catch (DeadObjectException e) { Slog.w(TAG, "Application dead when creating service " + r); mAm.appDiedLocked(app, "Died when creating service" ); throw e; } finally { if (!created) { final boolean inDestroying = mDestroyingServices.contains(r); serviceDoneExecutingLocked(r, inDestroying, inDestroying); if (newService) { app.stopService(r); r.setProcess(null ); } if (!inDestroying) { scheduleServiceRestartLocked(r, false ); } } } }
bumpServiceExecutingLocked方法 下面我们来看一下bumpServiceExecutingLocked(r, execInFg, “create”);
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 private final void bumpServiceExecutingLocked (ServiceRecord r, boolean fg, String why) { boolean timeoutNeeded = true ; if ((mAm.mBootPhase < SystemService.PHASE_THIRD_PARTY_APPS_CAN_START) && (r.app != null ) && (r.app.pid == android.os.Process.myPid())) { Slog.w(TAG, "Too early to start/bind service in system_server: Phase=" + mAm.mBootPhase + " " + r.getComponentName()); timeoutNeeded = false ; } long now = SystemClock.uptimeMillis(); if (r.executeNesting == 0 ) { r.executeFg = fg; ServiceState stracker = r.getTracker(); if (stracker != null ) { stracker.setExecuting(true , mAm.mProcessStats.getMemFactorLocked(), now); } if (r.app != null ) { r.app.executingServices.add(r); r.app.execServicesFg |= fg; if (timeoutNeeded && r.app.executingServices.size() == 1 ) { scheduleServiceTimeoutLocked(r.app); } } } else if (r.app != null && fg && !r.app.execServicesFg) { r.app.execServicesFg = true ; if (timeoutNeeded) { scheduleServiceTimeoutLocked(r.app); } } r.executeFg |= fg; r.executeNesting++; r.executingStart = now; }
从上面的日志中,我们可以看出会通过r.app.execServicesFg
来标记启动的是不是前台Service。然后调用scheduleServiceTimeoutLocked来进行超时消息的发送。
ActiveServices#scheduleServiceTimeoutLocked 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 void scheduleServiceTimeoutLocked (ProcessRecord proc) { if (proc.executingServices.size() == 0 || proc.thread == null ) { return ; } Message msg = mAm.mHandler.obtainMessage( ActivityManagerService.SERVICE_TIMEOUT_MSG); msg.obj = proc; mAm.mHandler.sendMessageDelayed(msg, proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT); }
超时消息的取消 在system_server进程ActiveServices#realStartServiceLocked()调用的过程会发动超时消息, 超时时间内没有启动完成触发超时机制. 如果在超时时间范围内启动成功,则会取消这个超时消息?
那么什么时候会进行超时超时消息的取消呢?
这个时候,我们就要回到业务进程里面去看了。经过Binder等层层调用进入业务进程的主线程handleCreateService()的过程.
上面我们说到了。执行完realStartServiceLocked中的bumpServiceExecutingLocked方法之后。我们就要调用app.thread.scheduleCreateService方法。 这个方法其实就会调用业务进程的ActivityThread的scheduleCreateService方法。
ActivityThread#scheduleCreateService 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 public final void scheduleCreateService (IBinder token, ServiceInfo info, CompatibilityInfo compatInfo, int processState) { updateProcessState(processState, false ); CreateServiceData s = new CreateServiceData (); s.token = token; s.info = info; s.compatInfo = compatInfo; sendMessage(H.CREATE_SERVICE, s); }
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 class H extends Handler { public static final int CREATE_SERVICE = 114 ; @UnsupportedAppUsage String codeToString (int code) { if (DEBUG_MESSAGES) { switch (code) { case CREATE_SERVICE: return "CREATE_SERVICE" ; } } return Integer.toString(code); } public void handleMessage (Message msg) { if (DEBUG_MESSAGES) Slog.v(TAG, ">>> handling: " + codeToString(msg.what)); switch (msg.what) { case CREATE_SERVICE: if (Trace.isTagEnabled(Trace.TRACE_TAG_ACTIVITY_MANAGER)) { Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, ("serviceCreate: " + String.valueOf(msg.obj))); } handleCreateService((CreateServiceData) msg.obj); Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER); break ; } Object obj = msg.obj; if (obj instanceof SomeArgs) { ((SomeArgs) obj).recycle(); } if (DEBUG_MESSAGES) Slog.v(TAG, "<<< done: " + codeToString(msg.what)); } }
ActivityThread#handleCreateService 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 @UnsupportedAppUsage private void handleCreateService (CreateServiceData data) { Service service = null ; try { if (localLOGV) Slog.v(TAG, "Creating service " + data.info.name); ContextImpl context = ContextImpl.createAppContext(this , packageInfo); Application app = packageInfo.makeApplication(false , mInstrumentation); java.lang.ClassLoader cl = packageInfo.getClassLoader(); service = packageInfo.getAppFactory().instantiateService(cl, data.info.name, data.intent); context.getResources().addLoaders(app.getResources().getLoaders().toArray(new ResourcesLoader [0 ])); context.setOuterContext(service); service.attach(context, this , data.info.name, data.token, app, ActivityManager.getService()); service.onCreate(); mServices.put(data.token, service); try { ActivityManager.getService().serviceDoneExecuting(data.token, SERVICE_DONE_EXECUTING_ANON, 0 , 0 ); } catch (RemoteException e) { throw e.rethrowFromSystemServer(); } } catch (Exception e) { if (!mInstrumentation.onException(service, e)) { throw new RuntimeException ("Unable to create service " + data.info.name + ": " + e.toString(), e); } } }
这个方法看起来就很熟悉了。其实就是通过反射创建Service对象。然后依次调用Service的生命周期方法。然后再通知System服务端Service启动成功。
好吧。那看来超时消息的取消。还是要回到SystemServer服务端类分析了。
首先我们看ActivityManagerService#serviceDoneExecuting方法
ActivityManagerService#serviceDoneExecuting 1 2 3 4 5 6 7 8 9 10 11 12 13 14 public void serviceDoneExecuting (IBinder token, int type, int startId, int res) { synchronized (this ) { if (!(token instanceof ServiceRecord)) { Slog.e(TAG, "serviceDoneExecuting: Invalid service token=" + token); throw new IllegalArgumentException ("Invalid service token" ); } mServices.serviceDoneExecutingLocked((ServiceRecord)token, type, startId, res); } }
ActiveServices#serviceDoneExecuting方法 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 private void serviceDoneExecutingLocked (ServiceRecord r, boolean inDestroying, boolean finishing) { if (r.executeNesting <= 0 ) { if (r.app != null ) { if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "Nesting at 0 of " + r.shortInstanceName); r.app.execServicesFg = false ; r.app.executingServices.remove(r); if (r.app.executingServices.size() == 0 ) { if (DEBUG_SERVICE || DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING, "No more executingServices of " + r.shortInstanceName); mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app); } else if (r.executeFg) { for (int i=r.app.executingServices.size()-1 ; i>=0 ; i--) { if (r.app.executingServices.valueAt(i).executeFg) { r.app.execServicesFg = true ; break ; } } } if (inDestroying) { if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "doneExecuting remove destroying " + r); mDestroyingServices.remove(r); r.bindings.clear(); } mAm.updateOomAdjLocked(r.app, true , OomAdjuster.OOM_ADJ_REASON_UNBIND_SERVICE); } } }
超时消息的触发 在system_server进程中有一个Handler线程, 名叫”ActivityManager”.当倒计时结束便会向该Handler线程发送 一条信息SERVICE_TIMEOUT_MSG
,
上面我已经说到了超时消息的发送。ActiveServices#scheduleServiceTimeoutLocked方法中,我们会进行超时消息的发送。
ActivityManagerService#MainHandler.handleMessage 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 final class MainHandler extends Handler { public MainHandler (Looper looper) { super (looper, null , true ); } @Override public void handleMessage (Message msg) { switch (msg.what) { case GC_BACKGROUND_PROCESSES_MSG: { synchronized (ActivityManagerService.this ) { performAppGcsIfAppropriateLocked(); } } break ; case SERVICE_TIMEOUT_MSG: { mServices.serviceTimeout((ProcessRecord)msg.obj); } break ; } }
方法还是调度到了ActiveServices的serviceTimeout方法
ActiveServices#serviceTimeout 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 void serviceTimeout (ProcessRecord proc) { String anrMessage = null ; synchronized (mAm) { if (timeout != null && mAm.mProcessList.mLruProcesses.contains(proc)) { Slog.w(TAG, "Timeout executing service: " + timeout); StringWriter sw = new StringWriter (); PrintWriter pw = new FastPrintWriter (sw, false , 1024 ); pw.println(timeout); timeout.dump(pw, " " ); pw.close(); mLastAnrDump = sw.toString(); mAm.mHandler.removeCallbacks(mLastAnrDumpClearer); mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS); anrMessage = "executing service " + timeout.shortInstanceName; } else { Message msg = mAm.mHandler.obtainMessage( ActivityManagerService.SERVICE_TIMEOUT_MSG); msg.obj = proc; mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT)); } } if (anrMessage != null ) { mAm.mAnrHelper.appNotResponding(proc, anrMessage); } }
方法又回到了ActivityManager中的ANRHelper中。
ANRHelper#appNotResponding 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 void appNotResponding (ProcessRecord anrProcess, String annotation) { appNotResponding(anrProcess, null , null , null , null , false , annotation); } void appNotResponding (ProcessRecord anrProcess, String activityShortComponentName, ApplicationInfo aInfo, String parentShortComponentName, WindowProcessController parentProcess, boolean aboveSystem, String annotation) { synchronized (mAnrRecords) { mAnrRecords.add(new AnrRecord (anrProcess, activityShortComponentName, aInfo, parentShortComponentName, parentProcess, aboveSystem, annotation)); } startAnrConsumerIfNeeded(); }