[TOC]

文章参考:http://gityuan.com/2016/07/02/android-anr/

文章参考:https://mp.weixin.qq.com/s/vXNrg_wYM7sgPztInWTBHw

概述

首先,ANR(Application Not responding)是指应用程序未响应,Android系统对于一些事件需要在一定的时间范围内完成,如果超过预定时间能未能得到有效响应或者响应时间过长,都会造成ANR。ANR由消息处理机制保证,Android在系统层实现了一套精密的机制来发现ANR,核心原理是消息调度和超时处理。

其次,ANR机制主体实现在系统层。所有与ANR相关的消息,都会经过系统进程(system_server)调度,然后派发到应用进程完成对消息的实际处理,同时,系统进程设计了不同的超时限制来跟踪消息的处理。 一旦应用程序处理消息不当,超时限制就起作用了,它收集一些系统状态,譬如CPU/IO使用情况、进程函数调用栈,并且报告用户有进程无响应了(ANR对话框)。

然后,ANR问题本质是一个性能问题。ANR机制实际上对应用程序主线程的限制,要求主线程在限定的时间内处理完一些最常见的操作(启动服务、处理广播、处理输入), 如果处理超时,则认为主线程已经失去了响应其他操作的能力。主线程中的耗时操作,譬如密集CPU运算、大量IO、复杂界面布局等,都会降低应用程序的响应能力。

相关源码解析:

1
2
3
4
5
6
7
// SystemServer进程
// framework/base/services/core/java/com/android/server/am/ActivityManagerService.java
// framework/base/services/core/java/com/android/server/am/ActiveServices.java
// framework/base/services/core/java/com/android/server/am/AnrHelper.java

// 业务APP进程
// framework/base/core/java/android/app/ActivityThread.java

ANR触发场景

造成ANR的原因很多,都是因为在主线程执行任务太久阻塞了界面更新导致的,主要有以下几类:

  • InputDispatching Timeout :输入事件分发超时5s特定时间内无响应;
  • BroadcastQueue Timeout :前台广播在10s内、后台广播在60秒无法处理完成,也就是BroadCastReceiver的特定的时间内没有完成广播处理。 要注意的是,只有串行的广播才有超时机制,并行的广播并不会超时,也就是说,如果广播是动态注册的,直接调用sendBroadcast触发,如果主线程Looper中排在后面的Message不会触发超时机制,那么即使这个广播是前台广播,系统也永远不会弹出框提示用户超时了。
  • Service Timeout :小概率类型,前台服务在20s内、后台服务在200秒内未执行完成;
  • ContentProvider Timeout :内容提供者,在publish之后超过10s;

这些ANR触发的时间,我们可以在ActivityManagerService里面都能找到对应的定义

1
2
3
4
5
6
7
//代码位于:com.android.server.am.ActivityManagerService
// 前台广播,后台广播的超时时间
static final int BROADCAST_FG_TIMEOUT = 10*1000;
static final int BROADCAST_BG_TIMEOUT = 60*1000;

// ContentProvider的超时时间
static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10*1000;

ANR的源码分析

1、Service的ANR源码

我们通过context.startService(intent)来启动service最终都会调用到ContextImpl里面去,最终通过AMS来发起一次跨进程通信,最终调用到system_server进程中去启动service,这里不再废话,直接列出流程。

Service Timeout是位于”ActivityManager”线程中的AMS.MainHandler收到SERVICE_TIMEOUT_MSG消息时触发。

具体流程如下:

  1. A进程中调用 context.startService(intent)。
  2. 最终调用到system_server进程的AMS的startService()中。
  3. 最后会调用到system_server进程的ActiveService的realStartServiceLocked()中。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
final class MainHandler extends Handler {
public MainHandler(Looper looper) {
super(looper, null, true);
}

@Override
public void handleMessage(Message msg) {
switch (msg.what) {
case GC_BACKGROUND_PROCESSES_MSG: {
synchronized (ActivityManagerService.this) {
performAppGcsIfAppropriateLocked();
}
} break;
// 收到Service timeout 消息
case SERVICE_TIMEOUT_MSG: {
mServices.serviceTimeout((ProcessRecord)msg.obj);
} break;
// 收到前台Service Timeout的消息
case SERVICE_FOREGROUND_TIMEOUT_MSG: {
mServices.serviceForegroundTimeout((ServiceRecord)msg.obj);
} break;


case CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG: {
ProcessRecord app = (ProcessRecord)msg.obj;
synchronized (ActivityManagerService.this) {
processContentProviderPublishTimedOutLocked(app);
}
} break;
case KILL_APPLICATION_MSG: {
synchronized (ActivityManagerService.this) {
final int appId = msg.arg1;
final int userId = msg.arg2;
Bundle bundle = (Bundle)msg.obj;
String pkg = bundle.getString("pkg");
String reason = bundle.getString("reason");
forceStopPackageLocked(pkg, appId, false, false, true, false,
false, userId, reason);
}
} break;

}
}

对于Service有两类:

  • 对于前台服务,则超时为SERVICE_TIMEOUT = 20s;
  • 对于后台服务,则超时为SERVICE_BACKGROUND_TIMEOUT = 200s
1
2
3
4
5
6
7
8
 /**
* 代码位于:framework/base/services/core/java/com/android/server/am/ActiveServices.java
*/
// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;

// How long we wait for a service to finish executing.
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

我们知道,在Service启动流程中. 其中在Service进程attach到system_server进程的过程中会调用realStartServiceLocked()方法。在这个方法里面,就会发送delay消息来等待超时。

超时消息的发送

ActiveServices#realStartServiceLocked

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
/**
* 代码位于:framework/base/services/core/java/com/android/server/am/ActiveServices.java
*/
private final void realStartServiceLocked(ServiceRecord r,
ProcessRecord app, boolean execInFg) throws RemoteException {
// ......
// 核心逻辑有两个:1 开启ANR检测; 2 启动服务。
// 发送delay消息(SERVICE_TIMEOUT_MSG)。开启ANR检测,也是埋雷和爆雷的地方
bumpServiceExecutingLocked(r, execInFg, "create");
// ......
try {
// .....
// 最终执行服务的onCreate()方法。
// 这个方法会通过跨进程调用到业务进程的ActivityThread中scheduleCreateService方法
app.thread.scheduleCreateService(r, r.serviceInfo,
mAm.compatibilityInfoForPackage(r.serviceInfo.applicationInfo),
app.getReportedProcState());
r.postNotification();
created = true;
} catch (DeadObjectException e) {
Slog.w(TAG, "Application dead when creating service " + r);
mAm.appDiedLocked(app, "Died when creating service");
throw e;
} finally {
if (!created) {
// Keep the executeNesting count accurate.
final boolean inDestroying = mDestroyingServices.contains(r);
serviceDoneExecutingLocked(r, inDestroying, inDestroying);
// Cleanup.
if (newService) {
app.stopService(r);
r.setProcess(null);
}

// Retry.
if (!inDestroying) {
scheduleServiceRestartLocked(r, false);
}
}
}
}

bumpServiceExecutingLocked方法

下面我们来看一下bumpServiceExecutingLocked(r, execInFg, “create”);

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
/**
* 代码位于:framework/base/services/core/java/com/android/server/am/ActiveServices.java
*/
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
// 判断是否需要发送timeout消息
boolean timeoutNeeded = true;
if ((mAm.mBootPhase < SystemService.PHASE_THIRD_PARTY_APPS_CAN_START)
&& (r.app != null) && (r.app.pid == android.os.Process.myPid())) {

Slog.w(TAG, "Too early to start/bind service in system_server: Phase=" + mAm.mBootPhase
+ " " + r.getComponentName());
timeoutNeeded = false;
}
long now = SystemClock.uptimeMillis();
if (r.executeNesting == 0) {
r.executeFg = fg;
ServiceState stracker = r.getTracker();
if (stracker != null) {
stracker.setExecuting(true, mAm.mProcessStats.getMemFactorLocked(), now);
}
// 将ServiceRecord添加到ProcessRecord的executingServices里面去
if (r.app != null) {
r.app.executingServices.add(r);
r.app.execServicesFg |= fg;
// 调度处理后台Service Timeout的消息处理
if (timeoutNeeded && r.app.executingServices.size() == 1) {
// 开始进行ANR检测
scheduleServiceTimeoutLocked(r.app);
}
}
} else if (r.app != null && fg && !r.app.execServicesFg) {
r.app.execServicesFg = true;
// 调度处理前台Service Timeout的消息处理
if (timeoutNeeded) {
scheduleServiceTimeoutLocked(r.app);
}
}
r.executeFg |= fg;
r.executeNesting++;
r.executingStart = now;
}

从上面的日志中,我们可以看出会通过r.app.execServicesFg来标记启动的是不是前台Service。然后调用scheduleServiceTimeoutLocked来进行超时消息的发送。

ActiveServices#scheduleServiceTimeoutLocked

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
/**
* 代码位于:framework/base/services/core/java/com/android/server/am/ActiveServices.java
*/
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);

// How long we wait for a service to finish executing.
// static final int SERVICE_TIMEOUT = 20*1000;
// How long we wait for a service to finish executing.
// static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
// 在ActiveService.java中的常量可以看出前台Service的超时时间是20s 而后台Service的超时时间是200s
msg.obj = proc;
mAm.mHandler.sendMessageDelayed(msg,
proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}

超时消息的取消

在system_server进程ActiveServices#realStartServiceLocked()调用的过程会发动超时消息, 超时时间内没有启动完成触发超时机制. 如果在超时时间范围内启动成功,则会取消这个超时消息?

那么什么时候会进行超时超时消息的取消呢?

这个时候,我们就要回到业务进程里面去看了。经过Binder等层层调用进入业务进程的主线程handleCreateService()的过程.

上面我们说到了。执行完realStartServiceLocked中的bumpServiceExecutingLocked方法之后。我们就要调用app.thread.scheduleCreateService方法。 这个方法其实就会调用业务进程的ActivityThread的scheduleCreateService方法。

ActivityThread#scheduleCreateService

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
/**
* 代码位于:framework/base/core/java/android/app/ActivityThread.java
* 这段代码是ActivityThread的内部类:ApplicationThread extends IApplicationThread.Stub。
* Binder通信的业务Client端的逻辑。由服务端ActiveServices类中执行完realStartServiceLocked中的bumpServiceExecutingLocked方法之后。
* 调用app.thread.scheduleCreateService方法来进行调起的
*/
public final void scheduleCreateService(IBinder token, ServiceInfo info, CompatibilityInfo compatInfo,
int processState) {
updateProcessState(processState, false);
CreateServiceData s = new CreateServiceData();
s.token = token;
s.info = info;
s.compatInfo = compatInfo;

sendMessage(H.CREATE_SERVICE, s);
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
class H extends Handler {
// ......
public static final int CREATE_SERVICE = 114;
@UnsupportedAppUsage
// ......
String codeToString(int code) {
if (DEBUG_MESSAGES) {
switch (code) {

case CREATE_SERVICE:
return "CREATE_SERVICE";

}
}
return Integer.toString(code);
}
// ......
public void handleMessage(Message msg) {
if (DEBUG_MESSAGES)
Slog.v(TAG, ">>> handling: " + codeToString(msg.what));
switch (msg.what) {
// ......
case CREATE_SERVICE:
if (Trace.isTagEnabled(Trace.TRACE_TAG_ACTIVITY_MANAGER)) {
Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER,
("serviceCreate: " + String.valueOf(msg.obj)));
}
// 通过ApplicationThread的内部类由Binder跨进程来调用这个方法
handleCreateService((CreateServiceData) msg.obj);
Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
break;
// ......
}
Object obj = msg.obj;
if (obj instanceof SomeArgs) {
((SomeArgs) obj).recycle();
}
if (DEBUG_MESSAGES)
Slog.v(TAG, "<<< done: " + codeToString(msg.what));
}
}

ActivityThread#handleCreateService

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
@UnsupportedAppUsage
private void handleCreateService(CreateServiceData data) {
// ......
Service service = null;
try {
if (localLOGV)
Slog.v(TAG, "Creating service " + data.info.name);
//创建ContextImpl对象
ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
//创建Application对象
Application app = packageInfo.makeApplication(false, mInstrumentation);
java.lang.ClassLoader cl = packageInfo.getClassLoader();
service = packageInfo.getAppFactory().instantiateService(cl, data.info.name, data.intent);
// Service resources must be initialized with the same loaders as the
// application
// context.
context.getResources().addLoaders(app.getResources().getLoaders().toArray(new ResourcesLoader[0]));

context.setOuterContext(service);
//调用服务attach()方法
service.attach(context, this, data.info.name, data.token, app, ActivityManager.getService());
//调用服务onCreate()方法
service.onCreate();
mServices.put(data.token, service);
try {
// 我们一看这个方法就是跨进程调用。来通知SystemServer服务端。我们的Service启动成功
ActivityManager.getService().serviceDoneExecuting(data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
} catch (RemoteException e) {
throw e.rethrowFromSystemServer();
}
} catch (Exception e) {
if (!mInstrumentation.onException(service, e)) {
throw new RuntimeException("Unable to create service " + data.info.name + ": " + e.toString(), e);
}
}
}

这个方法看起来就很熟悉了。其实就是通过反射创建Service对象。然后依次调用Service的生命周期方法。然后再通知System服务端Service启动成功。

好吧。那看来超时消息的取消。还是要回到SystemServer服务端类分析了。

首先我们看ActivityManagerService#serviceDoneExecuting方法

ActivityManagerService#serviceDoneExecuting

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/**
* 代码位于:framework/base/services/core/java/com/android/server/am/ActivityManagerService.java
* 这个方法是由业务进程的ActivityThread中的ActivityThread#handleCreateService方法来进行实例化完业务的Service
* 然后来通知SystemServer进程的serviceDoneExecutingLocked通知Service创建完毕
*/
public void serviceDoneExecuting(IBinder token, int type, int startId, int res) {
synchronized(this) {
if (!(token instanceof ServiceRecord)) {
Slog.e(TAG, "serviceDoneExecuting: Invalid service token=" + token);
throw new IllegalArgumentException("Invalid service token");
}
mServices.serviceDoneExecutingLocked((ServiceRecord)token, type, startId, res);
}
}

ActiveServices#serviceDoneExecuting方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
    /**
* 代码位于:framework/base/services/core/java/com/android/server/am/ActiveServices.java
*/
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
boolean finishing) {
// ......
if (r.executeNesting <= 0) {
if (r.app != null) {
if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
"Nesting at 0 of " + r.shortInstanceName);
r.app.execServicesFg = false;
r.app.executingServices.remove(r);
//当前服务所在进程中没有正在执行的service
if (r.app.executingServices.size() == 0) {
if (DEBUG_SERVICE || DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
"No more executingServices of " + r.shortInstanceName);
mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
} else if (r.executeFg) {
// Need to re-evaluate whether the app still needs to be in the foreground.
for (int i=r.app.executingServices.size()-1; i>=0; i--) {
if (r.app.executingServices.valueAt(i).executeFg) {
r.app.execServicesFg = true;
break;
}
}
}
if (inDestroying) {
if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
"doneExecuting remove destroying " + r);
mDestroyingServices.remove(r);
r.bindings.clear();
}
mAm.updateOomAdjLocked(r.app, true, OomAdjuster.OOM_ADJ_REASON_UNBIND_SERVICE);
}
// ......
}
}

超时消息的触发

在system_server进程中有一个Handler线程, 名叫”ActivityManager”.当倒计时结束便会向该Handler线程发送 一条信息SERVICE_TIMEOUT_MSG,

上面我已经说到了超时消息的发送。ActiveServices#scheduleServiceTimeoutLocked方法中,我们会进行超时消息的发送。

ActivityManagerService#MainHandler.handleMessage

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
/**
* 代码位于:framework/base/services/core/java/com/android/server/am/ActivityManagerService.java
*/
final class MainHandler extends Handler {
public MainHandler(Looper looper) {
super(looper, null, true);
}

@Override
public void handleMessage(Message msg) {
switch (msg.what) {
case GC_BACKGROUND_PROCESSES_MSG: {
synchronized (ActivityManagerService.this) {
performAppGcsIfAppropriateLocked();
}
} break;
// 收到Service timeout 消息
case SERVICE_TIMEOUT_MSG: {
mServices.serviceTimeout((ProcessRecord)msg.obj);
} break;
// ......
}
}

方法还是调度到了ActiveServices的serviceTimeout方法

ActiveServices#serviceTimeout

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
/**
* 代码位于:framework/base/services/core/java/com/android/server/am/ActiveServices.java
*/
void serviceTimeout(ProcessRecord proc) {
String anrMessage = null;
synchronized(mAm) {
// ......
if (timeout != null && mAm.mProcessList.mLruProcesses.contains(proc)) {
Slog.w(TAG, "Timeout executing service: " + timeout);
StringWriter sw = new StringWriter();
PrintWriter pw = new FastPrintWriter(sw, false, 1024);
pw.println(timeout);
timeout.dump(pw, " ");
pw.close();
mLastAnrDump = sw.toString();
mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
anrMessage = "executing service " + timeout.shortInstanceName;
} else {
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
}
}
//当存在timeout的service,则执行appNotResponding
if (anrMessage != null) {
mAm.mAnrHelper.appNotResponding(proc, anrMessage);
}
}

方法又回到了ActivityManager中的ANRHelper中。

ANRHelper#appNotResponding

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

/**
* 代码位于:framework/base/services/core/java/com/android/server/am/AnrHelper.java
* ActiveServices#serviceTimeout的方法在调用的时候。
* mAm.mAnrHelper.appNotResponding(proc, anrMessage)
*
*/
void appNotResponding(ProcessRecord anrProcess, String annotation) {
appNotResponding(anrProcess, null /* activityShortComponentName */, null /* aInfo */,
null /* parentShortComponentName */, null /* parentProcess */,
false /* aboveSystem */, annotation);
}

void appNotResponding(ProcessRecord anrProcess, String activityShortComponentName,
ApplicationInfo aInfo, String parentShortComponentName,
WindowProcessController parentProcess, boolean aboveSystem, String annotation) {
synchronized (mAnrRecords) {
mAnrRecords.add(new AnrRecord(anrProcess, activityShortComponentName, aInfo,
parentShortComponentName, parentProcess, aboveSystem, annotation));
}
startAnrConsumerIfNeeded();
}