What’s new in Android Runtime (Google I/O ’18)

August 16, 2019

Hello, everyone. I’m Mathieu and this
is my colleague Calin. And today, we’re
going to be going over what’s new with the Android
Runtime on Android, also known as ART. So what is ART? Well, ART is the software layer
in between the applications and the operating system. It provides mechanisms for
executing Java language and Kotlin applications. To accomplish this,
ART does two things. It executes Dex files, the
intermediate representation of Android applications, through
a hybrid model consisting of interpretation,
just-in-time compilation, and profile-based
ahead-of-time compilation. ART also does memory management
for Android applications through an automatic reclamation
through a garbage collector. This is a concurrent
compacting garbage collector, so that there’s less jank
for your applications. Now let’s look at
how ART has changed over the last few years. Over the years, there have
been many improvements to ART. In Nougat, we introduced
profile-guided compilation to improve application startup
time, reduce memory usage, and reduce storage requirements. Also in Nougat, we added a JIT,
much like Dalvik used to have. This was done to remove the
need for optimizing apps. That was kind of a big problem
during Android system updates. And in Oreo, we added a new
concurrent compacting garbage collector to reduce RAM
requirements, have less jank, as well as accelerate
allocations. As you can see
here on the slide, this new garbage collector
enabled a new bump pointer allocator that is
17 times faster than the allocator in
Dalvik or in KitKat. Now, we’ve talked about what
happened in the past, but what’s new in Android P? First of all, there are
new compiler optimizations that help accelerate the
performance of Kotlin code in Android. This is especially
important since Kotlin is a first-class
programming language for Android development. Next up, we have memory
and storage optimizations to help entry-level devices,
such as Android Go devices. This is important to help
improve the performance for the next billion users. And finally, we
have cloud profiles. Device-collected profiles
from the just-in-time compiler are uploaded and
aggregated in the cloud to enable faster performance
directly after installation of applications. So let’s start with Kotlin. Last year, we
announced Kotlin as a first-class
officially-supported programming language
for Android development. And then, we began to
investigate the performance. Why Kotlin, you might ask? Well, Kotlin is a safe,
expressive, concise, object-oriented language
that is designed to be interoperable
with Java language. The reason ART focuses
on optimizing Kotlin is so that developers can
leverage all of these language features while still having
fast and jank-free applications. Let’s see how Kotlin
optimizations are normally performed inside of
the Android Runtime. Usually, optimizations
are performed in an investigative manner. And there’s an
order of preference for fixing performance issues
so that the greatest number of Kotlin applications
can actually benefit from the optimization. The preferred option is
fixing a performance issue inside of kotlinc, the compiler
developed by JetBrains. Google and JetBrains, of course,
work closely together on all kinds of optimizations
and fixes. If we fix a
performance issue here, it can be deployed to the
greatest number of Kotlin applications. Alternatively, if
that doesn’t work, then we consider fixing
the performance issue inside of bytecode converters. Fixing in the bytecode
converter will enable existing versions
of the Android platform to get the performance fix. And if that option doesn’t
work, the last option is to fix the performance issue
in the Android Runtime, also known as ART. So the reason that we might not
want to fix in ART right away is because ART is updated as
part of the Android platform. So that means that not all
devices will get the fix. Now let’s look at an example. One example of a
Kotlin optimization is the parameter null-check. As you can see here,
this is a simple method that just returns the
length of a string, but the string is not nullable. So what this means is that the
compiler inserts a null-check into the function bytecode
to actually verify that the string is
not null, and throw the corresponding
exception if required. Implemented in the
bytecode, the first step is loading the name
of the parameter and then invoking
a separate function to do the actual null check. So there is some
extra overhead here, as you might see: even in the common,
non-null case, you pay for the extra invocation
of the function that does the null check. And this function, in
turn, calls a third function, if required, to
throw the actual “parameter is null” exception. Checks such as
these are commonly required for Java
language and Kotlin interoperability, because the
Java language does not have non-nullable types. Now let’s see how we
can optimize this. If we look at the bytecode, one
of the first things we can do is actually inline the method
that does the null-check into the caller. After inlining, this
improves performance because there is
one less invocation. And from here, you can
see one other thing we can do is that the
name of the parameter is not actually required
unless the argument is null. So from here, we
can do code sinking to move the loading of the parameter
name inside of the conditional. So overall, these
two optimizations help performance by removing
one invocation and one load of
a string literal. Apart from this optimization,
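To make the two steps concrete, here is an illustrative Java-style sketch of the check before and after optimization. The helper names only loosely mirror the real Kotlin runtime intrinsics in `kotlin.jvm.internal.Intrinsics`; this is a sketch, not actual kotlinc output.

```java
// Illustrative sketch of the Kotlin parameter null-check, written in Java.
// Helper names are approximations of the real intrinsics, not compiler output.
class NullCheckSketch {
    // Before: the parameter name is loaded unconditionally, and a separate
    // helper performs the check -- one extra invocation even in the common,
    // non-null case.
    static int lengthBefore(String s) {
        checkParameterIsNotNull(s, "s");
        return s.length();
    }

    static void checkParameterIsNotNull(Object value, String name) {
        if (value == null) {
            throwParameterIsNullException(name);
        }
    }

    static void throwParameterIsNullException(String name) {
        throw new IllegalArgumentException(
                "Parameter specified as non-null is null: " + name);
    }

    // After: the check is inlined into the caller, and the load of the
    // parameter-name literal is sunk into the null branch, so the common
    // case performs no extra invocation and never touches the literal.
    static int lengthAfter(String s) {
        if (s == null) {
            throw new IllegalArgumentException(
                    "Parameter specified as non-null is null: s");
        }
        return s.length();
    }
}
```

Both versions behave identically; the difference is only how much work the common, non-null path does.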
we also track Kotlin performance on various benchmarks. Other improvements here include
improved auto vectorization of loops, also intrinsic
methods that are specifically tailored for Kotlin code to
help improve performance there. So the ART team
is always working on improving this performance. Now that we’ve done Kotlin,
what about memory and storage improvements? So since ART is responsible
for Java language and Kotlin applications, it’s
also pretty important to just kind of make sure
that the programs don’t use too much memory and take
too much space on a device. There have been several
improvements focusing on this area, including reducing
the amount of space and memory usage required by Dex files. Now why are RAM and storage
optimizations important? Well, recall that last
year, we introduced a new initiative
called Android Go, aiming at running the
latest versions of Android on entry-level devices. Since these devices typically
have 1 gigabyte or less of RAM and 8 gigabytes
or less of storage, it’s kind of important to
focus on optimizing these areas so that the users can
run enough applications and install as many
applications as they– or more applications than they
would otherwise be able to. Now this isn’t just
for Android Go. Premium devices also benefit
from optimizations in these two areas, but since they
have more resources, normally it’s to
a lesser degree. Anyways, before we talk about
RAM and storage optimizations, let’s do a little bit of a
review about how applications work on your Android devices. An application normally comes
in an Application Package Kit, also known as an APK. Inside of the APK,
there are usually one or more Dalvik Executable
Files, also known as Dex files, that contain instructions that
ART uses to either interpret or compile your application. Since Dex files are required
to be quickly accessed during execution, they are
mapped directly into memory during application startup so
that ART can have quick access. This means that there is a
startup cost as well as a RAM cost proportional to the
size of the Dex file. Finally, Dex files are usually
stored twice on the device. The first place they
are stored is inside of the application package kit. And then the second
place they are stored is in an extracted form,
so that ART can have faster access during application
startup without needing to extract from the
ZIP file each time. Now let’s take a closer look
at the contents of Dex files. Within a Dex file, there
are several sections containing different
types of data related to the applications. But where is the space
going in the Dex file? One way to find out
is to calculate where the space
is going for each Dex file and average out the results. This chart here is for the top
99 most downloaded applications in the Play Store. And you can see that the
largest section is the code item section containing the Dex
instructions used by ART. The next largest section
is the StringData section. And this section contains
the string literals loaded from code, method names,
class names, and field names. Combined, these two sections
are around 64% of the Dex file, so they’re pretty important
areas to optimize. Let’s see if there’s
a way we can reduce the size of these sections. One new feature introduced in
Android P is called CompactDex. The goal of
CompactDex is simple– reduce the size of Dex files
to get memory and storage savings on the device. From the previous slide,
we saw that some sections are larger than others. So it’s important to just
focus on the large sections to get the most savings. Code items
are deduplicated more often,
and they also have their headers shrunk
to save space for each method in the application. And another thing here worth
noting about the string data is that large
applications frequently ship multiple Dex files in
their APK because of Dex format limitations. Specifically, the
64k method limit means that you can only have
64,000 method references in a single Dex
file before needing to add another one
to your application. And every time you add
another Dex file, this causes duplication, specifically
of string data, that could otherwise be stored only once. CompactDex shrinks this
by providing deduplication across the Dex files in the APK. Now let’s go to the
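As a back-of-the-envelope illustration of why this matters, the following toy model (not ART code; the names and the byte counting are simplified assumptions, ignoring the MUTF-8 encoding and length prefix of real Dex string items) estimates how many string-data bytes are wasted when each Dex file carries its own copy of shared strings:

```java
import java.util.*;

// Toy model of cross-Dex string deduplication: each Dex file has its own
// string table; a shared data section stores each distinct string once
// per APK.
class StringDedupModel {
    static int bytesSavedByDedup(List<Set<String>> perDexStringTables) {
        int separate = 0;                     // bytes with per-Dex tables
        Set<String> distinct = new HashSet<>();
        for (Set<String> table : perDexStringTables) {
            for (String s : table) {
                separate += s.length();
                distinct.add(s);
            }
        }
        int shared = 0;                       // bytes with one shared section
        for (String s : distinct) {
            shared += s.length();
        }
        return separate - shared;             // bytes saved by deduplication
    }
}
```

For example, with two Dex files that both reference the string "onCreate", the shared data section stores it once instead of twice.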
generation process. First, let’s look
at how Dex files are processed on Android Oreo. The first step, run by dex2oat,
the ahead-of-time compiler, is that the Dex files are
extracted from the APK and stored in a VDex container. The reason they are extracted,
as I mentioned earlier, is so that they can be
loaded more efficiently during application startup. One other thing here worth
noting is the profile. So the profile, as
introduced in Nougat, is essentially data about the
application execution including which methods are
executed during startup, what methods are hot– so
compiled by the JIT compiler– and what classes are loaded. On Oreo, we are already
optimizing the Dex files stored in the VDex container by
applying layout optimizations. We were also
deciding which methods to compile based on which
methods the profile marks as hot. Now let’s look at Dex processing
on Android P. In Android P, the ahead-of-time compiler
now converts the Dex files to a more efficient
CompactDex representation inside of the container. One new addition here
is the introduction of a shared data section. Specifically, data that is
present in multiple Dex files will be in the
shared data section only once, so it
kind of deduplicates data that’s commonly shared. And one of the most
commonly shared things here is the StringData. So this is how we reduce
the large StringData section that we saw earlier. Finally, since the conversion
is automatically done on-device, this means that all
existing applications can get the benefits
of CompactDex without needing to
recompile their APKs. OK, so let’s look at one
example of how we actually shrink the Dex code items. Apart from the
instructions, each code item has a 16-byte header. And then most of the
values in the header are usually small values. So what we do here is we
shrink the fields in the header to be 4 bits each. And then we have an
optional pre-header to extend them as required. The pre-header is 0 bytes
in most of the cases, but can be up to 12
bytes in the worst case. So other than the
pre-header, we also shrink the instruction count. Since the average method is
not going to be that large, we shrink this down to 11
bits instead of 32 bits, and we use the 5
remaining bits for flags that are ART specific. Finally, we moved
the debug information into a separate
space-efficient table to help enable more
deduplication of the code items. Overall, this optimization saves
around 12 bytes per code item in the CompactDex file. And here are the results for
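The bit-packing just described can be sketched as follows. The field order here is an assumption for illustration (the authoritative layout lives in ART’s source); only the widths mentioned in the talk, four 4-bit fields, an 11-bit instruction count, and 5 flag bits, are taken from it.

```java
// Sketch of CompactDex-style code-item header packing. Field order is an
// illustrative assumption; the widths are the ones described in the talk.
class CompactCodeItemSketch {
    // Pack the registers/ins/outs/tries sizes into one 16-bit value,
    // 4 bits each. Values too large for 4 bits would spill into the
    // optional pre-header, which this sketch omits.
    static int packFields(int registers, int ins, int outs, int tries) {
        return (registers << 12) | (ins << 8) | (outs << 4) | tries;
    }

    // Pack the instruction count (11 bits) and the ART-specific flags
    // (5 bits) into the second 16-bit value.
    static int packInsnsCountAndFlags(int insnsCount, int flags) {
        return (insnsCount << 5) | (flags & 0x1F);
    }
}
```

Two 16-bit values replace the standard Dex 16-byte header in the common case, which is where the roughly 12 bytes of savings per code item come from.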
the top 99 most downloaded APKs. So the average space required
by the Dex files on a device is around 11.6% smaller. And then other than
the storage savings, you also get memory savings
because the Dex files are resident in memory
during application usage– at least partially
resident in memory. And one more thing here. Let’s go over the layout
optimizations a little bit. So even though we had introduced
the JIT profiles in Android N, we did not have any layout
optimizations back then. So what this means is the Dex
is kind of randomly ordered, without regard for
the usage pattern. In Android O, we added this
type of layout optimization that groups the methods used
during application startup together and the
methods that are hot– so that means their code
is frequently accessed during execution– together. This is a
pretty big win so far. Well, let’s see what we did
for Android P. In Android P, we have more flexible
profile information, which enables us to put the
methods that are used only during startup together. This helps reduce
the amount of memory used because the application
or the operating system can remove those pages
from memory after startup. We also put the
hot code together, since it’s frequently
accessed during execution. And finally, we put the code
that’s just never touched at all during
execution at the end, so that it’s not loaded
into memory unless required. And the reason that these layout
optimizations are important is because they improve
locality and reduce how many parts of the
Dex file are actually loaded into memory during
application usage and startup. So if you improve
the locality here, you can get startup benefits
and reduction in memory usage. And now to Calin
for cloud profiles. [APPLAUSE] CALIN JURAVLE:
Thank you, Mathieu. Hello, everyone. My name is Calin. And I’m here today
to present to you how we plan to improve and
scale up the Android Runtime profiling infrastructure. However, before we
start, profiling is a rather overloaded term. When we speak about profiling
in today’s presentation, we’re going to refer
to the metadata that Android Runtime captures
about the application execution, metadata
that is going to be fed into a profile-guided
optimization process. We’re going to see how we extend
the on-device capabilities in order to drive performance
right at install time. Before we jump into what is
new and how actually it works, let me briefly remind
you how Android uses profile-guided optimizations. This is an efficient technique
that we introduced in Android Nougat as part of a
hybrid execution model. Hybrid means that the
code being executed can be in three
different optimization states at the same time. The primary goal
of this technique is to improve all key metrics
of the application performance. We’re talking about faster
application startup time, reduced memory footprint,
a better user experience by providing less
jank during usage, less disk space used by
the compiler artifacts– which means more disk
space for our users– and nonetheless, an
increased battery life, because we do
heavy optimizations when the device is not used
rather than at the use time. How does this work? It all starts when the Play
Store installs the application. At first, we do only
very light optimizations, so that the application is quickly
ready to go for the user. At first launch,
the application will start in what we call
an interpretation mode. As the runtime executes
the application code, it discovers the most
frequently used methods and the most important
methods to be optimized. That’s when the JIT
system kicks in, and it’ll optimize that code. During this time,
the JIT system also records what we call
profile information. This profile information
essentially encapsulates data about the methods that
are being executed and about the classes
that are being loaded. Every now and then, we
dump this profile to disk so that we can reuse it later. Once the device is put
aside and it’s not in use– a state which we call
idle maintenance mode– we’re going to use
that profile to drive profile-guided optimizations. The result is an optimized
app which will eventually replace the original state. Now when the user
relaunches the app, it will have a much snappier
startup time, a much better steady-state
performance execution, and overall, the
battery will drain less. In this state, the
application will be interpreted, just-in-time
compiled, or pre-optimized. Now just how efficient
is this technique? We gathered some
data from the field for the Google Maps application. Here we can see two charts. The left one presents data
from a Marshmallow build. You can see that
the startup time is pretty constant over time. It does not fluctuate. And this is pretty
much expected. You don’t want to
have deviations here. However on the
right-hand side, you can see that in Nougat, the
startup time drops over time. Eventually, it stabilizes
at about 25% faster than it was
at install time. And this is great news. It means that the more
the user uses the app, the more we can optimize it. And over time, the performance
gets better and better. This is great, but
we can do better and we want to do better. We shouldn’t need to wait
for optimal performance. And our goal with
cloud profiles is to deliver near optimal
performance right after install time,
without having to wait for the
application to be profiled. So let’s see how this
is going to work. Let me introduce the
idea of cloud profiles. It is based on
two key observations. The first is that apps usually
have many commonly used code paths that are shared between a
multitude of users and devices. Take, for example, classes
loaded during startup time. Each device will have
its own specific set. However, globally, we can
extract the common intersection of all those classes. And that’s valuable data
for us to optimize upon. Second, we know that most app
developers roll out their apps incrementally, starting
with alpha/beta channels or, for example,
1%, 2% of their user base. And the idea behind
cloud profiles is to use this initial set
of alpha/beta channel users to bootstrap performance
for the rest of the users. So how’s it going to work? Once we have an
initial set of devices, we’re going to
extract the profile information about your
APK from those devices. We’re going to upload
that information to Play. And there, we’re going
to combine everything. We’re going to aggregate
whatever comes in and we’re going to generate
what we call a core application profile. This core profile will
contain information that’s relevant
across all device executions, and not
just a single one. When a new device requests
that application to be installed, we’re going
to deliver this core profile alongside the main
application APK to the device. Locally, the device
will be able to use that data to perform
profile-guided optimizations right at install time. That will deliver an
improved cold startup performance and much better
steady-state performance over time. Now having profiles in the cloud
offers many more opportunities
the app performance with profile-guided
optimizations. The core profiles offer
valuable data, for example, for developers to act upon. And we believe there
is enough information there so that developers can
tune their own applications. We’re going to explore how we
can share this data later. Now you can see in this workflow
that to deliver such a thing, we need support from Android
platform and Play alike. In today’s presentation,
we’re going to focus on the Android support. So what did we do in P to
support this lifecycle? We added new
interfaces that will allow us to extract the
profile and bootstrap the information from the cloud. The functionality is available
to all system-level apps which acquire the necessary
permissions. And in our case, Play
is just a consumer. The first of the two APIs I’m talking
about is profile extraction, which is exposed via
a new platform manager that we call the ART manager. The second API is
profile installation. And this is seamlessly integrated
in the current installer session. What we did here is to add a new
kind of installation artifact that the platform understands. We call these Dex
Metadata files. Essentially, in a
similar way to the APKs, the Dex Metadata
files are archives which will contain information
on how the runtime can optimize the application. Initially, these
Dex Metadata files will contain the core profile
that I mentioned earlier. At install time, they
will deliver these files, if they are available,
to the device, where they will be streamed
into the on-device Dex optimizer. It is worth mentioning that
we’ll offer support for Google Play dynamic delivery. So if you plan to
split the functionality of your application
into different APKs, each APK will have
its own Dex Metadata file. So let’s take a look at how
everything fits together from the device perspective. You remember that I presented
this diagram in the beginning, showing how the
profiling works locally. Let’s focus here
just on the profile file of the application. Once we manage to
capture a profile file, we’re going to upload
this information to Play. On Play, as I mentioned,
we’ll aggregate this data with many, many other profiles. And once we have
a core profile, we’re going to deliver it to
new users. The idea of the
core profile is not to replace on-device
profiling, it’s only to bootstrap the
profile optimizations. So essentially
instead of starting with a completely blank
state about your application, we already know what are
the most commonly executed code paths, and we’ll
be able to start the optimizations from there. So now essentially what was a
pure on-device profile feedback loop gets extended
with a cloud component. Now I keep talking
about this core profile, and I think it’s
important to dedicate a bit more attention to it. So let’s talk a bit about how
we’re going to build it. We already know that
on-device, from one execution to the other, the profiles
aggregate quite well. They reach a stable
point pretty fast. That means the runtime will not
reoptimize the application over and over and over again. After a few optimization
steps, it will stop. However, that’s data
from one device. How well does this work when
you try to do it across devices? How many samples would you
need in order to get to a robust,
reliable profile? We looked at our own
Google applications and we tried to figure that out. Here we can see a
plot which represents the amount of information
in the core profiles relative to the total
number of aggregations. The y-axis represents the
amount of information. And the actual value–
numeric value– is not important there. What is important
from this graph is that after
roughly 20 to 40 profile
aggregations, the information in the profile
reaches a plateau. And that’s very important. It sends a very
important message. It means that the
alpha/beta channel users will provide
us with enough data to build a core profile. And it means that the
majority of the production users of your application
will always have the best possible experience. So how do we actually
aggregate the information? I mentioned before
that in the profile, you will find information
about classes and methods. On-device, this is
roughly how it looks. We’re going to take all the
executions that we have seen before, then we’ll create
a union of everything that we’ve seen. In the aggregated
profile, you’ll have information about classes,
methods, about everything that you’ve seen. In the cloud, however, we don’t
really want everything. We only want the
commonly-executed code paths. And what we are doing
instead of having a union, we’ll be having a
smart intersection. We’ll be only keeping
the information relevant to all
executions, meaning we’re going to filter
out all the outliers. The result is what we call the
core profile, which only keeps the most commonly-seen samples. And this is what’s going to
get eventually to the device. How well does this work? Let’s look again at data
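The union-versus-intersection distinction can be sketched as follows. The threshold-based “smart intersection” here is an illustrative assumption, since the actual aggregation policy is internal to Play.

```java
import java.util.*;

// Sketch of profile aggregation. On-device profiles merge as a union;
// the cloud keeps only methods seen in enough uploads (a stand-in for
// the "smart intersection" that filters out per-device outliers).
class ProfileAggregationSketch {
    static Set<String> mergeOnDevice(List<Set<String>> snapshots) {
        Set<String> merged = new HashSet<>();
        for (Set<String> snapshot : snapshots) {
            merged.addAll(snapshot);          // union: keep everything seen
        }
        return merged;
    }

    static Set<String> coreProfile(List<Set<String>> uploads, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> upload : uploads) {
            for (String method : upload) {
                counts.merge(method, 1, Integer::sum);
            }
        }
        Set<String> core = new HashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= threshold) {  // drop rarely-seen outliers
                core.add(e.getKey());
            }
        }
        return core;
    }
}
```

A method seen on only one device out of many, such as a device-specific error path, never makes it into the core profile.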
captured from Google Apps. We tested this across a
variety of applications. And here are the results of
some representative ones. In this set, you
can find applications that rely on native code– for example, Google Camera– or applications which are
much more Java oriented– say, Google Maps or Google Docs. For Google Camera,
for example, we get a startup time
improvement of about 12.6%. And that’s excellent given that
the application itself doesn’t have a lot of Java code. However, for Maps or Docs,
which are heavily Java-based, we can see that the
optimizations improve the startup time by
about 28% and 33%, respectively. Across the board, we can
see an average of about 20% improvement. And that obviously
depends on what the application is doing, how
much Java code is being used, and so on. Now I mentioned in the
beginning that besides improving the application
performance directly via profile-guided
optimizations, the profiles also offer
many more opportunities. I’m going to present
a short use case study and walk you through
some important aspects that the profiles can reveal
about your application. During this use case
study, I’m going to focus on a single question– are you shipping unnecessary
code to the clients? Are you? Let’s take a look at some data. Again, this case study reflects
the state of some Google Apps that we tested. We see that on average, we
profiled about 14% to 15% of the code. And about 85% of the
code remains unprofiled. When you look at
the distribution, we can see, for example,
that in some apps, 5% to 10% of the code gets profiled. In some other apps, even 50%
of the code gets profiled. And this is a rather intriguing
result, because if the code is not
profiled, that most likely means it has
never been executed. Obviously, there can be
good reasons for that. The code can be, for example,
unexpected error-handling paths, right? We all want the applications
to be reliable and robust, and the error handling
must be there. Hopefully, it never
gets executed. You may have backwards
compatibility code, support for previous
API level and such. You may have features which
are not used on all devices, or features that are
very narrowly targeted. And you might also have a lot of
unnecessary code lying around, maybe including libraries
that you don’t really use. Now it’s a bit hard to
break down the percentage for these categories. And there can be other reasons
why we didn’t profile the code. But the skewed distribution
here is a strong indication that there is a lot of room
for improvement in APKs. The code can be
reorganized or trimmed down for better efficiency. For example, Google Play
introduced dynamic delivery schemes, which may
help you reduce the code that you share
by targeting features only to certain users. And that’s something
that you might want to look at and
take advantage of. So we believe that there is
quite a bit of unnecessary code lying around, at
least in our own APKs. Now since we’ve focused on
the code that actually doesn’t get profiled, is there anything
that we can extract out of the profiled code? To understand this,
let me talk a bit about different categories
of profiled code. When the application
code is being profiled, the runtime will try to label
it depending on its state. And you will have a label
for the startup category, for the post-startup category,
and for the hot category. Obviously, these are
pretty self-explanatory. The hot category of
the code is essentially what the runtime has seen to
be the most important part of your code. It’s important to keep in mind
that these are not disjoint. Say for example, that the
method foo is being executed. It can execute
during startup, it can execute
during post-startup time, and it can also
be marked as hot– for example, if you have
a very heavy computation during that method. Now if you know the code which
is executing during startup time, if you focus
on that, you will be able to lower the startup
time of the application. As such, the first
impression that the users will have upon your
application will be very good. If you look at the
post-startup code, that will help you, for example,
lay out the application Dex bytecode. That will lead to
memory improvements and will be much smoother
on low-end devices. As for the hot code,
this is the code that should get
the most attention for your optimizations efforts. It’s the code that is
most heavily optimized by the runtime, and it might
be so because the runtime identified that it would be
very beneficial to invest time there. So if you,
for example, want to improve the quality and
the performance of your app, this is where you should
spend your effort, or at least your initial effort. Now here is
an important question: how much of your application code is
actually being marked as hot? Because if everything
were marked hot, the label would not help you prioritize, and that would not be really useful. Let me show you the breakdown
of these three categories. In this graph, you can
see in the red columns the percentages of profiled
code and unprofiled code. These sum up to 100%, and
it’s what I showed you earlier. They are here just
for the reference. The blue boxes show
the percentages of the startup code,
the post-startup code, and the hot one relative
to the total Dex bytecode, so don’t expect them to add up to 100%. Also, one piece of code can
be in different categories at the same time. But as you can see
here, the average– on average about 10% of the
application Dex bytecode is being marked as hot. And this indicates that
when you focus on your app optimizations, you can
dedicate your attention, starting with just a small part of
your application code base. You obviously should spend time
on all the other parts as well, but probably this is where
you should start from. Let me go over a quick
review of what we presented today and the main benefits. We started with Kotlin,
and we described a few new compiler optimizations
that we added that focused on Kotlin performance. We described briefly how we
approach Kotlin optimizations, and that we first try
to seek improvements in the Kotlin compiler. We moved to memory and
storage optimizations, and my colleague
Mathieu introduced you to the concept of CompactDex. This is a new Dex format
available just on-device, which focuses on memory and storage savings. And finally, I presented you
the idea of cloud profiles. And we talked about
how we can bootstrap the profile-guided
optimizations using a small percentage of
alpha/beta channel users in order to deliver important
performance improvements right after install time for the
majority of the production users. With this, I’d like to
thank you for your attention and for your presence. And I want to invite you all
to Android Runtime office hours tomorrow, where we can
answer any questions that you would have about
today’s presentation or about the runtime in general. We’re going to be at 5:30 in
section A. Thank you so much. [APPLAUSE] [MUSIC PLAYING]


