Introduction to Constraint Propagation
Approaches to Byte Code Verification
Version | Version Information | Date
---|---|---
Initial version | Mikhail Loenko: document created | March 20, 2007
Java* Class File Verification ensures that byte code is safe, preventing various attacks and runtime errors, but its classic implementation requires a complex, time and memory consuming dataflow analysis that generates a proof of type safety. There are alternative approaches; for example, the CLDC* verifier takes class files annotated with a proof of type safety. To make sure the byte code is valid, the verifier just validates the proof. The validation is simple, fast, and does not require much memory. This document presents a new verification approach implemented in Harmony [1] JIRA 3363 [4]. The approach is based on Constraint Propagation; it neither generates a direct proof of class file validity nor validates an existing proof. Instead it takes an original Java class file containing no additional information and generates a proof that a proof of validity does exist. The new approach results in a significant performance and memory footprint advantage over J2SE*-style verification.
Java Class File Verification is an important part of the Java Security Model. Verification ensures that the binary representation of a class is structurally correct, e.g. that every instruction and every branch obeys the type discipline.
Verification is a complicated, time and memory consuming procedure that is absolutely necessary: if classes were not verified, a malicious pure-Java application could get unrestricted access to memory [6].
The verification result depends on the other classes linked to the given class. So even if a class is developed by a trusted developer and digitally signed, it must still be validated in the user environment. Consider class A that stores an object of class B in a field of type C. If in the developer's environment B is a subclass of C, then this store operation is valid. In a user environment that might not be the case: if B is replaced with some other valid and digitally signed B that is not a subclass of C, then class A becomes a source of runtime errors and a hole for attacks. Verification cannot be avoided, but it can be reduced. In the J2ME* world, which is about small limited-resource devices, a simplified verification approach is implemented. The "proof" is generated in advance by an off-device pre-verifier and stored within the class file. The in-device simplified verifier has just to validate the proof. Proof validation is significantly simpler than proof generation, so the entire in-device verification consumes a reasonable amount of resources.
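For illustration, a minimal sketch of this scenario; the class and member names here are hypothetical:

```java
class C { }
class B extends C { }   // in the user environment B may no longer extend C

class A {
    C field;

    void store() {
        // This assignment compiles to a putfield that is type-safe only
        // while B is assignable to C. If the user environment ships a B
        // that does not extend C, an unverified A would silently write an
        // incompatible reference into a C-typed field.
        this.field = new B();
    }
}
```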
The in-device part of J2ME CLDC verification is fast, but it cannot deal with legacy code. In the best case an existing application must be passed through the pre-verifier; in the worst case it is unusable. For example, if an application contains a digitally signed jar file, passing that file through the pre-verifier invalidates its digital signature.
That is why research in the verification area is necessary. A verifier operating on original J2SE class files without any additional information would be welcomed by micro-device vendors, provided it consumed acceptable time and memory.
This document presents a verification approach based on the Constraint Propagation technique. The next section gives an introduction to constraints. Then we go deeper into the J2SE and J2ME CLDC verification approaches. After that the new verification approach is presented. Finally, tables comparing the approaches are given.
Constraint Programming is a programming paradigm where a set of constraints that the solution must meet is specified, rather than the steps to obtain the solution. The solution to be found is a set of variables; each variable has a set of possible values known as its domain [7]. Domains might be represented either by explicit enumeration of all possible values or, for example, by numeric intervals containing all possible values [2].
Since exact variable values are unknown, the techniques used to find a solution normally work with various domain representations rather than with definite values.
Constraint Propagation is one such technique [3, 5, 8]; it uses constraints to remove from the domains those values that provably do not belong to any solution.
For example, if there is a sub-definite variable x whose domain is represented by the interval [-20, 30], then the result of propagating the constraint x ≥ 0 is a narrowing of the domain to [0, 30].
Constraint Propagation can also be used to identify whether a set of constraints is self-contradictory. Consider the previous example with two constraints, x ≥ 0 and x ≤ -10. Propagation narrows the domain of x first to [0, 30] and then to an empty interval. Since the set of possible values of x contains no value at all, no solution that meets all the constraints exists.
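A minimal, self-contained sketch of this interval-domain propagation (the class and method names are illustrative, not taken from any real solver):

```java
// A sub-definite integer variable with an interval domain [lo, hi].
final class IntervalVar {
    int lo, hi;

    IntervalVar(int lo, int hi) { this.lo = lo; this.hi = hi; }

    void propagateGreaterOrEqual(int c) { lo = Math.max(lo, c); }  // x >= c
    void propagateLessOrEqual(int c)    { hi = Math.min(hi, c); }  // x <= c

    boolean isEmpty() { return lo > hi; }   // empty domain: no solution

    public static void main(String[] args) {
        IntervalVar x = new IntervalVar(-20, 30);
        x.propagateGreaterOrEqual(0);       // domain narrows to [0, 30]
        x.propagateLessOrEqual(-10);        // domain narrows to an empty interval
        System.out.println(x.isEmpty());    // true: the constraint set is self-contradictory
    }
}
```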
The power of Constraint Propagation is in its ability to validate constraints over unknown entities, i.e. variables whose values are undefined or only partially defined.
Traditional J2SE class file verification requires a complex dataflow analysis algorithm, effectively implementing a theorem prover for type safety. This dataflow analysis is computation intensive.
At the beginning of the dataflow pass, the contents of the operand stack and local variable array are recorded for every instruction of the Java method [10]. Also, every instruction has a "changed" bit associated with it. Initially this bit is cleared for every instruction.
Then, for the first instruction of the method, those contents are initialized with the method's arguments and its "changed" bit is set. After that the following loop, which is the standard algorithm of the JVM specification [10], is run:

- select an instruction whose "changed" bit is set; if no such instruction remains, the loop ends;
- clear the "changed" bit and model the effect of the instruction on the recorded operand stack and local variable types;
- merge the resulting types into the types recorded for each successor instruction; if the merge changes a successor's recorded types, set its "changed" bit.

The verification ends when no instruction with a "changed" bit set exists. The outcome of this approach is:

- the types must be recorded for every instruction of the method, which results in a large memory footprint;
- a single instruction may be modeled multiple times, because merges at its predecessors can change the recorded types.
One optimization of the algorithm above: the types are stored for some key points only, e.g. branch targets, rather than for every instruction of the method. The types for the remaining instructions are recalculated on the fly. This optimization significantly reduces the memory footprint but increases the number of "modeling" passes through a single instruction.
These characteristics of the J2SE verifier make it unusable for limited-resource devices. The next section briefly describes one verification approach for such devices, the J2ME CLDC verification approach.
In the J2ME world, due to space and performance constraints, class file verification has been broken up into two parts:

- an off-device pre-verifier that performs the expensive analysis and records the resulting stackmap information within the class file;
- a simplified in-device verifier that only validates this information.
The stackmap information is the contents of the operand stack and local variable array (similar to what is generated in J2SE verification) recorded for all instructions that can be jumped to, i.e. jump targets and exception handlers.
J2ME verification looks like the following [9]:
At the beginning of the dataflow pass, an array for the contents of the operand stack and local variables of a single instruction is created. This array is used to store the derived types as the verifier goes through the instructions. The derived types are initialized with the method's arguments. Then the following loop, iterating one by one from the first till the last instruction of the method, is run:

- if the current instruction has a stackmap recorded for it, check that the derived types are assignable to the recorded ones and replace the derived types with the recorded ones;
- model the instruction's behavior, checking that the derived types are acceptable to the instruction and updating them accordingly;
- for every branch target of the instruction, check that the derived types are assignable to the types recorded in the target's stackmap.

Finally, if the loop passed through all the instructions without any matching failure (i.e. the derived types are assignable to the recorded types and acceptable to the instructions), the class is valid. The outcome of this approach is:

- each instruction is passed only once, and in-device memory is needed only for a single array of derived types;
- however, the class file must carry the stackmaps generated by the off-device pre-verifier.
In the J2ME verification approach described above, if a different pre-verifier generates a different proof for the same class and the in-device verifier successfully validates it, then the class is valid. So the important fact affecting class validity is that a valid proof exists; the proof itself does not matter. A valid proof is a proof that meets all the constraints derived from the byte code. So the verification task can be reformulated as follows: identify whether the set of constraints implied by the byte code is self-contradictory. Constraint Propagation is among the best techniques for this kind of task.
This verification consists of two passes. The first is a parsing pass that makes sure all instructions have valid opcodes and that no instruction jumps into the middle of another instruction or outside the method code. In this step all jump targets are found. Normally this step takes 4-6% of the overall verification time.
After the first pass, structures similar to the J2ME stackmap arrays are created for instructions that are jump targets or starts of exception handlers. The elements of those stackmap arrays are sub-definite values.
The second pass is a J2ME-like dataflow analysis pass. It performs all checks described in the CLDC specification with one exception: to match against a sub-definite value, a special dataflow constraint is created and recorded. During the dataflow pass a Constraint Satisfaction Problem is built from the sub-definite variables and dataflow constraints. This problem is solved using the Constraint Propagation technique. If at the end of the pass the problem is consistent, verification passes; otherwise it fails.
So the key idea of the proposed approach is: instead of validating the byte code against one concrete set of recorded types, treat the recorded types as sub-definite variables, derive constraints on them from the byte code, and use Constraint Propagation to establish whether any valid assignment (i.e. any valid proof) exists at all.
The proposed approach requires one pass through the instruction set to identify "stackmap points" and one pass modeling the instructions' behavior. Unlike the J2SE verifier, the CP verifier does not have to hold types for each instruction, but it has to store the constraints and operate with sub-definite variables, each twice the size of a regular verification type.
In what follows, let I[n] denote the instruction set of the given method.
Let us set up a partial order on the set of Java verification types, extended by some artificial types. If A and B are Java types, then A ≤ B if and only if A is assignable to B without an explicit cast. Examples of artificial types are: MIN, the minimal element of the set; MAX, the maximal element; ONE_WORD, such that any type except long and double is assignable to it; etc.
The verifier operates with two kinds of vectors: WorkMap and StackMap. Both kinds of vectors have a size equal to the sum of the method's maximum number of local variables and its maximal stack depth.
Each element of a WorkMap vector is either a constant of some verification type or a reference to a variable. The WorkMap vector corresponds to the "derived types" in terms of the J2ME verification approach.
Each element of a StackMap is a variable. Each variable is a structure containing two fields, "lower bound" and "upper bound". These fields represent the sub-definite value of the variable. The StackMap vector corresponds to the "recorded types" in terms of the J2ME verification approach.
Also, each variable has a set of "targets" associated with it. If V is the given variable, its set of "targets" contains all variables V' such that the constraint V ≤ V' exists in the system.
Both StackMaps and WorkMaps have attributes like "stack depth" that serve the same purpose as the corresponding CLDC attributes and will not be described here.
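A rough, compilable sketch of these structures, using java.lang.Class as a stand-in for verification types (null plays the role of the artificial MIN/MAX bounds); the names are illustrative and not taken from the HARMONY-3363 sources:

```java
import java.util.HashSet;
import java.util.Set;

// A StackMap element: a sub-definite value with two bounds and a target set.
final class Var {
    Class<?> lowerBound;                      // null plays the role of MIN
    Class<?> upperBound;                      // null plays the role of MAX
    final Set<Var> targets = new HashSet<>(); // all V' with a constraint V <= V'
}

// A WorkMap element: either a constant verification type or a reference
// to a variable from some StackMap.
final class WorkMapElem {
    Class<?> constantType;                    // set for a constant type...
    Var ref;                                  // ...or a pointer to a Var
}

final class WorkMap  {                        // "derived types" in CLDC terms
    WorkMapElem[] slots;                      // length = max_locals + max_stack
    int stackDepth;
}

final class StackMap {                        // "recorded types" in CLDC terms
    Var[] slots;
    int stackDepth;
}
```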
Now let's see how the terminology used in the CP verifier relates to the one used in the Java world.
Since A ≤ B means that A is assignable to B, more general types are greater than less general ones, e.g. java.lang.String ≤ java.lang.Object.
If A and B are types, then max {A, B} in terms of Java is the least general type T such that both A and B are assignable to T.
For example,
max {java.lang.Exception, java.io.IOException} = java.lang.Exception
max {java.lang.Exception, java.lang.Error} = java.lang.Throwable
max {int, float} = ONE_WORD
If A and B are types, then min {A, B} in terms of Java is the most general type assignable to both A and B.
For example,
min {java.lang.Exception, java.io.IOException} = java.io.IOException
min {java.lang.Exception, java.lang.Error} = NULL
min {int, float} = MIN
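As a concrete illustration, here is a small sketch of ≤ and max {A, B} for class types only, again using java.lang.Class as the type representation; interfaces and the artificial types are ignored for brevity:

```java
final class TypeLattice {
    // A <= B iff A is assignable to B without an explicit cast.
    static boolean lessOrEqual(Class<?> a, Class<?> b) {
        return b.isAssignableFrom(a);
    }

    // max {A, B}: walk up from A until B becomes assignable; terminates
    // at java.lang.Object at the latest.
    static Class<?> max(Class<?> a, Class<?> b) {
        Class<?> t = a;
        while (!t.isAssignableFrom(b)) {
            t = t.getSuperclass();
        }
        return t;
    }

    public static void main(String[] args) {
        // prints "class java.lang.Throwable"
        System.out.println(max(Exception.class, Error.class));
    }
}
```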
Verification starts with the parsing pass. The main goal of this pass is to identify the instructions of the following types:

- targets of branch instructions: an if*, a goto, or a switch;
- starts of exception handlers.
The implementation of the parsing pass is straightforward and will not be described here.
After the first pass, a StackMap vector is created for every instruction that has multiple predecessors. The StackMap vector elements are initialized: at this point their sets of targets are empty, their lower bounds are initialized with the value MIN, and their upper bounds with the value MAX.
Then the StackMap vectors corresponding to starts of exception handlers are initialized with a stack depth equal to 1 and a constant value on the stack that corresponds to the exception type of that handler.
Then the verifier creates a single WorkMap vector whose elements are initialized with constant values representing the method arguments. The remaining WorkMap elements (if any) are initialized with MAX.
A dataflow stack is created and the first instruction of the method is pushed onto it. The dataflow pass consists of a series of loops. Each loop begins with an instruction popped from the dataflow stack and iterates until it explicitly breaks. At every loop iteration the following steps are made:

- if the current instruction has a StackMap and has already been passed, the WorkMap is matched against the StackMap and the loop breaks; if it has a StackMap and has not been passed yet, the WorkMap is matched against the StackMap and then reinitialized from it, its elements becoming references to the StackMap variables;
- the instruction's behavior is modeled on the WorkMap; where the instruction expects a certain type and the WorkMap element is a reference to a variable, the variable's upper bound is updated (to the min of the old bound and the expected type) and checked against the lower bound;
- the successor instructions are determined; branch targets are pushed onto the dataflow stack (together with a copy of the current WorkMap where needed), and the loop continues with the fall-through successor; if there is no successor (e.g. return), the loop breaks.

Finally, if the loops have passed through all the instructions and for each variable its lower bound is assignable to its upper bound, the class is successfully verified. The outcome of this approach is:

- each instruction's behavior is modeled only once;
- no pre-verifier and no information beyond the original class file is needed;
- memory is needed for one WorkMap, for the StackMaps of merge points, and for the recorded constraints.
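Building on the sketch structures introduced earlier, the matching step from the loop above might look roughly as follows (TypeLattice is the earlier lattice sketch; all names are hypothetical, not from HARMONY-3363):

```java
final class Matcher {
    // Match a WorkMap against a StackMap: record constraints for variable
    // slots, raise lower bounds for constant slots, and report a
    // contradiction (an "empty domain") as a verification failure.
    static boolean match(WorkMap wm, StackMap sm) {
        if (wm.stackDepth != sm.stackDepth) return false;
        for (int i = 0; i < sm.slots.length; i++) {
            Var v = sm.slots[i];
            WorkMapElem e = wm.slots[i];
            if (e.ref != null) {
                e.ref.targets.add(v);          // record the constraint e.ref <= v
            } else if (v.lowerBound == null) {
                v.lowerBound = e.constantType; // first merge: the bound was MIN
            } else {
                v.lowerBound = TypeLattice.max(v.lowerBound, e.constantType);
            }
            // the lower bound must stay assignable to the upper bound
            if (v.lowerBound != null && v.upperBound != null
                    && !TypeLattice.lessOrEqual(v.lowerBound, v.upperBound)) {
                return false;
            }
        }
        return true;
    }
}
```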
In reality it is not necessary to have a StackMap for all the instructions for which the CLDC specification requires one. Only instructions that have multiple predecessors require StackMaps. For example, if an instruction follows an unconditional jump such as goto or switch, and just a single other instruction jumps to it, then the instruction has exactly one predecessor and thus does not need a stackmap in the CP approach.
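A hedged sketch of how the parsing pass might decide which instructions get a StackMap under this refinement (the arrays and names are hypothetical):

```java
// predecessorCount[pc] is the number of control-flow predecessors of the
// instruction at pc, computed during the parsing pass; handlerStarts lists
// the first instructions of exception handlers.
static boolean[] needsStackMap(int[] predecessorCount, int[] handlerStarts) {
    boolean[] need = new boolean[predecessorCount.length];
    for (int pc = 0; pc < predecessorCount.length; pc++) {
        need[pc] = predecessorCount[pc] > 1;  // only merge points need one
    }
    for (int pc : handlerStarts) {
        need[pc] = true;                      // handler entries always get one
    }
    return need;
}
```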
A special type of constraint is designed to process aaload instructions: aaload(V1, V2) means that V1 is an array whose element type is assignable to V2.
If the dataflow pass hits a jsr instruction, the dataflow pass is suspended. A special mark (followed by the jsr target) is pushed onto the dataflow stack, and until that mark is popped from the stack we are verifying the subroutine. Once we are done with the subroutine, the dataflow pass is resumed: we now know how this specific subroutine modifies the WorkMap, so each call to this subroutine is processed as a regular instruction.
As discussed in the previous section, the bounds of variables are calculated as the min or max of some types (most likely Java verification types). For example (see above), max {java.lang.Exception, java.lang.Error} = java.lang.Throwable. The representation of variable bounds might be implementation-specific. An implementation that targets memory footprint reduction will likely calculate exact bounds and represent each as a single Java verification type (java.lang.Throwable in our case).
The HARMONY-3363 implementation does it in a slightly different way: in some cases the referenced classes might not be loaded at the moment of verification, so the lower bound in our case is represented as the list {java.lang.Exception, java.lang.Error}. This is a performance-oriented implementation.
The minimal element (MIN) of the set of verification types corresponds to SM_TOP in the HARMONY-3363 implementation (it is basically used in assert() calls). The maximal element MAX corresponds to the SM_BOGUS constant.
Let's look at how the different approaches verify the following example method:
```java
static void test() {
    for (Element e = new MyElement(); e != null; e = e.next()) {
        ... // linear code not modifying e
    }
}
```
Here MyElement extends Element, and e.next() returns Element.
This method compiles into byte code that can be logically split into the following blocks (see also the disassembled code below):

1. Creating a new MyElement and invoking its constructor. The reference to the MyElement is stored in local variable #0.
2. aload_0, loading local variable #0 onto the stack, and ifnull, jumping to the end of the method in case of null.
3. The linear code inside the loop (it does not modify e).
4. Invoking the Element.next() method for the object in local #0. The block consists of three instructions: aload_0, loading a reference; invokevirtual, the invocation of next(); and astore_0, which stores the method's result at local #0. Note that the invokevirtual instruction expects Element to be on top of the stack.
5. A goto instruction jumping to the beginning of the second block.
6. A return instruction.

```
------ block 1 -------
      new #2; //class MyElement
      dup
      invokespecial MyElement."<init>":()V
      astore_0
------ block 2 -------
L2:   aload_0
      ifnull L6
------ block 3 -------
      // code inside loop
------ block 4 -------
      aload_0
      invokevirtual Method Element.next:()LElement;
      astore_0
------ block 5 -------
      goto L2
------ block 6 -------
L6:   return
```
The method has one local variable and a maximal stack depth of 2. Now let's look at how the different verifiers handle this method.
Verification starts with the first instruction of the method. Then it goes through the first block. The first block does not have any branching, so the instructions are passed consecutively one by one; every instruction sets the "changed" bit of the next instruction. After the last instruction of the first block, local variable #0 contains the MyElement type, and the first instruction of the second block has its "changed" bit set.
Then the second block is processed. To compare local #0 to null, the instructions aload_0 and ifnull are used. aload_0 loads a reference from local #0 onto the stack; it expects a reference to be there. At this point we have MyElement there, so everything is OK so far. After the aload_0 instruction, MyElement is on top of the operand stack.

The ifnull instruction pops a reference from the stack and either makes or does not make a jump. It expects a reference on top of the stack; so far it is MyElement there, which is OK. ifnull is the last instruction of the second block; it sets the "changed" bit for the first instructions of both the third and the sixth blocks.
The verifier may choose either of them. Let's first go to the third block, which is linear. The verifier goes through it; it does not modify local #0, since the original Java code does not modify the e variable. At the end of this block local #0 still holds MyElement.
Then the verifier goes to block number four, where e.next() is invoked. First the aload_0 instruction loads a reference from local #0 onto the stack, and then the invokevirtual instruction calls the next() method, expecting that the reference on top of the stack is of the Element type. Since MyElement is a subclass of Element, this is OK. The return type of the next() method is Element; it is stored in the e variable, that is, local #0.

Now local #0 holds Element, and the verifier goes to block 5, consisting of a goto to block 2.
The states of the types recorded for the various instructions are shown below.
```
------ block 2 -------
L2:   locals: [MyElement]; stack: []
      aload_0
      locals: [MyElement]; stack: [MyElement]
      ifnull L6
------ block 3 -------
      locals: [MyElement]; stack: []
      // Some linear code
------ block 4 -------
      locals: [MyElement]; stack: []
      aload_0
      locals: [MyElement]; stack: [MyElement]
      invokevirtual Element.next:()LElement;
      locals: [MyElement]; stack: [Element]
      astore_0
------ block 5 -------
      locals: [Element]; stack: []
      goto L2    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< we are here
------ block 6 -------
L6:   locals: [MyElement]; stack: []
      return
```
The current types are merged into the types recorded for the first instruction of the second block. The current local #0, which is Element, is merged with the recorded local #0, which is MyElement. The result is Element. It is the more general type, so local #0 is changed and the "changed" bit is set for the first instruction of the second block:
```
------ block 2 -------
L2:   locals: [Element]; stack: []
      aload_0
```
The verifier comes back to block number two and passes blocks 2, 3, and 4 once again: for each instruction, the MyElement recorded in local #0 is changed to the more general Element.

Then it comes to astore_0, the last instruction of the fourth block. It stores Element from the top of the stack to local #0. Thus the stack is now empty and local #0 contains Element. This type state is identical to the one from the previous pass, so the "changed" bit for the next instruction (the goto from block 5) remains cleared.
The only remaining instruction with the "changed" bit set is the first instruction of the last block, return, which has no successor instructions. Verification finishes successfully.
The J2SE verifier passed through instruction blocks 2, 3, and part of 4 twice. In the general case it may pass through a single instruction multiple times. Note that block 3 is the body of the loop in the original Java method and might be very long.
Now let's see how CLDC handles it.
The bytecode first goes to a pre-verifier. After the pre-verification step StackMaps are recorded for the first instructions of both the second and sixth blocks. Those instructions have the same StackMaps recorded for them:
```
locals: [Element]; stack: []
```
The in-device verifier allocates an array of length 3 for storing the derived types.
The verifier starts with the first instruction of the first block. Initially the stack is empty and local #0 contains the UNUSABLE type. The verifier then consecutively passes the instructions of the first block, modeling each instruction's behavior to modify the derived vector appropriately. At the end of the first block it has an empty stack and MyElement in local #0. The successor of the last instruction of the first block is the first instruction of the second block. That instruction has a stackmap recorded for it, so the verifier does matching: it makes sure that the MyElement in local #0 of the derived types is assignable to the Element in local #0 of the recorded stackmap.
```
      derived:  locals: [MyElement]; stack: []   <<<<<<<<<<<<<<<<<<<<<<<<<<
------ block 2 -------
L2:   recorded: locals: [Element]; stack: []
      aload_0
      ifnull L6
```
Then it goes to the first instruction of the second block. Since a stackmap is recorded for that instruction, the stackmap is copied into the derived types. So the derived types now have Element as local #0 and an empty stack.
```
------ block 2 -------
L2:   recorded: locals: [Element]; stack: []
      derived:  locals: [Element]; stack: []   <<<<<<<<<<<<<<<<<<<<<<<
      aload_0
      ifnull L6
```
The verifier goes through the second block. Once it reaches ifnull, the derived types are matched against the types recorded for the first instruction of block 6.
Then the verifier goes through block 3. In the fourth block the invokevirtual instruction returns Element, which is then stored in local #0. The derived types now have Element in local #0 and an empty stack. In block 5 the derived types are matched against the ones recorded for the first instruction of the second block; the matching succeeds.
Finally the verifier goes to block 6, copies the recorded stackmap into the derived types, and validates the return instruction. Verification is done.
The verifier used two stackmaps stored in advance in the class file and allocated memory for an array of length three for the derived types. Each instruction was passed only once.
Now let's look at how the CP verifier does it.
The first (parsing) pass of the CP verifier identifies that the first instruction of the second block has multiple predecessors. The first instruction of the sixth block has a single predecessor, which is ifnull.
The first instruction of the method is pushed onto the dataflow stack. A WorkMap vector of length three is allocated and initialized. Initially the stack is empty and local #0 contains MAX (the SM_BOGUS constant in HARMONY-3363).
The second pass starts.
The first dataflow loop pops the first instruction of the method from the dataflow stack and starts there. It passes through the first block the same way the CLDC verifier did. At the end of the block the WorkMap holds MyElement in the local #0 position and an empty stack.
The successor instruction, which is the first instruction of the second block, must have a StackMap. Space for the StackMap is allocated at first use. Since the stack in the current WorkMap is empty, we allocate space for just a single StackMap element. Let this element be Var1; it corresponds to local #0. Var1 is a structure consisting of lower and upper bounds, originally set to MIN and MAX respectively.
```
      WorkMap: locals: [MyElement]; stack: []   <<<<<<<<<<<<<<<<<<<<<<<
------ block 2 -------
L2:   StackMap: locals: [Var1:(MIN, MAX)]; stack: []
      aload_0
      ifnull L6
```
Now the WorkMap is "matched" against the StackMap. As a result, the lower bound of Var1 is set to MyElement.
Then the verifier goes to the second block. Since its first instruction has a StackMap, the WorkMap is reinitialized: its stack is still empty, but local #0 is now a pointer to Var1.
```
------ block 2 -------
L2:   StackMap: locals: [Var1:(MyElement, MAX)]; stack: []
      WorkMap:  locals: [@Var1]; stack: []   <<<<<<<<<<<<<<<<<<<<<<<<<<<
      aload_0
      ifnull L6
```
The second block starts with the instruction aload_0. It expects a reference (i.e. at least java.lang.Object) in local #0. Since the WorkMap's local #0 points to Var1, the upper bound of Var1 is now set to Object. The lower bound, MyElement, is assignable to the upper one, Object, so everything is OK for now.
The aload_0 instruction pushes the contents of local variable #0 onto the operand stack. The verifier models the instruction's behavior, so the WorkMap now has a pointer to Var1 in the local #0 position and a stack of depth 1 containing another pointer to the same Var1.
```
------ block 2 -------
L2:   StackMap: locals: [Var1:(MyElement, Object)]; stack: []
      aload_0
      WorkMap:  locals: [@Var1]; stack: [@Var1]   <<<<<<<<<<<<<<<
      ifnull L6
```
Then the ifnull instruction expects a reference on top of the stack. The top of the stack contains a pointer to Var1, whose upper bound is already Object, so nothing changes. The verifier pops the element from the operand stack and determines the successor instructions. They are the next instruction (the first instruction of the third block) and the first instruction of the sixth block.
The first instruction of the sixth block is pushed onto the dataflow stack, and a copy of the current WorkMap is recorded for that instruction.
```
------ block 3 -------
      WorkMap: locals: [@Var1]; stack: []   <<<<<<<<<<<<<<<<<<<<<<<<<<<
      // Some linear code
< skipped >
------ block 6 -------
L6:   WorkMap copy: locals: [@Var1]; stack: []
      return
```
The loop continues with the first instruction of the third block, and so on.
The verifier passes block 3; in the fourth block, after aload_0, it reaches the invokevirtual instruction. Before invokevirtual the WorkMap has pointers to Var1 in the local #0 position and on top of the operand stack.
invokevirtual calls the next() method of the Element class, so it expects the Element type on top of the operand stack. The top of the operand stack points to Var1, so the upper bound of Var1 is set to min {Object, Element}, which is Element. The verifier makes sure that the lower bound, MyElement, is assignable to the upper one, Element. Everything is OK for now. The invoked method returns a reference to Element.
```
      StackMap: locals: [Var1:(MyElement, Element)]; stack: []
< skipped >
------ block 4 -------
      aload_0
      invokevirtual Element.next:()LElement;
      WorkMap: locals: [@Var1]; stack: [Element]   <<<<<<<<<<<<<<<<<<
```
The reference to Element is stored at local #0. The WorkMap now contains Element in the local #0 position and has an empty stack.
```
      astore_0
      WorkMap: locals: [Element]; stack: []   <<<<<<<<<<<<<<<<<<<
```
The loop proceeds with block 5, consisting of a goto instruction. The successor instruction is identified: it is the first instruction of the second block, and it has a StackMap.
The WorkMap is matched against the StackMap: the lower bound of Var1 is set to max {MyElement, Element}, which is Element.
```
------ block 2 -------
L2:   StackMap: locals: [Var1:(Element, Element)]; stack: []
< skipped >
------ block 5 -------
      goto L2
      WorkMap: locals: [Element]; stack: []   <<<<<<<<<<<<<<<<<
```
The verifier makes sure the new lower bound is assignable to the upper bound. It is OK.
The loop proceeds from the goto target, which is the first instruction of the second block. But this instruction was already passed, so the loop breaks.
The next loop starts from the first instruction of the sixth block, which was pushed onto the dataflow stack by the first loop. That instruction is return, so the method ends and the verification is successful.
The verifier allocated memory for one WorkMap vector of three elements, one two-field structure for a sub-definite variable, and one one-element copy of the WorkMap. Each instruction's behavior was modeled only once.
This section compares the CP verification approach to the J2SE and J2ME CLDC verification approaches on test classes.
To measure J2SE performance, the following implementations were used: BEA 1.5, Harmony r517188, and a modified Harmony r517188 where the regular verifier was substituted with the CP one taken from HARMONY-3363.
Since no CLDC implementation supporting the -noverify option was available to the author, performance comparisons to CLDC were not performed.
To compare performance, ~7500 Harmony classes were verified. Each class was loaded by a custom classloader (because of that, all the implementations failed to load some of the classes; those classes were removed from the list). The code used to load the classes is given below.
```java
long total = System.currentTimeMillis();
for (int i = 0; i < cnt; i++) {
    cl = kl.defineKlass(clsNames[i], clsBytes[i]);
    cl.getConstructors();
}
total = System.currentTimeMillis() - total;
```
The verification time for the remaining classes, in milliseconds, is presented in the table below.
Measurement | BEA 1.5 | Harmony r517188 | CP (Harmony r517188 & H-3363 patch)
---|---|---|---
-Xverify | 2734 | 2344 | 1734
-noverify | 813 | 1297 | 1297
verification time | 1921 | 1047 | 437
To compare memory consumption, the same classes were used as for the performance comparison. Memory consumption, like performance, depends on the actual implementation and not only on the approach behind it. To roughly estimate how each approach affects memory consumption, the following measurements were made.
For the J2SE approach, the number of types according to the maximal local variables count and the actual stack depth at each instruction was calculated, following the J2SE specification.
For the CLDC approach, the number of all recorded and derived types was calculated. That includes all the types recorded in the stackmap attribute of the Java method and the maximum possible number of derived types, i.e. the sum of max_locals and max_stack. The CLDC specification does not count the recorded types when discussing memory consumption, but that is not quite fair: if a class file is loaded and the recorded types are part of the class file, then all the recorded types must somehow be stored in memory (although one can argue that recorded types might be placed in read-only memory).
For the CP approach, the number of upper and lower bounds of all variables, the number of WorkMap elements, and the number of constraints were calculated. The memory footprint was estimated for a memory-optimized implementation of the CP approach.
The table below also contains an estimation of the average memory consumption in bytes for Java methods of various sizes. For the estimation, the size of a verification type in each approach and the size of a constraint in the CP approach were taken to be four bytes. In reality a verification type might take from one to three bytes and, depending on the method code length, a constraint might in theory take from two up to eight bytes.
Method length | J2SE classic | CLDC | CP memory optimized
---|---|---|---
1-10 instructions | 51 bytes | 15 bytes | 16 bytes
11-100 instructions | 684 bytes | 76 bytes | 121 bytes
101-1000 instructions | 9K | 0.7K | 1.5K
1001-10000 instructions | 46K | 462 bytes | 586 bytes
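As a rough cross-check against the example of the previous section: there the CP verifier allocated one three-element WorkMap, one variable with two bounds, and a one-element WorkMap copy, which under the four-bytes-per-type assumption amounts to 3×4 + 2×4 + 1×4 = 24 bytes.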
By chance, large methods required less memory in the CP and CLDC approaches. The reason is that the large methods in the tested classes had less branching and more linear code. The actual memory consumption of a verifier is defined by the amount of space required to verify the heaviest method, so the table below shows the maximal memory consumption of each approach on the tested methods. Note that different approaches reached their maximum consumption on different methods.
J2SE unoptimized | CLDC | CP
---|---|---
721K | 17K | 28K
It is worth noting that both the J2SE and CP verifiers need that memory only at the moment of the method's verification, while in the CLDC case the recorded types are stored in the class file. So if a CLDC device loads an application that contains, for example, 3 Java methods, the amount of memory required for verification would be multiplied by ~3.
One more thing to note: if a method consists of a short try block, several catch statements, and long finally code, then the CLDC pre-verifier inlines subroutines, thus multiplying the code length. The CLDC in-device verifier will go through each inlined copy of the subroutine code. Unlike that, the CP verifier will go through the subroutine only once.
This document presents a Java class file verification approach based on the Constraint Propagation technique. The approach does not need any special Java compiler or any other preliminary work on the class file; it verifies regular Java applications as they are.
The performance comparison demonstrated that the CP verifier outperforms its J2SE 1.5 counterparts severalfold. Like much other software, an implementation of the CP verifier faces a performance/memory consumption tradeoff: a performance-targeted implementation would be useful in Java application servers, while a memory-targeted one would benefit CLDC devices.
The memory footprint estimation demonstrated that the CP-based verification approach requires an amount of memory comparable to the CLDC approach.
An implementation of the CP verifier is available in Harmony JIRA HARMONY-3363 [4].
[1] Apache Harmony. http://harmony.apache.org
[2] Benhamou F. Interval constraint logic programming // Constraint Programming: Basics and Trends / Ed. by A. Podelski. -- Berlin a.o.: Springer Verlag, 1995. -- P. 1--21. -- (Lect. Notes Comput. Sci.; Vol 910).
[3] Cleary J. Logical Arithmetic // Future Comput. Syst. -- 1987. -- Vol. 2, N. 2. -- P. 125--149.
[4] Contribution of alternative bytecode verifier. http://issues.apache.org/jira/browse/HARMONY-3363
[5] Davis E. Constraint propagation with interval labels // Artificial Intelligence. -- 1987 -- Vol. 32, N. 3. -- P. 281--331.
[6] Govindavajhala S., Appel A. W. Using Memory Errors to Attack a Virtual Machine // Proc. of the 2003 IEEE Symposium on Security and Privacy. -- 2003. -- P. 154--165.
[7] Huffman D.A. Impossible objects as nonsense sentences // Machine Intelligence. -- 1971. -- Vol. 6. -- P. 295--323.
[8] Kumar V. Algorithms for constraint-satisfaction problems: a survey // AI. -- 1992. -- Vol. 13, N. 1. -- P. 32--44.
[9] Sun Microsystems. Connected Limited Device Configuration // Available at http://www.jcp.org/en/jsr/detail?id=139
[10] Sun Microsystems. The Java Virtual Machine Specification // Available at http://java.sun.com/j2se/1.5.0/docs/guide/vm/index.html
[*] Other brands and names are the property of their respective owners.