Greets,
In order to present a useful public C API for KS, we need to make
method calls available -- not just functions. But in KS, inheritance
is implemented using vtables -- structs with function pointer members
-- and once those vtables are part of the public API, you can't
change the vtable struct layout without wrecking binary
compatibility. Here is an excellent explanation of the problem:
http://www.usenix.org/events/javavm02/yu/yu_html/node5.html
Freezing the vtables would severely cramp our ability to develop KS.
However, if we are unable to guarantee binary compatibility, outside
developers will be severely limited in their ability to extend KS
from C, so I've been looking for a way around this problem for a
while... Happily, I think I've found one:
"Supporting Binary Compatibility with Static Compilation"[1]
Dachuan Yu, Zhong Shao, and Valery Trifonov
<http://www.usenix.org/events/javavm02/yu/yu_html/index.html>
Right now, KS virtual method invocations look something like this:
object->vtable->method_name(object)
Here's the actual pound-define for KinoSearch::Index::Term's
destroy() method, which overrides a method inherited from
KinoSearch::Util::Obj:
#define Kino_Term_Destroy(self) \
(self)->_->destroy((kino_Obj*)self)
self->_ is the vtable; "destroy" is a member.
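For illustration, here is a minimal sketch of struct-style dispatch. Every name in it (Obj, VTable, hash_code, demo_struct_dispatch) is a hypothetical stand-in rather than the real KS headers. It shows that the position of "destroy" within the struct is a compile-time constant, so it ends up baked into any caller compiled against the header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real KS definitions. */
typedef struct Obj Obj;
typedef void (*destroy_t)(Obj *self);
typedef int  (*hash_code_t)(Obj *self);

/* Struct-style vtable: each method is a named member. */
typedef struct VTable {
    hash_code_t hash_code;
    destroy_t   destroy;
} VTable;

struct Obj {
    const VTable *_;   /* vtable pointer, named "_" as in the macro above */
    int destroyed;     /* flag set by the destroy method, for the demo */
};

static void Obj_destroy(Obj *self)   { self->destroyed = 1; }
static int  Obj_hash_code(Obj *self) { (void)self; return 42; }

static const VTable OBJ_VTABLE = { Obj_hash_code, Obj_destroy };

/* Same shape as Kino_Term_Destroy: two dereferences, then a call. */
#define Obj_Destroy(self) (self)->_->destroy((Obj*)(self))

/* Byte offset of "destroy" inside the struct. The compiler fixes this
 * value when the caller is built, not when the library is loaded. */
static size_t destroy_slot_offset(void) {
    return offsetof(VTable, destroy);
}

/* Dispatches through the macro and reports whether destroy ran. */
static int demo_struct_dispatch(void) {
    Obj obj = { &OBJ_VTABLE, 0 };
    Obj_Destroy(&obj);
    assert(obj.destroyed == 1);
    return obj.destroyed;
}
```

If a later release added a member ahead of "destroy", destroy_slot_offset() would change, but already-compiled callers would keep using the old value.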
Under the indirect dispatch system, the vtable becomes an array of
function pointers rather than a struct with function pointer members,
and method invocation changes to something like this:
object->vtable[offset](object)
Here's how an actual pound-define might look:
#define Kino_Term_Destroy(self) \
((kino_Obj_destroy_t)(self)->_[kino_Term_destroy_OFFSET]) \
((kino_Obj*)self)
What this allows us to do is define the vtable layout and the offsets
dynamically during a bootstrap operation. The payoff is that a
method macro so defined retains binary compatibility even as the
composition of the vtable changes with subsequent releases.
Stated another way: if we make the layout of the current vtables part
of the public API, externally compiled code will assume that a method
like "destroy" is located at a fixed location in the vtable -- and if
the layout of the vtable changes, the externally compiled code will
jump into the wrong method. (BAD!) However, if we make that offset a
variable and set it at runtime, the externally compiled code will
always find the correct method to jump into.
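The idea can be sketched in a few lines of C. This is only a sketch under stated assumptions: all names are hypothetical, the two "releases" are simulated by a flag passed to a made-up bootstrap() function, and the slots use a generic function-pointer type so that the example stays within ISO C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not the real KS headers. */
typedef struct Obj Obj;
typedef void (*method_t)(void);          /* generic slot type (assumption) */
typedef void (*destroy_t)(Obj *self);
typedef int  (*hash_code_t)(Obj *self);

struct Obj {
    method_t *_;       /* the vtable is now a plain array of slots */
    int destroyed;
};

/* Offsets are variables, set at bootstrap rather than at compile time.
 * In a real library these would be exported symbols. */
static size_t Obj_destroy_OFFSET;
static size_t Obj_hash_code_OFFSET;

static void Obj_destroy(Obj *self)   { self->destroyed = 1; }
static int  Obj_hash_code(Obj *self) { (void)self; return 42; }

static method_t OBJ_VTABLE[3];

/* Simulates two library releases that lay out the vtable differently. */
static void bootstrap(int release) {
    size_t i;
    for (i = 0; i < 3; i++) { OBJ_VTABLE[i] = NULL; }
    if (release == 1) {
        Obj_hash_code_OFFSET = 0;
        Obj_destroy_OFFSET   = 1;
    } else {
        /* A later release reserves slot 0 for some new method. */
        Obj_hash_code_OFFSET = 1;
        Obj_destroy_OFFSET   = 2;
    }
    OBJ_VTABLE[Obj_hash_code_OFFSET] = (method_t)Obj_hash_code;
    OBJ_VTABLE[Obj_destroy_OFFSET]   = (method_t)Obj_destroy;
}

/* "Externally compiled" call sites: identical text for both releases. */
#define Obj_Destroy(self) \
    ((destroy_t)(self)->_[Obj_destroy_OFFSET])((Obj*)(self))
#define Obj_Hash_Code(self) \
    ((hash_code_t)(self)->_[Obj_hash_code_OFFSET])((Obj*)(self))

/* Bootstraps the given layout, then dispatches through the macros. */
static int call_through_release(int release) {
    Obj obj = { OBJ_VTABLE, 0 };
    bootstrap(release);
    Obj_Destroy(&obj);
    assert(obj.destroyed == 1);
    return Obj_Hash_Code(&obj);
}
```

Calling call_through_release(1) and then call_through_release(2) reaches the same two functions both times, even though "destroy" moves from slot 1 to slot 2.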
Therefore, someone could write another XS library extending KS, and
upgrading KS itself wouldn't cause breakage.
There's a cost in CPU cycles for this flexibility: one extra array
look-up operation. However, GCJ uses this design, and the
performance penalty is apparently only around 2% on average:
<http://www.usenix.org/events/javavm02/yu/yu_html/node29.html>
That might seem surprisingly mild, but it makes sense to me.
On a modern, pipelining processor chip, that extra op just isn't a
big deal. When I changed InStream and OutStream into "final"
classes, so that heavily used methods like OutStream_Write_VInt()
resolved directly to function addresses and no longer needed to be
resolved via vtable double dereference, the benchmark barely budged:
<http://xrl.us/7rty (Link to mail-archives.apache.org)>
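To make the "final" class change concrete, here is a rough sketch of the two macro styles side by side. The Counter class and its names are invented for the example and are not the actual InStream/OutStream code.

```c
#include <assert.h>

/* Hypothetical class used only for illustration. */
typedef struct Counter Counter;
typedef void (*add_t)(Counter *self, int n);

typedef struct CounterVTable { add_t add; } CounterVTable;

struct Counter {
    const CounterVTable *_;
    int total;
};

static void counter_add(Counter *self, int n) { self->total += n; }

static const CounterVTable COUNTER_VTABLE = { counter_add };

/* Virtual form: load the vtable pointer, load the function pointer, call. */
#define Counter_Add_Virtual(self, n) (self)->_->add((self), (n))

/* "Final" form: the macro names the function directly, so the call
 * target is known at link time and no vtable loads are needed. */
#define Counter_Add_Final(self, n)   counter_add((self), (n))

/* Both forms reach the same function and produce the same result. */
static int sum_both_ways(void) {
    Counter c = { &COUNTER_VTABLE, 0 };
    Counter_Add_Virtual(&c, 2);
    Counter_Add_Final(&c, 3);
    assert(c.total == 5);
    return c.total;
}
```

The only difference between the two forms is the pair of dependent loads before the call, which is the cost that barely registered in the benchmark.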
A note about type safety:
The array of function pointers will have to be implemented as an
array of void*, since we won't know which functions go where in the
vtable until runtime. This would seem to be a drawback, since in
theory we lose a certain amount of compile-time checking. However,
we aren't really losing much, if anything. The current system
doesn't perform real type checking; the first argument is always cast
(in this example, to kino_Obj*):
#define Kino_Term_Destroy(self) \
(self)->_->destroy((kino_Obj*)self)
However, at present, there *will* be a compile-time error if the
vtable doesn't contain a method with the appropriate name:
/* compile-time error */
kino_Obj_destroy_t destroy_meth = self->_->destro;
We will continue to enjoy a similar level of safety because the name
of the offset variable will have to be resolved by the dynamic
loader. Say we remove the Kino_Term_Destroy method... then this code
will crash at run-time, because the kino_Term_destroy_OFFSET symbol
cannot be resolved:
destroy_meth = self->_[kino_Term_destroy_OFFSET];
Of course a run-time crash would be bad -- but that just means that
we can't remove public methods -- which we wouldn't be doing anyway.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
[1] The technique I've outlined differs slightly from what's described
in the paper. For us the offsets can be stored in individual variables,
but Yu et al. put them in an "otable" array which is initialized by
the Java class loader.
_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch