Mailing List Archive

Towards a stable C API... via indirect dispatch
Greets,

In order to present a useful public C API for KS, we need to make
method calls available -- not just functions. But in KS, inheritance
is implemented using vtables -- structs with function pointer members
-- and once those vtables are part of the public API, you can't
change the vtable struct layout without wrecking binary
compatibility. Here is an excellent explanation of the problem:

http://www.usenix.org/events/javavm02/yu/yu_html/node5.html

Freezing the vtables would severely cramp our ability to develop KS.
However, if we are unable to guarantee binary compatibility, outside
developers will be severely limited in their ability to extend KS
from C, so I've been looking for a way around this problem for a
while... Happily, I think I've found one:

"Supporting Binary Compatibility with Static Compilation"[1]
Dachuan Yu, Zhong Shao, and Valery Trifonov
<http://www.usenix.org/events/javavm02/yu/yu_html/index.html>

Right now, KS virtual method invocations look something like this:

object->vtable->method_name(object)

Here's the actual pound-define for KinoSearch::Index::Term's destroy
() method, which overrides a method inherited from
KinoSearch::Util::Obj:

#define Kino_Term_Destroy(self) \
(self)->_->destroy((kino_Obj*)self)

self->_ is the vtable; "destroy" is a member.

Under the indirect dispatch system, the vtable becomes an array of
function pointers rather than a struct with function pointer members,
and method invocation changes to something like this:

object->vtable[offset](object)

Here's how an actual pound-define might look:

#define Kino_Term_Destroy(self) \
((kino_Obj_destroy_t)((self)->_[kino_Term_destroy_OFFSET])
((kino_Obj*)self))

What this allows us to do is define the vtable layout and the offsets
dynamically during a bootstrap operation. The payoff is that a
method macro so defined retains binary compatibility even as the
composition of the vtable changes with subsequent releases.

Stated another way: if we make the layout of the current vtables part
of the public API, externally compiled code will assume that a method
like "destroy" is located at a fixed location in the vtable -- and if
the layout of the vtable changes, the externally compiled code will
jump into the wrong method. (BAD!) However, if we make that offset a
variable and set it at runtime, the externally compiled code will
always find the correct method to jump into.

Therefore, someone could write another XS library extending KS, and
upgrading KS itself wouldn't cause breakage.

There's a cost in CPU cycles for this flexibility: one extra array
look-up operation. However, GCJ uses this design, and the
performance penalty is apparently only around 2% on average:

<http://www.usenix.org/events/javavm02/yu/yu_html/node29.html>

That might seem mild, but it actually makes sense to me, at least.
On a modern, pipelining processor chip, that extra op just isn't a
big deal. When I changed InStream and OutStream into "final"
classes, so that heavily used methods like OutStream_Write_VInt()
resolved directly to function addresses and no longer needed to be
resolved via vtable double dereference, the benchmark barely budged:

<http://xrl.us/7rty (Link to mail-archives.apache.org)>

A note about type safety:

The array of function pointers will have to be implemented as an
array of void*, since we won't know which functions go where in the
vtable until runtime. This would seem to be a drawback, since in
theory we lose a certain amount of compile-time checking. However,
we aren't really losing much, if anything. The current system
doesn't perform real type checking; the first argument is always cast
(in this example, to kino_Obj*):

#define Kino_Term_Destroy(self) \
(self)->_->destroy((kino_Obj*)self)

However, at present, there *will* be a compile time error if the
vtable doesn't contain a method with the appropriate name:

/* compile-time error */
kino_Obj_destroy_t destroy_meth = self->_->destro;

We will continue to enjoy a similar level of safety because the name
of the offset variable will have to be resolved by the dynamic
loader. Say we remove the Kino_Term_Destroy method... then this code
will crash at run-time, because the kino_Term_destroy_OFFSET symbol
cannot be resolved:

destroy_meth = self->_[kino_Term_destroy_OFFSET];

Of course a run-time crash would be bad -- but that just means that
we can't redact public methods -- which we wouldn't be doing anyway.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

[1] The technique I've out differs slightly from what's described in
the paper. For us the offsets can be stored in individual variables,
but Yu et al put them in an "otable" array which is initialized by
the Java class loader.


_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
Marvin Humphrey writes:
> object->vtable[offset](object)
>
> Here's how an actual pound-define might look:
>
> #define Kino_Term_Destroy(self) \
> ((kino_Obj_destroy_t)((self)->_[kino_Term_destroy_OFFSET]) ((kino_Obj*)self))

Sounds to me like a reasonable tradeoff between flexibility and
performance, especially given how low the measured performance impact
is.

> The array of function pointers will have to be implemented as an
> array of void*,

Theoretically, object pointers (including void pointers) and function
pointers are incommensurate according to the C standard -- you get
undefined behaviour when you cast between them. Situations like this
are meant to be handled by picking an arbitrary function-pointer type
to store in the vtable array:

typedef void (*kino_method_t)(kino_Obj *);

You still have to cast back to the correct type before calling through
such a pointer, of course.

That said, I don't know if any Perl-capable platforms will demonstrate
actual breakage if you cast a function pointer to a void pointer and
back again. Either way, using the arbitrary-function-pointer approach
is more portable, and doesn't seem to have any significant cost
associated with it.

> Say we remove the Kino_Term_Destroy method... then this code
> will crash at run-time, because the kino_Term_destroy_OFFSET
> symbol cannot be resolved:
>
> destroy_meth = self->_[kino_Term_destroy_OFFSET];
>
> Of course a run-time crash would be bad -- but that just means that
> we can't redact public methods -- which we wouldn't be doing anyway.

More specifically, the failure would be at link-time, right? Unless
I'm misunderstanding, code using the new macro will contain a reference
to the kino_Term_destroy_OFFSET symbol, so the linker should fail when
trying to resolve uses of that symbol in callers. Of course, assuming
that most uses of the Kinosearch code rely on a dynamically loaded
KinoSearch.so (or local equivalent), that turns out to be roughly the
same thing as run-time anyway.

--
Aaron Crane

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On 10/27/07, Marvin Humphrey <marvin@rectangular.com> wrote:
> In order to present a useful public C API for KS, we need to make
> method calls available -- not just functions. But in KS, inheritance
> is implemented using vtables -- structs with function pointer members
> -- and once those vtables are part of the public API, you can't
> change the vtable struct layout without wrecking binary
> compatibility.

Good summary, but I'm not sure that there are that many cases where we
would be breaking the binary API. The link you provide suggests that
things will break if we add a new method to the end of the vtable, but
I don't think this applies to the way Boilerplater works. Am I
wrong?

My instinct is that this is a fine solution to the problem, but that
it's not really a problem that we need to solve right now. I'm not
seeing how KinoSearch is different than any other shared library ---
couldn't we just bump up the major version number every time the
binary API changes?

I may just be being misled by the trivial examples you are using.
Could you offer a better example of a case where this technique would
be necessary?

Aaron Crane writes:
> More specifically, the failure would be at link-time, right? Unless
> I'm misunderstanding, code using the new macro will contain a reference
> to the kino_Term_destroy_OFFSET symbol, so the linker should fail when
> trying to resolve uses of that symbol in callers.

I'm pretty sure that in Marvin's solution the offset value is a
variable stored in the Kinosearch shared library. This is probably
what you mean in the next sentence, but to confirm, this means the
failure will be detected at dynamic link time by ld-linux.so (or
equivalent), which from a user's point of view is the same as failure
at run-time

Nathan Kurz
nate@verse.com

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On Oct 28, 2007, at 9:05 AM, Aaron Crane wrote:

> Theoretically, object pointers (including void pointers) and function
> pointers are incommensurate according to the C standard -- you get
> undefined behaviour when you cast between them.

Ah, yes, I'd forgotten that.

Presently, the vtables are actual objects themselves, with a
'refcount' member and the whole bit. Having them be objects makes it
easier to implement dynamic subclassing, a feature which is required
by both Schema and FieldSpec, and which may come in handy elsewhere
in the future.

The vtable objects belong to the class
"KinoSearch::Util::VirtualTable". Here's the definition for
KinoSearch::Index::Term's vtable object:

KINO_TERM_VTABLE KINO_TERM = {
(KINO_OBJ_VTABLE*)&KINO_VIRTUALTABLE, /* vtable object's
vtable */
1, /* refcount */
(KINO_OBJ_VTABLE*)&KINO_OBJ, /* parent */
"KinoSearch::Index::Term", /* class name */
(kino_Obj_clone_t)kino_Term_clone,
(kino_Obj_destroy_t)kino_Term_destroy,
(kino_Obj_equals_t)kino_Term_equals,
(kino_Obj_hash_code_t)kino_Obj_hash_code,
(kino_Obj_is_a_t)kino_Obj_is_a,
(kino_Obj_to_string_t)kino_Term_to_string,
(kino_Obj_serialize_t)kino_Term_serialize,
(kino_Term_get_field_t)kino_Term_get_field,
(kino_Term_get_text_t)kino_Term_get_text,
(kino_Term_copy_t)kino_Term_copy
};

The first four member variables aren't function pointers, and I'd
kinda sorta been hoping to sneak them into the array somehow. ;) A
fifth member var will actually be needed as well: 'size' (or
something like that), describing the size of the vtable either in
bytes or in array members.

One approach is to keep the vtables as structs, with the last member
a "flexible array" of function pointers:

typedef struct kino_VTable {
KINO_OBJ_VTABLE *_;
chy_u32_t refcount;
KINO_OBJ_VTABLE *parent;
const char *class_name;
size_t size;
kino_method_t methods[];
} kino_VTable;

Flexible arrays are C99, but you can get away with them on C89 if you
declare them to be at least length 1.

kino_method_t methods[1];

You then take advantage of C's lack of bounds checking to malloc()
enough memory for however many elements you need. :) It's a hack,
but widely portable -- Perl's regex engine depends on it, for example.

The downside of having the vtable be a struct rather than an array is
that it adds an extra addition op to the process of finding the right
function pointer.

method_OFFSET * sizeof(kino_method_t)

method_OFFSET * sizeof(kino_method_t) + FIXED_OFFSET

Here's some AT&T assembler, for code implementing the array technique:

# %eax register holds method_OFFSET
# %edx register holds address of "methods" array
movl (%edx,%eax,4), %eax

Here's assembler for code using a vtable struct containing a
"methods" array:

# %eax register holds method_OFFSET
# %edx register holds vtable struct pointer
movl 20(%edx,%eax,4), %eax # <----------- NOTE extra "20"

(To see the whole context, view the attached file "need_meth.s",
which was generated from the attached file "need_meth.c" using the
command "gcc -S -Wall -Os need_meth.c" on an x86 Linux box.)

I'm not sure how much of a penalty you pay for the extra addition op
-- only a benchmark would tell -- but I'm reasonably sure it doesn't
help matters. :)

It seems to me that the only way to get away with using the array
rather than the struct containing the array involves some nasty
casting hacks. Worth it, y'think?

>> Say we remove the Kino_Term_Destroy method... then this code
>> will crash at run-time, because the kino_Term_destroy_OFFSET
>> symbol cannot be resolved:
>>
>> destroy_meth = self->_[kino_Term_destroy_OFFSET];
>>
>> Of course a run-time crash would be bad -- but that just means that
>> we can't redact public methods -- which we wouldn't be doing anyway.
>
> More specifically, the failure would be at link-time, right? Unless
> I'm misunderstanding, code using the new macro will contain a
> reference
> to the kino_Term_destroy_OFFSET symbol, so the linker should fail when
> trying to resolve uses of that symbol in callers. Of course, assuming
> that most uses of the Kinosearch code rely on a dynamically loaded
> KinoSearch.so (or local equivalent), that turns out to be roughly the
> same thing as run-time anyway.

Exactly.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On Oct 28, 2007, at 12:52 PM, Nathan Kurz wrote:

> Good summary, but I'm not sure that there are that many cases where we
> would be breaking the binary API. The link you provide suggests that
> things will break if we add a new method to the end of the vtable, but
> I don't think this applies to the way Boilerplater works. Am I
> wrong?

It does apply. Say we add Scorer_Advance as a new virtual method,
and deprecate Scorer_Next and Scorer_Skip_To. The size of the Scorer
vtable will increase by 1 function pointer.

Now say that you have a separate distro, with your own custom Scorer
subclass, MyScorer, which defines a single custom method,
MyScorer_Foo. Your externally compiled code expects to find
MyScorer_Foo in the vtable starting at an offset of sizeof
(KINO_SCORER_VTABLE) -- but that sizeof() was calculated when
KINO_SCORER_VTABLE was smaller than it is now. Thus, all the compiled-
in, fixed offsets for your custom method calls are now wrong.

/* This now jumps into Scorer_Advance. Bad! */
MyScorer_Foo(myscorer);

> My instinct is that this is a fine solution to the problem, but that
> it's not really a problem that we need to solve right now.

In one sense, we don't have to solve it right away, because
presenting a public C API isn't immediately necessary. But if we can
work things out, all kinds of possibilities open up. For instance,
subclassing HitCollector from Perl will never scale, because
HitColl_Collect() gets called way too often when iterating over a
large result set. But subclassing HitCollector is powerful stuff,
and I'd like to allow it.

> I'm not
> seeing how KinoSearch is different than any other shared library ---
> couldn't we just bump up the major version number every time the
> binary API changes?

It's not any different from any other shared library. But preserving
a traditional binary API is a FPITA. I expect continuing flux for
some time to come. Freezing the vtable layouts would suck, and hard.

> I may just be being misled by the trivial examples you are using.
> Could you offer a better example of a case where this technique would
> be necessary?

Say that we don't get to the positional scoring improvements soon ;)
and we revert to the original plan of publishing 0.20 with the
current scoring behavior. Yet when the positional scoring engine is
finished, we'd still like to publish it as
KSx::Search::PositionAwareScorer, so that those in the know can
override the default engine and get more relevant search results.

Under this scheme, you can release that as an XS distro, and _all_
your compiled-in method calls will continue work as advertised no
matter how much we monkey around with the vtables in the core library.

Essentially, the vtable layout is no longer part of the public binary
API. What we promise instead is that 1) a vtable exists and 2) e.g.
Kino_Scorer_Destroy_OFFSET tells you where to find the right function
pointer in that vtable.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On 10/28/07, Marvin Humphrey <marvin@rectangular.com> wrote:
> > The link you provide suggests that
> > things will break if we add a new method to the end of the vtable, but
> > I don't think this applies to the way Boilerplater works. Am I
> > wrong?
>
> It does apply. Say we add Scorer_Advance as a new virtual method,
> and deprecate Scorer_Next and Scorer_Skip_To. The size of the Scorer
> vtable will increase by 1 function pointer.
>
> Now say that you have a separate distro, with your own custom Scorer
> subclass, MyScorer, which defines a single custom method,
> MyScorer_Foo. Your externally compiled code expects to find
> MyScorer_Foo in the vtable starting at an offset of sizeof
> (KINO_SCORER_VTABLE) -- but that sizeof() was calculated when
> KINO_SCORER_VTABLE was smaller than it is now. Thus, all the compiled-
> in, fixed offsets for your custom method calls are now wrong.
>
> /* This now jumps into Scorer_Advance. Bad! */
> MyScorer_Foo(myscorer);


There must be something that I'm still not getting. I can now see
that things will break in some cases, but I don't see breakage in your
example. But maybe I've been away from the code too long.

I'm presuming you are building a KSx extension as a shared library.
Boilerplater runs, and using a local copy of the Kinosearch header
files, creates a vtable struct for MyScorer that takes into account
appropriate inheritance rules. This vtable is instantiated as a
global variable in MyScorer.so as KINO_MYSCORER in MyScorer.r, with
the external KinoSearch methods represented by symbols identified by
function names. At run-time, ld.so recognizes that MyScorer.so is
incomplete, finds the symbols in KinoSearch.so, and does whatever it
does to magically patch things up so the function pointers work.

OK, so then the user upgrades KinoSearch.so (or a second user grabs
MyScorer.so and a newer version of KinoSearch.so). From what I can
tell, the MyScorer vtable is still going to work fine. Pointers to
functions local to MyScorer.so will work as before, and pointers to
functions in the new KinoSearch.so, since they are found by ld.so by
name, should work fine as well. Thus, in my view,
MyScorer_Foo(myscorer) should work as expected so long as the symbol
names don't change, even if the format of the parent Scorer vtables
change completely. No?

Where I do see the problem is when the new KinoSearch.so tries to call
Scorer_Advance(myscorer). The new code goes to the offset in the
vtable of a new Scorer at which it expects to find an entry for the
Advance method, and instead (unknowingly) finds a pointer to
MyScorer_Foo(): boom!

But what I don't understand yet is how your proposed solution is going
to deal with this. Since the MyScorer vtable has no knowledge of the
Advance method, looking within MyScorer.so for
Kino_MyScorer_Advance_OFFSET is just going to cause ld.so to fail
during run-time linking. Since adding another level of indirection to
all of our class methods seems like a pretty convoluted way to just
check for a compatible library version, I presume I'm missing
something.

I think I can see (through a glass, darkly) that this technique could
work for a language where vtables are chained and crawled at runtime,
but even then I'm not sure it is really a good thing. Suppose that
MyScorer had overriden Scorer_Skip_To with a custom My_Scorer_Skip_To,
and then the core KinoSearch switched to using Scorer_Advance. While
the offset method you are proposing could allow MyScorer to seemlessly
abandon the custom My_Scorer_Skip_To for the default Scorer_Advance,
this silent switch to wrong behaviour seems worse than a crash.

I'm not sure I believe that architectural changes like this can or
should be dealt with automatically. I'm probably missing something,
but instinctively I think that checking for (and crashing on) major
version incompatibility is the best you can do. While the technique
you propose would essentially automate bumping up this version number,
it seems like there should be a simpler way of accomplishing that if
that is the only goal. But again, I'm probably missing something
essential here.

Nathan Kurz
nate@verse.com

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On 10/28/07, Marvin Humphrey <marvin@rectangular.com> wrote:
>
> On Oct 28, 2007, at 9:05 AM, Aaron Crane wrote:
>
> > Theoretically, object pointers (including void pointers) and function
> > pointers are incommensurate according to the C standard -- you get
> > undefined behaviour when you cast between them.
...
> It seems to me that the only way to get away with using the array
> rather than the struct containing the array involves some nasty
> casting hacks. Worth it, y'think?

I hadn't been aware of the potential incompatibility between function
and object pointers in C. Thanks!

On the other hand, on doing some more research, I don't think this is
much of a concern in practice. I can't find primary sources, but I
find many claims that both Posix and Win32 require that they be
interchangeable.

My preference would be for using whatever is simpler and clearer to
read, so long as it compiles without warning on our target systems and
is not unreasonably ineffecient. My semi-informed guess is that
sticking with the current array approach is best.

Nathan Kurz
nate@verse.com

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On Oct 29, 2007, at 11:35 AM, Nathan Kurz wrote:

> OK, so then the user upgrades KinoSearch.so (or a second user grabs
> MyScorer.so and a newer version of KinoSearch.so). From what I can
> tell, the MyScorer vtable is still going to work fine. Pointers to
> functions local to MyScorer.so will work as before, and pointers to
> functions in the new KinoSearch.so, since they are found by ld.so by
> name, should work fine as well. Thus, in my view,
> MyScorer_Foo(myscorer) should work as expected so long as the symbol
> names don't change, even if the format of the parent Scorer vtables
> change completely. No?

Crap, I muffed the example. :( Yu et al do a better job of
explaining things:

http://www.usenix.org/events/javavm02/yu/yu_html/node5.html

I'll take another stab at it, nonetheless.

Say MyScorer iterates over a PostingList by calling PList_Next(self-
>plist) repeatedly. When MyScorer.so is compiled, PList_Next is
located at (say) offset 48 in the PostingList vtable. If we add a
method to Obj (PostingList's ancestor), that bumps everything in
PostingList's vtable down by sizeof(kino_method_t). Now, when
MyScorer.so tries to call PList_Next, it looks for the function
pointer at offset 48 -- and jumps into PList_Seek_Lex instead of
PList_Next (which is now at offset 52).

So, if we make the vtable layout part of the binary API, we can never
add any new virtual methods without breaking compat.

> I think I can see (through a glass, darkly) that this technique could
> work for a language where vtables are chained and crawled at runtime,
> but even then I'm not sure it is really a good thing.

That's exactly what's being proposed: generating the vtables at runtime.

Incidentally, some vtables already *are* being created at runtime in
KS. All subclasses of Schema and FieldSpec use
KinoSearch::Util::DynVirtualTable to acquire one.

However, all that DynVirtualTable does right now is change the class
name. Under "indirect dispatch", the layout of the methods in the
vtable would be dynamically generated as well.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On Oct 29, 2007, at 11:52 AM, Nathan Kurz wrote:

> My preference would be for using whatever is simpler and clearer to
> read, so long as it compiles without warning on our target systems and
> is not unreasonably ineffecient.

Using a explicit struct is definitely clearer than the other
stratagem, which basically amounts to creating a struct and then
pretending that it's an array some of the time.

The extra add op is almost certainly insignificant. Let's just keep
the vtables as structs.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On Oct 29, 2007, at 12:32 PM, Marvin Humphrey wrote:

> On Oct 29, 2007, at 11:52 AM, Nathan Kurz wrote:
>
>> My preference would be for using whatever is simpler and clearer to
>> read, so long as it compiles without warning on our target systems
>> and
>> is not unreasonably ineffecient.
>
> Using a explicit struct is definitely clearer than the other
> stratagem, which basically amounts to creating a struct and then
> pretending that it's an array some of the time.
>
> The extra add op is almost certainly insignificant. Let's just
> keep the vtables as structs.

As I've been coding this up, I've refined my position.

The method_OFFSET var should hold the distance, in *bytes*, from the
top of the vtable struct. To get the address of the proper function
pointer, just add method_OFFSET and the address of the vtable together.

This is the assembler we should see when attempting to load the
correct function pointer:

movl self, %edx # load self into %edx
movl method_OFFSET, %eax # load method_OFFSET into %eax
addl (%edx), %eax # add the address of self->_ to %eax

Here's an example method macro definition to produce that assembler:

#define Kino_Term_Destroy(self) \
((kino_Obj_destroy_t)((char*)(self)->_) +
Kino_Term_Destroy_OFFSET)((kino_Obj*)self)

Note that in order for that to compile, "self" must have a member
named "_" -- so you can't just throw any old variable into a method
macro.

The vtable member for each object remains a struct.

struct kino_Obj {
kino_VTable *_;
chy_u32_t refcount;
};

VTables are always treated as structs, except when they are treated
as raw memory addresses within method macros.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On 10/29/07, Marvin Humphrey <marvin@rectangular.com> wrote:
> So, if we make the vtable layout part of the binary API, we can never
> add any new virtual methods without breaking compat.

OK, I think I got it now: the problem comes when methods are added to
the ancestors of class one is accessing in the shared library. Could
this also be solved by marking those classes as BOIL_FINAL_CLASS()?
That would have the effect of letting the run-time linker do the
versioning for you. But probably not possible if you are making use
of dynamic classes as you say.

I'm getting there, but I'm still not completely sold on the advantages
of the indirection you are contemplating. Along the lines of my
previous message, the potential for silently wrong behaviour worries
me.

There's a good paper by Ulrich Drepper on shared libraries:
http://people.redhat.com/drepper/dsohowto.pdf
I haven't read it in a while, and probably should reread it. Glancing
at it again, I note that it has what seems like a very relevant
section on ABI versioning, as well as a great overview of how ld.so
actually works.

Tangential to this, Drepper has been publishing a fantastic piece
about optimizing memory access. The latest installment is here:
http://lwn.net/SubscriberLink/255364/1e135ddf3bff4b77/
I haven't read them all, but plan to once the series is done. I think
they are dead on for the improvements that could be made for
KinoSearch.

Nathan Kurz
nate@verse.com

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On 10/30/07, Marvin Humphrey <marvin@rectangular.com> wrote:
> The method_OFFSET var should hold the distance, in *bytes*, from the
> top of the vtable struct. To get the address of the proper function
> pointer, just add method_OFFSET and the address of the vtable together.

Check. You are probably familiar with it, but the offsetof() macro
may useful for generating these offsets. The Apache Portable Runtime
might be worth checking for portabalized macros for this. It also
might be worth considering for its mmap support.

The Linux kernel builds on this to add container_of, which may or may
not help as well. Here's a blog post that talks about its use within
the VFS, which I referred to earlier:
http://modal-echoes.blogspot.com/2007/03/implementing-polymorphism-in-c.html

> This is the assembler we should see when attempting to load the
> correct function pointer:
>
> movl self, %edx # load self into %edx
> movl method_OFFSET, %eax # load method_OFFSET into %eax
> addl (%edx), %eax # add the address of self->_ to %eax

Realize that things are going to be a little less efficient once you
are using a shared library. I think the method_OFFSET variable is
going to require another level of indirection to allow for the symbol
relocation. But I'm fuzzy on this.

Nathan Kurz
nate@verse.com

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On Oct 30, 2007, at 2:47 PM, Nathan Kurz wrote:

> You are probably familiar with it, but the offsetof() macro
> may useful for generating these offsets.

Thanks for the reminder. offsetof() will come in handy, but it can't
be used for measuring all the offsets for this particular design.
The revised kino_VirtualTable struct looks like this:

struct kino_VirtualTable {
kino_VirtualTable *_;
KINO_OBJ_MEMBER_VARS;
kino_VirtualTable *parent;
const char *class_name;
size_t byte_size;
kino_method_t methods[1]; /* flexible array */
};

The methods will all be poured into the methods[] array at runtime
during a bootstrap operation -- so offsetof() won't help there. All
of the "subclasses" of kino_VirtualTable -- KINO_OBJ_VTABLE,
KINO_SCORER_VTABLE and the like -- had actual members that you could
take an offsetof() against; however, I think those are all going to
go away. KINO_OBJ, KINO_SCORER and the rest will all become
kino_VirtualTable* objects.

> The Apache Portable Runtime
> might be worth checking for portabalized macros for this. It also
> might be worth considering for its mmap support.

I looked into APR as Lucy was starting up. At OSCON, I saw Garrett
Rooney give a talk on it; I was especially interested because he was
the first person I know of to attempt a port of Lucene to C (the now-
defunct "Lucene4C" project at Apache).

First, APR did way, way, WAY more than we needed. Second, I wasn't
enthused about e.g. how it made you pass around apr_pool_t objects
everywhere. Third, PTSD from trying to install Subversion and going
through hell with conflicting versions of apr/apr-util/apr-whatever
colored my judgment some, though that might have been Subversion's
fault rather than APR's. :) In the end, it seemed too heavyweight
to clear the high bar I set for adding dependencies.

> The Linux kernel builds on this to add container_of, which may or may
> not help as well. Here's a blog post that talks about its use within
> the VFS, which I referred to earlier:
> http://modal-echoes.blogspot.com/2007/03/implementing-polymorphism-
> in-c.html

Interestingly, that's pretty close to how OO works in the maint
branch of KS. For instance, here's the definition of the Scorer struct:

typedef struct scorer {
void *child;
Similarity *sim;
float (*score)(struct scorer*);
bool (*next)(struct scorer*);
U32 (*doc)(struct scorer*);
bool (*skip_to)(struct scorer*, U32);
SV *similarity_sv;
} Scorer;

TermScorer, PhraseScorer, etc, all fill in their own "child" struct.

Shared vtables a la Boilerplater/C++/Java are much more compact,
allowing you to add many more methods, basically for free.

>> This is the assembler we should see when attempting to load the
>> correct function pointer:
>>
>> movl self, %edx # load self into %edx
>> movl method_OFFSET, %eax # load method_OFFSET into %eax
>> addl (%edx), %eax # add the address of self->_ to %eax
>
> Realize that things are going to be a little less efficient once you
> are using a shared library. I think the method_OFFSET variable is
> going to require another level of indirection to allow for the symbol
> relocation. But I'm fuzzy on this.

True. (grrr.)

It'll look something like this when compiled with fpic/fPIC:

movl self, %edx # Load self into %edx.
movl method_OFFSET@GOT(%ebx), %eax # Load Global Offset Table
# pointer entry for
# method_OFFSET into %eax.
movl (%eax), %eax # Dereference method_OFFSET.
addl (%edx), %eax # Add the address of self->_.

I think that stuff'll get pipelined, though -- so as long as the
vtables, the OFFSET vars, and the GOT don't cause cache misses, I
don't think the penalty will be severe.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Towards a stable C API... via indirect dispatch [ In reply to ]
On Oct 30, 2007, at 2:33 PM, Nathan Kurz wrote:

> OK, I think I got it now: the problem comes when methods are added to
> the ancestors of class one is accessing in the shared library. Could
> this also be solved by marking those classes as BOIL_FINAL_CLASS()?

In theory, we could add new "final" methods (though I believe we'd
have to mod Boilerplater's implementation to actually make that
work). Method macros for "final" methods are just aliases to actual
functions, and don't go through the vtable -- so adding new ones
doesn't require changes to vtable layout.

However, we couldn't do any of the following, as they all mess with
the composition of the vtable:

* add new virtual methods
* remove existing virtual methods, even non-public ones
* change the order of virtual methods

> I'm getting there, but I'm still not completely sold on the advantages
> of the indirection you are contemplating.

Well, let me put it this way:

If in order to present a C API, we have to freeze the vtables, there
won't *be* a C API.

KinoSearch's implementation details are in too much flux. The API
and file format are approaching beta status... but an ABI that
includes frozen vtables is a ways off, and it would be unreasonable
to bump major versions as frequently as I expect we would need to.
Perl and CPAN suck when it comes to handling major version changes.

> Along the lines of my previous message, the potential for silently
> wrong
> behaviour worries me.

Absolutely, that's a major concern of mine, and I hope to wind up
with something that is *more* reliable rather than less.

Binary compatibility problems with C++ shared libraries are at the
root of "DLL Hell" on Windows; if MSVC used indirect dispatch, one
source of ABI compat issues would be removed. By making the vtables
dynamic, I hope to avoid accidental incompatibilities between XS KSx
extensions and KS itself. I don't want to screw up an update and mod
a vtable somehow, causing people's apps to start segfaulting.

Some level of safety is gained by initializing all vtable pointers to
NULL -- so if a bootstrap operation fails, we'll get an immediate
segfault. Furthermore, all method_OFFSET vars will get initialized
to 0, and the first method in every vtable can be set to trigger a
fatal exception. This should compensate for the fact that the
compiler won't be able to tell whether "extern" variables have been
initialized.

Aside from that, I don't think there's inherently anything less safe
about the proposed new way of doing things, except that there'll be
some new code with new bugs.

Ideally, KS objects should be opaque. *No* struct layout details
should be described by either the API or the ABI, with the exception
that method calls will depend on finding a vtable as the first
element in any object. It won't be possible to enforce struct
layout privacy, but it should be possible to set things up so that a
conscientious author isn't *required* to write code that depends on
violating it.

> There's a good paper by Ulrich Drepper on shared libraries:
> http://people.redhat.com/drepper/dsohowto.pdf
> I haven't read it in a while, and probably should reread it. Glancing
> at it again, I note that it has what seems like a very relevant
> section on ABI versioning, as well as a great overview of how ld.so
> actually works.

It's a good paper, though the section on ABI versioning in particular
doesn't yield much because it focuses on versioning controls
within .so files, and at the moment, KS versioning happens at the
Perl level.

> Tangential to this, Drepper has been publishing a fantastic piece
> about optimizing memory access.

Cache management of the OO infrastructure is something I'm trying to
take into account, as it seems likely that the vtables will be
accessed constantly. Right now, all of the vtable pointers and the
method_OFFSET vars are getting dumped into a single C file, so that
they'll be concentrated together. Perl does something similar: its
most frequently accessed op codes are all in one file, pp_hot.c.

The vtables themselves will be dynamically allocated. The classic
way to handle things would be one malloc() per object, but I think
what we might try instead is grabbing memory in page-sized chunks, so
that the vtables will be packed together into a few memory pages.

I'm also trying not to spend too much time on the optimization aspect
of this, since its unclear how much of a payoff we'll see. :)
Flexibility, reliability and API stability are all more important
than the difference between decent optimization and excellent
optimization.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch