Mailing List Archive: inside-out objects

Nov 20, 2007, 12:18 PM

Post #1 of 4

Marvin,

I see from recent KS commits that you are refactoring to use inside-out
objects. I know that's a fairly hot design choice these days for Perl. Are you
finding it makes it easier to do things with XS, C and the reference counting?
Will you share with the class why you've decided to go the inside-out route?

Thanks.
pek
--
Peter Karman . peter@peknet.com . http://peknet.com/

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch

Re: inside-out objects

marvin at rectangular

Nov 20, 2007, 5:14 PM

Post #2 of 4

On Nov 20, 2007, at 12:18 PM, Peter Karman wrote:

> I see from recent KS commits that you are refactoring to use inside-out
> objects. I know that's a fairly hot design choice these days for Perl.

Inside-out objects are typically touted for three reasons. From
<http://www.perlfoundation.org/perl5/index.cgi?inside_out_object>:

* Option to enforce privacy.
* Proper namespacing for member variables.
* Isolation from implementation details.

The first point, enforced privacy, is not a concern for KS. In fact,
the KS flavor of inside-out uses "our" rather than "my" variables to
facilitate introspection and debugging.

The second point, proper namespacing, is kinda nice, but hash-key
typos aren't really a problem within the KS code base -- yay for Vim
auto-complete. :) Typos in parameter labels are always a concern,
but the inside-out model doesn't offer any advantage there and KS has
other means of dealing with those.

It's the third point that makes all the difference.

With the inside-out pattern, we can take a C struct based on
Boilerplater, stuff it into a blessed scalar ref and enjoy nearly all
the advantages of traditional Perl objects. So we have something
extremely usable from C and highly usable from Perl.
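
To make the pattern concrete, here is a minimal pure-Perl sketch of the
"our"-flavored inside-out layout described above. It is an illustration
only, not KS code: the package Sketch::Doc, the %title hash, and the
integer id standing in for the address of a C struct are all assumptions
made for the example.

```perl
package Sketch::Doc;    # hypothetical class, not part of KS
use strict;
use warnings;

# Inside-out storage: one package hash per attribute, keyed by object id.
# "our" (not "my") keeps the hash reachable for introspection/debugging.
our %title;

my $next_id = 1;        # assumption: stands in for a C struct's address

sub new {
    my ( $class, %args ) = @_;
    my $id   = $next_id++;
    my $self = bless \$id, $class;    # blessed scalar ref, not a hash
    $title{$$self} = $args{title};
    return $self;
}

sub get_title {
    my $self = shift;
    return $title{$$self};
}

sub DESTROY {
    my $self = shift;
    delete $title{$$self};            # release the inside-out storage
}

package main;

my $doc   = Sketch::Doc->new( title => 'inside-out' );
my $title = $doc->get_title;
print "$title\n";                     # inside-out

my $before = keys %Sketch::Doc::title;
undef $doc;                           # triggers DESTROY
my $after = keys %Sketch::Doc::title;
print "$before $after\n";             # 1 0
```

Because %title is a package variable, it can be inspected from outside
the class as %Sketch::Doc::title, which is the introspection benefit
mentioned earlier.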

In contrast, a typical hash-based Perl object is extremely usable
from Perl... but quite awkward to use from C. Perl objects can't be
used polymorphically from C in place of Boilerplater objects. You
have to wrap the Perl object in a Boilerplater object and override
every method you want to use.

Recent changes to KS have made it much easier to call back to Perl
(or potentially any other host language) from C. Right now,
MockScorer is the exception, but soon all KS classes will follow its
model: base classes written in C which may be subclassed using either
Perl or C.

> Are you finding it makes it easier to do things with XS, C and the
> reference counting?

Object destruction is a huge issue and has been one of the hardest
things to get right. KinoSearch::Util::Nat, which will become
KinoSearch::Obj, solves many problems by caching a Perl object at
construction-time and relying solely on the Perl refcount. This is a
big enough topic that I'll post about it separately.

Another advantage lies in unifying the OO structure of KS. Right
now, there are two different kinds of KinoSearch classes: those with
a C struct at their core, and those with a Perl object at their
core. Having only one type of object will make things easier for a
variety of reasons.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Re: inside-out objects

marvin at rectangular

Nov 20, 2007, 5:48 PM

Post #3 of 4

On Nov 20, 2007, at 12:18 PM, Peter Karman wrote:

> Are you finding it makes it easier to do things with XS, C and the
> reference counting?

KS objects of any class other than the new, temporary
KinoSearch::Util::Nat maintain their own refcount, separate from
Perl's. When a Perl object wrapping a KS object has its SvREFCNT fall
to 0, the DESTROY method which gets called is
KinoSearch::Util::Obj::DESTROY, which simply decrements the KS
object's internal refcount rather than invoking Kino_Obj_Destroy(obj).

    void
    DESTROY(self)
        kino_Obj *self;
    PPCODE:
        REFCOUNT_DEC(self);

We have to do things that way because there are many KS objects which
Perl doesn't know about. For instance, when TopDocCollector's C
constructor TDColl_new() is invoked, it creates its own HitQueue
object without telling Perl anything about it. However, should we
need to deal with that HitQueue from Perl-space, we have to wrap it
in a Perl object. That's what happens here:

    {
        my $hit_queue = $collector->get_hit_queue;
    }   # $hit_queue goes out of scope, DESTROY called

Currently, when that $hit_queue goes out of scope, the Perl wrapper
object gets destroyed. However, the interior KS HitQueue object must
not be destroyed, because $collector still needs it.

As a consequence, KS objects can reappear wrapped in several
different Perl objects, which is rather strange and is probably a bug
waiting to bite someone. Here's an example of how things can go
wrong: cycling through multiple Perl objects doesn't work well with
the inside-out pattern, because DESTROY gets invoked over and over
again, necessitating a broken hack like this...

    sub DESTROY {
        my $self = shift;
        if ($self->refcount < 2) {
            delete $inside_out_var{$$self};
        }
        $self->SUPER::DESTROY;
    }

That hack doesn't even work reliably because if the last refcount
gets decremented by KS internally, the Perl DESTROY method will never
get called and any inside-out vars will leak.
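
The leak can be reproduced in a small pure-Perl simulation. This is only
a sketch under stated assumptions: Sketch::Core stands in for a C-level
KS object with its own refcount, Sketch::Wrapper for the temporary Perl
wrapper, and none of these names exist in KS.

```perl
package Sketch::Core;       # hypothetical stand-in for a C-level object
use strict;
use warnings;

our %label;                 # inside-out var keyed by core id
my %refcount;               # the core's own refcount, separate from Perl's
my $next_id = 1;

sub create { my $id = $next_id++; $refcount{$id} = 1; return $id }
sub inc    { $refcount{ $_[0] }++ }
sub dec    { delete $refcount{ $_[0] } if --$refcount{ $_[0] } == 0 }
sub alive  { return exists $refcount{ $_[0] } ? 1 : 0 }

package Sketch::Wrapper;    # hypothetical temporary Perl wrapper

# Each call builds a fresh Perl object around the same core id.
sub wrap {
    my ( $class, $id ) = @_;
    Sketch::Core::inc($id);
    return bless \$id, $class;
}

# Mirrors the XS DESTROY above: decrement only, never destroy directly.
sub DESTROY {
    my $self = shift;
    Sketch::Core::dec($$self);
}

package main;

my $core = Sketch::Core::create();    # the owner holds one reference
{
    my $w = Sketch::Wrapper->wrap($core);
    $Sketch::Core::label{$$w} = 'hit queue';
}   # wrapper destroyed; the core survives via the owner's reference

my $state = Sketch::Core::alive($core) ? 'alive' : 'gone';
print "$state\n";                     # alive

Sketch::Core::dec($core);             # last reference dropped internally
$state = Sketch::Core::alive($core) ? 'alive' : 'gone';
print "$state\n";                     # gone

# No Perl DESTROY ran for that final decrement, so the entry remains.
my $leak = exists $Sketch::Core::label{$core} ? 'leaked' : 'clean';
print "$leak\n";                      # leaked
```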

The solution is to cache a Perl object within a KS object, so that
effectively Perl *does* know about it. That's the difference between
Nat and Obj. Under Nat, the refcounting is handled via the cached
Perl object. There are no longer two refcounts.
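
A rough pure-Perl sketch of the cached-object idea follows. It is not the
Nat implementation: Sketch::Nat, Sketch::Collector, and the $destroyed
counter are hypothetical names, and a plain hash slot stands in for the
place where a KS object would cache its Perl object.

```perl
package Sketch::Nat;          # hypothetical, not the real Nat class
use strict;
use warnings;

our %label;                   # inside-out var
our $destroyed = 0;           # counts DESTROY calls, for demonstration only
my $next_id = 1;

sub new {
    my ( $class, %args ) = @_;
    my $id   = $next_id++;
    my $self = bless \$id, $class;
    $label{$$self} = $args{label};
    return $self;
}

sub DESTROY {
    my $self = shift;
    delete $label{$$self};    # runs exactly once per object
    $destroyed++;
}

package Sketch::Collector;    # hypothetical owner of a hit queue

sub new {
    my $class = shift;
    # The owner caches the one and only Perl object for its hit queue.
    my $self = { hit_queue => Sketch::Nat->new( label => 'hit queue' ) };
    return bless $self, $class;
}

# Always hands back the same cached object; only Perl's refcount moves.
sub get_hit_queue { return $_[0]{hit_queue} }

package main;

my $collector = Sketch::Collector->new;
{
    my $first  = $collector->get_hit_queue;
    my $second = $collector->get_hit_queue;
    my $same   = ( $first == $second ) ? 'same object' : 'different objects';
    print "$same\n";                      # same object
}
print "$Sketch::Nat::destroyed\n";        # 0, the collector still holds it

undef $collector;
print "$Sketch::Nat::destroyed\n";        # 1, and %label is empty again
```

With a single refcount there is one wrapper per object, so DESTROY fires
once and the inside-out cleanup needs no refcount check.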

One drawback of this design, though, is that Perl objects are
heavyweight. That's ok for big stuff like a PostingList, but it's
not-so-great for small stuff like a ByteBuf, a Token, or a TermInfo.
If we were to put a Perl object into every last one of those, I'd be
concerned both about memory usage and performance.

My current plan is to override the refcounting infrastructure for
small classes by basing them off of a "FastObj" class which will use
an integer refcount as Obj does now. The scheme is more complicated
to implement than I'd like, and it will have the
one-KS-object-many-Perl-objects problem for anything that subclasses
FastObj. But it will work in the near term and maybe it won't be so bad.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Re: inside-out objects

peter at peknet

Nov 20, 2007, 7:06 PM

Post #4 of 4

Marvin Humphrey wrote on 11/20/07 7:48 PM:

> The solution is to cache a Perl object within a KS object, so that
> effectively Perl *does* know about it. That's the difference between
> Nat and Obj. Under Nat, the refcounting is handled via the cached Perl
> object. There are no longer two refcounts.

Now I feel like my decision to take this very same approach in the libswish3
Perl bindings was not so crazy after all! :)

Nice explanation, Marvin. Thanks.

--
Peter Karman . http://peknet.com/ . peter@peknet.com
