Mailing List Archive

RE: storing XML can be good... was Re: ANNOUNCE: PT_XPath
But for value-added applications, I just don't have time to implement custom
objects for every type of field I would need to store. If my organization
only uses (reads/writes) 12 fields out of 200 possible fields, there is
no sense in having me value-subtract from the data that is on a complex
journey between complex systems. DOM gives me an object model without
sacrificing data integrity, especially if Zope is a middle point between
several systems needing to interchange data. Could I build my own object
model and expect better performance? Of course, but do I have time? I've
got to be a pragmatist on this one most of the time. Caching mitigates some
of the read-oriented performance concerns, anyway. I would rather just throw
more hardware at Zope (i.e. more ZEO node boxes, more RAM and CPU speed) than
lose this flexibility.

What would be nice is if the likelihood of write conflicts in a
write-intensive system could be reduced by not needing to lock the whole DOM
object, but this seems to be only a minor issue for my particular
application, so I think I'm willing to live with it as a limitation,
because adding usable fields becomes simply a matter of writing an API
method (well, two: one each for read and write), and not a whole new data
structure or class. This means that if a user of the software (not a core
developer) needed to extend the API on the fly with some TTW
Python scripts, they could do so quickly and easily, provided they could speak DOM.
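As a rough sketch of such a read/write pair, here is one possible version in Python using the standard library's xml.dom.minidom, not ParsedXML itself. The simplified NITF-like markup and the names get_headline and set_headline are hypothetical, chosen only for illustration; the writer changes one element and leaves every other field in the document untouched.

```python
from xml.dom import minidom

# Hypothetical, simplified NITF-like document; real NITF has many more elements.
SAMPLE = """<nitf><body><body.head>
<hedline><hl1>Old headline</hl1></hedline>
<byline>Staff writer</byline>
</body.head><body.content><p>Story text.</p></body.content></body></nitf>"""


def _text_of(element):
    """Concatenate the text nodes directly under an element."""
    return "".join(n.data for n in element.childNodes
                   if n.nodeType == n.TEXT_NODE)


def get_headline(doc):
    """Read accessor: return the text of the first <hl1>, or None."""
    nodes = doc.getElementsByTagName("hl1")
    return _text_of(nodes[0]) if nodes else None


def set_headline(doc, value):
    """Write accessor: replace the text of the first <hl1>, leaving every
    other element (fields this application never touches) as it was."""
    hl1 = doc.getElementsByTagName("hl1")[0]
    for child in list(hl1.childNodes):
        hl1.removeChild(child)
    hl1.appendChild(doc.createTextNode(value))


doc = minidom.parseString(SAMPLE)
set_headline(doc, "New headline")
print(get_headline(doc))  # New headline
```

The same pattern, one small reader and one small writer per field, would apply to the byline or any other element.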

Sean

-----Original Message-----
From: Martijn Pieters [mailto:mj@zope.com]
Sent: Monday, October 22, 2001 11:27 AM
To: sean.upton@uniontrib.com
Cc: zope-xml@zope.org
Subject: Re: storing XML can be good... was Re: [Zope-xml] ANNOUNCE:
PT_XPath


On Mon, Oct 22, 2001 at 10:40:09AM -0700, sean.upton@uniontrib.com wrote:
> I hear more and more often that XML should not be stored, and just be an
> interchange format. And in most cases, that is likely to be correct, but...
>
> I think there are some very convincing reasons to, indeed, store XML on the
> server for particular use cases. The best possible illustration that I can
> think of is a use case that I have; sorry that this 'manifesto for storing
> XML' is so lengthy, but I think this position needs definitive justification
> - and I think that for this use case below, I'm right... ;)
>
> Eventually, for our internal use, my company will likely be creating a
> product that will support the use and storage of NITF (News Industry Text
> Format) and NewsML - both standards for news content defined by the IPTC
> (International Press Telecommunications Council) and/or the Newspaper
> Association of America (NAA). These DTDs were designed to address
> deficiencies in current information flow in the newspaper industry; NITF, in
> particular, began in the pre-XML days, as an SGML DTD with the specific goal
> of allowing news producers/vendors to scrap the 150 or so different
> proprietary text interchange and storage formats used by the industry. The
> idea was that in translation (filtering) between lots of formats,
> information is almost always lost, and always costly.

To me, this is still no reason to justify XML storage. Note that Zope
doesn't force you to use a fixed number of fields; the object tree and
Python provide you with much more flexibility. I am convinced that it is
possible to build a Zope app with the same fidelity as the NITF, with
lossless conversion between the internal storage format and NITF.

Using Zope objects instead of XML would buy you speed and scalability. DOMs
are memory hogs, and CPU-intensive to build and manipulate. Only use a DOM
when a custom API isn't feasible. Don't carry around all that extra weight
if you can avoid it!

--
Martijn Pieters
| Software Engineer mailto:mj@zope.com
| Zope Corporation http://www.zope.com/
| Creators of Zope http://www.zope.org/
---------------------------------------------
RE: storing XML can be good... was Re: ANNOUNCE: PT_XPath
I think this is an interesting approach that would also work: merge your
flat changes into rich XML as needed, preserving the XML structure and
previously value-added data...

The advantage to ParsedXML, of course, is that for read purposes, it is
already parsed into a DOM, and shouldn't require parsing, if I understand
correctly, except for the occasional re-read after modification.

I never understood the use of BLOBs in an RDB when coupled with an object
database. Perhaps one solution is to have a container-type object in the
ZODB that contains both a dumb-slow-DOM and a bunch of
smart-fielded-objects, and a merge with the dumb-slow-DOM could be done from
the custom objects as needed... In my case, though, I would want people
other than myself to be able to extend the API as needed, so I still like
the simplicity of being able to support new behaviors in TTW Python scripts
that could be written by a slightly technical person with enough experience
playing with DOM to get what they want.

Sean

-----Original Message-----
From: Wade Leftwich [mailto:wade@lightlink.com]
Sent: Monday, October 22, 2001 12:11 PM
To: Martijn Pieters; sean.upton@uniontrib.com
Cc: zope-xml@zope.org
Subject: Re: storing XML can be good... was Re: [Zope-xml] ANNOUNCE:
PT_XPath


I am in a situation similar to Sean's, sending and receiving a bunch of
newsfeeds in NITF format. (Hi Sean, I think we met up on the NITF list a
while back.)

Because I'm adding on to (and hoping to replace) an ASP system, I've got all
my data in MSSQL. My tables have the usual columns that Sean alluded to:
headline, byline, date, etc. I also keep the original NITF XML in a bigtext
field. I run a cron job to rewrite the XML if any of the other fields gets
edited.
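A minimal sketch of that rewrite step, assuming the edited column values arrive as a Python dict and that each column maps to one element of the stored document. The mapping COLUMN_PATHS and the function merge_columns are hypothetical, the MSSQL and cron parts are left out, and ElementTree stands in for whatever XML library the job actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from relational columns to elements in the stored XML.
COLUMN_PATHS = {
    "headline": ".//hl1",
    "byline": ".//byline",
}


def merge_columns(stored_xml, row):
    """Write edited column values back into the stored XML text.

    Only the mapped elements are touched; the rest of the original
    document (including any value-added markup) is carried over.
    """
    root = ET.fromstring(stored_xml)
    for column, path in COLUMN_PATHS.items():
        if column in row:
            element = root.find(path)
            if element is not None:
                element.text = row[column]
    return ET.tostring(root, encoding="unicode")


stored = ("<nitf><body><body.head><hedline><hl1>Old</hl1></hedline>"
          "<byline>Staff</byline></body.head>"
          "<body.content><p class='lead'>Text</p></body.content></body></nitf>")
print(merge_columns(stored, {"headline": "Corrected headline"}))
```

Only columns present in the row are written, so an unedited byline keeps whatever the original feed supplied.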

Right now I don't think I would want to give up the RDB storage, even if I
didn't need it for historical reasons. While each article is a tree, the
organization at higher levels is definitely tabular. And many people
understand RDBs.

I would love to be able to access the tree structure of each article within
Zope, without first parsing XML to a DOM. Having done a fair amount of work
with DOM, I hear what Martijn is saying about it being fat and
slow. What about an object representation, like Martijn suggested, that
could live in a BLOB in an RDB?

Wade Leftwich
Ithaca, NY



RE: storing XML can be good... was Re: ANNOUNCE: PT_XPath
> -----Original Message-----
> From: Martijn Pieters [mailto:mj@zope.com]
> Sent: Tuesday, 23 October 2001 4:27 AM
> To: sean.upton@uniontrib.com
> Cc: zope-xml@zope.org
> Subject: Re: storing XML can be good... was Re: [Zope-xml] ANNOUNCE:
> PT_XPath
>
>
> To me, this is still no reason to justify XML storage. Note that Zope
> doesn't force you to use a fixed number of fields; the object tree and
> Python provide you with much more flexibility. I am convinced that it is
> possible to build a Zope app with the same fidelity as the NITF, with
> lossless conversion between the internal storage format and NITF.
>
> Using Zope objects instead of XML would buy you speed and scalability. DOMs
> are memory hogs, and CPU-intensive to build and manipulate. Only use a DOM
> when a custom API isn't feasible. Don't carry around all that extra weight
> if you can avoid it!

The actual physical storage is kind of irrelevant. Whether it's stored as
objects or as XML text fragments makes no difference (other than efficiency,
of course). Two things are important:

- It can be represented as a W3C DOM so it can be manipulated using a
standard API

- There is an easy, flexible mechanism for adjusting its storage policy, e.g. I
can say that all Blah elements need to be indexed, or all foo subtrees can
be chunked as one object since they will always be accessed together.

ParsedXML isn't this, since it only has one storage: one large text fragment.
XMLDocument also had one policy: one object per element. Custom Zope
objects, each with DOM interfaces, can be flexible; however, it would be very
time-consuming to change from one policy to another. You would have to
define new sets of objects and then write conversion scripts that delete the
old objects and create new ones.

So what does that leave? A hypothetical product. Let's call it FlexiXML. I
can import a lot of XML and it will use a default policy; let's say that is
to store a compact parsed representation and generate the DOM or XML on the
fly. It would also provide a Zope-like folder, document and properties API.
Then later I want to optimize this store for both speed and storage
efficiency. I specify a couple of XPath queries to select elements that
should be chunked into one ZODB object so as to improve storage
efficiency (e.g. one whole HR employee record). This then changes the
underlying structure of the database. I specify another XPath that will
nominate the HR employee key to be indexed. Some of my XPath queries would
then become much faster.
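One way the chunk-and-index policy might look, sketched in Python under several assumptions: FlexiXML is hypothetical to begin with, the "XPath" here is ElementTree's limited path subset rather than full XPath, each chunk is kept as a serialized string standing in for a separate ZODB object, and the names StoragePolicy and ChunkedStore are invented for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class StoragePolicy:
    """Hypothetical FlexiXML-style policy using ElementTree path syntax."""
    chunk_path: str  # selects subtrees stored together as one object
    key_path: str    # path, relative to a chunk, of the value to index


@dataclass
class ChunkedStore:
    chunks: list = field(default_factory=list)  # stand-in for ZODB objects
    index: dict = field(default_factory=dict)   # key -> chunk position

    @classmethod
    def load(cls, xml_text, policy):
        store = cls()
        root = ET.fromstring(xml_text)
        for element in root.findall(policy.chunk_path):
            # Each selected subtree becomes one stored unit.
            store.chunks.append(ET.tostring(element, encoding="unicode"))
            key = element.findtext(policy.key_path)
            if key is not None:
                store.index[key] = len(store.chunks) - 1
        return store

    def lookup(self, key):
        """Indexed lookup: parses one chunk instead of scanning the document."""
        position = self.index.get(key)
        if position is None:
            return None
        return ET.fromstring(self.chunks[position])


HR = """<hr>
  <employee><id>E1</id><name>Ann</name></employee>
  <employee><id>E2</id><name>Bob</name></employee>
</hr>"""

store = ChunkedStore.load(HR, StoragePolicy(chunk_path="employee", key_path="id"))
print(store.lookup("E2").findtext("name"))  # Bob
```

Changing chunk_path or key_path changes how the document is split and indexed without touching code that goes through lookup, which is the kind of later optimization described above.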

On top of this you can then specify a mapping between element types and
Zope interfaces. Then you can use the component architecture to specify
additional views, UI, adapters, etc. for different element types. This gives
you the XMLWidgets functionality++.
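The element-type-to-view mapping could look something like this. It is only a sketch: a plain dict-based registry stands in for the component architecture, and view_for, VIEWS and render are hypothetical names, not the zope.component API.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical registry standing in for the component architecture:
# element tag -> view function. This is not the zope.component API.
VIEWS = {}


def view_for(tag):
    """Decorator that registers a view for one element type."""
    def register(func):
        VIEWS[tag] = func
        return func
    return register


@view_for("hl1")
def headline_view(element):
    return "<h1>%s</h1>" % html.escape(element.text or "")


@view_for("p")
def paragraph_view(element):
    return "<p>%s</p>" % html.escape(element.text or "")


def render(element):
    """Use the view registered for this element type if there is one;
    otherwise render the children in document order."""
    view = VIEWS.get(element.tag)
    if view is not None:
        return view(element)
    return "".join(render(child) for child in element)


story = ET.fromstring(
    "<story><hl1>Title</hl1><body><p>One</p><p>Two</p></body></story>")
print(render(story))  # <h1>Title</h1><p>One</p><p>Two</p>
```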

What's important about all of this? A developer can concentrate on a data
structure for the data first and then optimize it easily later. IMHO that
was the killer functionality of RDBMSs, and that is something Zope is weak on.
RE: storing XML can be good... was Re: ANNOUNCE: PT_XPath
I'll have to take some time to digest this fully, but my initial reactions:

- Isn't ParsedXML's single storage NOT 'one large text fragment' but a DOM;
the only thing is that the DOM is one Zope object, and must be completely
rewritten on every write? 'One large text fragment' - i.e. what you see on
edit - is just a rendering of the DOM?

- The quickest route to a solution now will likely be the best. Zope's XML
support is strong for several application use cases now. I have used it for
several applications using ParsedXML, and am relatively happy. That said, I
know that there are improvements that need to be made, especially with big
documents. The obvious solution that a lot of people are agreeing with here
is chunking using path expressions as boundaries. I might suggest (perhaps
naively), as I did yesterday, that the easiest route to doing this might be
to solidify ParsedXML as it is, get it to pass the unit tests, etc., and
build a proxy container object around it. The container would
contain 1..n ParsedXML objects, and a bunch of properties
containing chunking borders expressed as XPath statements. The container
folder object (let's call it 'BigXML') would act as a proxy with traffic
director responsibilities, to read and write from the correct underlying
DocumentFragments stored as ParsedXML...
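A rough sketch of the proxy idea, assuming chunk borders are simple slash-separated paths rather than full XPath expressions, and that each fragment is an ElementTree element standing in for a separate ParsedXML object. BigXML, add_fragment, read and write are hypothetical names.

```python
import xml.etree.ElementTree as ET


class BigXML:
    """Hypothetical proxy container. Each chunk border is a simple
    slash-separated path (not full XPath), and each fragment is an
    ElementTree element standing in for a separate ParsedXML object."""

    def __init__(self):
        self.fragments = {}  # border path -> root element of that fragment

    def add_fragment(self, border, xml_text):
        self.fragments[border] = ET.fromstring(xml_text)

    def _locate(self, path):
        """Route to the fragment whose border is the longest prefix of
        `path`, then find the element inside that fragment."""
        matches = [b for b in self.fragments
                   if path == b or path.startswith(b + "/")]
        if not matches:
            raise KeyError(path)
        border = max(matches, key=len)
        root = self.fragments[border]
        rest = path[len(border):].lstrip("/")
        element = root if not rest else root.find(rest)
        if element is None:
            raise KeyError(path)
        return border, element

    def read(self, path):
        return self._locate(path)[1].text

    def write(self, path, value):
        border, element = self._locate(path)
        element.text = value  # only this one fragment is modified
        return border


big = BigXML()
big.add_fragment("/nitf/head", "<head><title>T</title></head>")
big.add_fragment("/nitf/body", "<body><p>Old</p></body>")
changed = big.write("/nitf/body/p", "New")
print(changed, big.read("/nitf/body/p"))  # /nitf/body New
```

Routing on the longest matching border keeps each write confined to one fragment, which is the point of the traffic-director role.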

Sean

-----Original Message-----
From: Jay, Dylan [mailto:djay@avaya.com]
Sent: Monday, October 22, 2001 5:33 PM
To: 'Martijn Pieters'; sean.upton@uniontrib.com
Cc: zope-xml@zope.org
Subject: RE: storing XML can be good... was Re: [Zope-xml] ANNOUNCE:
PT_XPath


Re: storing XML can be good... was Re: ANNOUNCE: PT_XPath
On Tue, Oct 23, 2001 at 09:54:06AM -0700, sean.upton@uniontrib.com wrote:
> - Isn't ParsedXML's single storage NOT 'one large text fragment' but a DOM;
> the only thing is that the DOM is one Zope object, and must be completely
> rewritten on every write? 'One large text fragment' - i.e. what you see on
> edit - is just a rendering of the DOM?

Indeed, ParsedXML only holds a DOM, the original text string is never
stored, and it'll serialize the DOM into a string only if requested to do
so.
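To illustrate the point, here is a hypothetical stand-in (not ParsedXML's code; DOMHolder is an invented name) that keeps only the parsed DOM and produces text on demand with xml.dom.minidom.

```python
from xml.dom import minidom


class DOMHolder:
    """Hypothetical stand-in: only the parsed DOM is kept; the source
    text is discarded once parsing is done."""

    def __init__(self, xml_text):
        self.dom = minidom.parseString(xml_text)  # the text itself is not stored

    def serialize(self):
        # Text is regenerated from the DOM only when someone asks for it.
        return self.dom.documentElement.toxml()


holder = DOMHolder("<doc>  <a x='1'/></doc>")
print(holder.serialize())  # <doc>  <a x="1"/></doc>
```

For this input the single-quoted attribute comes back double-quoted, so what you see is a rendering of the stored DOM rather than the original text.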

> - The quickest route to a solution now will likely be the best. Zope's XML
> support is strong for several application use cases now. I have used it for
> several applications using ParsedXML, and am relatively happy. That said, I
> know that there are improvements that need to be made, especially with big
> documents. The obvious solution that a lot of people are agreeing with here
> is chunking using path expressions as boundaries. I might suggest (perhaps
> naively), as I did yesterday, that the easiest route to doing this might be
> to solidify ParsedXML as it is, get it to pass the unit tests, etc., and
> build a proxy container object around it. The container would
> contain 1..n ParsedXML objects, and a bunch of properties
> containing chunking borders expressed as XPath statements. The container
> folder object (let's call it 'BigXML') would act as a proxy with traffic
> director responsibilities, to read and write from the correct underlying
> DocumentFragments stored as ParsedXML...

It'll be just as easy to modify nodes down the tree by swapping them with a
class that includes Persistent in its bases. Probably easier, in fact. This
will have the effect of it being stored in its own transaction. We may even
be able to modify the classes on the fly and have them persist as separate
records that way.
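The node-swapping idea can be sketched without ZODB installed. Here Persistent is a stand-in marker class, not the real persistent.Persistent, and Node, PersistentNode, make_persistent and record_roots are hypothetical names. The sketch only shows which nodes would each start their own database record once their class gains Persistent in its bases, so that a change inside one of them rewrites that record rather than the whole document.

```python
class Persistent:
    """Stand-in marker for persistent.Persistent so the sketch runs without
    ZODB. In ZODB, each Persistent instance is saved as its own database
    record; plain sub-objects are pickled into their parent's record."""


class Node:
    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)


class PersistentNode(Persistent, Node):
    """The same node, but with Persistent in its bases."""


def make_persistent(node, tags):
    """Swap the class of every node whose tag is in `tags`, in place."""
    if node.tag in tags:
        node.__class__ = PersistentNode
    for child in node.children:
        make_persistent(child, tags)


def record_roots(node, is_root=True):
    """List the nodes that would each start a separate record:
    the root plus every persistent descendant."""
    roots = [node] if is_root or isinstance(node, Persistent) else []
    for child in node.children:
        roots.extend(record_roots(child, is_root=False))
    return roots


tree = Node("nitf", [Node("head", [Node("title")]),
                     Node("body", [Node("p"), Node("p")])])
make_persistent(tree, {"head", "body"})
print([n.tag for n in record_roots(tree)])  # ['nitf', 'head', 'body']
```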

BigXML would bring the implementation too much to the foreground, with lots
of added overhead and too much UI.

--
Martijn Pieters
| Software Engineer mailto:mj@zope.com
| Zope Corporation http://www.zope.com/
| Creators of Zope http://www.zope.org/
---------------------------------------------
RE: storing XML can be good... was Re: ANNOUNCE: PT_XPath
I think that makes much more sense. I told you my idea was naive ;)

Sean

-----Original Message-----
From: Martijn Pieters [mailto:mj@zope.com]
Sent: Tuesday, October 23, 2001 10:56 AM
To: sean.upton@uniontrib.com
Cc: djay@avaya.com; zope-xml@zope.org
Subject: Re: storing XML can be good... was Re: [Zope-xml] ANNOUNCE:
PT_XPath



_______________________________________________
Zope-xml mailing list
Zope-xml@zope.org
http://lists.zope.org/mailman/listinfo/zope-xml
Re: storing XML can be good... was Re: ANNOUNCE: PT_XPath
"Jay, Dylan" <djay@avaya.com> writes:

> The actual physical storage is kind of irrelevant. Whether it's stored as
> objects or as XML text fragments makes no difference (other than efficiency,
> of course).

Well, so long as you don't mind losing any non-XML information if and
when you change from rich objects to pure XML-oriented objects. For
example, right now you have to give up using any Zope attributes, such
as permissions, within a ParsedXML tree. You have to apply that kind
of attribute externally.

> Two things are important:
>
> - It can be represented as a W3C DOM so it can be manipulated using a
> standard API
>
> - There is an easy, flexible mechanism for adjusting its storage policy, e.g. I
> can say that all Blah elements need to be indexed, or all foo subtrees can
> be chunked as one object since they will always be accessed together.
>
> ParsedXML isn't this, since it only has one storage: one large text fragment.
> XMLDocument also had one policy: one object per element. Custom Zope
> objects, each with DOM interfaces, can be flexible; however, it would be very
> time-consuming to change from one policy to another. You would have to
> define new sets of objects and then write conversion scripts that delete the
> old objects and create new ones.

I would like to be able to choose my storage according to my needs -
single Zope DOM objects with ParsedXML subtrees, for example.

I don't see it as being especially hard for the developer - all of
these objects need to be XML-centric, so, for example, initializing
one DOM subtree from an existing one and replacing the original
doesn't sound like a big deal. You'll have to be prepared to lose any
non-XML properties if you move from a Zope-rich to XML-bare
environment, or assign properties if you go in the other direction,
but that's unavoidable.
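A minimal sketch of that migration step with xml.dom.minidom, assuming the "other storage" can be represented by a separate document: the subtree is serialized, rebuilt there, imported back with importNode, and swapped in with replaceChild. migrate_subtree and deep_import are hypothetical names; only XML content survives the trip, which is consistent with the point that non-XML properties are lost.

```python
from xml.dom import minidom


def migrate_subtree(doc, old, build_copy):
    """Initialize a new subtree from `old` and replace `old` with it.

    `build_copy` stands for whatever constructs the node under the new
    storage policy; only the XML content of `old` carries over.
    """
    new = build_copy(doc, old)
    old.parentNode.replaceChild(new, old)
    return new


def deep_import(doc, node):
    # Rebuild the subtree in a separate document (standing in for another
    # storage backend), then import the result back into `doc`.
    other = minidom.parseString(node.toxml())
    return doc.importNode(other.documentElement, True)


doc = minidom.parseString("<story><body><p>One</p><p>Two</p></body></story>")
old_body = doc.getElementsByTagName("body")[0]
before = doc.toxml()
new_body = migrate_subtree(doc, old_body, deep_import)
print(new_body is not old_body, doc.toxml() == before)  # True True
```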

I never thought that it was a requirement that the change be efficient
time- or space-wise, however. I see it as an architectural change.

--
Karl Anderson kra@monkey.org http://www.monkey.org/~kra/
RE: storing XML can be good... was Re: ANNOUNCE: PT_XPath
> -----Original Message-----
> From: Karl Anderson [mailto:kra@monkey.org]
> Sent: Monday, 29 October 2001 12:41 PM
> To: Jay, Dylan
> Cc: 'Martijn Pieters'; sean.upton@uniontrib.com; zope-xml@zope.org
> Subject: Re: storing XML can be good... was Re: [Zope-xml] ANNOUNCE:
> PT_XPath
>
>
> "Jay, Dylan" <djay@avaya.com> writes:
>
> > The actual physical storage is kind of irrelevant. Whether it's stored as
> > objects or as XML text fragments makes no difference (other than
> > efficiency, of course).
>
> Well, so long as you don't mind losing any non-XML information if and
> when you change from rich objects to pure XML-oriented objects. For
> example, right now you have to give up using any Zope attributes, such
> as permissions, within a ParsedXML tree. You have to apply that kind
> of attribute externally.

I was just talking about transforming some stored XML with one storage policy
to another storage policy. I wasn't talking about transforming other kinds
of Zope content to XML. That would be an interesting use case, however.