Mailing List Archive

Advice on Blob Storage?
This may be more of a zodb / relstorage question - I hope it's ok to ask
on the Zope list.

I'm seeing behavior using relstorage and blobs that I didn't expect:
If I upload a large file, say 2 gigs, I notice that our SQL
database also grows by 2 gigs, along with the blob storage.
After a pack, the space is reclaimed on the SQL side, and everyone is
happy.
FWIW - it's videos that are doing this.

I am pretty sure it's the undo log that's growing, based on the fact
that a pack reclaims the space.
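A minimal toy model of the suspected mechanism, assuming a history-preserving storage that keeps every committed revision until a pack (this is not RelStorage's actual schema or pack algorithm). Storing the full upload and then replacing it with a short placeholder leaves both revisions in the database, and only the pack drops the superseded one:

```python
from collections import defaultdict


class ToyHistoryStorage:
    """Keeps every committed revision of every object, like an undo log."""

    def __init__(self):
        self.revisions = defaultdict(list)  # oid -> [(tid, data), ...], oldest first
        self.next_tid = 1

    def commit(self, changes):
        """Record one transaction: a new revision for each oid in `changes`."""
        tid = self.next_tid
        self.next_tid += 1
        for oid, data in changes.items():
            self.revisions[oid].append((tid, data))
        return tid

    def size(self):
        """Total bytes held across all revisions, current or not."""
        return sum(len(data) for revs in self.revisions.values() for _, data in revs)

    def pack(self, pack_tid):
        """Drop revisions superseded as of pack_tid, for every object at once."""
        packed = {}
        for oid, revs in self.revisions.items():
            # Index of the revision that was current as of pack_tid
            keep_from = 0
            for i, (tid, _) in enumerate(revs):
                if tid <= pack_tid:
                    keep_from = i
            packed[oid] = revs[keep_from:]
        self.revisions = defaultdict(list, packed)


storage = ToyHistoryStorage()
storage.commit({"video": b"\0" * 2000})                              # the real upload
last = storage.commit({"video": b"This is not where your data is"})  # placeholder
print(storage.size())  # 2030: both revisions are still stored
storage.pack(last)
print(storage.size())  # 30: only the current revision survives
```

In this model pack() works on the whole storage rather than on a single field, which matches how ZODB exposes packing (a storage- or database-wide operation).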

Can this behavior be turned off for a specific field or content type?
So undo logs are preserved for everything BUT this monster of a content
type?

Seems strange to do this tho.

Are there other alternatives, like calling .pack() directly on the
field's storage after it's set?

Our problem is that our SQL database grows to a huge size between our
weekly packs, and backups of the SQL dumps are becoming unmanageable.
Our blob backups are ready to deal with this kind of size, but not the
SQL backups.

----------
Going deeper down the rabbit hole (although I don't think it's
relevant): I hacked and replaced the storage class for the field.
Instead of using AnnotationStorage - which I found is the default for
ImageField - I intercept the data during storage.set(), ship it out to a
separate storage facility, and replace the data with a happy message,
"This is not where your data is", which is then written to the blobs.
It works just great - keeping our blob storage growth from going
crazy. If you try to 'download' the file from Plone, you'll get the
text file with the happy message.
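The interception described above can be sketched roughly as follows. This is not the code in storage.py: the class, the ship_offsite callable, and the annotations stand-in are hypothetical, and the set()/get() signatures are only loosely modeled on the shape of Archetypes field storages.

```python
PLACEHOLDER = b"This is not where your data is"


def _annotations(instance):
    """Hypothetical stand-in for per-object annotation storage."""
    return instance.__dict__.setdefault("_annotations", {})


class OffsiteStorage:
    """Field storage that ships the value elsewhere and keeps a placeholder."""

    def __init__(self, ship_offsite):
        # ship_offsite(name, data) -> reference to the remote copy (hypothetical)
        self.ship_offsite = ship_offsite

    def set(self, name, instance, value, **kwargs):
        remote_ref = self.ship_offsite(name, value)
        # Only the placeholder bytes and the remote reference stay local
        _annotations(instance)[name] = {"data": PLACEHOLDER, "remote": remote_ref}

    def get(self, name, instance, **kwargs):
        return _annotations(instance)[name]["data"]

    def remote_ref(self, name, instance):
        return _annotations(instance)[name]["remote"]
```

With a fake uploader, set() hands the full bytes to the uploader while get() only ever returns the placeholder, which is the "download gives you the happy message" behavior described above.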

Now that I've been shown that the Blob Storage is functioning just fine,
but the SQL storage size is going off the charts, I hope I'm not back at
square one.

The goal is to allow users to think they are uploading 4 GB videos into
Plone, when under the covers, we're actually shipping the video files
off to some fancy off-site storage (Akamai), so we don't have to store
them and back them up on-site, and our blob directories remain
manageable in size.

The storage hack can be seen here:
https://github.com/RadioFreeAsia/rfa.kaltura/blob/master/rfa/kaltura/storage/storage.py


I'm not proud of it, but it works.


--
Mike McFadden
Radio Free Asia
Technical Operations Division
2025 M Street NW
Washington DC 20036 USA

This e-mail message is intended only for the use of the addressee and may contain information that is privileged and confidential. Any unauthorized dissemination, distribution or copying is strictly prohibited. If you receive this transmission in error, please contact network@rfa.org.


_______________________________________________
Zope maillist - Zope@zope.org
https://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
https://mail.zope.org/mailman/listinfo/zope-announce
https://mail.zope.org/mailman/listinfo/zope-dev )
Re: Advice on Blob Storage?
On Mon, 21 Sep 12:24:45 PM Michael McFadden wrote:

Mike,

First of all, kudos on your candor and being willing to share your "hack"
(storage.py).

I've been out of the Zope loop for a while but just thought I'd pony up a
response since your posting was interesting to me, regardless of how out of touch
w/ reality my response might be. And being out of the loop, I don't have to
worry any more about looking dumb!

My 1st thought is, why don't you create a content type and store it in the
ZODB at the time the video is uploaded? The type would include the video
metadata (vanilla RSS, Dublin Core, etc) and a link to the off-site content.
Much more helpful than a "not here" message, yeah?
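A sketch of the kind of record Tom describes, using a plain dataclass rather than an actual Plone content type; the field names and the example.com URL are illustrative assumptions. The point is that the metadata and the link live in the database while the bytes stay off-site:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class VideoRecord:
    """Metadata-only record; the video itself lives off-site."""
    title: str
    creator: str
    created: date
    mime_type: str
    remote_url: str  # link to the off-site copy
    subjects: List[str] = field(default_factory=list)

    def dublin_core(self):
        """A few Dublin Core-style elements as a plain dict."""
        return {
            "dc:title": self.title,
            "dc:creator": self.creator,
            "dc:date": self.created.isoformat(),
            "dc:format": self.mime_type,
            "dc:identifier": self.remote_url,
            "dc:subject": list(self.subjects),
        }

    def download_hint(self):
        """Something more useful than a 'not here' message."""
        return "This video is hosted at %s" % self.remote_url
```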

Secondly, I'm wondering why you're using SQL. Is it to interface with legacy
system(s)? But that's probably just my purist streak talking. :-)

IIRC, there are hooks in Zope like "manage_before_save()", "...after_save",
etc. This would be ideal, as you could strip the blob from the request before
doing an insert. Yeah?
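The hook names above are recalled from memory ("IIRC") and are not verified here, so the following is a framework-neutral sketch of the idea rather than Zope's API: a persistence layer runs registered before-save hooks, and one hook moves the large payload to a separate store before the record is written. Saver, make_payload_stripper, and the record layout are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class Saver:
    """Toy persistence layer that runs hooks on a copy before writing it."""

    def __init__(self):
        self.before_save: List[Callable[[Record], None]] = []
        self.written: List[Record] = []

    def save(self, record: Record) -> None:
        record = dict(record)  # hooks mutate a copy, not the caller's dict
        for hook in self.before_save:
            hook(record)
        self.written.append(record)


def make_payload_stripper(offsite_store: Dict[str, object]):
    """Return a hook that moves record['payload'] into offsite_store."""
    def strip(record: Record) -> None:
        data = record.pop("payload", None)
        if data is not None:
            key = "obj-%d" % len(offsite_store)
            offsite_store[key] = data
            record["payload_ref"] = key  # only a small reference is written
    return strip
```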

Anyway, sorry I can't be more help w/ the specifics of your installation.

Best,
-Tom

Re: Advice on Blob Storage?
Hi Michael

Without knowing anything about your setup let me chuck a few stones
into the bushes ..

> I'm seeing behavior using relstorage and blobs that I didn't expect:
> If I upload a large file, say 2 gigs, I am noticing that our SQL database
> also grows by 2 Gigs, along with the blob storage.

If relstorage is growing for blob uploads, I would think something is
wrongly configured.

> Can this behavior be turned off for a specific field or content type? So
> undo logs are preserved for everything BUT this monster of a content type?
>
> Seems strange to do this tho.

Yes, that seems like a plaster on top of a broken bone.

> Going deeper down the rabbit hole, although I don't think it's relevant, is
> the fact that I hacked and replaced the storage class for the field.
> Instead of using AnnotationStorage

This sounds dangerous to me ..

> The goal is to allow users to think they are uploading 4Gb videos into
> Plone, when under the covers, we're actually shipping the video files off to
> some fancy off-site storage. (Akamai)

Configure caching so that the client/CDN/Varnish/nginx layers keep all
the big files they should.
Use collective.xsendfile to make file requests go directly to the
front-end server (but note: "Blob handling in ZODB is very effective
already (async sockets, just like Apache or nginx would do). [...]
This add-on only removes the need to proxy the file data over socket
connection").
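To make the "go directly to the front-end server" idea concrete: with nginx, an application can answer with an empty body and an X-Accel-Redirect header naming an internal location, and nginx then serves the file itself (Apache's mod_xsendfile uses an X-Sendfile header in a similar way). The WSGI sketch below shows only that header handoff; it is not collective.xsendfile's implementation, and the path mapping and /blobs-internal/ prefix are assumptions that would have to match an internal location in the nginx config.

```python
def make_app(blob_files, internal_prefix="/blobs-internal/"):
    """Build a WSGI app that hands file bodies off to nginx.

    blob_files maps a request path to a file name under the directory that
    nginx exposes at internal_prefix (both are hypothetical).
    """
    def app(environ, start_response):
        name = blob_files.get(environ.get("PATH_INFO", ""))
        if name is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            # nginx intercepts this header and streams the file itself
            ("X-Accel-Redirect", internal_prefix + name),
        ])
        return [b""]  # the application never reads or proxies the file bytes
    return app
```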

See
http://www.slideshare.net/jensens/2014-ploneconfbristolspeedupplone
for a lot of good tips.

--
jean . .. .... //\\\oo///\\
Re: Advice on Blob Storage?
On 09/24/2015 02:52 AM, Tom Russell wrote:
>
> Mike,
>
> First of all, kudos on your candor and being willing to share your "hack"
> (storage.py).
Thanks. I'm humbly getting things done without a lot of Python
knowledge and no Zope experience. I wouldn't recommend this approach to
anyone, it's just some cowboy hacking.
>
> My 1st thought is, why don't you create a content type and store it in the
> ZODB at the time the video is uploaded? The type would include the video
> metadata (vanilla RSS, Dublin Core, etc) and a link to the off-site content.
> Much more helpful than a "not here" message, yeah?
Yep, the content type is created - it's just the file field that has the
odd storage. All other fields are normal.
>
> Secondly, I'm wondering why you're using SQL. Is it to interface with legacy
> system(s)? But that's probably just my purist streak talking. :-)
RelStorage, for load balancing and replication; that's about it. I
inherited the setup (yeah, I know, bad excuse). But rethinking the
problem as a Data.fs solution probably isn't going to help anyway.

>
> IIRC, there are hooks in Zope like "manage_before_save()", "...after_save",
> etc. This would be ideal, as you could strip the blob from the request before
> doing an insert. Yeah?
I'm going to look into this stuff.
I stumbled on some docs that hinted you could call 'pack()' directly on
a single piece of storage, but I've yet to find them again. This might
be a solution.
>
> Anyway, sorry I can't be more help w/ the specifics of your installation.
No worries. Happy to get a reply.



--
Mike McFadden

Re: Advice on Blob Storage?
On 09/24/2015 08:47 AM, Jean Jordaan wrote:
>
> If relstorage is growing for blob uploads, I would think something is
> wrongly configured.
I'm really thinking the same thing myself, but I wouldn't know the first
place to look to configure this.

This installation was a Plone 2 Archetypes-based build that was upgraded
to Plone 4, and that's when the RelStorage changeover was done. I don't
know if that gives any hints.

We have so much Archetypes content with specialized code that we've
stuck with Archetypes. I have a feeling that Archetypes is tightly
coupled with FileStorage somehow.

>> Can this behavior be turned off for a specific field or content type? So
>> undo logs are preserved for everything BUT this monster of a content type?
>>
>> Seems strange to do this tho.
> Yes, that seems like a plaster on top of a broken bone.
I hope we don't go down that path.

>
>> Going deeper down the rabbit hole, although I don't think it's relevant, is
>> the fact that I hacked and replaced the storage class for the field.
>> Instead of using AnnotationStorage
> This sounds dangerous to me ..
It's actually working perfectly, and was the original intent. When I
did some quick maths to show how much blob storage would grow based on
how much video content we create, it became cost-prohibitive to store
the videos in blob storage. The subclassed AnnotationStorage works.
However, I'll be looking into collective.xsendfile to see if I can make
things a bit better.
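The "quick maths" amounts to multiplying upload rate by average file size over a period. A small helper with purely illustrative numbers (20 uploads a week at 2 GB each; these are not RFA's actual figures) also shows the worst-case SQL growth between weekly packs in the original setup, when each upload stayed in the database until the next pack:

```python
def projected_growth_gb(uploads_per_week, avg_size_gb, weeks):
    """Storage added by uploads over `weeks`, ignoring deletes and compression."""
    return uploads_per_week * avg_size_gb * weeks


# Illustrative inputs only
yearly_blob_growth = projected_growth_gb(20, 2, 52)    # 2080 GB in a year
growth_between_packs = projected_growth_gb(20, 2, 1)   # 40 GB per weekly cycle
print(yearly_blob_growth, growth_between_packs)        # 2080 40
```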

In a nutshell, blob storage is happy: the data are stored elsewhere.
The upshot is that you cannot fetch the file from Plone, and that's
just fine for now. If you do fetch it through Plone's download, you
get about 80 bytes that say "your file is not here".

The fact that relstorage grows with the upload (and shrinks back with
the pack) is what's troubling.

My spelunking (which I enjoy) has gone deep enough to confirm that the
RelStorage growth happens with the transaction's tpc_finish() call, and
I haven't gone deeper yet. What's strange is that the data of the File
field has been replaced by then (in the dangerous manner mentioned
above) and I'm not sure where it's finding it.
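ZODB commits go through the storage's two-phase commit methods (tpc_begin, store, tpc_vote, tpc_finish), and the state pickled and handed to store() is whatever the object holds at commit time, including any file data that is part of that state. A toy storage following that call order shows why the growth is observed around tpc_finish() in a model like this. The signatures are simplified (the real store() takes more arguments), and this is not RelStorage's code:

```python
import pickle


class ToyTwoPhaseStorage:
    """Stages writes during a transaction and publishes them at tpc_finish()."""

    def __init__(self):
        self.committed = {}   # oid -> pickled state visible to readers
        self._pending = {}
        self._txn = None

    def _check(self, txn):
        if txn is not self._txn:
            raise RuntimeError("not the transaction in progress")

    def tpc_begin(self, txn):
        self._txn = txn
        self._pending = {}

    def store(self, oid, state, txn):
        self._check(txn)
        self._pending[oid] = pickle.dumps(state)  # the bytes that will be written

    def tpc_vote(self, txn):
        self._check(txn)  # a real storage would detect conflicts here

    def tpc_finish(self, txn):
        self._check(txn)
        self.committed.update(self._pending)  # growth becomes visible here
        self._pending = {}
        self._txn = None

    def committed_bytes(self):
        return sum(len(p) for p in self.committed.values())
```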

All the tomfoolery can be seen on GitHub, where I replace the field
storage for the file field.

Thanks for the reply.

--
Mike McFadden

Re: Advice on Blob Storage?
On 09/29/2015 07:39 PM, Michael McFadden wrote:
> On 09/24/2015 08:47 AM, Jean Jordaan wrote:
>>
>> If relstorage is growing for blob uploads, I would think something is
>> wrongly configured.
> I'm really thinking the same thing myself, but I wouldn't know the
> first place to look to configure this.
>
Solved.

Yep. I found that I had made a change where my content type stopped
implementing and inheriting from ATBlob and went back to implementing
IFileContent.

"There's your problem"

Must have been a great idea at the time.
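Why ATBlob matters here: a blob-backed field keeps the file bytes in the blob directory and leaves only a small reference in the object's record, whereas a non-blob file field stores the bytes in ordinary ZODB records, which RelStorage writes to its SQL tables. A stdlib-only toy comparing the two record sizes (InlineFile and BlobRef are hypothetical stand-ins, not Archetypes or ZODB classes):

```python
import pickle


class InlineFile:
    """Hypothetical: keeps the file bytes in the object's own state."""
    def __init__(self, data):
        self.data = data


class BlobRef:
    """Hypothetical: keeps only a pointer to a file outside the database."""
    def __init__(self, path, size):
        self.path = path
        self.size = size


def record_size(obj):
    # Roughly what goes into an object's record: its pickled state (__dict__)
    return len(pickle.dumps(vars(obj)))


payload = b"\0" * 1_000_000  # 1 MB stand-in for an uploaded video
print(record_size(InlineFile(payload)) > len(payload))                     # True
print(record_size(BlobRef("blobstorage/0x2a.blob", len(payload))) < 1000)  # True
```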

I took the time to learn how schemaextender works, and the content
type is now based on ATBlob again.

I'm still doing the storage Tom Foolery, but working with blobs instead
of filedata now. This makes much more sense.

An added benefit is that I no longer take the file data and write it
out as a temp file: plone.app.blob.utils.openBlob() does that work for
me now, in a smarter fashion.

I have a slight worry that if I close the file object that openBlob
gave me and then immediately call consumeFile() on the ZODB blob,
garbage collection may not have had time to destroy the weakref in
ZODB.blob, and I'll get a 'file opened' exception.

I'm not savvy enough with garbage collection and weakrefs in Python to
really be sure about this.
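On the weakref worry: in CPython, an object is destroyed as soon as its last strong reference goes away (unless it is part of a reference cycle), and any weakref callback runs at that moment, without waiting for the cyclic garbage collector. Explicit bookkeeping in close() avoids depending on that timing at all. The toy below models both paths; ToyBlob and ToyFile are hypothetical and are not ZODB.blob's code, so how a given ZODB version actually tracks open files is worth confirming in its source:

```python
import weakref


class ToyFile:
    def __init__(self, blob):
        self.blob = blob

    def close(self):
        self.blob.closed(self)  # tell the blob right away


class ToyBlob:
    """Tracks open readers with weak references."""

    def __init__(self):
        self.readers = []

    def open(self):
        f = ToyFile(self)

        def forget(ref, readers=self.readers):
            # Weakref callback: runs when the file object is destroyed
            try:
                readers.remove(ref)
            except ValueError:
                pass  # already removed by closed()

        self.readers.append(weakref.ref(f, forget))
        return f

    def closed(self, f):
        # Explicit bookkeeping on close(); no dependence on GC timing
        for ref in list(self.readers):
            if ref() is f:
                self.readers.remove(ref)

    def consume_file(self):
        if self.readers:
            raise RuntimeError("file still open")
        return "consumed"
```

Closing the file first lets consume_file() succeed immediately; under CPython, simply dropping the last reference without closing also clears the entry, through the callback.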

Thanks guys.

--
Mike McFadden
