Mailing List Archive

Backup program that compresses data but only changes new files.
Howdy,

With my new fiber internet, my poor disks are getting a work out, and
also filling up.  First casualty, my backup disk.  I have one directory
that is . . . well . . . huge.  It's about 7TBs or so.  This is where it
is right now and it's still trying to pack in files. 


/dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb


Right now, I'm using rsync, which doesn't compress files but does just
update things that have changed.  I'd like to find some way, maybe
software I script myself or a tool I'm simply unaware of, to compress
data but otherwise work a lot like rsync.  I looked in app-backup and
there are a lot of options, but I'm not sure which fits best for what I
want to do.  Again: back up a directory, compress it, and only update
with changed or new files.  Generally, it only adds files, but sometimes
a file gets replaced as well.  Same name but different size. 

I was trying to go through the list in app-backup one by one but, to be
honest, most of the included links only go to github or something and
usually don't tell you anything about how the software works.  Basically,
as far as seeing if it does what I want, it's useless.  It sort of
reminds me of quite a few USE flag descriptions. 

I plan to buy another hard drive pretty soon.  Next month is possible. 
If there is nothing available that does what I want, is there a way to
use rsync and have it set to backup files starting with "a" through "k"
to one spot and then backup "l" through "z" to another?  I could then
split the files into two parts.  I use a script to do this now, if one
could call my little things scripts, so even a complicated command could
work, just may need help figuring out the command.
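
Something along these lines is what I'm picturing, if rsync filters even
work that way (paths are made up and this is untested, so correct me):

  # top-level entries starting with a-k go to the first drive
  rsync -av --include='/[a-kA-K]*' --exclude='/*' /home/dale/data/ /mnt/backup1/data/
  # top-level entries starting with l-z go to the second drive
  rsync -av --include='/[l-zL-Z]*' --exclude='/*' /home/dale/data/ /mnt/backup2/data/

I guess anything starting with a digit or punctuation would need its own
rule too.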

Thoughts?  Ideas? 

Dale

:-)  :-)
Re: Backup program that compresses data but only changes new files.
On Sun, Aug 14, 2022 at 3:44 PM Dale <rdalek1967@gmail.com> wrote:
>
<SNIP>
>
> Thoughts? Ideas?
>
> Dale
>
> :-) :-)

Do you happen to have an old computer laying around? If so check
out TrueNAS Core.

You'd need one small OS drive and 2 backup drives - your current
7TB and one more to build the recommended RAID. It compresses and
saves older revs of files with snapshots where possible. It supports
NFS mounts for media/etc, chroot jails and lots of other stuff.

The default version has been FreeBSD based so I had some
learning to do but I think there's now a Linux version.

It appears that you may have your backup drive inside your
computer; a NAS moves backups to a separate machine which
you can locate remotely, so you're probably safer in terms of
a fire or some other catastrophic event.

If this appeals and you have the hardware, you can build
a box and mess with it; it certainly covers at least the
minimal set of things you are asking for.

HTH,
Mark
Re: Backup program that compresses data but only changes new files.
On Sun, Aug 14, 2022 at 6:44 PM Dale <rdalek1967@gmail.com> wrote:
>
> Right now, I'm using rsync which doesn't compress files but does just
> update things that have changed. I'd like to find some way, software
> but maybe there is already a tool I'm unaware of, to compress data and
> work a lot like rsync otherwise.

So, how important is it that it work exactly like rsync?

I use duplicity, in part because I've been using it forever. Restic
seems to be a similar program most are using these days which I
haven't looked at super-closely but I'd look at that first if starting
out.

Duplicity uses librsync, so it backs up exactly the same data as rsync
would, except instead of replicating entire files, it creates streams
of data, a bit like tar does. So if you back up a million
small files you might get out 1-3 big files. It can compress and
encrypt the data as you wish. The downside is that you don't end up
with something that looks like your original files - you have to run
the restore process to extract them all back out. It is extremely
space-efficient though - if 1 byte changes in the middle of a 10GB
file you'll end up just backing up maybe a kilobyte or so (whatever
the block size is), which is just like rsync.
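
In case it helps to see it, day-to-day usage is just a couple of
commands - a rough sketch with made-up paths (encryption is optional,
skip it with --no-encryption if you don't want it):

  # first run is a full backup, later runs are automatically incremental
  duplicity --no-encryption /data file:///mnt/backup/data
  # pull everything back out
  duplicity restore --no-encryption file:///mnt/backup/data /tmp/restored-data
  # drop backup chains older than six months
  duplicity remove-older-than 6M --force file:///mnt/backup/data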

Typically you rely on metadata to find files that change which is
fast, but I'm guessing you can tell these programs to do a deep scan
which of course requires reading the entire contents, and that will
discover anything that was modified without changing ctime/mtime.

The output files can be split to any size, and the index info (the
metadata) is separate from the raw data. If you're storing to
offline/remote/cloud/whatever storage typically you keep the metadata
cached locally to speed retrieval and to figure out what files have
changed for incrementals. However, if the local cache isn't there
then it will fetch just the indexes from wherever it is stored
(they're small).

It has support for many cloud services - I store mine to AWS S3.

There are also some options that are a little closer to rsync like
rsnapshot and burp. Those don't store compressed (unless there is an
option for that or something), but they do let you rotate through
multiple backups and they'll set up hard links/etc so that they are
de-duplicated. Of course hard links are at the file level so if 1
byte inside a file changes you'll end up with two full copies. It
will still only transfer a single block so the bandwidth requirements
are similar to rsync.
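
(If you wanted to roll that by hand, the same hard-link trick is just
rsync --link-dest - a rough, untested sketch with invented paths:

  today=$(date +%F)
  # unchanged files become hard links to the previous run, changed files are copied in full
  rsync -av --delete --link-dest=/mnt/backup/latest /data/ /mnt/backup/$today/
  ln -sfn /mnt/backup/$today /mnt/backup/latest

On the very first run the "latest" link won't exist yet; rsync just
complains and copies everything.)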

--
Rich
Re: Backup program that compresses data but only changes new files.
On Sun, Aug 14, 2022 at 6:44 PM Dale <rdalek1967@gmail.com> wrote:
>
> I plan to buy another hard drive pretty soon. Next month is possible.
> If there is nothing available that does what I want, is there a way to
> use rsync and have it set to backup files starting with "a" through "k"
> to one spot and then backup "l" through "z" to another? I could then
> split the files into two parts.

Oh, I didn't comment on this part, so sorry for the double reply.

If you need backups that span multiple disks your options are very
limited unfortunately. Most linux backup software might output
multiple files but it dumps them all in one place and much of it
assumes that all the files are in one place for restoration. Here are
the options I've found:

1. You can use lvm/zfs/btrfs/whatever to combine multiple disks to
make them look like one disk (a quick lvm sketch follows at the end of
this list). This is a workaround, and obviously you're limited to
however many disks you can physically mount at one time.

2. You can use bacula, which does support changing media, since it was
designed for tape, but unlike tar it can output to a directory.
However, this is not very well-supported and it can be a pain. This
is what I'm doing for large-scale backups. I basically treat a hard
drive like a giant tape. It is fussy to set up and use, and bacula
itself is VERY fussy to use. Oh, and make sure you REALLY understand
it and do some restoration tests because otherwise you could paint
yourself into a corner. I always back up my database, and I have the
bacula software itself running in a container and after every backup I
just create a tarball of the whole container and stick that on the
backup disk (it isn't big, and that solves the bootstrapping problem).
Don't ever use bacula to back up itself - it is terrible for that.

3. Obviously if you have a scratch disk big enough to hold everything
temporarily that also works. You can do your backup, then copy it off
to other drives however you want.
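
Here's the lvm sketch I mentioned under option 1 - the device names are
placeholders, so triple-check them before running anything destructive:

  pvcreate /dev/sdX /dev/sdY
  vgcreate backupvg /dev/sdX /dev/sdY
  # default (linear) allocation: fills the first disk, then spills onto the next
  lvcreate -l 100%FREE -n backup backupvg
  mkfs.ext4 /dev/backupvg/backup
  mount /dev/backupvg/backup /mnt/backup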

--
Rich
Re: Backup program that compresses data but only changes new files.
Mark Knecht wrote:
>
>
> On Sun, Aug 14, 2022 at 3:44 PM Dale <rdalek1967@gmail.com
> <mailto:rdalek1967@gmail.com>> wrote:
> >
> <SNIP>
> >
> > Thoughts?  Ideas?
> >
> > Dale
> >
> > :-)  :-)
>
> Do you happen to have an old computer laying around? If so check 
> out TrueNAS Core.
>
> You'd need one small OS drive and 2 backup drives - your current 
> 7TB and one more to build the recommended RAID. It compresses, 
> saves older revs of files if possible with snapshots. Is supports
> NFS mounts for media/etc, chroot jails and lots of other stuff.
>
> The default version has been FreeBSD based so I had some 
> learning to do but I think there's now a Linux version.
>
> It appears that possibly you have your backup drive in your
> computer so it moves backups to a separate machine which
> you can locate remotely so you're probably safer in terms of
> a fire or some other catastrophic event.
>
> If this appeals and you have the hardware you can build 
> a box and mess with it but it certainly does the minimal 
> number of things you are asking for.
>
> HTH,
> Mark


That may be an option later.  I'm actually considering building a NAS but
right now, costs are preventing that.  I almost have enough that I could
build another computer.  I have a mobo, memory, CPU and such.  I think I
only need a power supply and maybe a video card.  Could use a case for
it too, but I could mount it on a wall somewhere.  Good air flow.  lol 

Right now, my backups are external hard drives.  I have a 3TB, a 6TB and
an 8TB that sadly is SMR.  They are encrypted and after I do my backup
updates, they go in a fire safe.  I tend to do updates once a week,
usually while I'm doing OS updates. 

At the moment, I'm hoping to find some method that compresses to pack in
more data.  Given this faster fiber internet, I see a lot of data coming
my way. 

By the way, using Surfshark as VPN.  I finally figured out how to use
openvpn.  Then I figured out how to use it as a service.  Working pretty
well.  Still working on router end and figuring out how to get email to
work.  Most VPNs block email.  I'll get around to it one of these days. 

Dale

:-)  :-) 

P. S.  Funny story.  I was doing my updates yesterday.  Usually, I do
them in a chroot and then emerge the binaries on main system.  It had a
pretty good bit to download and usually it takes at least an hour,
sometimes 3 or more, to download.  I started it and went to kitchen.  I
came back and noticed the network was idle.  I thought the emerge had
failed and stopped.  I looked, it was compiling away.  Then I checked
the emerge-fetch.log to see if there was some error there.  I was
confused for a minute.  Then it hit me, fast internet.  In the couple
minutes I was in the kitchen, it downloaded everything and was done. 
ROFL  Not only was it done, it had been done for a while.  By the time I
figured it out, it was already off the chart in gkrellm. 

This new fiber thing is going to take some getting used to.  ;-) 
Re: Backup program that compresses data but only changes new files.
Hello,

On 8/14/22 18:44, Dale wrote:
> Thoughts?  Ideas?

You might be interested in borgbackup [1]
It takes delta backups and has de-duplication and compression to save
some space. It supports encryption too.
It's packaged in ::gentoo and you run it on whatever machine you want to
backup and give it its destination, it can be local or on a remote machine.

I've been using it for a while and it works well. I have it configured
in a crontab and it backs up my files every night.

[1] https://www.borgbackup.org/
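
If you want to see how little there is to it, a typical session looks
roughly like this (repository path and retention numbers are just
examples):

  borg init --encryption=repokey /mnt/backup/borg-repo
  # each run only stores chunks the repo hasn't seen before
  borg create --stats --compression zstd /mnt/backup/borg-repo::data-{now} /data
  # keep 7 daily and 4 weekly archives, drop the rest
  borg prune --keep-daily 7 --keep-weekly 4 /mnt/backup/borg-repo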

--
Julien
Re: Backup program that compresses data but only changes new files.
On 15/08/2022 00:20, Dale wrote:
> Right now, my backups are external hard drives.  I have a 3TB, a 6TB and
> a 8TB that sadly is SMR.  They are encrypted and after I do my backup
> updates, they go in a fire safe.  I tend to do updates once a week,
> usually while I'm doing OS updates.

That NAS idea sounds good. And while it would be a hassle, you could
raid-10 those three drives together. Not quite sure how that would play
out with SMR ... Or you could btrfs them.

You don't mention any snapshot mechanism like LVM or btrfs - imho that's
quite important because if your data *changes* you get multiple full
backups for the price of incremental. Although I get the impression you
mostly *add* to your data instead ... :-)
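
(For reference, a read-only snapshot on btrfs is a one-liner, e.g.
something like this, with invented paths:

  btrfs subvolume snapshot -r /data /data/.snapshots/2022-08-15

assuming /data is a btrfs subvolume in the first place.)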

And are there any filesystems that compress in-place? But the other
question is what sort of data do you have? HOW COMPRESSIBLE IS IT?
Compressing jpegs for example is a bad idea - it usually makes them bigger!

Cheers,
Wol
Re: Backup program that compresses data but only changes new files.
On 15/8/22 06:44, Dale wrote:
> Howdy,
>
> With my new fiber internet, my poor disks are getting a work out, and
> also filling up.  First casualty, my backup disk.  I have one directory
> that is . . . well . . . huge.  It's about 7TBs or so.  This is where it
> is right now and it's still trying to pack in files.
>
>
> /dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb
>
>
> Right now, I'm using rsync which doesn't compress files but does just
> update things that have changed.  I'd like to find some way, software
> but maybe there is already a tool I'm unaware of, to compress data and
> work a lot like rsync otherwise.  I looked in app-backup and there is a
> lot of options but not sure which fits best for what I want to do.
> Again, backup a directory, compress and only update with changed or new
> files.  Generally, it only adds files but sometimes a file gets replaced
> as well.  Same name but different size.
>
> I was trying to go through the list in app-backup one by one but to be
> honest, most links included only go to github or something and usually
> doesn't tell anything about how it works or anything.  Basically, as far
> as seeing if it does what I want, it's useless. It sort of reminds me of
> quite a few USE flag descriptions.
>
> I plan to buy another hard drive pretty soon.  Next month is possible.
> If there is nothing available that does what I want, is there a way to
> use rsync and have it set to backup files starting with "a" through "k"
> to one spot and then backup "l" through "z" to another?  I could then
> split the files into two parts.  I use a script to do this now, if one
> could call my little things scripts, so even a complicated command could
> work, just may need help figuring out the command.
>
> Thoughts?  Ideas?
>
> Dale
>
> :-)  :-)
>
The questions you need to ask are how compressible the data is and how
much duplication is in there.  Rsync's biggest disadvantage is it
doesn't keep history, so if you need to restore something from last week
you are SOL.  Honestly, rsync is not a backup program and should only be
used the way you do for data you don't value, as an rsync archive is a
disaster waiting to happen from a backup point of view.

Look into dirvish - it uses hard links to keep files current but safe, and
is easy to restore (it looks like an exact copy, so you just cp the files
back if needed).  The downside is it hammers the hard disk and has no
compression, so its only deduplication is via history (my backups
stabilised at about 2x original size for ~2yrs of history) - though you
can use something like btrfs, which has filesystem-level compression.

My current program is borgbackup, which is very sophisticated in how it
stores data - it's probably your best bet in fact.  I am storing
literally tens of TB of raw data on a 4TB USB3 disk, going back years -
and yes, I do restore regularly, not just for disasters but also for
space-efficient long-term storage I access only rarely.

e.g.:

A single host:

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:                3.07 TB              1.96 TB            151.80 GB

                       Unique chunks         Total chunks
Chunk index:                 1026085             22285913


Then there is my offline storage - it backs up ~15 hosts (in repos like
the above) plus data storage like 22 years of email etc.  Each host backs up
to its own repo, then the offline storage backs that up.  The
deduplicated size is the actual on-disk size ... compression varies, as
it's whatever I used at the time the backup was taken ... currently I
have it set to "auto,zstd,11" but it can be mixed in the same repo (a
repo is a single backup set - you can nest repos, which is what I do - so
~45TB stored on a 4TB offline disk).  One advantage of a system like
this is that chunked data rarely changes, so only the differences
are backed up (read the borgbackup docs - interesting).

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:               28.69 TB             28.69 TB              3.81 TB

                       Unique chunks         Total chunks
Chunk index:
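
(Those stats are just the output of "borg info", by the way, and the
compression setting is given per backup run, roughly like this, with the
repo path obviously being whatever yours is:

  borg create --compression auto,zstd,11 /mnt/offline/repo::host-{now} /data
  borg info /mnt/offline/repo
)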
Re: Backup program that compresses data but only changes new files.
On Monday, August 15, 2022 12:44:11 AM CEST Dale wrote:
> Howdy,
>
> With my new fiber internet, my poor disks are getting a work out, and
> also filling up. First casualty, my backup disk. I have one directory
> that is . . . well . . . huge. It's about 7TBs or so. This is where it
> is right now and it's still trying to pack in files.
>
> /dev/mapper/8tb 7.3T 7.1T 201G 98% /mnt/8tb

<snipped>

> Thoughts? Ideas?

Plenty, see below:

For backups to external disks, I would recommend having a look at "dar" :
$ eix -e dar
* app-backup/dar
     Available versions:  2.7.6^t ~2.7.7^t {argon2 curl dar32 dar64 doc gcrypt gpg lz4 lzo nls rsync threads xattr}
     Homepage:            http://dar.linux.free.fr/
     Description:         A full featured backup tool, aimed for disks

It's been around for a while and the developer is active and responds quite
well to questions.
It supports compression (different compression methods), incremental backups
(only need a catalogue of the previous backup for the incremental) and
encryption.
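
A rough idea of what that looks like in practice (slice size, paths and
compression choice are just examples, check the man page):

  # full backup of /data, compressed, cut into 1 TiB slices
  dar -c /mnt/backup/data-full -R /data -z -s 1T
  # later: differential backup, using only the catalogue of the full one as reference
  dar -c /mnt/backup/data-diff1 -R /data -z -s 1T -A /mnt/backup/data-full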

The NAS options others mentioned would also work as they can compress data on
disk and you'd only notice a delay in writing/reading (depending on the
compression method used). I would recommend using one that uses ZFS on-disk as
it's more reliable and robust than BTRFS.

One option that becomes available for you now that you are no longer limited to
slow ADSL: Cloud backups.

I use Backblaze (B2) to store compressed backups that haven't been stored on
tape to off-site locations.

But, you can also encrypt the backups locally and store the
encrypted+compressed backup files on other cloud storage.

--
Joost
Re: Backup program that compresses data but only changes new files.
On Sun, 14 Aug 2022 19:03:25 -0400,
Rich Freeman wrote:
>
> On Sun, Aug 14, 2022 at 6:44 PM Dale <rdalek1967@gmail.com> wrote:
> >
> > Right now, I'm using rsync which doesn't compress files but does just
> > update things that have changed. I'd like to find some way, software
> > but maybe there is already a tool I'm unaware of, to compress data and
> > work a lot like rsync otherwise.
>
> So, how important is it that it work exactly like rsync?
>
> I use duplicity, in part because I've been using it forever. Restic
> seems to be a similar program most are using these days which I
> haven't looked at super-closely but I'd look at that first if starting
> out.
>
> Duplicity uses librsync, so it backs up exactly the same data as rsync
> would, except instead of replicating entire files, it creates streams
> of data more like something like tar. So if you back up a million
> small files you might get out 1-3 big files. It can compress and
> encrypt the data as you wish. The downside is that you don't end up
> with something that looks like your original files - you have to run
> the restore process to extract them all back out. It is extremely
> space-efficient though - if 1 byte changes in the middle of a 10GB
> file you'll end up just backing up maybe a kilobyte or so (whatever
> the block size is), which is just like rsync.
>
> Typically you rely on metadata to find files that change which is
> fast, but I'm guessing you can tell these programs to do a deep scan
> which of course requires reading the entire contents, and that will
> discover anything that was modified without changing ctime/mtime.
>
> The output files can be split to any size, and the index info (the
> metadata) is separate from the raw data. If you're storing to
> offline/remote/cloud/whatever storage typically you keep the metadata
> cached locally to speed retrieval and to figure out what files have
> changed for incrementals. However, if the local cache isn't there
> then it will fetch just the indexes from wherever it is stored
> (they're small).
>
> It has support for many cloud services - I store mine to AWS S3.
>
> There are also some options that are a little closer to rsync like
> rsnapshot and burp. Those don't store compressed (unless there is an
> option for that or something), but they do let you rotate through
> multiple backups and they'll set up hard links/etc so that they are
> de-duplicated. Of course hard links are at the file level so if 1
> byte inside a file changes you'll end up with two full copies. It
> will still only transfer a single block so the bandwidth requirements
> are similar to rsync.

I have been using restic for a while, and although it does not do
compression, there are a couple of nice things it does -- if a file is
in more than one location, or if you rename the file, it's smart enough
not to back up any data at all, just the metadata.  Also, you never
have to delete the whole backup and start over like you have to do
with duplicity; you can just delete backups older than a certain
number of days and you are good to go.  It's written in Go, so building
can be a pain and I don't like programs which download gobs of stuff
from the internet to build, but it seems to work quite well.
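
For what it's worth, my routine boils down to something like this
(repository path invented):

  restic -r /mnt/backup/restic-repo init
  # later runs only store chunks that are new or changed
  restic -r /mnt/backup/restic-repo backup /data
  # thin out old snapshots without starting over
  restic -r /mnt/backup/restic-repo forget --keep-daily 7 --keep-weekly 4 --prune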

--
Your life is like a penny. You're going to lose it. The question is:
How do
you spend it?

John Covici wb2una
covici@ccs.covici.com
Re: Backup program that compresses data but only changes new files.
Rich Freeman wrote:
> On Sun, Aug 14, 2022 at 6:44 PM Dale <rdalek1967@gmail.com> wrote:
>> Right now, I'm using rsync which doesn't compress files but does just
>> update things that have changed. I'd like to find some way, software
>> but maybe there is already a tool I'm unaware of, to compress data and
>> work a lot like rsync otherwise.
> So, how important is it that it work exactly like rsync?
>
> I use duplicity, in part because I've been using it forever. Restic
> seems to be a similar program most are using these days which I
> haven't looked at super-closely but I'd look at that first if starting
> out.
>
> Duplicity uses librsync, so it backs up exactly the same data as rsync
> would, except instead of replicating entire files, it creates streams
> of data more like something like tar. So if you back up a million
> small files you might get out 1-3 big files. It can compress and
> encrypt the data as you wish. The downside is that you don't end up
> with something that looks like your original files - you have to run
> the restore process to extract them all back out. It is extremely
> space-efficient though - if 1 byte changes in the middle of a 10GB
> file you'll end up just backing up maybe a kilobyte or so (whatever
> the block size is), which is just like rsync.
>
> Typically you rely on metadata to find files that change which is
> fast, but I'm guessing you can tell these programs to do a deep scan
> which of course requires reading the entire contents, and that will
> discover anything that was modified without changing ctime/mtime.
>
> The output files can be split to any size, and the index info (the
> metadata) is separate from the raw data. If you're storing to
> offline/remote/cloud/whatever storage typically you keep the metadata
> cached locally to speed retrieval and to figure out what files have
> changed for incrementals. However, if the local cache isn't there
> then it will fetch just the indexes from wherever it is stored
> (they're small).
>
> It has support for many cloud services - I store mine to AWS S3.
>
> There are also some options that are a little closer to rsync like
> rsnapshot and burp. Those don't store compressed (unless there is an
> option for that or something), but they do let you rotate through
> multiple backups and they'll set up hard links/etc so that they are
> de-duplicated. Of course hard links are at the file level so if 1
> byte inside a file changes you'll end up with two full copies. It
> will still only transfer a single block so the bandwidth requirements
> are similar to rsync.
>


Duplicity sounds interesting except that I already have the drive
encrypted.  Keep in mind, these are external drives that I hook up long
enough to complete the backups then back in a fire safe they go.  The
reason I mentioned being like rsync, I don't want to rebuild a backup
from scratch each time as that would be time consuming.  I thought of
using Kbackup ages ago and it rebuilds from scratch each time but it
does have the option of compressing.  That might work for small stuff
but not many TBs of it.  Back in the early 90's, I remember using backup
software that was incremental.  It would only update files that
changed, it could span several floppy disks, and it compressed things as
well.  Something like that nowadays is likely rare if it exists at all
since floppies are long dead.  I either need to split my backup into two
pieces or compress my data.  That is why I mentioned whether there is a
way to back up the first part of the alphabet in one command, switch
disks, and then do the second part of the alphabet to another disk. 

Mostly, I just want to add compression to what I do now.  I figure there
is a tool for it but no idea what it is called.  Another method is
splitting into two parts.  In the long run, either should work, and I may
end up needing both at some point.  :/   If I could add both now, it
would save me some problems later on, I guess.

I might add, I also thought about using a Raspberry Pi thingy and having
sort of a small scale NAS thing.  I'm not sure about that thing either
tho.  Plus, they're pricey right now.  $$$

Dale

:-)  :-)
Re: Backup program that compresses data but only changes new files.
Julien Roy wrote:
> Hello,
>
> On 8/14/22 18:44, Dale wrote:
>> Thoughts?  Ideas?
>
> You might be interested in borgbackup [1]
> It takes delta backups and has de-duplication and compression to save
> some space. It supports encryption too.
> It's packaged in ::gentoo and you run it on whatever machine you want
> to backup and give it its destination, it can be local or on a remote
> machine.
>
> I've been using it for a while and it works well. I have it configured
> on a crontab and it backups my files every night
>
> [1] https://www.borgbackup.org/
>


Since my drives are external, I do my backups manually.  Well, I start
it when the drives are connected and ready.  I think borgbackup was one
I looked into, and it sounded more like an online backup where you store
the data on a server somewhere.  I may be wrong on that tho.  I looked
at several and it got confusing after a bit.  Plus, some were still as
clear as mud.  Why do people link to a place that doesn't tell what
their software does or how it works, anyway?  It seems most think github
and such are good places to link to when it really doesn't tell you
anything unless you want to help develop the software or something.  It
would be like Ford linking to CAD models to sell cars.  :/ 

To all:  I found a good deal on a 10TB drive.  That should suffice for
now.  I might add, it will give me time to figure out a path forward and
I can make other use of that SMR drive.  One thing I thought of as a
negative for a NAS: I can't lock it in my safe, unless it is really
tiny.  As it is, even if a fire comes along, I've still got backups.  With
a NAS, I could lose everything, puter, backups and all.  Given I back up
around 12 to 13TBs of data, it could also get pricey uploading somewhere. 

I just hope this 10TB drive isn't SMR.  I googled around and the best
I could find says anything above 8TB is CMR.  It's a WD101EDBZ-11B1DA0.  I
hope that is right.  I'm not totally opposed to SMR even as a backup but
I'd rather not.  The deal I found was for a pull and costs about $110
including shipping.  I looked at a 14TB but my jaw dropped.  $$$$$$$$

I need to look into the LVM snapshot thing some more.  I keep forgetting
that option and I use LVM a LOT here.  Maybe I will find something
between now and filling up that 10TB drive.  ROFL 

Dale

:-)  :-) 
Re: Backup program that compresses data but only changes new files.
William Kenworthy wrote:
>
> On 15/8/22 06:44, Dale wrote:
>> Howdy,
>>
>> With my new fiber internet, my poor disks are getting a work out, and
>> also filling up.  First casualty, my backup disk.  I have one directory
>> that is . . . well . . . huge.  It's about 7TBs or so.  This is where it
>> is right now and it's still trying to pack in files.
>>
>>
>> /dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb
>>
>>
>> Right now, I'm using rsync which doesn't compress files but does just
>> update things that have changed.  I'd like to find some way, software
>> but maybe there is already a tool I'm unaware of, to compress data and
>> work a lot like rsync otherwise.  I looked in app-backup and there is a
>> lot of options but not sure which fits best for what I want to do.
>> Again, backup a directory, compress and only update with changed or new
>> files.  Generally, it only adds files but sometimes a file gets replaced
>> as well.  Same name but different size.
>>
>> I was trying to go through the list in app-backup one by one but to be
>> honest, most links included only go to github or something and usually
>> doesn't tell anything about how it works or anything.  Basically, as far
>> as seeing if it does what I want, it's useless. It sort of reminds me of
>> quite a few USE flag descriptions.
>>
>> I plan to buy another hard drive pretty soon.  Next month is possible.
>> If there is nothing available that does what I want, is there a way to
>> use rsync and have it set to backup files starting with "a" through "k"
>> to one spot and then backup "l" through "z" to another?  I could then
>> split the files into two parts.  I use a script to do this now, if one
>> could call my little things scripts, so even a complicated command could
>> work, just may need help figuring out the command.
>>
>> Thoughts?  Ideas?
>>
>> Dale
>>
>> :-)  :-)
>>
> The questions you need to ask is how compressible is the data and how
> much duplication is in there.  Rsync's biggest disadvantage is it
> doesn't keep history, so if you need to restore something from last
> week you are SOL.  Honestly, rsync is not a backup program and should
> only be used the way you do for data that don't value as an rsync
> archive is a disaster waiting to happen from a backup point of view.
>
> Look into dirvish - uses hard links to keep files current but safe, is
> easy to restore (looks like a exact copy so you cp the files back if
> needed.  Downside is it hammers the hard disk and has no compression
> so its only deduplication via history (my backups stabilised about 2x
> original size for ~2yrs of history - though you can use something like
> btrfs which has filesystem level compression.
>
> My current program is borgbackup which is very sophisticated in how it
> stores data - its probably your best bet in fact.  I am storing
> literally tens of Tb of raw data on a 4Tb usb3 disk (going back years
> and yes, I do restore regularly, and not just for disasters but for
> space efficient long term storage I access only rarely.
>
> e.g.:
>
> A single host:
>
> ------------------------------------------------------------------------------
>
>                        Original size      Compressed size Deduplicated
> size
> All archives:                3.07 TB              1.96 TB           
> 151.80 GB
>
>                        Unique chunks         Total chunks
> Chunk index:                 1026085             22285913
>
>
> Then there is my offline storage - it backs up ~15 hosts (in repos
> like the above) + data storage like 22 years of email etc. Each host
> backs up to its own repo then the offline storage backs that up.  The
> deduplicated size is the actual on disk size ... compression varies as
> its whatever I used at the time the backup was taken ... currently I
> have it set to "auto,zstd,11" but it can be mixed in the same repo (a
> repo is a single backup set - you can nest repos which is what I do -
> so ~45Tb stored on a 4Tb offline disk).  One advantage of a system
> like this is chunked data rarely changes, so its only the differences
> that are backed up (read the borgbackup docs - interesting)
>
> ------------------------------------------------------------------------------
>
>                        Original size      Compressed size Deduplicated
> size
> All archives:               28.69 TB             28.69 TB             
> 3.81 TB
>
>                        Unique chunks         Total chunks
> Chunk index:
>
>
>
>


For the particular drive in question, it is 99.99% videos.  I don't want
to lose any quality but I'm not sure how much they can be compressed to
be honest.  It could be they are already as compressed as they can be
without losing resolution etc.  I've been lucky so far.  I don't think
I've ever done a backup and then needed something back, only to find the
backup had already replaced the good copy with the bad working copy.
Example:  I update a video only to find the newer copy is corrupt and
want the old one back.  I've done it a time or two but I tend to find
that before I do backups.  Still, it is a downside and something
I've thought about before.  I figure when it does happen, it will be
something hard to replace.  Just letting the devil have his day.  :-(

For that reason, I find the version type backups interesting.  It is a
safer method.  You can have a new file but also have an older file as
well just in case the new file takes a bad turn.  It is an interesting
thought.  It's one not only I but really anyone should consider. 

As I posted in another reply, I found a 10TB drive that should be here
by the time I do a fresh set of backups.  This will give me more time to
consider things.  Have I said this before a while back???  :/ 

Dale

:-)  :-) 
Re: Backup program that compresses data but only changes new files.
On Mon, 15 Aug 2022 04:33:44 -0400,
Dale wrote:
>
> William Kenworthy wrote:
> >
> > On 15/8/22 06:44, Dale wrote:
> >> Howdy,
> >>
> >> With my new fiber internet, my poor disks are getting a work out, and
> >> also filling up.  First casualty, my backup disk.  I have one directory
> >> that is . . . well . . . huge.  It's about 7TBs or so.  This is where it
> >> is right now and it's still trying to pack in files.
> >>
> >>
> >> /dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb
> >>
> >>
> >> Right now, I'm using rsync which doesn't compress files but does just
> >> update things that have changed.  I'd like to find some way, software
> >> but maybe there is already a tool I'm unaware of, to compress data and
> >> work a lot like rsync otherwise.  I looked in app-backup and there is a
> >> lot of options but not sure which fits best for what I want to do.
> >> Again, backup a directory, compress and only update with changed or new
> >> files.  Generally, it only adds files but sometimes a file gets replaced
> >> as well.  Same name but different size.
> >>
> >> I was trying to go through the list in app-backup one by one but to be
> >> honest, most links included only go to github or something and usually
> >> doesn't tell anything about how it works or anything.  Basically, as far
> >> as seeing if it does what I want, it's useless. It sort of reminds me of
> >> quite a few USE flag descriptions.
> >>
> >> I plan to buy another hard drive pretty soon.  Next month is possible.
> >> If there is nothing available that does what I want, is there a way to
> >> use rsync and have it set to backup files starting with "a" through "k"
> >> to one spot and then backup "l" through "z" to another?  I could then
> >> split the files into two parts.  I use a script to do this now, if one
> >> could call my little things scripts, so even a complicated command could
> >> work, just may need help figuring out the command.
> >>
> >> Thoughts?  Ideas?
> >>
> >> Dale
> >>
> >> :-)  :-)
> >>
> > The questions you need to ask is how compressible is the data and how
> > much duplication is in there.  Rsync's biggest disadvantage is it
> > doesn't keep history, so if you need to restore something from last
> > week you are SOL.  Honestly, rsync is not a backup program and should
> > only be used the way you do for data that don't value as an rsync
> > archive is a disaster waiting to happen from a backup point of view.
> >
> > Look into dirvish - uses hard links to keep files current but safe, is
> > easy to restore (looks like a exact copy so you cp the files back if
> > needed.  Downside is it hammers the hard disk and has no compression
> > so its only deduplication via history (my backups stabilised about 2x
> > original size for ~2yrs of history - though you can use something like
> > btrfs which has filesystem level compression.
> >
> > My current program is borgbackup which is very sophisticated in how it
> > stores data - its probably your best bet in fact.  I am storing
> > literally tens of Tb of raw data on a 4Tb usb3 disk (going back years
> > and yes, I do restore regularly, and not just for disasters but for
> > space efficient long term storage I access only rarely.
> >
> > e.g.:
> >
> > A single host:
> >
> > ------------------------------------------------------------------------------
> >
> >                        Original size      Compressed size    Deduplicated size
> > All archives:                3.07 TB              1.96 TB            151.80 GB
> >
> >                        Unique chunks         Total chunks
> > Chunk index:                 1026085             22285913
> >
> >
> > Then there is my offline storage - it backs up ~15 hosts (in repos
> > like the above) + data storage like 22 years of email etc. Each host
> > backs up to its own repo then the offline storage backs that up.  The
> > deduplicated size is the actual on disk size ... compression varies as
> > its whatever I used at the time the backup was taken ... currently I
> > have it set to "auto,zstd,11" but it can be mixed in the same repo (a
> > repo is a single backup set - you can nest repos which is what I do -
> > so ~45Tb stored on a 4Tb offline disk).  One advantage of a system
> > like this is chunked data rarely changes, so its only the differences
> > that are backed up (read the borgbackup docs - interesting)
> >
> > ------------------------------------------------------------------------------
> >
> >                        Original size      Compressed size    Deduplicated size
> > All archives:               28.69 TB             28.69 TB              3.81 TB
> >
> >                        Unique chunks         Total chunks
> > Chunk index:
> >
> >
> >
> >
>
>
> For the particular drive in question, it is 99.99% videos.  I don't want
> to lose any quality but I'm not sure how much they can be compressed to
> be honest.  It could be they are already as compressed as they can be
> without losing resolution etc.  I've been lucky so far.  I don't think
> I've ever needed anything and did a backup losing what I lost on working
> copy.  Example.  I update a video only to find the newer copy is corrupt
> and wanting the old one back.  I've done it a time or two but I tend to
> find that before I do backups.  Still, it is a downside and something
> I've thought about before.  I figure when it does happen, it will be
> something hard to replace.  Just letting the devil have his day.  :-(
>
> For that reason, I find the version type backups interesting.  It is a
> safer method.  You can have a new file but also have a older file as
> well just in case new file takes a bad turn.  It is a interesting
> thought.  It's one not only I should consider but anyone really.
>
> As I posted in another reply, I found a 10TB drive that should be here
> by the time I do a fresh set of backups.  This will give me more time to
> consider things.  Have I said this before a while back???  :/
>

zfs would solve your problem of corruption, even without versioning.
You do a scrub at short intervals and at least you would know if the
file is corrupted.  Of course, redundancy is better, such as mirroring.
And backups take a very short time because, when sending from one zfs to
another, it knows exactly what bytes to send.
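
Roughly, with made-up pool/dataset names:

  zpool scrub backuppool                  # verifies checksums of everything on the disk
  zfs snapshot tank/data@2022-08-15
  # send only the blocks that changed since the previous snapshot
  zfs send -i tank/data@2022-08-08 tank/data@2022-08-15 | zfs receive backuppool/data

(the receiving side needs to already have the earlier snapshot for the
incremental send to work)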

--
Your life is like a penny. You're going to lose it. The question is:
How do
you spend it?

John Covici wb2una
covici@ccs.covici.com
Re: Backup program that compresses data but only changes new files.
On Mon, Aug 15, 2022 at 3:05 AM Dale <rdalek1967@gmail.com> wrote:
>
> Rich Freeman wrote:
> >
> > Duplicity uses librsync, so it backs up exactly the same data as rsync
> > would, except instead of replicating entire files, it creates streams
> > of data more like something like tar. So if you back up a million
> > small files you might get out 1-3 big files. It can compress and
> > encrypt the data as you wish.
>
> Duplicity sounds interesting except that I already have the drive
> encrypted.

Then don't encrypt it? Both compression and encryption are optional.

> The reason I mentioned being like rsync, I don't want to rebuild a backup
> from scratch each time as that would be time consuming.

Ah, you just want something that does incremental backups. Duplicity
does, along with most decent solutions.

I see lots of talk of NAS and zfs/btrfs and snapshots. IMO these are
NOT really great solutions for backup. NAS can work of course but it
is overkill for backup storage.

NAS, zfs, btrfs, and snapshots are all great things to use with your
live data. I use several of these myself. Your live data should be
protected against bitrot with snapshots/etc. That has nothing to do
with why you want backups.

We're talking about the storage of backups. While you can store
backups on any of these they don't really add much value.

Also, you mentioned SMR, and I would definitely not combine SMR with
most of those. SMR is perfect for backup. It just isn't perfect for
backup using something like rsync that modifies files in place. You
want something that only appends to backup files or creates new ones,
which is basically how most backup software works except for stuff
that works like rsync.

The main issue I think you're going to have is having support for
multi-volume backups if you need to be able to split a backup across
drives. The only thing I've found on Linux that does this is bacula,
and it is a royal pain that I'm embarrassed to even mention. If
somebody knows of another backup solution that can write the output to
disk (a filesystem, not /dev/rmt) and then pause to mount a new disk
when one fills up, I'm all ears. For everything else I've tended to
see people suggest using lvm/mdadm/whatever combine disks into a
single block device so that the backup software doesn't see multiple
disks.

If you do want to go the route of combining your disks then since
you're using SMR I'd probably pick something like lvm that doesn't do
any striping/etc and just fills up one disk then moves to the next.
Then use a simple filesystem (not btrfs/zfs) that just starts at one
end and keeps adding. A log-based filesystem would probably be ideal
but I'm not sure if any are decent. You do have the issue of what you
do when you start to run out of space, unless you can create multiple
sets of disks so that you can complete a new backup before destroying
the old one.

--
Rich
Re: Backup program that compresses data but only changes new files.
Am Mon, 15 Aug 2022 03:02:19 -0400
schrieb John Covici <covici@ccs.covici.com>:

> I have been using restic for a while, and although it does not do
> compression, there are a couple of nice things it does

Being a happy restic user myself, I'd like to mention that compression has
become available in the meantime
(https://restic.readthedocs.io/en/latest/047_tuning_backup_parameters.html#compression).
However, the feature is rather new and I have not used it so far.


cu
Gerrit
Re: Backup program that compresses data but only changes new files.
Am Mon, 15 Aug 2022 12:50:37 +0200
schrieb Gerrit Kühn <gerrit.kuehn@aei.mpg.de>:

> Being a happy restic user myself, I'd like to mention that compression is
> available meanwhile
> (https://restic.readthedocs.io/en/latest/047_tuning_backup_parameters.html#compression).
> However, the feature is rather new, I did not use it so far.

https://forum.restic.net/t/compression-support-has-landed-in-master/4997

Just adding another link to the official announcement from earlier this
year.


cu
Gerrit
Re: Backup program that compresses data but only changes new files.
On Monday, 15 August 2022 11:58:14 BST Gerrit Kühn wrote:
> Am Mon, 15 Aug 2022 12:50:37 +0200
>
> schrieb Gerrit Kühn <gerrit.kuehn@aei.mpg.de>:
> > Being a happy restic user myself, I'd like to mention that compression is
> > available meanwhile
> > (https://restic.readthedocs.io/en/latest/047_tuning_backup_parameters.html
> > #compression). However, the feature is rather new, I did not use it so
> > far.
>
> https://forum.restic.net/t/compression-support-has-landed-in-master/4997
>
> Just adding another link to the official announcement from earlier this
> year.
>
>
> cu
> Gerrit

I think in Dale's use case compression is a solution seeking to address the
problem of not enough storage space for backups, but it only makes sense if
the data can be effectively and efficiently compressed. He mentioned 99.99%
of his backup data is video. Video files are not particularly compressible,
although small space savings can be achieved. For example, using basic enough
zstd parameters '-19 --rsyncable -z' I got just a 1.6% file reduction:

Frames  Skips  Compressed  Uncompressed  Ratio  Check
     1      0    88.9 MiB      90.3 MiB  1.016  XXH64
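
(If anyone wants to repeat that quick test on their own files, it was
something along these lines - the filenames are just examples:

  zstd -19 --rsyncable -z -o sample.mkv.zst sample.mkv
  zstd -l sample.mkv.zst     # prints the frame/ratio table above
)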

Even if compression delivers some small space saving, given Dale's new faster
Internet link and local video storage tendencies, compression will only kick
the can down the road. If these are not private or rare videos and remain
available on public streaming platforms, perhaps local storage is no longer
necessary?
Re: Backup program that compresses data but only changes new files.
On Sun, 2022-08-14 at 21:42 -0400, Julien Roy wrote:
> Hello,
>
> On 8/14/22 18:44, Dale wrote:
> > Thoughts?  Ideas?
>
> You might be interested in borgbackup [1]
> It takes delta backups and has de-duplication and compression to save
> some space. It supports encryption too.
> It's packaged in ::gentoo and you run it on whatever machine you want to
> backup and give it its destination, it can be local or on a remote machine.

Seconding borg.  Once I switched to this, I never have to touch the
backup process.  I wrote a little handful of shell scripts as wrappers
around borg backup/restore/list and they've been running seamlessly for
years, both locally and off-site.
Re: Backup program that compresses data but only changes new files.
On Sun, Aug 14, 2022 at 4:21 PM Dale <rdalek1967@gmail.com> wrote:
>
> Mark Knecht wrote:
>
>
>
> On Sun, Aug 14, 2022 at 3:44 PM Dale <rdalek1967@gmail.com> wrote:
> >
> <SNIP>
> >
> > Thoughts? Ideas?
> >
> > Dale
> >
> > :-) :-)
>
> Do you happen to have an old computer laying around? If so check
> out TrueNAS Core.
<SNIP>
> That may be a option later. I'm actually considering build a NAS but
right now, costs are preventing that. I almost have enough that I could
build another computer. I have a mobo, memory, CPU and such. I think I
only need a power supply and maybe a video card. Could use a case for it
to but could mount it on a wall somewhere. Good air flow. lol
<SNIP>
> This new fiber thing is going to take some getting used too. ;-)

I experienced much of the same thing (more data) when my connection got
faster.

Expense of a separate system to build a NAS is always an issue and you've
received excellent guidance from other folks here about how to do it
locally so I think you're set.

A couple of things:

1) I didn't see this mentioned, so I will - the NAS, being on the network, is
connected over gigabit Ethernet in my case, so backups are significantly
faster than using USB drives, or at least much faster than my older USB. I
get about 800 Mbit/sec sustained transfers. Once you get the main backup
done the incremental ones are very fast. (Go to the kitchen fast)

2) The NAS, when attached, is mounted over NFS as a directory and I use
rsync to do the transfers so it's all very familiar on the client side. I
think that's important to you today but likely won't be as much of an issue
if you get used to some new backup application.

3) Compression is done on the NAS and is transparent from the client side.
I can browse directories and retrieve individual files. As I think you
mentioned, you won't get much compression - close to zero - for movies, but
for my general data and VMs overall I'm getting about 40%, so there's a big
disk saving. Compute requirements are pretty low. I bought a used MB with a
6th gen Core i5 processor with 4 cores and it hardly has to work to do the
compression.
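
(Under the hood that's just a ZFS dataset property, something like the
following, with an invented dataset name - TrueNAS exposes the same
setting in its UI:

  zfs set compression=lz4 tank/backups
  zfs get compressratio tank/backups     # reports the ratio actually achieved
)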

Good luck with whatever you do.

Mark
RE: Backup program that compresses data but only changes new files.
>>
>
>
>Duplicity sounds interesting except that I already have the drive encrypted. Keep in mind, these are external drives that I hook up long enough to complete the backups then back in a fire safe they go. The reason I mentioned being like rsync, I don't want to rebuild a backup from scratch each time as that would be time consuming. I thought of using Kbackup ages ago and it rebuilds from scratch each time but it does have the option of compressing. That might work for small stuff but not many TBs of it. Back in the early 90's, I remember using a backup software that was incremental. It would only update files that changed and would do it over several floppy disks and compressed it as well. Something like that nowadays is likely rare if it exists at all since floppies are long dead. I either need to split my backup into two pieces or compress my data. That is why I mentioned if there is a way to backup first part of alphabet in one command, switch disks and then do second part of alphabet to another disk.
>
>Mostly, I just want to add compression to what I do now. I figure there is a tool for it but no idea what it is called. Another method is splitting into two parts. In the long run, either should work and may end up needing both at some point. :/ If I could add both now, save me some problems later on. I guess.
>
>I might add, I also thought about using a Raspberry Pi thingy and having sort of a small scale NAS thing. I'm not sure about that thing either tho. Plus, they pricey right now. $$$
>
>Dale
>
>:-) :-)
>

Ok, so you have a few options here. Duplicity and Borg seem to be two of the most popular, and with good reason. They are quite powerful.

Duplicity is popular due to the massive number of storage backends it supports, meaning that the difference between backing up to your on-site disks or shooting it off over the Internet to practically any storage service you care to think of is one parameter. (And I recommend, if nothing else, coordinating with a friend in a different city to do precisely this. Fire safes are good to have, but the contents don't always survive a really big fire.)

Borg is more picky: it only works directly to a local disk or via ssh. But that's because it has a potent, chunk-based storage algorithm similar to what rsync uses to save transfer bandwidth. It's very good at finding duplicate files, or even duplicate pieces of files, and storing them only once. This makes it amazingly good for things like VM images or other large files which accumulate small changes over time, or full OS backups (you'd be amazed how many duplicate files there are across a Linux OS).

Now, if you want to stick with old stuff that you thoroughly understand, that's fine too. For a dirt simple program capable of incremental backups and splitting the archive between disks you're looking for...

wait for it...

tar.

Its ability to detect files which have changed is largely dependent on filesystem timestamps and the archive bit, so you have to make sure your usage pattern respects those. And it doesn't really do deduplication. But it actually has a reasonable set of backup features, including archive splitting. Your backup storage doesn't even need to support random access, and doesn't even need a filesystem. A bunch of my backups are on BD-REs. You just tell tar how big the disk is, pop it in, and hit go. When it's full it asks for another one. There are a few updated versions of tar which add things like indexes for fast seeking and other features which are handy on large data sets.
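
A hedged sketch of that usage, since it's not obvious from the man page at first glance (sizes and paths are invented, and note GNU tar won't compress a multi-volume archive directly, so any compression would have to come from the filesystem underneath):

  tar --create --multi-volume --tape-length=4T \
      --listed-incremental=/root/data.snar \
      --file=/mnt/backup1/data.tar /data

When the first volume fills up, tar stops and asks you to prepare the next one, which is your cue to swap disks. Recent GNU tar accepts size suffixes for --tape-length; older versions want the size in units of 1024 bytes.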

Personally these days I tend to use Borg, because it deduplicates really well, and archives can be thinned out in any order. It's also useful that you can put the backup archive in "append only" mode so that if anyone gets ransomware onto your system it's much more difficult for them to corrupt your backups.

The other thing is data integrity checking on your storage. Yes, disks have built-in ECC, but it's not terribly good. As annoying as it might be to have to hook up more than one disk at a time, BTRFS RAID not only catches complete read failures, but also keeps additional checksums such that it can detect and recover from even single bit flips. And it supports in-line compression. (How well that works obviously depends on how compressible your data is.) You can do similar things with LVM and/or mdraid, but the BTRFS checksums are the most comprehensive I've seen so far.

For optical media there's dvdisaster which can generate Reed-Solomon redundancy data in a variety of ways. (Yes, I know, nobody uses optical any more... But what other storage is easily available that's EMP-proof? Solar flares can be wicked when they happen.)

And there's one more, that I haven't used in years, and I'm not sure how well it would work with Gentoo, but it was still alive as of 2020. mondorescue.org is an interesting concept where it takes your currently running system and all the data on it and turns it into a bootable image, with disk-spanning as necessary. It's designed primarily for CentOS, and I've only ever used it with Debian, but when it works it makes bare-metal restores really simple. Boot your backup drive, swap disks when prompted if necessary, and when it's done, there you are, everything right where you left it.

LMP
Re: Backup program that compresses data but only changes new files.
On 15/08/2022 10:45, John Covici wrote:
> zfs would solve your problem of corruption, even without versioning.
> You do a scrub at short intervals and at least you would know if the
> file is corrupted. Of course, redundancy is better, such as mirroring
> and backups take a very short time because sending from one zfs to
> another it knows exactly what bytes to send.

I don't think he means a corrupted file, he means a corrupted video. If
the drive faithfully records the corrupted feed, the filesystem is not
going to catch it!

Cheers,
Wol
Re: Backup program that compresses data but only changes new files.
On 15/08/2022 11:11, Rich Freeman wrote:
> I see lots of talk of NAS and zfs/btrfs and snapshots. IMO these are
> NOT really great solutions for backup. NAS can work of course but it
> is overkill for backup storage.

Do you want multiple *independent* backups, or do you want *incremental*
backups so you can go back in time? It's nice to have both, but
snapshotting gives you full backups for the price of incremental.

Cheers,
Wol
Re: Backup program that compresses data but only changes new files.
On Monday, August 15, 2022 9:05:24 AM CEST Dale wrote:
> Rich Freeman wrote:
> > On Sun, Aug 14, 2022 at 6:44 PM Dale <rdalek1967@gmail.com> wrote:
> >> Right now, I'm using rsync which doesn't compress files but does just
> >> update things that have changed. I'd like to find some way, software
> >> but maybe there is already a tool I'm unaware of, to compress data and
> >> work a lot like rsync otherwise.
> >
> > So, how important is it that it work exactly like rsync?
> >
> > I use duplicity, in part because I've been using it forever. Restic
> > seems to be a similar program most are using these days which I
> > haven't looked at super-closely but I'd look at that first if starting
> > out.
> >
> > Duplicity uses librsync, so it backs up exactly the same data as rsync
> > would, except instead of replicating entire files, it creates streams
> > of data more like something like tar. So if you back up a million
> > small files you might get out 1-3 big files. It can compress and
> > encrypt the data as you wish. The downside is that you don't end up
> > with something that looks like your original files - you have to run
> > the restore process to extract them all back out. It is extremely
> > space-efficient though - if 1 byte changes in the middle of a 10GB
> > file you'll end up just backing up maybe a kilobyte or so (whatever
> > the block size is), which is just like rsync.
> >
> > Typically you rely on metadata to find files that change which is
> > fast, but I'm guessing you can tell these programs to do a deep scan
> > which of course requires reading the entire contents, and that will
> > discover anything that was modified without changing ctime/mtime.
> >
> > The output files can be split to any size, and the index info (the
> > metadata) is separate from the raw data. If you're storing to
> > offline/remote/cloud/whatever storage typically you keep the metadata
> > cached locally to speed retrieval and to figure out what files have
> > changed for incrementals. However, if the local cache isn't there
> > then it will fetch just the indexes from wherever it is stored
> > (they're small).
> >
> > It has support for many cloud services - I store mine to AWS S3.
> >
> > There are also some options that are a little closer to rsync like
> > rsnapshot and burp. Those don't store compressed (unless there is an
> > option for that or something), but they do let you rotate through
> > multiple backups and they'll set up hard links/etc so that they are
> > de-duplicated. Of course hard links are at the file level so if 1
> > byte inside a file changes you'll end up with two full copies. It
> > will still only transfer a single block so the bandwidth requirements
> > are similar to rsync.
>
> Duplicity sounds interesting except that I already have the drive
> encrypted. Keep in mind, these are external drives that I hook up long
> enough to complete the backups then back in a fire safe they go. The
> reason I mentioned being like rsync, I don't want to rebuild a backup
> from scratch each time as that would be time consuming. I thought of
> using Kbackup ages ago and it rebuilds from scratch each time but it
> does have the option of compressing. That might work for small stuff
> but not many TBs of it. Back in the early 90's, I remember using a
> backup software that was incremental. It would only update files that
> changed and would do it over several floppy disks and compressed it as
> well. Something like that nowadays is likely rare if it exists at all
> since floppies are long dead. I either need to split my backup into two
> pieces or compress my data. That is why I mentioned if there is a way
> to backup first part of alphabet in one command, switch disks and then
> do second part of alphabet to another disk.

Actually, there still is a piece of software that does this:
" app-backup/dar "
You can tell it to split the backups into slices of a specific size.
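
A rough example of what that looks like (basename, paths and sizes are made up; check the man page for the exact options you want):

dar -c /mnt/backup/full_2022-08-15 -R /home/dale -s 2T -z
# -c  basename of the archive (slices become full_2022-08-15.1.dar, .2.dar, ...)
# -R  root of the tree to back up
# -s  maximum size of each slice
# -z  compress (gzip by default; -zbzip2, -zlz4 etc. select other algorithms)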

--
Joost
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Monday, August 15, 2022 12:11:34 PM CEST Rich Freeman wrote:

<snipped>

> The main issue I think you're going to have is having support for
> multi-volume backups if you need to be able to split a backup across
> drives. The only thing I've found on Linux that does this is bacula,
> and it is a royal pain that I'm embarrassed to even mention. If
> somebody knows of another backup solution that can write the output to
> disk (a filesystem, not /dev/rmt) and then pause to mount a new disk
> when one fills up, I'm all ears.

app-backup/dar

For a "brief" guide on how to use it:
http://dar.linux.free.fr/doc/Tutorial.html
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On 15/08/2022 08:52, Dale wrote:
> I just hope this 10TB drive isn't a SMR.  I googled around and the best
> I could find is anything above 8TB is CMR.  It's a WD101EDBZ-11B1DA0.  I
> hope that is right.  I'm not totally opposed to SMR even as a backup but
> I'd rather not.  The deal I found was for a pull and costs about $110
> including shipping.  I looked at a 14TB but my jaw dropped.  $$$$$$$$

Just done a search for you (it helps I know what I'm looking for), but
CMR it is ...

https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/

I tend to go for OEM stuff, and know the names of the range I'm looking
for - Seagate Ironwolf, Toshiba N300 - about £170 for 8TB ...

Cheers,
Wol
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Monday, August 15, 2022 9:52:26 AM CEST Dale wrote:
> Julien Roy wrote:
> > Hello,
> >
> > On 8/14/22 18:44, Dale wrote:
> >> Thoughts? Ideas?
> >
> > You might be interested in borgbackup [1]
> > It takes delta backups and has de-duplication and compression to save
> > some space. It supports encryption too.
> > It's packaged in ::gentoo and you run it on whatever machine you want
> > to backup and give it its destination, it can be local or on a remote
> > machine.
> >
> > I've been using it for a while and it works well. I have it configured
> > on a crontab and it backups my files every night
> >
> > [1] https://www.borgbackup.org/
>
> Since my drives are external, I do my backups manually. Well, I start
> it when the drives are connected and ready. I think borgbackup was one
> I looked into and it sounded more like a online backup where you store
> the data on a server somewhere. I may be wrong on that tho. I looked
> at several and it got confusing after a bit. Plus, some were still as
> clear as mud. Why do people link to a place that doesn't tell what
> their software does and how anyway. It seems most think github and such
> are good places to link to when it really doesn't tell you anything
> unless you want to help develop the software or something. It would be
> like Ford linking to CAD models to sell cars. :/
>
> To all: I found a good deal on a 10TB drive. That should suffice for
> now. I might add, it will give me time to figure out a path forward and
> I can make other use of that SMR drive. One thing I thought of as a
> negative for a NAS, I can't lock it into my safe, unless it is really
> tiny. As it is, even if a fire comes along, I still got backups.

I looked into this as well. A safe works like a nice little oven, and the
temperature inside it can get really high if it's in a fire.
Not all storage systems (HDDs included) are reliable when temperatures reach
those extremes.

Make sure the safe is, apart from being resistant to fire, also capable of
keeping the heat out.

--
Joost
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Mon, Aug 15, 2022 at 2:32 PM Wol <antlists@youngman.org.uk> wrote:
>
> On 15/08/2022 11:11, Rich Freeman wrote:
> > I see lots of talk of NAS and zfs/btrfs and snapshots. IMO these are
> > NOT really great solutions for backup. NAS can work of course but it
> > is overkill for backup storage.
>
> Do you want multiple *independent* backups, or do you want *incremental*
> backups so you can go back in time. It's nice to have both, but
> snapshotting gives you full backups for the price of incremental.

Snapshots don't give you backups at all, unless you first make a
snapshot and then take a backup (full or incremental) of the snapshot,
or serialize them using a send-like mechanism (which can also be full
or incremental).

If you destroy the drives containing the original copy, then you
destroy all the snapshots as well.

COW snapshots are great, but they're more about how you handle your
LIVE data. They don't address the same failure modes as backups. I
use both.
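
For anyone who wants the serialized-snapshot flavour, a minimal sketch with ZFS (pool and dataset names invented; btrfs send/receive works along the same lines):

zfs snapshot tank/data@2022-08-15
zfs send tank/data@2022-08-15 | zfs receive backup/data    # full copy onto another pool
# later, only the differences between two snapshots
# (the destination must still hold the earlier snapshot):
zfs snapshot tank/data@2022-08-22
zfs send -i tank/data@2022-08-15 tank/data@2022-08-22 | zfs receive backup/data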

--
Rich
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Mon, Aug 15, 2022 at 2:34 PM J. Roeleveld <joost@antarean.org> wrote:
>
> Actually, there still is a piece of software that does this:
> " app-backup/dar "
> You can tell it to split the backups into slices of a specific size.

dar is a great tool, but unless something changed I don't think you
can tell it to pause to remount the destination directory when it
fills up. As was pointed out, tar does do this (which I thought was
limited to tape devices, but apparently it works for disk as well).

For most backup tools, if you want to backup 100TB of data to a bunch
of 10TB drives, you basically need to have 100TB of scratch storage
lying around to hold the backup before you split it up, or be able to
mount all your drives at the same time in some kind of combined
filesystem.

--
Rich
Re: Backup program that compresses data but only changes new files. [ In reply to ]
J. Roeleveld wrote:
> On Monday, August 15, 2022 12:44:11 AM CEST Dale wrote:
>> Howdy,
>>
>> With my new fiber internet, my poor disks are getting a work out, and
>> also filling up. First casualty, my backup disk. I have one directory
>> that is . . . well . . . huge. It's about 7TBs or so. This is where it
>> is right now and it's still trying to pack in files.
>>
>> /dev/mapper/8tb 7.3T 7.1T 201G 98% /mnt/8tb
> <snipped>
>
>> Thoughts? Ideas?
> Plenty, see below:
>
> For backups to external disks, I would recommend having a look at "dar" :
> $ eix -e dar
> * app-backup/dar
> Available versions: 2.7.6^t ~2.7.7^t {argon2 curl dar32 dar64 doc gcrypt
> gpg lz4 lzo nls rsync threads xattr}
> Homepage: http://dar.linux.free.fr/
> Description: A full featured backup tool, aimed for disks
>
> It's been around for a while and the developer is active and responds quite
> well to questions.
> It supports compression (different compression methods), incremental backups
> (only need a catalogue of the previous backup for the incremental) and
> encryption.
>
> The NAS options others mentioned would also work as they can compress data on
> disk and you'd only notice a delay in writing/reading (depending on the
> compression method used). I would recommend using one that uses ZFS on-disk as
> it's more reliable and robust then BTRFS.
>
> One option that comes available for you now that you are no longer limited to
> slow ADSL: Cloud backups.
>
> I use Backblaze (B2) to store compressed backups that haven't been stored on
> tape to off-site locations.
>
> But, you can also encrypt the backups locally and store the
> encrypted+compressed backupfiles on other cloud storage.
>
> --
> Joost
>


Dar does sound interesting.  It sounds a lot like what I used way back
in the 90's.  I'm sure it's different software, but that one could work on
floppies back then much like dar does on USB sticks etc. today.  Same principle.

I looked into ZFS as well.  Google helped me find an interesting page.  I
notice it is also used on some NAS setups.  It seems to be advanced and
well maintained.  It sounds a little like LVM but may have more features,
such as compression maybe?  I haven't read that far yet.  I notice it
mentions snapshots, which LVM also has.

Getting plenty of ideas.  I just wish I had a separate building to put a
NAS in that would be safe and climate controlled.  I've got an outbuilding
but it gets plenty hot in the summer.  No A/C or anything.  I only heat
it enough to prevent freezing, but computers would likely like that anyway.

Dale

:-)  :-) 
Re: Backup program that compresses data but only changes new files. [ In reply to ]
Mark Knecht wrote:
>
>
> On Sun, Aug 14, 2022 at 4:21 PM Dale <rdalek1967@gmail.com
> <mailto:rdalek1967@gmail.com>> wrote:
> >
> > Mark Knecht wrote:
> >
> >
> >
>
> >
> > Do you happen to have an old computer laying around? If so check
> > out TrueNAS Core.
> <SNIP>
> > That may be a option later.  I'm actually considering build a NAS
> but right now, costs are preventing that.  I almost have enough that I
> could build another computer.  I have a mobo, memory, CPU and such.  I
> think I only need a power supply and maybe a video card.  Could use a
> case for it to but could mount it on a wall somewhere.  Good air flow.
>  lol
> <SNIP>
> > This new fiber thing is going to take some getting used too.  ;-)
>
> I experienced much of the same thing (more data) when my connection
> got faster. 
>
> Expense of a separate system to build a NAS is always an issue and
> you've received excellent guidance from other folks here about how to
> do it locally so I think you're set.
>
> A couple of things:
>
> 1) I didn't see mentioned so I will - the NAS, being on the network,
> is connected over gigabit Ethernet in my case so backups are
> significantly faster than using USB drives, or at least much faster
> than my older USB. I get about 800mbit/Sec sustained transfers. Once
> you get the main backup done the incremental ones are very fast. (Go
> to the kitchen fast) 
>
> 2) The NAS, when attached, is mounted over NFS as a directory and I
> use rsync to do the transfers so it's all very familiar on the client
> side. I think that's important to you today but likely won't be as
> much of an issue if you get used to some new backup application.
>
> 3) Compression is done on the NAS and is transparent from the client
> side. I can browse directories and retrieve individual files. As I
> think you mentioned you won't get much compression - close to zero -
> for movies but for my general data and VMs overall I'm getting about
> 40% so there's a big disk saving. Compute requirements are pretty low.
> I bought a used MB with a 6th gen i5 Core processor with 4 cores and
> it hardly works to do the compression. 
>
> Good luck with whatever you do.
>
> Mark


As it is, I have several options.  In a way, I wish I could tell rsync
to do the 1st half of the alphabet to one drive and then, with the next
command, tell it to do the 2nd half.  That would likely split the data
roughly in half for each one.  If videos can't be compressed, that would
be the best idea, outside of a NAS that works as a backup.  On that note,
power-from-the-wall wise, I like the Raspberry Pi idea.  Dang things are
pretty powerful and pull very little power.  I think 20 watts or so at
most for the board itself, then add a little for hard drive power.  I
can't imagine a complete setup pulling more than 50 or 75 watts when
idle, and maybe even when really busy.

I need to do something but not sure what yet.  :/   USPS says larger
drive will be here Friday.  That should give me time to run SMART tests
and then update backups Saturday or Sunday.  I already planned what goes
where.  Then I'll have a spare 3TB drive.  I really wish I had bought
several more Rosewill external drive enclosures.  So far, I have yet to
have a drive problem while using one of those.  The eSATA ports are nice
too. 

Dale

:-)  :-) 
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Monday, August 15, 2022 8:56:30 PM CEST Rich Freeman wrote:
> On Mon, Aug 15, 2022 at 2:34 PM J. Roeleveld <joost@antarean.org> wrote:
> > Actually, there still is a piece of software that does this:
> > " app-backup/dar "
> > You can tell it to split the backups into slices of a specific size.
>
> dar is a great tool, but unless something changed I don't think you
> can tell it to pause to remount the destination directory when it
> fills up. As was pointed out, tar does do this (which I thought was
> limited to tape devices, but apparently it works for disk as well).

Actually, you can with the "-p / --pause" option.
Also, as per the man-page, if you forget this, the process will simply inform
you the target location is full and you can move slices away to a different
location:
"
If the destination filesystem is too small to contain all the slices of the
backup, the -p option (pausing before starting new slices) might be of
interest. Else, in the case the filesystem is full, dar will suspend the
operation, asking for the user to make free space, then continue its
operation. To make free space, the only thing you cannot do is to touch the
slice being written.
"

The pause-option will actually stop between slices and you can umount the
target location and mount a different disk there.

This option has been around for a while.
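
In practice that looks something like this (paths and sizes are just for illustration):

dar -c /mnt/ext/backup -R /home/dale -s 9T -p -z
# -s 9T  cut the archive into slices that fit a 10TB disk, with some headroom
# -p     pause after each slice so you can umount /mnt/ext and mount the next drive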

--
Joost
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Monday, August 15, 2022 9:07:41 PM CEST Dale wrote:
> J. Roeleveld wrote:
> > On Monday, August 15, 2022 12:44:11 AM CEST Dale wrote:
> >> Howdy,
> >>
> >> With my new fiber internet, my poor disks are getting a work out, and
> >> also filling up. First casualty, my backup disk. I have one directory
> >> that is . . . well . . . huge. It's about 7TBs or so. This is where it
> >> is right now and it's still trying to pack in files.
> >>
> >> /dev/mapper/8tb 7.3T 7.1T 201G 98% /mnt/8tb
> >
> > <snipped>
> >
> >> Thoughts? Ideas?
> >
> > Plenty, see below:
> >
> > For backups to external disks, I would recommend having a look at "dar" :
> > $ eix -e dar
> > * app-backup/dar
> >
> > Available versions: 2.7.6^t ~2.7.7^t {argon2 curl dar32 dar64 doc
> > gcrypt
> >
> > gpg lz4 lzo nls rsync threads xattr}
> >
> > Homepage: http://dar.linux.free.fr/
> > Description: A full featured backup tool, aimed for disks
> >
> > It's been around for a while and the developer is active and responds
> > quite
> > well to questions.
> > It supports compression (different compression methods), incremental
> > backups (only need a catalogue of the previous backup for the
> > incremental) and encryption.
> >
> > The NAS options others mentioned would also work as they can compress data
> > on disk and you'd only notice a delay in writing/reading (depending on
> > the compression method used). I would recommend using one that uses ZFS
> > on-disk as it's more reliable and robust then BTRFS.
> >
> > One option that comes available for you now that you are no longer limited
> > to slow ADSL: Cloud backups.
> >
> > I use Backblaze (B2) to store compressed backups that haven't been stored
> > on tape to off-site locations.
> >
> > But, you can also encrypt the backups locally and store the
> > encrypted+compressed backupfiles on other cloud storage.
> >
> > --
> > Joost
>
> Dar does sound interesting. It sounds a lot like what I used way back
> in the 90's. I'm sure it is different software but could work on
> floppies then like it does on USB sticks etc today. Same principal.

If it was during the 90's, then it wasn't. First version was released in 2002.

> I looked into ZFS as well. Google helped me find a interesting page. I
> notice it is also used on some NAS setups as well. It seems to be
> advanced and maintained well. It sounds a little like LVM but may have
> more features, such as compression maybe? I haven't read that far yet.
> I notice it mentions snapshots which LVM also uses.

ZFS does a lot more than just LVM+Ext4. But it really needs multiple disks for
all the anti-corruption features as well.
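
Something like this is the usual starting point (disk and pool names are placeholders):

zpool create backup mirror /dev/sdb /dev/sdc   # two disks so corrupt blocks can be repaired
zfs set compression=lz4 backup                 # transparent compression for the whole pool
zpool scrub backup                             # walk everything and verify/repair checksums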

> Getting plenty of ideas. I just wish I had a separate building to put a
> NAS in that would be safe and climate controlled. I got a out building
> but it gets plenty hot in the summer. No A/C or anything. I only heat
> it enough to prevent freezing but computers would likely like that anyway.

If you can keep it within a sensible temperature range (and keep it stable), the NAS
should manage. There is NO need to keep it at 18C (like some places do).

Also, consider a small AC unit that only cools a small box big enough for the
NAS. No need to cool an entire room.

--
Joost
Re: Backup program that compresses data but only changes new files. [ In reply to ]
Michael wrote:
> On Monday, 15 August 2022 11:58:14 BST Gerrit Kühn wrote:
>> Am Mon, 15 Aug 2022 12:50:37 +0200
>>
>> schrieb Gerrit Kühn <gerrit.kuehn@aei.mpg.de>:
>>> Being a happy restic user myself, I'd like to mention that compression is
>>> available meanwhile
>>> (https://restic.readthedocs.io/en/latest/047_tuning_backup_parameters.html
>>> #compression). However, the feature is rather new, I did not use it so
>>> far.
>> https://forum.restic.net/t/compression-support-has-landed-in-master/4997
>>
>> Just adding another link to the official announcement from earlier this
>> year.
>>
>>
>> cu
>> Gerrit
> I think In Dale's use case compression is a solution seeking to address the
> problem of not enough storage space for backups, but it only makes sense if
> the data can be effectively and efficiently compressed. He mentioned 99.99%
> of his backup data is video. Video files are not particularly compressible,
> although small space savings can be achieved. For example using basic enough
> zst parameters '-19 --rsyncable -z' I got just a 1.6% file reduction:
>
> Frames Skips Compressed Uncompressed Ratio Check
> 1 0 88.9 MiB 90.3 MiB 1.016 XXH64
>
> Even if compression delivers some small space saving, given Dale's new faster
> Internet link and local video storage tendencies, compression will only kick
> the can down the road. If these are not private or rare videos and remain
> available on public streaming platforms, perhaps local storage is no longer
> necessary?


This is correct.  My main reason for compression was to squeeze a little
more mustard into the jar.  I just did a test on a dozen or so videos. 
Uncompressed, the videos came to 214,931,665; compressed, 200,878,554. 
That's a little saved, but other videos may not compress even that much. 
It just depends.  Even with that small savings, compression really isn't
much of a solution, especially since I do not want any loss of data.

Given my weird way of doing backups, rsync may be the best option. 
Plus, easier to restore from as well since it just requires a copy
command, any of them will do. 

Looks like bigger hard drives is the best idea for now.  I really need a
proper backup plan tho.  I just wish a NAS would fit in my fire safe.  ;-)

Dale

:-)  :-)
Re: Backup program that compresses data but only changes new files. [ In reply to ]
Wol wrote:
> On 15/08/2022 08:52, Dale wrote:
>> I just hope this 10TB drive isn't a SMR.  I googled around and the best
>> I could find is anything above 8TB is CMR.  It's a WD101EDBZ-11B1DA0.  I
>> hope that is right.  I'm not totally opposed to SMR even as a backup but
>> I'd rather not.  The deal I found was for a pull and costs about $110
>> including shipping.  I looked at a 14TB but my jaw dropped.  $$$$$$$$
>
> Just done a search for you (it helps I know what I'm looking for), but
> CMR it is ...
>
> https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/
>
> I tend to go for OEM stuff, and know the names of the range I'm
> looking for - Seagate Ironwolf, Toshiba N300 - about £170 for 8TB ...
>
> Cheers,
> Wol

I can't express how glad I am to hear that.  While the current one is
SMR, I didn't know they even existed until I noticed it kept doing this
bumping thing long after the updates were done.  I could unmount it, but
I could tell it was busy doing something, with no clue what.  I actually
posted about it on this list and Rich figured it out.  Since then, I try
to avoid them in case I need to use the drive in the future for
something SMR isn't suited for.  The almost-full 8TB drive will still be
used for backups, but I'd never want to put it in my computer and use it
for anything that needs to write a lot.  Heck, even me downloading
files over this new faster internet would put serious strain on that
drive.  Once the CMR buffer thingy fills up, it slows to about
40MB/s.  My internet can do 50MB/s easily.  I've seen it hit 70 or
better a few times.

Glad to know what I found was good info.  I just wonder how long it will
be before even 10TB drives will be SMR.  I also dread having to search
out a 14TB drive later.  :/

Dale

:-)  :-) 
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Mon, Aug 15, 2022 at 3:20 PM J. Roeleveld <joost@antarean.org> wrote:
>
> Actually, you can with the "-p / --pause" option.
> Also, as per the man-page, if you forget this, the process will simply inform
> you the target location is full and you can move slices away to a different
> location:
> "
> If the destination filesystem is too small to contain all the slices of the
> backup, the -p option (pausing before starting new slices) might be of
> interest. Else, in the case the filesystem is full, dar will suspend the
> operation, asking for the user to make free space, then continue its
> operation. To make free space, the only thing you cannot do is to touch the
> slice being written.
> "
>
> The pause-option will actually stop between slices and you can umount the
> target location and mount a different disk there.
>
> This option has been around for a while.

Hmm, sounds kind of non-ideal.

It sounds like you can either have it pause when full, or pause
between slices. Neither is great.

If it pauses when full, and you can't touch the slice being written,
then you can't unmount the drive it is being written to. So you end
up having to write to a scratch area and keep moving slices off of
that onto another drive. At best that is extra IO which slows things
down, and of course you need scratch space.

If you pause between slices, then you have to have drives of equal
size to store to, otherwise you'll have to swap drives that aren't
completely full.

Ideally you'd want to write until the drive is almost full, then
finish a slice and pause.

However, it at least seems workable if slow. Just have scratch space
for a bunch of slices, let it pause when full, then move slices off as
they are done, and accept that your backups will run at maybe 25% of
the speed of the scratch drive since it will be constantly seeking
between writing new slices and reading old ones. Or if you have
enough RAM you could use a tmpfs for that but that seems really
cumbersome unless you use very small slices and have the shuffling
scripted.

--
Rich
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Mon, Aug 15, 2022 at 3:28 PM Dale <rdalek1967@gmail.com> wrote:
>
> Given my weird way of doing backups, rsync may be the best option.
> Plus, easier to restore from as well since it just requires a copy
> command, any of them will do.
>

If you don't make small changes inside of large files, you might
actually be fine with tar. It can do compression, though you won't
benefit much from that. It can split archives easily. It can do
incrementals, but I think they are limited to using metadata so it
depends on well-behaved software. What it doesn't do is let you
backup just a small part of a file containing larger changes.
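
For the record, a rough sketch of both features (paths invented; GNU tar assumed):

# incremental: the .snar snapshot file records what was already backed up,
# so a second run with the same .snar only picks up new/changed files
tar --listed-incremental=/root/videos.snar -cf /mnt/backup/videos-0.tar /home/dale/videos
tar --listed-incremental=/root/videos.snar -cf /mnt/backup/videos-1.tar /home/dale/videos

# multi-volume: -M prompts for the next volume whenever -L (units of 1024 bytes)
# is reached, which is where the "pause and swap disks" behaviour comes from
tar -cMf /mnt/ext/videos.tar -L 9000000000 /home/dale/videos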

--
Rich
Re: Backup program that compresses data but only changes new files. [ In reply to ]
Wol wrote:
> On 15/08/2022 10:45, John Covici wrote:
>> zfs would solve your problem of corruption, even without versioning.
>> You do a scrub at short intervals and at least you would know if the
>> file is corrupted.  Of course, redundancy is better, such as mirroring
>> and backups take a very short time because sending from one zfs to
>> another it knows exactly what bytes to send.
>
> I don't think he means a corrupted file, he means a corrupted video.
> If the drive faithfully records the corrupted feed, the filesystem is
> not going to catch it!
>
> Cheers,
> Wol

Yep.  Every once in a while, I download a video with better resolution
later to find out it is bad.  It gets part way through and crashes,
stops dead and sits there or just plain doesn't open.  Quite often, it
will have the correct thumbnail so it looks good but it's bad.  If I've
already trashed the old one and updated my backups, I have to go find it
again.  Given how some sites censor stuff, it could be gone for good. 
Generally, I can either catch it in the trash or on the backup that
hasn't been updated yet.  Given time, I'll miss one one day. 

The issues having a lot of files causes.  lol 

Dale

:-)  :-) 
Re: Backup program that compresses data but only changes new files. [ In reply to ]
> As it is, I have several options. In a way, I wish I could tell rsync to
> do 1st half of alphabet to one drive and then with next command tell it to
> do the 2nd half of alphabet. That would likely split the data in half for
> each one.

You can do that, at least with a small kludge I'm sure. rsync supports
excluding directories and file names. As an example:

rsync -avx --port=873 --exclude={.cache,.nv,'google-chrome*',DiskImages} /home/mark mark@truenas1:/mnt/MyPool/mark/Backups/science/.

There's a test option (-n, a.k.a. --dry-run) that allows you to see what it
would do.

I'm sure you can figure that part out. The above line is just in a script
file for me.
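
For the a-through-k / l-through-z split specifically, plain shell globbing might be enough instead of excludes. A rough, untested sketch (paths are made up; run with -n first to see what it would do):

rsync -avn /mnt/data/[a-kA-K]* /mnt/backup1/      # dry run for the first half
rsync -av  /mnt/data/[a-kA-K]* /mnt/backup1/
rsync -av  /mnt/data/[l-zL-Z0-9]* /mnt/backup2/   # the rest; entries starting with dots
                                                  # or punctuation would need their own pattern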

HTH,
Mark
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Mon, Aug 15, 2022 at 3:41 PM Dale <rdalek1967@gmail.com> wrote:
>
> Glad to know what I found was good info. I just wonder how long it will
> be before even 10TB drives will be SMR. I also dread having to search
> out a 14TB drive later. :/
>

I think it will be a long time if ever, and here is why.

There are good reasons and bad reasons to use SMR. The reason you
would WANT to use SMR is that you have a task that is well-suited to
their limitations like backup or applications that can use log-style
storage. Ideally you'd want host-managed SMR for this. The benefit
is higher density for the cost, so you'd be doing it to get a drive
that is cheaper than it otherwise would be. However, these are all
things that would appeal to experts who really know what they're
doing.

The bad reason to use SMR is that you're a manufacturer trying to
squeeze out a bit more profit margin, not passing on the savings. In
this case you want to sell the drive to somebody who DOESN'T know what
they're doing, and make it drive-managed.

This is why we've seen SMR in medium-sized drives and not big ones as
would be expected if you assumed it would be employed for the good
reasons. The only people buying 14TB hard drives are people who tend
to know what they're doing, which makes them less of a target for
unscrupulous manufacturers. You wouldn't see them as much in small
drives as the return in capacity isn't as much. The medium sized
drives are big enough to get a return out of using SMR, but small
enough that suckers will be willing to buy them.

At least, that's my theory...

--
Rich
Re: Backup program that compresses data but only changes new files. [ In reply to ]
Mark Knecht wrote:
>
> > As it is, I have several options.  In a way, I wish I could tell
> rsync to do 1st half of alphabet to one drive and then with next
> command tell it to do the 2nd half of alphabet.  That would likely
> split the data in half for each one.  
>
> You can do that, at least with a small kludge I'm sure. rsync supports
> excluding directories and file names. As an example:
>
> rsync -avx --port=873
> --exclude={.cache,.nv,'google-chrome*',DiskImages} /home/mark
> mark@truenas1:/mn
> t/MyPool/mark/Backups/science/.
>
> There's a test open (?? -n maybe ??) that allows you to see what it
> would do.
>
> I'm sure you can figure that part out. The above line is just in a
> script file for me
>
> HTH,
> Mark


In the directory I'm backing up, there are over 400 directories.  That
would be a LOT of --exclude options.  Also, I would have to adjust the
exclude options each time I added a new directory, which can happen several
times a day.  The word nightmare comes to mind.  Loss of hair is
also a thought.  :-D

I'm just glad I got a bigger hard drive coming.  That's the easiest fix
at the moment. 

Dale

:-)  :-) 
Re: Backup program that compresses data but only changes new files. [ In reply to ]
On Mon, Aug 15, 2022 at 1:17 PM Dale <rdalek1967@gmail.com> wrote:
>
> Mark Knecht wrote:
> >
> > > As it is, I have several options. In a way, I wish I could tell
> > rsync to do 1st half of alphabet to one drive and then with next
> > command tell it to do the 2nd half of alphabet. That would likely
> > split the data in half for each one.
> >
> > You can do that, at least with a small kludge I'm sure. rsync supports
> > excluding directories and file names. As an example:
> >
> > rsync -avx --port=873
> > --exclude={.cache,.nv,'google-chrome*',DiskImages} /home/mark
> > mark@truenas1:/mn
> > t/MyPool/mark/Backups/science/.
> >
> > There's a test open (?? -n maybe ??) that allows you to see what it
> > would do.
> >
> > I'm sure you can figure that part out. The above line is just in a
> > script file for me
> >
> > HTH,
> > Mark
>
>
> In the directory I'm backing up, there is over 400 directories. That
> would be a LOT of --exclude options. Also, I would have to adjust the
> exclude options each time I added a new directory, which can be several
> a day sometimes. The word nightmare comes to mind. Loss of hair is
> also a thought. :-D
>
> I'm just glad I got a bigger hard drive coming. That's the easiest fix
> at the moment.
>
> Dale
>
> :-) :-)

I have my movies in a directory called VideoLib and then subdirectories A,
B, C, etc. If you rearranged things a little you could subdivide your movies
the same way and use symbolic links to keep a flat view (see the sketch below).
That sort of arrangement would mean 2 exclude files with 13 excludes in each
for the movies. You could leave the rest alone, create a 3rd file and exclude
the top level movie directories.
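
Roughly like this, if it helps picture it (all names invented):

mkdir -p /mnt/VideoLib/{A..Z}           # one bucket per letter
mv /mnt/Movies/A* /mnt/VideoLib/A/      # ...and so on for the other letters

# optional flat view so browsing still feels like one big directory
mkdir -p /mnt/MoviesFlat
ln -s /mnt/VideoLib/*/* /mnt/MoviesFlat/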

Anyway, I just wanted to provide an option. Only you know what works for
your work flow.

Good luck,
Mark
Re: Backup program that compresses data but only changes new files. [ In reply to ]
Rich Freeman wrote:
> On Mon, Aug 15, 2022 at 3:41 PM Dale <rdalek1967@gmail.com> wrote:
>> Glad to know what I found was good info. I just wonder how long it will
>> be before even 10TB drives will be SMR. I also dread having to search
>> out a 14TB drive later. :/
>>
> I think it will be a long time if ever, and here is why.
>
> There are good reasons and bad reasons to use SMR. The reason you
> would WANT to use SMR is that you have a task that is well-suited to
> their limitations like backup or applications that can use log-style
> storage. Ideally you'd want host-managed SMR for this. The benefit
> is higher density for the cost, so you'd be doing it to get a drive
> that is cheaper than it otherwise would be. However, these are all
> things that would appeal to experts who really know what they're
> doing.
>
> The bad reason to use SMR is that you're a manufacturer trying to
> squeeze out a bit more profit margin, not passing on the savings. In
> this case you want to sell the drive to somebody who DOESN'T know what
> they're doing, and make it drive-managed.
>
> This is why we've seen SMR in medium-sized drives and not big ones as
> would be expected if you assumed it would be employed for the good
> reasons. The only people buying 14TB hard drives are people who tend
> to know what they're doing, which makes them less of a target for
> unscrupulous manufacturers. You wouldn't see them as much in small
> drives as the return in capacity isn't as much. The medium sized
> drives are big enough to get a return out of using SMR, but small
> enough that suckers will be willing to buy them.
>
> At least, that's my theory...
>


And that theory makes sense.  After you pointed out that my bumpy thing
was due to it being an SMR drive, I found out that several drive makers
were not telling people they were getting second-best drives.  I read about
a LOT of people who use RAID and such getting hit hard by this.  Some
seem to have turned basically new drives into doorstops pretty quickly.
I hope it was within the warranty period so the makers had to foot the
bill for their deception.  It seems consumers learned a lesson. 
I'm not sure about the makers, but at least now they do make public which
drives are SMR, so people whose use cases would kill an SMR drive can
avoid buying one.

I'm just glad to know what I ordered is what I expect.  No unpleasant
surprises.  I hope.

Dale

:-)  :-) 
RE: Backup program that compresses data but only changes new files. [ In reply to ]
>-----Original Message-----
>From: Dale <rdalek1967@gmail.com>
>Sent: Monday, August 15, 2022 12:47 PM
>To: gentoo-user@lists.gentoo.org
>Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
>
>Wol wrote:
>> On 15/08/2022 10:45, John Covici wrote:
>>> zfs would solve your problem of corruption, even without versioning.
>>> You do a scrub at short intervals and at least you would know if the
>>> file is corrupted. Of course, redundancy is better, such as
>>> mirroring and backups take a very short time because sending from one
>>> zfs to another it knows exactly what bytes to send.
>>
>> I don't think he means a corrupted file, he means a corrupted video.
>> If the drive faithfully records the corrupted feed, the filesystem is
>> not going to catch it!
>>
>> Cheers,
>> Wol
>
>Yep. Every once in a while, I download a video with better resolution later to find out it is bad. It gets part way through and crashes, stops dead and sits there or just plain doesn't open. Quite often, it will have the correct thumbnail so it looks good but it's bad. If I've already trashed the old one and updated my backups, I have to go find it again. Given how some sites censor stuff, it could be gone for good. Generally, I can either catch it in the trash or on the backup that hasn't been updated yet. Given time, I'll miss one one day.
>
>The issues having a lot of files causes. lol
>
>Dale
>
>:-) :-)

You might consider just running ffmpeg or something headless like that over the file to see if there are errors before trashing the old version. It should even be pretty easy to script: if the check passes, replace the old file; if it fails, raise a notification. Then you don't have to sit and wait for results.
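
Something along these lines, as a rough sketch (paths and the notification command are placeholders, not tested):

#!/bin/bash
# check-video.sh NEWFILE OLDFILE
# Decode the whole file to nowhere; anything ffmpeg complains about lands in $log.
new="$1"
old="$2"
log="$(mktemp)"

if ffmpeg -v error -i "$new" -f null - 2>"$log" && [ ! -s "$log" ]; then
    mv -- "$new" "$old"                         # decoded cleanly, replace the old copy
else
    notify-send "Video check failed" "$new"     # or mail yourself, or just echo
fi
rm -f "$log"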

LMP
RE: Backup program that compresses data but only changes new files. [ In reply to ]
>-----Original Message-----
>From: Rich Freeman <rich0@gentoo.org>
>Sent: Monday, August 15, 2022 12:52 PM
>To: gentoo-user@lists.gentoo.org
>Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
>
>On Mon, Aug 15, 2022 at 3:41 PM Dale <rdalek1967@gmail.com> wrote:
>>
>> Glad to know what I found was good info. I just wonder how long it
>> will be before even 10TB drives will be SMR. I also dread having to
>> search out a 14TB drive later. :/
>>
>
>I think it will be a long time if ever, and here is why.
>
>There are good reasons and bad reasons to use SMR. The reason you would WANT to use SMR is that you have a task that is well-suited to their limitations like backup or applications that can use log-style storage. Ideally you'd want host-managed SMR for this. The benefit is higher density for the cost, so you'd be doing it to get a drive that is cheaper than it otherwise would be. However, these are all things that would appeal to experts who really know what they're doing.
>
>The bad reason to use SMR is that you're a manufacturer trying to squeeze out a bit more profit margin, not passing on the savings. In this case you want to sell the drive to somebody who DOESN'T know what they're doing, and make it drive-managed.
>
>This is why we've seen SMR in medium-sized drives and not big ones as would be expected if you assumed it would be employed for the good reasons. The only people buying 14TB hard drives are people who tend to know what they're doing, which makes them less of a target for unscrupulous manufacturers. You wouldn't see them as much in small drives as the return in capacity isn't as much. The medium sized drives are big enough to get a return out of using SMR, but small enough that suckers will be willing to buy them.
>
>At least, that's my theory...
>
>--
>Rich
>
>

A big chunk of it is that, when SMR drives came out, there was no reliable OS support for it, so it basically had to be drive-managed. Which then had horrible performance, and the cherry on top was that the drive manufacturers tried to cover up what they'd changed. So that made lots of the big companies doing big storage applications decide that SMR was crap and they simply will not buy SMR drives at any price.

Which is a real pity because there are lots of large-data applications where the write order is pretty much entirely sequential, so a properly designed, host managed system would see virtually no performance loss from SMR, and be able to take advantage of the higher density.

The moral is: be transparent with your customers.

LMP
Re: Backup program that compresses data but only changes new files. [ In reply to ]
Laurence Perkins wrote:
>
>> -----Original Message-----
>> From: Rich Freeman <rich0@gentoo.org>
>> Sent: Monday, August 15, 2022 12:52 PM
>> To: gentoo-user@lists.gentoo.org
>> Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
>>
>> On Mon, Aug 15, 2022 at 3:41 PM Dale <rdalek1967@gmail.com> wrote:
>>> Glad to know what I found was good info. I just wonder how long it
>>> will be before even 10TB drives will be SMR. I also dread having to
>>> search out a 14TB drive later. :/
>>>
>> I think it will be a long time if ever, and here is why.
>>
>> There are good reasons and bad reasons to use SMR. The reason you would WANT to use SMR is that you have a task that is well-suited to their limitations like backup or applications that can use log-style storage. Ideally you'd want host-managed SMR for this. The benefit is higher density for the cost, so you'd be doing it to get a drive that is cheaper than it otherwise would be. However, these are all things that would appeal to experts who really know what they're doing.
>>
>> The bad reason to use SMR is that you're a manufacturer trying to squeeze out a bit more profit margin, not passing on the savings. In this case you want to sell the drive to somebody who DOESN'T know what they're doing, and make it drive-managed.
>>
>> This is why we've seen SMR in medium-sized drives and not big ones as would be expected if you assumed it would be employed for the good reasons. The only people buying 14TB hard drives are people who tend to know what they're doing, which makes them less of a target for unscrupulous manufacturers. You wouldn't see them as much in small drives as the return in capacity isn't as much. The medium sized drives are big enough to get a return out of using SMR, but small enough that suckers will be willing to buy them.
>>
>> At least, that's my theory...
>>
>> --
>> Rich
>>
>>
> A big chunk of it is that, when SMR drives came out, there was no reliable OS support for it, so it basically had to be drive-managed. Which then had horrible performance, and the cherry on top was that the drive manufacturers tried to cover up what they'd changed. So that made lots of the big companies doing big storage applications decide that SMR was crap and they simply will not buy SMR drives at any price.
>
> Which is a real pity because there are lots of large-data applications where the write order is pretty much entirely sequential, so a properly designed, host managed system would see virtually no performance loss from SMR, and be able to take advantage of the higher density.
>
> The moral is: be transparent with your customers.
>
> LMP


I do think there are use cases where an SMR drive is just fine.  Thing is,
the customer should know what they're getting, so they can make sure their
use case is one where SMR works fine.  The drive makers should have told
users up front; that would have avoided a lot of problems and the damage
to the new technology.  As you point out, by hiding it the makers damaged
their own reputations.  For my backups, it does take longer to
sort the data out, but it does work OK.  After I'm done with my backups,
I unmount and close the encryption on the drive.  Then I just let it sit
there until the bumpy thing stops.  Quite often it's 5 or 10 minutes,
but on occasion it will do its bumpy thing for 30 minutes or so.  Some
backup changes are quite large.  Would I have bought it if I'd known what
it was and the extra time it takes?  No.  While it works, having to wait
is inconvenient.

Now that I know about SMR drives, I try to avoid them because I do
sometimes rotate drives around, and even though this one is fine for backups,
I couldn't use it if I were, for example, to build a RAID system in a
NAS, or for other use cases where SMR drives perform badly.  I think when an
SMR drive is sold, that fact should be in the description so it is known.
It should have been from the start.  Sadly, it's rare even now.

As always, it just depends on what you plan to do with it. 

Dale

:-)  :-) 

P. S.  Please pardon my time between replies.  Since I use a VPN and
most all of them block email, I have to stop the VPN to fetch and send
emails.  Eventually I'll figure out how to tunnel Seamonkey through the
VPN without having to stop it.  Eventually.  Maybe.  lol 
Re: Backup program that compresses data but only changes new files. [ In reply to ]
Mark Knecht wrote:
>
>
> On Mon, Aug 15, 2022 at 1:17 PM Dale <rdalek1967@gmail.com
> <mailto:rdalek1967@gmail.com>> wrote:
> >
> > Mark Knecht wrote:
> > >
> > > > As it is, I have several options.  In a way, I wish I could tell
> > > rsync to do 1st half of alphabet to one drive and then with next
> > > command tell it to do the 2nd half of alphabet.  That would likely
> > > split the data in half for each one.  
> > >
> > > You can do that, at least with a small kludge I'm sure. rsync supports
> > > excluding directories and file names. As an example:
> > >
> > > rsync -avx --port=873
> > > --exclude={.cache,.nv,'google-chrome*',DiskImages} /home/mark
> > > mark@truenas1:/mn
> > > t/MyPool/mark/Backups/science/.
> > >
> > > There's a test open (?? -n maybe ??) that allows you to see what it
> > > would do.
> > >
> > > I'm sure you can figure that part out. The above line is just in a
> > > script file for me
> > >
> > > HTH,
> > > Mark
> >
> >
> > In the directory I'm backing up, there is over 400 directories.  That
> > would be a LOT of --exclude options.  Also, I would have to adjust the
> > exclude options each time I added a new directory, which can be several
> > a day sometimes.  The word nightmare comes to mind.  Loss of hair is
> > also a thought.  :-D
> >
> > I'm just glad I got a bigger hard drive coming.  That's the easiest fix
> > at the moment.
> >
> > Dale
> >
> > :-)  :-)
>
> I have my movies in a directory called VideoLib and then
> subdirectories A, B, C, etc. If you rearranged a little bit you could
> subdivide your movies and use symbolic links to make it look more
> flat. That sort of arrangement would mean 2 files with 13 excludes in
> each for the movies. You could leave the rest alone, create a 3rd file
> and exclude the top level movie directories.
>
> Anyway, I just wanted to provide an option. Only you know what works
> for your work flow.
>
> Good luck,
> Mark
>
>


I did think about doing it that way, but sometimes I can't recall the
name of something I'm looking for and just have to scan down the list. 
Breaking it up that way would complicate matters because I'd have to
look deeper.  As time goes on, I'd have to break it up even more,
which would mean going deeper still.  There may be a way to
do it with linking or something, but that could get interesting too. 
I've already got some of it divided up, and even that causes issues sometimes.

It's interesting, having all this data is neat but it can cause brain
teasers to sort through.  ;-)  I'm still considering that Raspberry Pi
thing tho.  Small, power efficient etc etc.  Lots of ideas.

Dale

:-)  :-) 
Re: Backup program that compresses data but only changes new files. [ In reply to ]
J. Roeleveld wrote:
> On Monday, August 15, 2022 9:07:41 PM CEST Dale wrote:
>> J. Roeleveld wrote:
>>> On Monday, August 15, 2022 12:44:11 AM CEST Dale wrote:
>>>> Howdy,
>>>>
>>>> With my new fiber internet, my poor disks are getting a work out, and
>>>> also filling up. First casualty, my backup disk. I have one directory
>>>> that is . . . well . . . huge. It's about 7TBs or so. This is where it
>>>> is right now and it's still trying to pack in files.
>>>>
>>>> /dev/mapper/8tb 7.3T 7.1T 201G 98% /mnt/8tb
>>> <snipped>
>>>
>>>> Thoughts? Ideas?
>>> Plenty, see below:
>>>
>>> For backups to external disks, I would recommend having a look at "dar" :
>>> $ eix -e dar
>>> * app-backup/dar
>>>
>>> Available versions: 2.7.6^t ~2.7.7^t {argon2 curl dar32 dar64 doc
>>> gcrypt
>>>
>>> gpg lz4 lzo nls rsync threads xattr}
>>>
>>> Homepage: http://dar.linux.free.fr/
>>> Description: A full featured backup tool, aimed for disks
>>>
>>> It's been around for a while and the developer is active and responds
>>> quite
>>> well to questions.
>>> It supports compression (different compression methods), incremental
>>> backups (only need a catalogue of the previous backup for the
>>> incremental) and encryption.
>>>
>>> The NAS options others mentioned would also work as they can compress data
>>> on disk and you'd only notice a delay in writing/reading (depending on
>>> the compression method used). I would recommend using one that uses ZFS
>>> on-disk as it's more reliable and robust then BTRFS.
>>>
>>> One option that comes available for you now that you are no longer limited
>>> to slow ADSL: Cloud backups.
>>>
>>> I use Backblaze (B2) to store compressed backups that haven't been stored
>>> on tape to off-site locations.
>>>
>>> But, you can also encrypt the backups locally and store the
>>> encrypted+compressed backupfiles on other cloud storage.
>>>
>>> --
>>> Joost
>> Dar does sound interesting. It sounds a lot like what I used way back
>> in the 90's. I'm sure it is different software but could work on
>> floppies then like it does on USB sticks etc today. Same principal.
> If it was during the 90's, then it wasn't. First version was released in 2002.
>
>> I looked into ZFS as well. Google helped me find a interesting page. I
>> notice it is also used on some NAS setups as well. It seems to be
>> advanced and maintained well. It sounds a little like LVM but may have
>> more features, such as compression maybe? I haven't read that far yet.
>> I notice it mentions snapshots which LVM also uses.
> ZFS does a lot more then just LVM+Ext4. But it really needs multiple disks for
> all the anti-corruption features as well.
>
>> Getting plenty of ideas. I just wish I had a separate building to put a
>> NAS in that would be safe and climate controlled. I got a out building
>> but it gets plenty hot in the summer. No A/C or anything. I only heat
>> it enough to prevent freezing but computers would likely like that anyway.
> If you can keep it between optimal temperatures (and stable) the NAS should
> manage. There is NO need to keep it at 18C (like some places do).
>
> Also, consider a small AC unit that only cools a small box big enough for the
> NAS. No need to cool an entire room.
>
> --
> Joost

If I built a NAS with a Raspberry Pi thing, it would be small enough
that I could just add a handle to it.  Then I could bring it in the
house, do backups and such, and then store it in an outbuilding away from
the house.  If I had a climate-controlled outbuilding, I could either
run an ethernet cable or use wireless to do backups.  Of course, all this
costs money, as does anything I need to do.  :/ 

I did check the price of a Raspberry Pi thingy a couple of weeks ago. 
About $150 to $200 would likely get it off to a good start.  It's not bad
really, but that is a chunk of change.  I was actually looking into that
to run my VPN thing on.  Still thinking on that.

Dale

:-)  :-) 
Re: Backup program that compresses data but only changes new files. [ In reply to ]
Am Mon, 15 Aug 2022 12:43:19 +0100
schrieb Michael <confabulate@kintzios.com>:

> Even if compression delivers some small space saving, given Dale's new
> faster Internet link and local video storage tendencies, compression
> will only kick the can down the road. If these are not private or rare
> videos and remain available on public streaming platforms, perhaps local
> storage is no longer necessary?

Compression for pre-compressed video data certainly does not achieve much
improvement. I just wanted to point out that restic meanwhile supports it
(as there might be other people reading this who are looking for backup
solutions in different situations).
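
For the curious, the switch looks roughly like this (repository path invented; if I read the docs right it needs restic 0.14+ and a version-2 repository):

restic -r /mnt/backup/restic-repo init
restic -r /mnt/backup/restic-repo backup --compression max /home/dale/videos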

Getting back to backing up publicly available video data: I really can't
tell how much of a hassle it would be (in case of disaster) to get the data
back over the network, how long it would take, whether that would be acceptable,
etc. For distribution of content, e.g., Netflix relies on big storage
boxes using zfs. This is quite reminiscent of the already suggested NAS
solution (and filesystems like zfs come with features like compression,
snapshots and incremental send/receive to facilitate backups).


cu
Gerrit