Mailing List Archive

Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
Am Thu, Jul 29, 2021 at 05:46:16PM +0100 schrieb Wols Lists:

> > Yea. First the SMR fiasco became public and then there was some other PR
> > stunt they did that I can’t remember right now, and I said “I can’t buy WD
> > anymore”. But there is no real alternative these days. And CMR drives are
becoming ever rarer, especially in the 2.5″ realm. Except for one single
> > seagate model, there isn’t even a bare SATA drive above 2 TB available on
> > the market! Everything above that size is external USB stuff. And those
> > disks don’t come with standard SATA connectors anymore, but have the USB
> > socket soldered onto their PCB.
> >
> Are you talking 2.5" drives here?

I meant in general, but – as I said – “especially in the 2.5″ realm”. ;-)
For 3.5″, it’s mostly the low-capacity drives that are affected. Probably
because here the ratio of fixed cost (case, electronics) vs. per-capacity
cost (platters, heads) is higher, so the pressure to reduce manufacturing
cost is also higher. High-capacity drives tend to remain CMR at the mo’.

> The SMR stunt was a real cock-up as far as raid was concerned - they
> moved their WD Red "ideal for raid and NAS" drives over to SMR and
> promptly started killing raid arrays left right and centre as people
> replaced drives ... you now need Red Pro so the advice for raid is just
> "Avoid WD".

Red Plus is fine, too. I think the “Plus” is marketing speak for non-SMR.
Which is why probably SMRs now have the price tag of old CMRs, and the new
CMRs have a “plus” on the price tag.

> From what I can make out with Seagate, the old Barracuda line is pretty
> much all CMR, they had just started making some of them SMR when the
> brown stuff hit the rotating blades.

Seagate made a statement that their NAS drives are not and never will be SMR.



In case someone is interested, here’s a little experience report:

Two days ago, I bought a 2.5″ WD My Passport 4 TB for a new off-site backup
strategy I want to implement. They even killed the rubber feet on the
underside to save a few cents. >:'-( Interestingly, the even cheaper
Elements series (which is the cheapest because it has no complimentary
software and no encryption or password feature) still has them. Probably
because its case design is older.

I just finished transferring my existing Borg backup repos. Right at the
beginning, I tested a small repo of 3 GiB and got good throughput. After
around 2 GiB or so, the drive dropped to 10 MiB/s for a very long time
(it kept writing at least another 3 GiB; I have no idea what that was).

I was already pondering my options. But once that was over, I’ve since been
writing 1.2 TiB to the drive with rsync happily without any glitches,
averaging slightly above 100 MiB/s. I used SMR-friendly ext4 settings, and
Borg uses datafiles of 500 MiB size, which greatly reduces sprinkled
metadata writes because it’s only a few thousand files instead of millions.
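
For the curious: “SMR-friendly” usually boils down to something along these
lines - a sketch only, not necessarily the exact invocation used here, and
the device and mount point names are made up:

  # write all metadata up front instead of lazily in the background,
  # and keep the inode count low since the repo is only a few big files
  mkfs.ext4 -T largefile4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/passport
  # fewer metadata flushes; atime updates are pointless on a backup disk
  mount -o noatime,commit=60 /dev/mapper/passport /mnt/passport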

According to smartctl, the drive claims to support TRIM, but so far I’ve
been unable to invoke it with fstrim. First I had to enable the
allow-discards option on the underlying LUKS container, which is disabled by
default for security reasons. But either I’m still missing a detail, or the
USB-SATA bridge really does not support it. Or it does, but the kernel is
unaware: yesterday I read an article about enabling a flag for the USB
controller via a custom udev rule. Who knows.
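
In case anyone wants to try the same, the rough sequence looks like this -
the mapping name is made up, and the udev rule is only a sketch of what that
article describes, with placeholder IDs to be taken from lsusb:

  # open the container with discards allowed (or add the 'discard' option in /etc/crypttab)
  cryptsetup open --allow-discards /dev/sdX passport
  mount /dev/mapper/passport /mnt/passport
  fstrim -v /mnt/passport

  # /etc/udev/rules.d/99-usb-unmap.rules - tell the kernel the bridge can do SCSI UNMAP
  ACTION=="add|change", ATTRS{idVendor}=="xxxx", ATTRS{idProduct}=="yyyy", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"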

--
Grüße | Greetings | Qapla’
Please do not share anything from, with or about me on any social network.

For a pessimist, the day has 24 bruises.
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
Am Thu, Jul 29, 2021 at 10:55:18PM +0200 schrieb Frank Steinmetzger:
> In case someone is interested, here’s a little experience report:
> […]
> I just finished transferring my existing Borg backup repos.
> […]
> I’ve since been writing 1,2 TiB to the drive with rsync happily without
> any glitches, averaging at slightly above 100 MiB/s.

Haha, and now I can’t unmount it (device is busy) despite lsof reporting
nothing in use. So I did umount -l. That took care of the file system. But I can’t
close the LUKS container either: “device is still in use”. So we are back on
topic after all.
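
For next time, the usual suspects can be hunted down with something like
this - the mapping name “passport” is just an example:

  fuser -vm /dev/mapper/passport    # processes using the mounted fs or the device
  lsof +f -- /dev/mapper/passport
  dmsetup info -c passport          # the OPEN count shows whether something still holds the dm device
  ls /sys/block/dm-0/holders/       # dm-0 assumed; other dm/LVM devices can be holders too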

--
Grüße | Greetings | Qapla’
Please do not share anything from, with or about me on any social network.

Even years have an expiry date.
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 30/7/21 4:55 am, Frank Steinmetzger wrote:
> Am Thu, Jul 29, 2021 at 05:46:16PM +0100 schrieb Wols Lists:
>
>>> Yea. First the SMR fiasco became public and then there was some other PR
>>> stunt they did that I can’t remember right now, and I said “I can’t buy WD
>>> anymore”. But there is no real alternative these days. And CMR drives are
>>> becoming ever rarer, especially in the 2.5″ realm. Except for one single
>>> seagate model, there isn’t even a bare SATA drive above 2 TB available on
>>> the market! Everything above that size is external USB stuff. And those
>>> disks don’t come with standard SATA connectors anymore, but have the USB
>>> socket soldered onto their PCB.
>>>
>> Are you talking 2.5" drives here?
> I meant in general, but – as I said – “especially in the 2.5″ realm”. ;-)
> For 3.5″, it’s mostly the low-capacity drives that are affected. Probably
> because here the ratio of fixed cost (case, electronics) vs. per-capacity
> cost (platters, heads) is higher, so the pressure to reduce manufacturing
> cost is also higher. High-capacity drives tend to remain CMR at the mo’.
>
>> The SMR stunt was a real cock-up as far as raid was concerned - they
>> moved their WD Red "ideal for raid and NAS" drives over to SMR and
>> promptly started killing raid arrays left right and centre as people
>> replaced drives ... you now need Red Pro so the advice for raid is just
>> "Avoid WD".
> Red Plus is fine, too. I think the “Plus” is marketing speak for non-SMR.
> Which is why probably SMRs now have the price tag of old CMRs, and the new
> CMRs have a “plus” on the price tag.
>
>> From what I can make out with Seagate, the old Barracuda line is pretty
>> much all CMR, they had just started making some of them SMR when the
>> brown stuff hit the rotating blades.
> Seagate made a statement that their NAS drives are not and never will be SMR.
>
>
>
> In case someone is interested, here’s a little experience report:
>
> Two days ago, I bought a 2.5″ WD My Passport 4 TB for a new off-site backup
> strategy I want to implement. They even killed the rubber feet on the
> underside to save a few cents. >:'-( Interestingly, the even cheaper
> Elements series (which is the cheapest because it has no complimentary
> software and no encryption or password feature) still has them. Probably
> because its case design is older.
>
> I just finished transferring my existing Borg backup repos. Right at the
> beginning, I tested a small repo of 3 GiB and I got good throughput. After
> around 2 GiB or so the drive went down to 10 MiB/s for a very long time
> (writing at least another 3 GiB, I have no idea what that was).
>
> I was already pondering my options. But once that was over, I’ve since been
> writing 1.2 TiB to the drive with rsync happily without any glitches,
> averaging slightly above 100 MiB/s. I used SMR-friendly ext4 settings and
> Borg uses datafiles of 500 MiB size, which greatly reduces sprinkled
> metadata writes b/c it’s only a few thousand files instead of millions.
>
> According to smartctl, the drive claims to support Trim, but so far I’ve
> been unsuccessful to invoke it with fstrim. First I had to enable the
> allow-discard option in the underlying LUKS container, which is disabled by
> default for security reasons. But either I’m still missing a detail, or the
> USB-SATA-bridge really does not support it. Or it does, but the kernel is
> unaware: yesterday I read an article about enabling a flag for the USB
> controller via a custom UDEV rule. Who knows.
>
I am using a Seagate USB3 backup disk (4 TB SMR) for borgbackup on
btrfs.  Yes, it works well on regular backups, but it's dismally slow
for anything else (operations that read or write large amounts of data
at once):

1. Adding a lot of new data to a repo is extra slow

2. btrfs scrub (a couple of days)

3. borg repair (days!)

I had an unscheduled crash that lost a btrfs segment - scrub showed it
as an uncorrectable error, so I deleted the file involved and borg repair
zeroed that part of the repo, so it's still ok. On a regular backup run
it's fine, but recovery time when you hit an error is a real problem.

BillK
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
Am Thu, Jul 29, 2021 at 11:31:48PM +0200 schrieb Frank Steinmetzger:
> Am Thu, Jul 29, 2021 at 10:55:18PM +0200 schrieb Frank Steinmetzger:
> > In case someone is interested, here’s a little experience report:
> > […]
> > I just finished transferring my existing Borg backup repos.
> > […]
> > I’ve since been writing 1,2 TiB to the drive with rsync happily without
> > any glitches, averaging at slightly above 100 MiB/s.
>
> Haha, and now I can’t unmount it (device is busy) despite lsof reporting
> nothing in use. So I did umount -l. That took care of the file system. But I can’t
> close the LUKS container either: “device is still in use”. So we are back on
> topic after all.

Riddle solved:
I was about to reboot the machine and for that I closed all tmux panes. That
revealed an mc instance put into the background. It wasn’t even showing a
directory on the drive I couldn’t unmount, but closing the program allowed
me to close the LUKS container.

--
Grüße | Greetings | Qapla’
Please do not share anything from, with or about me on any social network.

You shouldn’t always wind up clocks and kids;
you also have to let them run.
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On Fri, Jul 30, 2021 at 1:14 AM William Kenworthy <billk@iinet.net.au> wrote:
>
> 2. btrfs scrub (a couple of days)
>

Was this a read-only scrub, or did this involve repair (such as after
losing a disk/etc)?

My understanding of SMR is that it is supposed to perform identically
to CMR for reads. If you've just recently done a bunch of writes I
could see there being some slowdown due to garbage collection (the
drive has a CMR cache which gets written out to the SMR regions), but
other than that I'd think that reads would perform normally.

Now, writes are a whole different matter and SMR is going to perform
terribly unless it is a host-managed drive (which the consumer drives
aren't), and the filesystem is SMR-aware. I'm not aware of anything
FOSS but in theory a log-based filesystem should do just fine on
host-managed SMR, or at least as well as it would do on CMR (log-based
filesystems tend to get fragmented, which is a problem on non-SSDs
unless the application isn't prone to fragmentation in the first
place, such as for logs).
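
Side note: it is at least easy to see what the kernel thinks it is dealing
with - drive-managed consumer disks simply report "none" here, sdX being
whatever the disk shows up as:

  cat /sys/block/sdX/queue/zoned    # none / host-aware / host-managed
  blkzone report /dev/sdX | head    # zone layout, for drives that actually expose zones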

Honestly I feel like the whole SMR thing is a missed opportunity,
mainly because manufacturers decided to use it as a way to save a few
bucks instead of as a new technology that can be embraced as long as
you understand its benefits and limitations. One thing I don't get is
why it is showing up on all sorts of smaller drives. I'd think the
main application would be for large drives - maybe a drive that is
14TB as CMR could have been formatted as 20TB as SMR for the same
price, and somebody could make that trade-off if it was worth it for
the application. Using it on smaller drives which are more likely to
be general-purpose is just going to cause issues for consumers who
have no idea what they're getting into, particularly since the changes
were sneaked into the product line. Somebody really needs to lose
their job over this...

--
Rich
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 30/07/2021 15:29, Rich Freeman wrote:
> Honestly I feel like the whole SMR thing is a missed opportunity,
> mainly because manufacturers decided to use it as a way to save a few
> bucks instead of as a new technology that can be embraced as long as
> you understand its benefits and limitations. One thing I don't get is
> why it is showing up on all sorts of smaller drives.

It's showing up on desktops because - PROVIDED THE CACHE IS LARGER THAN
THE AMOUNT OF DATA THE USER IS ABLE TO PUSH THROUGH IT - it performs just fine.
So if you're using a pre-installed OS, and you mostly use your computer
to surf the web, watch youtube videos, etc, you're not going to stress
said cache.

Then Windows decides to download a major update and response times go
pear-shaped ...

Basically, so long as you don't fill the cache, your typical luser is
unlikely to notice. The snag is people like us are much more likely to
do things that put the cache under i/o pressure, and it's like a
motorway - once the traffic goes above the carrying capacity, throughput
COLLAPSES.

Cheers,
Wol
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On Fri, Jul 30, 2021 at 12:50 PM antlists <antlists@youngman.org.uk> wrote:
>
> On 30/07/2021 15:29, Rich Freeman wrote:
> > Honestly I feel like the whole SMR thing is a missed opportunity,
> > mainly because manufacturers decided to use it as a way to save a few
> > bucks instead of as a new technology that can be embraced as long as
> > you understand its benefits and limitations. One thing I don't get is
> > why it is showing up on all sorts of smaller drives.
>
> It's showing up on desktops because - PROVIDED THE CACHE IS LARGER THAN
> THE AMOUNT OF DATA THE USER IS ABLE TO PUSH THROUGH IT - it performs just fine.
> So if you're using a pre-installed OS, and you mostly use your computer
> to surf the web, watch youtube videos, etc, you're not going to stress
> said cache.

Well, in such a configuration an NVMe or even a SATA SSD would be FAR
more desirable.

Though, I guess an issue is consumers who buy for the numbers on the
specs and don't know better. 4TB is better than 0.25TB, so SMR it
is...

--
Rich
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 30/7/21 10:29 pm, Rich Freeman wrote:
> On Fri, Jul 30, 2021 at 1:14 AM William Kenworthy <billk@iinet.net.au> wrote:
>> 2. btrfs scrub (a couple of days)
>>
> Was this a read-only scrub, or did this involve repair (such as after
> losing a disk/etc)?
>
> My understanding of SMR is that it is supposed to perform identically
> to CMR for reads. If you've just recently done a bunch of writes I
> could see there being some slowdown due to garbage collection (the
> drive has a CMR cache which gets written out to the SMR regions), but
> other than that I'd think that reads would perform normally.
>
> Now, writes are a whole different matter and SMR is going to perform
> terribly unless it is a host-managed drive (which the consumer drives
> aren't), and the filesystem is SMR-aware. I'm not aware of anything
> FOSS but in theory a log-based filesystem should do just fine on
> host-managed SMR, or at least as well as it would do on CMR (log-based
> filesystems tend to get fragmented, which is a problem on non-SSDs
> unless the application isn't prone to fragmentation in the first
> place, such as for logs).
>
> Honestly I feel like the whole SMR thing is a missed opportunity,
> mainly because manufacturers decided to use it as a way to save a few
> bucks instead of as a new technology that can be embraced as long as
> you understand its benefits and limitations. One thing I don't get is
> why it is showing up on all sorts of smaller drives. I'd think the
> main application would be for large drives - maybe a drive that is
> 14TB as CMR could have been formatted as 20TB as SMR for the same
> price, and somebody could make that trade-off if it was worth it for
> the application. Using it on smaller drives which are more likely to
> be general-purpose is just going to cause issues for consumers who
> have no idea what they're getting into, particularly since the changes
> were sneaked into the product line. Somebody really needs to lose
> their job over this...
>
No, it was a normal scrub (read-only is an option) - I did the scrub
hoping it wasn't necessary, but aware that a crash halting the OS in the
middle of a backup, while the system was generating oopses after an upgrade,
wasn't going to guarantee a clean shutdown. I've just kicked off a scrub
-r and am getting 41 MiB/s via the status check (it's USB3 on the
disk side, and USB2 on the PC - configuration: driver=usb-storage
maxpower=30mA speed=480Mbit/s). I will monitor for a couple of hours and
see what happens, then swap to a standard scrub and compare the read rate.

rattus ~ # date && btrfs scrub status
/run/media/wdk/cae17311-19ca-4e3c-b476-304e02c50694
Sat 31 Jul 2021 10:55:43 AWST
UUID:             cae17311-19ca-4e3c-b476-304e02c50694
Scrub started:    Sat Jul 31 10:52:07 2021
Status:           running
Duration:         0:03:35
Time left:        22:30:40
ETA:              Sun Aug  1 09:26:23 2021
Total to scrub:   3.23TiB
Bytes scrubbed:   8.75GiB  (0.26%)
Rate:             41.69MiB/s

Error summary:    no errors found


lsusb: Bus 003 Device 007: ID 0bc2:331a Seagate RSS LLC Desktop HDD 5TB
(ST5000DM000)

(Seagate lists it as a 5 TB drive-managed SMR model)

It was sold as a USB3 4 TB desktop expansion drive, fdisk -l shows "Disk
/dev/sde: 3.64 TiB, 4000787029504 bytes, 7814037167 sectors" and Seagate
is calling it 5 TB - marketing!

BillK
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 31/07/21 04:14, William Kenworthy wrote:
> (seagate lists it as a 5Tb drive managed SMR)
>
> It was sold as a USB3 4Tb desktop expansion drive, fdisk -l shows "Disk
> /dev/sde: 3.64 TiB, 4000787029504 bytes, 7814037167 sectors" and Seagate
> is calling it 5Tb - marketing!

Note that it's now official, TB is decimal and TiB is binary, so a 4 TB
drive being 3.64 TiB makes sense. TB is 10^12, while TiB is 2^40.
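
(Worked through: 4 TB = 4 x 10^12 bytes, and 1 TiB = 2^40 = roughly
1.0995 x 10^12 bytes, so 4 x 10^12 / 2^40 comes to about 3.64 TiB.)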

btw, you're scrubbing over USB? Are you running a raid over USB? Bad
things are likely to happen ...

Cheers,
Wol
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 31/7/21 11:50 am, Wols Lists wrote:
> On 31/07/21 04:14, William Kenworthy wrote:
>> (seagate lists it as a 5Tb drive managed SMR)
>>
>> It was sold as a USB3 4Tb desktop expansion drive, fdisk -l shows "Disk
>> /dev/sde: 3.64 TiB, 4000787029504 bytes, 7814037167 sectors" and Seagate
>> is calling it 5Tb - marketing!
> Note that it's now official, TB is decimal and TiB is binary, so a 4 TB
> drive being 3.64 TiB makes sense. TB is 10^12, while TiB is 2^40.
>
> btw, you're scrubbing over USB? Are you running a raid over USB? Bad
> things are likely to happen ...
>
> Cheers,
> Wol
>
I am amused in a cynical way at disk manufacturers using decimal values ...

It's not RAID, just a btrfs single on disk (no partition).  Contains a
single borgbackup repo for an offline backup of all the online
borgbackup repos I have for a 3-times-a-day backup rota of individual
machines/data stores - I get an insane amount of de-duplication that way
for a slight decrease in convenience!

BillK
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 31/7/21 11:14 am, William Kenworthy wrote:
> On 30/7/21 10:29 pm, Rich Freeman wrote:
>> […]
> No, it was a normal scrub (read only is an option) - I did the scrub
> hoping it wasn't necessary but aware that crash halting the OS while
> doing a backup while the system was generating ooops after an upgrade
> wasn't going to guarantee a clean shutdown. Ive just kicked off a scrub
> -r and am getting 41Mb/s speed via the status check (its a usb3 on the
> disk side, and usb2 on the PC - configuration: driver=usb-storage
> maxpower=30mA speed=480Mbit/s). I will monitor for a couple of hours and
> see what happens then swap to a standard scrub and compare the read rate.
>
> rattus ~ # date && btrfs scrub status
> /run/media/wdk/cae17311-19ca-4e3c-b476-304e02c50694
> Sat 31 Jul 2021 10:55:43 AWST
> UUID:             cae17311-19ca-4e3c-b476-304e02c50694
> Scrub started:    Sat Jul 31 10:52:07 2021
> Status:           running
> Duration:         0:03:35
> Time left:        22:30:40
> ETA:              Sun Aug  1 09:26:23 2021
> Total to scrub:   3.23TiB
> Bytes scrubbed:   8.75GiB  (0.26%)
> Rate:             41.69MiB/s
>
> Error summary:    no errors found
>
>
> lsusb: Bus 003 Device 007: ID 0bc2:331a Seagate RSS LLC Desktop HDD 5TB
> (ST5000DM000)
>
> (seagate lists it as a 5Tb drive managed SMR)
>
> It was sold as a USB3 4Tb desktop expansion drive, fdisk -l shows "Disk
> /dev/sde: 3.64 TiB, 4000787029504 bytes, 7814037167 sectors" and Seagate
> is calling it 5Tb - marketing!
>
> BillK
>
>
>
>
Still almost the same scrub speed, and 22.5 hours left after running for
nearly 2 1/2 hours.

rattus ~ # btrfs scrub status
/run/media/wdk/cae17311-19ca-4e3c-b476-304e02c50694

UUID:             cae17311-19ca-4e3c-b476-304e02c50694
Scrub started:    Sat Jul 31 10:52:07 2021
Status:           running
Duration:         2:22:44
Time left:        20:04:49
ETA:              Sun Aug  1 09:19:43 2021
Total to scrub:   3.23TiB
Bytes scrubbed:   350.41GiB  (10.59%)
Rate:             41.90MiB/s
Error summary:    no errors found
rattus ~ #


Cancelled and restarted it as a normal scrub - same speed/timings - I
think if errors are found, that's when it would slow down.

rattus ~ # btrfs scrub status
/run/media/wdk/cae17311-19ca-4e3c-b476-304e02c50694
UUID:             cae17311-19ca-4e3c-b476-304e02c50694
Scrub started:    Sat Jul 31 13:18:51 2021
Status:           running
Duration:         0:00:05
Time left:        22:27:47
ETA:              Sun Aug  1 11:46:43 2021
Total to scrub:   3.23TiB
Bytes scrubbed:   209.45MiB  (0.01%)
Rate:             41.89MiB/s
Error summary:    no errors found
rattus ~ #

BillK
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On Sat, Jul 31, 2021 at 12:58 AM William Kenworthy <billk@iinet.net.au> wrote:
>
> I am amused in a cynical way at disk manufacturers using decimal values ...
>

So, the disk manufacturers obviously have marketing motivations.
However, IMO the programming community would be well-served to just
join basically every other profession/industry on the planet and use
the SI units. If you want to use GiB to measure things by all means
do so, but at least stick the "i" in your output.

You're not going to change ANYTHING by using SI decimal prefixes to
refer to base-2 units. Everybody on the planet who isn't a programmer
is already using SI prefixes, recognizes SI as the authoritative
standards body, and most of the governments on the planet probably
have the SI prefixes enacted as a matter of law. No court on the
planet is going to recognize even the most accomplished computer
scientists on the planet as speaking with authority on this matter.

All sticking to the old prefix meanings does is confuse people,
because when you say "GB" nobody knows what you mean.

Plus it creates other kinds of confusion. Suppose you're measuring
recording densities in KB/mm^2. Under SI prefixes 1KB/mm^2 equals
1MB/m^2, and that is why basically every engineer/scientist/etc on the
planet loves the metric system. If you're going to use base-2 units
for bytes and base-10 for meters, now you have all sorts of conversion
headaches. The base-2 system only makes sense if you never combine
bytes with any other unit. I get that programming tends to be a bit
isolated from engineering and so we like to pretend that never
happens, but in EVERY other discipline units of measure tend to be
combined all the time, and it certainly happens in engineering real
computers that don't use infinitely long tapes and only exist in CS
textbooks. :)

Just to combine replies: by "read-only" scrubbing I wasn't referring
to using "-r" but rather just that the scrub wasn't repairing
anything. A scrub operation will repair problems it finds
automatically, and that would of course take a huge hit on SMR. I'd
expect a scrub that doesn't encounter problems to perform similarly on
CMR/SMR, and something that does a ton of repair to perform terribly
on SMR.

Your numbers suggest that the SMR drive is fine for scrubbing without
errors (and if you have no mirroring/parity of data then you can't do
repairs anyway). I'm guessing the drive was just busy while
scrubbing, and obviously a busy spinning disk isn't going to scrub
very quickly (that might be tunable, but if you prioritize scrubbing
then regular IO will tank - typically you want scrubbing at idle
priority).

--
Rich
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On Fri, Jul 30, 2021 at 11:50 PM Wols Lists <antlists@youngman.org.uk> wrote:
>
> btw, you're scrubbing over USB? Are you running a raid over USB? Bad
> things are likely to happen ...

So, USB hosts vary in quality I'm sure, but I've been running USB3
drives on lizardfs for a while now with zero issues.

At first I was shucking them and using LSI HBAs. That was a pain for
a bunch of reasons, and I would have issues probably due to the HBAs
being old or maybe cheap cable issues (and new SAS hardware carries a
hefty price tag).

Then I decided to just try running a drive on USB3 and it worked fine.
This isn't for heavy use, but it basically performs identically to
SATA. I did the math and for spinning disks you can get 2 drives per
host before the data rate starts to become a concern. This is for a
distributed filesystem and I'm just using gigabit ethernet, and the
cluster is needed more for capacity than IOPS, so USB3 isn't the
bottleneck anyway.
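
(Rough numbers: USB3 manages somewhere around 400-450 MB/s in practice, a
3.5" spinner peaks at maybe 200-250 MB/s sequential, so two drives roughly
fill the bus - and gigabit ethernet tops out near 115 MB/s anyway, so the
network is the cap long before USB3 is.)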

I have yet to have a USB drive have any sort of issue, or drop a
connection. And they're running on cheap Pi4s for the most part
(which have two USB3 hosts). If for some reason a drive or host
dropped, the filesystem is redundant at the host level, and it also
gracefully recovers data if a host shows back up, but I have yet to
see that even happen due to a USB issue. I've had far more issues
when I was trying to use LSI HBAs on RockPro64 SBCs (which have a PCIe
slot - I had to also use a powered riser).

Now, if you want to do something where you're going to be pulling
closer to max bandwidth out of all your disks at once and you have
more than a few disks and you have it on 10GbE or faster, then USB3
could be a bottleneck unless you have a lot of hosts (though even then
adding USB3 hosts to the motherboard might not be any harder than
adding SATA hosts).

--
Rich
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 31/7/21 8:21 pm, Rich Freeman wrote:
> On Fri, Jul 30, 2021 at 11:50 PM Wols Lists <antlists@youngman.org.uk> wrote:
>> btw, you're scrubbing over USB? Are you running a raid over USB? Bad
>> things are likely to happen ...
> So, USB hosts vary in quality I'm sure, but I've been running USB3
> drives on lizardfs for a while now with zero issues.
>
> At first I was shucking them and using LSI HBAs. That was a pain for
> a bunch of reasons, and I would have issues probably due to the HBAs
> being old or maybe cheap cable issues (and new SAS hardware carries a
> hefty price tag).
>
> Then I decided to just try running a drive on USB3 and it worked fine.
> This isn't for heavy use, but it basically performs identically to
> SATA. I did the math and for spinning disks you can get 2 drives per
> host before the data rate starts to become a concern. This is for a
> distributed filesystem and I'm just using gigabit ethernet, and the
> cluster is needed more for capacity than IOPS, so USB3 isn't the
> bottleneck anyway.
>
> I have yet to have a USB drive have any sort of issue, or drop a
> connection. And they're running on cheap Pi4s for the most part
> (which have two USB3 hosts). If for some reason a drive or host
> dropped the filesystem is redundant at the host level, and it also
> gracefully recovers data if a host shows back up, but I have yet to
> see that even happen due to a USB issue. I've had far more issues
> when I was trying to use LSI HBAs on RockPro64 SBCs (which have a PCIe
> slot - I had to also use a powered riser).
>
> Now, if you want to do something where you're going to be pulling
> closer to max bandwidth out of all your disks at once and you have
> more than a few disks and you have it on 10GbE or faster, then USB3
> could be a bottleneck unless you have a lot of hosts (though even then
> adding USB3 hosts to the motherboard might not be any harder than
> adding SATA hosts).
>
I'll generally agree with your USB3 comments - besides the backup disk I
am running moosefs on 5 Odroid HC2s (one old WD Red or Green on each;
the HC2 is a 32-bit big.LITTLE ARM system and uses a built-in USB-SATA
connection - excellent on a 5.12 kernel, just ok on the 4.x series) and an
Odroid C4 (arm64) with 2 ASMedia USB3 adaptors from ebay - the adaptors
are crap but do work somewhat with the right tweaks! - and a single SATA
SSD on the master (Intel).  I tried using moosefs with an rpi3B in the
mix and it didn't go well once I started adding data - rpi4s were not
available when I set it up.  I think that SMR disks will work quite well
on moosefs or lizardfs - I don't see long continuous writes to one disk
but a random distribution of writes across the cluster, with gaps between
on each disk (1G network).

With a good adaptor, USB3 is great ... otherwise it's been quite
frustrating :(  I do suspect Linux and its pedantic correctness trying
to deal with hardware that isn't truly standardised (as in the
manufacturer probably supplies a Windows driver that papers over it) is
part of the problem.  These adaptors are quite common, and I needed to
apply the ATA command filter and turn off UAS using the usb-storage
quirks mechanism to stop the crashes and data corruption.  The comments in
the kernel driver code for these adaptors are illuminating!
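
For reference, the knob in question is the usb-storage quirks parameter -
'u' ignores UAS and 't' filters the ATA pass-through commands. The VID:PID
comes from lsusb; 174c:55aa below is just the common ASMedia ID as an example:

  # /etc/modprobe.d/usb-storage-quirks.conf
  options usb-storage quirks=174c:55aa:tu
  # or on the kernel command line:  usb-storage.quirks=174c:55aa:tu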

BillK
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On Sat, Jul 31, 2021 at 8:59 AM William Kenworthy <billk@iinet.net.au> wrote:
>
> I tried using moosefs with a rpi3B in the
> mix and it didn't go well once I started adding data - rpi 4's were not
> available when I set it up.

Pi2/3s only have USB2 as far as I'm aware, and they stick the ethernet
port on that USB bus besides. So, they're terrible for anything that
involves IO of any kind.

The Pi4 moves the ethernet off of USB, upgrades it to gigabit, and has
two USB3 hosts, so this is just all-around a massive improvement.
Obviously it isn't going to outclass some server-grade system with a
gazillion PCIe v4 lanes but it is very good for an SBC and the price.

I'd love server-grade ARM hardware but it is just so expensive unless
there is some source out there I'm not aware of. It is crazy that you
can't get more than 4-8GiB of RAM on an affordable arm system.

> I think that SMR disks will work quite well
> on moosefs or lizardfs - I don't see long continuous writes to one disk
> but a random distribution of writes across the cluster with gaps between
> on each disk (1G network).

So, the distributed filesystems divide all IO (including writes)
across all the drives in the cluster. When you have a number of
drives that obviously increases the total amount of IO you can handle
before the SMR drives start hitting the wall. Writing 25GB of data to
a single SMR drive will probably overrun its CMR cache, but if you
split it across 10 drives and write 2.5GB each, there is a decent
chance they'll all have room in the cache, take the write quickly, and
then as long as your writes aren't sustained they can clear the
buffer.

I think you're still going to have an issue in a rebalancing scenario
unless you're adding many drives at once so that the network becomes
rate-limiting instead of the disks. Having unreplicated data sitting
around for days or weeks due to slow replication performance is
setting yourself up for multiple failures. So, I'd still stay away
from them.

If you have 10GbE then your ability to overrun those disks goes way
up. Ditto if you're running something like Ceph which can achieve
higher performance. I'm just doing bulk storage where I care a lot
more about capacity than performance. If I were trying to run a k8s
cluster or something I'd be on Ceph on SSD or whatever.

> With a good adaptor, USB3 is great ... otherwise its been quite
> frustrating :( I do suspect linux and its pedantic correctness trying
> to deal with hardware that isn't truly standardised (as in the
> manufacturer probably supplies a windows driver that covers it up) is
> part of the problem. These adaptors are quite common and I needed to
> apply the ATA command filter and turn off UAS using the usb tweaks
> mechanism to stop the crashes and data corruption. The comments in the
> kernel driver code for these adaptors are illuminating!

Sometimes I wonder. I occasionally get errors in dmesg about
unaligned writes when using zfs. Others have seen these. The zfs
developers seem convinced that the issue isn't with zfs but it simply
is reporting the issue, or maybe it happens under loads that you're
more likely to get with zfs scrubbing (which IMO performs far worse
than with btrfs - I'm guessing it isn't optimized to scan physically
sequentially on each disk but may be doing it in a more logical order
and synchronously between mirror pairs). Sometimes I wonder if there
is just some sort of bug in the HBA drivers, or maybe the hardware on
the motherboard. Consumer PC hardware (like all PC hardware) is
basically a black box unless you have pretty sophisticated testing
equipment and knowledge, so if your SATA host is messing things up how
would you know?

--
Rich
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 31/07/2021 05:58, William Kenworthy wrote:
> Its not raid, just a btrfs single on disk (no partition).  Contains a
> single borgbackup repo for an offline backup of all the online
> borgbackup repo's I have for a 3 times a day backup rota of individual
> machines/data stores - I get an insane amount of de-duplication that way
> for a slight decrease in conveniance!

So - are you using btrfs's replication ability to push a backup? Does it
just push the changed blocks?

I'm planning to pull a similar stunt - I've got my eye on an 8TB Toshiba
N300, which I shall put lvm on, and then (my filesystems are all ext4)
do a snapshot and in-place rsync.

Given that my entire dataset is about 2.5TB (including films, photos etc
that don't change), again, I've got a massive amount of space to hold
backups for ages ...
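
Roughly along these lines (volume and path names are illustrative only):

  # keep the previous state around as an LVM snapshot of the backup LV,
  # then rsync the live data over it in place
  lvcreate -s -L 100G -n backup-2021-08-01 vg_backup/backup
  rsync -aHAX --delete --inplace /home/ /mnt/backup/home/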

Cheers,
Wol
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
Am Sat, Jul 31, 2021 at 08:12:40AM -0400 schrieb Rich Freeman:

> Plus it creates other kinds of confusion. Suppose you're measuring
> recording densities in KB/mm^2. Under SI prefixes 1KB/mm^2 equals
> 1MB/m^2

*Cough* actually, 1 GB/m^2
;-)
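
(1 m^2 = 10^6 mm^2, so 1 kB/mm^2 = 10^6 kB/m^2 = 1 GB/m^2.)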

--
Grüße | Greetings | Qapla’
Please do not share anything from, with or about me on any social network.

Rather to meditate than sit around.
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
Am Sat, Jul 31, 2021 at 12:58:29PM +0800 schrieb William Kenworthy:

> Its not raid, just a btrfs single on disk (no partition).  Contains a
> single borgbackup repo for an offline backup of all the online
> borgbackup repo's I have for a 3 times a day backup rota of individual
> machines/data stores

So you are borg’ing a repo into a repo? I am planning on simply rsync’ing
the borg directory from one external HDD to another. Hopefully SMR can cope
with this adequately.

And you are storing several machines into a single repo? The docs say this
is not supported officially. But I have one repo each for /, /home and data
for both my PC and laptop. Using a wrapper script, I create snapshots that
are named $HOSTNAME_$DATE in each repo.
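
Roughly this shape (paths and retention numbers are only an example, not the
literal script):

  #!/bin/sh
  # one repo shared by PC and laptop; archives are named <host>_<date>
  REPO=/mnt/backup/home-repo
  borg create --stats --compression lz4 \
      "$REPO::$(hostname)_$(date +%Y-%m-%d)" /home
  # prune only this host's archives
  borg prune --prefix "$(hostname)_" --keep-daily 7 --keep-weekly 4 "$REPO"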

> - I get an insane amount of de-duplication that way for a slight decrease
> in conveniance!

And thanks to the cache, a new snapshot usually is done very fast. But for
a yet-unknown reason, sometimes Borg re-hashes all files, even though I
didn’t touch the cache. In that case it takes 2½ hours to go through my
video directory.

--
Grüße | Greetings | Qapla’
Please do not share anything from, with or about me on any social network.

“If a ship with 1000 investment bankers sinks in heavy seas, it is a
tragedy. If only one of them can swim, it is a disaster.” – Urban Priol
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On Sat, Jul 31, 2021 at 8:41 PM Frank Steinmetzger <Warp_7@gmx.de> wrote:
>
> Am Sat, Jul 31, 2021 at 08:12:40AM -0400 schrieb Rich Freeman:
>
> > Plus it creates other kinds of confusion. Suppose you're measuring
> > recording densities in KB/mm^2. Under SI prefixes 1KB/mm^2 equals
> > 1MB/m^2
>
> *Cough* actually, 1 GB/m^2

Ugh, this is why I always did bad on easy tests - I never check my
work. Indeed. I knew there was an 1E6 factor in there but maybe I
forgot I was already starting with 1E3...

--
Rich
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 31/7/21 9:30 pm, Rich Freeman wrote:
> On Sat, Jul 31, 2021 at 8:59 AM William Kenworthy <billk@iinet.net.au> wrote:
>> I tried using moosefs with a rpi3B in the
>> mix and it didn't go well once I started adding data - rpi 4's were not
>> available when I set it up.
> Pi2/3s only have USB2 as far as I'm aware, and they stick the ethernet
> port on that USB bus besides. So, they're terrible for anything that
> involves IO of any kind.
>
> The Pi4 moves the ethernet off of USB, upgrades it to gigabit, and has
> two USB3 hosts, so this is just all-around a massive improvement.
> Obviously it isn't going to outclass some server-grade system with a
> gazillion PCIe v4 lanes but it is very good for an SBC and the price.
>
> I'd love server-grade ARM hardware but it is just so expensive unless
> there is some source out there I'm not aware of. It is crazy that you
> can't get more than 4-8GiB of RAM on an affordable arm system.
Check out the Odroid range.  Same or only slightly more $$$ for a much
better unit than a Pi (except for the availability of 8G RAM on the Pi4).
None of the Pis I have had have come close, though I do not have a Pi4
and that looks from reading to be much closer in performance.  The
Odroid site includes comparison charts of Odroid against the RPi and it
also shows it getting closer in performance.  There are a few other
companies out there too.  I am hoping the popularity of the Pi 8G will
push others to match it. I found the supplied 4.9 or 4.14 kernels
problematic with random crashes, especially if USB was involved.  I am
currently using the 5.12 tobetter kernels and aarch64 or arm32-bit
Gentoo userlands.
>
>> I think that SMR disks will work quite well
>> on moosefs or lizardfs - I don't see long continuous writes to one disk
>> but a random distribution of writes across the cluster with gaps between
>> on each disk (1G network).
> So, the distributed filesystems divide all IO (including writes)
> across all the drives in the cluster. When you have a number of
> drives that obviously increases the total amount of IO you can handle
> before the SMR drives start hitting the wall. Writing 25GB of data to
> a single SMR drive will probably overrun its CMR cache, but if you
> split it across 10 drives and write 2.5GB each, there is a decent
> chance they'll all have room in the cache, take the write quickly, and
> then as long as your writes aren't sustained they can clear the
> buffer.
Not strictly what I am seeing.  You request a file from MFS and the
first free chunkserver with the data replies.  Writing is similar
in that (depending on the creation arguments) a chunk is written to
whichever chunkserver responds fastest, then replicated.  Replication is done under
control of an algorithm that replicates a set number of chunks at a time
between a limited number of chunkservers in a stream depending on
replication status.  So I am seeing individual disk activity that is
busy for a few seconds, and then nothing for a short period - this
pattern has become more pronounced as I added chunkservers and would
seem to match the SMR requirements.  If I replace/rebuild (resilver) a
chunkserver, that one is a lot busier, but still not 100% continuous
write or read.  Moosefs uses a throttled replication methodology.  This
is with 7 chunkservers and 9 disks - more is definitely better for
performance.
> I think you're still going to have an issue in a rebalancing scenario
> unless you're adding many drives at once so that the network becomes
> rate-limiting instead of the disks. Having unreplicated data sitting
> around for days or weeks due to slow replication performance is
> setting yourself up for multiple failures. So, I'd still stay away
> from them.
I think at some point I am going to have to add an SMR disk and see what
happens - can't do it now though.
>
> If you have 10GbE then your ability to overrun those disks goes way
> up. Ditto if you're running something like Ceph which can achieve
> higher performance. I'm just doing bulk storage where I care a lot
> more about capacity than performance. If I were trying to run a k8s
> cluster or something I'd be on Ceph on SSD or whatever.
Tried ceph - ran away fast :) I have a lot of nearly static data but
also a number of LXC instances (running on an Odroid N2) with both the
LXC instance and data stored on the cluster.  These include email,
calendaring, DNS, webservers etc. - all work well.  The online borgbackup
repos are also stored on it. The limitations of community moosefs are the
single point of failure that is the master, plus the memory resource
requirements on the master.  I improved performance and master memory
requirements considerably by pushing the larger data sets (e.g., GiB of
mail files) into a container file stored on MFS and loop-mounted onto
the mailserver LXC instance.  Convoluted, but very happy with the
improvement it's made.
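
The container trick is nothing fancy - roughly the following, with invented
paths:

  truncate -s 100G /mnt/mfs/containers/mail.img
  mkfs.ext4 -F /mnt/mfs/containers/mail.img    # -F because it is a regular file, not a block device
  mount -o loop /mnt/mfs/containers/mail.img /srv/lxc/mail/rootfs/var/mail

MFS then only has to track one big file instead of millions of small mail
files.
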
>> With a good adaptor, USB3 is great ... otherwise its been quite
>> frustrating :( I do suspect linux and its pedantic correctness trying
>> to deal with hardware that isn't truly standardised (as in the
>> manufacturer probably supplies a windows driver that covers it up) is
>> part of the problem. These adaptors are quite common and I needed to
>> apply the ATA command filter and turn off UAS using the usb tweaks
>> mechanism to stop the crashes and data corruption. The comments in the
>> kernel driver code for these adaptors are illuminating!
> Sometimes I wonder. I occasionally get errors in dmesg about
> unaligned writes when using zfs. Others have seen these. The zfs
> developers seem convinced that the issue isn't with zfs but it simply
> is reporting the issue, or maybe it happens under loads that you're
> more likely to get with zfs scrubbing (which IMO performs far worse
> than with btrfs - I'm guessing it isn't optimized to scan physically
> sequentially on each disk but may be doing it in a more logical order
> and synchronously between mirror pairs). Sometimes I wonder if there
> is just some sort of bug in the HBA drivers, or maybe the hardware on
> the motherboard. Consumer PC hardware (like all PC hardware) is
> basically a black box unless you have pretty sophisticated testing
> equipment and knowledge, so if your SATA host is messing things up how
> would you know?
>
BillK
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 1/8/21 8:50 am, Frank Steinmetzger wrote:
> Am Sat, Jul 31, 2021 at 12:58:29PM +0800 schrieb William Kenworthy:
>
>> Its not raid, just a btrfs single on disk (no partition).  Contains a
>> single borgbackup repo for an offline backup of all the online
>> borgbackup repo's I have for a 3 times a day backup rota of individual
>> machines/data stores
> So you are borg’ing a repo into a repo? I am planning on simply rsync’ing
> the borg directory from one external HDD to another. Hopefully SMR can cope
> with this adequatly.
>
> And you are storing several machines into a single repo? The docs say this
> is not supported officially. But I have one repo each for /, /home and data
> for both my PC and laptop. Using a wrapper script, I create snapshots that
> are named $HOSTNAME_$DATE in each repo.

Basically yes: I use a once-per-hour snapshot of approximately 500 GiB of
data on moosefs, plus borgbackups 3 times a day to individual repos on
moosefs for each host.  3 times a day, the latest snapshot is stuffed
into a borg repo on moosefs and the old snapshots are deleted.  I
currently manually push all the repos into a borg repo on the USB3 SMR
drive once a day or so.

1. rsync (and cp etc.) are dismally slow on SMR - use where you have to,
avoid otherwise.

2. borgbackup with small updates goes very quick

3. borgbackup often to keep changes between updates small - time to
backup will stay short.

4. borg'ing a repo into a repo works extremely well - however there are
catches based around backup set names and the file change tests used
(ping me if you want the details; a rough sketch of the idea follows the
list).

5. Yes, I have had disasters (i.e., a poorly thought out rm -rf in a
moosefs directory, unstable power that took a while to cure, ...)
requiring under-fire restoration of both large and small datasets - it works!

6. Be careful of snapshot resources on moosefs - moosefs has a defined
amount of memory for each file stored.  Even with the lazy snapshot
method, taking a snapshot will about double the memory usage on the
master for that portion of the filesystem.  Also taking too many
snapshots multiplies the effect.  Once you go into swap, it becomes a
recovery effort.  Also keep in mind that trashtime is carried into the
snapshot, so the data may exist in trash even after deletion - it's
actually easy to create a DoS condition by not paying attention to this.
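
Re item 4, the rough shape of it (paths are examples only) - the directory
holding the online repos is simply fed to borg create on the offline repo:

  borg create --stats /mnt/smr/offline::repos_$(date +%F) /mnt/mfs/borg-repos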

BillK


>
>> - I get an insane amount of de-duplication that way for a slight decrease
>> in conveniance!
> And thanks to the cache, a new snapshots usually is done very fast. But for
> a yet unknown reason, sometimes Borg re-hashes all files, even though I
> didn’t touch the cache. In that case it takes 2½ hours to go through my
> video directory.
>
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 1/8/21 8:50 am, Frank Steinmetzger wrote:
> Am Sat, Jul 31, 2021 at 12:58:29PM +0800 schrieb William Kenworthy:
>
> ...
> And thanks to the cache, a new snapshots usually is done very fast. But for
> a yet unknown reason, sometimes Borg re-hashes all files, even though I
> didn’t touch the cache. In that case it takes 2½ hours to go through my
> video directory.
>
Borg will do that as an extra method of ensuring it's not missed any
changes.  I think the default is every 26 times it visits a file, so it's
a big hit the first time it starts but semi-randomises over time; it can
be set or disabled via an environment variable.
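
If memory serves, the knob is BORG_FILES_CACHE_TTL (default 20 in current
versions), which controls how many backup runs a files-cache entry survives
without being seen:

  # keep entries around longer so rotating through many source sets
  # doesn't trigger a full re-chunk
  export BORG_FILES_CACHE_TTL=60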

BillK
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On 1/8/21 11:36 am, William Kenworthy wrote:
> On 1/8/21 8:50 am, Frank Steinmetzger wrote:
>> Am Sat, Jul 31, 2021 at 12:58:29PM +0800 schrieb William Kenworthy:
>>
>>> Its not raid, just a btrfs single on disk (no partition).  Contains a
>>> single borgbackup repo for an offline backup of all the online
>>> borgbackup repo's I have for a 3 times a day backup rota of individual
>>> machines/data stores
>> So you are borg’ing a repo into a repo? I am planning on simply rsync’ing
>> the borg directory from one external HDD to another. Hopefully SMR can cope
>> with this adequatly.
>>
>> And you are storing several machines into a single repo? The docs say this
>> is not supported officially. But I have one repo each for /, /home and data
>> for both my PC and laptop. Using a wrapper script, I create snapshots that
>> are named $HOSTNAME_$DATE in each repo.
> Basicly yes: I use a once per hour snapshot of approximately 500Gib of
> data on moosefs, plus borgbackups 3 times a day to individual repos on
> moosefs for each host.  3 times a day, the latest snapshot is stuffed
> into a borg repo on moosefs and the old  snapshots are deleted.  I
> currently manually push all the repos into a borg repo on the USB3 SMR
> drive once a day or so.
>
> 1. rsync (and cp etc.) are dismally slow on SMR - use where you have to,
> avoid otherwise.
>

Forgot to mention:

1a. borgbackup repos are not easily copyable - each repo has a unique
ID, and copying via rsync creates a duplicate, not a new repo with a new
cache and metadata, which depending on how you use them can cause
corruption/data loss.  Google it.
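
If in doubt whether two directories are the same repo or genuinely
independent ones, the ID is visible in the repo's config file, e.g.:

  grep '^id' /path/to/repo/config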

BillK
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
On Sat, Jul 31, 2021 at 11:05 PM William Kenworthy <billk@iinet.net.au> wrote:
>
> On 31/7/21 9:30 pm, Rich Freeman wrote:
> >
> > I'd love server-grade ARM hardware but it is just so expensive unless
> > there is some source out there I'm not aware of. It is crazy that you
> > can't get more than 4-8GiB of RAM on an affordable arm system.
> Checkout the odroid range. Same or only slightly $$$ more for a much
> better unit than a pi (except for the availability of 8G ram on the pi4)

Oh, they have been on my short list.

I was opining about the lack of cheap hardware with >8GB of RAM, and I
don't believe ODROID offers anything like that. I'd be happy if they
just took DDR4 on top of whatever onboard RAM they had.

My SBCs for the lizardfs cluster are either Pi4s or RockPro64s. The
Pi4 addresses basically all the issues in the original Pis as far as
I'm aware, and is comparable to most of the ODroid stuff I believe (at
least for the stuff I need), and they're still really cheap. The
RockPro64 was a bit more expensive but also performs nicely - I bought
that to try playing around with LSI HBAs to get many SATA drives on
one SBC.

I'm mainly storing media so capacity matters more than speed. At the
time most existing SBCs either didn't have SATA or had something like
1-2 ports, and that means you're ending up with a lot of hosts. Sure,
it would perform better, but it costs more. Granted, at the start I
didn't want more than 1-2 drives per host anyway until I got up to
maybe 5 or so hosts just because that is where you see the cluster
perform well and have decent safety margins, but at this point if I
add capacity it will be to existing hosts.

> Tried ceph - run away fast :)

Yeah, it is complex, and most of the tools for managing it created
concerns that if something went wrong they could really mess the whole
thing up fast. The thing that pushed me away from it was reports that
it doesn't perform well only a few OSDs and I wanted something I could
pilot without buying a lot of hardware. Another issue is that at
least at the time I was looking into it they wanted OSDs to have 1GB
of RAM per 1TB of storage. That is a LOT of RAM. Aside from the fact
that RAM is expensive, it basically eliminates the ability to use
low-power cheap SBCs for all the OSDs, which is what I'm doing with
lizardfs. I don't care about the SBCs being on 24x7 when they pull a
few watts each peak, and almost nothing when idle. If I want to
attach even 4x14TB hard drives to an SBC though it would need 64GB of
RAM per the standards of Ceph at the time. Good luck finding a cheap
low-power ARM board that has 64GB of RAM - anything that even had DIMM
slots was something crazy like $1k at the time and at that point I
might as well build full PCs.

It seems like they've backed off on the memory requirements, maybe,
but I'd want to check on that. I've seen stories of bad things
happening when the OSDs don't have much RAM and you run into a
scenario like:
1. Lose disk, cluster starts to rebuild.
2. Lose another disk, cluster queues another rebuild.
3. Oh, first disk comes back, cluster queues another rebuild to
restore the first disk.
4. Replace the second failed disk, cluster queues another rebuild.

Apparently at least in the old days all the OSDs had to keep track of
all of that and they'd run out of RAM and basically melt down, unless
you went around adding more RAM to every OSD.

With LizardFS the OSDs basically do nothing at all but pipe stuff to
disk. If you want to use full-disk encryption then there is a CPU hit
for that, but that is all outside of Lizardfs and dm-crypt at least is
reasonable. (zfs on the other hand does not hardware accelerate it on
SBCs as far as I can tell and that hurts.)
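
A quick way to see what a given box can actually do for dm-crypt - and
whether something like Adiantum beats AES on a board without the ARM crypto
extensions - is:

  cryptsetup benchmark
  # or for one specific cipher; spec string from memory, check the man page:
  cryptsetup benchmark -c xchacha20,aes-adiantum-plain64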

> I improved performance and master memory
> requirements considerably by pushing the larger data sets (e.g., Gib of
> mail files) into a container file stored on MFS and loop mounted onto
> the mailserver lxc instance. Convoluted but very happy with the
> improvement its made.

Yeah, I've noticed as you described in the other email memory depends
on number of files, and it depends on having it all in RAM at once.
I'm using it for media storage mostly so the file count is modest. I
do use snapshots but only a few at a time so it can handle that.
While the master is running on amd64 with plenty of RAM I do have
shadow masters set up on SBCs and I do want to be able to switch over
to one if something goes wrong, so I want RAM use to be acceptable.
It really doesn't matter how much space the files take up - just how
many inodes you have.

--
Rich
Re: [OT] SMR drives (WAS: cryptsetup close and device in use when it is not) [ In reply to ]
Am Sun, Aug 01, 2021 at 11:46:02AM +0800 schrieb William Kenworthy:

> >> And you are storing several machines into a single repo? The docs say this
> >> is not supported officially. But I have one repo each for /, /home and data
> >> for both my PC and laptop. Using a wrapper script, I create snapshots that
> >> are named $HOSTNAME_$DATE in each repo.
> > Basicly yes: I use a once per hour snapshot of approximately 500Gib of
> > data on moosefs, plus borgbackups 3 times a day to individual repos on
> > moosefs for each host.  3 times a day, the latest snapshot is stuffed
> > into a borg repo on moosefs and the old  snapshots are deleted.  I
> > currently manually push all the repos into a borg repo on the USB3 SMR
> > drive once a day or so.
> >
> > 1. rsync (and cp etc.) are dismally slow on SMR - use where you have to,
> > avoid otherwise.
> >
>
> > forgot to mention
>
> 1a. borgbackup repos are not easily copy'able - each repo has a unique
> ID and copy'ing via rsync creates a duplicate, not a new repo with a new
> cache and metadata which depending on how you use can cause
> corruption/data loss.  Google it.

Yup. Today I did my (not so) weekly backup and rsynced the repo to the new
drive. After that I wanted to compare the performance of my old 3 TB drive and
the new SMR one by deleting a snapshot from the repo on each drive. But Borg
objected to the second deletion, because “the cache was newer”. But that’s
okay. I actually like this, as it will prevent me from changing two repos
in parallel, which would make them incompatible.

--
Grüße | Greetings | Qapla’
Please do not share anything from, with or about me on any social network.

Fat stains become like new if they are regularly treated with butter.
