Mailing List Archive

Hard drive error from SMART
Howdy,

As some know, I recently moved a LOT of data around.  Seems to have
stressed one of my drives.  I got an email from SMART reporting an error. 
Here's the info:


The following warning/error was logged by the smartd daemon:

Device: /dev/sdd [SAT], 1 Currently unreadable (pending) sectors


The following warning/error was logged by the smartd daemon:

Device: /dev/sdd [SAT], 1 Offline uncorrectable sectors


This is from smartctl. 


ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       23544426
  3 Spin_Up_Time            0x0003   087   086   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       50
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       4
  7 Seek_Error_Rate         0x000f   094   060   045    Pre-fail  Always       -       2694155454
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always       -       24299 (121 195 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       35
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   086   000    Old_age   Always       -       14 14 14
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   061   059   040    Old_age   Always       -       39 (Min/Max 30/41)
191 G-Sense_Error_Rate      0x0032   092   092   000    Old_age   Always       -       17952
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       498
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1044
194 Temperature_Celsius     0x0022   039   041   000    Old_age   Always       -       39 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   031   001   000    Old_age   Always       -       23544426
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x00b3   100   100   099    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       24215h+54m+57.249s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       18070332014
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       18343277504



The nutshell is #5 up there.  #198 was an issue until I ran the long
selftest.  It moved to #5 plus added 3 or 4 it seems.  According to
google results, it should be fine for now.  Still, a replacement drive
is on the way and I've unmounted the drives for that LVM.  They're still
spinning and running a selftest but nothing else should be accessing
them.  This is also from the selftest. 


SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 90%     24299         -
# 2  Short offline       Completed without error       00%     24298         -
# 3  Extended offline    Completed without error       00%     24291         -
# 4  Extended offline    Aborted by host               10%     24266         -
# 5  Short offline       Completed without error       00%     24218         -
# 6  Short offline       Completed without error       00%     24194         -
# 7  Short offline       Completed without error       00%     24171         -
# 8  Short offline       Completed without error       00%     24146         -

The one I aborted was because it was stuck on 10% for well over a day. 
The whole test doesn't take that long, or shouldn't anyway.  I restarted
it shortly after that.  I might add, the test did take many hours longer
than it estimated which from my past experience is quite odd.  It's
usually pretty accurate.  Still, it completed and shows it passed, just
has a boo boo on it.  I also did a file system check; it fixed a couple
of problems and a bunch of little things I see corrected often on bootup. 
Something about length of something.  Seems trivial. 
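
For readers who want to watch the same counters over time, here is a minimal Python sketch that pulls attributes 5, 197 and 198 out of `smartctl -A` style text. The choice of those three IDs, the regular expression, and the names `parse_attributes` and `flagged` are assumptions made for illustration; the sample rows are copied from the table above.

```python
import re

# Attribute IDs this sketch watches (a choice made here for illustration,
# not something smartctl itself designates as special).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, then the leading integer of the raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       4
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always       -       24299 (121 195 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for every row that parses."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(9)))
    return attrs


def flagged(attrs):
    """Return {attribute_id: raw_value} for watched counters above zero."""
    return {i: attrs[i][1] for i in WATCHED if i in attrs and attrs[i][1] > 0}


if __name__ == "__main__":
    for attr_id, raw in flagged(parse_attributes(SAMPLE)).items():
        print(f"{attr_id} {WATCHED[attr_id]}: raw value {raw}")
```

The sketch only reads text, so it can be fed saved output rather than being run against a live drive.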

Given the low number and it showing it corrected that error, and then
passed a short and long test, is this drive "safe enough" to keep in
service?  I have backups just in case but just curious what others know
from experience.  At least this isn't one of those nasty messages that
the drive will die within 24 hours.  I got one of those ages ago and it
didn't miss it by much.  A little over 30 hours or so later, it was a
door stop.  It would spin but it couldn't even be seen by the BIOS. 
Maybe drives are getting better and SMART is getting better as well. 

Thoughts.  Replace as soon as drive arrives or wait and see?

Dale

:-)  :-)
Re: Hard drive error from SMART
On 12/04/2022 02:27, Dale wrote:
> The one I aborted was because it was stuck on 10% for well over a day.
> The whole test doesn't take that long, or shouldn't anyway.  I restarted
> it shortly after that.  I might add, the test did take many hours longer
> than it estimated which from my past experience is quite odd.  It's
> usually pretty accurate.  Still, it completed and shows it passed, just
> has a boo boo on it.  I also did a file system check it fixed a couple
> problems and a bunch of little things I see corrected often on bootup.
> Something about length of something.  Seems trivial.

Given that the firmware SOMETIMES gets its knickers in a twist,
especially consumer drives (not sure what yours are?), and read errors
are a dime a dozen, I wouldn't worry that much about ONE error.

Do another SMART test after your next reboot. Any NEW errors will be a
red flag, but just this one again? Don't worry.
>
> Given the low number and it showing it corrected that error, and then
> passed a short and long test, is this drive "safe enough" to keep in
> service?  I have backups just in case but just curious what others know
> from experience.  At least this isn't one of those nasty messages that
> the drive will die within 24 hours.  I got one of those ages ago and it
> didn't miss it by much.  A little over 30 hours or so later, it was a
> door stop.  It would spin but it couldn't even be seen by the BIOS.
> Maybe drives are getting better and SMART is getting better as well.

SMART is a lot better than it was, but remember, it only picks up wear
and tear. Mechanical failure is just as deadly, and usually strikes out
of the blue. I saw some stats somewhere it's something like 1/3, 2/3
wear and tear picked up by SMART, and mechanical failure undetectable by
smart. Can't remember which stat was which.
>
> Thoughts.  Replace as soon as drive arrives or wait and see?

If you get a couple of errors, then no more for months, the drive is
probably fine. If you get new errors every time you test, ditch it ASAP.
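
The "new errors every time you test" rule can be sketched as a comparison between two readings of the raw counters. This is only an illustration: the function name `new_errors`, the dictionary layout, and the default set of watched attribute IDs are assumptions, not part of any SMART tool.

```python
def new_errors(previous, current, watched=(5, 197, 198)):
    """Return {attribute_id: increase} for watched counters that grew.

    Both arguments map attribute IDs to raw values taken at two
    different times; a missing ID is treated as zero.
    """
    grown = {}
    for attr_id in watched:
        delta = current.get(attr_id, 0) - previous.get(attr_id, 0)
        if delta > 0:
            grown[attr_id] = delta
    return grown


if __name__ == "__main__":
    last_month = {5: 4, 197: 0, 198: 0}
    today = {5: 4, 197: 0, 198: 0}
    # An empty result means no watched counter went up between readings.
    print(new_errors(last_month, today))
```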

Either way, make sure it's backed up!

Cheers,
Wol
Re: Hard drive error from SMART
Wols Lists wrote:
> On 12/04/2022 02:27, Dale wrote:
>> The one I aborted was because it was stuck on 10% for well over a day.
>> The whole test doesn't take that long, or shouldn't anyway.  I restarted
>> it shortly after that.  I might add, the test did take many hours longer
>> than it estimated which from my past experience is quite odd.  It's
>> usually pretty accurate.  Still, it completed and shows it passed, just
>> has a boo boo on it.  I also did a file system check it fixed a couple
>> problems and a bunch of little things I see corrected often on bootup.
>> Something about length of something.  Seems trivial.
>
> Given that the firmware SOMETIMES gets its knickers in a twist,
> especially consumer drives (not sure what yours are?), and read errors
> are a dime a dozen, I wouldn't worry that much about ONE error.
>
> Do another SMART test after your next reboot. Any NEW errors will be a
> red flag, but just this one again? Don't worry.


That seems to be what my google searches revealed.  After all, nothing
is perfect.  I'm sometimes surprised that drives aren't shipped with a
couple of these.  I'll keep my backups up to date as usual tho.  ;-)


>>
>> Given the low number and it showing it corrected that error, and then
>> passed a short and long test, is this drive "safe enough" to keep in
>> service?  I have backups just in case but just curious what others know
>> from experience.  At least this isn't one of those nasty messages that
>> the drive will die within 24 hours.  I got one of those ages ago and it
>> didn't miss it by much.  A little over 30 hours or so later, it was a
>> door stop.  It would spin but it couldn't even be seen by the BIOS.
>> Maybe drives are getting better and SMART is getting better as well.
>
> SMART is a lot better than it was, but remember, it only picks up wear
> and tear. Mechanical failure is just as deadly, and usually strikes
> out of the blue. I saw some stats somewhere it's something like 1/3,
> 2/3 wear and tear picked up by SMART, and mechanical failure
> undetectable by smart. Can't remember which stat was which.

My understanding is that SMART detects media problems and sometimes even
when an electronic component is getting out of spec.  However, it is
unlikely to detect that the spindle motor or the mechanism that moves
the heads is about to go out.  It can detect some things but not
everything.  From my understanding, it is mostly about monitoring the
magnetic media itself.  It is, however, better than nothing at all. 


>>
>> Thoughts.  Replace as soon as drive arrives or wait and see?
>
> If you get a couple of errors, then no more for months, the drive is
> probably fine. If you get new errors every time you test, ditch it ASAP.
>
> Either way, make sure it's backed up!
>
> Cheers,
> Wol
>
>


Sounds like a plan.  Drive should be here Friday.  I'll keep an eye on
it.  It's down to 10% on long selftest and no errors reported yet.  I'll
keep the drive unmounted until Friday tho, just in case. 

Thanks for the opinions. 

Dale

:-)  :-) 
RE: Hard drive error from SMART
> -----Original Message-----
> From: Dale <rdalek1967@gmail.com>
> Sent: Monday, April 11, 2022 6:28 PM
> To: gentoo-user@lists.gentoo.org
> Subject: [gentoo-user] Hard drive error from SMART
>
> Given the low number and it showing it corrected that error, and then passed a short and long test, is this drive "safe enough" to keep in service? I have backups just in case but just curious what others know from experience. At least this isn't one of those nasty messages that the drive will die within 24 hours. I got one of those ages ago and it didn't miss it by much. A little over 30 hours or so later, it was a door stop. It would spin but it couldn't even be seen by the BIOS.
> Maybe drives are getting better and SMART is getting better as well.
>
> Thoughts. Replace as soon as drive arrives or wait and see?
>
> Dale
>
> :-) :-)
>
When it's just one or two errors like that and they don't keep going up I tend to treat it as an isolated incident, but the drive still goes into the pool I use with RAID just in case.

Preferably a setup where you can lose more than one disk without losing the data.

Note that, depending on where the bad sector is, when it gets remapped the extra seek necessary to read that logical address could slow the drive down substantially. Make sure your filesystem's root inode or something doesn't end up on top of it.

Sometimes I miss the old drives where all this was handled by the OS and so you knew exactly what sector was bad and your filesystem could be told to just not use it. Made scanning for bad sectors more annoying, but deciding how bad the drive was rather easier.

LMP
Re: Hard drive error from SMART
On Mon, Apr 11, 2022 at 9:27 PM Dale <rdalek1967@gmail.com> wrote:
>
> Thoughts. Replace as soon as drive arrives or wait and see?
>

So, first of all just about all my hard drives are in a RAID at this
point, so I have a higher tolerance for issues.

If a drive is under warranty I'll usually try to see if they will RMA
it. More often than not they will, and in that case there is really
no reason not to. I'll do advance shipping and replace the drive
before sending the old one back so that I mostly have redundancy the
whole time.

If it isn't under warranty then I'll scrub it and see what happens.
I'll of course do SMART self-tests, but usually an error like this
won't actually clear until you overwrite the offline sector so that
the drive can reallocate it. A RAID scrub/resilver/etc will overwrite
the sector with the correct contents which will allow this to happen.
(Otherwise there is no way for the drive to recover - if it knew what
was stored there it wouldn't have an error in the first place.)

If an error comes back then I'll replace the drive. My drives are
pretty large at this point so I don't like keeping unreliable drives
around. It just increases the risk of double failures, given that a
large hard drive can take more than a day to replace. Write speeds
just don't keep pace with capacities. I do have offline backups but I
shudder at the thought of how long one of those would take to restore.

--
Rich
Re: Hard drive error from SMART
Rich Freeman wrote:
> On Mon, Apr 11, 2022 at 9:27 PM Dale <rdalek1967@gmail.com> wrote:
>> Thoughts. Replace as soon as drive arrives or wait and see?
>>
> So, first of all just about all my hard drives are in a RAID at this
> point, so I have a higher tolerance for issues.
>
> If a drive is under warranty I'll usually try to see if they will RMA
> it. More often than not they will, and in that case there is really
> no reason not to. I'll do advance shipping and replace the drive
> before sending the old one back so that I mostly have redundancy the
> whole time.
>
> If it isn't under warranty then I'll scrub it and see what happens.
> I'll of course do SMART self-tests, but usually an error like this
> won't actually clear until you overwrite the offline sector so that
> the drive can reallocate it. A RAID scrub/resilver/etc will overwrite
> the sector with the correct contents which will allow this to happen.
> (Otherwise there is no way for the drive to recover - if it knew what
> was stored there it wouldn't have an error in the first place.)
>
> If an error comes back then I'll replace the drive. My drives are
> pretty large at this point so I don't like keeping unreliable drives
> around. It just increases the risk of double failures, given that a
> large hard drive can take more than a day to replace. Write speeds
> just don't keep pace with capacities. I do have offline backups but I
> shudder at the thought of how long one of those would take to restore.
>


Sadly, I don't have RAID here but to be honest, I really need to have it
given the data and my recent luck with hard drives.  Drives used to get
dumped because they were just too small to use anymore.  Nowadays, they
seem to break in some fashion long before their usefulness ends their
lives. 

I remounted the drives and did a backup.  For anyone running up on this,
just in case one of the files got corrupted, I used a little trick to
see if I can figure out which one may be bad if any.  I took my rsync
commands from my little script and ran them one at a time with --dry-run
added.  If a file was to be updated on the backup that I hadn't changed
or added, I was going to check into it before updating my backups.  It
could be that the backup file was still good and the file on my drive
reporting problems was bad.  In that case, I would determine which was
good and either restore it from backups or allow it to be updated if
needed.  Either way, I should have a good file since the drive claims to
have fixed the problem.  Now let us pray.  :-D 

Drive isn't under warranty.  I may have to start buying new drives from
dealers.  Sometimes I find drives that are pulled from systems and have
very few hours on them.  Still, warranty may not last long.  Saves a lot
of money tho. 

USPS claims drive is on the way.  Left a distribution point and should
update again when it gets close.  First said Saturday, then said
Friday.  I think Friday is about right but if the wind blows right,
maybe Thursday. 

I hope I have another port and power cable plug for the swap out.  At
least now, I can unmount it and swap without a lot of rebooting.  Since
it's on LVM, that part is easy.  Regretfully I have experience on that
process.  :/

Thanks to all. 

Dale

:-)  :-) 
RE: Hard drive error from SMART
> -----Original Message-----
> From: Dale <rdalek1967@gmail.com>
> Sent: Tuesday, April 12, 2022 10:08 AM
> To: gentoo-user@lists.gentoo.org
> Subject: Re: [gentoo-user] Hard drive error from SMART
>
> Rich Freeman wrote:
> > On Mon, Apr 11, 2022 at 9:27 PM Dale <rdalek1967@gmail.com> wrote:
> >> Thoughts. Replace as soon as drive arrives or wait and see?
> >>
> > So, first of all just about all my hard drives are in a RAID at this
> > point, so I have a higher tolerance for issues.
> >
> > If a drive is under warranty I'll usually try to see if they will RMA
> > it. More often than not they will, and in that case there is really
> > no reason not to. I'll do advance shipping and replace the drive
> > before sending the old one back so that I mostly have redundancy the
> > whole time.
> >
> > If it isn't under warranty then I'll scrub it and see what happens.
> > I'll of course do SMART self-tests, but usually an error like this
> > won't actually clear until you overwrite the offline sector so that
> > the drive can reallocate it. A RAID scrub/resilver/etc will overwrite
> > the sector with the correct contents which will allow this to happen.
> > (Otherwise there is no way for the drive to recover - if it knew what
> > was stored there it wouldn't have an error in the first place.)
> >
> > If an error comes back then I'll replace the drive. My drives are
> > pretty large at this point so I don't like keeping unreliable drives
> > around. It just increases the risk of double failures, given that a
> > large hard drive can take more than a day to replace. Write speeds
> > just don't keep pace with capacities. I do have offline backups but I
> > shudder at the thought of how long one of those would take to restore.
> >
>
>
> Sadly, I don't have RAID here but to be honest, I really need to have it given the data and my recent luck with hard drives. Drives used to get dumped because they were just to small to use anymore. Nowadays, they seem to break in some fashion long before their usefulness ends their lives.
>
> I remounted the drives and did a backup. For anyone running up on this, just in case one of the files got corrupted, I used a little trick to see if I can figure out which one may be bad if any. I took my rsync commands from my little script and ran them one at a time with --dry-run added. If a file was to be updated on the backup that I hadn't changed or added, I was going to check into it before updating my backups. It could be that the backup file was still good and the file on my drive reporting problems was bad. In that case, I would determine which was good and either restore it from backups or allow it to be updated if needed. Either way, I should have a good file since the drive claims to have fixed the problem. Now let us pray. :-D
>
> Drive isn't under warranty. I may have to start buying new drives from dealers. Sometimes I find drives that are pulled from systems and have very few hours on them. Still, warranty may not last long. Saves a lot of money tho.
>
> USPS claims drive is on the way. Left a distribution point and should update again when it gets close. First said Saturday, then said Friday. I think Friday is about right but if the wind blows right, maybe Thursday.
>
> I hope I have another port and power cable plug for the swap out. At least now, I can unmount it and swap without a lot of rebooting. Since it's on LVM, that part is easy. Regretfully I have experience on that process. :/
>
> Thanks to all.
>
> Dale
>
> :-) :-)
>
>
You can get up to 16X SATA PCI-e cards these days for pretty cheap. So as long as you have the power to run another drive or two there's not much reason not to do RAID on the important stuff. Also, the SATA protocol allows for port expanders, which are also pretty cheap.

One of my favorite things about BTRFS is the data checksums. If the drive returns garbage, it turns into a read error. Also, if you can't do real RAID, but have excess space you can tell it to keep two copies of everything. Doesn't help with total drive failure, but does protect against the occasional failed sector. If you don't mind writes taking twice as long anyway.
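
The idea of "garbage turns into a read error, and a second copy can rescue it" can be shown with a toy model. This is not how BTRFS lays data out on disk; the `Block` class, the use of `zlib.crc32`, and the two in-memory copies are all assumptions chosen to keep the sketch short.

```python
import zlib


class Block:
    """Toy data block stored twice alongside one checksum."""

    def __init__(self, data):
        self.checksum = zlib.crc32(data)
        # Two independent copies, standing in for duplicated storage.
        self.copies = [bytearray(data), bytearray(data)]

    def read(self):
        """Return the first copy whose checksum matches, else raise."""
        for copy in self.copies:
            if zlib.crc32(bytes(copy)) == self.checksum:
                return bytes(copy)
        raise OSError("checksum mismatch on every copy")


if __name__ == "__main__":
    block = Block(b"important data")
    block.copies[0][0] ^= 0xFF  # simulate one failed sector
    print(block.read())         # still readable from the second copy
```

With a single copy, the same mismatch would surface as an error instead of silently returning bad data, which is the behaviour described above.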

LMP
Re: Hard drive error from SMART
On Tue, Apr 12, 2022 at 12:08:24PM -0500, Dale wrote:
> Rich Freeman wrote:
> > On Mon, Apr 11, 2022 at 9:27 PM Dale <rdalek1967@gmail.com> wrote:
> >> Thoughts. Replace as soon as drive arrives or wait and see?
> >>
> > So, first of all just about all my hard drives are in a RAID at this
> > point, so I have a higher tolerance for issues.

> Sadly, I don't have RAID here but to be honest, I really need to have it
> given the data and my recent luck with hard drives. 

Plus, if you do a Raid 5 or Raid-Z1, you use your capacity more efficiently
with just three drives. However, when I was building my NAS 5½ years ago,
there was already an article about Raid-5 becoming obsolete due to the ever
rising drive capacity. Because if you have a failed drive and need to
replace and rebuild, the chances that another drive fails during rebuild
rises with the drive capacity.
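
One common form of that argument (which may or may not be the one the article made) is about unrecoverable read errors rather than whole-drive failure: a rebuild has to read every surviving drive in full, so the chance of hitting at least one bad read grows with the amount of data. The sketch below assumes a rate of one unrecoverable error per 1e14 bits read, a figure often quoted on consumer drive datasheets, and assumes errors are independent; both are simplifications, and `p_read_error` is a name made up here.

```python
import math


def p_read_error(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error.

    Computes 1 - (1 - ure_per_bit) ** bits, using log1p/expm1 so the
    tiny per-bit rate does not get lost to floating-point rounding.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    TB = 10**12
    # Three-drive RAID 5: a rebuild reads the two surviving drives in full.
    for drive_tb in (1, 4, 12):
        p = p_read_error(2 * drive_tb * TB)
        print(f"{drive_tb} TB drives: {p:.0%} chance of a read error during rebuild")
```

Real drives often do better than the datasheet figure, so this is an upper-bound style estimate rather than a prediction.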

> Drives used to get dumped because they were just to small to use anymore. 
> Nowadays, they seem to break in some fashion long before their usefulness
> ends their lives. 

I recently bought a passive mini-pc (zotac zbox) and just for the fun of it
installed a 160 GB HDD that maxes out at around 40 MiB/s. You do NOT want to
run a modern Linux desktop on such a drive. :D

> I remounted the drives and did a backup.  For anyone running up on this,
> just in case one of the files got corrupted, I used a little trick to
> see if I can figure out which one may be bad if any.  I took my rsync
> commands from my little script and ran them one at a time with --dry-run
> added.

I actually developed a tool for that. It creates and checks md5 checksums
recursively and *per directory*. Whenever I copy stuff from somewhere, like
a music album, I do an immediate md5 run on that directory. And when I later
copy that stuff around, I simply run the tool again on the copy (after the
FS cache was flushed, for example by unmounting and remounting) to see
whether the checksums are still valid.

You can find it on github: https://github.com/felf/dh
It’s a single-file python application, because I couldn’t be bothered with
the myriad ways of creating a python package. ;-)
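
The per-directory idea can be sketched in a few lines of Python. This is not the code of the tool linked above; the checksum file name `Checksums.md5`, the line format, and the function names are assumptions for illustration only.

```python
import hashlib
import os

CHECKSUM_NAME = "Checksums.md5"  # hypothetical per-directory file name


def file_md5(path, chunk=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def write_checksums(root):
    """Write one checksum file into every directory that contains files."""
    for dirpath, _dirs, files in os.walk(root):
        names = sorted(n for n in files if n != CHECKSUM_NAME)
        if not names:
            continue
        target = os.path.join(dirpath, CHECKSUM_NAME)
        with open(target, "w", encoding="utf-8") as out:
            for name in names:
                out.write(f"{file_md5(os.path.join(dirpath, name))}  {name}\n")


def verify_checksums(root):
    """Return paths that are missing or whose digest no longer matches."""
    bad = []
    for dirpath, _dirs, files in os.walk(root):
        if CHECKSUM_NAME not in files:
            continue
        with open(os.path.join(dirpath, CHECKSUM_NAME), encoding="utf-8") as listing:
            for line in listing:
                digest, name = line.rstrip("\n").split("  ", 1)
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path) or file_md5(path) != digest:
                    bad.append(path)
    return bad
```

Because each directory carries its own list, a copied subdirectory can be checked on its own with `verify_checksums(copied_dir)` without editing any checksum file.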

--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

A horse comes into a bar.
Barkeep: “Hey!”
Horse: “Sure.”
RE: Hard drive error from SMART
> -----Original Message-----
> From: Frank Steinmetzger <Warp_7@gmx.de>
> Sent: Tuesday, April 12, 2022 10:39 AM
> To: gentoo-user@lists.gentoo.org
> Subject: Re: [gentoo-user] Hard drive error from SMART
>
>
> I actually developed a tool for that. It creates and checks md5 checksums recursively and *per directory*. Whenever I copy stuff from somewhere, like a music album, I do an immediate md5 run on that directory. And when I later copy that stuff around, I simply run the tool again on the copy (after the FS cache was flushed, for example by unmounting and remounting) to see whether the checksums are still valid.
>
> You can find it on github: https://github.com/felf/dh It’s a single-file python application, because I couldn’t be bothered with the myriad ways of creating a python package. ;-)
>
> --
> Grüße | Greetings | Salut | Qapla’
> Please do not share anything from, with or about me on any social network.
>
> A horse comes into a bar.
> Barkeep: “Hey!”
> Horse: “Sure.”
>
There's also app-crypt/md5deep

Does a number of hashes, is threaded, has options for piecewise hashing and a matching mode for using the hashes to find duplicates. Also a number of input and output filters for those cases where you don't want to hash everything.

Also can output a number of formats, but reformatting is generally trivial.

LMP
Re: Hard drive error from SMART
Laurence Perkins wrote:
>> -----Original Message-----
>> From: Dale <rdalek1967@gmail.com>
>> Sent: Tuesday, April 12, 2022 10:08 AM
>> To: gentoo-user@lists.gentoo.org
>> Subject: Re: [gentoo-user] Hard drive error from SMART
>>
>> Rich Freeman wrote:
>>> On Mon, Apr 11, 2022 at 9:27 PM Dale <rdalek1967@gmail.com> wrote:
>>>> Thoughts. Replace as soon as drive arrives or wait and see?
>>>>
>>> So, first of all just about all my hard drives are in a RAID at this
>>> point, so I have a higher tolerance for issues.
>>>
>>> If a drive is under warranty I'll usually try to see if they will RMA
>>> it. More often than not they will, and in that case there is really
>>> no reason not to. I'll do advance shipping and replace the drive
>>> before sending the old one back so that I mostly have redundancy the
>>> whole time.
>>>
>>> If it isn't under warranty then I'll scrub it and see what happens.
>>> I'll of course do SMART self-tests, but usually an error like this
>>> won't actually clear until you overwrite the offline sector so that
>>> the drive can reallocate it. A RAID scrub/resilver/etc will overwrite
>>> the sector with the correct contents which will allow this to happen.
>>> (Otherwise there is no way for the drive to recover - if it knew what
>>> was stored there it wouldn't have an error in the first place.)
>>>
>>> If an error comes back then I'll replace the drive. My drives are
>>> pretty large at this point so I don't like keeping unreliable drives
>>> around. It just increases the risk of double failures, given that a
>>> large hard drive can take more than a day to replace. Write speeds
>>> just don't keep pace with capacities. I do have offline backups but I
>>> shudder at the thought of how long one of those would take to restore.
>>>
>>
>> Sadly, I don't have RAID here but to be honest, I really need to have it given the data and my recent luck with hard drives. Drives used to get dumped because they were just to small to use anymore. Nowadays, they seem to break in some fashion long before their usefulness ends their lives.
>>
>> I remounted the drives and did a backup. For anyone running up on this, just in case one of the files got corrupted, I used a little trick to see if I can figure out which one may be bad if any. I took my rsync commands from my little script and ran them one at a time with --dry-run added. If a file was to be updated on the backup that I hadn't changed or added, I was going to check into it before updating my backups. It could be that the backup file was still good and the file on my drive reporting problems was bad. In that case, I would determine which was good and either restore it from backups or allow it to be updated if needed. Either way, I should have a good file since the drive claims to have fixed the problem. Now let us pray. :-D
>>
>> Drive isn't under warranty. I may have to start buying new drives from dealers. Sometimes I find drives that are pulled from systems and have very few hours on them. Still, warranty may not last long. Saves a lot of money tho.
>>
>> USPS claims drive is on the way. Left a distribution point and should update again when it gets close. First said Saturday, then said Friday. I think Friday is about right but if the wind blows right, maybe Thursday.
>>
>> I hope I have another port and power cable plug for the swap out. At least now, I can unmount it and swap without a lot of rebooting. Since it's on LVM, that part is easy. Regretfully I have experience on that process. :/
>>
>> Thanks to all.
>>
>> Dale
>>
>> :-) :-)
>>
>>
> You can get up to 16X SATA PCI-e cards these days for pretty cheap. So as long as you have the power to run another drive or two there's not much reason not to do RAID on the important stuff. Also, the SATA protocol allows for port expanders, which are also pretty cheap.
>
> One of my favorite things about BTRFS is the data checksums. If the drive returns garbage, it turns into a read error. Also, if you can't do real RAID, but have excess space you can tell it to keep two copies of everything. Doesn't help with total drive failure, but does protect against the occasional failed sector. If you don't mind writes taking twice as long anyway.
>
> LMP


I looked into a card a good while back and they were pretty pricey at
the time.  You happen to have some search terms I can search for on
ebay, Amazon etc?  I know some chipsets work better on Linux out of the
box.  I don't need to buy one that doesn't work or only works with the
threat of a sledge hammer.  lol  I've also looked into that other thing,
SAS? or something.  It's been a while tho. 

I'm pretty good at doing backups.  I do Gentoo updates on Saturday, and
sometimes Sunday.  While the updates are downloading, I update my
backups.  It's almost like a religion for me.  I was just more cautious
earlier.  I suspect a file could be corrupted somewhere but wanted to be
sure it wasn't something important.  I have some files that if lost, I
may not be able to download again.  They don't exist.  A few I got from some
Govt archive that are really old but since removed, or at least I can't
find them anymore. 

I've given serious thought to switching to BTRFS.  Thing is, I'm still
trying to get LVM figured out.  Plus, LVM is well maintained and should
be for a good long while, plus it works for me.  Still, if I could
afford to have several new drives all at once, I'd certainly play with
it.  It could very well be better.  The one thing I wish, LVM had a GUI
where you could do everything from it.  During my recent rearrangement
of drives, I learned that you can't do a lot of things within webmin. 
It does some things but not everything.  Plus, you have to have a
running GUI to use it.  In that case, I had to unmount /home which meant
no KDE, so no Webmin either.  Still, that could cause trouble too.  I
dunno. 

Thanks.

Dale

:-)  :-)
Re: Hard drive error from SMART
On Tue, Apr 12, 2022 at 06:09:13PM +0000, Laurence Perkins wrote:

> > I actually developed a tool for that. It creates and checks md5
> > checksums recursively and *per directory*. Whenever I copy stuff from
> > somewhere, like a music album, I do an immediate md5 run on that
> > directory. And when I later copy that stuff around, I simply run the
> > tool again on the copy (after the FS cache was flushed, for example by
> > unmounting and remounting) to see whether the checksums are still valid.
> >
> There's also app-crypt/md5deep
>
> Does a number of hashes, is threaded, has options for piecewise hashing and a matching mode for using the hashes to find duplicates. Also a number of input and output filters for those cases where you don't want to hash everything.

I knew about md5deep when I started with my own tool (as can be read in the
readme ;-) ). But md5deep used one single md5 file at a tree’s root, whereas
I wanted one file per directory in a tree. The reason being that I wanted to
be able to copy individual directories and still check their hashes without
editing checksum files.
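
A minimal sketch of the per-directory idea described above, not the actual
tool. The file name checksums.md5 and the function names are assumptions made
up for illustration. Each directory gets its own checksum file covering only
the files directly inside it, so a single directory can be copied elsewhere
and still be verified on its own.

```python
import hashlib
import os

CHECKSUM_NAME = "checksums.md5"  # hypothetical file name, one per directory


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(root):
    """Write one checksum file into every directory below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        names = sorted(n for n in filenames if n != CHECKSUM_NAME)
        if not names:
            continue
        target = os.path.join(dirpath, CHECKSUM_NAME)
        with open(target, "w", encoding="utf-8") as out:
            for name in names:
                out.write(f"{md5_of(os.path.join(dirpath, name))}  {name}\n")


def verify_checksums(root):
    """Return paths of files that are missing or whose digest changed."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if CHECKSUM_NAME not in filenames:
            continue
        listing_path = os.path.join(dirpath, CHECKSUM_NAME)
        with open(listing_path, encoding="utf-8") as listing:
            for line in listing:
                expected, name = line.rstrip("\n").split("  ", 1)
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path) or md5_of(path) != expected:
                    bad.append(path)
    return bad
```

Running verify_checksums() on a copied subdirectory works without touching
any checksum file, which is the point of keeping one file per directory.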

--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

If you were born feet-first, then, for a short moment,
you wore your mother as a hat.
Re: Hard drive error from SMART [ In reply to ]
On 12/04/2022 18:21, Laurence Perkins wrote:
> You can get up to 16X SATA PCI-e cards these days for pretty cheap. So as long as you have the power to run another drive or two there's not much reason not to do RAID on the important stuff. Also, the SATA protocol allows for port expanders, which are also pretty cheap.
>
> One of my favorite things about BTRFS is the data checksums. If the drive returns garbage, it turns into a read error. Also, if you can't do real RAID, but have excess space you can tell it to keep two copies of everything. Doesn't help with total drive failure, but does protect against the occasional failed sector. If you don't mind writes taking twice as long anyway.

https://raid.wiki.kernel.org/index.php/Linux_Raid

https://raid.wiki.kernel.org/index.php/System2020

That system in the second link is the system being used to type this
message ...

Cheers,
Wol
Re: Hard drive error from SMART [ In reply to ]
On Tue, Apr 12, 2022 at 1:08 PM Dale <rdalek1967@gmail.com> wrote:
>
> I remounted the drives and did a backup. For anyone running up on this,
> just in case one of the files got corrupted, I used a little trick to
> see if I can figure out which one may be bad if any. I took my rsync
> commands from my little script and ran them one at a time with --dry-run
> added. If a file was to be updated on the backup that I hadn't changed
> or added, I was going to check into it before updating my backups.

Unless you're using the --checksum option on rsync this isn't likely
to be effective. By default rsync only looks at size and mtime, so it
isn't going to back up a file unless you intentionally changed it. If
data was silently corrupted this wouldn't detect a change at all
without the --checksum option.
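
To illustrate the difference, here is a small sketch of the two comparison
strategies. This is not rsync's code, and the helper names are made up:
quick_check_differs() mimics a size-and-mtime test, while content_differs()
compares a hash of the contents, which is roughly the kind of check that
--checksum adds.

```python
import hashlib
import os


def quick_check_differs(src, dst):
    """Size-and-mtime comparison, similar in spirit to rsync's default."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size != b.st_size or int(a.st_mtime) != int(b.st_mtime)


def content_differs(src, dst):
    """Compare file contents via a hash, as a checksum-based test would."""
    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()
    return digest(src) != digest(dst)
```

A file whose bytes changed on disk while its size and mtime stayed the same
passes the first test unnoticed and is only caught by the second, at the cost
of reading both copies in full.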

Ultimately if you care about silent corruptions you're best off using
a solution that actually achieves this. btrfs, zfs, or something
whipped up with dm-integrity would be best. At a file level you could
store multiple files and hashes, or use a solution like PAR2. Plain
mdadm raid1 will fix issues if the drive detects and reports errors
(the drive typically has a checksum to do this, but it is a black box
and may not always work). The other solutions will reliably detect
and possibly recover errors even if the drive fails to detect them (a
so-called silent error).

Just about all my linux data these days is on a solution that detects
silent errors - zfs or lizardfs. On ssd-based systems where I don't
want to invest in mirroring I still run zfs to detect errors and just
use frequent backups (ssds are small anyway so they're cheap to
frequently back up, especially if they're on zfs where there are
send-based backup scripts for this, and typically this is for OS
drives where things don't change much anyway).

--
Rich
RE: Hard drive error from SMART [ In reply to ]
>-----Original Message-----
>From: Dale <rdalek1967@gmail.com>
>Sent: Tuesday, April 12, 2022 11:22 AM
>To: gentoo-user@lists.gentoo.org
>Subject: Re: [gentoo-user] Hard drive error from SMART
>
>Laurence Perkins wrote:
>>> -----Original Message-----
>>> From: Dale <rdalek1967@gmail.com>
>>> Sent: Tuesday, April 12, 2022 10:08 AM
>>> To: gentoo-user@lists.gentoo.org
>>> Subject: Re: [gentoo-user] Hard drive error from SMART
>>>
>>> Rich Freeman wrote:
>>>> On Mon, Apr 11, 2022 at 9:27 PM Dale <rdalek1967@gmail.com> wrote:
>>>>> Thoughts. Replace as soon as drive arrives or wait and see?
>>>>>
>>>> So, first of all just about all my hard drives are in a RAID at this
>>>> point, so I have a higher tolerance for issues.
>>>>
>>>> If a drive is under warranty I'll usually try to see if they will
>>>> RMA it. More often than not they will, and in that case there is
>>>> really no reason not to. I'll do advance shipping and replace the
>>>> drive before sending the old one back so that I mostly have
>>>> redundancy the whole time.
>>>>
>>>> If it isn't under warranty then I'll scrub it and see what happens.
>>>> I'll of course do SMART self-tests, but usually an error like this
>>>> won't actually clear until you overwrite the offline sector so that
>>>> the drive can reallocate it. A RAID scrub/resilver/etc will
>>>> overwrite the sector with the correct contents which will allow this to happen.
>>>> (Otherwise there is no way for the drive to recover - if it knew
>>>> what was stored there it wouldn't have an error in the first place.)
>>>>
>>>> If an error comes back then I'll replace the drive. My drives are
>>>> pretty large at this point so I don't like keeping unreliable drives
>>>> around. It just increases the risk of double failures, given that a
>>>> large hard drive can take more than a day to replace. Write speeds
>>>> just don't keep pace with capacities. I do have offline backups but
>>>> I shudder at the thought of how long one of those would take to restore.
>>>>
>>>
>>> Sadly, I don't have RAID here but to be honest, I really need to have it given the data and my recent luck with hard drives. Drives used to get dumped because they were just too small to use anymore. Nowadays, they seem to break in some fashion long before their usefulness ends their lives.
>>>
>>> I remounted the drives and did a backup. For anyone running up on
>>> this, just in case one of the files got corrupted, I used a little
>>> trick to see if I can figure out which one may be bad if any. I took
>>> my rsync commands from my little script and ran them one at a time
>>> with --dry-run added. If a file was to be updated on the backup that
>>> I hadn't changed or added, I was going to check into it before
>>> updating my backups. It could be that the backup file was still good
>>> and the file on my drive reporting problems was bad. In that case, I
>>> would determine which was good and either restore it from backups or
>>> allow it to be updated if needed. Either way, I should have a good
>>> file since the drive claims to have fixed the problem. Now let us
>>> pray. :-D
>>>
>>> Drive isn't under warranty. I may have to start buying new drives from dealers. Sometimes I find drives that are pulled from systems and have very few hours on them. Still, warranty may not last long. Saves a lot of money tho.
>>>
>>> USPS claims drive is on the way. Left a distribution point and should update again when it gets close. First said Saturday, then said Friday. I think Friday is about right but if the wind blows right, maybe Thursday.
>>>
>>> I hope I have another port and power cable plug for the swap out. At
>>> least now, I can unmount it and swap without a lot of rebooting.
>>> Since it's on LVM, that part is easy. Regretfully I have experience
>>> on that process. :/
>>>
>>> Thanks to all.
>>>
>>> Dale
>>>
>>> :-) :-)
>>>
>>>
>> You can get up to 16X SATA PCI-e cards these days for pretty cheap. So as long as you have the power to run another drive or two there's not much reason not to do RAID on the important stuff. Also, the SATA protocol allows for port expanders, which are also pretty cheap.
>>
>> One of my favorite things about BTRFS is the data checksums. If the drive returns garbage, it turns into a read error. Also, if you can't do real RAID, but have excess space you can tell it to keep two copies of everything. Doesn't help with total drive failure, but does protect against the occasional failed sector. If you don't mind writes taking twice as long anyway.
>>
>> LMP
>
>
>I looked into a card a good while back and they were pretty pricey at the time. You happen to have some search terms I can search for on ebay, Amazon etc? I know some chipsets work better on Linux out of the box. I don't need to buy one that doesn't work or only works with the threat of a sledge hammer. lol I've also looked into that other thing, SAS? or something. It's been a while tho.
>
>I'm pretty good at doing backups. I do Gentoo updates on Saturday, and sometimes Sunday. While the updates are downloading, I update my backups. It's almost like a religion for me. I was just more cautious earlier. I suspect a file could be corrupted somewhere but wanted to be sure it wasn't something important. I have some files that if lost, I may not can download again. They don't exist. A few I got from some Govt archive that are really old but since removed, or at least I can't find them anymore.
>
>I've given serious thought to switching to BTRFS. Thing is, I'm still trying to get LVM figured out. Plus, LVM is well maintained and should be for a good long while, plus it works for me. Still, if I could afford to have several new drives all at once, I'd certainly play with it. It could very well be better. The one thing I wish, LVM had a GUI where you could do everything from it. During my recent rearrangement of drives, I learned that you can't do a lot of things within webmin. It does some things but not everything. Plus, you have to have a running GUI to use it. In that case, I had to unmount /home which meant no KDE, so no Webmin either. Still, that could cause trouble too. I dunno.
>
>Thanks.
>
>Dale
>
>:-) :-)
>
>

I went with a couple of https://www.amazon.com/MZHOU-Profile-Bracket-Support-Converter/dp/B08L7W8QFT/ in a couple different sizes for two of my mass storage systems and they seem to be doing OK.

The difference between the cheap vendors and the expensive vendors these days tends to be quality control. So plug it in, load it up, run it hard for a few hours. If it doesn't die relatively quickly you're usually good.

Especially if you have RAID with checksums it's difficult for a controller to mangle things too badly even if it does have an issue.

Remember: Data does not exist if it doesn't exist in at least three places. So you still want off-site backups in case your house burns down. Especially for irreplaceable things.

If you have friends who also want off-site backups and you leave your machines running all the time then tahoe-lafs is pretty decent. For that matter they don't even have to really be friends, you really only have to be able to trust them to not selfishly hog all the space.

I use BTRFS RAID1 for a lot of stuff. So far it's been pretty good at catching dropped bits and recovering from failures. It has a bit of the RAID issue where a drive could fail while you're doing a recovery since it only guarantees integrity with one dud drive regardless of the number of drives in the pool. But since each chunk is only written to two drives instead of spread across all of them the rebuild time stays relatively short and even if another drive does fail you'll only lose some of the data instead of all of it. This also means that the wasted space when your drives aren't all the same size is kept to a minimum.

ZFS and similar are arguably better for larger arrays, but are also more hassle to set up.

LVM is good for being able to swap out drives easily but with the modern, huge drives you really want data checksums if you can get them. Otherwise all it takes is a flipped bit somewhere to wreck your data and drive firmware doesn't always notice. I think you can do that with LVM, but I've never looked into it for certain.

LMP
Re: Hard drive error from SMART [ In reply to ]
On 12/04/2022 20:41, Laurence Perkins wrote:
> LVM is good for being able to swap out drives easily but with the modern, huge drives you really want data checksums if you can get them. Otherwise all it takes is a flipped bit somewhere to wreck your data and drive firmware doesn't always notice. I think you can do that with LVM, but I've never looked into it for certain.

Look at that link for my system that I posted. I use dm-integrity, so a
flipped bit will trigger a failure at the raid-5 level and recover.

For those people looking at btrfs - note that parity-raid (5 or 6) is
not a wise idea at the moment so you don't get two-failure protection ...

Cheers,
Wol
Re: Hard drive error from SMART [ In reply to ]
Laurence Perkins wrote:
> I went with a couple of https://www.amazon.com/MZHOU-Profile-Bracket-Support-Converter/dp/B08L7W8QFT/ in a couple different sizes for two of my mass storage systems and they seem to be doing OK.
>
> The difference between the cheap vendors and the expensive vendors these days tends to be quality control. So plug it in, load it up, run it hard for a few hours. If it doesn't die relatively quickly you're usually good.
>
> Especially if you have RAID with checksums it's difficult for a controller to mangle things too badly even if it does have an issue.
>
> Remember: Data does not exist if it doesn't exist in at least three places. So you still want off-site backups in case your house burns down. Especially for irreplaceable things.
>
> If you have friends who also want off-site backups and you leave your machines running all the time then tahoe-lafs is pretty decent. For that matter they don't even have to really be friends, you really only have to be able to trust them to not selfishly hog all the space.
>
> I use BTRFS RAID1 for a lot of stuff. So far it's been pretty good at catching dropped bits and recovering from failures. It has a bit of the RAID issue where a drive could fail while you're doing a recovery since it only guarantees integrity with one dud drive regardless of the number of drives in the pool. But since each chunk is only written to two drives instead of spread across all of them the rebuild time stays relatively short and even if another drive does fail you'll only lose some of the data instead of all of it. This also means that the wasted space when your drives aren't all the same size is kept to a minimum.
>
> ZFS and similar are arguably better for larger arrays, but are also more hassle to set up.
>
> LVM is good for being able to swap out drives easily but with the modern, huge drives you really want data checksums if you can get them. Otherwise all it takes is a flipped bit somewhere to wreck your data and drive firmware doesn't always notice. I think you can do that with LVM, but I've never looked into it for certain.
>
> LMP

I looked at that card and read some of the reviews.  Some claim they had
issues but I suspect a driver problem.  Can you do a lspci -k and see
what driver it uses for that card on your system?  If yours works fine,
I'd want to use the same driver. 

That is a lot of drives tho.  I need to build a NAS thingy.  lol

Dale

:-)  :-) 
Re: Hard drive error from SMART [ In reply to ]
Wols Lists wrote:
> On 12/04/2022 18:21, Laurence Perkins wrote:
>> You can get up to 16X SATA PCI-e cards these days for pretty cheap. 
>> So as long as you have the power to run another drive or two there's
>> not much reason not to do RAID on the important stuff.  Also, the
>> SATA protocol allows for port expanders, which are also pretty cheap.
>>
>> One of my favorite things about BTRFS is the data checksums.  If the
>> drive returns garbage, it turns into a read error.  Also, if you
>> can't do real RAID, but have excess space you can tell it to keep two
>> copies of everything.  Doesn't help with total drive failure, but
>> does protect against the occasional failed sector.  If you don't mind
>> writes taking twice as long anyway.
>
> https://raid.wiki.kernel.org/index.php/Linux_Raid
>
> https://raid.wiki.kernel.org/index.php/System2020
>
> That system in the second link is the system being used to type this
> message ...
>
> Cheers,
> Wol
>
>


Neat setup.  I need something similar for a NAS setup thingy.  Just got
way too much going on right now. 

Dale

:-)  :-) 
Re: Hard drive error from SMART [ In reply to ]
Rich Freeman wrote:
> On Tue, Apr 12, 2022 at 1:08 PM Dale <rdalek1967@gmail.com> wrote:
>> I remounted the drives and did a backup. For anyone running up on this,
>> just in case one of the files got corrupted, I used a little trick to
>> see if I can figure out which one may be bad if any. I took my rsync
>> commands from my little script and ran them one at a time with --dry-run
>> added. If a file was to be updated on the backup that I hadn't changed
>> or added, I was going to check into it before updating my backups.
> Unless you're using the --checksum option on rsync this isn't likely
> to be effective. By default rsync only looks at size and mtime, so it
> isn't going to back up a file unless you intentionally changed it. If
> data was silently corrupted this wouldn't detect a change at all
> without the --checksum option.
>
> Ultimately if you care about silent corruptions you're best off using
> a solution that actually achieves this. btrfs, zfs, or something
> whipped up with dm-integrity would be best. At a file level you could
> store multiple files and hashes, or use a solution like PAR2. Plain
> mdadm raid1 will fix issues if the drive detects and reports errors
> (the drive typically has a checksum to do this, but it is a black box
> and may not always work). The other solutions will reliably detect
> and possibly recover errors even if the drive fails to detect them (a
> so-called silent error).
>
> Just about all my linux data these days is on a solution that detects
> silent errors - zfs or lizardfs. On ssd-based systems where I don't
> want to invest in mirroring I still run zfs to detect errors and just
> use frequent backups (ssds are small anyway so they're cheap to
> frequently back up, especially if they're on zfs where there are
> send-based backup scripts for this, and typically this is for OS
> drives where things don't change much anyway).
>


My hope was if it was corrupted and something changed then I'd see it in
the list.  If nothing changed then rsync wouldn't change anything on the
backups either.  I'll look into that option tho.  May be something for
the future.  ;-)  I suspect it would slow things down quite a bit tho. 

Dale

:-)  :-)
Re: Hard drive error from SMART [ In reply to ]
On Tue, Apr 12, 2022 at 3:01 PM Dale <rdalek1967@gmail.com> wrote:
<SNIP>
> Neat setup. I need something similar for a NAS setup thingy. Just got
> way too much going on right now.
>
> Dale
>
> :-) :-)
>

LOL. Watching this thread made me start a round of backups to my NAS
thingy Dale. ;-)

Mark
RE: Hard drive error from SMART [ In reply to ]
>-----Original Message-----
>From: Wol <antlists@youngman.org.uk>
>Sent: Tuesday, April 12, 2022 2:51 PM
>To: gentoo-user@lists.gentoo.org
>Subject: Re: [gentoo-user] Hard drive error from SMART
>
>On 12/04/2022 20:41, Laurence Perkins wrote:
>> LVM is good for being able to swap out drives easily but with the modern, huge drives you really want data checksums if you can get them. Otherwise all it takes is a flipped bit somewhere to wreck your data and drive firmware doesn't always notice. I think you can do that with LVM, but I've never looked into it for certain.
>
>Look at that link for my system that I posted. I use dm-integrity, so a flipped bit will trigger a failure at the raid-5 level and recover.
>
>For those people looking at btrfs - note that parity-raid (5 or 6) is not a wise idea at the moment so you don't get two-failure protection ...

Specifically if the system crashes or has a power failure there may be some data left hanging until it can complete a scrub. Disk failures during that period may lose some of said data.

How much of a risk that is depends on the stability of your power and kernel and how much data turnover you have. I only use it on systems with UPS power and additional backups. Needs careful monitoring of the drives too since system crashes due to drive failures can leave you in rather a sticky mess.

>
>Cheers,
>Wol
>
>
Re: Hard drive error from SMART [ In reply to ]
> For those people looking at btrfs - note that parity-raid (5 or 6) is not a wise idea at the moment so you don't get two-failure protection ...
>
> Cheers,
> Wol
>
I've been reading that this is less and less true. The write-hole issue is rather old now (first reported around 2016, I think). From what I read from various sources, the developers have made some progress and the problem is getting harder and harder to reproduce; see, for instance, [1].
Although some people recommend using RAID1 for the metadata, and RAID5/6 for the data, just in case.


Julien
[1] https://unixsheikh.com/articles/battle-testing-zfs-btrfs-and-mdadm-dm.html#btrfs-raid-5
Re: Hard drive error from SMART [ In reply to ]
On Tue, Apr 12, 2022 at 05:03:01PM -0500, Dale wrote:
> Rich Freeman wrote:
> > On Tue, Apr 12, 2022 at 1:08 PM Dale <rdalek1967@gmail.com> wrote:
> >> I remounted the drives and did a backup. For anyone running up on this,
> >> just in case one of the files got corrupted, I used a little trick to
> >> see if I can figure out which one may be bad if any. I took my rsync
> >> commands from my little script and ran them one at a time with --dry-run
> >> added. If a file was to be updated on the backup that I hadn't changed
> >> or added, I was going to check into it before updating my backups.
> > Unless you're using the --checksum option on rsync this isn't likely
> > to be effective.

> My hope was if it was corrupted and something changed then I'd see it in
> the list.  If nothing changed then rsync wouldn't change anything on the
> backups either.  I'll look into that option tho.  May be something for
> the future.  ;-)  I suspect it would slow things down quite a bit tho. 

The advantage of an integrity scheme (like ZFS or comparing with a checksum
file) over your rsync approach is that you only need to read all the datas™
from one drive instead of two. Plus: if rsync actually detects a change, it
doesn’t know which of the two drives introduced the error. You need to find
out yourself after the fact (which probably won’t be hard, but still, it’s
one more manual step).
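
A tiny sketch of that reasoning, with made-up function names: once a digest
was recorded while the file was known to be good, each copy can be judged on
its own, so a mismatch between two copies no longer leaves the question of
which one went bad.

```python
import hashlib


def sha256_hex(data):
    """Hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def find_bad_copy(recorded_digest, original, backup):
    """Say which copy no longer matches the digest recorded earlier."""
    original_ok = sha256_hex(original) == recorded_digest
    backup_ok = sha256_hex(backup) == recorded_digest
    if original_ok and backup_ok:
        return "both good"
    if original_ok:
        return "backup is bad"
    if backup_ok:
        return "original is bad"
    return "both bad"
```

Without the recorded digest, a plain comparison of the two copies can only
report that they differ.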

--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

“An itching nose must be scratched.” … Kosh (Star Wreck)
Re: Hard drive error from SMART [ In reply to ]
Frank Steinmetzger wrote:
> On Tue, Apr 12, 2022 at 05:03:01PM -0500, Dale wrote:
>> Rich Freeman wrote:
>>> On Tue, Apr 12, 2022 at 1:08 PM Dale <rdalek1967@gmail.com> wrote:
>>>> I remounted the drives and did a backup. For anyone running up on this,
>>>> just in case one of the files got corrupted, I used a little trick to
>>>> see if I can figure out which one may be bad if any. I took my rsync
>>>> commands from my little script and ran them one at a time with --dry-run
>>>> added. If a file was to be updated on the backup that I hadn't changed
>>>> or added, I was going to check into it before updating my backups.
>>> Unless you're using the --checksum option on rsync this isn't likely
>>> to be effective.
>> My hope was if it was corrupted and something changed then I'd see it in
>> the list.  If nothing changed then rsync wouldn't change anything on the
>> backups either.  I'll look into that option tho.  May be something for
>> the future.  ;-)  I suspect it would slow things down quite a bit tho. 
> The advantage of an integrity scheme (like ZFS or comparing with a checksum
> file) over your rsync approach is that you only need to read all the datas™
> from one drive instead of two. Plus: if rsync actually detects a change, it
> doesn’t know which of the two drives introduced the error. You need to find
> out yourself after the fact (which probably won’t be hard, but still, it’s
> one more manual step).
>


In this case, if something had changed, I'd have no problem manually
checking the file to be sure which was good and which was bad.  Given
the error is recent on my drive, I'd expect the backup copy to still be
good, and therefore not something to be overwritten.  I was trying to
avoid a bad file replacing a good file on the backup, which would destroy
the last good copy and leave only bad ones.  This is why I like that
SMART at least let me know there is a problem. 

Sometimes things have to be done manually, which is often the best way. 
Just depends on the situation I guess. 

Dale

:-)  :-) 
Re: Hard drive error from SMART [ In reply to ]
On Tue, Apr 12, 2022 at 06:01:11PM -0500, Dale wrote:

> > The advantage of an integrity scheme (like ZFS or comparing with a checksum
> > file) over your rsync approach is that you only need to read all the datas™
> > from one drive instead of two. Plus: if rsync actually detects a change, it
> > doesn’t know which of the two drives introduced the error. You need to find
> > out yourself after the fact (which probably won’t be hard, but still, it’s
> > one more manual step).
>
> In this case, if something had changed, I'd have no problem manually
> checking the file to be sure which was good and which was bad.

Consider a big video file, which I know you like to accumulate from YouTube
and the like. How do you find out which copy is broken? By watching it and
trying to find the one image or audio frame that is garbled? The drive might
return zeros or other garbage (bit flip) instead of actual content without
SMART noticing it (uncorrectable error).

> Given
> the error is recent on my drive, I'd suspect the backups to still be a
> good file.  For that reason, I'd suspect the backup file to be good
> therefore not to be overwritten.  I was trying to avoid a bad file
> replacing a good file on the backup which then destroys all good files
> and leaves only bad ones.  This is why I like that SMART at least let me
> know there is a problem. 

I also tend to rely on SMART, but it’s not all-knowing and probably not
infallible.

> Sometimes things have to be done manually, which is often the best way. 
> Just depends on the situation I guess. 

--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

The only thing still keeping me here is Earth’s gravity.
Re: Hard drive error from SMART [ In reply to ]
Frank Steinmetzger wrote:
> On Tue, Apr 12, 2022 at 06:01:11PM -0500, Dale wrote:
>
>>> The advantage of an integrity scheme (like ZFS or comparing with a checksum
>>> file) over your rsync approach is that you only need to read all the datas™
>>> from one drive instead of two. Plus: if rsync actually detects a change, it
>>> doesn’t know which of the two drives introduced the error. You need to find
>>> out yourself after the fact (which probably won’t be hard, but still, it’s
>>> one more manual step).
>> In this case, if something had changed, I'd have no problem manually
>> checking the file to be sure which was good and which was bad.
> Consider a big video file, which I know you like to accumulate from YouTube
> and the like. How do you find out which copy is broken? By watching it and trying
> to find the one image or audio frame that is garbled? The drive might return
> zeros or other garbage (bit flip) instead of actual content without SMART
> noticing it (uncorrectable error).
>

In this case, I'd likely rename one file and keep them both until I can
figure out which is good.  That said, I'd certainly keep the backup copy
because odds are, it is good since the error came well after my last
backup.  At this point tho, I don't know what file was on that bad spot. 


>> Given
>> the error is recent on my drive, I'd suspect the backups to still be a
>> good file.  For that reason, I'd suspect the backup file to be good
>> therefore not to be overwritten.  I was trying to avoid a bad file
>> replacing a good file on the backup which then destroys all good files
>> and leaves only bad ones.  This is why I like that SMART at least let me
>> know there is a problem. 
> I also tend to rely on SMART, but it’s not all-knowing and probably not
> infallible.
>
>


This is very true.  I mentioned elsewhere that things like spindle motor
failure or the motor that moves the heads are usually not detectable. 
Some component failures can be detected but not all or even most from
what I've read.  Basically, the best you can hope for is SMART seeing a
bad spot on the media itself.  That, it seems, it can detect most of the
time. 

TL;DR next two paragraphs.  Just an interesting story along this line.  I
used to work in parts at a Fortune 500 office company.  We had millions
of dollars of inventory in computer parts alone.  That was in the early
90's.  They also had copiers and their parts, paper etc etc.  We used an
NCR computer as the computer system for the whole company.  At the end of
the building was a speed bump so people wouldn't go flying down the one
lane road between the building and fence on the property line.  One day a
large, almost empty truck went a little faster than normal over the last
speed bump.  It shook the building to the point I could feel it about 150
feet away.  The computer room was like 50 feet away from that side of the
building.  It seems the hard drive felt it very well.  One, maybe more, of
the heads got under the media and started peeling it off the platter and
made a really ugly screeching sound.  No routine shutdown, they just
pulled the plug.  As you can imagine tho, it did no good.  Even way back
then, drives were spinning fast enough that, I suspect, by the time a
person could blink it was way past fixing. 

That of course was way before SMART came along but SMART would never be
able to predict such a failure.  Even NCR said it was likely a 1 in a
million chance that the truck hit just when the head was moving over a
weak spot.  Several thousand dollars later, and a private plane bringing
in a new drive, the drive was replaced.  Of course, the idiot in charge
had no backups that were of any use.  All of them were several weeks
old, likely over a month.  Luckily he stayed far away from me for at
least a month.  Otherwise, I'd likely still be in jail, with my hands
around the neck of his corpse.  :-@

SMART isn't a sure thing but it can help in some cases which is better
than nothing at all. 

Dale

:-)  :-) 
Re: Hard drive error from SMART [ In reply to ]
Howdy,

I got the drive and pvmove is doing its thing.  I would like to unplug
one of the drives and physically move them around without shutting down
my system.  Is there a way to tell LVM to disable the drives while I'm
doing this and restart them when done?  I found the command vgchange -a
n <name> but I'm not sure if that is correct.  Honestly, I want to be
really sure before I unplug things.  I assume the "n" changes to "y" to
restart them? 

Thanks.

Dale

:-)  :-) 

P. S.  BTW, the drive has passed two new tests with no error.  The tests
are slower than usual tho.  I'm not sure why tho. 
Re: Hard drive error from SMART [ In reply to ]
On Fri, 15 Apr 2022 11:49:21 -0400,
Dale wrote:
>
> Howdy,
>
> I got the drive and pvmove is doing its thing.  I would like to unplug
> one of the drives and physically move them around without shutting down
> my system.  Is there a way to tell LVM to disable the drives while I'm
> doing this and restart them when done?  I found the command vgchange -a
> n<name> but I'm not sure if that is correct.  Honestly, I want to be
> really sure before I unplug things.  I assume the "n" changes to "y" to
> restart them?
>
> Thanks.
>
> Dale
>
> :-)  :-)
>
> P. S.  BTW, the drive has passed two new tests with no error.  The tests
> are slower than usual tho.  I'm not sure why tho.
>

No, you can't do that till the pvmove is over.

--
Your life is like a penny. You're going to lose it. The question is:
How do you spend it?

John Covici wb2una
covici@ccs.covici.com
Re: Hard drive error from SMART [ In reply to ]
John Covici wrote:
> On Fri, 15 Apr 2022 11:49:21 -0400,
> Dale wrote:
>> Howdy,
>>
>> I got the drive and pvmove is doing its thing.  I would like to unplug
>> one of the drives and physically move them around without shutting down
>> my system.  Is there a way to tell LVM to disable the drives while I'm
>> doing this and restart them when done?  I found the command vgchange -a
>> n<name> but I'm not sure if that is correct.  Honestly, I want to be
>> really sure before I unplug things.  I assume the "n" changes to "y" to
>> restart them? 
>>
>> Thanks.
>>
>> Dale
>>
>> :-)  :-) 
>>
>> P. S.  BTW, the drive has passed two new tests with no error.  The tests
>> are slower than usual tho.  I'm not sure why tho. 
>>
> No, you can't do that till the pvmove is over.
>


Yea.  I was planning to wait until pvmove was done.  It actually
finished not too long after I sent the message.  It was what prompted me
to see if this is possible.  I found a page that talks about it but the
info didn't explain it much.  I'm pretty sure that is the right command
but given the limited info, I wasn't sure.  Reading the man page helped
a little but still wasn't 100% sure then either.  Thing is, I only have
to unplug and move one of the two drives on that group. 

Sounds like the right command tho.  If not, someone speak up. 

Dale

:-)  :-) 
Re: Hard drive error from SMART
Dale wrote:
> John Covici wrote:
>> On Fri, 15 Apr 2022 11:49:21 -0400,
>> Dale wrote:
>>> Howdy,
>>>
>>> I got the drive and pvmove is doing its thing.  I would like to unplug
>>> one of the drives and physically move them around without shutting down
>>> my system.  Is there a way to tell LVM to disable the drives while I'm
>>> doing this and restart them when done?  I found the command vgchange -a
>>> n<name> but I'm not sure if that is correct.  Honestly, I want to be
>>> really sure before I unplug things.  I assume the "n" changes to "y" to
>>> restart them? 
>>>
>>> Thanks.
>>>
>>> Dale
>>>
>>> :-)  :-) 
>>>
>>> P. S.  BTW, the drive has passed two new tests with no error.  The tests
>>> are slower than usual tho.  I'm not sure why tho. 
>>>
>> No, you can't do that till the pvmove is over.
>>
>
> Yea.  I was planning to wait until pvmove was done.  It actually
> finished not to long after I sent the message.  It was what prompted me
> to see if this is possible.  I found a page that talks about it but the
> info didn't explain it much.  I'm pretty sure that is the right command
> but given the limited info, I wasn't sure.  Reading the man page helped
> a little but still wasn't 100% sure then either.  Thing is, I only have
> to unplug and move one of the two drives on that group. 
>
> Sounds like the right command tho.  If not, someone speak up. 
>
> Dale
>
> :-)  :-) 
>


For anyone searching and running across this thread.  That command did
work to disable the drive.  I'm not sure if I should have used pvchange
to disable /dev/sdk1 or not.  The problem I did run into was getting it
back.  I ran the command to enable it but it didn't work as expected.  I
had files missing.  So, I unmounted it, ran pvscan, vgscan and lvscan in
that order.  I then ran the command above again to be sure and remounted
the LV.  It worked that time.  All files were there.  So, either
one has to rescan them or I should have also run pvchange to disable as
well.  Maybe someone else can expand on this.
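
For later readers, the sequence described above can be written down as a
small dry-run sketch.  The volume group name "myvg" and the mount point
"/mnt/data" are placeholders, not names from this thread, and it is only
an assumption that the rescans (rather than the second vgchange) are what
brought the files back.  By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of taking an LVM volume group offline and back online.
# VG and MNT are hypothetical placeholders; set RUN=1 to really execute.
VG="${VG:-myvg}"
MNT="${MNT:-/mnt/data}"

# Print the command unless RUN=1, in which case run it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Before unplugging: unmount the filesystem, then deactivate the group.
deactivate_vg() {
    run umount "$MNT" &&
    run vgchange -a n "$VG"
}

# After replugging: rescan PVs, VGs and LVs, reactivate, then remount.
# "mount $MNT" relies on an fstab entry for that mount point.
reactivate_vg() {
    run pvscan &&
    run vgscan &&
    run lvscan &&
    run vgchange -a y "$VG" &&
    run mount "$MNT"
}
```

Calling deactivate_vg before pulling the drive and reactivate_vg after
plugging it back in prints the commands in the order used above.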

While I'm at it.  Is there a way to reset the sdk part?  The old one was sdd
and I was hoping when I moved the drive, it would change with it.  The
reason is, usually when I hook up my external drives, they use sdk.  I'm
sort of set up for that.  A couple other things use sdk as well.  I'm
not sure if there is an easy way to do that or not.  Wonder if it will
reset when I reboot?

Dale

:-)  :-) 
Re: Hard drive error from SMART
On Fri, Apr 15, 2022 at 10:49:21AM -0500, Dale wrote:
> Howdy,
>
> I got the drive and pvmove is doing its thing.  I would like to unplug
> one of the drives and physically move them around without shutting down
> my system.  Is there a way to tell LVM to disable the drives while I'm
> doing this and restart them when done?

Be aware that SATA hot-plugging must be enabled in the BIOS for each
individual SATA port (at least that’s the case on my board). I’m not sure
how much of a difference it actually makes, though.

--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

Be regular. Eat cron flakes.
Re: Hard drive error from SMART
Frank Steinmetzger wrote:
> On Fri, Apr 15, 2022 at 10:49:21AM -0500, Dale wrote:
>> Howdy,
>>
>> I got the drive and pvmove is doing its thing.  I would like to unplug
>> one of the drives and physically move them around without shutting down
>> my system.  Is there a way to tell LVM to disable the drives while I'm
>> doing this and restart them when done?
> Be aware that SATA hot-plugging must be enabled in the BIOS for each
> individual SATA port (at least that’s the case on my board). I’m not sure
> what a difference it actually makes, though.
>


I enabled that the first time I cut the system on after building it.  I
couldn't think of any reason not to have it enabled really.  It would be
like making USB require rebooting before plugging/unplugging something. 
Certainly better than the old IDE days. 

I have googled and can not find a way to reset udev and its naming of
drives.  I may have to rework some things since the drive kept the sdk
instead of switching to sdd when I made the physical change.  Thing is,
I suspect it will when I reboot the next time.  It also triggered
messages from SMART too.  It got upset that it couldn't find sdd anymore. 

Dale

:-)  :-) 
Re: Hard drive error from SMART
On Sat, Apr 16, 2022 at 10:59 AM Dale <rdalek1967@gmail.com> wrote:
>
> I have googled and can not find a way to reset udev and it naming
> drives. I may have to rework some things since the drive kept the sdk
> instead of switching to sdd when I made the physical change. Thing is,
> I suspect it will when I reboot the next time.

IMO it is best to make that not matter. If you're referencing drives
by letter in configuration files, you're just asking for some change
to re-order things and cause problems.

You're using LVM, so all the drives should be assembled based on their
embedded metadata. It is fine to reference whatever temporary device
name you're using when running pvmove/pvcreate since that doesn't
really get stored anywhere. If you are directly mounting anything
without using LVM then it is best to use labels/uuids/etc to identify
partitions.
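
As a rough illustration of that point: the persistent names under
/dev/disk/ are symlinks that udev points at whatever sdX node the kernel
handed out this time.  A small sketch, where the label "backup" is made
up for the example:

```shell
#!/bin/sh
# Resolve a persistent name (a udev-managed symlink) to the kernel
# device node it currently points at.
current_node() {
    readlink -f "$1"
}

# Example with a hypothetical filesystem label "backup":
#   current_node /dev/disk/by-label/backup
# This prints the sdX node the label points at right now, which may be
# a different letter after a reboot or after moving cables.
```

The label stays the same across reboots and cable moves; only the node
it resolves to changes.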

> It also triggered
> messages from SMART too. It got upset that it couldn't find sdd anymore.

That is typical when hotswapping. I believe smartd only scans drives
at startup, and of course if a drive does go offline it isn't a bad
thing that it is noisy about it. From a quick read of the manpage
SIGHUP might or might not get it to rescan the drives, and if not you
can just restart it. The daemon works by polling so if there are any
pending issues they should still get picked up after restarting the
daemon.
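
A dry-run sketch of those two options.  It assumes the daemon process is
named smartd and that the service is managed by OpenRC (rc-service);
whether the SIGHUP alone picks up a moved drive is, as said, not
certain.  It prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Print the command unless RUN=1, in which case run it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Option 1: send SIGHUP so smartd re-reads its configuration.
reload_smartd() {
    run pkill -HUP -x smartd
}

# Option 2: restart the service outright (OpenRC assumed).
restart_smartd() {
    run rc-service smartd restart
}
```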

--
Rich
Re: Hard drive error from SMART
On Saturday, 16 April 2022 15:59:25 BST Dale wrote:
> Frank Steinmetzger wrote:
> > On Fri, Apr 15, 2022 at 10:49:21AM -0500, Dale wrote:
> >> Howdy,
> >>
> >> I got the drive and pvmove is doing its thing. I would like to unplug
> >> one of the drives and physically move them around without shutting down
> >> my system. Is there a way to tell LVM to disable the drives while I'm
> >> doing this and restart them when done?
> >
> > Be aware that SATA hot-plugging must be enabled in the BIOS for each
> > individual SATA port (at least that’s the case on my board). I’m not sure
> > what a difference it actually makes, though.
>
> I enabled that the first time I cut the system on after building it. I
> couldn't think of any reason not to have it enabled really. It would be
> like making USB require rebooting before plugging/unplugging something.
> Certainly better than the old IDE days.
>
> I have googled and can not find a way to reset udev and it naming
> drives. I may have to rework some things since the drive kept the sdk
> instead of switching to sdd when I made the physical change. Thing is,
> I suspect it will when I reboot the next time. It also triggered
> messages from SMART too. It got upset that it couldn't find sdd anymore.
>
> Dale
>
> :-) :-)

Have a look at this post. It explains why you could end up with a race
condition if you set up udev rules to name disks in different order than what
the kernel assigns:

https://www.linuxquestions.org/questions/linux-hardware-18/udev-persistent-disk-name-4175450519/#post4893847
Re: Hard drive error from SMART
Rich Freeman wrote:
> On Sat, Apr 16, 2022 at 10:59 AM Dale <rdalek1967@gmail.com> wrote:
>> I have googled and can not find a way to reset udev and it naming
>> drives. I may have to rework some things since the drive kept the sdk
>> instead of switching to sdd when I made the physical change. Thing is,
>> I suspect it will when I reboot the next time.
> IMO it is best to make that not matter. If you're referencing drives
> by letter in configuration files, you're just asking for some change
> to re-order things and cause problems.
>
> You're using LVM, so all the drives should be assembled based on their
> embedded metadata. It is fine to reference whatever temporary device
> name you're using when running pvmove/pvcreate since that doesn't
> really get stored anywhere. If you are directly mounting anything
> without using LVM then it is best to use labels/uuids/etc to identify
> partitions.

I have to use sd** when using cryptsetup to decrypt the drive.  I
haven't found an easier way around that yet.  My command was
something like cryptsetup open /dev/sdk1 <name> and then it asks for the
password.  After that, I use UUID and an entry in fstab to mount.  If
there is an easier way, I'm open to it.  I have three external drives and
as long as I only power them up one at a time, they all used sdk.  Now
they use sdd and I keep trying to type in sdk, from habit.  :/

My next project, find a good external drive enclosure like the three I
got now.  They are no longer available tho.  I like them because they have a
fan, an eSATA port and a nifty display to let me know things are
working.  Really a good price for the features.  I don't like USB
connected drives.  Long story.


>> It also triggered
>> messages from SMART too. It got upset that it couldn't find sdd anymore.
> That is typical when hotswapping. I believe smartd only scans drives
> at startup, and of course if a drive does go offline it isn't a bad
> thing that it is noisy about it. From a quick read of the manpage
> SIGHUP might or might not get it to rescan the drives, and if not you
> can just restart it. The daemon works by polling so if there are any
> pending issues they should still get picked up after restarting the
> daemon.
>


Yea, it is a good thing.  I just disabled it for sdd, enabled for the
new sdk and restarted the service.  It was happy then but getting a
email from SMART always makes my heart beat a few extra beats and
sometimes causes me to swallow big too.  It's rarely good news.  Maybe
the next reboot will sort things out.  Then I get to switch everything
back to the old way again.  :/

Dale

:-)  :-) 
Re: Hard drive error from SMART
Michael wrote:
> On Saturday, 16 April 2022 15:59:25 BST Dale wrote:
>> Frank Steinmetzger wrote:
>>> On Fri, Apr 15, 2022 at 10:49:21AM -0500, Dale wrote:
>>>> Howdy,
>>>>
>>>> I got the drive and pvmove is doing its thing. I would like to unplug
>>>> one of the drives and physically move them around without shutting down
>>>> my system. Is there a way to tell LVM to disable the drives while I'm
>>>> doing this and restart them when done?
>>> Be aware that SATA hot-plugging must be enabled in the BIOS for each
>>> individual SATA port (at least that’s the case on my board). I’m not sure
>>> what a difference it actually makes, though.
>> I enabled that the first time I cut the system on after building it. I
>> couldn't think of any reason not to have it enabled really. It would be
>> like making USB require rebooting before plugging/unplugging something.
>> Certainly better than the old IDE days.
>>
>> I have googled and can not find a way to reset udev and it naming
>> drives. I may have to rework some things since the drive kept the sdk
>> instead of switching to sdd when I made the physical change. Thing is,
>> I suspect it will when I reboot the next time. It also triggered
>> messages from SMART too. It got upset that it couldn't find sdd anymore.
>>
>> Dale
>>
>> :-) :-)
> Have a look at this post. It explains why you could end up with a race
> condition if you set up udev rules to name disks in different order than what
> the kernel assigns:
>
> https://www.linuxquestions.org/questions/linux-hardware-18/udev-persistent-disk-name-4175450519/#post4893847

I think I've read about that before.  Gonna read it in a minute.  What
I'd like is a way to reset it back to like it would be with a fresh
install for example.  I figure there is a config file somewhere that
stores this sort of thing but no clue where it is tho. 

Oh well.  Maybe one day.  ;-)

Dale

:-)  :-) 
Re: Hard drive error from SMART
On Sat, 16 Apr 2022 12:45:20 -0500, Dale wrote:

> > You're using LVM, so all the drives should be assembled based on their
> > embedded metadata. It is fine to reference whatever temporary device
> > name you're using when running pvmove/pvcreate since that doesn't
> > really get stored anywhere. If you are directly mounting anything
> > without using LVM then it is best to use labels/uuids/etc to identify
> > partitions.
>
> I have to use sd** when using cryptsetup to decrypt the drive.  I
> haven't found a way around that that is easier yet.  My command was
> something like cryptsetup open /dev/sdk1 <name> and then it asks for the
> password.

Use /dev/disk/by-partlabel/foo or /dev/disk/by-partuuid/bar.
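
A minimal sketch of what that looks like with cryptsetup.  The udev
directory is /dev/disk/by-partlabel/ (partition labels exist on GPT
disks); the label and mapper name "8tb" below are only examples, and the
function just prints the command so it can be checked before use.

```shell
#!/bin/sh
# Build a cryptsetup command that uses a persistent partition label
# instead of a kernel name such as /dev/sdk1.
# Usage: open_cmd <partlabel> <mapper-name>
open_cmd() {
    echo "cryptsetup open /dev/disk/by-partlabel/$1 $2"
}

# Hypothetical label and mapper name, both "8tb":
open_cmd 8tb 8tb
```

Running the printed command asks for the passphrase as usual, no matter
which sdX letter the drive got this time.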


--
Neil Bothwick

If Yoda so strong in force is, why words in right order he cannot put?
Re: Hard drive error from SMART
On Sat, Apr 16, 2022 at 12:45:20PM -0500, Dale wrote:

> My next project, find a good external drive enclosure like the three I
> got now.  They no longer available tho.  I like them because they have a
> fan, a eSATA port and a nifty display to let me know things are
> working.  Really a good price for the features.  I don't like USB
> connected drives.  Long story.

How about a table-top dock?
- no cable salad, caused by each enclosure having its own power supply and
data cable
- disks are used “naked”, so no heat buildup and you are more flexible

Here are some models with eSATA:
https://skinflint.co.uk/?cat=hddocks&xf=4426_eSATA
And one of them even has four slots, so even fewer cables.

That’s of course if you use the disks intermittently and store them away
in between. If you plan on running them for longer durations at a time, it
may be better to use a proper enclosure, in order to protect the disks from
physical influences (impacts, short-circuits). Also, those SATA connectors
are not designed to be connected often. I think I read about 50 cycles
somewhere.

--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

The knowing don’t talk much, the talking don’t know much.
Re: Hard drive error from SMART
On Sat, Apr 16, 2022 at 19:58:22 +0100, Neil Bothwick wrote:

> --
> Neil Bothwick

> If Yoda so strong in force is, why words in right order he cannot put?

Maybe because his native language is German. :-)

--
Alan Mackenzie (Nuremberg, Germany).
Re: Hard drive error from SMART
On Sat, 16 Apr 2022 20:09:23 +0000, Alan Mackenzie wrote:

> > If Yoda so strong in force is, why words in right order he cannot
> > put?
>
> Maybe because his native language is German. :-)

RLFO


--
Neil Bothwick

SITCOM: Single Income, Two Children, Oppressive Mortgage
Re: Hard drive error from SMART
Neil Bothwick wrote:
> On Sat, 16 Apr 2022 12:45:20 -0500, Dale wrote:
>
>>> You're using LVM, so all the drives should be assembled based on their
>>> embedded metadata. It is fine to reference whatever temporary device
>>> name you're using when running pvmove/pvcreate since that doesn't
>>> really get stored anywhere. If you are directly mounting anything
>>> without using LVM then it is best to use labels/uuids/etc to identify
>>> partitions.
>> I have to use sd** when using cryptsetup to decrypt the drive.  I
>> haven't found a way around that that is easier yet.  My command was
>> something like cryptsetup open /dev/sdk1 <name> and then it asks for the
>> password.
> Use /dev/disk/by-partlabel/foo or /dev/disk/by-partuuid/bar.
>
>

That's even more typing than /dev/sdk.  Some things I do easily by using
tab completion and all.  When mounting, I let fstab remember the UUID
for it.  Very little typing and don't have to remember things.  ;-)
It's not like UUIDs are made to be remembered either.  :-[  I think I put a
label on the drive but things are a bit different when using
cryptsetup.  At least I think they are.  The easiest thing would be having
the replacement drive as sdd again and me having sdk as my external
drive.  I still think a reboot is going to correct this.  I can't
imagine it not given how the drives are plugged in.  I just wish there
was an easy solution in the meantime.  To be honest, I've had several
times where this would come in handy.  This is just yet another one.

Your way would be consistent tho.  If I could script this, it would be
the best way to do it.  Script it once, done.  Of course, we know my
scripting skills are minimal at best.  If you could say I even have
scripting skills.  lol

Dale

:-)  :-) 
Re: Hard drive error from SMART
Frank Steinmetzger wrote:
> On Sat, Apr 16, 2022 at 12:45:20PM -0500, Dale wrote:
>
>> My next project, find a good external drive enclosure like the three I
>> got now.  They no longer available tho.  I like them because they have a
>> fan, a eSATA port and a nifty display to let me know things are
>> working.  Really a good price for the features.  I don't like USB
>> connected drives.  Long story.
> How about a table-top dock?
> - no cable salad, caused by each enclosure having its own power supply and
> data cable
> - disks are used “naked”, so no heat buildup and you are more flexible
>
> Here are some models with eSATA:
> https://skinflint.co.uk/?cat=hddocks&xf=4426_eSATA
> And one of them even has four slots, so even fewer cables.
>
> That’s of course if you use the disks intermittently and store them away
> inbetween. If you plan on running them for longer durations at a time, it
> may be better to use a proper enclosure, in order to protect the disks from
> physical influences (impacts, short-circuits). Also, those SATA connectors
> are not designed to be connected often. I think I read about 50 cycles
> somewhere.
>

I've looked into those.  They do have advantages for sure.  One, the
bare drives take up less room in my fire safe.  Lots smaller than the
enclosures I have now.  My concern has always been the
plugging/unplugging a lot and dust when not in use.  I didn't know how
long those connectors are supposed to last but the bad thing is, when it
goes, the drive is gone too, plus the data.  I do like that it is in
open air which takes care of cooling pretty well.  I do my backups once
a week so it isn't as often as some situations but it isn't rare either. 

I've found an enclosure since my post but got to wait until next income
boost to get one.  May buy a few of them if I can.  I think the ones I
found have fans but no display but that's OK.  I really like having the
fan more than the display.  It likely doesn't help with huge airflow but
it gives it some airflow. 

I'm running pretty short on space in my case.  I have a Cooler Master
HAF-932 case.  I'm out of 3.5" spots.  I need to get some 5 1/4" to 3.5"
adapters.  I got some plastic thingys but they don't work in my case. 
It has that push button thingy and the plastic adapter is too loose for
my comfort.  Plus, it has little cooling too.  The 3.5" bays have that
big fan blowing on them. Working on a plan.  Maybe this is a good excuse
to start working on a NAS.  :/

Dale

:-)  :-) 
Re: Hard drive error from SMART
On Sat, Apr 16, 2022 at 3:53 PM Dale <rdalek1967@gmail.com> wrote:
<SNIP>
Maybe this is a good excuse
> to start working on a NAS. :/

That's my vote. (For the second time)

I'm using a FreeBSD NAS (TrueNAS) but they recently came out with a
Linux version which you might be more comfortable with. If you use a
1Gb/s or higher network connection it's quite fast.

You can also go the Synology route via Amazon. You can get a 2-disk
NAS chassis which does RAID for around $250 last time I looked.

Good luck whatever you do.

Mark
Re: Hard drive error from SMART
Mark Knecht wrote:
> On Sat, Apr 16, 2022 at 3:53 PM Dale <rdalek1967@gmail.com> wrote:
> <SNIP>
> Maybe this is a good excuse
>> to start working on a NAS. :/
> That's my vote. (For the second time)
>
> I'm using a FreeBSD Nas (TrueNAS) but they recently came out with a
> Linux version which you might be more comfortable with. If you use a
> 1Gb/S or higher network connection it's quite fast.
>
> You can also go the Synology route via Amazon. You can get a 2-disk
> NAS chassis which does RAID for around $250 last time I looked.
>
> Good luck whatever you do.
>
> Mark

Other than being another piece of equipment running up a light bill, it
is the best way to deal with this.  The way I'm doing now is a bit of a
struggle at times.  I just need to get other things done first, from a
money perspective which inflation isn't helping on.  A trip to the
grocery store is no fun anymore.

One of these days tho.  I just gotta do it.

Dale

:-)  :-) 
Re: Hard drive error from SMART
On Sat, Apr 16, 2022 at 6:39 PM Dale <rdalek1967@gmail.com> wrote:
>
> Neil Bothwick wrote:
> > Use /dev/disk/by-partlabel/foo or /dev/disk/by-partuuid/bar.
> >
>
> That's even more typing than /dev/sdk. Some things I do easily by using
> tab completion and all. When mounting, I let fstab remember the UUID
> for it.

That's what copy/paste is for. How often are you editing your
crypttab anyway? This way when you move drives around they still
work.

> It's not like UUIDs are made to remember either.

blkid is your friend.

This is for config files, not random mounting/unmounting. I use the
dynamic device nodes all the time if I'm just plugging a drive in and
looking at it. However, if I'm going to put it in a config file I use
a persistent ID so that I'm not running into breakage anytime things
change.

When I'm setting it up it is just a few extra seconds to look up the
UUID and copy/paste it. When the system randomly breaks I have to go
digging through logs and config files to figure out what went wrong.
It pays for me to spend a little more time on getting my config right
when everything is fresh in my head, because when I'm troubleshooting
it will take a little while just to figure out what I did when I set
it up.

Here is an example of one of my cryptsetup files:
cd1 UUID="1cbd5860-3469-41f7-8658-acd83d1957a0" /cd1.key

(This is using a random key stored in a file, which works for this
particular situation. Obviously the drive is only as secure as that
file.)

The corresponding drive blkid output is:
/dev/sdb1: UUID="1cbd5860-3469-41f7-8658-acd83d1957a0"
TYPE="crypto_LUKS" PARTUUID="a4a383a8-24c2-f74b-94d8-ca4ffc366327"
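
If copy/paste feels error-prone, the UUID can also be pulled out of a
blkid line with a small helper.  This is only a sketch; the function
name is made up, and the sample line is the one shown above.

```shell
#!/bin/sh
# Extract the UUID from one line of blkid output read on stdin.
# The leading "[: ]" keeps PARTUUID= from matching.
uuid_from_blkid() {
    sed -n 's/.*[: ]UUID="\([^"]*\)".*/\1/p'
}

# With the line shown above:
echo '/dev/sdb1: UUID="1cbd5860-3469-41f7-8658-acd83d1957a0" TYPE="crypto_LUKS" PARTUUID="a4a383a8-24c2-f74b-94d8-ca4ffc366327"' |
    uuid_from_blkid
```

On a live system, "blkid -s UUID -o value /dev/sdb1" should print the
same value directly, without any parsing.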

Oh, and look at that - the first drive I set up on this system is
actually the second drive that got assigned a device name. It was
probably /dev/sda1 when I first set it up, and I added another drive
since then.

The contained drive shows up as:
/dev/mapper/cd1: UUID="a2721813-4d10-4f69-ab2a-4beb0d6e95d7" TYPE="ext4"

(No LVM here - this is storage for a distributed filesystem so the
volume management is effectively above the filesystem level. I can
add other drives to the cluster and they're in the pool, and if I want
to move data off this drive I can just edit a config file and the data
will be moved while online. The encryption is mainly so that if a
drive fails I don't have to worry about anybody recovering data from
it.)

--
Rich
Re: Hard drive error from SMART
Rich Freeman wrote:
> On Sat, Apr 16, 2022 at 6:39 PM Dale <rdalek1967@gmail.com> wrote:
>> Neil Bothwick wrote:
>>> Use /dev/disk/by-partlabel/foo or /dev/disk/by-partuuid/bar.
>>>
>> That's even more typing than /dev/sdk. Some things I do easily by using
>> tab completion and all. When mounting, I let fstab remember the UUID
>> for it.
> That's what copy/paste is for. How often are you editing your
> crypttab anyway? This way when you move drives around they still
> work.

What is crypttab?  I type in the command manually.  It's what the howtos
showed.  I can't find a crypttab file.  This may make things easier.  My
usual names are 8tb, 6tb and pri, short for private.  Ran out of other
names. ROFL 


>
>> It's not like UUIDs are made to remember either.
> blkid is your friend.
>
> This is for config files, not random mounting/unmounting. I use the
> dynamic device nodes all the time if I'm just plugging a drive in and
> looking at it. However, if I'm going to put it in a config file I use
> a persistent ID so that I'm not running into breakage anytime things
> change.
>
> When I'm setting it up it is just a few extra seconds to look up the
> UUID and copy/paste it. When the system randomly breaks I have to go
> digging through logs and config files to figure out what went wrong.
> It pays for me to spend a little more time on getting my config right
> when everything is fresh in my head, because when I'm troubleshooting
> it will take a little while just to figure out what I did when I set
> it up.
>
> Here is an example of one of my cryptsetup files:
> cd1 UUID="1cbd5860-3469-41f7-8658-acd83d1957a0" /cd1.key
>
> (This is using a random key stored in a file, which works for this
> particular situation. Obviously the drive is only as secure as that
> file.)
>
> The corresponding drive blkid output is:
> /dev/sdb1: UUID="1cbd5860-3469-41f7-8658-acd83d1957a0"
> TYPE="crypto_LUKS" PARTUUID="a4a383a8-24c2-f74b-94d8-ca4ffc366327"
>
> Oh, and look at that - the first drive I set up on this system is
> actually the second drive that got assigned a device name. It was
> probably /dev/sda1 when I first set it up, and I added another drive
> since then.
>
> The contained drive shows up as:
> /dev/mapper/cd1: UUID="a2721813-4d10-4f69-ab2a-4beb0d6e95d7" TYPE="ext4"
>
> (No LVM here - this is storage for a distributed filesystem so the
> volume management is effectively above the filesystem level. I can
> add other drives to the cluster and they're in the pool, and if I want
> to move data off this drive I can just edit a config file and the data
> will be moved while online. The encryption is mainly so that if a
> drive fails I don't have to worry about anybody recovering data from
> it.)
>


I use passwords here.  I just type in sdk1 and it worked before this
drive move.  I never tried to go any further than the howtos I found
about using cryptsetup.  No clue on the file.  I don't see one here and
don't recall reading about it either.  Gonna google on that a bit. 

Interesting. 

Dale

:-)  :-) 
Re: Hard drive error from SMART
On Sat, 16 Apr 2022 23:44:58 -0500, Dale wrote:

> >> That's even more typing than /dev/sdk. Some things I do easily by
> >> using tab completion and all. When mounting, I let fstab remember
> >> the UUID for it.
> > That's what copy/paste is for. How often are you editing your
> > crypttab anyway? This way when you move drives around they still
> > work.
>
> What is crypttab?  I type in the command manually.

Then use a shell alias, even less typing.
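
For example, something along these lines in ~/.bashrc.  The mapper
names 8tb, 6tb and pri are the ones mentioned earlier; that the
partitions carry matching partition labels is purely an assumption for
the sketch.

```shell
# One alias per external drive; each opens the LUKS container by its
# persistent partition label rather than by /dev/sdX.
# The partition labels here are assumed, adjust them to the real ones.
alias open8tb='cryptsetup open /dev/disk/by-partlabel/8tb 8tb'
alias open6tb='cryptsetup open /dev/disk/by-partlabel/6tb 6tb'
alias openpri='cryptsetup open /dev/disk/by-partlabel/pri pri'
```

After that, typing open8tb asks for the passphrase and the fstab entry
takes care of the mount as before.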


--
Neil Bothwick

Ninety-Ninety Rule Of Project Schedules - The first ninety percent of
the task takes ninety percent of the time, and the last ten percent
takes the other ninety percent of the time.
Re: Hard drive error from SMART
On Sat, Apr 16, 2022 at 6:06 PM Dale <rdalek1967@gmail.com> wrote:
>
> Mark Knecht wrote:
> > On Sat, Apr 16, 2022 at 3:53 PM Dale <rdalek1967@gmail.com> wrote:
> > <SNIP>
> > Maybe this is a good excuse
> >> to start working on a NAS. :/
> > That's my vote. (For the second time)
> >
> > I'm using a FreeBSD Nas (TrueNAS) but they recently came out with a
> > Linux version which you might be more comfortable with. If you use a
> > 1Gb/S or higher network connection it's quite fast.
> >
> > You can also go the Synology route via Amazon. You can get a 2-disk
> > NAS chassis which does RAID for around $250 last time I looked.
> >
> > Good luck whatever you do.
> >
> > Mark
>
> Other than being another piece of equipment running up a light bill, it
> is the best way to deal with this. The way I'm doing now is a bit of a
> struggle at times. I just need to get other things done first, from a
> money perspective which inflation isn't helping on. A trip to the
> grocery story is no fun anymore.
>
> One of these days tho. I just gotta do it.
>
> Dale

I hear you about groceries and inflation. Wol pushed me to build my
first one just using an old computer. I had an old machine (case and
power supply) with a bad motherboard, so I purchased a used motherboard
with an i3-2120 CPU @ 3.30GHz and 8GB memory at a computer store for $40.
Surprisingly that's more than enough CPU & memory for basic backups.
No matter what you're going to have to pay for the drives whether they
go in your box, in external cases or in a backup machine.

I only turn it on to do backups or to retrieve data so not much electricity.
Re: Hard drive error from SMART
Mark Knecht wrote:
> On Sat, Apr 16, 2022 at 6:06 PM Dale <rdalek1967@gmail.com> wrote:
>> Mark Knecht wrote:
>>> On Sat, Apr 16, 2022 at 3:53 PM Dale <rdalek1967@gmail.com> wrote:
>>> <SNIP>
>>> Maybe this is a good excuse
>>>> to start working on a NAS. :/
>>> That's my vote. (For the second time)
>>>
>>> I'm using a FreeBSD Nas (TrueNAS) but they recently came out with a
>>> Linux version which you might be more comfortable with. If you use a
>>> 1Gb/S or higher network connection it's quite fast.
>>>
>>> You can also go the Synology route via Amazon. You can get a 2-disk
>>> NAS chassis which does RAID for around $250 last time I looked.
>>>
>>> Good luck whatever you do.
>>>
>>> Mark
>> Other than being another piece of equipment running up a light bill, it
>> is the best way to deal with this. The way I'm doing now is a bit of a
>> struggle at times. I just need to get other things done first, from a
>> money perspective which inflation isn't helping on. A trip to the
>> grocery story is no fun anymore.
>>
>> One of these days tho. I just gotta do it.
>>
>> Dale
> I hear you about groceries and inflation. Wol pushed me to build my
> first one just using an old computer. I had an old machine - case,
> power supply with a bad motherboard so I purchased an i3-2120 CPU @
> 3.30GHz motherboard with 8GB memory used at a computer store for $40.
> Surprisingly that's more than enough CPU & memory for basic backups.
> No matter what you're going to have to pay for the drives whether they
> go in your box, in external cases or in a backup machine.
>
> I only turn it on to do backups or to retrieve data so not much electricity.
>
>


I was wanting to have a NAS that also puts video on my TV.  That way I
can turn off my puter and still watch TV.  It would be as much a media
system as a NAS.  I have a mobo, ram and I think I have an extra video
card somewhere.  I'd need a case, power supply and such.  I'd also need
a place to put all this which is going to be interesting.  I'd want
plenty of hard drive bays tho.  I found a fractal 804 case that caught
my eye.  Can't recall all the details tho. 

Still, needs money and right now, I got too many other coals in the
fire.  Plus, I'm trying to figure out this crypttab thing.  From what
I've read, it is for opening encrypted drives during boot up which is
not really what I want.  I can boot and login into my KDE without
anything encrypted being mounted.  Kinda like this new setup really. 

I'll be so glad when fiber internet gets here.  I think I'm going with
the 500Mb/sec plan.  Costs about the same as my current 1.5Mb/sec plan. 
lol 

Dale 

:-)  :-) 
Re: Hard drive error from SMART [ In reply to ]
On Sun, Apr 17, 2022 at 10:22 AM Dale <rdalek1967@gmail.com> wrote:
<SNIP>
>
> I was wanting to have a NAS that also puts video on my TV. That way I
> can turn off my puter and still watch TV. It would be as much a media
> system as a NAS. I have a mobo, ram and I think I have a extra video
> card somewhere. I'd need a case, power supply and such. I'd also need
> a place to put all this which is going to be interesting. I'd want
> plenty of hard drive bays tho. I found a fractal 804 case that caught
> my eye. Can't recall all the details tho.
>
> Still, needs money and right now, I got to many other coals in the
> fire. Plus, I'm trying to figure out this crypttab thing. From what
> I've read, it is for opening encrypted drives during boot up which is
> not really what I want. I can boot and login into my KDE without
> anything encrypted being mounted. Kinda like this new setup really.
>
> I'll be so glad when fiber internet gets here. I think I'm going with
> the 500Mb/sec plan. Costs about the same as my current 1.5Mb/sec plan.
> lol
>
> Dale

I believe all of that can be done on TrueNAS, and most likely with
any of the prepackaged boxes like Synology, but I've not done it myself.

Most modern flatscreens can access NAS servers and play video
and/or music over the network, so the NAS server itself
need not have a GPU. I did put a VGA card in both of mine as it makes
building them easier, but it wasn't strictly necessary. TrueNAS can be
built on a headless machine if you know the IP address.

As for FreeBSD, they have 'jails' which I think are more or less
chroot environments, so you can put whatever MythTV is called
these days in a jail and run it from there. People do that with
DNS, network monitors and all sorts of things. (Assuming
you have enough compute power.)

No need to do any of this now. It's good that you're thinking
about solutions so that when the money comes along you'll
be ready.

Cheers,
Mark
Re: Hard drive error from SMART [ In reply to ]
Mark Knecht wrote:
> On Sun, Apr 17, 2022 at 10:22 AM Dale <rdalek1967@gmail.com> wrote:
> <SNIP>
>> I was wanting to have a NAS that also puts video on my TV. That way I
>> can turn off my puter and still watch TV. It would be as much a media
>> system as a NAS. I have a mobo, ram and I think I have a extra video
>> card somewhere. I'd need a case, power supply and such. I'd also need
>> a place to put all this which is going to be interesting. I'd want
>> plenty of hard drive bays tho. I found a fractal 804 case that caught
>> my eye. Can't recall all the details tho.
>>
>> Still, needs money and right now, I got to many other coals in the
>> fire. Plus, I'm trying to figure out this crypttab thing. From what
>> I've read, it is for opening encrypted drives during boot up which is
>> not really what I want. I can boot and login into my KDE without
>> anything encrypted being mounted. Kinda like this new setup really.
>>
>> I'll be so glad when fiber internet gets here. I think I'm going with
>> the 500Mb/sec plan. Costs about the same as my current 1.5Mb/sec plan.
>> lol
>>
>> Dale
> I believe all of that can be done on TrueNAS, and most likely with
> any of the prepackaged boxes like Synology, but I've not do it myself.
>
> Most modern flatscreens can access NAS servers and play video
> and or music over the network so the NAS server itself
> need not have a GPU. I did put a VGA in both of mine as building
> them is easier, but it wasn't strictly necessary. TrueNAS can be
> built on a headless machine if you know the IP address.
>
> As for FreeBSD, they have 'jails' which I think are more or less
> chroot environments, so you can put whatever MythTV is called
> these days in a jail and run it from there. People do that with
> DNS, network monitors and all sorts of things. (Assuming
> you have enough compute power.)
>
> No need to do any of this now. It's good that you're thinking
> about solutions so that when the money comes along you'll
> be ready.
>
> Cheers,
> Mark
>
>


When I bought my current TV, I avoided the smart ones.  At the time, it
was new technology and people were talking about how buggy it was so I
bought a regular TV.  If I had to buy one today, I'd buy a smart one. 
They seem to work pretty well now.  Nice and stable at least.  Still, I
check to make sure whatever I buy is based on Linux as its OS.  One can
usually check the manual and see the copyright notice in the last few
pages.  It mentions the kernel.  If it mentions windoze, I move on.  LG
is almost always Linux based.

I'm at the point where I know I need to do this.  It's just getting
there.  I even thought about putting the OS on a USB stick.  After all,
once booted, it won't access the stick very often.  I could even load it
into memory at boot up so it wouldn't need the stick at all once
booted.  Like is done with some Gentoo install media. 

One of these days.

Dale

:-)  :-) 

P. S.  New drive seems to be working fine.  Now to figure out what to do
with old one.  :-D
Re: Hard drive error from SMART [ In reply to ]
Rich Freeman wrote:
> On Sat, Apr 16, 2022 at 6:39 PM Dale <rdalek1967@gmail.com> wrote:
>> Neil Bothwick wrote:
>>> Use /dev/disk/by-partlabel/foo or /dev/disk/by-partuuid/bar.
>>>
>> That's even more typing than /dev/sdk. Some things I do easily by using
>> tab completion and all. When mounting, I let fstab remember the UUID
>> for it.
> That's what copy/paste is for. How often are you editing your
> crypttab anyway? This way when you move drives around they still
> work.
>

I did a google search for crypttab.  After reading what its purpose is,
I see why I don't have one.  It seems it is more for decrypting and
mounting things during bootup.  I don't need to mount encrypted data to
boot up or even log into KDE.  I just need it to access data when
needed.  Most of the encrypted data that I access often is actually my
external drives.  When I leave home, I close the encrypted data.  When I
get home, I open it and remount it.  If I need it for something. 

One day I may encrypt my /home directory.  Maybe.  I don't really see
the need since any data I want protected can just be put on the
encrypted part I have now.  Anyway, I suspect when I reboot, this will
be back to the old way.  I thought I was going to have an opportunity to
do that last night.  My lights went off for a few seconds.  UPS kicked
in and they came back on.  It's not over yet tho.  ;-)

Or am I missing something?

Dale

:-)  :-) 
Re: Hard drive error from SMART [ In reply to ]
On Sun, Apr 17, 2022 at 11:04 AM Dale <rdalek1967@gmail.com> wrote:
<SNIP>
>
> When I bought my current TV, I avoided the smart ones. At the time, it
> was new technology and people were talking about how buggy it was so I
> bought a regular TV. If I had to buy one today, I'd buy a smart one.
> They seem to work pretty well now. Nice and stable at least. Still, I
> check to make sure whatever I buy is based on Linux as its OS. One can
> usually check the manual and see the copyright notice in the last few
> pages. It mentions the kernel. If it mentions windoze, I move on. LQ
> is almost always Linux based.
>
> I'm at the point where I know I need to do this. It's just getting
> there. I even thought about putting the OS on a USB stick. After all,
> once booted, it won't access the stick very often. I could even load it
> into memory at boot up and it not even need the stick at all once
> booted. Like is done with some Gentoo install media.
>
> One of these days.

Fair enough.

You might also investigate whether a newer Roku/AppleTV type
machine will access a network share. I suspect they will.

TrueNAS will run from a USB stick. You'll need two - one for
the setup media and a second to install it to, but after that you
only need storage drives to hold your backups or media.

I think a NAS for backups and media playback makes sense.
You want the machine on most of the time, but if you shut it
down it won't generally stop you from using your main computer.

On the other hand, with NVMe drives in my new machine I
have no spinning media so I use the NAS as a network store
much as you envision watching movies on your TV, but for me
it's mostly astrophotography data.

Have fun. Happy Easter if you celebrate it. Happy Sunday if
you don't.

Cheers,
Mark
Re: Hard drive error from SMART [ In reply to ]
Neil Bothwick wrote:
> On Sat, 16 Apr 2022 23:44:58 -0500, Dale wrote:
>
>>>> That's even more typing than /dev/sdk. Some things I do easily by
>>>> using tab completion and all. When mounting, I let fstab remember
>>>> the UUID for it.
>>> That's what copy/paste is for. How often are you editing your
>>> crypttab anyway? This way when you move drives around they still
>>> work.
>> What is crypttab?  I type in the command manually.
> Then use a shell alias, even less typing.
>
>


I've done a couple basic alias things here but never grasped it enough
to do anything beyond making ls run with -al each time.  I think there
is another one I did but it was long ago.  I'd have to dig to find it. 

My biggest thing, I'm so used to using sdk1 that I'm likely to have to
hit the backspace key quite often until this gets sorted out.  My OS
stuff is on sda, sdb, sdc and was on sdd.  Anything above that was
external.  If one of the storms knocks my lights out, I may get a chance
to reboot.  See if that fixes things.
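
For reference, the by-partlabel and by-partuuid names suggested earlier in the
thread are symlinks to whatever sdX name the kernel handed out on this boot,
so the stable name keeps working even if sdk1 turns into something else.  A
minimal sketch for checking where one currently points; show_target is a
made-up helper name, and foo is only the placeholder label from the earlier
example:

```shell
#!/bin/sh
# show_target is a hypothetical helper name, not a standard tool.
# It prints the device node that a stable /dev/disk/by-* symlink
# currently points to.
show_target() {
    link=$1
    if [ ! -e "$link" ]; then
        echo "no such path: $link" >&2
        return 1
    fi
    # readlink -f follows the symlink chain to the real device node
    readlink -f "$link"
}
```

Something like `show_target /dev/disk/by-partlabel/foo` would print the current
kernel name for that partition, which may differ from one boot to the next
while the label itself stays the same.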

Dale

:-)  :-)
Re: Hard drive error from SMART [ In reply to ]
On Sun, 17 Apr 2022 13:45:39 -0500, Dale wrote:

> >> What is crypttab?  I type in the command manually.
> > Then use a shell alias, even less typing.
>
> I've done a couple basic alias things here but never grasped it enough
> to do anything beyond making ls run with -al each time.  I think there
> is another one I did but it was long ago.  I'd have to dig to find it. 

alias docrypt='cryptsetup whatever you normally type'

Put that in your profile and you can then open the encrypted drives
by typing docrypt. And if your setup changes, you change the alias but the
command you type stays the same.

Or you could use a shell script to open and mount with one command.

#!/bin/sh
cryptsetup whatever
mount whatever
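
Filled in a little further, the script might look something like the sketch
below.  The partition label, mapping name and mount point are all made-up
placeholders, and DRY_RUN is an invented safety switch so the script only
prints the commands unless told otherwise; none of this is taken from the
actual setup being discussed.

```shell
#!/bin/sh
# All three names below are hypothetical placeholders; substitute your own.
DEVICE=${DEVICE:-/dev/disk/by-partlabel/backup}
NAME=${NAME:-crypt_backup}
MOUNTPOINT=${MOUNTPOINT:-/mnt/backup}
# DRY_RUN is an invented switch: 1 = only print the commands, 0 = run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

open_and_mount() {
    # cryptsetup asks for the passphrase on the terminal at this point,
    # the same as when the command is typed by hand.
    run cryptsetup open "$DEVICE" "$NAME" &&
    # Only mount if the open step succeeded.
    run mount "/dev/mapper/$NAME" "$MOUNTPOINT"
}

open_and_mount
```

Run as root with DRY_RUN=0 once the printed commands look right.  The && means
a mistyped passphrase stops the script before it tries to mount anything.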


--
Neil Bothwick

Celery is not food. It is a member of the plywood family.
Re: Hard drive error from SMART [ In reply to ]
Neil Bothwick wrote:
> On Sun, 17 Apr 2022 13:45:39 -0500, Dale wrote:
>
>>>> What is crypttab?  I type in the command manually.
>>> Then use a shell alias, even less typing.
>> I've done a couple basic alias things here but never grasped it enough
>> to do anything beyond making ls run with -al each time.  I think there
>> is another one I did but it was long ago.  I'd have to dig to find it. 
> alias docrypt='cryptsetup whatever you normally type'
>
> Put that in your profile and you can then mount open the encrypted drives
> by typing docrypt. And if your setup changes, you change the alias but the
> command you type stays the same.
>
> Or you could use a shell script to open and mount with one command.
>
> #!/bin/sh
> cryptsetup whatever
> mount whatever
>
>


I have to enter a password in the middle of that.  I don't know how that
would work.  As I've said before, my "scripts" are so simple, they may
not even be called scripts.  They're just files with commands in them. 

If nothing changes when I get around to rebooting, I'll get into this
some more. 

Dale

:-)  :-) 
Re: Hard drive error from SMART [ In reply to ]
On Mon, 18 Apr 2022 09:06:11 -0500, Dale wrote:

> > #!/bin/sh
> > cryptsetup whatever
> > mount whatever
> >
> >
>
>
> I have to enter a password in the middle of that.  I don't know how that
> would work.  As I've said before, my "scripts" are so simple, they may
> not even be called scripts.  They're just files with commands in them. 
>
> If nothing changes when I get around to rebooting, I'll get into this
> some more. 

It will prompt for the password, just as if you ran the command manually.


--
Neil Bothwick

One of the nice things about standards is that there are so many of them.
Re: Hard drive error from SMART [ In reply to ]
Neil Bothwick wrote:
> On Mon, 18 Apr 2022 09:06:11 -0500, Dale wrote:
>
>>> #!/bin/sh
>>> cryptsetup whatever
>>> mount whatever
>>>
>>>
>>
>> I have to enter a password in the middle of that.  I don't know how that
>> would work.  As I've said before, my "scripts" are so simple, they may
>> not even be called scripts.  They're just files with commands in them. 
>>
>> If nothing changes when I get around to rebooting, I'll get into this
>> some more. 
> It will prompt for the password, just as if you ran the command manually.
>
>


Finally got around to trying this.  I went to town today and locked it
up before I left.  When I came back, I used your little script trick and
it worked great.  It mounts and everything for me.  Now I'll make one to
umount and close as well.  No prompting so it should be easy enough. 
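
A rough sketch of what that unmount-and-close script could look like, assuming
a mapping called crypt_backup mounted at /mnt/backup.  Both names are made up
for illustration, and DRY_RUN is an invented switch that makes the script only
print the commands by default.

```shell
#!/bin/sh
# Both names below are hypothetical placeholders; substitute your own.
NAME=${NAME:-crypt_backup}
MOUNTPOINT=${MOUNTPOINT:-/mnt/backup}
# DRY_RUN is an invented switch: 1 = only print the commands, 0 = run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

umount_and_close() {
    # Unmount first: cryptsetup refuses to close a mapping still in use.
    run umount "$MOUNTPOINT" &&
    run cryptsetup close "$NAME"
}

umount_and_close
```

The order is the reverse of opening: unmount the filesystem, then close the
encrypted mapping.  If the umount fails because something still has a file
open, the && keeps the script from attempting the close.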

Thanks for the tip.

Dale

:-)  :-)