Mailing List Archive

Effect in guest of a hard read error from the host's storage ?
Not having experienced this on any of the Xen systems I’ve managed in the past, I just wanted to check what happens when the host’s storage encounters an unrecoverable read error.
My expectation would be that the error is passed up the chain (host’s OS, virtual disk emulator, guest OS disk driver, guest OS filesystem, guest OS program), appears as an unrecoverable error to the guest, and is hence passed up to the program attempting to read the file - so really nothing different to the same error on bare metal.

Has anyone encountered this and can confirm whether that’s how Xen+Linux handles it?

The reason for asking is that I also use Parallels on a Mac and have this situation. Cloning the host’s drive (Carbon Copy Cloner) “does the right thing” - CCC copies the files it can and reports the ones it can’t. A couple of the ones it can’t are virtual disk files for Parallels guests.
When I try to recover what I can (any unreadable files might not be important anyway, but I don’t know which they are) using the same technique in the guest, Parallels doesn’t behave like that. It pops up a dialog to say a critical error has occurred, with two options: retry (which of course doesn’t do anything useful) or stop the guest. So the guest OS never gets to see the error; it just gets killed. Thus meaningful recovery is impossible, as the guest gets stopped without being able to tell me which file(s) are affected.
I’ve been “discussing” this with Parallels support and at the moment I’ve reached the point where they’re telling me that this is normal when things are virtualised - which is setting off the BS detector for me.
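
For illustration, the "copy what you can, report what you can't" behaviour described above can be sketched in a few lines of Python. This is only a minimal sketch and not how Carbon Copy Cloner actually works; `salvage_tree` and its `copy` parameter are hypothetical names introduced here. It relies on the read error reaching the program as an `OSError` (typically `errno.EIO`), which is exactly what does not happen if the hypervisor stops the guest first.

```python
import errno
import os
import shutil


def salvage_tree(src_root, dst_root, copy=shutil.copyfile):
    """Copy every regular file under src_root into dst_root.

    Files that cannot be copied are skipped and returned as a list of
    (path, errno) pairs instead of aborting the whole run.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                copy(src, os.path.join(target_dir, name))
            except OSError as exc:
                # A hard read error normally surfaces here as errno.EIO.
                failed.append((src, exc.errno))
    return failed
```

The `copy` argument exists only so that a failing copy can be simulated without real damaged media; in normal use the default `shutil.copyfile` is enough.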

Simon
Re: Effect in guest of a hard read error from the host's storage ?
On Sat, Apr 2, 2022 at 8:28 PM Simon Hobson <simon@thehobsons.co.uk> wrote:

> Not having experienced it on any of my Xen systems I’ve managed in the
> past, I just wanted to check what happens when the host’s storage
> encounters an unrecoverable read error ?
> My expectation would be that the error would be passed up the chain
> (host’s OS, virtual disk emulator, guest OS disk driver, guest OS
> filesystem, guest OS program), appear as an unrecoverable error to the
> guest, and hence passed up to the program attempting to read the file - so
> really nothing different to the same error on bare metal.
>
> Has anyone encountered this, and can confirm if that’s how Xen+Linux
> handles it ?
>

Juergen / Roger / Anthony -- any idea what would happen with either blkback
or qdisk on a hard read error? (Or blktap, for that matter?)


> The reason for asking is that I also use Parallels on a Mac and have this
> situation. Cloning the host’s drive (Carbon Copy Cloner) “does the right
> thing” - CCC copies the files it can and reports the ones it can’t. A
> couple of the ones it can’t are virtual disk files for Parallels guests.
> When I try to recover what I can (and any unreadable files might not be
> important anyway but I don’t know which they are) using the same technique
> in the guest, Parallels doesn’t behave like that. It pops up a dialog to
> say a critical error has occurred, with two options - retry (which of
> course doesn’t do anything useful), or to stop the guest. So the guest OS
> never gets to see the error, it just gets killed. Thus meaningful recovery
> is impossible as the guest gets stopped without being able to tell me what
> file(s) is affected.
> I’ve been “discussing” this with Parallels support and at the moment I’ve
> reached the point where they’re telling me that this is normal when things
> are virtualised - which is setting off the BS detector for me.
>

I think it probably depends on a lot of factors; it wouldn't surprise me if
more consumer-grade virtual machine software, like Parallels or VirtualBox,
doesn't do the hard work of writing and testing those sorts of paths, while
more enterprise-grade software might. That said, I don't think I've ever
heard anyone ask the question of Xen, so it's possible that it simply
hasn't been considered.

Another thing to consider is that the ability to pass the error on to the
guest in a system like Parallels depends not only on the correctness of its
own block datapath in the presence of errors, but also the correctness of
the surrounding operating system. If the host OS doesn't handle those
kinds of errors gracefully in a reliable fashion, there's nothing Parallels
can really do to make up for it.

-George
Re: Effect in guest of a hard read error from the host's storage ?
On 07.04.22 12:10, George Dunlap wrote:
>
>
> On Sat, Apr 2, 2022 at 8:28 PM Simon Hobson <simon@thehobsons.co.uk
> <mailto:simon@thehobsons.co.uk>> wrote:
>
> Not having experienced it on any of my Xen systems I’ve managed in the past,
> I just wanted to check what happens when the host’s storage encounters an
> unrecoverable read error ?
> My expectation would be that the error would be passed up the chain (host’s
> OS, virtual disk emulator, guest OS disk driver, guest OS filesystem, guest
> OS program), appear as an unrecoverable error to the guest, and hence passed
> up to the program attempting to read the file - so really nothing different
> to the same error on bare metal.
>
> Has anyone encountered this, and can confirm if that’s how Xen+Linux handles
> it ?
>
>
> Juergen / Roger / Anthony -- any idea what would happen with either blkback or
> qdisk on a hard read error?  (Or blktap, for that matter?)

Any I/O error for a request issued via the PV block device protocol will be
handed back to the frontend via a generic error "BLKIF_RSP_ERROR", which is
handed back to the block layer as an I/O error on the guest side.
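
As a rough illustration of that mapping (a sketch in Python, not the actual driver code): the PV block protocol defines three response status values, and, as far as I understand the Linux frontend, anything other than success or "not supported" ends up as a generic I/O error. The numeric values below mirror the constants in Xen's public io/blkif.h header; `guest_errno_for` and its choice of errno values are hypothetical stand-ins for what a guest program would eventually see.

```python
import errno

# Response status values as defined in Xen's public io/blkif.h header.
BLKIF_RSP_EOPNOTSUPP = -2  # operation not supported by the backend
BLKIF_RSP_ERROR = -1       # generic failure, e.g. a hard read error on the host
BLKIF_RSP_OKAY = 0         # request completed successfully


def guest_errno_for(rsp_status):
    """Hypothetical helper: map a blkif response status to the errno a
    guest program would plausibly see. Unrecognised values are treated
    as a plain I/O error."""
    if rsp_status == BLKIF_RSP_OKAY:
        return 0
    if rsp_status == BLKIF_RSP_EOPNOTSUPP:
        return errno.EOPNOTSUPP
    return errno.EIO
```

Note that the protocol carries no detail about why the request failed, so the guest cannot distinguish a bad sector on the host from any other backend failure.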


Juergen
Re: Effect in guest of a hard read error from the host's storage ?
On Thu, Apr 07, 2022 at 12:20:27PM +0200, Juergen Gross wrote:
> On 07.04.22 12:10, George Dunlap wrote:
> >
> >
> > On Sat, Apr 2, 2022 at 8:28 PM Simon Hobson <simon@thehobsons.co.uk
> > <mailto:simon@thehobsons.co.uk>> wrote:
> >
> > Not having experienced it on any of my Xen systems I’ve managed in the past,
> > I just wanted to check what happens when the host’s storage encounters an
> > unrecoverable read error ?
> > My expectation would be that the error would be passed up the chain (host’s
> > OS, virtual disk emulator, guest OS disk driver, guest OS filesystem, guest
> > OS program), appear as an unrecoverable error to the guest, and hence passed
> > up to the program attempting to read the file - so really nothing different
> > to the same error on bare metal.
> >
> > Has anyone encountered this, and can confirm if that’s how Xen+Linux handles
> > it ?
> >
> >
> > Juergen / Roger / Anthony -- any idea what would happen with either
> > blkback or qdisk on a hard read error?  (Or blktap, for that matter?)
>
> Any I/O error for a request issued via the PV block device protocol will be
> handed back to the frontend via a generic error "BLKIF_RSP_ERROR", which is
> handed back to the block layer as I/O error on the guest side.

It seems to be the same when the backend is "qdisk": QEMU sets the
request status to BLKIF_RSP_ERROR on an I/O error.

--
Anthony PERARD
Re: Effect in guest of a hard read error from the host's storage ?
George Dunlap <dunlapg@umich.edu> wrote:

>> On Sat, Apr 2, 2022 at 8:28 PM Simon Hobson <simon@thehobsons.co.uk> wrote:
>> Not having experienced it on any of my Xen systems I’ve managed in the past, I just wanted to check what happens when the host’s storage encounters an unrecoverable read error ?
>> My expectation would be that the error would be passed up the chain (host’s OS, virtual disk emulator, guest OS disk driver, guest OS filesystem, guest OS program), appear as an unrecoverable error to the guest, and hence passed up to the program attempting to read the file - so really nothing different to the same error on bare metal.
>>
>> Has anyone encountered this, and can confirm if that’s how Xen+Linux handles it ?
>>
> Juergen / Roger / Anthony -- any idea what would happen with either blkback or qdisk on a hard read error? (Or blktap, for that matter?)
>
>> The reason for asking is that I also use Parallels on a Mac and have this situation. Cloning the host’s drive (Carbon Copy Cloner) “does the right thing” - CCC copies the files it can and reports the ones it can’t. A couple of the ones it can’t are virtual disk files for Parallels guests.
>> When I try to recover what I can (and any unreadable files might not be important anyway but I don’t know which they are) using the same technique in the guest, Parallels doesn’t behave like that. It pops up a dialog to say a critical error has occurred, with two options - retry (which of course doesn’t do anything useful), or to stop the guest. So the guest OS never gets to see the error, it just gets killed. Thus meaningful recovery is impossible as the guest gets stopped without being able to tell me what file(s) is affected.
>> I’ve been “discussing” this with Parallels support and at the moment I’ve reached the point where they’re telling me that this is normal when things are virtualised - which is setting off the BS detector for me.
>>
> I think it probably depends on a lot of factors; it wouldn't surprise me if more consumer-grade virtual machine software, like Parallels or VirtualBox don't do the hard work of writing and testing those sorts of paths, while more enterprise-grade software might. That said, I don't think I've ever heard anyone ask the question of Xen, so it's possible that it simply hasn't been considered.
>
> Another thing to consider is that the ability to pass the error on to the guest in a system like Parallels depends not only on the correctness of its own block datapath in the presence of errors, but also the correctness of the surrounding operating system. If the host OS doesn't handle those kinds of errors gracefully in a reliable fashion, there's nothing Parallels can really do to make up for it.

Yes, I imagine it’s “quite complicated” to catch all the possible exceptions.
But given that they do some versions which are definitely not “consumer orientated”, and that they have been doing it for a long time*, I’d have expected better.

* I vaguely recall having used it since an architecture change (the dropping of a different-endian emulation mode ?) between the PowerPC G4 and PowerPC G5 stopped the old Virtual PC emulator working. Something like 20 years ago.



Juergen Gross <jgross@suse.com> wrote:

> Any I/O error for a request issued via the PV block device protocol will be
> handed back to the frontend via a generic error "BLKIF_RSP_ERROR", which is
> handed back to the block layer as I/O error on the guest side.

Thanks, that sounds like exactly what I’d expect to happen.



Anthony PERARD <anthony.perard@citrix.com> wrote:

> It seems to be the same when the backend is "qdisk", QEMU sets the
> request status to BLKIF_RSP_ERROR on I/O error.




Thanks all for taking the time to reply - and apologies for the delay in responding.

I had a third remote session with a support agent today (Indian ? call centre - nothing against them personally, but the old saying about “pay peanuts, get monkeys” comes to mind). As I’ve come to expect, he’d not read all the notes, so I had to explain yet again what the problem is, and tactfully explain that “no, none of this is inherent in virtualisation; it’s only Parallels’ implementation that makes it fail in this way”.
Today I had the “pleasure” of demonstrating that no, I can’t just copy the whole virtual disk file (it fails when it hits the unreadable block), and no, I can’t just copy files using Finder within the guest (that triggers Parallels to pop up the same critical error and only offer the options to retry, as if that’s going to do anything different, or to kill the guest).
It will be interesting to see whether my request for escalation to a higher level gets noticed.
Almost as much fun as dealing with the hell desk when I have a problem with the Windows laptop I have to use for ${day_job} !
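
For what it's worth, the "copy fails when it hits the unreadable block" case can in principle be worked around on the host by copying the virtual disk file in chunks and skipping the chunks that fail, similar in spirit to tools such as GNU ddrescue. The following is only a minimal sketch under assumptions: `salvage_file`, the 4096-byte chunk size and the `read_at` parameter are hypothetical, and zero-filling a bad chunk may well leave the guest filesystem inside the image damaged.

```python
import os


def salvage_file(src_path, dst_path, chunk_size=4096, read_at=os.pread):
    """Copy src_path to dst_path chunk by chunk.

    Chunks whose read raises OSError are replaced with zero bytes and
    their offsets are returned, so one bad block does not abort the copy.
    """
    bad_offsets = []
    size = os.path.getsize(src_path)
    fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as out:
            offset = 0
            while offset < size:
                want = min(chunk_size, size - offset)
                try:
                    data = read_at(fd, want, offset)
                except OSError:
                    data = b""
                    bad_offsets.append(offset)
                # Pad short or failed reads so later offsets stay aligned.
                out.write(data.ljust(want, b"\0"))
                offset += want
    finally:
        os.close(fd)
    return bad_offsets
```

The returned offsets at least narrow down which parts of the image are affected, which could then be matched against files from inside a guest booted from the salvaged copy.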


Thanks, Simon