Mailing List Archive

Instability with cx18-perf branch
Hi Andy

I tried the cx18-perf branch to see if it would fix my minor problem when
recording multiple streams (it would occasionally drop a buffer when
recording). This branch unfortunately made things much worse and
appears to have made my myth backend crash three times in the last three
days.

I have two cards, one an HVR1600 and the other a DViCO FusionHDTV7 Dual
Express. The problem seems to happen randomly when the HVR1600 HD tuner
starts recording. It has a lower priority than the two DViCO HD tuners so it
only kicks in occasionally.

What can I do to help debug the problem?

John
Re: Instability with cx18-perf branch [ In reply to ]
On Thu, 2009-04-23 at 09:37 -0700, John Lundell wrote:
> Hi Andy
>
> I tried the cx18-perf branch to see if it would fix my minor problem
> when recording multiple streams (would occasionally drop a buffer when
> recording). This branch unfortunately made things much worse and has
> appears to have made my myth backend to crash three times in the last
> three days.

Is this happening for analog captures, digital captures, or both with
the HVR-1600?

Assuming this is happening for analog captures only, we may be running
into a BUG_ON() in the driver when reading a buffer and scheduling the
work object that the outgoing work handler is already acting upon. I
know what to do if this is the case. (I thought the kernel handled this
case gracefully, but I wasn't totally clear.) The "random" nature
sounds about right for this sort of race condition.


> I have two cards, one a HVR1600 and the other a DViCO FusionHDTV7 Dual
> Express. The problem seems to randomly happen with the HVR1600 HD
> tuner starts recording. It has a lower priority than the two DViCO HD
> tuners so it only kicks in occansionally.
>
> What can I do to help debug the problem?

Hmmm. Could you please send the portions of the MythTV log at the time
of the capture and through the crash? I'd like to see what MythTV is
griping about.

Also could you look for cx18 related messages in dmesg
or /var/log/messages for that same time period?

Also, look for any "Oops" or "Bug" messages in dmesg
or /var/log/messages; I'd like to see the complete dump from such a
message.
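
A minimal sketch of that log search, assuming a POSIX shell with grep
available. The helper name scan_log and the search pattern are
illustrative assumptions, not part of the cx18 driver or MythTV. The
match only locates the relevant lines; a complete Oops dump still has to
be copied out together with the lines that follow it.

```shell
# Hypothetical helper: print log lines that mention cx18 or that look
# like the start of a kernel Oops/BUG report. The pattern is an
# assumption; it is case-sensitive so that "debug" does not match "BUG".
scan_log() {
    # With a file argument grep reads that file; with no argument it
    # reads standard input, so the helper also works at the end of a pipe.
    grep -E 'cx18|Oops|BUG' "$@"
}
```

Typical use would be `dmesg | scan_log` for the kernel ring buffer
and `scan_log /var/log/messages` for the saved log (reading that file
may require root on some systems).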



If nothing's obvious after looking at all that, we'll start turning
on extra logging from the cx18 driver with the "debug=..." module
parameter.

$ modinfo cx18
[...]
parm: debug:Debug level (bitmask). Default: 0
1/0x0001: warning
2/0x0002: info
4/0x0004: mailbox
8/0x0008: dma
16/0x0010: ioctl
32/0x0020: file
64/0x0040: i2c
128/0x0080: irq
256/0x0100: high volume
[...]

We'll probably be interested in info, warning, mailbox, dma, file, and
irq (debug=175) and maybe with high volume turned on (debug=431) but the
log file will be enormous in that case.
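
As a quick check of those two numbers, the bitmask can be built up with
shell arithmetic. This is only a sketch: the variable names are made up
for illustration, and the values are the ones listed in the modinfo
output above.

```shell
# Flag values as listed by "modinfo cx18" (names here are illustrative).
WARN=1; INFO=2; MAILBOX=4; DMA=8; FILE=32; IRQ=128; HIGHVOL=256

# OR together the flags of interest: warning, info, mailbox, dma, file, irq.
base=$(( WARN | INFO | MAILBOX | DMA | FILE | IRQ ))

# The same set with "high volume" logging added.
full=$(( base | HIGHVOL ))

echo "debug=$base"   # prints debug=175
echo "debug=$full"   # prints debug=431
```

Since debug is a module parameter, the resulting value would normally be
given when the module is loaded, for example "modprobe cx18 debug=175".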

Regards,
Andy

> John



_______________________________________________
ivtv-devel mailing list
ivtv-devel@ivtvdriver.org
http://ivtvdriver.org/mailman/listinfo/ivtv-devel
Re: Instability with cx18-perf branch [ In reply to ]
On Thu, 2009-04-23 at 21:44 -0400, Andy Walls wrote:
> On Thu, 2009-04-23 at 09:37 -0700, John Lundell wrote:
> > Hi Andy
> >
> > I tried the cx18-perf branch to see if it would fix my minor problem
> > when recording multiple streams (would occasionally drop a buffer when
> > recording). This branch unfortunately made things much worse and has
> > appears to have made my myth backend to crash three times in the last
> > three days.
>
> Is this happening for analog captures, digital captures, or both with
> the HVR-1600?
>
> Assuming this is happening for analog captures only, we may be running
> into a BUG_ON() in the driver when reading a buffer and scheduling the
> work object that the outgoing work handler is already acting upon. I
> know what to do if this is the case. (I thought the kernel handled this
> case gracefully, but I wasn't totally clear.) The "random" nature
> sounds about right for this sort of race condition.
>
>
> > I have two cards, one a HVR1600 and the other a DViCO FusionHDTV7 Dual
> > Express. The problem seems to randomly happen with the HVR1600 HD
> > tuner starts recording. It has a lower priority than the two DViCO HD
> > tuners so it only kicks in occansionally.
> >
> > What can I do to help debug the problem?
>
> Hmmm. Could you please send the portions of the MythTV log at the time
> of the capture and through the crash? I'd like to see what MythTV is
> griping about.
>
> Also could you look for cx18 related messages in dmesg
> or /var/log/messages for that same time period?
>
> Also, look for any "Oops" or "Bug" messages in the dmesg
> or /var/log/messages, I'd like to see the complete dump from such a
> message.
>

John,

I'll have time to work on cx18 problems this evening (EDT). Could you
provide some log output before then?

Since this particular changeset is more complex than most, I'd like to
get any problems debugged before I forget any nuances. I also know I
have a time window tonight when the kids won't break my concentration
every 5 minutes. :)

Regards,
Andy

> If nothing's obvious after looking at all that, we'll starting turning
> on extra logging from the cx18 driver with the "debug=..." module
> parameter.
>
> $ modinfo cx18
> [...]
> parm: debug:Debug level (bitmask). Default: 0
> 1/0x0001: warning
> 2/0x0002: info
> 4/0x0004: mailbox
> 8/0x0008: dma
> 16/0x0010: ioctl
> 32/0x0020: file
> 64/0x0040: i2c
> 128/0x0080: irq
> 256/0x0100: high volume
> [...]
>
> We'll probably be interested in info, warning, mailbox, dma, file, and
> irq (debug=175) and maybe with high volume turned on (debug=431) but the
> log file will be enormous in that case.
>
> Regards,
> Andy
>
> > John
>


Re: Instability with cx18-perf branch [ In reply to ]
On Sat, Apr 25, 2009 at 5:28 AM, Andy Walls <awalls@radix.net> wrote:

> On Thu, 2009-04-23 at 21:44 -0400, Andy Walls wrote:
> > On Thu, 2009-04-23 at 09:37 -0700, John Lundell wrote:
> > > Hi Andy
> > >
> > > I tried the cx18-perf branch to see if it would fix my minor problem
> > > when recording multiple streams (would occasionally drop a buffer when
> > > recording). This branch unfortunately made things much worse and has
> > > appears to have made my myth backend to crash three times in the last
> > > three days.
> >
> > Is this happening for analog captures, digital captures, or both with
> > the HVR-1600?
> >
> > Assuming this is happening for analog captures only, we may be running
> > into a BUG_ON() in the driver when reading a buffer and scheduling the
> > work object that the outgoing work handler is already acting upon. I
> > know what to do if this is the case. (I thought the kernel handled this
> > case gracefully, but I wasn't totally clear.) The "random" nature
> > sounds about right for this sort of race condition.
> >
> >
> > > I have two cards, one a HVR1600 and the other a DViCO FusionHDTV7 Dual
> > > Express. The problem seems to randomly happen with the HVR1600 HD
> > > tuner starts recording. It has a lower priority than the two DViCO HD
> > > tuners so it only kicks in occansionally.
> > >
> > > What can I do to help debug the problem?
> >
> > Hmmm. Could you please send the portions of the MythTV log at the time
> > of the capture and through the crash? I'd like to see what MythTV is
> > griping about.
> >
> > Also could you look for cx18 related messages in dmesg
> > or /var/log/messages for that same time period?
> >
> > Also, look for any "Oops" or "Bug" messages in the dmesg
> > or /var/log/messages, I'd like to see the complete dump from such a
> > message.
> >
>
> John,
>
> I'll have time to work on cx18 problems this evening (EDT). Could you
> provide some log output before then?
>
> Since this particular changeset is more complex than most, I'd like to
> get any problems debugged before I forget any nuances. I also know I
> have a time window tonight when the kids won't break my concentration
> every 5 minutes. :)
>
> Regards,
> Andy
>

Hi Andy,

Sure, I can try some tests this morning.

John
Re: Instability with cx18-perf branch [ In reply to ]
On Sat, Apr 25, 2009 at 9:44 AM, John Lundell <jdlundell@gmail.com> wrote:

> On Sat, Apr 25, 2009 at 5:28 AM, Andy Walls <awalls@radix.net> wrote:
>
>> On Thu, 2009-04-23 at 21:44 -0400, Andy Walls wrote:
>> > On Thu, 2009-04-23 at 09:37 -0700, John Lundell wrote:
>> > > Hi Andy
>> > >
>> > > I tried the cx18-perf branch to see if it would fix my minor problem
>> > > when recording multiple streams (would occasionally drop a buffer when
>> > > recording). This branch unfortunately made things much worse and has
>> > > appears to have made my myth backend to crash three times in the last
>> > > three days.
>> >
>> > Is this happening for analog captures, digital captures, or both with
>> > the HVR-1600?
>> >
>> > Assuming this is happening for analog captures only, we may be running
>> > into a BUG_ON() in the driver when reading a buffer and scheduling the
>> > work object that the outgoing work handler is already acting upon. I
>> > know what to do if this is the case. (I thought the kernel handled this
>> > case gracefully, but I wasn't totally clear.) The "random" nature
>> > sounds about right for this sort of race condition.
>> >
>> >
>> > > I have two cards, one a HVR1600 and the other a DViCO FusionHDTV7 Dual
>> > > Express. The problem seems to randomly happen with the HVR1600 HD
>> > > tuner starts recording. It has a lower priority than the two DViCO HD
>> > > tuners so it only kicks in occansionally.
>> > >
>> > > What can I do to help debug the problem?
>> >
>> > Hmmm. Could you please send the portions of the MythTV log at the time
>> > of the capture and through the crash? I'd like to see what MythTV is
>> > griping about.
>> >
>> > Also could you look for cx18 related messages in dmesg
>> > or /var/log/messages for that same time period?
>> >
>> > Also, look for any "Oops" or "Bug" messages in the dmesg
>> > or /var/log/messages, I'd like to see the complete dump from such a
>> > message.
>> >
>>
>> John,
>>
>> I'll have time to work on cx18 problems this evening (EDT). Could you
>> provide some log output before then?
>>
>> Since this particular changeset is more complex than most, I'd like to
>> get any problems debugged before I forget any nuances. I also know I
>> have a time window tonight when the kids won't break my concentration
>> every 5 minutes. :)
>>
>> Regards,
>> Andy
>>
>
> Hi Andy,
>
> Sure, I can try some tests this morning.
>
> John
>
>
Hi Andy,

I ran a bunch of tests where I started the tuners recording and then started
and stopped tuners one by one to see if I could get a crash. Of course,
observing the process makes it work: no crashes. I have attached the output
from dmesg for two different tries.

The mythbackend log from the crashes showed nothing useful: just a line about
going to record program xyz and then nothing else.

John
Re: Instability with cx18-perf branch [ In reply to ]
On Sat, 2009-04-25 at 13:23 -0700, John Lundell wrote:
>
> On Sat, Apr 25, 2009 at 9:44 AM, John Lundell <jdlundell@gmail.com>
> wrote:
>

>
> Hi Andy,
>
> I ran a bunch of tests where I started and tuners recording and then
> started and stopped tuners one by one to see if I could get a crash
> and of course, observing the process makes it work, no crashes.


:) Kind of like quantum mechanics.


> I have attached the output from dmesg for two different tries.

OK. I've looked at them.

As you already noted, there is no evidence of any crashes.

I'll note that you have a setup very similar to Brandon Jenkins' setup:
4 cores with cx18-0 sharing IRQ 19 with an AHCI disk controller and a USB
hub. You may experience an occasional lost buffer if writing to disks
hooked to that disk controller during captures.
hooked to that disk controller during captures.

I say this because, on occasion, your system isn't servicing the
CX23418 interrupt in a timely fashion (the "Possibly falling behind"
messages, with the ones that say "while processing" being not as late).
You're not losing video buffers though. The only missed buffer sweep-ups
("it must have dropped out of rotation") I see are happening after the
CAPTURE_STOP, when the CX23418 rapid-fires back a bunch of empty buffers:
nothing to worry about.


> On the mythbackend crash log, there was nothing, just going to record
> program xyz and then nothing else.


Alright. If you can't reproduce the bug in a day or two, I'll make a
small prophylactic patch -- to ensure the cx18 driver doesn't attempt to
queue a work object that's already queued -- and ask for it to be
pulled.

I suspect that patch may not be necessary, but it closes off the most
likely failure mode that I can think of that would cause an app to
crash. When an app reads from an analog TV capture stream, the driver
ends up calling cx18_stream_put_buf_fw() when a buffer is emptied.
cx18_stream_put_buf_fw() then calls queue_work() attempting to queue the
work object for that stream. If the kernel doesn't gracefully handle
attempted (re)queueing of an already queued work object, then I could
see how the cx18 driver would cause MythTV to crash (with an Oops or Bug
in /var/log/messages).

The only other thing I could have gotten wrong is the cx18 driver
internal buffer queueing. But I was so paranoid about that, I'm pretty
sure I got it right. Though, feel free to inspect the patches near the
tip of the cx18-perf repo if you want to try and spot anything amiss:

http://linuxtv.org/hg/~awalls/cx18-perf/


Regards,
Andy


> John

