Kernel crash during video transcoding

Alexandre Levy
Hi,

I installed the drm-devel-kmod port so that Plex can transcode videos
using the integrated GPU of my Intel Celeron G5900.

I'm running r364031, and the kernel is compiled with the GENERIC-NODEBUG
profile.

Transcoding has been working fine for quite a while now, but one
particular video causes a kernel panic every time I transcode it. It
seems to be caused by the i915kms module (i915_gem_fault() appears in the
stack):

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xdf
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bdd2b4
stack pointer           = 0x0:0xfffffe00d2be56d0
frame pointer           = 0x0:0xfffffe00d2be56d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4611 (Plex Transcoder)
trap number             = 12
panic: page fault
cpuid = 0
time = 1596976796
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d2be5390
vpanic() at vpanic+0x182/frame 0xfffffe00d2be53e0
panic() at panic+0x43/frame 0xfffffe00d2be5440
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00d2be54a0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00d2be54f0
trap() at trap+0x271/frame 0xfffffe00d2be5600
calltrap() at calltrap+0x8/frame 0xfffffe00d2be5600
--- trap 0xc, rip = 0xffffffff80bdd2b4, rsp = 0xfffffe00d2be56d0, rbp = 0xfffffe00d2be56d0 ---
_rw_wowned() at _rw_wowned+0x4/frame 0xfffffe00d2be56d0
vm_page_busy_acquire() at vm_page_busy_acquire+0x141/frame 0xfffffe00d2be5710
remap_io_mapping() at remap_io_mapping+0x120/frame 0xfffffe00d2be5760
i915_gem_fault() at i915_gem_fault+0x25f/frame 0xfffffe00d2be57d0
linux_cdev_pager_populate() at linux_cdev_pager_populate+0x11b/frame 0xfffffe00d2be5840
vm_fault() at vm_fault+0x3d1/frame 0xfffffe00d2be5950
vm_fault_trap() at vm_fault_trap+0x60/frame 0xfffffe00d2be5990
trap_pfault() at trap_pfault+0x19c/frame 0xfffffe00d2be59e0
trap() at trap+0x3f1/frame 0xfffffe00d2be5af0
calltrap() at calltrap+0x8/frame 0xfffffe00d2be5af0
--- trap 0xc, rip = 0x80296659a, rsp = 0x7fffffffbd38, rbp = 0x80fc00000 ---
KDB: enter: panic

I don't see any crash dump in /var/crash, despite having the right
configuration, and I should have enough space on my swap device (a 128GB
USB drive):

$ cat /etc/rc.conf | grep dump
dumpdev="AUTO"

$ swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/gpt/crash0 121307096        0 121307096     0%

$ cat /etc/fstab
/dev/gpt/crash0 none    swap    sw              0       0

$ dumpon -l
gpt/crash0

I'm not sure why no dump was generated. Is it because the kernel was
compiled with the GENERIC-NODEBUG profile? However, I see various KDB
options in the GENERIC profile that are inherited by GENERIC-NODEBUG.

I'm happy to recompile the kernel with the GENERIC profile if required.

Thank you.
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Kernel crash during video transcoding

Hans Petter Selasky-6
Hi,

On 2020-08-10 00:19, Alexandre Levy wrote:

> Hi,
>
> I installed the port drm-devel-kmod for Plex to be able to transcode videos
> using the integrated GPU of my Intel Celeron G5900.
>
> I'm running r364031 and the kernel is compiled with GENERIC-NODEBUG profile.
>
> Transcoding has been working fine for quite a while now but one video
> transcoding is causing a kernel panic that is reproducible all the time
> with that particular video. It seems like it's caused by the i915kms module
> (call of i915_gem_fault() in the stack):

If you compile the kernel using GENERIC and then enable debugging in the
i915 kms and reproduce, we might get a more clear picture!

What you've experienced is a so-called NULL pointer dereference.

--HPS


Re: Kernel crash during video transcoding

Alexandre Levy
I'm recompiling the kernel using GENERIC at the moment, but I'm not sure
how to enable debugging in i915kms; there is no compile option for that.
Am I missing something?

On Mon, 10 Aug 2020 at 08:44, Hans Petter Selasky <[hidden email]> wrote:

> If you compile the kernel using GENERIC and then enable debugging in the
> i915 kms and reproduce, we might get a more clear picture!
>
> What you've experienced is a so-called NULL pointer dereference.
>
> --HPS

Re: Kernel crash during video transcoding

Hans Petter Selasky-6
On 2020-08-10 10:41, Alexandre Levy wrote:
> I'm recompiling the kernel using GENERIC at the moment but I'm not sure how
> to enable debugging in i915 kms, there is no compile option for that, am I
> missing something ?

Run:

make config

before building the i915kms port; there should be a DEBUG option you can
select.

--HPS

Re: Kernel crash during video transcoding

Alexandre Levy
Ah, thanks. I was running make config-recursive, and that didn't show the
DEBUG option. The module is recompiling with DEBUG now.

On Mon, 10 Aug 2020 at 09:43, Hans Petter Selasky <[hidden email]> wrote:

> Type:
>
> make config
>
> Before building the i915kms port, then there should be a DEBUG option
> you can select.
>
> --HPS

Re: Kernel crash during video transcoding

Alexandre Levy
I could reproduce it with the GENERIC kernel and i915kms compiled with
DEBUG, and I got this additional info (still no crash dump, though):

Kernel page fault with the following non-sleepable locks held:
kernel: exclusive rw vm object (vm object) r = 0 (0xfffff8037533bc60) locked @ /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/intel_freebsd.c:186

Looking at the code, the lock reported at line 186 is taken by the call to
VM_OBJECT_WLOCK (memory page locking for write?) in intel_freebsd.c (see
[1] below). I'm away for a few days, but I'll try to dig into it more when
I'm back next weekend, although I have no experience with the
drm-devel-kmod codebase. In the meantime, if you have any suggestions for
debugging this further, I'm happy to follow them.

Thanks again.


[1] i915/intel_freebsd.c
int
remap_io_mapping(struct vm_area_struct *vma, unsigned long addr,
    unsigned long pfn, unsigned long size, struct io_mapping *iomap)
{
    vm_page_t m;
    vm_object_t vm_obj;
    vm_memattr_t attr;
    vm_paddr_t pa;
    vm_pindex_t pidx, pidx_start;
    int count, rc;

    attr = iomap->attr;
    count = size >> PAGE_SHIFT;
    pa = pfn << PAGE_SHIFT;
    pidx_start = OFF_TO_IDX(addr);
    rc = 0;
    vm_obj = vma->vm_obj;

    vma->vm_pfn_first = pidx_start;

    VM_OBJECT_WLOCK(vm_obj);    /* <<< intel_freebsd.c:186, lock reported by WITNESS */
    for (pidx = pidx_start; pidx < pidx_start + count;
        pidx++, pa += PAGE_SIZE) {
retry:
        m = vm_page_grab(vm_obj, pidx, VM_ALLOC_NOCREAT);
        if (m == NULL) {
            m = PHYS_TO_VM_PAGE(pa);
            if (!vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL))
                goto retry;
            if (vm_page_insert(m, vm_obj, pidx)) {
                vm_page_xunbusy(m);
                VM_OBJECT_WUNLOCK(vm_obj);
                vm_wait(NULL);
                VM_OBJECT_WLOCK(vm_obj);
                goto retry;
            }
            vm_page_valid(m);
        }
        pmap_page_set_memattr(m, attr);
        vma->vm_pfn_count++;
    }
    VM_OBJECT_WUNLOCK(vm_obj);
    return (rc);
}

On Mon, 10 Aug 2020 at 09:44, Alexandre Levy <[hidden email]> wrote:

> Ah thanks, I was doing a make config-recursive and that didn't show the
> DEBUG option. It's recompiling the module with DEBUG now.

Re: Kernel crash during video transcoding

Hans Petter Selasky-6
On 2020-08-10 12:59, Alexandre Levy wrote:
> I could reproduce with GENERIC kernel and i915 kms compiled with DEBUG and
> I got this additional info (still no crash dump though) :

If you have the debugger enabled, you will need to type "dump" in the
crash handler to get the core dump.

--HPS

Re: Kernel crash during video transcoding

Hans Petter Selasky-6
Hi,

On 2020-08-10 12:59, Alexandre Levy wrote:
> Looking at the code, the error happens during the call to VM_OBJECT_WLOCK
> (memory page locking for write ?) in the intel_freebsd.c (see [1] below).
> I'm out for a few days but I'll try to dig more into it when I'm back next
> weekend although I have no experience in the drm-devel-kmod codebase. In
> the meantime if you have any suggestions on debugging this further I'm
> happy to follow them.

The problem is likely that the vm_obj is NULL.

I think I recall that this function is special and, unlike in Linux, can
only be called from a certain context. I'll need the full backtrace with
line numbers in order to debug this.

--HPS

Re: Kernel crash during video transcoding

Alexandre Levy
Hi,

I finally managed to generate a crash dump despite the black screen: I had
to guess that I was in the crash handler, then I typed "dump" and pressed
Enter, which worked. The driver logs "[drm] Cannot find any crtc or
sizes", which I guess is why I couldn't see anything on my screen.

Back to the initial problem: I started a kgdb session, loaded the
i915kms.ko symbols, and here are the results:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:394
#2  0xffffffff8049c26a in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:575
#3  0xffffffff8049c02c in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=1) at /usr/src/sys/ddb/db_command.c:482
#4  0xffffffff8049bd9d in db_command_loop () at /usr/src/sys/ddb/db_command.c:535
#5  0xffffffff8049f048 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:270
#6  0xffffffff80c1b374 in kdb_trap (type=3, code=0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:699
#7  0xffffffff8100ca98 in trap (frame=0xfffffe00d7567300) at /usr/src/sys/amd64/amd64/trap.c:576
#8  <signal handler called>
#9  kdb_enter (why=0xffffffff811d5de0 "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:486
#10 0xffffffff80bd00be in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:902
#11 0xffffffff80bcfe53 in panic (fmt=0xffffffff81c8c7c8 <cnputs_mtx> "\b\214\031\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:839
#12 0xffffffff8100cee7 in trap_fatal (frame=0xfffffe00d7567600, eva=0) at /usr/src/sys/amd64/amd64/trap.c:915
#13 0xffffffff8100c360 in trap (frame=0xfffffe00d7567600) at /usr/src/sys/amd64/amd64/trap.c:212
#14 <signal handler called>
#15 _rw_wowned (c=0x2659c92217d5aa52) at /usr/src/sys/kern/kern_rwlock.c:270
#16 0xffffffff80ec23ed in vm_page_busy_acquire (m=0xfffffe00040ff9e8, allocflags=16) at /usr/src/sys/vm/vm_page.c:884
#17 0xffffffff82b4e980 in remap_io_mapping (vma=0xfffff80315148300, addr=<optimized out>, pfn=<optimized out>, size=<optimized out>, iomap=<optimized out>)
    at /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/intel_freebsd.c:193
#18 0xffffffff82be1c5f in i915_gem_fault (dummy=<optimized out>, vmf=<optimized out>)
    at /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367
#19 0xffffffff82cb5ddf in linux_cdev_pager_populate (vm_obj=0xfffff80368501420, pidx=<optimized out>, fault_type=<optimized out>, max_prot=<optimized out>,
    first=0xfffffe00d7567868, last=0xfffffe00d7567888) at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:554
#20 0xffffffff80ea9e8f in vm_pager_populate (object=0x2659c92217d5aa52, pidx=18446741874754451944, fault_type=0, max_prot=0 '\000', first=<optimized out>, last=<optimized out>)
    at /usr/src/sys/vm/vm_pager.h:172
#21 vm_fault_populate (fs=<optimized out>) at /usr/src/sys/vm/vm_fault.c:444
#22 vm_fault_allocate (fs=<optimized out>) at /usr/src/sys/vm/vm_fault.c:1028
#23 vm_fault (map=<optimized out>, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=<optimized out>, m_hold=<optimized out>) at /usr/src/sys/vm/vm_fault.c:1338
#24 0xffffffff80ea98ee in vm_fault_trap (map=0xfffffe00c0f539e8, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0, signo=0xfffffe00d7567ac4,
    ucode=0xfffffe00d7567ac0) at /usr/src/sys/vm/vm_fault.c:585
#25 0xffffffff8100d0de in trap_pfault (frame=0xfffffe00d7567b00, usermode=<optimized out>, signo=<optimized out>, ucode=0xffffffff81d1de80 <w_locklistdata+160624>)
    at /usr/src/sys/amd64/amd64/trap.c:817
#26 0xffffffff8100c72c in trap (frame=0xfffffe00d7567b00) at /usr/src/sys/amd64/amd64/trap.c:340
#27 <signal handler called>
#28 0x000000080296659a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffbf38
(kgdb) list *0xffffffff82be1c5f
0xffffffff82be1c5f is in i915_gem_fault (/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367).
362             ret = i915_vma_pin_fence(vma);
363             if (ret)
364                     goto err_unpin;
365
366             /* Finally, remap it using the new GTT offset */
367             ret = remap_io_mapping(area,
368                                    area->vm_start + (vma->ggtt_view.partial.offset << PAGE_SHIFT),
369                                    (ggtt->gmadr.start + vma->node.start) >> PAGE_SHIFT,
370                                    min_t(u64, vma->size, area->vm_end - area->vm_start),
371                                    &ggtt->iomap);
(kgdb) list *0xffffffff82b4e980
0xffffffff82b4e980 is in remap_io_mapping (/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/intel_freebsd.c:193).
188                 pidx++, pa += PAGE_SIZE) {
189     retry:
190                     m = vm_page_grab(vm_obj, pidx, VM_ALLOC_NOCREAT);
191                     if (m == NULL) {
192                             m = PHYS_TO_VM_PAGE(pa);
193                             if (!vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL))
194                                     goto retry;
195                             if (vm_page_insert(m, vm_obj, pidx)) {
196                                     vm_page_xunbusy(m);
197                                     VM_OBJECT_WUNLOCK(vm_obj);
(kgdb) list *0xffffffff80ec23ed
0xffffffff80ec23ed is in vm_page_busy_acquire (/usr/src/sys/vm/vm_page.c:884).
879                     if (vm_page_tryacquire(m, allocflags))
880                             return (true);
881                     if ((allocflags & VM_ALLOC_NOWAIT) != 0)
882                             return (false);
883                     if (obj != NULL)
884                             locked = VM_OBJECT_WOWNED(obj);
885                     else
886                             locked = false;
887                     MPASS(locked || vm_page_wired(m));
888                     if (_vm_page_busy_sleep(obj, m, m->pindex, "vmpba", allocflags,

It seems the problem occurred when calling vm_page_busy_acquire(m,
VM_ALLOC_WAITFAIL). Could m be a NULL pointer? I'm very new to kernel
debugging, so I'm not sure where to go from here.

Thanks.

On Mon, 10 Aug 2020 at 12:04, Hans Petter Selasky <[hidden email]> wrote:

> The problem is likely that the vm_obj is NULL.
>
> I think I recall that this function is special and can only be called
> from a certain context, unlike in Linux. Will need the full backtrace
> with line numbers in order to debug this.
>
> --HPS

Re: Kernel crash during video transcoding

Alexandre Levy
Hi,

I looked at the crash dump and the code more closely:

#18 0xffffffff82be1c5f in i915_gem_fault (dummy=<optimized out>, vmf=<optimized out>)
    at /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367
(kgdb) p area->vm_obj->lock
$43 = {lock_object = {lo_name = 0xffffffff8112c767 "vm object", lo_flags = 627245056, lo_data = 0, lo_witness = 0xfffff8045f575800}, rw_lock = 18446741878623409920}

So vm_obj is not NULL, and its lock has an rw_lock member.

Now, at intel_freebsd.c:193 (frame #17), the driver calls
vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL). Since vm_page_grab() returned
NULL on line 190, 'm' here is the page returned by PHYS_TO_VM_PAGE(pa) on
line 192, not a page grabbed from vm_obj.

The panic occurs at kern_rwlock.c:270 in frame #15, when calling
rw_wowner(rwlock2rw(c)), so something goes wrong either in rw_wowner or in
rwlock2rw.

Looking at rwlock2rw() :

/*
 * Return the rwlock address when the lock cookie address is provided.
 * This functionality assumes that struct rwlock* have a member named rw_lock.
 */
#define rwlock2rw(c)    (__containerof(c, struct rwlock, rw_lock))

I think this one just extracts the rw_lock member of the passed-in struct.
However, due to my lack of knowledge of kernel locking concepts, I don't
understand what the cookie address is about, so maybe something is wrong
with the cookie or with the rw_lock value itself.

Looking at rw_wowner() :

/*
 * Return a pointer to the owning thread if the lock is write-locked or
 * NULL if the lock is unlocked or read-locked.
 */

#define lv_rw_wowner(v)                         \
    ((v) & RW_LOCK_READ ? NULL :                    \
     (struct thread *)RW_OWNER((v)))

#define rw_wowner(rw)   lv_rw_wowner(RW_READ_VALUE(rw))

I don't think that one could cause a panic, but again, I'm not experienced
enough to be sure. It seems to return either the thread that owns the lock
or NULL if no thread owns it.

There is also the fact that the driver calls vm_page_busy_acquire with the
VM_ALLOC_WAITFAIL flag, which is defined in vm_page.h as:

#define VM_ALLOC_WAITFAIL   0x0010  /* (acf) Sleep and return error */

Could this be the reason for the panic, i.e. we try to lock, cannot, and
eventually just return an error without retrying? There is also the flag
VM_ALLOC_WAITOK, documented as /* (acf) Sleep and retry */. Should I try to
patch intel_freebsd.c to use this flag instead?

Thanks.

On Sat, 15 Aug 2020 at 20:35, Alexandre Levy <[hidden email]> wrote:

> Hi,
>
> I could finally generate a crash dump even with a black screen, I had to
> guess I was in the crash handler and I type "dump" and enter which worked.
> The driver logs "[drm] Cannot find any crtc or sizes" which I guess is the
> reason why I couldn't see anything on my screen.
>
> Back to the initial problem, I could start a kgdb session, loaded the
> i915kms.ko symbols and here are the results :
>
> (kgdb) bt
> #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> #1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:394
> #2  0xffffffff8049c26a in db_dump (dummy=<optimized out>,
> dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at
> /usr/src/sys/ddb/db_command.c:575
> #3  0xffffffff8049c02c in db_command (last_cmdp=<optimized out>,
> cmd_table=<optimized out>, dopager=1) at /usr/src/sys/ddb/db_command.c:482
> #4  0xffffffff8049bd9d in db_command_loop () at
> /usr/src/sys/ddb/db_command.c:535
> #5  0xffffffff8049f048 in db_trap (type=<optimized out>, code=<optimized
> out>) at /usr/src/sys/ddb/db_main.c:270
> #6  0xffffffff80c1b374 in kdb_trap (type=3, code=0, tf=<optimized out>) at
> /usr/src/sys/kern/subr_kdb.c:699
> #7  0xffffffff8100ca98 in trap (frame=0xfffffe00d7567300) at
> /usr/src/sys/amd64/amd64/trap.c:576
> #8  <signal handler called>
> #9  kdb_enter (why=0xffffffff811d5de0 "panic", msg=<optimized out>) at
> /usr/src/sys/kern/subr_kdb.c:486
> #10 0xffffffff80bd00be in vpanic (fmt=<optimized out>, ap=<optimized out>)
> at /usr/src/sys/kern/kern_shutdown.c:902
> #11 0xffffffff80bcfe53 in panic (fmt=0xffffffff81c8c7c8 <cnputs_mtx>
> "\b\214\031\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:839
> #12 0xffffffff8100cee7 in trap_fatal (frame=0xfffffe00d7567600, eva=0) at
> /usr/src/sys/amd64/amd64/trap.c:915
> #13 0xffffffff8100c360 in trap (frame=0xfffffe00d7567600) at
> /usr/src/sys/amd64/amd64/trap.c:212
> #14 <signal handler called>
> #15 _rw_wowned (c=0x2659c92217d5aa52) at
> /usr/src/sys/kern/kern_rwlock.c:270
> #16 0xffffffff80ec23ed in vm_page_busy_acquire (m=0xfffffe00040ff9e8,
> allocflags=16) at /usr/src/sys/vm/vm_page.c:884
> #17 0xffffffff82b4e980 in remap_io_mapping (vma=0xfffff80315148300,
> addr=<optimized out>, pfn=<optimized out>, size=<optimized out>,
> iomap=<optimized out>)
>     at
> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/intel_freebsd.c:193
> #18 0xffffffff82be1c5f in i915_gem_fault (dummy=<optimized out>,
> vmf=<optimized out>)
>     at
> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367
> #19 0xffffffff82cb5ddf in linux_cdev_pager_populate
> (vm_obj=0xfffff80368501420, pidx=<optimized out>, fault_type=<optimized
> out>, max_prot=<optimized out>,
>     first=0xfffffe00d7567868, last=0xfffffe00d7567888) at
> /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:554
> #20 0xffffffff80ea9e8f in vm_pager_populate (object=0x2659c92217d5aa52,
> pidx=18446741874754451944, fault_type=0, max_prot=0 '\000',
> first=<optimized out>, last=<optimized out>)
>     at /usr/src/sys/vm/vm_pager.h:172
> #21 vm_fault_populate (fs=<optimized out>) at
> /usr/src/sys/vm/vm_fault.c:444
> #22 vm_fault_allocate (fs=<optimized out>) at
> /usr/src/sys/vm/vm_fault.c:1028
> #23 vm_fault (map=<optimized out>, vaddr=<optimized out>,
> fault_type=<optimized out>, fault_flags=<optimized out>, m_hold=<optimized
> out>) at /usr/src/sys/vm/vm_fault.c:1338
> #24 0xffffffff80ea98ee in vm_fault_trap (map=0xfffffe00c0f539e8,
> vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0,
> signo=0xfffffe00d7567ac4,
>     ucode=0xfffffe00d7567ac0) at /usr/src/sys/vm/vm_fault.c:585
> #25 0xffffffff8100d0de in trap_pfault (frame=0xfffffe00d7567b00,
> usermode=<optimized out>, signo=<optimized out>, ucode=0xffffffff81d1de80
> <w_locklistdata+160624>)
>     at /usr/src/sys/amd64/amd64/trap.c:817
> #26 0xffffffff8100c72c in trap (frame=0xfffffe00d7567b00) at
> /usr/src/sys/amd64/amd64/trap.c:340
> #27 <signal handler called>
> #28 0x000000080296659a in ?? ()
> Backtrace stopped: Cannot access memory at address 0x7fffffffbf38
> (kgdb) list *0xffffffff82be1c5f
> 0xffffffff82be1c5f is in i915_gem_fault
> (/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367).
> 362             ret = i915_vma_pin_fence(vma);
> 363             if (ret)
> 364                     goto err_unpin;
> 365
> 366             /* Finally, remap it using the new GTT offset */
> 367             ret = remap_io_mapping(area,
> 368                                    area->vm_start + (vma->ggtt_view.partial.offset << PAGE_SHIFT),
> 369                                    (ggtt->gmadr.start + vma->node.start) >> PAGE_SHIFT,
> 370                                    min_t(u64, vma->size, area->vm_end - area->vm_start),
> 371                                    &ggtt->iomap);
> (kgdb) list *0xffffffff82b4e980
> 0xffffffff82b4e980 is in remap_io_mapping
> (/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/intel_freebsd.c:193).
> 188                 pidx++, pa += PAGE_SIZE) {
> 189     retry:
> 190                     m = vm_page_grab(vm_obj, pidx, VM_ALLOC_NOCREAT);
> 191                     if (m == NULL) {
> 192                             m = PHYS_TO_VM_PAGE(pa);
> 193                             if (!vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL))
> 194                                     goto retry;
> 195                             if (vm_page_insert(m, vm_obj, pidx)) {
> 196                                     vm_page_xunbusy(m);
> 197                                     VM_OBJECT_WUNLOCK(vm_obj);
> (kgdb) list *0xffffffff80ec23ed
> 0xffffffff80ec23ed is in vm_page_busy_acquire
> (/usr/src/sys/vm/vm_page.c:884).
> 879                     if (vm_page_tryacquire(m, allocflags))
> 880                             return (true);
> 881                     if ((allocflags & VM_ALLOC_NOWAIT) != 0)
> 882                             return (false);
> 883                     if (obj != NULL)
> 884                             locked = VM_OBJECT_WOWNED(obj);
> 885                     else
> 886                             locked = false;
> 887                     MPASS(locked || vm_page_wired(m));
> 888                     if (_vm_page_busy_sleep(obj, m, m->pindex, "vmpba", allocflags,
>
> It seems the problem occurred when calling vm_page_busy_acquire(m,
> VM_ALLOC_WAITFAIL), where m might be a NULL pointer? I am very new to
> kernel debugging, so I'm not sure where to go from here.
>
> Thanks.
>
> On Mon, Aug 10, 2020 at 12:04, Hans Petter Selasky <[hidden email]> wrote:
>
>> Hi,
>>
>> On 2020-08-10 12:59, Alexandre Levy wrote:
>> > Looking at the code, the error happens during the call to VM_OBJECT_WLOCK
>> > (memory page locking for write?) in intel_freebsd.c (see [1] below).
>> > I'm out for a few days, but I'll try to dig into it more when I'm back
>> > next weekend, although I have no experience with the drm-devel-kmod
>> > codebase. In the meantime, if you have any suggestions for debugging this
>> > further, I'm happy to follow them.
>>
>> The problem is likely that the vm_obj is NULL.
>>
>> I recall that this function is special and, unlike in Linux, can only be
>> called from a certain context. I will need the full backtrace with line
>> numbers in order to debug this.
>>
>> --HPS
>>
>
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"

Re: Kernel crash during video transcoding

Hans Petter Selasky
On 2020-08-16 17:28, Alexandre Levy wrote:
> Now at intel_freebsd.c:193 (frame #17) the driver calls
> vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL). 'm' is the page grabbed from
> vm_obj of the calling frame.

Can you check if "m" is NULL at this point?

--HPS

Re: Kernel crash during video transcoding

Alexandre Levy
"m" is not NULL:

(kgdb) frame 16
#16 0xffffffff80ec23ed in vm_page_busy_acquire (m=0xfffffe00040ff9e8,
allocflags=16) at /usr/src/sys/vm/vm_page.c:884
(kgdb) p *m
$2 = {plinks = {q = {tqe_next = 0x578491b51dd60510, tqe_prev =
0xd78c11bd9dde8518}, s = {ss = {sle_next = 0x578491b51dd60510}}, memguard =
{p = 6306325585301210384,
      v = 15531808720989095192}, uma = {slab = 0x578491b51dd60510, zone =
0xd78c11bd9dde8518}}, listq = {tqe_next = 0xd78c11bd9dde8518, tqe_prev =
0x265bc92017d7aa38},
  object = 0x2659c92217d5aa3a, pindex = 2758957463725517354, phys_addr =
2758957463725517354, md = {pv_list = {tqh_first = 0x2e49c1321fc5a22a,
tqh_last = 0x3e4bd1300fc7b228},
    pv_gen = 265794104, pat_mode = 1046204704}, ref_count = 257405624,
busy_lock = 1054593440, a = {{flags = 4757, queue = 48 '0', act_count = 134
'\206'}, _bits = 2251297429},
  order = 98 'b', pool = 204 '\314', flags = 75 'K', oflags = 105 'i',
psind = -107 '\225', segind = 18 '\022', valid = 48 '0', dirty = 134 '\206'}

I had to recompile drm-devel-kmod with make WITH_DEBUG=yes DEBUG_FLAGS="-g
-O0" because "m" was optimized out. I then started a kgdb session with the
same crash dump as before, loaded the module symbols with add-kld
/boot/modules/i915kms.ko, and now I get a different backtrace for frames
#17 to #28.

Also, the panic doesn't occur when I plug a screen into the HDMI port (which
now works for some reason...). As for the backtrace, frame #17 is now the
following:

#17 0xffffffff82b4e980 in intel_plane_can_remap
(plane_state=0xfffff80315148300)
    at
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/display/intel_display.c:2583

and used to be:

#17 0xffffffff82b4e980 in remap_io_mapping (vma=0xfffff80315148300,
addr=<optimized out>, pfn=<optimized out>, size=<optimized out>,
iomap=<optimized out>)

I don't understand why the backtrace changed even though the crash dump is
the same as before. Any suggestions?

On Sun, Aug 16, 2020 at 18:19, Hans Petter Selasky <[hidden email]> wrote:


Re: Kernel crash during video transcoding

Hans Petter Selasky
On 2020-08-16 22:23, Alexandre Levy wrote:

> (kgdb) p *m
> $2 = {plinks = {q = {tqe_next = 0x578491b51dd60510, tqe_prev =
> 0xd78c11bd9dde8518}, s = {ss = {sle_next = 0x578491b51dd60510}}, memguard =
> {p = 6306325585301210384,
>        v = 15531808720989095192}, uma = {slab = 0x578491b51dd60510, zone =
> 0xd78c11bd9dde8518}}, listq = {tqe_next = 0xd78c11bd9dde8518, tqe_prev =
> 0x265bc92017d7aa38},
>    object = 0x2659c92217d5aa3a, pindex = 2758957463725517354, phys_addr =
> 2758957463725517354, md = {pv_list = {tqh_first = 0x2e49c1321fc5a22a,
> tqh_last = 0x3e4bd1300fc7b228},
>      pv_gen = 265794104, pat_mode = 1046204704}, ref_count = 257405624,
> busy_lock = 1054593440, a = {{flags = 4757, queue = 48 '0', act_count = 134
> '\206'}, _bits = 2251297429},
>    order = 98 'b', pool = 204 '\314', flags = 75 'K', oflags = 105 'i',
> psind = -107 '\225', segind = 18 '\022', valid = 48 '0', dirty = 134 '\206'}

This "m" structure looks freed.

It looks like a use-after-free issue.

Can you enter this in GDB:

set print pretty on

Then dump some more structures you can get hold of?

--HPS

Re: Kernel crash during video transcoding

Alexandre Levy
For reference, the backtrace is below; further down are the structures I
was able to print:

#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:394
#2  0xffffffff8049c26a in db_dump (dummy=<optimized out>,
dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at
/usr/src/sys/ddb/db_command.c:575
#3  0xffffffff8049c02c in db_command (last_cmdp=<optimized out>,
cmd_table=<optimized out>, dopager=1) at /usr/src/sys/ddb/db_command.c:482
#4  0xffffffff8049bd9d in db_command_loop () at
/usr/src/sys/ddb/db_command.c:535
#5  0xffffffff8049f048 in db_trap (type=<optimized out>, code=<optimized
out>) at /usr/src/sys/ddb/db_main.c:270
#6  0xffffffff80c1b374 in kdb_trap (type=3, code=0, tf=<optimized out>) at
/usr/src/sys/kern/subr_kdb.c:699
#7  0xffffffff8100ca98 in trap (frame=0xfffffe00d7567300) at
/usr/src/sys/amd64/amd64/trap.c:576
#8  <signal handler called>
#9  kdb_enter (why=0xffffffff811d5de0 "panic", msg=<optimized out>) at
/usr/src/sys/kern/subr_kdb.c:486
#10 0xffffffff80bd00be in vpanic (fmt=<optimized out>, ap=<optimized out>)
at /usr/src/sys/kern/kern_shutdown.c:902
#11 0xffffffff80bcfe53 in panic (fmt=0xffffffff81c8c7c8 <cnputs_mtx>
"\b\214\031\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:839
#12 0xffffffff8100cee7 in trap_fatal (frame=0xfffffe00d7567600, eva=0) at
/usr/src/sys/amd64/amd64/trap.c:915
#13 0xffffffff8100c360 in trap (frame=0xfffffe00d7567600) at
/usr/src/sys/amd64/amd64/trap.c:212
#14 <signal handler called>
#15 _rw_wowned (c=0x2659c92217d5aa52) at /usr/src/sys/kern/kern_rwlock.c:270
#16 0xffffffff80ec23ed in vm_page_busy_acquire (m=0xfffffe00040ff9e8,
allocflags=16) at /usr/src/sys/vm/vm_page.c:884
#17 0xffffffff82b4e980 in intel_plane_can_remap
(plane_state=0xfffff80315148300)
    at
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/display/intel_display.c:2583
#18 0xffffffff82be1c5f in skl_ddb_get_pipe_allocation_limits (dev_priv=0x0,
cstate=0x1, total_data_rate=18446735292251509792, ddb=0xfffff80368501438,
alloc=0xfffff80315148300,
    num_active=0xfffffe00eb0b6c58) at
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/intel_pm.c:3928
#19 0xffffffff82cb5ddf in ?? () at
/usr/src/sys/compat/linuxkpi/common/include/linux/kref.h:68 from
/boot/modules/i915kms.ko
#20 0xffffffff80ea9e8f in vm_pager_populate (object=0x2659c92217d5aa52,
pidx=18446741874754451944, fault_type=0, max_prot=0 '\000',
first=<optimized out>, last=<optimized out>)
    at /usr/src/sys/vm/vm_pager.h:172
#21 vm_fault_populate (fs=<optimized out>) at /usr/src/sys/vm/vm_fault.c:444
#22 vm_fault_allocate (fs=<optimized out>) at
/usr/src/sys/vm/vm_fault.c:1028
#23 vm_fault (map=<optimized out>, vaddr=<optimized out>,
fault_type=<optimized out>, fault_flags=<optimized out>, m_hold=<optimized
out>) at /usr/src/sys/vm/vm_fault.c:1338
#24 0xffffffff80ea98ee in vm_fault_trap (map=0xfffffe00c0f539e8,
vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0,
signo=0xfffffe00d7567ac4,
    ucode=0xfffffe00d7567ac0) at /usr/src/sys/vm/vm_fault.c:585
#25 0xffffffff8100d0de in trap_pfault (frame=0xfffffe00d7567b00,
usermode=<optimized out>, signo=<optimized out>, ucode=0xffffffff81d1de80
<w_locklistdata+160624>)
    at /usr/src/sys/amd64/amd64/trap.c:817
#26 0xffffffff8100c72c in trap (frame=0xfffffe00d7567b00) at
/usr/src/sys/amd64/amd64/trap.c:340
#27 <signal handler called>
#28 0x000000080296659a in ?? ()

(kgdb) frame 24
(kgdb) p *map
$35 = {
  header = {
    left = 0xfffff802b72c4060,
    right = 0xfffff803681965a0,
    start = 140737488355328,
    end = 4096,
    next_read = 0,
    max_free = 0,
    object = {
      vm_object = 0x0,
      sub_map = 0x0
    },
    offset = 0,
    eflags = 524288,
    protection = 0 '\000',
    max_protection = 0 '\000',
    inheritance = 0 '\000',
    read_ahead = 0 '\000',
    wired_count = 0,
    cred = 0x0,
    wiring_thread = 0x0
  },
  lock = {
    lock_object = {
      lo_name = 0xffffffff81183cec "vm map (user)",
      lo_flags = 36896768,
      lo_data = 0,
      lo_witness = 0xfffff8045f575780
    },
    sx_lock = 1
  },
  system_mtx = {
    lock_object = {
      lo_name = 0xffffffff81136b96 "vm map (system)",
      lo_flags = 21168128,
      lo_data = 0,
      lo_witness = 0xfffff8045f575580
    },
    mtx_lock = 0
  },
  nentries = 172,
  size = 199905280,
  timestamp = 792,
  needs_wakeup = 0 '\000',
  system_map = 0 '\000',
  flags = 0 '\000',
  root = 0xfffff803686b1c00,
  pmap = 0xfffffe00c0f53b08,
  anon_loc = 34366283776,
  busy = 0
}
(kgdb) frame 15
#15 _rw_wowned (c=0x2659c92217d5aa52) at /usr/src/sys/kern/kern_rwlock.c:270
270             return (rw_wowner(rwlock2rw(c)) == curthread);
(kgdb) p/x c
$14 = 0x2659c92217d5aa52
(kgdb) up
#16 0xffffffff80ec23ed in vm_page_busy_acquire (m=0xfffffe00040ff9e8,
allocflags=16) at /usr/src/sys/vm/vm_page.c:884
884                             locked = VM_OBJECT_WOWNED(obj);
(kgdb) p *m
$16 = {
  plinks = {
    q = {
      tqe_next = 0x578491b51dd60510,
      tqe_prev = 0xd78c11bd9dde8518
    },
    s = {
      ss = {
        sle_next = 0x578491b51dd60510
      }
    },
    memguard = {
      p = 6306325585301210384,
      v = 15531808720989095192
    },
    uma = {
      slab = 0x578491b51dd60510,
      zone = 0xd78c11bd9dde8518
    }
  },
  listq = {
    tqe_next = 0xd78c11bd9dde8518,
    tqe_prev = 0x265bc92017d7aa38
  },
  object = 0x2659c92217d5aa3a,
  pindex = 2758957463725517354,
  phys_addr = 2758957463725517354,
  md = {
    pv_list = {
      tqh_first = 0x2e49c1321fc5a22a,
      tqh_last = 0x3e4bd1300fc7b228
    },
    pv_gen = 265794104,
    pat_mode = 1046204704
  },
  ref_count = 257405624,
  busy_lock = 1054593440,
  a = {
    {
      flags = 4757,
      queue = 48 '0',
      act_count = 134 '\206'
    },
    _bits = 2251297429
  },
  order = 98 'b',
  pool = 204 '\314',
  flags = 75 'K',
  oflags = 105 'i',
  psind = -107 '\225',
  segind = 18 '\022',
  valid = 48 '0',
  dirty = 134 '\206'
}
(kgdb) up
#17 0xffffffff82b4e980 in intel_plane_can_remap
(plane_state=0xfffff80315148300)
    at
/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/display/intel_display.c:2583
2583            if (plane->id == PLANE_CURSOR)
(kgdb) p *plane_state
$18 = {
  base = {
    plane = 0x0,
    crtc = 0x300000,
    fb = 0x100000,
    fence = 0x1b,
    crtc_x = 104451,
    crtc_y = 0,
    crtc_w = 734353152,
    crtc_h = 4294965248,
    src_x = 3949985792,
    src_y = 4294966784,
    src_h = 2193719064,
    src_w = 4294967295,
    alpha = 30720,
    pixel_blend_mode = 64271,
    rotation = 4294965250,
    zpos = 0,
    normalized_zpos = 0,
    color_encoding = DRM_COLOR_YCBCR_BT601,
    color_range = DRM_COLOR_YCBCR_LIMITED_RANGE,
    fb_damage_clips = 0x0,
    src = {
      x1 = 0,
      y1 = 0,
      x2 = 353665888,
      y2 = -2045
    },
    dst = {
      x1 = 1750078496,
      y1 = -2045,
      x2 = 0,
      y2 = 0
    },
    visible = false,
    commit = 0xffffffff82cc3370 <gem_record_fences+48>,
    state = 0x0
  },
  view = {
    type = I915_GGTT_VIEW_NORMAL,
    {
      partial = {
        offset = 0,
        size = 0
      },
      rotated = {
        plane = {{
            width = 0,
            height = 0,
            stride = 0,
            offset = 0
          }, {
            width = 0,
            height = 0,
            stride = 0,
            offset = 0
          }}
      },
      remapped = {
        plane = {{
            width = 0,
            height = 0,
            stride = 0,
            offset = 0
          }, {
            width = 0,
            height = 0,
            stride = 0,
            offset = 0
          }},
        unused_mbz = 0
      }
    }
  },
  vma = 0x0,
  flags = 0,
  color_plane = {{
      offset = 0,
      stride = 0,
      x = 0,
      y = 0
    }, {
      offset = 0,
      stride = 0,
      x = 0,
      y = 0
    }},
  ctl = 0,
  color_ctl = 0,
  scaler_id = 0,
  linked_plane = 0xfffff80315148500,
  slave = 353665024,
  ckey = {
    plane_id = 4294965251,
    min_value = 3735929054,
    channel_mask = 3735929054,
    max_value = 3735929054,
    flags = 3735928833
  }
}
(kgdb) p *plane_state->linked_plane
$19 = {
  base = {
    dev = 0xfffff802f50d3910,
    head = {
      next = 0xfffff80315148400,
      prev = 0xdeadc0dedeadc0de
    },
    name = 0xdeadc001deadc0de <error: Cannot access memory at address
0xdeadc001deadc0de>,
    mutex = {
      mutex = {
        base = {
          sx = {
            lock_object = {
              lo_name = 0x28274 <error: Cannot access memory at address
0x28274>,
              lo_flags = 5,
              lo_data = 0,
              lo_witness = 0x60
            },
            sx_lock = 3907697
          }
        },
        condvar = {
          cv_description = 0x0,
          cv_waiters = 50644
        },
        ctx = 0x3336663265336563
      },
      head = {
        next = 0x6433633439633264,
        prev = 0x3131623462353561
      }
    },
    base = {
      id = 912548663,
      type = 825506101,
      properties = 0x61632e3436656c2d,
      refcount = {
        refcount = {
          counter = 761620579
        }
      },
      free_cb = 0xdeadc0dedead004b
    },
    possible_crtcs = 3735929054,
    format_types = 0xdeadc0dedeadc0de,
    format_count = 3735929054,
    format_default = 222,
    modifiers = 0xdeadc0dedeadc0de,
    modifier_count = 3735929054,
    crtc = 0xdeadc0dedeadc0de,
    fb = 0xdeadc0dedeadc0de,
    old_fb = 0xdeadc0dedeadc0de,
    funcs = 0xdeadc0dedeadc0de,
    properties = {
      count = -559038242,
      properties = {0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de,
0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de, 0xffffffff825f20c0 <M_SOLARIS>,
0xdeadc0dedeadc0de <repeats 19 times>},
      values = {16045693110842147038 <repeats 12 times>,
18446744071601856704, 16045693110842147038 <repeats 11 times>}
    },
    type = (DRM_PLANE_TYPE_CURSOR | unknown: 3735929052),
    index = 3735929054,
    helper_private = 0xdeadc0dedeadc0de,
    state = 0xdeadc0dedeadc0de,
    alpha_property = 0xdeadc0dedeadc0de,
    zpos_property = 0xdeadc0dedeadc0de,
    rotation_property = 0xdeadc0dedeadc0de,
    blend_mode_property = 0xdeadc0dedeadc0de,
    color_encoding_property = 0xdeadc0dedeadc0de,
    color_range_property = 0xdeadc0dedeadc0de
  },
  i9xx_plane = (PLANE_C | unknown: 3735929052),
  id = 3735929054,
  pipe = -559038242,
  has_fbc = 222,
  has_ccs = 192,
  frontbuffer_bit = 3735929054,
  cursor = {
    base = 3735929054,
    cntl = 3735929054,
    size = 3735929054
  },
  max_stride = 0xdeadc0dedeadc0de,
  update_plane = 0xdeadc0dedeadc0de,
  update_slave = 0xdeadc0dedeadc0de,
  disable_plane = 0xdeadc0dedeadc0de,
  get_hw_state = 0xdeadc0dedeadc0de,
  check_plane = 0xdeadc0dedeadc0de
}

On Mon, Aug 17, 2020 at 09:03, Hans Petter Selasky <[hidden email]> wrote:


Re: Kernel crash during video transcoding

Hans Petter Selasky
On 2020-08-16 22:23, Alexandre Levy wrote:
> Any suggestions?

Are there any simple steps to reproduce this?

--HPS

Re: Kernel crash during video transcoding

Alexandre Levy
I reinstalled the userland in my jail after recompiling the sources, and
since then the issue no longer occurs. It seems some libraries (maybe libva)
were not properly updated.

In any case, if it happens again, I'll try to generate a test video with
ffmpeg and transcode it with similar parameters. That would be the easiest
way to reproduce the issue.

Thanks for your insights.

On Fri, Aug 21, 2020 at 22:23, Hans Petter Selasky <[hidden email]> wrote:
