getpid() performance

Julian Grajkowski
Hi,

I am working on a contiguous memory allocator which frequently calls
getpid() in user space, and I have noticed very poor performance of this
function call. I measured the cost of the call using the following code:

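#include <stdint.h>

/*
 * Read the TSC at the start of the timed region. The leading lfence makes
 * earlier instructions complete before the counter is read.
 */
static inline uint64_t rdtsc_start(void)
{
    uint32_t cycles_high;
    uint32_t cycles_low;

    /* rdtscp writes EDX:EAX (counter) and ECX (TSC_AUX), hence the clobbers. */
    asm volatile("lfence\n\t"
                 "rdtscp\n\t"
                 "mov %%edx, %0\n\t"
                 "mov %%eax, %1\n\t"
                 : "=r" (cycles_high), "=r" (cycles_low)
                 : : "%rax", "%rdx", "%rcx");

    return (((uint64_t)cycles_high << 32) | cycles_low);
}


/*
 * Read the TSC at the end of the timed region. rdtscp waits for earlier
 * instructions to execute; the trailing lfence holds back later ones.
 */
static inline uint64_t rdtsc_end(void)
{
    uint32_t cycles_high;
    uint32_t cycles_low;

    asm volatile("rdtscp\n\t"
                 "mov %%edx, %0\n\t"
                 "mov %%eax, %1\n\t"
                 "lfence\n\t"
                 : "=r" (cycles_high), "=r" (cycles_low)
                 : : "%rax", "%rdx", "%rcx");

    return (((uint64_t)cycles_high << 32) | cycles_low);
}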
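
The post does not show the code around the timed region. As a rough cross-check, a minimal sketch that averages the cost over many calls could look like the following. It assumes wall-clock nanoseconds from clock_gettime(CLOCK_MONOTONIC) rather than the TSC helpers above, and the name ns_per_getpid is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std modes */

#include <assert.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): average wall-clock
 * nanoseconds per getpid() call over `iters` back-to-back calls.
 */
static double
ns_per_getpid(unsigned long iters)
{
    struct timespec t0, t1;
    volatile pid_t sink = 0;    /* volatile so the stores are not optimized out */

    assert(iters > 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iters; i++)
        sink = getpid();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    /* Convert the timespec difference to nanoseconds, then average. */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Averaging over something like 100000 iterations smooths out the noise of a single reading; multiplying the result by the clock rate in GHz gives an approximate cycle count.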

This way I measured ~320 cycles used for getpid() on FreeBSD 12.1. For
comparison, on Linux (CentOS 7) this call uses ~10 cycles. I am aware that
this should not be compared directly, as these are different systems, but
such a big difference in performance is an issue for me, as getpid() is
called very often in my code.

Is such poor performance of getpid() a known problem, and is it possible
that this might be improved in future releases?

Measurements were done on the same machine with the following setup:

CPU: Intel(R) Atom(TM) CPU C3958 @ 2.00GHz (2000.06-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs

8GB RAM (2x4GB):
        Type: DDR4
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 2400 MT/s

Thank you very much in advance for any help.

Kind regards,
Julian
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-drivers
To unsubscribe, send any mail to "[hidden email]"
Re: getpid() performance

Warner Losh
On Wed, Sep 16, 2020 at 1:15 AM Julian Grajkowski <[hidden email]> wrote:

> This way I measured ~320 cycles used for getpid() on FreeBSD 12.1. For
> comparison, on Linux (CentOS 7) this call uses ~10 cycles. I am aware that
> this should not be compared directly, as these are different systems, but
> such a big difference in performance is an issue for me, as getpid() is
> called very often in my code.
>
> Is such poor performance of getpid() a known problem, and is it possible
> that this might be improved in future releases?
>

glibc optimizes the getpid() system call: it makes the call only once and
returns a cached value afterwards. That is in line with your 10 cycles;
there's no way you can save/restore state in 10 cycles, let alone dispatch
a system call. FreeBSD doesn't cache the value.
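
The caching idea described above can be sketched in a few lines. This is only an illustration of the technique, not glibc's actual code; the names cached_getpid and cached_getpid_reset are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* 0 means "not cached yet"; getpid() never returns 0 to a user process. */
static pid_t cached_pid;

/* Hypothetical wrapper: one system call on first use, a plain load afterwards. */
static pid_t
cached_getpid(void)
{
    if (cached_pid == 0) {
        cached_pid = getpid();
        assert(cached_pid > 0);
    }
    return cached_pid;
}

/*
 * Must be called in the child right after fork(); otherwise the child keeps
 * returning the parent's PID from the inherited cache.
 */
static void
cached_getpid_reset(void)
{
    cached_pid = 0;
}
```

The weak point of any such cache is fork(): the child inherits the cached value, so something has to invalidate it there.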

Warner


Re: getpid() performance

John-Mark Gurney-2
Warner Losh wrote this message on Wed, Sep 16, 2020 at 01:24 -0600:

> On Wed, Sep 16, 2020 at 1:15 AM Julian Grajkowski <[hidden email]> wrote:
>
> glibc optimizes getpid() system call so it only calls it once and returns a
> cached value (which is in line with 10 cycles, there's no way you can
> save/restore state in 10 cycles, let alone do a dispatch). FreeBSD doesn't.

If you really need to detect whether your process has forked (I assume
that is why you're calling getpid() so frequently), you can mmap a page and
use minherit() with INHERIT_ZERO so that all the data in that page is zeroed
on fork.  You can then change your getpid() check to something like:

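/* Points at a page marked INHERIT_ZERO; it reads as zero in a forked child. */
pid_t *page_with_inherit_zero_set;

pid_t
my_getpid(void)
{
        /* First call: map the page and mark it with minherit(). */
        if (page_with_inherit_zero_set == NULL)
                allocate_page_and_set_inherit_zero();

        /* Zero means not cached yet, or wiped by fork: ask the kernel once. */
        if (*page_with_inherit_zero_set == 0)
                *page_with_inherit_zero_set = getpid();

        return *page_with_inherit_zero_set;
}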

and you should see a similar improvement.

This might also allow you to move this logic to a better place in
your code.
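
The helper allocate_page_and_set_inherit_zero() is left undefined above. A minimal sketch of one way to fill it in follows, under these assumptions: minherit() with INHERIT_ZERO where the headers provide it, madvise() with MADV_WIPEONFORK as a stand-in on Linux, and a plain getpid() fallback when neither works. The flag wipe_setup_failed is hypothetical, and the code is not thread-safe.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANON and madvise() on glibc in strict modes */

#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

static pid_t *page_with_inherit_zero_set;
static int wipe_setup_failed;   /* hypothetical: set once if setup cannot work */

static void
allocate_page_and_set_inherit_zero(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0);

    size_t len = (size_t)pagesize;
    int rc = -1;    /* stays -1 if no zero-on-fork facility is compiled in */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE, -1, 0);

    if (p == MAP_FAILED) {
        wipe_setup_failed = 1;
        return;
    }
#if defined(INHERIT_ZERO)
    rc = minherit(p, len, INHERIT_ZERO);        /* page is zeroed in the child */
#elif defined(MADV_WIPEONFORK)
    rc = madvise(p, len, MADV_WIPEONFORK);      /* assumed Linux counterpart */
#endif
    if (rc != 0) {
        munmap(p, len);
        wipe_setup_failed = 1;
        return;
    }
    page_with_inherit_zero_set = p;
}

static pid_t
my_getpid(void)
{
    if (page_with_inherit_zero_set == NULL && !wipe_setup_failed)
        allocate_page_and_set_inherit_zero();

    /* No wipe-on-fork page available: always ask the kernel. */
    if (page_with_inherit_zero_set == NULL)
        return getpid();

    /* Fresh mapping or wiped by fork: refill the cache once. */
    if (*page_with_inherit_zero_set == 0)
        *page_with_inherit_zero_set = getpid();

    return *page_with_inherit_zero_set;
}
```

Falling back to getpid() keeps the result correct even when the page cannot be set up; only the speedup is lost.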

--
  John-Mark Gurney Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."