Optimize execution of processes by CPU core

freebsd-hackers mailing list
Hi all,

I am trying to optimize the execution of a CPU-intensive workload where I am running multiple instances of a program. The moment the program ends (expected behavior), the calling shell script verifies the results and, if they are good, reruns the program. The machines I am running this on have 8 cores, but ps reports that some of the processes frequently run on the same CPU, so I suspect I am not getting optimal performance. If possible, and if most efficient, I would like to run each process on its own CPU core.

Are there any best practices for running something like this? I understand cpuset can do something along these lines, but I do not understand the tooling (the man page speaks of a "CPU set"?). Would I do something like "cpuset -c -l 0 program arg1 arg2 arg3" in one script, then "cpuset -c -l 1 program arg1 arg2 arg3" in the next, and so on up to 7?

Obviously it would be best to rewrite the program to handle multiple threads in an optimized way, but that would take more time than the optimization would likely save.

Thanks,

---
Farhan Khan
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

Re: Optimize execution of processes by CPU core

Slawa Olhovchenkov
On Wed, Feb 20, 2019 at 05:35:21PM +0000, Farhan Khan via freebsd-hackers wrote:

> Hi all,
>
> I am trying to optimize the execution of a CPU-intensive workload where I am running multiple instances of a program. The moment the program ends (expected behavior), the calling shell script verifies the results and if it's good it reruns the program. The machines I am running this on have 8 cores, but ps reports that some of the processes frequently run on the same CPU, so I suspect I am not getting optimized performance. If possible, and if most efficient, I would like to run each process on its own CPU core.
>
> Are there any best practices on how to run something like this? I understand cpuset can perform some functionality around this, but I do not understand the tooling (The man page speaks of a CPU set?) Would I do something like "cpuset -c -l 0 program arg1 arg2 arg3" in one script, and then "cpuset -c -l 1 program arg1 arg2 arg3" in the next up to 7?


Just "cpuset -l 0 program arg1 arg2 arg3", and so on (without "-c").
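If the launching script numbers its instances, the per-instance command can be derived from that number. The C sketch below assumes instances are numbered from 0 and that the goal is one instance per core, spread round-robin; format_pinned_cmd is a hypothetical helper, not part of FreeBSD, and only cpuset's "-l" option from the reply above is used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the command line for one instance,
 * pinned with cpuset(1). Instances are spread round-robin over ncpu
 * cores, so with 8 cores instance 7 gets "cpuset -l 7 ..." and
 * instance 8 wraps back to core 0.
 * Returns the length written, or -1 if buf is too small.
 */
int format_pinned_cmd(char *buf, size_t len, int instance, int ncpu,
                      const char *cmd)
{
    assert(instance >= 0 && ncpu > 0 && cmd != NULL && strlen(cmd) > 0);
    int n = snprintf(buf, len, "cpuset -l %d %s", instance % ncpu, cmd);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With 8 cores, instances 0 through 7 each get their own core; any extra instances share cores again, which is unavoidable once there are more instances than CPUs.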


> Obviously it would be best to re-write the program to handle multiple threads in an optimized way, but that would take more time than the optimization would likely save.

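For every thread (FreeBSD):

  /* At the top of the file: cpuset_t, the CPU_* macros, *_np calls. */
  #include <sys/param.h>
  #include <sys/cpuset.h>
  #include <pthread.h>
  #include <pthread_np.h>
  #include <stdio.h>

  /* In the loop that starts the workers; cpu, tid[], n, worker_thread
     and args come from the surrounding code. */
  char name[128];
  cpuset_t mask;
  pthread_attr_t attr;

  pthread_attr_init(&attr);
  CPU_ZERO(&mask);
  CPU_SET(cpu, &mask);                  /* allow this one CPU only */
  pthread_attr_setaffinity_np(&attr, sizeof(mask), &mask);
  pthread_create(&tid[n], &attr, worker_thread, (void *)args);
  pthread_attr_destroy(&attr);          /* attr is not needed any more */
  snprintf(name, sizeof(name), "worker CPU#%d", cpu);
  pthread_set_name_np(tid[n], name);    /* name shown by procstat -t */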
 

Re: Optimize execution of processes by CPU core

Eugene Grosbein
In reply to this post by freebsd-hackers mailing list
21.02.2019 0:35, Farhan Khan via freebsd-hackers wrote:
> Hi all,
>
> I am trying to optimize the execution of a CPU-intensive workload where I am running multiple instances of a program.
> The moment the program ends (expected behavior), the calling shell script verifies the results and if its good it reruns the program.
> The machines I am running this on have 8 cores, but ps reports that some of the processes frequently run on the same CPU,
> so I suspect I am not getting optimized performance.

The system scheduler moves userland processes from one CPU core to another often enough
that each CPU gets an even load, and you cannot see that with the naked eye. But you can easily verify
whether the load is even by checking sysctl kern.cp_times, which shows five monotonically increasing
counters for each CPU core. For example, on a dual-core system:

$ sysctl kern.cp_times
kern.cp_times: 14789486 132229 14016113 327160 949428773 14374865 139326 12056998 2941012 949179638
$ sysctl kern.clockrate
kern.clockrate: { hz = 1000, tick = 1000, profhz = 8126, stathz = 127 }

There are "stathz" ticks per second, and on each tick exactly one of the five counters of each core is incremented by one:
user, nice, system, interrupt, or idle. That is, every 5th counter (idle) is incremented
if the corresponding CPU core was idle during that tick.
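To make the counter layout concrete, here is a small C sketch that parses the text value of kern.cp_times and picks out one counter of one core. It assumes the five-counters-per-core order described above; parse_cp_times, cp_counter and the CT_* names are hypothetical. It parses text so it runs anywhere; a native FreeBSD program could presumably read the counters directly with sysctlbyname(3) instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-core counter order in kern.cp_times, as described above. */
enum { CT_USER, CT_NICE, CT_SYS, CT_INTR, CT_IDLE, CT_NSTATES };

/*
 * Parse a space-separated kern.cp_times value into out[] (at most max
 * entries). Returns the number of counters read; divide by CT_NSTATES
 * to get the number of cores.
 */
size_t parse_cp_times(const char *s, unsigned long long *out, size_t max)
{
    size_t n = 0;
    char *end;

    assert(s != NULL && out != NULL);
    while (n < max) {
        unsigned long long v = strtoull(s, &end, 10); /* skips spaces */
        if (end == s)
            break;                                    /* no more numbers */
        out[n++] = v;
        s = end;
    }
    return n;
}

/* Counter `state` (CT_USER..CT_IDLE) of core `core`. */
unsigned long long cp_counter(const unsigned long long *c, int core, int state)
{
    return c[(size_t)core * CT_NSTATES + (size_t)state];
}
```

Fed the dual-core sample above, this yields 10 counters (2 cores), with idle counters 949428773 for core 0 and 949179638 for core 1.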

You can save the output of sysctl kern.cp_times, run your test, stop it, and save the output again.
Then compare the differences of every 5th counter; they should be approximately equal.
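The before/after comparison can be sketched in C as follows. This assumes both snapshots were parsed into arrays of five counters per core (idle last) and that the counters did not wrap between samples; idle_deltas is a hypothetical name, and reporting the max-min spread is just one simple way to judge "approximately equal".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given two kern.cp_times snapshots (five counters per core, idle
 * last), store each core's idle-tick delta in idle[] and return the
 * spread (max - min) between cores. A spread that is small relative
 * to the deltas means the cores were about equally busy.
 */
unsigned long long idle_deltas(const unsigned long long *before,
                               const unsigned long long *after,
                               int ncores, unsigned long long *idle)
{
    unsigned long long lo = 0, hi = 0;

    assert(ncores > 0);
    for (int core = 0; core < ncores; core++) {
        size_t i = (size_t)core * 5 + 4;   /* every 5th counter: idle */
        idle[core] = after[i] - before[i];
        if (core == 0 || idle[core] < lo)
            lo = idle[core];
        if (core == 0 || idle[core] > hi)
            hi = idle[core];
    }
    return hi - lo;
}
```

For example, with made-up idle deltas of 50 and 60 ticks on two cores, the function returns a spread of 10.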

You can even draw graphs if you periodically sample the sysctl, compute diffs against the previous samples,
divide the diffs by the period length in seconds, and then divide again by the "stathz" value.
Multiply by 100 to get the idle time of a single CPU core as a percentage for the period.
Repeat for each core.
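The arithmetic in that recipe fits in one function. This is a minimal C sketch; idle_percent is a hypothetical name, and stathz must be taken from kern.clockrate on the machine being measured (127 in the sample above).

```c
#include <assert.h>

/*
 * Idle percentage of one core over a sampling period, following the
 * recipe above: idle-tick delta / period (s) / stathz * 100.
 */
double idle_percent(unsigned long long idle_delta, double seconds, int stathz)
{
    assert(seconds > 0.0 && stathz > 0);
    return (double)idle_delta / seconds / stathz * 100.0;
}
```

With stathz = 127, a core that gained 635 idle ticks over a 10-second period was idle 635 / 10 / 127 * 100 = 50% of the time; the busy share is simply 100 minus that.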

I use net-mgmt/mrtg to draw such per-CPU graphs for my servers; it works just fine.


Re: Optimize execution of processes by CPU core

Pieter de Goeje
On 20-2-2019 at 19:19, Eugene Grosbein wrote:

> You can even draw graphs if you periodically sample the sysctl, compute diffs against the previous samples,
> divide the diffs by the period length in seconds, and then divide again by the "stathz" value.
> Multiply by 100 to get the idle time of a single CPU core as a percentage for the period.
> Repeat for each core.
>
> I use net-mgmt/mrtg to draw such per-CPU graphs for my servers; it works just fine.

`top -P` does the same thing. It displays a
user/nice/system/interrupt/idle line for each CPU.

- Pieter

Re: Optimize execution of processes by CPU core

Eugene Grosbein
22.02.2019 18:19, Pieter de Goeje wrote:

> `top -P` does the same thing. It displays a user/nice/system/interrupt/idle line for each CPU.

Sure, but it does not show accumulated monotonic numbers needed to draw reliable graphs.