FreeBSD 10 and PostgreSQL 9.3 scalability issues

Attilio Rao
[ Please CC me as I'm not subscribed to FreeBSD mailing lists ]

Recently, Bryan Drewery and I have been looking into this issue, in
particular after some people pointed us to DragonFlyBSD / Linux
benchmarks.

DB workloads are interesting mostly because they can expose real
performance problems in kernel-intensive paths (scalability of the
scheduler, VM, VFS, network stack, etc.). More generally, however,
extra attention must be paid to how the test is performed, especially
by avoiding I/O (to increase predictability and avoid latency
fluctuations).
We have run tests similar to those Florian Smeets has been running on
giant-ape1 in the netperf cluster (a Xeon E7 machine with 4 nodes x 10
cores), and I've come to the conclusion that the tests comparing
DragonFlyBSD, Linux and FreeBSD have intrinsic problems.

Essentially, having the PGSQL client and backend on the same machine
makes them share data, which yields much faster results, and the more
cache levels they share, the faster the results will be. When the
client becomes heavily multithreaded, in particular, the data gets
spread so unpredictably that it is difficult to say how much cache
sharing/thrashing is coming into play.
Here is an example that illustrates it well: with the full DB in
memory and writes going to tmpfs (so no real I/O) in a *single
client* configuration, we got around 20% better results when the
client and the backend ran on the same chip (so sharing L2 cache) than
when they ran in 2 different domains. Once you bring in multiple
clients, all touching the same data, the behaviour becomes pretty
unpredictable.
I can also tell you that Florian tried benchmarking on the same
machine in the past and got very unstable numbers when using all 40
available cores, with fluctuations in the range of +/-10% over roughly
10 runs. I think this explains why that was the case.
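
To put a figure like +/-10% on run-to-run instability, one simple option is to report each series' worst deviation from its mean. A minimal sketch, assuming tps values collected from repeated runs of the same configuration; the function name is hypothetical and the sample values are made up, not measurements from giant-ape1.

```python
from statistics import mean


def relative_spread(tps_runs):
    """Return (mean tps, worst deviation from the mean in percent)."""
    if len(tps_runs) < 2:
        raise ValueError("need at least two runs to measure spread")
    avg = mean(tps_runs)
    # Largest absolute distance of any single run from the mean.
    worst = max(abs(x - avg) for x in tps_runs)
    return avg, 100.0 * worst / avg


if __name__ == "__main__":
    # Made-up values for illustration only; these are NOT measured results.
    avg, spread = relative_spread([100.0, 110.0, 90.0, 105.0, 95.0])
    print(f"mean {avg:.1f} tps, +/-{spread:.1f}%")  # mean 100.0 tps, +/-10.0%
```

Reporting the worst deviation rather than a standard deviation matches the "+/-N%" way such fluctuations are usually quoted on the list.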

I'm not going to claim that FreeBSD will be kick-ass on this type of
workload, but I think the results reported so far are biased. A more
realistic setup (which I hope to start exploring soon) would have the
PGSQL clients run on separate machines, so that we benchmark only the
backend behaviour. After all, the PGSQL people recommend this as
well:

"A limitation of pgbench is that it can itself become the bottleneck
when trying to test a large number of client sessions. This can be
alleviated by running pgbench on a different machine from the database
server, although low network latency will be essential. It might even
be useful to run several pgbench instances concurrently, on several
client machines, against the same database server."
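
When pgbench is driven from several client machines, the per-instance results have to be combined. A minimal sketch, assuming pgbench 9.x style summary lines (`tps = N (excluding connections establishing)`) and that all instances ran concurrently over the same time window; the helper names are hypothetical.

```python
import re

# Summary line printed by pgbench 9.x, for example:
#   tps = 1240.123456 (excluding connections establishing)
EXCLUDING_TPS = re.compile(
    r"^tps = ([0-9]+(?:\.[0-9]+)?) \(excluding connections establishing\)",
    re.MULTILINE,
)


def excluding_tps(report):
    """Extract the 'excluding connections establishing' tps from one report."""
    match = EXCLUDING_TPS.search(report)
    if match is None:
        raise ValueError("no 'excluding connections establishing' tps line")
    return float(match.group(1))


def aggregate_tps(reports):
    """Sum tps across pgbench instances that ran concurrently."""
    return sum(excluding_tps(r) for r in reports)
```

Summing is only meaningful if the runs overlapped in time; the "excluding" figure is used so that connection setup cost does not skew the comparison.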

To be honest, I'm a bit worried that in a realistic/physical test
FreeBSD is going to be limited more by NIC / TCP stack bottlenecks
than by real CPU / memory ones (the ones really interesting for further
analysis of kernel scalability), but there is no way to know other
than trying it on capable hardware and seeing where we stand.

A fairer approach might be to put all the clients into a single
domain and the backend into a different one. However, there will still
be some data sharing between them, effectively invalidating the test,
and the clients and the backend would have to compete for the same
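
The domain split described above could be expressed with FreeBSD's cpuset(1) `-l` option. This is only a sketch: it assumes 4 domains of 10 cores with contiguous CPU ids (the real layout should be checked, e.g. via `sysctl kern.sched.topology_spec`), and the data directory, database name and helper names are hypothetical. It only builds argument lists; it does not run anything.

```python
def domain_cpus(domain, cores_per_domain=10):
    """cpuset(1) CPU list for one domain, ASSUMING contiguous CPU ids."""
    first = domain * cores_per_domain
    return f"{first}-{first + cores_per_domain - 1}"


def split_commands(backend_domain, client_domain, clients, threads,
                   datadir, dbname, cores_per_domain=10, duration_s=60):
    """Build argv lists pinning the backend and pgbench to different domains."""
    if backend_domain == client_domain:
        raise ValueError("backend and clients must be in different domains")
    # pgbench of this era requires clients to be a multiple of threads.
    if threads <= 0 or clients % threads != 0:
        raise ValueError("clients must be a positive multiple of threads")
    # Child processes inherit the CPU set of the process that started them,
    # so pinning pg_ctl also pins the postmaster and its backends.
    backend = ["cpuset", "-l", domain_cpus(backend_domain, cores_per_domain),
               "pg_ctl", "start", "-D", datadir]
    client = ["cpuset", "-l", domain_cpus(client_domain, cores_per_domain),
              "pgbench", "-c", str(clients), "-j", str(threads),
              "-T", str(duration_s), dbname]
    return backend, client
```

For example, `split_commands(1, 0, 20, 10, "/tmp/pgdata", "bench")` yields a backend pinned to CPUs 10-19 and `cpuset -l 0-9 pgbench -c 20 -j 10 -T 60 bench` for the clients. As noted above, this still leaves shared data crossing domains, so it reduces but does not remove the bias.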

While looking into this, however, I noticed something interesting:
the EST / cpufreq driver is essentially broken. It does not attach on
the newest Intel microarchitectures (Nehalem, Sandy Bridge, etc.). In
my experience, enabling EST and possibly disabling Turbo Boost makes a
noticeable difference. Without these capabilities being controlled by
the est driver, we can end up with sub-optimal performance on Intel
CPUs (for giant-ape1 this wasn't the case, as everything was already
set up properly, but I don't think we can assume this for every
machine booting FreeBSD). It may be worth spending some time on this
part of the code so that it works properly.
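
One quick way to see what the cpufreq layer exposes is `sysctl -n dev.cpu.0.freq_levels`, which prints space-separated `MHz/mW` pairs (power may be -1 when unknown). The sketch below parses that string and flags a turbo entry using the common ACPI convention of listing turbo as base frequency + 1 MHz; treat this as a heuristic, and the function names as hypothetical.

```python
def parse_freq_levels(text):
    """Parse 'MHz/mW' pairs as printed by `sysctl -n dev.cpu.0.freq_levels`."""
    levels = []
    for token in text.split():
        mhz, _, power = token.partition("/")
        levels.append((int(mhz), int(power)))
    return levels


def turbo_level(levels):
    """Return the turbo frequency if present (base + 1 MHz heuristic), else None."""
    freqs = sorted({mhz for mhz, _ in levels}, reverse=True)
    if len(freqs) >= 2 and freqs[0] == freqs[1] + 1:
        return freqs[0]
    return None
```

A missing `dev.cpu.0.freq_levels` sysctl is one hint that no cpufreq driver attached. When it is present and powerd is not running, setting `dev.cpu.0.freq` to the highest non-turbo level should be one way to keep turbo out of a benchmark run.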


Peace can only be achieved by understanding - A. Einstein