The next wave in BI

Topic: TPC-H fun with Greenplum

Date: 30/01/2010

By: Luke Lonergan

Subject: Re: Re: Some hints


A few things:

- No parameter changes should be needed in the postgresql.conf files for Greenplum, especially not shared_buffers. This holds in all cases, including the original poster's setup and the TPC-H test. enable_seqscan is not turned off by default, so seeing it set to 0 indicates that the parameters have been changed.

- The number of segments should generally be set to the number of cores on the machine. GP is all about parallelism, and the number of segments controls how much of the CPU is used to answer queries. If you increase from 2 to 8 segments, you should see something like a 4x speedup of your queries, depending on how fast your disk is. Note that because the parallelism can outrun your disk, you should (must) use the XFS filesystem to avoid the fragmentation problems that ext2/3/4 have.

- In the general case, we find that GP SNE (Single Node Edition) is faster than Postgres 8.x because of faster core elements such as sort and hash aggregation, even when not using much parallelism. The key advantage of GP is still parallelism.

- GP implements the full SQL:2003 analytical query support, with window functions, CUBE, etc. This is very useful and important for analytical workloads.
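
To put rough numbers on the segment advice above, here is a minimal Python sketch. It is not a Greenplum tool: the function names and the disk_limit parameter are made up for illustration, and the model simply assumes speedup scales linearly with segment count until the disk becomes the bottleneck.

```python
import os


def suggested_segments():
    """Suggest one segment per CPU core, following the rule of thumb above.

    os.cpu_count() reports logical cores and may return None, so fall back to 1.
    """
    return os.cpu_count() or 1


def estimated_speedup(old_segments, new_segments, disk_limit=None):
    """Estimate query speedup when going from old_segments to new_segments.

    Assumes ideal linear scaling with segment count. disk_limit is a
    hypothetical cap (for example 2.5 means the disk can only feed 2.5x
    the original throughput); None means the disk is never the bottleneck.
    """
    if old_segments <= 0 or new_segments <= 0:
        raise ValueError("segment counts must be positive")
    ideal = new_segments / old_segments
    if disk_limit is not None:
        return min(ideal, disk_limit)
    return ideal


if __name__ == "__main__":
    print("suggested segments:", suggested_segments())
    print("2 -> 8 segments, fast disk:", estimated_speedup(2, 8))
    print("2 -> 8 segments, disk capped at 2.5x:", estimated_speedup(2, 8, disk_limit=2.5))
```

With a disk that keeps up, going from 2 to 8 segments gives the 4x figure; once the disk cap is lower than that, the cap is what you actually get, which is why the filesystem choice matters.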

So, in short, I'd say you should try again in your case. This time, don't change the parameters and set the number of segments equal to the number of CPU cores, and GP will be faster on every query.

- Luke