PostgreSQL Performance Optimization

Recently, I’ve been dealing with databases at work which have millions if not BILLIONS of records.  So as you can imagine, having Postgres running smoothly and as quickly as possible, is of utmost importance.  So, as a guide, and compiled from a number of sources. Obviously faster, better, bigger hardware will make the database faster, but there are often other steps you can take to get PostgreSQL working a bit smarter, and a bit harder. The first place to start with Postgres optimization is the Postgres configuration. The list below provides a guide (use at your own discretion) of some of the primary settings relating to resource use. Even small tweaks can have a big impact on server performance.

max_connections. This option sets the maximum number of database back end processes to have at any one time. Use this feature to ensure that you do not launch so many back ends that you begin swapping to disk and kill the performance of all the children. Depending on your application it may be better to deny the connection entirely rather than degrade the performance of all of the other children.

shared_buffers. Editing this option is the simplest way to improve the performance of your database server. The default is pretty low for most modern hardware. Shared buffers defines a block of memory that PostgreSQL will use to hold requests that are awaiting attention from the kernel buffer and CPU. The default value is quite low for any real world workload and need to be beefed up. However, unlike databases like Oracle, more is not always better. There is a threshold above which increasing this value can hurt performance.

PLEASE NOTE. PostgreSQL counts a lot on the OS to cache data files and hence does not bother with duplicating its file caching effort. The shared buffers parameter assumes that OS is going to cache a lot of files and hence it is generally very low compared with system RAM. Even for a dataset in excess of 20GB, a setting of 128MB may be too much, if you have only 1GB RAM and an aggressive-at-caching OS like Linux.

Note that on Windows (and on PostgreSQL versions before 8.1), large values for shared_buffers aren’t as effective, and you may find better results keeping it relatively low (at most around 50,000, possibly less) and using the OS cache more instead.

It’s likely you will have to increase the amount of memory your operating system allows you to allocate at once to set the value for shared_buffers this high. If you set it above what’s supported, you’ll get a message like this:

IpcMemoryCreate: shmget(key=5432001, size=415776768, 03600) failed: Invalid argument

This error usually means that PostgreSQL's request for a shared memory
segment exceeded your kernel's SHMMAX parameter. You can either
reduce the request size or reconfigure the kernel with larger SHMMAX.
To reduce the request size (currently 415776768 bytes), reduce
PostgreSQL's shared_buffers parameter (currently 50000) and/or
its max_connections parameter (currently 12).

effective_cache_size. This value tells PostgreSQL’s optimizer how much memory PostgreSQL has available for caching data and helps in determing whether or not it use an index or not. The larger the value increases the likely hood of using an index. effective_cache_size should be set to an estimate of how much memory is available for disk caching by the operating system, after taking into account what’s used by the OS itself, dedicated PostgreSQL memory, and other applications. This is a guideline for how memory you expect to be available in the OS buffer cache, not an allocation! This value is used only by the PostgreSQL query planner to figure out whether plans it’s considering would be expected to fit in RAM or not. If it’s set too low, indexes may not be used for executing queries the way you’d expect.

Setting effective_cache_size to 1/2 of total memory would be a normal conservative setting, and 3/4 of memory is a more aggressive but still reasonable amount. You might find a better estimate by looking at your operating system’s statistics. On UNIX-like systems, add the free+cached numbers from free or top to get an estimate. On Windows see the “System Cache” size in the Windows Task Manager’s Performance tab. Changing this setting does not require restarting the database (HUP is enough). .

work_mem. This option is used to control the amount of memory using in sort operations and hash tables. While you may need to increase the amount of memory if you do a ton of sorting in your application, care needs to be taken. This isn’t a system wide parameter, but a per operation one. So if a complex query has several sort operations in it it will use multiple work_mem units of memory. Not to mention that multiple backends could be doing this at once. This query can often lead your database server to swap if the value is too large. This option was previously called sort_mem in older versions of PostgreSQL.

max_fsm_pages. This option helps to control the free space map. When something is deleted from a table it isn’t removed from the disk immediately, it is simply marked as “free” in the free space map. The space can then be reused for any new INSERTs that you do on the table. If your setup has a high rate of DELETEs and INSERTs it may be necessary increase this value to avoid table bloat.  Sets the maximum number of disk pages for which free space will be tracked in the shared free-space map. Properly adjust upward to make vacuum runs faster and eliminate/reduce the need to “vacuum full” or “reindex”. Should be slightly more than the total number of data pages which will be touched by updates and deletes between vacuums. Requires little memory (6 bytes per slot), so be generous adjusting its size. When running vacuum with “verbose” option, DB engine advises you about the proper size.

fsync. This option determines if all your WAL pages are fsync()’ed to disk before a transactions is committed. Having this on is safer, but can reduce write performance. If fsync is not enabled there is the chance of unrecoverable data corruption. Turn this off at your own risk.

commit_delay = and commit_siblings. These options are used in concert to help improve performance by writing out multiple transactions that are committing at once. If there are commit_siblings number of backends active at the instant your transaction is committing then the server waiting commit_delay microseconds to try and commit multiple transactions at once.

random_page_cost. Sets estimated cost of non-sequentially fetching. Lower it to influence the optimizer to perform index scans over table scans.

Note that many of these options consume shared memory and it will probably be necessary to increase the amount of shared memory allowed on your system to get the most out of these options.

If you are after a more comprehensive list of Postgres’ Tuning and Performance, the PostgreSQL documentation has a great wiki on the subject.

The other place that often gets overlooked for performance enhancement, is the actual database queries themselves.  I must admit ignorance myself to the Postgres inclusing of the ‘EXPLAIN ANALYSE’ keywords preceding any SQL statement which returns a very comprehensive trace of the query through the database entry, including specific timings, index use etc, which can be a big eye opener to tables, sorts or indexes which maybe being used incorrectly, or just being slow.  Here is an example of explain analyse on a SQL statement on a very large database:

EXPLAIN ANALYSE SELECT items.etag, subscriptions.subscription_data
FROM items, subscriptions WHERE items.item_id = subscriptions.item_id;

returns the entire query plan, like:

"Hash Join  (cost=1.29..22.38 rows=50 width=64) (actual time=0.055..0.084 rows=21 loops=1)"
"  Hash Cond: (subscriptions.item_id = items.item_id)"
"  ->  Seq Scan on subscriptions  (cost=0.00..17.70 rows=770 width=36) (actual time=0.010..0.012 rows=21 loops=1)"
"  ->  Hash  (cost=1.13..1.13 rows=13 width=36) (actual time=0.027..0.027 rows=13 loops=1)"
"        ->  Seq Scan on items  (cost=0.00..1.13 rows=13 width=36) (actual time=0.008..0.014 rows=13 loops=1)"
"Total runtime: 0.154 ms"