6 nice things not know enough about PostgreSQL

by regilero

30 10 2008
6-nice-things-not-know-enough-about-postgresql

With the new PostgreSQL server versions in place (8.2 and 8.3) and in a more general way with the 8.x series some nice fonctionnalities have benn added. Let’s have a short look at som interesting ones:

1) WITH FILLFACTOR=50 in CREATE TABLE instructions (since 8.2):
FILLFACTOR is 100% by default and is a good default setting for tables where the basic usage is INSERTS (and select). But when you know that you’ll make a lot of UPDATEs on your rows you should decrease this factor. This way some space on the table will be reserved near your inserted rows. This space will then be used as a work zone when you’ll make an UPDATE on the row. And the magic effect is that this work zone won’t be at the end of the table but near your row, in the same page in memory. see postgreSQL documentation page for details.

2) RETURNING on INSERT INTO to get your INSERTED Id (since 8.2):
The classical way to get you ‘last insert Id’ in PostgreSQl as always been using currval(SEQUENCE)
This is right and secure as PRIMARY KEYS ar usually defined as SEQUENCEs with DEFAULT nextval(SEQUENCE). And currvall render the last value set by nextval in the current session (others concurrent sessions cannot interfere with it). But that’s not something easy to understand for newbies and very bad example with max(id) can always be found googling around. Now you can add a RETURNING MyId code on your INSERT query and the result of your insert won’t be the row OID anymore but your Id (or anything else if you want). Consult postgreSQL documentation page for details.

3) TOAST FIELDS:
TOAST means ‘The Oversized-Attribute Storage Technique‘. You can set up to 1Gb in one field of your row. This column won’t be saved in the same physical file as the others. Another file will be created to store such big fields. postgreSQL documentation page is still the best reference.
If you wonder about the size of your tables and the physical files on your filesystem you should not. Your tables are always split in files of 2Gb. And Toast values are stored on their own files.

4) TABLE INHERITANCE:
You can define a table B as child of table A. Request on table A will then render rows from A and B tables.
With ONLY keyword you can limit requests on A with A rows. A could have several tables (B, C, D, etc). Indexes are done tables by table, and are by this way shorter. This is quite powerfull but you’ll have some problems with contraints. UNIQUE constraints for example are done for each table. You cannot ensure A+B+C+D rows will not share the same value for this ‘UNIQUE’ constraint. Setting Referential intergrity from one of this table to a Z table is easy (but should be done for each table). But setting the reverse relation from Z to A+B+C+D isn’t possible. You should really look postgreSQL documentation page, as always.

5) TABLE PARTITIONNING:
One of the most powerfull thing you can do with INHERITANCE is table PARTITIONNING. Using TABLESPACEs you can define several different physical storage locations for your databases. TABLESPACES can easily be used for a database, a table, or even for an index (or the WAL sync log). This is fine. You can use several storage devices with different characteristics, each adapted to your differents needs (capacity, speed, sync/async, etc). But this combined with INHERITANCE becomes even more powerfull:
Define table A as an empty table.
Define table B and C as child tables of A, and use different tablespaces for B and C. You then have a virtual A table with his conent spread on diferent storage devices (or not, you could use the TABLESPACE on the same storage but you’ll lose most of the power of the ‘thing’). Your benefits? smaller indexes, on different devices, which can run in parallel, some problems with constraints as with point 4), but this is not a problem for all tables, and for a huge table this TABLESPACE splitting could be a coll thing to study. Have a look at postgreSQL documentation page. One last point, you’ll have to defined how the rows are splitted with the different tables (ranges, or domains, or anything else), you’ll maybe have to check RULES as well, even with simple INHERITANCE , beacuse INSERT for example should be done on the child table, and INSERT on the main TABLE should be redirected elsewhere.

6) NOTIFY/LISTEN:
PostgreSQL has a builtin fonctionnality for Observer/observable Design Pattern. You can NOTIFY something, as an SQL command and at the end of your transaction (or directly if you’re not in a transaction) others SQL sessions which have registered this notification with LISTEN will get your notification (the doc). Usefull with server processes (while true processes), a cli process in PHp for example with builtin pg lib but not with PDO actually. Here is as well a Java example and examples in python, the demo2a/b files.



Database performance in Web applications

by Stéphane

29 10 2008

It’s more efficient to connect a Web application with an Unix Domain
Socket than TCP/IP one (reduced overhead) so I’ll explain the required
configuration with the following pairs:
1 - TurboGears/SQLAlchemy
2 - Django/PostgreSQL
3 - Django/MySQL

1 - TurboGears/SA

SQLObject is dead, isn’t it? So With SQLalchemy, the syntax is:

sqlalchemy.dburi="postgres:///dbname?user=mydbuser&password=XXXXXX" ([1])
http://docs.turbogears.org/1.0/DatabasePostgres

2 - Django/PostgreSQL

You just need to define DATABASE_ENGINE = ‘postgresql_psycopg2′ and DATABASE_NAME. Leave DATABASE_HOST setting empty to use UDS.

3 - Django/MySQL

Create a database in UTF8, either with default-character-set = utf8
under [mysqld] section in the my.cnf file or with an explicit ‘create
database bla charset=utf8;’

In settings.py:


DATABASE_HOST = '/var/run/mysqld/mysqld.sock'
DATABASE_OPTIONS = {
'read_default_file': '/etc/mysql/my.cnf',
'init_command': 'SET storage_engine=INNODB'
}

A - Note about PostgreSQL

When the user isn’t the same one who runs the process, you must edit the PostgreSQL configuration (/etc/postgresql/8.3/main/pg_hba.conf):
# "local" is for Unix domain socket connections only
local   user     database      md5
local   all      all           ident sameuser

You must create an user with a encrypted password (encrypted by default).

$ CREATEUSER username
$ psql
postgres=# ALTER USER username WITH ENCRYPTED PASSWORD 'my_password';

If you want to be sure, remove the lines with ‘host’ to deny nonlocal connections.



Saving power with Linux and reducing the boot time

by Stéphane

12 06 2008

There are a great web site and some tools about this subject (thanks to Intel and their great developers)
http://www.lesswatts.org

Services

The first step is to remove some useless services or rarely used, I chosen the following ones :

apt-get remove bluz-utils hplip* scim* tracker*

update-rc.d -f pcmcia remove, for mysql, apache2, postgresql-8.3 (it’s easy to launch when necessary)

You can really check the impact of your changes with bootchart and view the result with eog /var/log/bootchart/*.png (my laptop starts in 27 seconds).

In gnome-session-properties, I also unchecked Applet Tracker, Evolution Alarm Notifier, Bluetooth Manager (don’t forget to save in the third tab).

Settings

You can measure the number of wakeups per second with the wonderful PowerTop ($ sudo powertop), on my computer the main guilty was the proprietary nvidia driver (around 60 fps like the refresh rate of my screen), I added the following line in my xorg.conf to resolve that issue (~ 2 wps after) :

Option         “OnDemandVBlankInterrupts” “true”

I also blacklisted some modules in /etc/modprobe.d/blacklist

blacklist pcmcia
blacklist yenta_socket
blacklist rsrc_nonstatic

With no Wifi and no USB mouse connected, with my GNOME Desktop, a gnome-terminal and Emacs, I have only 20 wakeups per second (not bad :).

I wrote a little script to test some more aggressive settings if they works fine I will add them to sysctl.conf:

# By setting this to ‘1′, under light load scenarios, the process
# load is distributed such that all the cores in a processor package
# are busy before distributing the process load to other processor
# packages.
echo 1 > /sys/devices/system/cpu/sched_mc_power_saving

# From 500 by default
echo 1500 > /proc/sys/vm/dirty_writeback_centisecs

# SATA
echo min_power > /sys/class/scsi_host/host0/link_power_management_policy

# Sound
echo 1 > /sys/module/snd_hda_intel/parameters/power_save

I don’t use the laptop_mode or hdparm because I don’t want to stress my hard drive with too many spin up and down. It’s not necessary to insert the ‘noatime’ option in fstab with Ubuntu Hardy Heron because the new ‘relatime’ option is already activated.



Reduce open calls on Ubuntu

by Stéphane

13 05 2008

You’ve installed a fresh Ubuntu and you’re not English. Just try:

$ strace gnome-terminal

and you’ll see this long list of calls only to open gtk20.mo (12 calls on Hardy Heron):

open("/usr/share/locale/fr_FR.UTF-8/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr_FR.utf8/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr_FR/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr.UTF-8/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr.utf8/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/fr/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr_FR.UTF-8/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr_FR.utf8/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr_FR/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr.UTF-8/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr.utf8/LC_MESSAGES/gtk20.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/fr/LC_MESSAGES/gtk20.mo", O_RDONLY) = 3

There are many solutions to workaround this performance problem, either you use my little script reduce-mo-open-calls.sh which detect your current locale and create only some symlinks, or you can try someting more adventurous (I don’t know the result after a upgrade of your langpack), move /usr/share/locale-langpack/fr/LC_MESSAGES/* to /usr/share/locale/fr/LC_MESSAGES/ and create the following symlinks after deleting of /usr/share/locale-langpack/fr:

ln -s /usr/share/locale/fr /usr/share/locale-langpack/fr

then from /usr/share/locale/, /usr/share/locale-langpack and /usr/lib/locale:

ln -sfvT fr fr_FR.UTF-8
ln -sfvT fr fr_FR.utf8
ln -sfvT fr fr.UTF-8
ln -sfvT fr fr.utf8
ln -sfvT fr fr_FR

If someone knows a better solution to configure the paths order, I’m interested! I’m intend to post a bug report if I don’t see any strange systems calls in the next weeks.

This hack removes 89 open calls at the launch of gnome-terminal (the result is better with bigger applications), it’s pretty nice, isn’t it?



Waf build system

by Stéphane

17 03 2008

I integrate the Waf build system to libmodbus. Waf is a really nice project I hope Thomas Nagy will have time to update his documentation. The code is small and really easy to browse but the most impressive difference is the execution time:

time (./autogen.sh && ./configure && make && sudo make install)
=> 21.576s (with AM_DISABLE_STATIC)

time (./waf configure && ./waf build && sudo ./waf install)
=> 2.779s

The output is nice (compact and colorful), you can also use Waf in daemon mode (works with inotify) and each time a file of your project is modified, Waf build the new component. The size of the configure script is ~700Kb and if you choose to embed Waf it’s only 100Kb. Last but not least, your build system doesn’t check for Fortran anymore :).

Thanks to Ali Sabil for his demonstration of Waf at FOSDEM!



Performance of a PHP application with APC

by Stéphane

9 03 2008

To evaluate the performance of Vertimus with a opcode cache like APC, I used xdebug and Kcachegrind.
The results are really interesting, without APC, the index page has a total time cost of 191 032, the Zend Framework requires to use many classes, PHP is not really fast to parse and execute this code :

Vertimus without APC

and with APC, the total time cost is only 123 904:
Vertimus with APC

The CPU load is reduced by 36% but you need a bit of memory to store cache data (30 Mo by default). The results has been obtained with APC 3.0.16 and the following configuration:

;;;;;;;;;;;;;;;;;;
; APC ;
;;;;;;;;;;;;;;;;;;
extension=apc.so
apc.enabled=1
apc.shm_segments=1
apc.shm_size=30
apc.ttl=7200
apc.user_ttl=7200



Cairo-Clock is now really faster!

by Stéphane

3 03 2008

capture-cairo-clock-par-macslow.pngCairo-Clock is CPU intensive and consume around 13 % of CPU on my Core 2 Duo (on one CPU), the workload is spread on Xorg (10 %), compiz.real (2 %) and cairo-clock (1 %) with a smoothing value defined at 30 (default). The second hand drawing is disabled (default).

I wrote some patches to reduce the CPU consumption and now, Cairo-Clock doesn’t appear in ‘top’ list, woooh! The solution was simple, it useless to refresh the clock at 30 Hz if the second hand is not drawn. I also use common code to manage timeout and g_timeadd_seconds reduce the number of interrupts when many applications use this function.

I prefer bzr to git but I published my patches on github so MacSlow can take my patches here: http://github.com/stephane/cairo-clock/tree/master

PS: I couldn’t resist to translate in French (and fix Cairo-Clock for translation).