I was working with a coworker last week (John Gorkos) on creating a Partitioning implementation for one of our internal products that was "self managing" so that we would not need cron or sql jobs to keep it tuned.
We have a need to write large amounts of time based data that can be purged within a defined timeframe and keep a "rolling window" of data persistent in the database. It would be ideal if we could keep N number of partitions of this data and have that "automagically" managed so that an administrator would not have to do that. Also, the cost of managing this must be O(1) as performance is key to this application and we cannot have expensive COUNT(*) queries running to determine if there is or is not data in these partitions.
So we analyzed our options for managing partitions. We first looked at using RULES as it was something we had done in the past but found that using a simple trigger was slightly cleaner and met our needs.
I have included a create_partitions_example.pl script that will create the partition schema that you need for as many tables as you'd like to keep as well as a audit example script that contains the DDL script generatged from the create_partitions_example.pl
Let's look at some code.
First, here is the trigger and what it is doing.
CREATE OR REPLACE FUNCTION audit_insert_trigger()
RETURNS TRIGGER AS $$
DECLARE
last integer;
tablename integer;
seqval bigint;
BEGIN
last := date_part('DAY', NEW.time);
tablename := last % 3;
-- RAISE NOTICE 'The value of last is %', last;
-- RAISE NOTICE 'The value of tablename is %', tablename;
IF tablename = 0 THEN
INSERT INTO audit_part0 VALUES (nextval('audit_seq0'), NEW.time, NEW.foo, NEW.bar);
seqval := last_value from audit_seq1;
IF seqval <> 1 THEN
ALTER SEQUENCE audit_seq1 RESTART WITH 1;
TRUNCATE TABLE audit_part1;
END IF;
ELSIF tablename = 1 THEN
INSERT INTO audit_part1 VALUES (nextval('audit_seq1'), NEW.time, NEW.foo, NEW.bar);
seqval := last_value from audit_seq2;
IF seqval <> 2 THEN
ALTER SEQUENCE audit_seq2 RESTART WITH 2;
TRUNCATE TABLE audit_part2;
END IF;
ELSE
INSERT INTO audit_part2 VALUES (nextval('audit_seq2'), NEW.time, NEW.foo, NEW.bar);
seqval := last_value from audit_seq0;
IF seqval <> 0 THEN
ALTER SEQUENCE audit_seq0 RESTART WITH 0;
TRUNCATE TABLE audit_part0;
END IF;
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
The Trigger is very simple. We first use a MOD on date_part() of the time coming in so that we can place the data into the correct day bucket. We set unique sequences for each partition so that we can read the sequence value to tell whether the partition has data in it or not. We then look at the "next" partition, each time we insert into a current partition to see if it's empty. If it's not, we reset the sequence to it's starting point, and also truncate that table. (I think a truncate would have been enough there actually, but it's currently both)
We have very high transaction volumes traversing through these partitions so we feel comfortable truncating just the next table but if that wasn't the case for you, you may want to truncate the next couple of tables to be sure you never missed truncating a table. You could then stack up as many partitions as you'd like data retention. For example, if you wanted to keep 4 day's back, you'd create 7 tables and truncate 3 going forward. Increasing data retention would be just a matter of increasing the total number of tables (from 7 to something that would satify the data retention need)
Here is an example of creating an "audit" partitioned table set (3 partitions) that is self managed.
------------------------------------------------------------------------------------------------
DROP TABLE audit CASCADE;
DROP SEQUENCE audit_seq0;
DROP SEQUENCE audit_seq1;
DROP SEQUENCE audit_seq2;
CREATE TABLE audit (
id bigint not null,
time timestamp with time zone not null default now(),
foo integer not null,
bar varchar
);
CREATE SEQUENCE audit_seq0 INCREMENT BY 3 MINVALUE 0 START WITH 0;
CREATE SEQUENCE audit_seq1 INCREMENT BY 3 MINVALUE 1 START WITH 1;
CREATE SEQUENCE audit_seq2 INCREMENT BY 3 MINVALUE 2 START WITH 2;
CREATE TABLE audit_part0 (id bigint not null default nextval('audit_seq0')) INHERITS (audit);
CREATE INDEX audit_part0_time_idx ON audit_part0(time);
CREATE TABLE audit_part1 (id bigint not null default nextval('audit_seq1')) INHERITS (audit);
CREATE INDEX audit_part1_time_idx ON audit_part1(time);
CREATE TABLE audit_part2 (id bigint not null default nextval('audit_seq2')) INHERITS (audit);
CREATE INDEX audit_part2_time_idx ON audit_part2(time);
CREATE OR REPLACE FUNCTION audit_insert_trigger()
RETURNS TRIGGER AS $$
DECLARE
last integer;
tablename integer;
seqval bigint;
BEGIN
last := date_part('DAY', NEW.time);
tablename := last % 3;
-- RAISE NOTICE 'The value of last is %', last;
-- RAISE NOTICE 'The value of tablename is %', tablename;
IF tablename = 0 THEN
INSERT INTO audit_part0 VALUES (nextval('audit_seq0'), NEW.time, NEW.foo, NEW.bar);
seqval := last_value from audit_seq1;
IF seqval <> 1 THEN
ALTER SEQUENCE audit_seq1 RESTART WITH 1;
TRUNCATE TABLE audit_part1;
END IF;
ELSIF tablename = 1 THEN
INSERT INTO audit_part1 VALUES (nextval('audit_seq1'), NEW.time, NEW.foo, NEW.bar);
seqval := last_value from audit_seq2;
IF seqval <> 2 THEN
ALTER SEQUENCE audit_seq2 RESTART WITH 2;
TRUNCATE TABLE audit_part2;
END IF;
ELSE
INSERT INTO audit_part2 VALUES (nextval('audit_seq2'), NEW.time, NEW.foo, NEW.bar);
seqval := last_value from audit_seq0;
IF seqval <> 0 THEN
ALTER SEQUENCE audit_seq0 RESTART WITH 0;
TRUNCATE TABLE audit_part0;
END IF;
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER insert_audit_trigger
BEFORE INSERT ON audit
FOR EACH ROW EXECUTE PROCEDURE audit_insert_trigger();
------------------------------------------------------------------------------------------------
Here is the Perl code to generate the above DDL. It can be used to generate any number
of partitions that you need.
------------------------------------------------------------------------------------------------
#!/usr/bin/perl
my $tn = $ARGV[0]; # base table name
my $tc = $ARGV[1]; # table count
my $unit = $ARGV[2] || "DAY"; # time units, as recognized by date_part
print "DROP TABLE $tn CASCADE;\n";
for ( my $i=0; $i<$tc; $i++ ) {
print "DROP SEQUENCE ${tn}_seq$i;\n";
}
print "\n";
print "CREATE TABLE $tn (\n";
print " id bigint not null,\n";
print " time timestamp with time zone not null default now(),\n";
print " foo integer not null,\n";
print " bar varchar\n";
# Other values are added here as needed
print ");\n";
print "\n";
for ( my $i=0; $i<$tc; $i++ ) {
print "CREATE SEQUENCE ${tn}_seq$i INCREMENT BY $tc MINVALUE $i START WITH $i;\n";
}
print "\n";
for ( my $i=0; $i<$tc; $i++ ) {
print "CREATE TABLE ${tn}_part$i (id bigint not null default nextval('${tn}_seq$i')) INHERITS ($tn);\n";
# other indices are added here as needed.
print "CREATE INDEX ${tn}_part${i}_time_idx ON ${tn}_part$i(time);\n";
}
print "\n";
print "CREATE OR REPLACE FUNCTION ${tn}_insert_trigger()\n";
print "RETURNS TRIGGER AS \$\$\n";
print "DECLARE\n";
print " last integer;\n";
print " tablename integer;\n";
print " seqval bigint;\n";
print "BEGIN\n";
print " last := date_part('$unit', NEW.time);\n";
print " tablename := last % $tc;\n";
print "-- RAISE NOTICE 'The value of last is %', last;\n";
print "-- RAISE NOTICE 'The value of tablename is %', tablename;\n";
for ( my $i=0; $i<$tc-1; $i++ ) {
my $nextval = $i+1;
if ( $nextval > $tc ) { $nextval = 0; }
if ( $i == 0 ) {
print " IF tablename = $i THEN\n";
} else {
print " ELSIF tablename = $i THEN\n";
}
print " INSERT INTO ${tn}_part$i VALUES (nextval('${tn}_seq$i'), NEW.time, NEW.foo, NEW.bar);\n";
print " seqval := last_value from ${tn}_seq$nextval;\n";
print " IF seqval <> $nextval THEN\n";
print " ALTER SEQUENCE ${tn}_seq$nextval RESTART WITH $nextval;\n";
print " TRUNCATE TABLE ${tn}_part$nextval;\n";
print " END IF;\n";
}
print " ELSE\n";
my $nextval = $tc-1;
print " INSERT INTO ${tn}_part$nextval VALUES (nextval('${tn}_seq$nextval'), NEW.time, NEW.foo, NEW.bar);\n";
print " seqval := last_value from ${tn}_seq0;\n";
print " IF seqval <> 0 THEN\n";
print " ALTER SEQUENCE ${tn}_seq0 RESTART WITH 0;\n";
print " TRUNCATE TABLE ${tn}_part0;\n";
print " END IF;\n";
print " END IF;\n";
print " RETURN NULL;\n";
print "END;\n";
print "\$\$\n";
print "LANGUAGE plpgsql;\n";
print "\n";
print "CREATE TRIGGER insert_${tn}_trigger\n";
print " BEFORE INSERT ON $tn\n";
print " FOR EACH ROW EXECUTE PROCEDURE ${tn}_insert_trigger();\n";
Thanks again to my coworker John Gorkos who brought me into this project and worked with me to get example scripts to post into this blog.
Regards,
Paul
Saturday, July 4. 2009
PostgreSQL 8.4 makes it's own fireworks in July
Downloaded PostgreSQL 8.4 and I must say I'm impressed. My favorite features so far is the parallel restore capabilities, the CTE enhancements (I have to write these in other DB systems and keeping things similar in PostgreSQL makes things easier for me.) and the pgScript scripting framework.
Hats off to the PostgreSQL Team for another great release.
Hats off to the PostgreSQL Team for another great release.
Thursday, March 26. 2009
Wow, it's been a while
Just got the blog running again and wanted to post an interesting query I dug up today.
Looking for Buffer hit cache ratio in PostgreSQL?
SELECT datname,
blks_read,
blks_hit,
round((blks_hit::float / (blks_read+blks_hit+1) * 100)::numeric, 2) as cachehitratio
FROM pg_stat_database
WHERE datname !~ '^(template(0|1)|postgres)$'
ORDER BY cachehitratio desc;
That was an excerpt from a larger shell script that you could run to automate this collection. Here is that script...
http://www.nagiosexchange.org/cgi-bin/page.cgi?g=Detailed%2F2675.html;d=1
(Corrected an alias name from original version)
Looking for Buffer hit cache ratio in PostgreSQL?
SELECT datname,
blks_read,
blks_hit,
round((blks_hit::float / (blks_read+blks_hit+1) * 100)::numeric, 2) as cachehitratio
FROM pg_stat_database
WHERE datname !~ '^(template(0|1)|postgres)$'
ORDER BY cachehitratio desc;
That was an excerpt from a larger shell script that you could run to automate this collection. Here is that script...
http://www.nagiosexchange.org/cgi-bin/page.cgi?g=Detailed%2F2675.html;d=1
(Corrected an alias name from original version)
(Page 1 of 1, totaling 3 entries)


