Monday, April 28. 2008
I did quite a bit of work for 8.3 on closing holes that allowed improperly encoded data into the database, so it's now quite hard, if not impossible, to get Postgres to store invalidly encoded text data. However, I've never had to deal with that much in real life until today - luckily all my clients and work assignments have used encodings where that wasn't an issue. Today, on the other hand, my new client gave me a database dump to deal with that is in UTF8 and contains invalid data (the client currently uses 8.1.x - part of my job will be to get them to 8.3). "Just use iconv to remove the bad data'" the client told me.
Unfortunately, this doesn't quite work.
Iconv will remove byte sequences that don't obey the UTF8 sequence rules. The problem is, that's not enough to ensure that the sequence actually designates an actual UTF8 character. I found the following sequence left by iconv that Postgres quite rightly refused to allow: 0xf7b09280. I have no idea what it's meant to be. I suspect some client program simply took some input bytes and escaped it regardless of sanity. To understand why this is not valid, see the third table here: http://en.wikipedia.org/wiki/UTF-8#Description
So, if iconv leaves some bad stuff, how can I get rid of it? Enter the admin's Swiss Army chainsaw: Perl. A little experimentation found that this would work:
perl -spi.bak -e 's/[\xf5-\xff\xc0\xc1][\x80-\xbf]*//g' output-from-iconv
The resulting file loaded without error. So now I can go to work on solving the client's real problem
Tuesday, April 1. 2008
The entire staff of Dunslane Consulting, LLC (i.e. my wife Melanie and I) drove up to College Park, Md last Thursday. We spend Friday viewing the portrait of Stephen Colbert in the restroom of the National Portrait Gallery and wandering among the beautiful cherry blossoms on the trees around the Tidal Pool. Then we started doing what we actually went up there for, namely the PostgreSQL Conference East (meaning the east of the USA, or perhaps of North America).
A good time was had by all, I think. I gave a talk on "The Zen of Postgres, or How to Become a Happy Hacker". The idea was to give people a few tips on how to get into contributing code to Postgres. I enjoyed and found useful especially the talks by Greg Smith on shared buffers, Theo Schlossnagle on dtrace and other Solaris features that make it a great platform for Postgres, and Vivek Khera on how he used partitioning to improve performance out of sight.
It was especially good, too, to see Jan Wieck. I had a good talk with him about what he's doing, and I think we'll see some more great features for Postgres from him soon.
Some people traveled from a long way away to attend, too, including Magnus Hagander from Sweden, and Nikolay Samokhvalov from Russia. Amazing.
Lastly, it was good to see a courageous friend from the MySQL community. Welcome to Baron Schwartz. I hope we see more of this cross fertilization.
Congratulations to Selena, Joshua, and all those others involved in organizing the conference.
Saturday, December 29. 2007
A little while ago I asked on the hackers list about doing minimal updates, i.e. not doing an update if the actual records don't change. The reason for this is that I'm working in an environment where some tables have lots and lots of indexes, which makes updates quite expensive. Detecting when an update actually involves a change isn't always easy, and if the update involves lots of fields it gets verbose, error-prone, and costly.
The consensus solution was to have a trigger that does a very cheap comparison, mostly using memcmp(). Yesterday, with some advice from Tom Lane, I got this working.
I'm going to submit something like this for 8.4 as a utility function. But I need it working now on 8.2.
The code is in the extended body - it should work with both 8.2 and 8.3. Here's the makefile:
MODULES = min_update_trigger
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
To use it, run "make install" and then in your db do:
CREATE OR REPLACE FUNCTION min_update_trigger()
RETURNS trigger
AS 'min_update_trigger'
language C;
And for each table where you want this to work, install the trigger:
CREATE TRIGGER _min BEFORE UPDATE ON mytable
FOR EACH ROW
EXECUTE PROCEDURE min_update_trigger();
You can disable or enable the trigger as needed. To disable it, do:
ALTER TABLE mytable DISABLE TRIGGER _min;
and re-enable it with:
ALTER TABLE mytable ENABLE TRIGGER _min;
Continue reading "Minimal Update Trigger"
Friday, April 6. 2007
Here's a little image that my son Tom (author of the recently merged enums patch) just made. Guess who it's meant to be?
He likes this one better than the earlier effort, so in deference to his artistic sensibilities I replaced the one I had here before.
Friday, August 18. 2006
The script I pasted recently had some problems, which I have now fixed.
It was very annoying that one of these generated slonik segmentation faults instead of giving some meaningful error message. It took me half a day to find out the problem.
Gavin Sherry tells me he is working on some APIs that will make building a slony management GUI easier. That's good news.
Meanwhile, the speed in copying the subscribed set on my 9Gb database was quite commendable, and since I got it working it has run without a squeak.
Tuesday, August 15. 2006
My son Tom is visiting here in North Carolina, on his way to a working holiday in the UK. He and I have decided to knock over the enums project that I unfortunately didn't get done in time for 8.2, so we'll be posting some discussion Real Soon Now.
As part of my new job (see message on -hackers list) I am looking at Slony-1 replication for our databases.
First impression: it looks cool but needs much better docs, plus some handholding tools. Windows people will undoubtedly run a mile from this, unless there are good tools I don't know about.
Someone gave me a script called meta-slonik, which helps to set things up, but it has some serious deficiencies. Two of them are:
- it doesn't handle stuff in multiple schemas nicely
- it ignores potential candidate keys, and wants to create a pseudo key on every table that doesn't have a formal primary key
So I have made some mods to the script to address these problems. The modified script can be downloaded here
Tuesday, January 24. 2006
We had some fun and games with gids and suexec on developer.postgresql.org, but now that's been sorted out I have the BuildFarm SOAP example client running there as a CGI script. See http://developer.postgresql.org/~adunstan/get_bf_status_soap.cgi.
I did ponder for a few secs making a new generation of BuildFarm switch from the fairly primitive way of communicating results to the server to using SOAP or a similar mechanism. I quickly rejected the idea, as it would require installation of a large new non-standard perl modulewith quite a few dependencies on all the BuildFarm members. Up to now I have been fairly careful only to use standard modules, and I think we need to keep it that way.
Wednesday, January 18. 2006
I asked Devrim and he wasn't sure how to do this. It's surprisingly hidden, although every blog you see seems to have some sort of list of links that the author has set up.
The way I found to do this is to use the HTML Nugget plugin. So that's how I got the links to my PostgreSQL page, resumé and PayPal button back on my blog page.
Saturday, January 7. 2006
I have been away a long time. Time flies when you're having fun, I guess.
Yesterday, I needed to put together a small SOAP demo, and I decided to create one using the PostgreSQL buildfarm.
On the buildfarm, we have a status page that is really a dashboard, showing the latest build for each member for each branch in the last 30 days. Links on the dashboard page let you drill down further for details, but it's a great place to see the overall state of things.
Michael Glaesemann has talked of turning the dashboard page into an RSS feed - I think this should help.
Creating the service from the main dashboard script was absurdly easy. Once I had SOAP::Lite installed I added a few lines and removed some extraneous stuff and was done. It took me about half an hour. The client was even simpler, and took me a few minutes.
Here is the client in its entirety:
#!/usr/bin/perl
use SOAP::Lite;
my $obj = SOAP::Lite
->uri('http://www.pgbuildfarm.org/PGBuildFarm')
->proxy('http://www.pgbuildfarm.org/cgi-bin/show_status_soap.pl')
;
my $data = $obj->get_status->result;
my @fields = qw( branch sysname stage status
operating_system os_version
compiler compiler_version architecture
when_ago snapshot build_flags
);
my $head = join (' | ', @fields);
print $head,"\n";
foreach my $datum (@$data)
{
my $line = join (' | ', @{$datum}{@fields});
print $line,"\n";
}
Creating an RSS feed with that info should not be terribly hard, especially if you could just feed $data to a Template Toolkit template, instead of outputting it using text as we do at the script bottom. This structure is all that is in fact identical to the one used to construct the dashboard page, so all the links on sees there should be able to be constructed for an RSS feed.
Saturday, July 2. 2005
Feature freeze is on Monday. I got in the validator function, and, at the last minute, a way to return a perl array where the function returns a pg array. Abhijit got in a patch to do cursor fetching the right way. That's huge.
We still need to revisit composite and array arguments, to turn them into hashrefs/arrayrefs. Then plperl will just handle such things naturally. And we need to support the polymorphic ANYARRAY and ANYELEMENT pseudotypes. At that point plperl will arguably be at least as powerful and useable as plpgsql. Pretty impressive
I also sent in a patch to put language handlers in pg_catalog instead of public. That means you could easily drop the public schema with no ill effects. Dropping use of public still needs to be done by contrib modules. We should really deprecate use of public.
Meanwhile, buildfarm email notifications are done. Members can put notification of just their hosts in their config file, and anyone can subscribe to the general digest lists. I have also started putting hard-to-harvest info on owner email addresses on the members, history and log pages, so that people with questions (e.g. Tom Lane) can contact them directly.
I am now looking at the best way to provide SubVersion for pgfoundry members. This also ties in with my day job, as I have committed to moving us from CVS to SVN. Of course, the pgfoundry setup is more complicated. But I think I can see how to do it ... more later when I have the scheme worked out.
Friday, June 17. 2005
Time flies when you're having fun ... I have been so busy these last 4 or 5 weeks I haven't had any time to put anything here.
That includes 2 weeks in Australia and more time recovering from jet lag.
So, news:
Buildfarm:
- we now report the actual steps passed, as well as the one that fails (if any).
- we now have preliminary support for VPATH builds
- I have started work on mail notification support, both via subscribable digest mailing lists, and direct to recipients chosen by the buildfarm members.
PLPerl:
- Abhijit Menon-Sen's return_next patch has gone in. That's a huge enhancement - means we can now return giant resultsets. My testing shows it's faster too - 40% resuction in run time on counting a returned 10,000,000 row set.
- It carried my "use strict" patch along with it

- I also have a tiny patch outstanding to trap and log non-fatal warnings - these were currently being dropped.
- UTF8 support has also gone in.
- there are still some urgent items I want, and that we might be in trouble getting:
- array support - previous patch was wrong and in any case bitrotted.
- support for fetching huge resultsets - no response to Abhijit's queries on the -hackers list
- validator function - I might look at this
- The first-call thing looks very iffy - I'm not going to pursue it unless I find a better way than what PL/R uses
Latest madcap scheme: I'm thinking of adding PL/JavaScript to Postgres, just for fun. It should actually be quite a good fit, especially now that Postgres has named function params.
Meanwhile, Devrim has decided to move to blog s/w that supports Postgres properly. Sound decision. A pity that all the customisation of my blog got lost. Oh, well, function matters more than form.
Sunday, May 15. 2005
Yesterday Tom committed the work to make it easy to test server-side PLs (plperl, pltcl, plython ...). Today I committed the buildfarm enhancement to use it. We're not retrofitting any of this - it only works on HEAD branch right now, and only the latest buildfarm version supports it - I'll be tagging and building a new release later today. Still, this is good going forward. I leave for aus in 12 days and I wanted to get this done before then.
This has been a slight diversion ... partly spurred by some ill considered efforts to push PLs out of the core. But it does get done a feature request. Not testing PLs was something of a hole.
All this make it important to get some more comprehensive tests for the PLs.
Now ... back to plperl improvements ...
Wednesday, May 11. 2005
Darnit! Just when I thought I'd got done with CSV stuff somebody in IRC finds a bug in multiline fields in 8.0. The fix was fairly easy to find - one line as it turns out. And the bug isn't present in the HEAD branch, because of the reworking I did to support multiline fields better. But it's very annoying nevertheless.
Saturday, May 7. 2005
The other day David Fetter noticed that there was a problem with CVS tip on his FC3 machine - the regression suite failed because psql bombed with a runtime link failure.
At first he thought it might be his machine setup. A quick check on the buildfarm showed that the FC3 box there was having the same problem. So that set his mind at rest. But even more, going back through the build history to where the problem first occurred it was very easy by looking at the log of the changed files to find the cause, which turned out to be the introduction of the --as-needed flag for ld.
All this was found in a few minutes. That's what buildfarm was created for.
|