Friday, July 3. 2009Parallel pg_restore for PostgreSQL 8.4
I try to complete at least one significant feature item per PostgreSQL release. This time the feature is making pg_restore run in parallel. This is quite important for many users, particularly some large enterprise users.
It's important that people understand what this will do and what it won't do. pg_restores runs a number of steps. In conventional mode it simply runs them all in a single connection to the database, one after the other. In parallel mode it first runs all the quick and easy steps, essentially those that don't involve any data access, such as table and function creation, in a single connection, just like conventional mode. Then it runs the remaining steps each in its own connection. The steps are the same, and there is no parallelism within a given step. For example, a single COPY to a table is not parallelised. Rather, we run it in parallel with other data intensive steps. The maximum amount of parellelism is controlled by the user. This will involve some experimentation to get to the sweet spot for your setup. A good place to start is the number of physical processors you have available. The idea here is to improve the situation where the CPU is the limiting factor, and allow you to drive the restoration rate up to where IO is in fact the limiting factor. With very high end hardware we believe that you can drive the parallelism quite high. Like many performance features, this one might well require several releases to tweak it for optimal performance gain. The program works by keeping a pool of slots to be used for the steps that are run in parallel. One possible area for improvement is in the algorithm that selects the item to be used for a slot as it it becomes available. Currently we keep a queue of items that have no remaining unrestored dependencies. An item gets put on the queue as soon as all the items it depends on have been restored. This is likely to be a fairly good approximation of an optimal algorithm, but there might well be a way of tweaking it. Another possible area of optimsation would be to take some notice of the tablespace that each item affects, and try to balance these, so we use as many IO channels as possible. What is important is that we have now got the basic framework of parallel restore, so that some researchers can easily experiment with various tweaks to improve the performance. pg_restore is going to be with us for quite a long time. Even if we manage to get pg_upgrade working pretty well, that will take quite a bit of time, and there is currently no guarantee that it will for for every release. So I expect pg_restore to be the most common method of upgrading for quite some time, making it run as fast as possible is thus still a significant requirement. I'm proud to have been able to contribute this feature to Postgres, and look forward to other people improving it further as time goes by. Wednesday, June 24. 2009Workation
Tomorrow we pack the family and pets up and head off to Hilton Head Island for eight days. A little swimming, a little jetskiing, probably too much good dining, lots of relaxation.
I have been prepping a laptop to take with me. Unfortunately, several clients want me to be able to do things if an emergency arises, so I need the laptop to be able to talk to their VPNs. That means I need to support whatever they use. Everybody seems to use something different. So far I am supporting Windows and Checkpoint VPN on WIndows (since that's the only place the client runs) and OpenVPN, CiscoVPN and Juniper on Linux. Many of these have the very annoying habit of locking the client machine out from any access to anything other than the VPN. That can be more than a small inconvenience. I complained about it to one client the other day and the response was "Yes, we hate it too." Not enough, apparently, to change it. Overall, I have found OpenVPN to be the nicest VPN so far to use on Linux. On Windows, the native VPN works pretty simply too, and is a good fit if you're only going to be talking to Windows boxes via Remote Desktop. I decided yesterday to upgrade the Linux install on the laptop from Fedora 10 to Fedora 11. You would think that after all this time they would be able to handle upgrading their own packages nicely, but it didn't work any better than the last time I tried to upgrade a RedHat machine, back around the time of RedHat 9. So I backed up what little stuff I had that wasn't actually a Fedora artifact, and reinstalled. The other thing about upgrading that made it so much fun was when anaconda decided to barf when trying to mount my swap file, which it couldn't see. I had to comment it out of the fstab. Why anaconda needs to mount the swap file to upgrade, when it doesn't need any to install, AFAIK, is beyond me. One nice thing I noticed is that Fedora 11 has a much nicer naming scheme for volume groups and logical volumes. Good work there. Anyway, now I have F11, with all the VPNs working and tested. I'm hoping that customers won't be too demanding so that I can grab a few minutes to continue with making the buildfarm client support git. Friday, June 19. 2009Misuse of SQL
Today I was reminded of something I saw about 15 years ago, when working as a systems adminstrator. One of the application developers had a large amount of data that needed to be sorted. Knowing SQL, he wrote a program in C with embedded SQL which created a table, read in the data, and then selected it with an ORDER BY clause. Problem solved.
Talk about a sledgehammer to crack a nut. Switching metaphors, I guess it's a case of everything looking like a nail if all you have is a hammer. Note that 10 lines of C code would have sorted the data in a fraction of the time. Or a simple call to the unix sort utility. Even if the data were too big to fit in memory, something much lighter like using a Berkeley DB btree or hash would have hardly been much more trouble. Today I saw something similar. Someone created an in-memory SQLite table, and set up an index on it, just for fast random access to it. And this was in PHP which has associative arrays which presumably offer fast keyed access (by hashing the keys I guess - I know nothing of the innards of PHP.) After rolling on the floor laughing for a while, I ripped the guts out of this thing and just emulated it using an array of arrays. Guess what? Performance was fine and the program no longer relies on SQLite, the non-PDO version of which isn't available on the version of PHP installed on the machine in question. SQL is not a general purpose tool. It's a tool for manipulating data with quite specialised needs, and using it for something else is just silly. Monday, June 8. 2009Software kinda guy
Some people drool over hardware specs, They will talk animatedly about drives and CPUs and such.
I'm not one of these. In fact, I'm not really good with hardware at all. Discussions like the above leave me fairly bored. Elegant algorithms, and elegant expressions of algorithms, are things that interest and occasionally delight me. But not machine specs, and the like. Now that might surprise some in a person who has made his living with computers for about 25 years. But it's always been so. I recall at one stage of my career working on some machines that I had never seen, and I wouldn't have known what they looked like if I fell over them. They lived in the data centre or computer room or whatever it was called back then, and all I needed to know was how to drive them from my terminal. I knew some basics about the hardware, but nothing terribly specific. Perhaps this comes from the fact that I have never been very well coordinated physically (possibly a function of my bad vision, but possibly just genetic - it's been so from before high school), and so I tend to fumble things a lot. That doesn't matter too much when you can undo your typing in an editor, say, but it matters a lot when you're carrying things or plugging them in. I once saw someone drop a DASD pack, a rather expensive storage item back then for big iron machines, and maybe that too has frightened me a bit of getting too involved with hardware. Anyway, I think it's probably a sort of Mars/Venus thing. These thoughts have absolutely nothing to do with Postgres, except in the sense that they tell you something about why I spend my time working on it rather than on stuff closer to the metal. All this popped into my head while I was wrestling with hardware, of course. I'm converting an oldish laptop to be dual boot, and moving data from an old 100Gb drive to a new 320Gb drive. In the course of exploring the wonders of Windows file systems and cloning them, I must have swapped the old and new drives in and out half a dozen times, dropping screws along the way, and finding them again. The process is almost complete. I finally got it all working with a nice little package called SystemRescueCD which appears to do the job quite nicely. In particular, there's a package included there called FSArchiver which seems to work very nicely. Soon I will have my daughter happily dual booting this laptop. (And no, a virtual Windows machine would not have done - some of the software needs to work direct with the hardware.) Now, back to work ... Tuesday, June 2. 2009Still having fun with git
The buildfarm contains some mildly elaborate checking to make sure that the working copy it has is what we think it is. This is very important, since we absolutely must know that what is being reported on is a faithful copy of what is in the central repository. We had some odd experiences in the early days where one or two people were building against copies that had built in or even that they had tested changes in.
This is one of the areas where git differs a lot from CVS (and Subversion, with which I am also fairly familiar). In a distributed system, you can commit to the local repository, not just to the central repository. In effect you have not one repository that gets all the commits, but a netwrok of such repositories. Making one repository authoritative is a social construct rather than a technological one. So, feeling my way around that and learning all the commands and the way things work, so that I can check for consistency in a sane way, is taking a bit of time and effort. No doubt it's all worth it. But it's a bit of a slog. Monday, May 25. 2009Giggle
One of the most amusing presentations at pgcon was Josh Tolley's lightning talk. I guess it's not surprising that this should be so, as he is also the author of PL/LOLcode ("kthxbye"). This talk was entitled "How not to review a patch" and the video is now available at http://hosting3.epresence.tv/fosslc/1/watch/125.aspx at about slide #246. If you weren't there, check it out - it's a hoot.
Sufficient unto the day is the git thereof
Various commenters have directly or indirectly pointed out what I was missing w.r.t. git in my previous blog entry. I have enough to proceed with the buildfarm stuff, but clearly I have a lot to learn still. The next task facing me is to learn how to find out what has changed (if anything) on the branch in question since some given point in time - in this case since the last time the buildfarm ran a snapshot. After that I can look at how to work as a committer (using my other hat).
git thee behind me
Yesterday, a bunch of us did touristy things in Ottawa, and then I travelled home, arriving about 9.30 pm. The boat ride on the Rideau Canal was mildly amusing, and the trip made more pleasant by being able to spend the first leg chatting with my PGX colleague Miho Ishikura.
At the developers meeting held on Wednesday, it was agreed that I would try to get the buildfarm working with git as well as CVS. I started on that today. So far, I am a bit puzzled, though. It's not clear to me how I a copy of just a branch, and have it stick. i.e., I want the git equivalent of "cvs co -r branch-name" followed by "cvs update". The closest I got was to do "git clone" followed by "git checkout origin/branch-name" for the first part. However, when I subseqequently did "git pull" it barked at me about needing to specify which branch to update, and suggested I might want to set up some stuff in a config file. This probably won't upset most developers, because they almost always work against the HEAD branch, but it will potentially upset committers, who have to do some work on back branches, and it upsets me as the main buildfarm developer, because the buildfarm works on a copy of the code for each branch. There's a reason for this - if we didn't you couldn't run concurrently on different branches. I'm sure I'll be able to work around it, but I'm mildly disappointed that something so easy actually seems harder in git than in CVS. (I now await a million messages telling me why I'm wrong and what a fathead I am ;- ) ) Friday, May 22. 2009Londiste gets real, Wheeler gets converts
At pgcon2009, end of day 1.
I just went to the Londiste session. There are awesome features coming in version 3, including cascading nodes, changing providers, switchover and DDL changes on all nodes driven from master. This will be real competition for Slony, and it still looks ridiculously easy to use. This will be a win for lots of people. Earlier, my colleague David Wheeler gave a superb session on pgTap, and why people should incorporate it into their development methodology. He's a great presenter and had the crowd eating out of his hand. I think he convinced some waverers, and I will certainly be checking it out. Thursday, May 21. 2009Small features can still win big
At PgCOn 2009.
I went to Robert Treat's session on new features in 8.4. One he liked a lot is supress_redundant_updates_trigger(), because by deploying it strategically he gets a big performance gain for very little effort. Since this was one of my contributions, I got a nice warm glow when I heard him say that. I did a similar type of trigger in PL/Pgsql recently, using Andrew Gierth's new version of hstore, which has a nice function to create an hstore from an arbitrary record. The reason I did this was that I needed to ignore a couple of fields when deciding whether or not to suppress the update. The new builtin trigger is very fast because it doesn't work field by field, but just does a memcmp() of the data part of the tuples. I hope we can get a few more useful builtin trigger functions in future releases. UpdateSince blogging the above 3 people at the conference have approached me asking for info about this feature (not because of the blog entry). They all feel it will be tremendously useful to them. Which just makes my point further. Thursday, May 7. 2009Materialised Views for large joins
A meeting with a prospective client yesterday, the subject of an inefficient query that joined a large number or normalised tables came up. I don't know how many times I have seen this problem over the last few years - quite a lot at any rate. You particularly want to stay out of the zone where the genetic query optimiser kicks in, if possible. Over and over I have found that setting up a denormalised materialised view works really well for this.
Postgres doesn't have builtin support for materialised views (yet). But they are pretty easy to set up. First you need your actual non-materialised view, which you set up in the usual way, plus a table that you populate from the view: Add indexes to taste. Now you need to decide how to keep the data current. One very simple way is just to refresh it lazily every so often (say from a cron job): truncate foo_mat_view; insert into foo_mat_view select * from foo; If it needs to be kept up to date greedily, then another way to go is to use triggers. Depending on how complex the view is this can be tricky to work out, but in most cases I have found it can be done quite reasonably. I have seen a few cases where it led to an unacceptable performance hit, but in general I have found that triggers work quite well for this purpose. This is one of the most productive approaches I know if to performance problems of large joins. It's important to use EXPLAIN ANALYSE as an aid in deciding what goes in your matrialised view. It won't work if you just collapse a bunch of joins that are cheap anyway. Choose an expensive set for it and you will find your rewritten queries fly. PostgreSQL Experts Inc.
When I left the hedge fund I was working for a couple of years ago (boy, does that look like a good decision now), I decided not to seek full time employment any more but rather to try my hand at consulting, concentrating on PostgreSQL work. I formed a company Dunslane Consulting LLC and made my availability known to a few people. Since then I have managed to keep pretty steadily in work, without much effort. I've not had to go looking for work - it has tended to find me at just the right times.
However, the things you can do as a single consultant are limited. Some engagements require a number of roles, and for that I am often either technically or temperamentally unsuited (I have sworn never to work on a GUI again, nor on release management.) That's where PostgreSQL Experts Inc comes in. A number of people have got together to form this company, which is aimed at providing full service to PostgreSQL users, from setup to tuning to administration to application development. We have some people who are pretty well known in the PostgreSQL community, including Josh Berkus, our CEO, and David Wheeler and David Fetter, as well as your humble blogger, so we believe that potential customers will have confidence that they are getting the best when they come to us. Hence our name Feel free to call me or Josh or any of our other consultants if you need more information. Or check us out at pgCon Saturday, April 7. 2007*ahem*![]() Here's a little image that my son Tom (author of the recently merged enums patch) just made. Guess who it's meant to be? He likes this one better than the earlier effort, so in deference to his artistic sensibilities I replaced the one I had here before. Saturday, August 19. 2006Slon onThe script I pasted recently had some problems, which I have now fixed. It was very annoying that one of these generated slonik segmentation faults instead of giving some meaningful error message. It took me half a day to find out the problem. Gavin Sherry tells me he is working on some APIs that will make building a slony management GUI easier. That's good news. Meanwhile, the speed in copying the subscribed set on my 9Gb database was quite commendable, and since I got it working it has run without a squeak. Tuesday, August 15. 2006Enums revisitedMy son Tom is visiting here in North Carolina, on his way to a working holiday in the UK. He and I have decided to knock over the enums project that I unfortunately didn't get done in time for 8.2, so we'll be posting some discussion Real Soon Now.
(Page 1 of 2, totaling 27 entries)
» next page
|
My Links etcCalendarQuicksearchCategoriesSyndicate This BlogBlog Administration |
