Monday, July 21. 2008
On July 22nd, 2008, I will be joining Sun Microsystems as PostgreSQL software engineer. Sun has been a valuable contributor to the PostgreSQL project for a number of years now, and I am looking forward to joining them in this effort. I am glad that I will be able to continue my personal role in the PostgreSQL project with the support of the great resources that Sun provides.
I would like to take this opportunity as well to thank my former colleagues at credativ GmbH for their support of the PostgreSQL project and my own career. I wish them much success in their continued development.
So, I expect that I will have more time to contribute to PostgreSQL development from now on, and both Sun and I have a sizeable backlog of projects and ideas that we would like to realize. Time to get started!
Friday, June 20. 2008
I'm sure everyone has created something like this once:
CREATE TABLE a (id int PRIMARY KEY, value text);
CREATE TABLE b (id int PRIMARY KEY, data text, a_id int REFERENCES a (id));
CREATE TABLE c (id int PRIMARY KEY, a_id int REFERENCES a (id), moredata text);
They all have id columns, but they are all different sets of IDs. Moreover, the same set of IDs is sometimes knows as "id", sometimes as "a_id". Some time ago I come to the conclusion that this naming scheme is bad. It makes sense locally within the table, because a.a_id would be redundant. But I think it has advantages to be redundant here anyway.
The first advantage is that you can use this join syntax:
a JOIN b USING (a_id) JOIN c USING (a_id)
If you build denormalized intermediate tables (materialized views) of this type
CREATE TABLE a_b AS SELECT * FROM a, b, c WHERE a.id = b.a_id AND a.id = c.a_id
it will fail because the join will have multiple columns named a_id. Unless using the USING join syntax, it won't know they are actually the same. So you would have to write out the column list instead of the asterisk in the select list.
Also, once your SQL queries become more complicated, say using table aliases and subselects, it is easy to lose track of what foo.id means in a particular context.
For all of these reasons, the general rule arises that a column name should be globally unique within a database schema. This should actually be checkable automatically if you allow duplicates only for columns connected by foreign key constraints.
There is another idea you can play with while developing your database schema. You start by creating (conceptually) one big table containing all the columns that are needed to store your data, which obviously requires globally unique column names. Then you use the decomposition algorithm to normalize this table down to the individual tables that you actually implement. Some might find this preferrable to converting ER diagrams.
Monday, May 26. 2008
To conclude my expedition to PGCon in Ottawa, today I completed the Ottawa Marathon. A very well organized race to go along with a very well organized conference. Until next year.
Sunday, May 25. 2008
PGCon 2008 is over, and I couldn't help feeling somewhat sad.
The second day of talks also featured many good presentations. My own also went quite well. We concluded the day with a meeting of the core team members who were present in Ottawa, and there will probably be a few actions coming out of that. Don't expect anything too dramatic, though.
Thanks to everyone for this great conference, and I'll see all of you next year if not sooner.
Friday, May 23. 2008
I welcome back my faithful readership to their regularly scheduled program. Thursday was the first day of talks at PGCon, as evidenced by the rise of blog posts on planetpostgresql under a "day 1" label. Welcome newbies!
The talks were really good. I won't go into the details here; the slides should be available on the conference web site. For those not here I will summarize the sentiments of the day as Everyone is hiring. If to you, "vacuum" means cleaning the disk rather than cleaning the house, there appear to be literally dozens of jobs just waiting to be grabbed.
In the evening we had the EnterpriseDB dinner. I think I caught a cold in the draft. Must save my voice for my presentation.
Thursday, May 22. 2008
Wednesday morning, I went geocaching again, attempting to complete the second UO cache, but ended up at a pile of rubble again. Hmm. Maybe this game works differently here in Canada. I hope I get to drop my travel bug before I leave though.
The main event of the day, at least for the major developers, was the developer meeting, which I incidentally managed to locate with my GPS device. I was skeptical about this meeting beforehand, but I have to say, it was extremely useful and enjoyable. The wiki page linked to above contains the meeting minutes. So I guess I'm on the hook now to kill off some PostgreSQL mailing lists, and perhaps we'll have a prototype cmake-based build system for PostgreSQL sometime.
In the evening, everyone met at the Yahoo! drinks+food event. Since I arrived an hour late because I was training for the run on Sunday, I didn't get to see any Yahoo things except a few Yahoo-branded napkins. But hey, I got a free beer and I was able to see the rerun of the Champions League final. Someone came up to me later to thank me for my previous blog entry reminding him to watch the game as well. So FYI: As far as I know, there is nothing important on TV today.
Wednesday, May 21. 2008
I went geocaching this morning, because I have a Travel Bug to drop in Canada. But I only found a pile of rubble. Ran into Bruce just getting out of his cab from the airport on the way back.
I presented my tutorial on porting Oracle applications to PostgreSQL in the afternoon. This went quite well, and I received encouraging feedback afterwards. The slides of the presentation should appear on the PGCon web site at the linked URL. For those who were confused by the somewhat cynical tone of the presentation: I have been traumatized a bit by the issue. By and large many applications are quite easy and straightforward to port. I certainly do encourage these efforts.
One thing that came up after the presentation that I have not considered in great depth is the issue of performance of the ported result. In the discussion, a few possibilities were mentioned:
The Oracle application is so carefully tuned with optimizer hints, it will never perform on PostgreSQL. We probably can't/won't port it.
Half the time of a porting project will be required to tune the PostgreSQL port, because Oracle optimizes bad queries much better.
Some things perform better in PostgreSQL, some worse. It probably averages out.
With the money you save with PostgreSQL, you can afford better hardware.
More insight on this issue would be welcome.
I think I'll go on a second geocaching attempt now and hit the Royal Oak pub later with the rest of the group.
Note for those coming from across the pond: The Champions League final is televised on TSN and RDS beginning at 14:00 tomorrow. You get both of these channels in the university residence. But there's the developer meeting ...
Tuesday, May 20. 2008
We're back in town, Ottawa! I had to go through special interrogation twice at the airport (once for immigration, once for customs), but I made it. Now I'm sitting at the desk in the university residence, staring out the window down onto the city. Not much has changed; it feels like I was here just yesterday. How funny.
Friday, April 25. 2008
Well, I asked for a wiki almost three years ago, but it is finally here in official capacity. Thanks! In the few weeks of its existence, we have already seen great progress in documenting many development issues that were previously undocumented or hidden in strange places. I have started to keep my personal notes and todo items, which were previously spread around various files and pieces of papers, under User:Petere.
I have been observing the wikification in the Debian project for a few years now. I expect that in the near future, all of the contents relevant to developers will have migrated from the main web site to the wiki.
I have signed up for the Ottawa Marathon on the weekend after PGCon. If anyone is still around on the Sunday, join me on or at the side of the track.
Monday, April 14. 2008
Reinhard Max has revived the PostgreSQL RPM packages for SUSE and has given them a permanent place at the openSUSE Build Service. This builds on and supercedes my earlier offering, so please go to the new URL in the future. And please send your suggestions and contributions; it is "open" after all. By the way, the packages developed there are integrated into the official SUSE distributions, so it is the "real thing".
Wednesday, March 19. 2008
The PostgreSQL RPM packages for SUSE have been neglected a bit recently. I have put up some packages for 8.3.0 and 8.3.1 at the openSUSE Build Service. Use the search interface to search for "postgresql". This won't be a permanent project of mine, but I'll probably continue as long as I need the packages myself and SUSE or someone else doesn't organize this job more sustainably.
Wednesday, March 5. 2008
Obviously, a lot of people are having trouble with the removal of many implicit casts in PostgreSQL 8.3. While this will lead to more robust applications in the future, it will prevent many people from moving to 8.3 altogether at the moment. The problem why you can't simply make the casts implicit again is that in a somewhat unrelated development, 8.3 will generate a cast from and to text automatically (well, implicitly) for any data type, which is why most of the casts in question have been removed from the pg_cast catalog altogether and you can't simply run an UPDATE command to put them back the way you want.
I have used a shell script to regenerate the removed casts including the required cast functions. The result is pg83-implicit-casts.sql. (Remove the x in the file name. The web server doesn't appear to want to serve .sql files.) I have tested this against some of the recent "bug reports" I found on the mailing lists, so it should work for some people at least. If someone wants to use this, I suggest you only add the casts you really need, not the whole file.
Thursday, February 28. 2008
A few helpful people and myself have been working on a Git hosting site for PostgreSQL and related projects at http://git.postgresql.org/, which we hereby let loose on the daring public. It serves mirrors of selected projects as Git pull and web access, and it offers project hosting with push access as well as user branch hosting. My earlier blog entry on patch review explains some of the rationale. We hope you will find this service useful.
Saturday, February 23. 2008
Improving the patch review process was one of the much discussed topics for the 8.4 release cycle. The recurring commit fests are one idea that will be tried out. Here is what I think. If you want more, better, and earlier review happening, you need to facilitate reviewing in many ways.
We currently have a well-established and arguably successful process for maintaining existing features. We have a public source code repository, test infrastructure, and many people willing and able to contribute. Committers can fix small issues immediately and larger issues with small overhead. Others can work on small issues with the small overhead of sending a patch that will soon get committed, and they can work on large issues, well, that is the debate. If more than one person wants to contribute, they obviously can. We have many hands and eyes working on everything, and that works well.
We don't have any of that for new feature patches, the staging area of development. The source code repository is a set of possibly related multiple 10k patches spread around the mailing list archives. The only way for others to get involved in small or large ways is by chance learning about the feature proposal, fetching the patch, which will usually no longer apply cleanly, alter it and send an equally huge patch back. If more than one person does that, it's impossible, unless you want to get involved in interdiff madness. Meanwhile the original developer has to sit still waiting for review, or more likely, continues development, which invalidates all the review work.
Now we also know that reviewing the core logic of patches is in fact hard and needs skilled people of which we would always like to have more. But much of our CVS-based development works with many little hands, and so should feature patch review.
Here is how I think patch review could work better. Someone writing a feature patch should use a distributed version control system that ties into our currently-CVS trunk. He publishes that repository either via web or via email. Now the little hands can get involved immediately. They can pull local copies and review the commit history. They can send small bug fixes, formatting, wording improvements. They can update the patch for new developments in the mainline or investigate collisions with other features in development. Someone else can start writing documentation. The original developer can merge those changes back and continue based on them. Meanwhile, a useful revision history is always available for more people to get involved. Additionally, by a simple link published in a wiki, say, you have obsoleted all the needs for patch trackers or the like. Other people, perhaps more skilled reviewers can offer early comments and easily review development progress.
That is how I would like to work. I have started to use the Git repository for my development work now because I believe that could make it happen. But you can replace your own technology if you think it can achieve similar effects. (In fact, Git is in my mind so ridiculously better than our current anonymous CVS setup and replaces CVSup as well, there is no reason -- other than the need to learn some new tools -- for anyone who has no CVS write access to even bother with CVS anymore.)
Incidentally, quite a few people have shown interest during the mailing list discussion in trying out source code repositories mirrored off CVS to something of their liking. I hope that way we can find more optimal development methods in the near future.
|