Friday, October 23. 2009

A small disagreement in FSFLand
Karsten Gerloff, President of FSF Europe, has posted a blog entry on the value of dual licensing that is very much at odds with the opinion of Richard Stallman and some others, which I discussed the other day.
A couple of the best paragraphs:
He's spot on. And, to reiterate the point I made the other day, what I like about the PostgreSQL approach is that it puts all the players on a level playing field. There is no privileged player.

max_files_per_process and OS limits
A client started getting some odd problems on a development server today, including some that said things like

DETAIL: could not create socket: Too many open files

After a little googling we found the max_files_per_process setting, but the docs on it are far from intuitive. In particular, the advice to cure the problem by reducing this setting seemed positively counter-intuitive.

After some research and thought, here's my understanding of what's going on. First, you might be hitting the global OS limit on open files (in Linux that's what you see with the command "sysctl -n fs.file-max"). Or you might be running into a per-process limit (what you see with the command "ulimit -n"). In our case, the global limit is set very high (1617232), and "lsof" showed us to be nowhere near that range, so it seemed far more likely that we were hitting the per-process limit, which was 1024.

As I understand it, Postgres will start closing files once it reaches the max_files_per_process limit, for files it knows it has opened. However, those aren't the only files it might have open. For example, it can't account for files opened by plperlu functions (which we have, in abundance). So there simply was not enough head room between the max_files_per_process limit of 1000 and the OS per-process limit of 1024 to allow for things like perl modules.

Our solution was to increase both the limits and the head room. This is a fairly fast and powerful box, and so we doubled the OS per-process limit that postgres is operating under, by adding "ulimit -n 2048" to the startup script, and increased the number of files postgres would keep open, by setting max_files_per_process = 1500 in the postgres config. So our head room above the limit is now 548 instead of 24. And even if every postgres process used its maximum we'd be nowhere near the global limit.

One thought I did have was that it might be useful if we generated some sort of warning when the limit was approached, although I'm not sure how useful that would be in practice.

Wednesday, October 21. 2009

Couldn't have said it better myself ...
I don't usually comment on license wars, but I just found this amusing, so with some hesitation I'll continue ...
Richard Stallman and some others have sent a letter to the European Commission opposing the acquisition of MySQL by Oracle via the latter's acquisition of Sun. Here's part of what they say:
There, in a nutshell, is exactly what is wrong with the "parallel" approach. Why should anyone donate their code to a company practising this approach so that they, and only they, can make money from it, and so that it can eventually fall into the hands of some rapacious company like Oracle? And, by the way, what will stop Oracle or any owner from trying to be very aggressive about making commercial users pay for using the software?

We have already seen one change in licensing terms for the libraries that many people found unconscionable, and I have myself been told by MySQL AB staff, long before the Sun acquisition, that any use by a commercial enterprise, including just internal distribution, would render us liable to pay them some license fees. It was that advice that drove the company I was then working for to decide definitively that we did not wish to use MySQL, and that the license terms for Postgres were much more suitable for what we wanted to do, which was to be able to ship a reference implementation of our product with a free (as in beer) database. Other people have since argued with me that we could have done it. Even if that's true, we did not wish to get into a legal argument with MySQL AB, so we simply decided not to use their product.

But back to what Stallman and Co. had to say. I dislike monopolies. The "parallel" approach that they advocate creates a monopoly, where one entity has privileged rights over the code. By contrast, the BSD license that Postgres uses is much more open. Anyone can use the code. EnterpriseDB or CommandPrompt can sell products made using the code, modified or unmodified, as they wish, and using any code distribution policy they wish. So could we at PostgreSQL Experts Inc. if we were so minded (right now we are not, but you never know what the future holds). So can anyone else. Nobody has a monopoly.
The other thing I like is that the BSD model used by PostgreSQL is much more likely to foster the creation of a real community, unlike the monopoly approach. Linux and certain other GPL licensed software have been successful, I suspect, precisely because they have not followed the parallel approach. Nobody has a sufficiently clear title to the whole body of Linux code that they could form such a monopoly, not even Linus Torvalds, who, I am sure, would not want to do any such thing. If he had started out requiring everyone to donate an unimpeded license to use the code to him, as MySQL does if you want your code incorporated in their product, I suspect Linux would have died long ago.

Use of a BSD license is not a guarantee of an open community. Some BSD licensed projects have fairly closed development models. But I like the way it works for Postgres. That's one of the reasons I work on it.

Incidentally, I agree with Stallman and Co. that Oracle should be required to float off MySQL if the Sun acquisition goes ahead. But my reasons have nothing to do with the passage above. I simply don't think that dominant players in a market, as Oracle is in the database market, should be allowed to acquire their competitors. I'd say the same if they acquired any database company, regardless of the license terms under which it was distributed.

Wednesday, October 14. 2009

More fun with windowing - avoiding expensive function calls
Say you have an expensive function you want computed over a window. It needs the window values aggregated, so you might call it like this:
select datum, foo(array_agg(datum) over w)
from mytable
window w as (partition by expr);

But you soon discover that foo() is being called for every row, even though it's giving you the same value for every row in the window. So you might want to try to call it once per window. Then you can use another window function like first_value() to distribute the computed value to the rest of the rows. Unfortunately, you can't nest calls to window functions, so you have to do something like this:

I'm sure there are more efficient ways to do this sort of thing, but I'm still feeling my way with window functions. We really need a large cookbook of window function recipes.

Monday, October 12. 2009

Just wow
Over at Linux Magazine, Bruce Byfield wrote quite reasonably about the abuse he received for taking a pro-feminist line in discussing sexism in the FOSS community. Among the comments is one from somebody calling himself "MikeeUSA" which includes this choice paragraph:
Just in case you've been living under a rock, Hans Reiser is in prison for murdering Nina. What sort of idiot do you have to be to blame the victim for their own murder? This is the equivalent of blaming Nicole Brown Simpson for depriving show business of a great actor. Read the whole thing if you want to be amazed, shocked, possibly distressed.

Now what does this have to do with Postgres? Nothing much, except that Postgres is a FOSS project, and I for one am acutely aware of the imbalance of genders in our community, and wish it were otherwise. Count me among the Bruce Byfields and not among the MikeeUSAs. As far as I am concerned, people like MikeeUSA need to crawl under a rock and stay there. If I found such a person in our community, I'd be inclined to make their life so uncomfortable that they left. We don't need bigots and sexists, and we do need far more women. And we need more people to say so.

Saturday, October 3. 2009

Hidden stuff in docs source
While reviewing a patch the other day I came across some stuff in the docs that was commented out. I asked about it on the -hackers list but nobody seemed to care much. But ISTM that we should clean extraneous stuff from the docs. Stuff in SGML comments is useless. If it's a comment on the docs, that's one thing, but if it's content that is disabled then it's utterly useless.
I don't have time right now, but it would be nice if someone would clean this up a bit. Here is the unix command I quickly put together to look for all the comments:

(Yes, I know there are probably a great many ways of doing this, and a few will be better. You do it your way and I'll do it mine.)

Most of the comments are fine, but there are a number of things that probably need some attention.

Friday, October 2. 2009

Portability is both possible and beneficial
David Fetter says that database portability is possible but very hard.
That's not my experience. And we'd better hope he's not right, because it would make for perfectly reasonable grounds for many applications to drop PostgreSQL support. In fact, the people arguing that Drupal should drop PostgreSQL support are taking much the same line as David.

Bugzilla is today able to run on PostgreSQL because of some work I and a couple of others did in creating an abstract schema mechanism, which it turns into a database specific schema for the database it's running on. This mechanism means the Bugzilla developers never have one logical schema on MySQL and another on PostgreSQL, as there is one and only one source for the logical schema, common to both. I believe that Drupal is moving in a similar direction, although some of its third party modules don't obey the discipline too well.

But the really effective way to set up thoroughgoing database independence is to do something like I saw at a large Wall St financial institution I worked at for a while. For security reasons they had a policy that all access to the database had to be via stored procedure calls. This had the interesting side effect that the physical organization of the database was completely hidden from the application programs. All they had was a procedural interface. The database administrators could have changed the organization of the database completely, and the apps would never have noticed. Or, indeed, they could have changed to a different database engine completely with similar effect.

The applications programmers hated this. There was a never-ending hatred between them and the database group. But I have since come to appreciate what a good thing it was. In particular, I think that this or some similar mechanism should be used to separate many applications into layers. Database specific stuff should be in the bottom layer and hidden from the layers above.
We know from experience that a great many applications developers aren't very good at writing SQL, just as many people like me are hopeless at creating nice GUIs. Layering has many benefits, and one of them is the possibility of database independence. I think the argument that supporting n databases involves O(n²) complexity is overdrawn. It's no more difficult than, say, supporting numerous operating systems for PostgreSQL to run on. And it becomes manageable if you reduce the terrain that database specific code operates in.
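
To make the abstract-schema idea a bit more concrete, here is a minimal Python sketch of one logical schema being turned into database specific DDL. It is emphatically not Bugzilla's code (which is Perl, and far more thorough). The abstract type names, the TYPE_MAP table, the "bugs" table and the render_* functions are all made up for illustration.

```python
# Sketch only: one logical schema, rendered per database.
# The abstract type names and this mapping are illustrative assumptions,
# not the vocabulary any real application uses.
TYPE_MAP = {
    "postgresql": {
        "serial": "SERIAL PRIMARY KEY",
        "text": "TEXT",
        "boolean": "BOOLEAN",
        "datetime": "TIMESTAMP",
    },
    "mysql": {
        "serial": "INTEGER AUTO_INCREMENT PRIMARY KEY",
        "text": "TEXT",
        "boolean": "TINYINT(1)",
        "datetime": "DATETIME",
    },
}

# The single source of truth: table -> list of (column, abstract type, nullable).
# The "bugs" table is a hypothetical example.
ABSTRACT_SCHEMA = {
    "bugs": [
        ("bug_id", "serial", False),
        ("summary", "text", False),
        ("is_open", "boolean", False),
        ("created", "datetime", True),
    ],
}


def render_table(table, columns, dialect):
    """Render one abstract table definition as CREATE TABLE for a dialect."""
    types = TYPE_MAP[dialect]
    lines = []
    for name, abstract_type, nullable in columns:
        # A primary key is already NOT NULL, so only tag the other columns.
        not_null = "" if nullable or abstract_type == "serial" else " NOT NULL"
        lines.append(f"    {name} {types[abstract_type]}{not_null}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


def render_schema(schema, dialect):
    """Render every table in the abstract schema for the given dialect."""
    if dialect not in TYPE_MAP:
        raise ValueError(f"unsupported dialect: {dialect}")
    return [render_table(table, cols, dialect) for table, cols in schema.items()]


if __name__ == "__main__":
    for dialect in TYPE_MAP:
        print(f"-- {dialect}")
        print("\n".join(render_schema(ABSTRACT_SCHEMA, dialect)))
```

In a sketch like this, all the database specific knowledge lives in TYPE_MAP, so supporting another database would mean adding one more entry there rather than touching every table definition. Real schemas need much more (indexes, foreign keys, upgrades), so treat it as an illustration of the layering, nothing more.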
