Saturday, January 28. 2012
Someone asked on IRC how to build an external module using Microsoft C. I have done a lot of building core Postgres with various versions of Microsoft C, and written or committed substantial parts of the build system we use for it, but sadly I don't know the answer, and couldn't find one anywhere in our docs or on the wiki. Maybe there isn't a simple answer. If so, it's long since time we should remedy that. Unix people tend to look down on Windows, but in fact some of the largest deployments of Postgres anywhere are on Windows. It should not be beyond our wits to support this better.
Thursday, January 26. 2012
I have a (non-business) request to install and support WordPress. Now this is a project that doesn't seem to want to support PostgreSQL. Not for any good reason that I can tell. They just can't be bothered. So I found that there's a little plugin you can get that is a slightly clever hack. It apparently acts like a shim that intercepts the queries written in MySQL dialect SQL and rewrites them in the Postgres dialect.
When I had a look in it, however, it didn't look like there was any sort of use of parameterized statements. So then I had a look inside the Wordpress code. And they have a prepare method, but it doesn't actually use parameterized statements. What it does is interpolate values, supposedly properly quoted, into an SQL statement. Argh! Little Bobby Tables should have taught us that this is the wrong way to go. Use placeholders! There's no excuse for not doing it properly, no matter what your database.
I looked at the PHP API that Wordpress is using. It looks like it's the original mysql API that doesn't have a prepare method, unlike the postgres module. The "improved" Mysql extension for PHP does, however. Pity they aren't using it.
Wednesday, January 25. 2012
From time to time I hear supposed purists sneer at the PostgreSQL Enum feature. I usually try to restrain myself from replying to these people. But it's nice to know that they are appreciated, as this tweet shows.
I recently gave the Perl guys a bit of stick about the state of their bug tracker. However, I must give Ricardo Signes (who I am told is the current Perl Pumpking) and our own David Wheeler kudos for persisting to ensure that the bug I tried to file actually got filed, and seems to have some attention. Let's hope it gets fixed.
Monday, January 23. 2012
Fedora 16 ships with v8 (I'm not sure how far it goes back, Fedora 15 at least), which makes installing PLV8 extremely easy. Here's what I did earlier today. It took about a minute. I already had an installed and running instance of Postgres where I wanted PLV8 installed. So I did this: cd inst.json
sudo yum install v8 v8-devel
hg clone https://code.google.com/p/plv8js/
cd plv8js
PATH=../bin:$PATH make USE_PGXS=1
PATH=../bin:$PATH make USE_PGXS=1 install
cd ..
bin/createdb testplv8
bin/psql -c 'create extension plv8; create language plv8;' testplv8
Pretty simple, very quick.
People seem to be getting very excited about JSON in 9.2. So I just tried using the new type in combination with PLV8. It seems to work pretty well. Let's say we want to index a field in our JSON object called "x", so when we query on it the query will run fast. No problem. Here's a very simple function that gets the member out of the JSON:
CREATE or replace FUNCTION jmember (j json, key text )
RETURNS text
LANGUAGE plv8
IMMUTABLE
AS $function$
var ej = JSON.parse(j);
if (typeof ej != 'object')
return NULL;
return JSON.stringify(ej[key]);
$function$;
In reality we'd want something a bit more sophisticated than this, but you can get the idea from this. Armed with this function we could now create our index, using the functional index feature of PostgreSQL: CREATE INDEX x_in_json ON mytable (jmember(jsonfield,'x'));
Now, when we issue a query like SELECT *
FROM mytable
WHERE jmember(jsonfield,'x') = 'foo';
It should be able to use the index. This is reasonably analogous to a very simple use of MongoDB's ensureIndex() function.
We could make this somewhat nicer by providing some operators, and maybe building in a function like this, but the fundamental idea should work pretty much the same.
Wednesday, January 18. 2012
We don't provide any special indexing for XML on PostgreSQL - in fact there are no comparison operators defined for the type at all - and the current JSON patch won't provide anything special there either. But I have been wondering exactly what sort of indexing might be useful for tree structured objects. For the most part I'm inclined to think that these should be treated as singleton objects where we don't need to search on them (and if we do then the database is probably very badly structured). At least that's how I use them. For example, I can imagine storing a web session object as XML or JSON. But I'm going to know the session ID and store that as the key of the session table. I should never need to search for a session by its content, only by its ID. But let's say we did need to. What sort of operators would we use to index the data?
I'm not backing out my blog or any other site in protest against SOPA/PIPA. But I have signed Google's petition, and shared stuff about it on Facebook. So should you. If you are a twitterer, then tweet about it too.
A few people have mentioned an hstore to JSON transformation to me. I don't think it's possible to get a perfect fit, but something like this gets fairly close, and will work for many cases:
create or replace function hstore_to_json(h hstore) returns text language sql
as $f$
select '{' || array_to_string(array_agg(
'"' || regexp_replace(key,E'[\\"]',E'\\\&','g') || '":' ||
case
when value is null then 'null'
when value ~ '^true|false|(-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?)$' then value
else '"' || regexp_replace(value,E'[\\"]',E'\\\&','g') || '"'
end
),',') || '}'
from each($1)
$f$;
Tuesday, January 17. 2012
Somebody has linked to my earlier blog post with a link title saying the JSON type would definitely be in release 9.2 of PostgreSQL. I haven't said that. I said I hoped it would be. It hasn't been committed yet, and we still have some issues to sort out. People need to be more careful about this sort of thing.
Monday, January 16. 2012
On Wednesday, four days before the start of the final commitfest for release 9.2 of PostgreSQL, Robert Haas published YA patch to include JSON as a core type. Basically, his patch just parses the text to make sure it was valid JSON, and stores it as text. I'd just about given up on getting this into release 9.2, but I thought his patch was just a bit too minimal, so I put on the running shoes to add in some functionality to produce JSON from the database: query_to_json(), array_to_json() and record_to_json(). Yesterday I put out this patch, extending his, right on the deadline for this commitfest and thus for this release. A few simple examples from the regression tests:
SELECT query_to_json('select x as b, x * 2 as c from generate_series(1,3) x',false);
query_to_json
---------------------------------------------
[{"b":1,"c":2},{"b":2,"c":4},{"b":3,"c":6}]
(1 row)
SELECT array_to_json('{{1,5},{99,100}}'::int[]);
array_to_json
------------------
[[1,5],[99,100]]
(1 row)
-- row_to_json
SELECT row_to_json(row(1,'foo'));
row_to_json
---------------------
{"f1":1,"f2":"foo"}
(1 row)
and a slightly less simple example:
SELECT row_to_json(q)
FROM (SELECT $$a$$ || x AS b,
y AS c,
ARRAY[ROW(x.*,ARRAY[1,2,3]),
ROW(y.*,ARRAY[4,5,6])] AS z
FROM generate_series(1,2) x,
generate_series(4,5) y) q;
row_to_json
--------------------------------------------------------------------
{"b":"a1","c":4,"z":[{"f1":1,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}
{"b":"a1","c":5,"z":[{"f1":1,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}
{"b":"a2","c":4,"z":[{"f1":2,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}
{"b":"a2","c":5,"z":[{"f1":2,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}
(4 rows)
We need to sort out a few encoding issues, but I'm now fairly hopeful that we'll have some very useful and fast JSON functionality in release 9.2.
Sunday, January 8. 2012
A customer asked me how to avoid statements timing out. Knowing that they use connection pooling, my answer was: begin;
set local statement_timeout = 0;
select long_running_function();
commit;
But they reported that it didn't work. I got it working on exactly the machine, database and account they were using. Before reading further, see if you can guess why it worked for me and not for them. Imagine the Jeopardy music playing while you think. You have 30 seconds.
...
OK, I'll tell you. Their client program submitted this whole block as one statement, presumably using libpq's PQexec() function. So the statement timeout in effect at the time the statement was submitted applied to the whole collection. My client breaks up multiple statements and sends them to the backend one at a time. So the code worked as expected for me. When I understood what was happening, I got them to do: set session statement_timeout = 0;
and send that to the server before sending their long running query, and suddenly everything worked as expected. This was safe to do because, although their main application uses connection pooling, the context in which they were running does not. (The pooler doesn't issue RESET copmmands.)
I hope you've guessed by now that the clients in question were psql for me and pgadmin for my customer. I find pgadmin a good tool for exploring a large database schema, but I much prefer to use psql as a tool for running queries, for just these sorts of reasons.
Friday, January 6. 2012
From time to time suggestions are made that the PostgreSQL project should use trackers to manage bugs and possibly feature requests. I have a lot of sympathy with these suggestions. But there has always been lots of pushback, along with significant disagreement about which tracker to use. Having done a bunch of work years ago to make Bugzilla platform independent, I have some fondness for it, but others hate it with a passion that seems way out of proportion to the perceived evil, so it's probably out of the question, if we ever did decide to use some sort of tracker.
Meanwhile, I encountered the Perl community's use of a tracker yesterday. David Wheeler encouraged me to file a bug about the recently discovered misbehaviour of a documented piece of the perl API. Accordingly, I ran perlbug and it told me that it had sent the bug report. Later he asked me if I had done so, as he couldn't find the report. First black mark. Then I went and looked at the tracker's web interface. What I saw was just horrible. For perl 5 there are 259 "new" bugs, (the oldest 8 years old, which isn't "new" in my book) and 1149 open bugs. And my bug, which their own program told me had been successfully submitted, didn't seem to be there. What's the point in having such a system? It's worse than useless. Trackers require effort to maintain. As I remarked to David, it's no wonder that there is significant resistance to using them when we have horrible examples like this one.
Update
Apparently my bug is sitting in a moderation queue. So I'm glad it's not lost anyway.
Wednesday, January 4. 2012
Around this time of year I generally take a bit of time to think what I want to work on during the coming year, and what I want to write talks on for conferences. I have a couple of things I need to pay some attention to: - PLV8
- Cleaning up my text file Foreign Data Wrappers
Beyond that, some of the things I want to see done and I have some interest in working on include, in no particular order: - LATERAL subqueries
- Window functions for PL/Perl
- GROUPING SETS including CUBE and ROLLUP
- making a better API for and rewriting xpath_table()
- retail CREATE statement generation for any object
Of course, I can't work on all of these or even most of them. But with any luck I'll work on one or two (I've begun in one case) and with a liitle more luck I'll find subjects there for a talk or two.
Oh, and I really wish someone (not me) would work on MERGE, or at least the REPLACE/UPSERT piece of it, but it seems to be something of a graveyard for development efforts. It's by far the most important missing feature, in my far from humble opinion.
Tuesday, January 3. 2012
Apparently the fact that the buildfarm sends out email status notifications is one of the community's best kept secrets. It was one of the earliest features. There are four mailing lists for status that can be subscribed to, according to your taste. Please visit the lists page on pgFoundry if you want to find out when things break (or get fixed).
|