We don't provide any special indexing for XML on PostgreSQL - in fact there are no comparison operators defined for the type at all - and the current JSON patch won't provide anything special there either. But I have been wondering exactly what sort of indexing might be useful for tree structured objects. For the most part I'm inclined to think that these should be treated as singleton objects where we don't need to search on them (and if we do then the database is probably very badly structured). At least that's how I use them. For example, I can imagine storing a web session object as XML or JSON. But I'm going to know the session ID and store that as the key of the session table. I should never need to search for a session by its content, only by its ID. But let's say we did need to. What sort of operators would we use to index the data?
We have a fairly large corpus of XML documents stored as a field in a table. There exists some date redundancy between attribute-type data encoded in the files (authorship, pub date, etc.) which I break out into typical fact tables and query there. Where the existing xpath() functionality has saved me is to answer questions like "how are people using the media tag?" (what are all the src attributes of media tags?) These are ad-hoc, so might not pay for the cost of maintaining the index. Though I do find myself extracting this data to temp tables for further analysis. Sufficiently fast indexing would save me having to shred certain parts of the xml into tables, I'd think.
In theory, I haven't tried this in practice, they are already indexable in the same way that the NoSQL databases index JSON documents (e.g. CouchDB) thanks to Postgres's ability to index the output of a function. For XML you could index an xpath query against that document, for JSON you would use a custom function unless someone created something like xpath for JSON.
Indexing a particular xpath query really isn't going to do it. The idea would be to assist arbitrary xpath or similar queries, and similarly for JSON queries, in the same way that we assist arbitrary FTS queries by indexing tsvectors. I have no idea how to do that though, which is why I asked the question.
While a full xml/json index to assist arbitrary queries would be really cool, I don't think anyone has done that nor is it likely to be critical. What I would want is the ability to index key value locations (xpath/jsonpath) within a document.
I'm at SCALE right now and folks are very excited about JSON support in Postgres ... I've had several people approach me about it. So I've asked them the indexing question. Pretty much universally, they want indexed MongoDB/Xpath support for arbitrary path/value queries; that is, the ability to issue a query which has a path and a key and have an indexed lookup. More on this later ...
Well, AIUI Mongo indexes certain fields named in the ensureIndex() first argument, rather than providing for indexed lookup on arbitrary paths. This would be relatively easy to do, I think. We could possibly even do it initially as an extension. Indexing arbitrary paths, for queries a la xpath or JSONpath would be lots harder, I suspect. Maybe we need to start with the simple case, and we could find that that will work well enough for a large number of users.