Wednesday, January 18, 2012
Tree climbing
Comments
We have a fairly large corpus of XML documents stored as a field in a table. There is some data redundancy: attribute-type data encoded in the files (authorship, pub date, etc.) is also broken out into typical fact tables, and I query it there. Where the existing xpath() functionality has saved me is in answering questions like "how are people using the media tag?" (what are all the src attributes of media tags?). These are ad hoc, so they might not pay for the cost of maintaining an index, though I do find myself extracting this data to temp tables for further analysis. Sufficiently fast indexing would save me having to shred certain parts of the XML into tables, I'd think.

In theory (I haven't tried this in practice), they are already indexable in the same way that the NoSQL databases (e.g. CouchDB) index JSON documents, thanks to Postgres's ability to index the output of a function. For XML you could index an xpath query against the document; for JSON you would use a custom function, unless someone created something like xpath for JSON. For JSON, replicating something similar to MongoDB's indexing would be fabulous: http://www.mongodb.org/display/DOCS/Indexes

Indexing a particular xpath query really isn't going to do it. The idea would be to assist arbitrary xpath or similar queries, and similarly for JSON queries, in the same way that we assist arbitrary FTS queries by indexing tsvectors. I have no idea how to do that though, which is why I asked the question.

While a full XML/JSON index to assist arbitrary queries would be really cool, I don't think anyone has done that, nor is it likely to be critical. What I would want is the ability to index key value locations (xpath/jsonpath) within a document.

Andrew,
Well, AIUI Mongo indexes certain fields named in the first argument to ensureIndex(), rather than providing for indexed lookup on arbitrary paths. This would be relatively easy to do, I think. We could possibly even do it initially as an extension. Indexing arbitrary paths, for queries a la xpath or JSONpath, would be a lot harder, I suspect. Maybe we need to start with the simple case, and we may find that it works well enough for a large number of users.

I agree -- start with the simple (yet very useful) case and then worry about the complex case if necessary.
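To make the "simple case" from the thread concrete, here is a minimal Python sketch of indexing one named path inside a set of JSON documents, as opposed to supporting arbitrary path queries. It is only an illustration of the idea: `get_path`, `PathIndex`, the dotted-path syntax, and the sample documents are all hypothetical names and data made up for this example, and none of it is a Postgres or MongoDB API.

```python
import json
from collections import defaultdict


def get_path(doc, path):
    """Follow a dotted key path such as 'author.name'; return None if absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


class PathIndex:
    """Toy index over ONE named path (hypothetical helper, not a real API)."""

    def __init__(self, path):
        self.path = path
        self.entries = defaultdict(set)  # scalar value -> ids of documents holding it

    def add(self, doc_id, raw_json):
        value = get_path(json.loads(raw_json), self.path)
        # Only scalar values are indexed; nested objects and arrays are skipped.
        if isinstance(value, (str, int, float, bool)):
            self.entries[value].add(doc_id)

    def lookup(self, value):
        return sorted(self.entries.get(value, ()))


# Made-up sample documents, keyed by a made-up document id.
docs = {
    1: '{"author": {"name": "alice"}, "title": "a"}',
    2: '{"author": {"name": "bob"}}',
    3: '{"author": {"name": "alice"}}',
    4: '{"title": "no author"}',
}

index = PathIndex("author.name")
for doc_id, raw in docs.items():
    index.add(doc_id, raw)

print(index.lookup("alice"))  # [1, 3]
```

The index only helps queries on the path it was built for, which is exactly the limitation discussed above: a lookup on any other path still has to scan every document.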