Apr 21, 2019

I always wonder: why not go one step further? Nowadays random reads on SSDs are so much cheaper, and I don't think the end of the line is just a huge B+ tree or LSM tree, which has to cluster everything and store so much data redundantly (transaction time). I would say that restoring a specific revision isn't really efficient that way.

I'm still working on a project that was started around 2006 by Marc Kramis (his Ph.D. work) and that I began working on around 2007 :-) We borrowed quite a few ideas, mainly from ZFS (as well as from Git now), put them to the test on a sub-file level, and added our own stuff, for instance record-level versioning via a sliding snapshot algorithm.

There's still a lot to do, but I'm now able to store and query revisions of both XML and JSON via XQuery, and I'll look into (cost-based) query optimizations and partitioning/replication next. I know it was probably crazy to write a storage manager from scratch, but I think Marc's ideas are pretty good, and I added my own ideas. Sebastian Graf, another Ph.D. student back then, also did a lot of work on the project, just like many other students. Maybe I'm just crazy to keep working on it almost daily now besides my day-to-day software engineering job, but yeah... I guess you have to be a bit too convinced and too dedicated to something, maybe (even though sadly I don't know if anyone tried it lately) ;-) Maybe I need to contact Marc again after all this time :-)

https://sirix.io/concepts.html

http://pubsys.mmsp-kn.de/pubsys/publishedFiles/Kramis2014.pd...