Beng Chin OOI
epiC: Elastic Power-Aware Data Intensive Cloud
The project aims to overcome the performance limitations of MapReduce-based systems when handling a mix of workloads: OLAP, which consists of long-running, data-intensive analytical jobs, and OLTP, which consists of online transactions that demand short response times.
We have made several advances including:
1) Development of a novel elastic storage system (ES2) that employs vertical partitioning to group columns that are to be accessed together, and horizontal partitioning to further split these column groups across a cluster of nodes;
2) Development of an elastic execution engine (E3) that has a simpler communication model than Dryad yet supports multi-stage jobs better than MapReduce. E3 avoids reprocessing intermediate results by adopting a stage-based evaluation strategy and collocating data and user-defined (map or reduce) functions in independent processing units for parallel execution. E3 also supports block-level indexes and built-in functions for specifying and optimising data processing flows;
3) Development of novel cloud-based indexing structures, including B+-tree and bitmap indexes for structured data and R-tree indexes for multidimensional data; and
4) Development of ecStore, a system that supports automated data partitioning and replication, load balancing, efficient range query processing and transactional access, using multi-version optimistic concurrency control and providing adaptive read consistency on replicated data.
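
The partitioning scheme described in item 1 can be illustrated with a minimal sketch: columns that are accessed together are first grouped (vertical partitioning), and each column group is then split across nodes (horizontal partitioning). This is not the ES2 implementation. The function names, the example column groups, the integer key `id`, and the choice of key-modulo placement are all assumptions made for illustration; the source does not say how ES2 assigns rows to nodes.

```python
from collections import defaultdict


def vertical_partition(rows, column_groups, key="id"):
    """Project every row onto each column group, keeping the key column."""
    groups = {name: [] for name in column_groups}
    for row in rows:
        for name, cols in column_groups.items():
            projection = {key: row[key]}
            projection.update({c: row[c] for c in cols})
            groups[name].append(projection)
    return groups


def horizontal_partition(group_rows, num_nodes, key="id"):
    """Spread the rows of one column group across nodes.

    Assumption: placement is key modulo num_nodes, chosen only for brevity.
    """
    nodes = defaultdict(list)
    for row in group_rows:
        nodes[row[key] % num_nodes].append(row)
    return dict(nodes)


def partition_table(rows, column_groups, num_nodes, key="id"):
    """Apply vertical partitioning, then horizontal partitioning per group."""
    groups = vertical_partition(rows, column_groups, key)
    return {
        name: horizontal_partition(group_rows, num_nodes, key)
        for name, group_rows in groups.items()
    }


# Hypothetical table and column groups, used only for this example.
rows = [
    {"id": 1, "name": "a", "price": 10, "qty": 2},
    {"id": 2, "name": "b", "price": 20, "qty": 1},
    {"id": 3, "name": "c", "price": 30, "qty": 5},
]
column_groups = {"catalog": ["name"], "sales": ["price", "qty"]}
layout = partition_table(rows, column_groups, num_nodes=2)
```

In this sketch, a query that reads only `price` and `qty` touches the `sales` group alone, and the rows of that group can be scanned in parallel on each node.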