Beng Chin OOI
MapReduce-based Data Processing Systems
Objective
To allow users of MapReduce-based systems to keep the MapReduce programming model while giving them data management functionality at acceptable performance, by minimising initialisation overheads.

Results
We met the objective of this project through:
1) careful tuning of key design factors in MapReduce (Hadoop), resulting in improved performance; and
2) the development of a query processing engine under the MapReduce framework, comprising:
   a) operator-level join algorithms, including a MapReduce-based similarity (kNN) join that minimises the number of objects sent to the reduce nodes, reducing computation and communication overheads;
   b) schemes for processing multi-join queries efficiently, exploiting replication to expand the plan space;
   c) an automatic query analyser that accepts an SQL query, optimises it and translates it into a set of MapReduce jobs; and
   d) Concurrent Join, to support multi-way joins over partitioned data.
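To make the idea of an operator-level join under MapReduce concrete, the following is a minimal sketch of a generic reduce-side equi-join, simulated in plain Python. It is not the project's query engine and does not use Hadoop: the function names (map_phase, shuffle, reduce_phase, reduce_side_join), the "L"/"R" tags and the sample tables are all assumptions chosen for illustration. It only shows the general pattern of tagging records in the map step, grouping them by join key, and pairing them in the reduce step.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_tag, rows, key_index):
    """Emit (join_key, (table_tag, row)) pairs, as a mapper would."""
    for row in rows:
        yield row[key_index], (table_tag, row)


def shuffle(pairs):
    """Group mapper output by key (stands in for the framework's shuffle)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Pair every left row with every right row that shares the join key."""
    left = [row for tag, row in values if tag == "L"]
    right = [row for tag, row in values if tag == "R"]
    for left_row, right_row in product(left, right):
        yield left_row + right_row


def reduce_side_join(left_rows, left_key, right_rows, right_key):
    """Run the simulated map, shuffle and reduce steps for an equi-join."""
    pairs = list(map_phase("L", left_rows, left_key))
    pairs += list(map_phase("R", right_rows, right_key))
    joined = []
    # Keys are processed in sorted order so the output is deterministic.
    for _key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_phase(values))
    return joined


# Hypothetical sample data: users(id, name) and orders(order_id, user_id, item).
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(101, 1, "book"), (102, 1, "pen"), (103, 3, "lamp")]
result = reduce_side_join(users, 0, orders, 1)
print(result)
```

In this sketch every record is shipped to a reducer, including rows such as user 2 that have no match. That shuffle traffic is the kind of communication overhead that the join algorithms described above aim to reduce.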