
Removing performance and programmability limitations of chip multiprocessor architectures

VTT'S FRONTIER PROJECT

KEYWORDS: Computer architecture, parallel computing, models of computation, parallel programming languages, compiling, optimizing, application software, performance measurement, FPGA prototyping, thread-level parallelism, instruction-level parallelism, general purpose computing

INTRODUCTION

Current chip multiprocessor (CMP) architectures (SMP, NUMA, CC-NUMA, MP, VC) are tedious to program and often provide poor speedup over conventional sequential (single-core) processors. The cause is a lack of fast synchronization and latency-hiding mechanisms, i.e. weak models of computation.

The Removing Performance and Programmability Limitations of Chip Multiprocessor Architectures (REPLICA) project aims to develop a configurable emulated shared memory machine (CESM) architecture and methodology that would enable radically easier programming and higher performance with the help of the parallel random access machine (PRAM) model of computation.

REPLICA is a 3-year (2011-2013) project funded by VTT with a total budget of 1.4 M€. VTT collaborates with the University of Linköping, Sweden, and the University of Turku, Finland.

Fig. 1. The PRAM-NUMA model of computation of REPLICA.

REPLICA refers to replicating the processing resources and programming/data structures in a smart way to provide radically better performance and programmability than current multicore computers.

GOALS

In REPLICA we are developing a configurable emulated shared memory machine (CESM) architecture and methodology that enable radically easier programming and higher performance with the help of a strong parallel random access machine (PRAM) model of computation. As a proof of concept, we are building an FPGA-based prototype machine with selected I/O devices, and developing a programming language with compiling and optimization tools and a comprehensive set of sample applications that demonstrate the performance and ease of use.

New techniques and ideas to be employed in REPLICA include, but are not limited to:

  • an easy-to-program strong MCRCW PRAM model of computation, implemented via multithreaded high-throughput computing
  • threads within a processor that can be combined to mimic non-uniform memory access (NUMA), supporting efficient execution of sequential/NUMA legacy code
  • efficient wave synchronization that drops the cost of synchronization from O(100) down to O(1/100)
  • support for multiple levels and models of parallelism: data, subgroup, and task parallelism at the high level and virtual instruction-level parallelism at the low level
  • source-to-source translation, low-level virtual machine and virtual ILP optimization to implement an optimizing compiler supporting
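The programming model behind the first item can be illustrated with a toy simulation. The sketch below is not REPLICA code and does not use the REPLICA language. It assumes that the "M" in MCRCW refers to multioperations, i.e. concurrent writes to one memory location being combined with an associative operation within a single synchronous step. The names pram_multiop_step and parallel_sum are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List, Tuple


def pram_multiop_step(
    memory: Dict[int, int],
    writes: List[Tuple[int, int]],       # one (address, value) request per thread
    combine: Callable[[int, int], int],  # associative operation, e.g. addition
) -> None:
    """Simulate ONE synchronous step of a multioperation CRCW PRAM (toy model)."""
    # Phase 1: gather every request of the step before touching memory,
    # so the result does not depend on the order in which threads run.
    combined: Dict[int, int] = {}
    for address, value in writes:
        if address in combined:
            combined[address] = combine(combined[address], value)
        else:
            combined[address] = value
    # Phase 2: commit. Concurrent writes to the same address are merged
    # with the existing content instead of racing against each other.
    for address, value in combined.items():
        if address in memory:
            memory[address] = combine(memory[address], value)
        else:
            memory[address] = value


def parallel_sum(values: List[int]) -> int:
    """Sum a list as if thread i added values[i] to address 0 in one step."""
    memory = {0: 0}
    pram_multiop_step(memory, [(0, v) for v in values], lambda a, b: a + b)
    return memory[0]
```

In this model the programmer writes the sum as one logical step with no locks or explicit barriers. On real hardware the cost of that step depends on how the architecture implements the combining, which this sequential simulation does not attempt to capture.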

IMPACTS

New knowledge, solutions, architecture, and intellectual property for companies that design, manufacture, and exploit massively parallel computing solutions in their products.

The results of this project could have a huge impact on industry and on the way future parallel computers are programmed.

Fig. 2. Dropping the cost of synchronization from O(100) down to O(1/100).

We are able to drop the cost of synchronization from O(100) down to O(1/100) with the help of throughput computing and the synchronization wave technique (see Fig. 2).
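To see why the cost of synchronization matters, consider an algorithm made of many short synchronous steps. The sketch below simulates a lockstep inclusive prefix sum in which every round must complete on all threads before the next one starts. On a conventional multicore each round would need an explicit barrier, whereas a PRAM-style machine treats the step boundary itself as the synchronization point. This is a sequential toy model for illustration only, not the synchronization wave technique itself, and the function name lockstep_prefix_sum is hypothetical.

```python
from typing import List


def lockstep_prefix_sum(values: List[int]) -> List[int]:
    """Inclusive prefix sum in about log2(n) simulated synchronous rounds."""
    data = list(values)
    stride = 1
    while stride < len(data):
        # All simulated threads first read the state left by the previous round...
        snapshot = list(data)
        # ...and only then write. The boundary between these two phases is
        # where a conventional multicore would need a barrier.
        for i in range(stride, len(data)):
            data[i] = snapshot[i] + snapshot[i - stride]
        stride *= 2
    return data
```

For n elements the loop runs roughly log2(n) rounds, so the total synchronization overhead is proportional to the number of rounds times the cost of one synchronization. The cheaper each synchronization is, the finer-grained the parallel steps can usefully be.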

CONTENTS

This web page describes the main parts of the REPLICA project by introducing the ongoing work on the REPLICA architecture, the REPLICA programming language and its optimizing compiler, the hardware prototype, application software, publications, and the people behind REPLICA.


Additional information

Martti Forsell
Principal Scientist
+358 20 722 2278
