VTT'S FRONTIER PROJECT
KEYWORDS: Computer architecture, parallel computing, models of
computation, parallel programming languages, compiling, optimizing,
application software, performance measurement, FPGA prototyping, thread-level
parallelism, instruction-level parallelism, general purpose computing
Current CMP architectures (SMP, NUMA, CC-NUMA, MP, VC) are tedious to program
and often provide poor speedup compared to conventional sequential (single-core)
processors. This is because they lack fast synchronization and latency-hiding
mechanisms, i.e., they rely on weak models of computation.
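To see why slow synchronization hurts, consider a toy cost model (the cycle counts below are illustrative assumptions, not measurements from the project): a tree reduction over n values needs log2(n) lockstep rounds, and on a conventional CMP each round ends in a barrier costing on the order of 100 cycles, so synchronization alone can dominate the useful work.

```python
import math

# Toy cost model with assumed cycle counts: a parallel tree reduction
# over n values takes ceil(log2(n)) synchronized rounds. On a machine
# with slow barriers (~100 cycles each) the synchronization overhead
# grows quickly; with near-free step boundaries it stays negligible.
def reduction_rounds(n):
    """Number of barrier-separated rounds a tree reduction needs."""
    return math.ceil(math.log2(n))

def sync_overhead(n, barrier_cost):
    """Total cycles spent purely on synchronization."""
    return reduction_rounds(n) * barrier_cost

n = 1024
print(sync_overhead(n, barrier_cost=100))  # weak model: 1000 cycles of pure sync
print(sync_overhead(n, barrier_cost=1))    # strong model: 10 cycles
```

With 1024 elements the reduction takes only 10 rounds, yet at 100 cycles per barrier the synchronization cost alone is 1000 cycles.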
The Removing Performance and Programmability Limitations of Chip
Multiprocessor Architectures (REPLICA) project aims at developing the CESM
architecture and methodology that would enable radically easier programming
and higher performance with the help of the PRAM model of computation.
REPLICA is a 3-year (2011-2013) project funded by VTT with a total budget of
1.4 M€. VTT collaborates with Linköping University, Sweden, and University of
REPLICA refers to replicating the processing resources and
programming/data structures in a smart way to provide radically better
performance and programmability than current multicore computers.
Fig. 1. The PRAM-NUMA model of computation of REPLICA.
In REPLICA we are developing a configurable emulated shared memory machine
(CESM) architecture and methodology that enables radically easier programming
and higher performance with the help of a strong parallel random access
machine (PRAM) model of computation. As a proof of concept, we are building a
prototype machine with selected I/O devices based on FPGA technology,
developing a programming language with compiling and optimization tools, and
a comprehensive set of sample applications that show the performance and ease
of programming.
New techniques and ideas to be employed in REPLICA include, but are not
limited to, the following:
- implements an easy-to-program strong MCRCW PRAM model of computation via
multithreaded high-throughput computing
- threads within processors can be combined to mimic non-uniform memory
access (NUMA) to support efficient execution of sequential/NUMA legacy code
- efficient wave synchronization drops the cost of synchronization from
O(100) down to O(1/100)
- supports multiple levels and models of parallelism: data, subgroup, and
task parallelism at high level and virtual instruction-level parallelism at
low level
- uses source-to-source translation, a low-level virtual machine, and virtual
ILP optimization to implement an optimizing compiler supporting these models
of parallelism
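The MCRCW PRAM step mentioned above can be sketched with a tiny simulator of our own (a minimal illustration of the semantics, not REPLICA's actual implementation): in one synchronous step all reads see the memory state from before the step, and concurrent writes to the same cell are combined with an associative operation, which is what lets n processors find the maximum of n values in O(1) steps.

```python
# Minimal sketch (our own toy simulator, not REPLICA's semantics or code):
# one synchronous step of a concurrent-read concurrent-write PRAM where
# colliding writes to a cell are combined with an associative operation
# (here: max). All processors' writes take effect simultaneously.
def mcrcw_step(memory, writes, combine=max):
    """Apply one lockstep PRAM step.
    writes: list of (address, value) pairs issued by the processors."""
    pending = {}
    for addr, val in writes:
        pending[addr] = combine(pending[addr], val) if addr in pending else val
    new_mem = dict(memory)
    new_mem.update(pending)
    return new_mem

data = [3, 41, 7, 19]
mem = {i: v for i, v in enumerate(data)}
# every processor i concurrently writes its value into one cell 'result'
mem = mcrcw_step(mem, [("result", mem[i]) for i in range(len(data))])
print(mem["result"])  # 41: the maximum, found in a single step
```

The single combined write step is the point: on a weaker model the same maximum would take a log-depth reduction with a barrier per round.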
New knowledge, solutions, architectures, and intellectual property for
companies that design, manufacture, and exploit massively parallel computing
solutions in their products.
The results of this project potentially have a huge impact on industry and on
the way future parallel computers are programmed.
We are able to drop the cost of synchronization from O(100) down to O(1/100)
with the help of throughput computing and a synchronization wave technique
(see Fig. 2).
Fig. 2. Dropping the cost of synchronization from O(100) down to O(1/100).
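A back-of-envelope sketch of why throughput computing makes a barrier nearly free (the thread count below is an assumption for illustration, not a REPLICA parameter): each processor pipeline interleaves many threads, and the synchronization wave travels with the instruction stream instead of stalling it, so its cost is roughly one pipeline slot amortized over all interleaved threads.

```python
# Illustrative amortization model (our own sketch): if a processor
# interleaves T threads and a synchronization wave occupies roughly one
# pipeline slot, the per-thread barrier cost is about 1/T of a cycle.
def amortized_sync_cost(threads_per_processor, wave_slots=1):
    """Cycles of synchronization overhead per thread per barrier."""
    return wave_slots / threads_per_processor

# With e.g. 512 interleaved threads the per-thread cost falls well
# below 0.01 cycles, i.e. into the O(1/100) regime.
print(amortized_sync_cost(512))
```

Contrast this with a conventional barrier that stalls every participating core for on the order of 100 cycles.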
This web page describes the main part of the REPLICA project by introducing
the ongoing work on the REPLICA architecture, the REPLICA programming
language, an optimizing compiler for it, the hardware prototype, application
software, publications, and the people behind REPLICA.