The REPLICA language system consists of an easy-to-use high-level parallel
programming language, REPLICA; OS/library support, including dynamic thread
management for OS tasks, skeletons, and OS/REPLICA language support libraries;
a low-level baseline language with C syntax but with an e-/fork-style
parallelism concept; C libraries; an unoptimized MBTAC assembler for the
minimal REPLICA CMP configuration; and an optimized MBTAC assembler for the
configuration at hand (see Figure 1).
At the high level, REPLICA supports three major forms of parallelism common
in parallel computing platforms (data, synchronous subgroup, and task
parallelism), while at the low level virtual instruction-level parallelism is
provided as a compiler optimization regardless of the dependencies in the code.
Fig. 1. The REPLICA language system.
REPLICA DESIGN GOALS
The REPLICA language's main design goals are ease of programmability, safety,
potential for automatic optimizations, and scalability of the parallel
computation, ranging from simple instruction-level operations to task-level
parallelism and high-level parallel patterns (skeletons).
The core set of low-level parallel primitives in REPLICA resembles those in
the e [Forsell04] and Fork [Keller01] languages. The main mechanisms are
attributes for specifying the memory storage type of data (private or shared),
thread-group concepts for controlling thread-level parallelism, switching
between the NUMA and PRAM modes, and finally the parallel operation
instructions supported in hardware.
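To make these mechanisms concrete, the fragment below sketches how they might look in the baseline language. Only the `shared` attribute is taken from the text; the intrinsic name `thread_id()` and the group-splitting behaviour of the synchronous `if` follow the Fork tradition and are assumptions about the concrete syntax, not REPLICA code.

```
shared int total = 0;        /* one location, seen by all threads of the group */
int mine = 0;                /* private by default: one copy per thread */

if (thread_id() % 2 == 0) {  /* synchronous if: the group splits into two
                                subgroups, one per branch (thread_id() is a
                                hypothetical intrinsic name) */
    /* even-numbered threads execute here as their own subgroup */
} else {
    /* odd-numbered threads execute here */
}
/* the subgroups join and resynchronize at this point */
```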
To share data between threads, the language provides an explicit attribute,
shared, for tagging such data. At the programming language level, this makes
the variable name refer to the same location in all threads of the group.
Shared data adheres to synchronous CRCW PRAM semantics, which means that,
unlike contemporary architectures, REPLICA can still guarantee a deterministic
memory model even in this case. However, low-level synchronization constructs,
such as basic barriers for the threads of a group, are still provided by the
language to prevent erroneous use of shared data by independent concurrent
thread groups or processes.
The control flow inside a thread group is managed in REPLICA with the standard
control constructs. The language provides both synchronous and asynchronous
versions of all constructs; these automatically manage the synchronization and
splitting/joining of thread groups when the control flow diverges. The thread
and group ids can be accessed and inspected via machine intrinsics. Along with
these thread intrinsics, the language also maps directly to the
architecture-specific set of arithmetic multioperations, which can provide
significant speed improvements in data-parallel and synchronous code.
The core feature set of the baseline language is adopted from the C language,
but simplified to make the language easier to parse and analyze.
[Forsell04] M. Forsell. E – A Language for Thread-Level Parallel Programming
on Synchronous Shared Memory NOCs. WSEAS Transactions on Computers,
3(3):807–812, 2004.
[Keller01] J. Keller, C. Keßler, and J. Träff. Practical PRAM Programming.
Wiley, New York, 2001.