Parallel systems


Definition
A parallel computing system is a computer with more than one processor used for parallel processing. In the past, each processor of a multiprocessor system sat in a separate package, but today, with the introduction of multi-core chips, multiple processors are placed together in one package. There are currently many types of parallel computers, which differ in how their processors and memory are interconnected. Flynn's taxonomy, one of the most widely accepted classifications of parallel computers, distinguishes machines in which all processors simultaneously execute the same instruction on different data (Single Instruction, Multiple Data – SIMD) from machines in which each processor executes different instructions on different data (Multiple Instruction, Multiple Data – MIMD).

[Figure: Parallel processing]

Parallel Processing vs. Concurrency and Multitasking
Concurrency means that several tasks are in progress during the same period of time; their execution may be interleaved, so a task can be interrupted and resumed later. Concurrency arises in computing systems where multiple computational processes run at the same time and interact with each other (for example, through shared critical regions). The study of concurrency covers a wide range of systems, from tightly coupled, highly concurrent parallel systems to loosely coupled, asynchronous distributed systems. In parallel processing, on the other hand, a main task is divided into smaller sub-tasks that can be executed independently. For example, if two threads or processes run on a single processor core by taking turns, this is concurrency, but if they run at the same time on two processor cores, this is parallelism.

Multitasking refers to the apparently simultaneous execution of two or more tasks by the central processing unit (CPU), achieved by switching rapidly between them.
The process works as follows:

1- The processor receives an interrupt signal.
2- It stops its current task and saves the work done so far, so it can resume later from the same point.
3- It then services the device or program that requested the interrupt.
4- Once the interrupt has been handled, the scheduler decides which task runs next and normal execution resumes.

Parallel Processing
Parallel processing is the simultaneous execution of different parts of a computation, typically by dividing the work across multiple processors to reach a solution faster. Time-sharing on a single processor, where multiple processes take turns on one core, is sometimes mistakenly called parallel processing. The underlying idea is that a problem can usually be divided into smaller sub-problems that are solved concurrently and whose results are then merged, producing the answer sooner.
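
As a minimal sketch of this divide-and-merge idea (a hypothetical example using POSIX threads; the array contents and the two-way split are arbitrary illustration choices, not taken from the article), the C program below sums an array by computing the two halves in separate threads and then merging the partial results:

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000

    static long data[N];

    struct range { long start, end, sum; };

    /* Each thread independently sums its own half of the array. */
    static void *partial_sum(void *arg) {
        struct range *r = (struct range *)arg;
        r->sum = 0;
        for (long i = r->start; i < r->end; i++)
            r->sum += data[i];
        return NULL;
    }

    int main(void) {
        for (long i = 0; i < N; i++) data[i] = 1;   /* arbitrary test data */

        struct range halves[2] = { {0, N / 2, 0}, {N / 2, N, 0} };
        pthread_t t[2];

        /* Solve the two sub-problems concurrently... */
        for (int i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, partial_sum, &halves[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);

        /* ...then merge the partial results. */
        printf("total = %ld\n", halves[0].sum + halves[1].sum);
        return 0;
    }

On most systems this is compiled with a threading flag such as -pthread; splitting the work into more parts follows the same pattern.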


[Figure: Parallel computation]

The benefits of parallel processing over serial processing (the traditional method) include reduced computation time, the ability to solve larger problems, overcoming the memory limits of a single machine, cost-effectiveness, and better use of modern hardware.

Advantages of Parallel Processing
Some advantages of parallel and supercomputing systems, which are the main drivers of their rapid growth, include:

     

      • Very low cost-to-performance ratio

      • Affordable and accessible hardware and software

      • Simple maintenance

      • System scalability to meet growing demands

      • Upgradable systems

      • High system uptime and service availability

      • Reduced execution time in simulations and solving practical problems

      • Expanded research scope

      • The ability to solve larger and more complex problems

      • Ability to use the input/output (I/O) systems of many machines at once (as in distributed databases)

    Parallel Programming
    Parallel programming was created to make better use of system resources and increase the speed and performance of programs running on processors. In parallel programming, parts of the main program that can be executed simultaneously (concurrently) are divided into subprograms and run concurrently on multiple processors or threads. Parts of the program that cannot be parallelized are executed sequentially on one processor. The main difference between sequential and parallel programming is this division, though several other concepts arise that are not typically addressed in regular programming.
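
    As a small sketch of this division (using OpenMP; the loop body and the array size are arbitrary choices for illustration), the program below keeps the non-parallelizable parts sequential while the independent loop iterations are shared among threads:

        #include <omp.h>
        #include <stdio.h>

        #define N 1000

        int main(void) {
            double a[N];

            /* Sequential part: simple initialization, not worth parallelizing. */
            for (int i = 0; i < N; i++) a[i] = i;

            /* Parallel part: independent iterations are divided among threads. */
            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                a[i] = a[i] * a[i];

            /* Sequential part again: final output on one thread. */
            printf("a[N-1] = %f\n", a[N - 1]);
            return 0;
        }

    With GCC or Clang such a program is typically built with the -fopenmp flag; without it, the pragma is ignored and the code simply runs sequentially.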

    One primary reason for using parallel programming is to increase program execution speed, but single-core processors have the following limitations:

       

        • Increasing the number of transistors in a processor to achieve higher speed also increases power consumption and heat, and beyond a certain density the chip can no longer be cooled effectively.

        • Even with a powerful processor, each memory access takes multiple processor cycles, and the limited speed of memory forces the processor to wait on reads and writes, leaving much of its computing power unused.

      History

         

          • 1950s: The idea of multiple instruction, multiple data (MIMD) parallelism dates back to 1954 and the introduction of the IBM 704, the first mass-produced computer with hardware for floating-point arithmetic. In April 1958, Stanley Gill (Ferranti) discussed parallel programming and the concepts of branching and waiting.

          • 1960s: In 1962, Burroughs introduced the D825 computer with 4 processors and 16 memory modules. In 1967, Amdahl and Slotnick debated the feasibility of parallel processing at a conference in the USA; Amdahl's law, which describes the limits on the speedup achievable through parallelization, grew out of this debate.

          • 1980s: The first modern single instruction, multiple data (SIMD) machine was introduced in 1987, built by Danny Hillis and Sheryl Handler (the Connection Machine from Thinking Machines).

          • 1990s: SIMD computers gained popularity, and OpenMP was introduced in 1997 to provide an API for parallel programming in Fortran. By 1998, the C/C++ version of OpenMP was available.

          • 2000s to present: OpenMP has continued to evolve, with version 2.0 for Fortran in 2000 and version 2.0 for C/C++ in 2002. Versions 2.5 and 3.0 were released in 2005 and 2008, respectively, and version 4.0 in 2013.


        Interprocess Communication
        In parallel programming, processes need to communicate with each other, and the following methods are used:

           

            • Shared Memory

            • Message Passing

            • Implicit Model

          Shared Memory
          In shared memory, parallel tasks communicate through a common address space that they can read and write asynchronously. Coordinating access to these shared locations requires synchronization mechanisms such as locks, semaphores, and monitors.
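
          A minimal sketch of this model, assuming POSIX threads and an unnamed POSIX semaphore (the value 42 and the names are only illustrative): one thread writes a shared variable, and the semaphore synchronizes the two threads so that the reader does not look at it too early.

            #include <pthread.h>
            #include <semaphore.h>
            #include <stdio.h>

            /* Shared address space: both threads see the same variables. */
            static int shared_value;
            static sem_t ready;      /* signals that shared_value is valid */

            static void *producer(void *arg) {
                (void)arg;
                shared_value = 42;   /* write through shared memory */
                sem_post(&ready);    /* synchronization: let the consumer proceed */
                return NULL;
            }

            static void *consumer(void *arg) {
                (void)arg;
                sem_wait(&ready);    /* block until the producer has written */
                printf("received %d via shared memory\n", shared_value);
                return NULL;
            }

            int main(void) {
                pthread_t p, c;
                sem_init(&ready, 0, 0);
                pthread_create(&c, NULL, consumer, NULL);
                pthread_create(&p, NULL, producer, NULL);
                pthread_join(p, NULL);
                pthread_join(c, NULL);
                sem_destroy(&ready);
                return 0;
            }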

          Message Passing
          In this method, parallel tasks exchange data by sending messages to one another, either synchronously or asynchronously. In synchronous communication the sender waits until the receiver has accepted the message, while in asynchronous communication the sender sends the message without waiting for the receiver to be ready.
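
          A minimal message-passing sketch using MPI (the value sent and the use of exactly two ranks are arbitrary choices for illustration): rank 0 sends an integer to rank 1 with a blocking MPI_Send; a non-blocking MPI_Isend would instead let the sender continue without waiting for the transfer.

            #include <mpi.h>
            #include <stdio.h>

            /* Run with at least two processes, e.g. mpirun -np 2 ./a.out */
            int main(int argc, char **argv) {
                int rank, value;
                MPI_Init(&argc, &argv);
                MPI_Comm_rank(MPI_COMM_WORLD, &rank);

                if (rank == 0) {
                    value = 42;
                    /* Blocking send: returns once the buffer can be reused. */
                    MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
                } else if (rank == 1) {
                    MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    printf("rank 1 received %d\n", value);
                }

                MPI_Finalize();
                return 0;
            }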

          Implicit Model
          In this model, communication between tasks is handled without the programmer’s involvement; the compiler manages it.

          Principles of Parallel Programming
          To find sufficient parallelism in a program (according to Amdahl’s law), the program must be divided into parallel and serial parts in such a way that the overhead introduced by dividing tasks across threads/processors is less than the benefits gained from parallelizing the program.
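
          In its usual textbook form (stated here for reference, not quoted from this article), Amdahl's law bounds the speedup S achievable with N processors when a fraction p of the program can be parallelized:

            \[
              S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
            \]

          For example, with p = 0.9 and N = 8, the speedup is 1 / (0.1 + 0.9/8) ≈ 4.7, and even with unlimited processors it can never exceed 1 / (1 - p) = 10. The overhead of dividing and coordinating the work only lowers these bounds further.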

          Granularity
          When dividing tasks, careful attention must be paid to the size of the parts that will run in parallel. Tasks that are too small lead to high scheduling and communication overhead, while tasks that are too large leave little parallelism and the program effectively runs sequentially, limiting the speedup.
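
          As a sketch of one way granularity can be controlled (using OpenMP tasks; the block size of 1000 is an arbitrary example value), the loop below creates one task per block of elements instead of one task per element, so the per-task overhead stays small compared with the work each task does:

            #include <omp.h>
            #include <stdio.h>

            #define N     100000
            #define BLOCK 1000     /* arbitrary block size */

            static double a[N];

            int main(void) {
                for (int i = 0; i < N; i++) a[i] = 1.0;

                #pragma omp parallel
                #pragma omp single
                {
                    /* One task per element would create N tiny tasks whose   */
                    /* creation cost outweighs the useful work; one task per  */
                    /* block keeps the overhead low while still exposing      */
                    /* plenty of parallelism.                                 */
                    for (int start = 0; start < N; start += BLOCK) {
                        #pragma omp task firstprivate(start)
                        for (int i = start; i < start + BLOCK && i < N; i++)
                            a[i] *= 2.0;
                    }
                }   /* all tasks finish before the parallel region ends */

                printf("a[0] = %f\n", a[0]);
                return 0;
            }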


          Locality
          Large memories (such as main memory) have slow access times, while small memories close to the processor (caches and registers) are much faster. Programmers should organize algorithms so that they mainly operate on data already held in fast, local memory, which improves performance.
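
          A small sketch of the effect of locality in plain C (the matrix size is arbitrary): traversing the matrix in the row-major order in which C stores it keeps consecutive accesses in the cache, while the transposed traversal mentioned in the comment would miss the cache far more often.

            #include <stdio.h>

            #define N 1000

            static double m[N][N];

            int main(void) {
                double sum = 0.0;

                /* Good locality: m[i][j] walks through memory sequentially, */
                /* so most accesses are served from the cache.               */
                for (int i = 0; i < N; i++)
                    for (int j = 0; j < N; j++)
                        sum += m[i][j];

                /* Poor locality (for comparison): accessing m[j][i] here    */
                /* would jump N * sizeof(double) bytes between accesses and  */
                /* stall the processor on memory far more often.             */

                printf("sum = %f\n", sum);
                return 0;
            }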

          Load Imbalance
          Load imbalance occurs when some processors sit idle during parts of the execution because there is not enough parallelism or the work is distributed unevenly. Load balancing can be done statically, before execution, or dynamically at runtime.
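
          A sketch of dynamic load balancing (using OpenMP; the artificial work() function simply makes later iterations more expensive): schedule(dynamic) hands out iterations to threads as they become free instead of fixing an even split in advance, which would leave some threads idle.

            #include <omp.h>
            #include <stdio.h>

            /* Iterations do very different amounts of work, so a static,  */
            /* even split would finish unevenly.                           */
            static long work(int i) {
                long s = 0;
                for (int k = 0; k < i * 1000; k++) s += k;
                return s;
            }

            int main(void) {
                long total = 0;

                /* Dynamic scheduling balances the load at runtime. */
                #pragma omp parallel for schedule(dynamic) reduction(+:total)
                for (int i = 0; i < 1000; i++)
                    total += work(i);

                printf("total = %ld\n", total);
                return 0;
            }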

          Synchronization
          Some parallel algorithms require synchronization of processors at certain points, such as after each iteration, to share intermediate results. One method for synchronization is using barriers, where processes wait for all others to reach the barrier before proceeding.
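
          A minimal barrier sketch, assuming POSIX threads with pthread_barrier (the four threads and the two "iterations" are only illustrative): no thread starts its second step until every thread has finished its first.

            #include <pthread.h>
            #include <stdio.h>

            #define THREADS 4

            static pthread_barrier_t barrier;

            static void *worker(void *arg) {
                long id = (long)arg;

                printf("thread %ld: finished iteration 1\n", id);

                /* Wait here until all THREADS threads have arrived, so the */
                /* intermediate results of iteration 1 are complete before  */
                /* anyone begins iteration 2.                               */
                pthread_barrier_wait(&barrier);

                printf("thread %ld: starting iteration 2\n", id);
                return NULL;
            }

            int main(void) {
                pthread_t t[THREADS];
                pthread_barrier_init(&barrier, NULL, THREADS);

                for (long i = 0; i < THREADS; i++)
                    pthread_create(&t[i], NULL, worker, (void *)i);
                for (int i = 0; i < THREADS; i++)
                    pthread_join(t[i], NULL);

                pthread_barrier_destroy(&barrier);
                return 0;
            }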

          Race Conditions
          Race conditions occur when multiple tasks access a shared resource at the same time and at least one of them modifies it, so the outcome depends on the timing of the accesses. The resulting errors are often non-deterministic and difficult to detect. Hardware or software locks can be used to prevent race conditions.
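
          A classic sketch of such a race, assuming POSIX threads (the iteration count is arbitrary): two threads increment a shared counter, and without the mutex shown here some of the increments would be lost, with a different final value on almost every run.

            #include <pthread.h>
            #include <stdio.h>

            static long counter = 0;
            static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

            static void *increment(void *arg) {
                (void)arg;
                for (int i = 0; i < 1000000; i++) {
                    /* "counter++" is a read-modify-write; the lock makes it */
                    /* atomic with respect to the other thread.              */
                    pthread_mutex_lock(&lock);
                    counter++;
                    pthread_mutex_unlock(&lock);
                }
                return NULL;
            }

            int main(void) {
                pthread_t a, b;
                pthread_create(&a, NULL, increment, NULL);
                pthread_create(&b, NULL, increment, NULL);
                pthread_join(a, NULL);
                pthread_join(b, NULL);
                printf("counter = %ld (expected 2000000)\n", counter);
                return 0;
            }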

          Parallel Programming Tools
          With these tools, the programmer can organize the parallel execution of a program: manage shared variables, input/output dependencies, and communication between threads or processes, and decide how computations, variables, and objects are distributed across the system.

          Shared Memory Programming Tools

             

              • POSIX Threads (Pthreads): A set of standard C libraries for multi-threaded parallel programming, where threads share the same address space, and synchronization of memory access is managed by the programmer.

              • OpenMP: A parallel programming API for shared memory systems, supporting C, C++, and Fortran. It provides simple annotations for parallelizing serial code.

            Distributed Memory Programming Tools

               

                • Message Passing Interface (MPI): A widely used parallel programming interface for distributed-memory systems such as clusters, designed for high-performance communication between processes.
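
                As a hedged sketch of MPI on a distributed-memory system (the per-process value is arbitrary): each process computes a local partial result, and the collective MPI_Reduce combines them on rank 0. Such a program is typically compiled with mpicc and launched with mpirun, depending on the MPI installation.

                  #include <mpi.h>
                  #include <stdio.h>

                  int main(int argc, char **argv) {
                      int rank, size, local, total;
                      MPI_Init(&argc, &argv);
                      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                      MPI_Comm_size(MPI_COMM_WORLD, &size);

                      local = rank + 1;   /* each process computes its own piece */

                      /* Combine the partial results from every process on rank 0. */
                      MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0,
                                 MPI_COMM_WORLD);

                      if (rank == 0)
                          printf("sum over %d processes = %d\n", size, total);

                      MPI_Finalize();
                      return 0;
                  }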

              Parallel Programming Languages
              Languages, libraries, and models for parallel programming are created for different memory architectures: shared, distributed, or hybrid. These include special programming interfaces for handling memory and parallel computation tasks.