Before making your application run in parallel, consider what your goal is. You must see the larger picture to avoid hard work with little gain. Parallelization can improve performance, but it usually reduces the overall efficiency of the system, because the same work now also pays coordination costs.
Performance is measured in time gained. This is good in general, but sometimes it is too expensive. If the server is already busy running many processes, starting a Fortran program that consumes all the resources will slow down the applications already running in the background and may overheat the system, reducing overall performance.
Parallel processing is not always a good idea. A parallel program spends extra time splitting a job into smaller parts, distributing them, and then aggregating the partial results. This overhead can exceed the time saved. After putting effort into coding a faster program, you may end up with a program that is slower than the single-threaded version.
Most blocking operations in a typical process are I/O operations on diverse devices. Mechanical hard disks and optical disks are slow; modern SSD storage devices are much faster. Using better hardware can improve I/O performance significantly.
A good idea is to read data from one source and write data in parallel to different files. Having one output file per process keeps the processes independent and improves performance. The drawback is that after processing finishes you must aggregate the partial results back into a single file using a single thread.
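For illustration, here is a hedged Fortran fragment of the one-file-per-process idea; worker_id and partial_result are assumptions, defined elsewhere by each process:
! each worker writes its partial result to its own file (part_1.dat, part_2.dat, ...)
character(len=32) :: filename
integer :: unit
write (filename, '(a, i0, a)') 'part_', worker_id, '.dat'   ! worker_id is assumed given
open (newunit=unit, file=trim(filename), action='write', status='replace')
write (unit, *) partial_result                              ! partial_result is assumed computed
close (unit)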
Fortran compilers can target a specific platform. You can use compiler flags (options) to optimize the generated code and take advantage of special microprocessor features that can improve the performance of your application significantly. That may be a good alternative to try before using parallelization.
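For example, with gfortran (the flags and the file name app.f90 are illustrative; other compilers use different options):
# optimize aggressively and target the features of the local CPU
gfortran -O3 -march=native -funroll-loops -o app app.f90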
You can use Bash to run processes in parallel using the operating system's multitasking capability. You can create a Bash script that starts several processes in the background. This is the easiest way to create a parallel process.
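A minimal sketch, assuming a compiled worker program named ./worker and some input files (all names are illustrative):
#!/bin/bash
# start three workers in the background, then wait for all of them
./worker input1.dat > out1.dat &
./worker input2.dat > out2.dat &
./worker input3.dat > out3.dat &
wait                                           # block until every background job finishes
cat out1.dat out2.dat out3.dat > result.dat    # aggregate the partial results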
Map Reduce Model
Using this model, parallel processes can have different complexity levels and different durations. For large projects, you can organize processes using a job scheduler. You can learn this technology in a Software Engineering course.
Interprocess communication is difficult. Also, you have to use two programming languages: for example Bash for orchestration and Fortran for the workers. Data transfer is expensive, and you must write a lot of code to receive the input data, parse it, and create the output data.
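As a hedged sketch, here is a Fortran worker for such a pipeline: it reads numbers from standard input and prints a partial sum to standard output, so a Bash script can feed it a chunk of the data (for example ./map_worker < chunk1.dat > sum1.dat &) and collect the result. The program name is illustrative:
program map_worker
    implicit none
    integer :: ios
    real :: x, total
    total = 0.0
    do                                 ! read until end-of-file on standard input
        read (*, *, iostat=ios) x
        if (ios /= 0) exit
        total = total + x
    end do
    print *, total                     ! partial result for the reduce step
end program map_worker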
Loops are compute intensive and can be run in parallel. You can use the Fortran compiler to parallelize loops using multi-threading. After the loop finishes, the main thread can aggregate the result. Not all loops can be executed in parallel: if a process uses nested loops, probably only the outer loop will be executed in parallel.
If you are not careful, running system processes simultaneously with loop parallelization will cause competition for resources. This can overwhelm the server and slow down all the processes. If you need more computing power, a better idea may be to use multiple computers connected in a network. This technique of parallelization is called distributed computing.
Some older compilers have automatic parallelization features. The idea was that you do not need to modify your program: the compiler decides whether your program can be optimized using parallel computing. This model of parallelization seems to have been largely abandoned in favor of explicit parallelization models.
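gfortran still ships such a feature behind a flag; a hedged example (app.f90 is illustrative):
# ask gfortran to auto-parallelize suitable loops across 4 threads, with no source changes
gfortran -O2 -ftree-parallelize-loops=4 -o app app.f90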
For better control over parallelization, you need to use compiler directives that trigger parallel code generation and enable specific parts of the application to run in parallel. You need skills to read, create, and debug code designed for parallel execution.
Fortran has different methods of parallelization that can be used to increase performance. Different compilers implement the parallelization standards in different ways; the Fortran specification deliberately leaves implementation details open. Here is a list of the methods we have identified:
The Fortran 2008 specification describes a new kind of loop: the DO loop augmented with the keyword CONCURRENT. This enables more efficient parallel execution of native Fortran code without the use of non-standard directives.
Declaring the loop as concurrent allows the compiler to decide whether the loop is suitable for parallel execution. Expressing this intention requires you to follow several restrictions (for example, one iteration must not depend on the results of another); otherwise the compiler will not enable parallel execution.
In return for accepting these restrictions, a DO CONCURRENT might compile into code that exploits the parallel features of the target machine to run the iterations of the DO CONCURRENT construct without using any OpenACC or OpenMP directive.
! fortran fragment (the LOCAL and SHARED locality specifiers require Fortran 2018)
integer, parameter :: n = 5
integer, dimension(n) :: j = 2, k = 9   ! sample data so the fragment is runnable
integer :: i, m
m = 10
i = 15
do concurrent (i = 1:n, j(i) > 0) local(m) shared(j, k)
   m = mod(k(i), j(i))
   k(i) = k(i) - m
end do
print *, i, m   ! expected: 15 10 (the loop index and the local m do not leak out)
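Note that the compiler decides whether a DO CONCURRENT actually runs in parallel, and some compilers require extra flags; a hedged example for NVIDIA's nvfortran (vendor flags vary):
# map DO CONCURRENT iterations onto CPU threads (assumes the NVIDIA HPC SDK is installed)
nvfortran -stdpar=multicore -o app app.f90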
OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory parallel programming. It consists of a set of compiler directives, library routines, and environment variables that enable run-time parallelization.
! accumulate the Leibniz series for pi/4; rk is assumed to be a real kind parameter
!$OMP PARALLEL DO DEFAULT(NONE) SHARED(limit) PRIVATE(i) REDUCTION(+:pi)
do i = 1, limit
   pi = pi + (-1)**(i+1) / real(2*i-1, kind=rk)
end do
!$OMP END PARALLEL DO
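The directives are inert until you compile with OpenMP support; a hedged example (pi_omp.f90 is an illustrative file name):
# gfortran enables the !$OMP directives with -fopenmp; other compilers use other flags
gfortran -fopenmp -o pi_omp pi_omp.f90
OMP_NUM_THREADS=4 ./pi_omp    # request 4 threads at run time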
Fortran 2008 contains the coarray parallel programming model. It is the first time that a parallel programming model has been added to the language as a standard feature, portable across all platforms. Compilers supporting the model are available or under development from all the major compiler vendors.
The coarray programming model consists of two new features added to the language: an extension of the normal array syntax to represent data decomposition, and an extension of the execution model to control parallel work distribution.
The coarray execution model is based on Single Program Multiple Data (SPMD). A CAF (Coarray Fortran) program is replicated a number of times; each copy has its own set of data objects and is called an image. All images execute asynchronously (a minimal runnable sketch follows the declarations below).
! coarray declarations
real    :: a(n)[*]                     ! each image holds its own array a(n)
complex :: z[0:*]                      ! scalar coarray with lower cobound 0
integer :: index(n)[*]                 ! integer array coarray, default cobounds
real    :: b(n)[p, *]                  ! two codimensions
real    :: c(n,m)[0:p, -3:q, +3:*]     ! three codimensions with explicit cobounds
real, allocatable :: w(:)[:,:]         ! allocatable array coarray
type(field), allocatable :: max[:,:]   ! allocatable scalar coarray of a derived type
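A minimal sketch of the SPMD execution model, assuming a coarray-enabled compiler (for example gfortran together with the OpenCoarrays library):
program hello_caf
    implicit none
    ! every image executes this same program; each reports its own identity
    print *, 'image', this_image(), 'of', num_images()
    sync all                  ! barrier: proceed only when all images arrive here
end program hello_caf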
Message Passing Interface (MPI) is a communication protocol for parallel programming. MPI is specifically used to allow applications to run in parallel across a number of separate computers connected by a network.
A distributed system consists of a collection of autonomous computers, connected through a network and distribution middleware, which enables computers to coordinate their activities and to share the resources of the system so that users perceive the system as a single, integrated computing facility.
program hello_mpi
    use mpi                     ! legacy code may use: include 'mpif.h'
    implicit none
    integer :: rank, nprocs, ierror
    call MPI_INIT(ierror)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)   ! total number of processes
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)     ! id of this process
    print *, 'node', rank, ': Hello world'
    call MPI_FINALIZE(ierror)
end program hello_mpi
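MPI distributions ship a compiler wrapper and a launcher to build and run such programs; a hedged example (wrapper and launcher names vary between implementations):
# compile with the MPI wrapper, then start 4 cooperating processes
mpifort -o hello_mpi hello_mpi.f90
mpirun -np 4 ./hello_mpi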
Until the standard coarray model is widely supported by compilers, we can use one of these: OpenMP or MPI. Selecting the right one is an engineering decision: OpenMP targets shared memory on a single machine, while MPI targets distributed memory across a network of computers.