Milestone #1332

New layout is done

Added by Jun He over 8 years ago.

Status: New
Priority: Normal
Assignee: -
Start date: 05/29/2011
Due date:
% Done: 0%
Estimated time:
Duration:

Description

The MILC layout has been switched from the old layout to the new machine-aware layout.

Old layout of MILC

How rank is assigned

Assume (x,y,z,t) are the coordinates of a subvolume. The rank, computed in node_number(), is:

i = x + nsquares[XUP]*( y + nsquares[YUP]*( z + nsquares[ZUP]*( t )));

nsquares[e] is the number of subvolumes in direction e. Neighbors in the t direction are nsquares[XUP]*nsquares[YUP]*nsquares[ZUP] ranks apart, so the ranks of (x,y,z,t) and (x,y,z,t+1) are usually far from each other.
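
As a rough illustration of the formula above, the following C sketch computes the old-layout rank and the rank distance between neighbors in the t direction. The function names old_rank and t_neighbor_stride, the enum values, and the plain int arrays are assumptions made for this example; they are not MILC's actual definitions.

```c
#include <assert.h>

/* Direction indices. The names follow the formula above; the numeric
   values are an assumption of this sketch. */
enum { XUP = 0, YUP = 1, ZUP = 2, TUP = 3 };

/* Old layout: lexicographic rank of subvolume (x,y,z,t), x varying fastest. */
int old_rank(const int nsquares[4], int x, int y, int z, int t)
{
    assert(x >= 0 && x < nsquares[XUP]);
    assert(y >= 0 && y < nsquares[YUP]);
    assert(z >= 0 && z < nsquares[ZUP]);
    assert(t >= 0 && t < nsquares[TUP]);
    return x + nsquares[XUP] * (y + nsquares[YUP] * (z + nsquares[ZUP] * t));
}

/* Rank distance between (x,y,z,t) and (x,y,z,t+1) in the old layout. */
int t_neighbor_stride(const int nsquares[4])
{
    return nsquares[XUP] * nsquares[YUP] * nsquares[ZUP];
}
```

For example, with nsquares = {2, 2, 2, 4} the t neighbors are 2*2*2 = 8 ranks apart, while x neighbors are only 1 rank apart.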

Assume MPI assigns ranks to machines by slot. (The byslot option tells MPI to use all slots on an available node before allocating resources on the next available node. It is the default in many MPI implementations.)

Then (x,y,z,t) and (x,y,z,t+1) are generally not placed on the same machine, because only ranks that are close to each other are placed on the same machine.
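
The by-slot argument can be sketched in C as follows. This assumes ranks are placed contiguously starting from rank 0 and that every machine offers the same number of slots; machine_of_rank and same_machine are hypothetical helper names, not part of MILC or MPI.

```c
#include <assert.h>

/* Machine index of a rank under by-slot placement: ranks fill all
   cores_per_machine slots on one machine before moving to the next.
   Assumes identical machines and placement starting at rank 0. */
int machine_of_rank(int rank, int cores_per_machine)
{
    assert(rank >= 0 && cores_per_machine > 0);
    return rank / cores_per_machine;
}

/* Returns 1 if the two ranks share a machine, 0 otherwise. */
int same_machine(int rank_a, int rank_b, int cores_per_machine)
{
    return machine_of_rank(rank_a, cores_per_machine) ==
           machine_of_rank(rank_b, cores_per_machine);
}
```

Under these assumptions, two ranks whose distance is at least cores_per_machine can never share a machine. With 4 cores per machine and t neighbors 8 ranks apart, ranks 0 and 8 land on machines 0 and 2.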

New machine-aware layout of MILC

The goal of the new layout is to try to put more adjacent subvolumes together in the same machine.

Group

A group is a set of MPI ranks. The ranks in the same group are placed on the same machine, so communication among them does not need to go over InfiniBand. A group corresponds to one machine. The size of a group equals the number of cores on a machine.

A group is a hypercube. Groups are organized as a bigger hypercube, so subvolumes can fit into groups.

A group configuration is described by the following parameters:

groupsize[e] is the number of ranks in direction e in one group. groupsize[X]*groupsize[Y]*groupsize[Z]*groupsize[T] = the number of cores on a machine.

ngroups[e] is the number of groups in direction e.

(ngroups[X]*groupsize[X]) * (ngroups[Y]*groupsize[Y]) * (ngroups[Z]*groupsize[Z]) * (ngroups[T]*groupsize[T]) = total number of ranks.

Subvolumes (x,y,z,t) through (x+groupsize[X]-1, y+groupsize[Y]-1, z+groupsize[Z]-1, t+groupsize[T]-1) are put into one group, where (x,y,z,t) is the first subvolume of that group.
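
One way to turn the group parameters into a rank is sketched below in C. The description above does not give the actual formula used by the new layout, so the numbering here (groups ordered lexicographically with X fastest, then ranks inside a group ordered the same way) is an assumption, as are the function names and the enum. The only property it is meant to show is that all subvolumes of one group receive a contiguous block of ranks, which by-slot placement then puts on one machine.

```c
#include <assert.h>

/* Direction indices; the values are an assumption of this sketch. */
enum { X = 0, Y = 1, Z = 2, T = 3, NDIM = 4 };

/* Ranks in one group, i.e. the number of cores on a machine. */
int group_volume(const int groupsize[NDIM])
{
    return groupsize[X] * groupsize[Y] * groupsize[Z] * groupsize[T];
}

/* Total number of ranks: product over directions of ngroups*groupsize. */
int total_ranks(const int ngroups[NDIM], const int groupsize[NDIM])
{
    int total = 1;
    for (int e = 0; e < NDIM; e++)
        total *= ngroups[e] * groupsize[e];
    return total;
}

/* Index of the group that contains the subvolume (X varies fastest). */
int group_index(const int ngroups[NDIM], const int groupsize[NDIM],
                const int coords[NDIM])
{
    int index = 0;
    for (int e = NDIM - 1; e >= 0; e--) {
        assert(coords[e] >= 0 && coords[e] < ngroups[e] * groupsize[e]);
        index = index * ngroups[e] + coords[e] / groupsize[e];
    }
    return index;
}

/* Position of the subvolume inside its group (X varies fastest). */
int local_index(const int groupsize[NDIM], const int coords[NDIM])
{
    int index = 0;
    for (int e = NDIM - 1; e >= 0; e--)
        index = index * groupsize[e] + coords[e] % groupsize[e];
    return index;
}

/* Hypothetical machine-aware rank: each group owns a contiguous block. */
int grouped_rank(const int ngroups[NDIM], const int groupsize[NDIM],
                 const int coords[NDIM])
{
    return group_index(ngroups, groupsize, coords) * group_volume(groupsize)
         + local_index(groupsize, coords);
}
```

For example, with groupsize = {1, 1, 2, 2} and ngroups = {2, 2, 1, 2} (4 cores per machine, 32 ranks), subvolumes (0,0,0,0) and (0,0,0,1) get ranks 0 and 2 in this sketch and therefore share machine 0, whereas the old layout places them 8 ranks apart.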


