Memory segments

The ability to safely access memory physically resident on another machine is the fundamental characteristic and strength of the SISCI technology. If the remote memory is mapped into the address space of a local process, thus appearing as if it were local, a data transfer is as simple as a normal memcpy(). memcpy() is typically implemented as a sequence of CPU load and store instructions, which in this case send data to or fetch data from remote memory. This transfer method is known as Programmed I/O (PIO). Alternatively, the SISCI application programmer can use the Direct Memory Access (DMA) approach, whereby the CPU simply gives the network interface DMA controller instructions about the transfer (for example, source address, destination address and size) and is then free to do other things in parallel with the data transfer.

Both cases require a way to manage local memory segments on one side and a way to attach to remote memory segments on the other side.
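The difference between the two transfer methods can be sketched in plain C. This is only an illustration under stated assumptions: the mapped pointer stands in for a remote segment that has already been mapped into the process, and struct dma_request, pio_write() and dma_describe() are hypothetical names, not part of the SISCI API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* PIO: the CPU itself moves the bytes with ordinary store instructions.
 * "mapped" stands in for a remote segment mapped into this process. */
void pio_write(void *mapped, size_t offset, const void *src, size_t len)
{
    assert(mapped != NULL && src != NULL);
    memcpy((unsigned char *)mapped + offset, src, len);
}

/* DMA: the CPU only describes the transfer; a DMA controller would
 * carry it out. Hypothetical descriptor, not a SISCI type. */
struct dma_request {
    size_t src_offset; /* where to read in the source segment */
    size_t dst_offset; /* where to write in the destination segment */
    size_t size;       /* number of bytes to move */
};

/* Build the description of a transfer without moving any data. */
struct dma_request dma_describe(size_t src_offset, size_t dst_offset, size_t size)
{
    struct dma_request req = { src_offset, dst_offset, size };
    return req;
}
```

With PIO the call returns only after the CPU has issued all the stores. With DMA the CPU's work ends once the descriptor has been handed over, which is what leaves it free for other tasks.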

In this chapter, you will learn:

  • How to allocate a memory segment on the local node.

  • How to make a local segment available to other nodes.

  • How to connect to a memory segment available on a remote node.

Managing local segments

Allocating memory space

The allocation of a segment on the local host is done with the function SCICreateSegment(). The main reason to have a special function for the allocation instead of a normal malloc()-like call is that the driver must be aware of the created segment and associated parameters. Moreover, most operating systems require that memory used in the way SISCI uses it has specific characteristics, such as being non-swappable and/or being physically contiguous. Different hardware implementations may have different requirements; using SCICreateSegment() ensures the implementation details can be hidden from the user, and ensures portability.

A typical usage of SCICreateSegment() is as follows, assuming we are on the receiver node:

sci_desc_t v_dev;
sci_local_segment_t local_segment;
sci_error_t error;

SCIInitialize(...);
SCIOpen(&v_dev,...);

SCICreateSegment(v_dev, /* virtual device */
                 &local_segment, /* handle to the allocated segment */
                 RECEIVER_SEG_ID, /* segment identifier */
                 RECEIVER_SEG_SIZE, /* size */
                 NO_CALLBACK, /* ignore this for the moment */
                 NO_ARG, /* callback arg, ignore */
                 NO_FLAGS,
                 &error);

if (error == SCI_ERR_OK) {
    /* a segment is available for use */
} else {
    /* manage error */
}

Let us look at the different parameters:

v_dev is the virtual device, as it comes from the SCIOpen() call shown previously.

local_segment is a handle to the memory segment to be allocated. It will be initialized if the call is successful.

The segment identifier, which in this case is the constant RECEIVER_SEG_ID, is an integer that uniquely identifies the segment to the driver. A remote node that wants to use this segment needs to know the value of this identifier. The driver checks the uniqueness of the identifier, so applications should choose appropriate values.

RECEIVER_SEG_SIZE is the size of the segment that will be allocated.

The specification of a callback allows triggering the execution of a certain function when something happens concerning this segment. This option is covered later in the Events and callbacks chapter.

As usual, error contains an error code, which, if representing failure, gives a hint about the cause. Typical errors for such functions are the non-uniqueness of the segment identifier and the unavailability of free space.
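Since the driver rejects duplicate segment identifiers, it can help to derive them systematically rather than picking constants by hand. The following is one possible convention, not something SISCI prescribes: it assumes 32-bit identifiers and node ids and indices that each fit in 16 bits, and make_segment_id() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical convention: upper 16 bits hold the id of the node that
 * owns the segment, lower 16 bits hold a per-node segment index. */
uint32_t make_segment_id(uint32_t node_id, uint32_t index)
{
    /* Both fields must fit in their 16-bit slots, otherwise ids could collide. */
    assert(node_id <= 0xFFFFu && index <= 0xFFFFu);
    return (node_id << 16) | index;
}
```

A remote node that knows the owner's node id and the index can compute the same value, so the identifier does not have to be exchanged separately.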

Deallocating a memory segment

When a memory segment is no longer needed, for example, before quitting the application, it should be destroyed in order to release unneeded resources.

The way to do it is via the function SCIRemoveSegment():

sci_error_t error;
sci_local_segment_t segment;

SCICreateSegment(..., &segment, ...); /* initialization */

/* use the segment */

SCIRemoveSegment(segment, NO_FLAGS, &error);

if (error == SCI_ERR_OK) {
    /* the resource is freed */
} else {
    /* manage error */
}

SCIRemoveSegment() normally succeeds. If it does not, the SISCI driver releases the left-over resources when the application terminates. A typical cause of failure for this call is the dependency of other resources on this segment. SCIRemoveSegment() succeeds even if there are remote processes still connected to the local segment. In such a case, the segment is kept available only for the connected processes until they disconnect.

Making a segment available

Once a segment has been allocated, it needs to be made visible to local processes or the other nodes connected to the network. This export operation is performed by calling two different functions in sequence.

The first one is SCIPrepareSegment(). Logically, its goal is to map the segment into the 64-bit network address space shared by all the nodes in the network. What it does in practice is to make sure that the segment can be correctly accessed by the specified network adapter. This includes guaranteeing the requirements possibly set by the operating system, like non-swappability and physical contiguity.

The second step is performed by SCISetSegmentAvailable(), which makes the segment visible to the other nodes via the specified adapter.

sci_error_t prepare_error;
sci_error_t avail_error;
sci_local_segment_t segment;

SCICreateSegment(..., &segment, ...); /* initialization */

SCIPrepareSegment(segment,
                  ADAPTER_NO,
                  NO_FLAGS,
                  &prepare_error);

if (prepare_error == SCI_ERR_OK) {

    SCISetSegmentAvailable(segment,
                           ADAPTER_NO,
                           NO_FLAGS,
                           &avail_error);

    if (avail_error == SCI_ERR_OK) {
        /* segment is now available to the remote nodes */
    } else {
        /* manage availability error */
    }

} else {
    /* manage preparation error */
}

As shown in the code above, the preparation and the availability of a memory segment are per adapter.

Typical errors for the two functions above are mainly related to problems with the specified adapter (for example, it does not exist, or the adapter specified in SCIPrepareSegment() does not match the one specified in SCISetSegmentAvailable()).

Making a segment unavailable

SCISetSegmentUnavailable() changes the visibility of an exported segment, preventing new remote connections through the specified adapter.

sci_local_segment_t segment;
sci_error_t error;

SCICreateSegment(..., &segment, ...);
SCIPrepareSegment(segment, ...);
SCISetSegmentAvailable(segment, ...);

/* use the segment */

SCISetSegmentUnavailable(segment, ADAPTER_NO, NO_FLAGS, &error);
if (error == SCI_ERR_OK) {
    /* the segment is not available for new remote connections */
} else {
    /* manage error */
}

Calling SCISetSegmentUnavailable() doesn’t affect existing remote connections, as they are not aware of the change. SCISetSegmentUnavailable() could then be used, for example, to make a segment available only to a certain number of remote nodes. It would work like this:

SCISetSegmentAvailable(segment, ...);

/* wait for n remote nodes to connect */

SCISetSegmentUnavailable(segment, ...);

We will see later that there are ways to determine when a remote node has connected.
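The bookkeeping behind "wait for n remote nodes to connect" can be sketched independently of how the connection notifications arrive. The struct connection_gate type and gate_on_connect() are hypothetical helpers. The sketch assumes that some mechanism, not shown here, calls gate_on_connect() once per new remote connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: counts remote connections up to a fixed limit. */
struct connection_gate {
    unsigned int limit;     /* n: how many remote nodes may connect */
    unsigned int connected; /* how many have connected so far */
};

/* Record one new remote connection. Returns true exactly when the limit
 * has just been reached, i.e. when the caller should now call
 * SCISetSegmentUnavailable() to stop further connections. */
bool gate_on_connect(struct connection_gate *gate)
{
    assert(gate != NULL && gate->connected < gate->limit);
    gate->connected++;
    return gate->connected == gate->limit;
}
```

Because SCISetSegmentUnavailable() does not affect existing connections, closing the gate after the n-th connection leaves the n connected nodes untouched.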

A state machine for a local segment resource

[Figure: State machine for a local segment]

The figure “State machine for a local segment” shows the life of a local segment.

Some additional comments:

  • NOT PREPARED means that the segment has been allocated successfully but is not yet prepared to be used by a network adapter.

  • SCIRemoveSegment() is a legal operation from any state. This does not mean that it will succeed (e.g. it will not if there is a dependency on it).

  • It is not possible to “un-prepare” a segment.
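The rules above can be written down as a small transition function. This is a simplified model under several assumptions: only NOT PREPARED is a state name taken from the text, while SEG_PREPARED, SEG_AVAILABLE and SEG_REMOVED are guessed labels. The model tracks a single adapter, it assumes that making a segment unavailable returns it to the prepared state, and it ignores the fact that SCIRemoveSegment() can fail because of dependencies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed state names; only "not prepared" appears in the text. */
enum seg_state { SEG_NOT_PREPARED, SEG_PREPARED, SEG_AVAILABLE, SEG_REMOVED };
enum seg_op { OP_PREPARE, OP_SET_AVAILABLE, OP_SET_UNAVAILABLE, OP_REMOVE };

/* Apply one operation. Returns true and updates *state if the operation
 * is legal in the current state, otherwise leaves *state unchanged. */
bool seg_apply(enum seg_state *state, enum seg_op op)
{
    assert(state != NULL);
    if (*state == SEG_REMOVED)
        return false; /* nothing is legal once the segment is gone */

    switch (op) {
    case OP_REMOVE: /* legal from any live state */
        *state = SEG_REMOVED;
        return true;
    case OP_PREPARE:
        if (*state != SEG_NOT_PREPARED)
            return false;
        *state = SEG_PREPARED;
        return true;
    case OP_SET_AVAILABLE:
        if (*state != SEG_PREPARED)
            return false;
        *state = SEG_AVAILABLE;
        return true;
    case OP_SET_UNAVAILABLE: /* back to prepared: there is no "un-prepare" */
        if (*state != SEG_AVAILABLE)
            return false;
        *state = SEG_PREPARED;
        return true;
    }
    return false;
}
```

Note that no operation leads back to SEG_NOT_PREPARED, which mirrors the last bullet above.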

Managing remote segments

Imagine the following scenario: A process on the receiver node, with node id RECEIVER_NODE_ID, has allocated and made a piece of memory available, with segment id equal to RECEIVER_SEG_ID.

The sender node has node id SENDER_NODE_ID. We would like to have a process that uses the memory segment on the receiver node.

It is worth clarifying some nomenclature to better understand the next sections:

  • A local segment is allocated on the receiver node.

  • From the sender node perspective, the memory segment allocated on the receiver node is a remote segment which is locally represented by a remote segment resource.

So a local segment and a local segment resource live on the same node, whereas a remote segment and a remote segment resource live on different nodes.

Connecting to a remote segment

An application that wants to use a remote segment first has to “connect” to it. Logically speaking, the connection process consists of finding the address and the size of the remote segment within the network address space.

This is achieved by calling the function SCIConnectSegment():

sci_desc_t v_dev;
sci_remote_segment_t remote_segment;
sci_error_t error;

SCIInitialize(...);
SCIOpen(&v_dev, ...); /* initialize the virtual device */
SCIConnectSegment(v_dev, /* virtual device */
                  &remote_segment, /* handle to the remote segment resource */
                  RECEIVER_NODE_ID, /* remote node id */
                  RECEIVER_SEG_ID, /* remote segment id */
                  ADAPTER_NO, /* local adapter number */
                  NO_CALLBACK, /* ignore this for the moment */
                  NO_ARG, /* callback arg, ignore */
                  SCI_INFINITE_TIMEOUT, /* timeout */
                  NO_FLAGS,
                  &error);

if (error == SCI_ERR_OK) {
    /* the remote segment is connected */
} else {
    /* manage error */
}

Let us look at the list of parameters of the above call:

v_dev represents an initialized virtual device, needed to communicate with the driver.

remote_segment is a handle to a sci_remote_segment descriptor, which contains information about the connection. The descriptor is allocated and initialized by the call, if successful.

The remote node and segment identifiers (RECEIVER_NODE_ID and RECEIVER_SEG_ID respectively) uniquely locate a memory segment on the network.

The adapter number ADAPTER_NO refers to the local adapter we want to use to access the remote segment.

The SCIConnectSegment() call is by default synchronous. It waits until either the connection request is resolved, or the specified timeout, expressed in milliseconds, expires. Only the first option can cause the function to return if the timeout is infinite, as shown above. The call can be made asynchronous; this behavior is described in the Events and callbacks chapter.

Once the remote segment is connected, its size (expressed in bytes) can be determined by calling the function SCIGetRemoteSegmentSize():

sci_remote_segment_t remote_segment;
size_t remote_segment_size;

SCIConnectSegment(..., &remote_segment, ...);

remote_segment_size = SCIGetRemoteSegmentSize(remote_segment);
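One natural use of the returned size is to check that an intended access stays inside the segment before touching it. The helpers below are a hypothetical sketch, not SISCI functions: range_fits() takes the size as a plain size_t, and segment_at() assumes base points to memory that stands in for the mapped segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the byte range [offset, offset + length) lies inside a segment
 * of segment_size bytes. Written so that offset + length cannot overflow. */
bool range_fits(size_t segment_size, size_t offset, size_t length)
{
    return offset <= segment_size && length <= segment_size - offset;
}

/* Return a pointer to the given offset inside the segment, or NULL if the
 * requested range would run past the end of the segment. */
void *segment_at(void *base, size_t segment_size, size_t offset, size_t length)
{
    assert(base != NULL);
    if (!range_fits(segment_size, offset, length))
        return NULL;
    return (unsigned char *)base + offset;
}
```

In an application, segment_size would be the value obtained from SCIGetRemoteSegmentSize().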

Disconnecting from a remote segment

Once an application has finished using a remote segment, it should disconnect from it, releasing the remote segment resource.

sci_remote_segment_t segment;
sci_error_t error;

SCIConnectSegment(..., &segment, ...);

/* use the segment */

SCIDisconnectSegment(segment, NO_FLAGS, &error);
if (error == SCI_ERR_OK) {
    /* the remote segment resource is released */
} else {
    /* manage error */
}

A typical error for SCIDisconnectSegment() is SCI_ERR_BUSY, meaning that there are other resources depending on the remote segment resource.