Memory segments¶
The ability to safely access memory physically resident on another
machine is the fundamental characteristic and the strength of the SISCI
technology. If the remote memory is mapped in the addressable space of a
local process, thus appearing as if it were local, a data transfer is as
simple as a normal memcpy(). memcpy() is typically implemented as a
sequence of CPU load and store instructions, which in this case fetch
data from or send data to remote memory. This transfer
method is known as Programmed I/O (PIO). Alternatively, the SISCI
application programmer can use the Direct Memory Access (DMA) approach,
whereby the CPU simply gives instructions to the network interface DMA
controller about the transfer (for example, source address, destination
address and size), leaving the CPU free to do other things in parallel
with the data transfer.
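The difference between the two methods can be sketched in plain C. This is only an illustration under stated assumptions: the struct dma_request and the functions pio_transfer(), dma_prepare() and fake_dma_engine_run() are hypothetical names that are not part of the SISCI API, and an ordinary local buffer stands in for mapped remote memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request record: the three items the text says the CPU
 * hands over to the DMA controller. Not a SISCI type. */
struct dma_request {
    const void *source;
    void       *destination;
    size_t      size;
};

/* PIO: the CPU itself moves every byte with load/store instructions. */
static void pio_transfer(void *mapped_remote, const void *local, size_t size)
{
    memcpy(mapped_remote, local, size);
}

/* DMA: the CPU only describes the transfer; no data moves here. */
static struct dma_request dma_prepare(void *mapped_remote, const void *local,
                                      size_t size)
{
    struct dma_request req = { local, mapped_remote, size };
    return req;
}

/* Stand-in for the DMA controller. On real hardware this work would run
 * in parallel with the CPU; here it is simulated with a plain copy. */
static void fake_dma_engine_run(const struct dma_request *req)
{
    assert(req != NULL);
    memcpy(req->destination, req->source, req->size);
}
```

With PIO the data has arrived when pio_transfer() returns. With the DMA-style path, dma_prepare() returns immediately and the destination is unchanged until the (simulated) controller has processed the request.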
Both cases require a way to manage local memory segments on one side and a way to attach to remote memory segments on the other side.
In this chapter, you will learn:
How to allocate a memory segment on the local node.
How to make a local segment available to other nodes.
How to connect to a memory segment available on a remote node.
Managing local segments¶
Allocating memory space¶
The allocation of a segment on the local host is done with the function
SCICreateSegment(). The main reason to have a special function for the
allocation instead of a normal malloc()-like call is that the driver
must be aware of the created segment and associated parameters.
Moreover, most operating systems require that memory used in the way
SISCI uses it has specific characteristics, such as being non-swappable
and/or being physically contiguous. Different hardware implementations
may have different requirements; using SCICreateSegment() ensures the
implementation details can be hidden from the user, and ensures
portability.
A typical usage of SCICreateSegment() is as follows, assuming we are on
the receiver node:
sci_desc_t v_dev;
sci_local_segment_t local_segment;
sci_error_t error;
SCIInitialize(...);
SCIOpen(&v_dev,...);
SCICreateSegment(v_dev,             /* virtual device */
                 &local_segment,    /* handle to the allocated segment */
                 RECEIVER_SEG_ID,   /* segment identifier */
                 RECEIVER_SEG_SIZE, /* size */
                 NO_CALLBACK,       /* ignore this for the moment */
                 NO_ARG,            /* callback arg, ignore */
                 NO_FLAGS,
                 &error);
if (error == SCI_ERR_OK) {
    /* a segment is available for use */
} else {
    /* manage error */
}
Let us look at the different parameters:
v_dev is the virtual device, as returned by the SCIOpen() call shown previously.
local_segment is a handle to the memory segment to be allocated. It is initialized if the call is successful.
The segment identifier, which in this case is the constant RECEIVER_SEG_ID, is an integer that uniquely identifies the segment to the driver. A remote node that wants to use this segment needs to know the value of this identifier. The driver checks the uniqueness of the identifier, so applications should choose values that do not collide with identifiers already in use.
RECEIVER_SEG_SIZE is the size, in bytes, of the segment that will be allocated.
The specification of a callback allows triggering the execution of a certain function when something happens concerning this segment. This option is covered later in the Events and callbacks chapter.
As usual, error contains an error code which, on failure, gives a hint about the cause. Typical errors for such functions are a segment identifier that is already in use and insufficient free space.
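The two typical failure causes can be illustrated with a toy registry. This is a sketch only: it does not show how the SISCI driver is implemented, and every name in it (reg_status, reg_table, reg_create(), REG_MAX_SEGMENTS) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define REG_MAX_SEGMENTS 4  /* arbitrary limit for the illustration */

/* Hypothetical status codes, not SISCI error codes. */
enum reg_status { REG_OK, REG_ID_IN_USE, REG_NO_SPACE };

struct reg_table {
    unsigned ids[REG_MAX_SEGMENTS]; /* identifiers already registered */
    size_t   count;                 /* number of registered segments */
    size_t   free_bytes;            /* memory still available */
};

/* Registers a segment, rejecting duplicate identifiers first and
 * requests that do not fit second. */
static enum reg_status reg_create(struct reg_table *reg, unsigned id,
                                  size_t size)
{
    assert(reg != NULL);
    for (size_t i = 0; i < reg->count; i++) {
        if (reg->ids[i] == id)
            return REG_ID_IN_USE;
    }
    if (reg->count == REG_MAX_SEGMENTS || size > reg->free_bytes)
        return REG_NO_SPACE;
    reg->ids[reg->count++] = id;
    reg->free_bytes -= size;
    return REG_OK;
}
```

A caller would check the returned status in the same way the examples in this chapter check error against SCI_ERR_OK.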
Deallocating a memory segment¶
When a memory segment is no longer needed, for example, before quitting the application, it should be destroyed in order to release unneeded resources.
The way to do it is via the function SCIRemoveSegment():
sci_error_t error;
sci_local_segment_t segment;
SCICreateSegment(..., &segment, ...); /* initialization */
/* use the segment */
SCIRemoveSegment(segment, NO_FLAGS, &error); /* error is passed by address */
if (error == SCI_ERR_OK) {
    /* the resource is freed */
} else {
    /* manage error */
}
SCIRemoveSegment() normally succeeds. If it does not, the SISCI driver will
release left-over resources when the application terminates. A typical
cause of failure for this call is the dependency of other resources on
this segment. SCIRemoveSegment() succeeds even if there are remote
processes still connected to the local segment. In such a case, the
segment is kept available only for the connected processes until they
disconnect.
Making a segment available¶
Once a segment has been allocated, it needs to be made visible to local processes or the other nodes connected to the network. This export operation is performed by calling two different functions in sequence.
The first one is SCIPrepareSegment(). Logically, its goal is to map the
segment into the 64-bit network address space shared by all the nodes in
the network. What it does in practice is to make sure that the segment
can be correctly accessed by the specified network adapter. This
includes satisfying any requirements set by the operating
system, such as non-swappability and physical contiguity.
The second step is performed by SCISetSegmentAvailable(), which makes
the segment visible to the other nodes via the specified adapter.
sci_error_t prepare_error;
sci_error_t avail_error;
sci_local_segment_t segment;
SCICreateSegment(..., &segment, ...); /* initialization */
SCIPrepareSegment(segment,
                  ADAPTER_NO, /* local adapter the segment is prepared for */
                  NO_FLAGS,
                  &prepare_error);
if (prepare_error == SCI_ERR_OK) {
    SCISetSegmentAvailable(segment,
                           ADAPTER_NO, /* same adapter as above */
                           NO_FLAGS,
                           &avail_error);
    if (avail_error == SCI_ERR_OK) {
        /* segment is now available to the remote nodes */
    } else {
        /* manage availability error */
    }
} else {
    /* manage preparation error */
}
As shown in the code above, the preparation and the availability of a memory segment are per adapter.
Typical errors for the two functions above are mainly related to
problems with the specified adapter (for example, it does not exist, or
the adapter specified in SCIPrepareSegment() does not match the one
specified in SCISetSegmentAvailable()).
A state machine for a local segment resource¶
Figure: State machine for a local segment, showing the life of a local segment.
Some additional comments:
NOT PREPARED means that the segment has been allocated successfully but is not yet prepared to be used by a network adapter.
SCIRemoveSegment() is a legal operation from any state. This does not mean that it will succeed (for example, it will not if another resource depends on the segment).
It is not possible to “un-prepare” a segment.
Managing remote segments¶
Imagine the following scenario: A process on the receiver node, with
node id RECEIVER_NODE_ID, has allocated and made a piece of memory
available, with segment id equal to RECEIVER_SEG_ID.
The sender node has node id SENDER_NODE_ID. We would like to have a
process that uses the memory segment on the receiver node.
It is worth clarifying some nomenclature to better understand the next sections:
A local segment is allocated on the receiver node.
From the sender node perspective, the memory segment allocated on the receiver node is a remote segment which is locally represented by a remote segment resource.
So a local segment and a local segment resource live on the same node, whereas a remote segment and a remote segment resource live on different nodes.
Connecting to a remote segment¶
An application that wants to use a remote segment first has to “connect” to it. Logically speaking, the connection process consists of finding the address and the size of the remote segment within the network address space.
This is achieved by calling the function SCIConnectSegment():
sci_desc_t v_dev;
sci_remote_segment_t remote_segment;
sci_error_t error;
SCIInitialize(...);
SCIOpen(&v_dev, ...); /* initialize the virtual device */
SCIConnectSegment(v_dev,                /* virtual device */
                  &remote_segment,      /* handle to the remote segment resource */
                  RECEIVER_NODE_ID,     /* remote node id */
                  RECEIVER_SEG_ID,      /* remote segment id */
                  ADAPTER_NO,           /* local adapter number */
                  NO_CALLBACK,          /* ignore this for the moment */
                  NO_ARG,               /* callback arg, ignore */
                  SCI_INFINITE_TIMEOUT, /* timeout */
                  NO_FLAGS,
                  &error);
if (error == SCI_ERR_OK) {
    /* the remote segment is connected */
} else {
    /* manage error */
}
Let us look at the list of parameters of the above call:
v_dev represents an initialized virtual device, needed to communicate with
the driver. remote_segment is a handle to a sci_remote_segment_t
descriptor, which contains information about the connection. The descriptor is
allocated and initialized by the call, if successful. The remote node and
segment identifiers (RECEIVER_NODE_ID and RECEIVER_SEG_ID respectively)
uniquely locate a memory segment on the network. The adapter number
ADAPTER_NO refers to the local adapter we want to use to access the remote
segment.
The SCIConnectSegment() call is synchronous by default. It waits until
either the connection request is resolved or the specified timeout,
expressed in milliseconds, expires. With an infinite timeout, as shown
above, the function returns only once the request is resolved. The call
can be made asynchronous; this behavior is described in the
Events and callbacks chapter.
Once the remote segment is connected, its size (expressed in bytes) can be determined by calling the function SCIGetRemoteSegmentSize():
sci_remote_segment_t remote_segment;
size_t remote_segment_size;
SCIConnectSegment(..., &remote_segment, ...);
remote_segment_size = SCIGetRemoteSegmentSize(remote_segment);
Disconnecting from a remote segment¶
Once an application has finished using a remote segment, it should disconnect from it, releasing the remote segment resource. This is done with the function SCIDisconnectSegment():
sci_remote_segment_t segment;
sci_error_t error;
SCIConnectSegment(..., &segment, ...);
/* use the segment */
SCIDisconnectSegment(segment, NO_FLAGS, &error);
if (error == SCI_ERR_OK) {
    /* the remote segment resource is released */
} else {
    /* manage error */
}
A typical error for SCIDisconnectSegment() is SCI_ERR_BUSY, meaning that
there are other resources depending on the remote segment resource.