On This Page
- Synopsis for RDMA_RC Example Using IBV Verbs
- Main
- print_config
- resources_init
- resources_create
- sock_connect
- connect_qp
- modify_qp_to_init
- post_receive
- sock_sync_data
- modify_qp_to_rtr
- modify_qp_to_rts
- post_send
- poll_completion
- resources_destroy
- Code for Send, Receive, RDMA Read, RDMA Write
- Synopsis for Multicast Example Using RDMA_CM and IBV Verbs
- Main
- Run
- Code for Multicast Using RDMA_CM and IBV Verbs
- Programming Examples Using RDMA Verbs
- Automatic Path Migration (APM)
- Multicast Code Example Using RDMA CM
- Shared Received Queue (SRQ)
- Experimental APIs
- Dynamically Connected Transport
- DC Usage Model
- Query Device
- Create DCT
- Destroy DCT
- Query DCT
- Arm DCT
- Create DCI
- Verbs API for Extended Atomics Support
- Supported Hardware
- Verbs Interface Changes
- Query Device Capabilities
- Response Format
- QP Initialization
- Send Work Request Changes
- User-Mode Memory Registration (UMR)
- Interfaces
- Device Capabilities
- QP Creation
- Memory Key Manipulation
- Non-inline memory objects
- Memory Key Initialization
- Cross-Channel Communications Support
- Usage Model
- Resource Initialization
- Device Capabilities
- Completion Queue
- QP Creation
- Posting Request List
- ConnectX-3/Connect-IB Data Endianness
This chapter provides code examples using the IBV Verbs.
Synopsis for RDMA_RC Example Using IBV Verbs
The following is a synopsis of the functions in the programming example, in the order that they are called.
Main
Parse command line. The user may set the TCP port, device name, and device port for the test. If set, these values will override default values in config. The last parameter is the server name. If the server name is set, this designates a server to connect to and therefore puts the program into client mode. Otherwise the program is in server mode.
Call print_config.
Call resources_init.
Call resources_create.
Call connect_qp.
If in server mode, call post_send with the IBV_WR_SEND operation.
Call poll_completion. Note that the server side expects a completion from the SEND request and the client side expects a RECEIVE completion.
If in client mode, show the message we received via the RECEIVE operation, otherwise, if we are in server mode, load the buffer with a new message.
Sync client<->server.
At this point the server goes directly to the next sync. All RDMA operations are done strictly by the client.
*** Client only ***
Call post_send with IBV_WR_RDMA_READ to perform a RDMA read of server’s buffer.
Call poll_completion.
Show server’s message.
Setup send buffer with new message.
Call post_send with IBV_WR_RDMA_WRITE to perform a RDMA write of server’s buffer.
Call poll_completion.
*** End client only operations ***
Sync client<->server.
If server mode, show buffer, proving RDMA write worked.
Call resources_destroy.
Free device name string.
Done.
print_config
Print out configuration information.
resources_init
Clears resources struct.
resources_create
Call sock_connect to connect a TCP socket to the peer.
Get the list of devices, locate the one we want, and open it.
Free the device list.
Get the port information.
Create a PD.
Create a CQ.
Allocate a buffer, initialize it, register it.
Create a QP.
sock_connect
If client, resolve DNS address of server and initiate a connection to it.
If server, listen for incoming connection on indicated port.
connect_qp
Call modify_qp_to_init.
Call post_receive.
Call sock_sync_data to exchange information between server and client.
Call modify_qp_to_rtr.
Call modify_qp_to_rts.
Call sock_sync_data to synchronize client<->server.
modify_qp_to_init
Transition QP to INIT state.
post_receive
Prepare a scatter/gather entry for the receive buffer.
Prepare an RR.
Post the RR.
sock_sync_data
Using the TCP socket created with sock_connect, synchronize the given set of data between client and the server. Since this function is blocking, it is also called with dummy data to synchronize the timing of the client and server.
modify_qp_to_rtr
Transition QP to RTR state.
modify_qp_to_rts
Transition QP to RTS state.
post_send
Prepare a scatter/gather entry for data to be sent (or received in RDMA read case).
Create an SR. Note that IBV_SEND_SIGNALED is redundant.
If this is an RDMA operation, set the address and key.
Post the SR.
poll_completion
Poll CQ until an entry is found or MAX_POLL_CQ_TIMEOUT milliseconds are reached.
resources_destroy
Release/free/deallocate all items in resource struct.
Code for Send, Receive, RDMA Read, RDMA Write
struct config_t
{
    const char  *dev_name;      /* IB device name */
    char        *server_name;   /* server host name */
    u_int32_t   tcp_port;       /* server TCP port */
    int         ib_port;        /* local IB port to work with */
    int         gid_idx;        /* gid index to use */
};
/* structure to exchange data which is needed to connect the QPs */
struct cm_con_data_t
{
    uint64_t    addr;           /* Buffer address */
    uint32_t    rkey;           /* Remote key */
    uint32_t    qp_num;         /* QP number */
    uint16_t    lid;            /* LID of the IB port */
    uint8_t     gid[16];        /* gid */
} __attribute__ ((packed));
/* structure of system resources */
struct resources
{
    struct ibv_device_attr device_attr;  /* Device attributes */
    struct ibv_port_attr   port_attr;    /* IB port attributes */
    struct cm_con_data_t   remote_props; /* values to connect to remote side */
    struct ibv_context     *ib_ctx;      /* device handle */
    struct ibv_pd          *pd;          /* PD handle */
    struct ibv_cq          *cq;          /* CQ handle */
    struct ibv_qp          *qp;          /* QP handle */
    struct ibv_mr          *mr;          /* MR handle for buf */
    char                   *buf;         /* memory buffer pointer, used for RDMA and send ops */
    int                    sock;         /* TCP socket file descriptor */
};
struct config_t config =
{
    NULL,   /* dev_name */
    NULL,   /* server_name */
    19875,  /* tcp_port */
    1,      /* ib_port */
    -1      /* gid_idx */
};
struct addrinfo *resolved_addr = NULL;
struct addrinfo *iterator;
char            service[6];
int             sockfd = -1;
int             listenfd = 0;
int             tmp;

* sock        socket to transfer data on
* xfer_size   size of data to transfer
* local_data  pointer to data to be sent to remote

struct ibv_wc   wc;
unsigned long   start_time_msec;
unsigned long   cur_time_msec;
struct timeval  cur_time;
int             poll_result;
int             rc = 0;

struct ibv_send_wr  sr;
struct ibv_sge      sge;
struct ibv_send_wr  *bad_wr = NULL;
int                 rc;

struct ibv_recv_wr  rr;
struct ibv_sge      sge;
struct ibv_recv_wr  *bad_wr;
int                 rc;

size_t  size;
int     i;
int     mr_flags = 0;
int     cq_size = 0;
int     num_devices;
int     rc = 0;

struct ibv_qp_attr  attr;
int                 flags;
int                 rc;

* qp          QP to transition
* remote_qpn  remote QP number
* dlid        destination LID
* dgid        destination GID (mandatory for RoCEE)

struct ibv_qp_attr  attr;
int                 flags;
int                 rc;

struct ibv_qp_attr  attr;
int                 flags;
int                 rc;
|
fprintf(stdout, | " Device name | : \"%s\"\n", config.dev_name); |
fprintf(stdout, | " IB port | : %u\n", config.ib_port); |
if (config.server_name)
fprintf(stdout, " IP | : %s\n", config.server_name); |
fprintf(stdout, | " TCP port | : %u\n", config.tcp_port); |
if (config.gid_idx >= 0)
fprintf(stdout, " GID index | : %u\n", config.gid_idx); |
|
struct resources | res; |
int | rc = 1; |
char | temp_char; |
/* parse the command line parameters */
while (1)
{
int c;
static struct option long_options[] =
{
{name = "port", | has_arg = 1, | val = 'p' }, |
{name = "ib-dev", | has_arg = 1, | val = 'd' }, |
{name = "ib-port", | has_arg = 1, | val = 'i' }, |
{name = "gid-idx", | has_arg = 1, | val = 'g' }, |
{name = NULL, | has_arg = 0, | val = '\0'} |
|
Synopsis for Multicast Example Using RDMA_CM and IBV Verbs
This multicast code example uses RDMA-CM and VPI (and hence can be run both over IB and over LLE).
Notes:
- In order to run the multicast example on either IB or LLE, no change is needed to the test's code. However, if RDMA_CM is used, the network interface must be configured and up (whether it is used over RoCE or over IB).
- For the IB case, a join operation is involved, yet it is performed by the rdma_cm kernel code.
- For the LLE case, no join is required. All MGIDs are resolved into MACs at the host.
- To inform the multicast example which port to use, specify "-b <IP address>" to bind to the desired device port.
Main
- Get command line parameters.
- Create event channel to receive asynchronous events.
- Allocate a node and create an identifier that is used to track communication information
- Start the “run” main function.
- On ending - release and free resources.
- m - MC address, destination port
- M - unmapped MC address; also requires a bind address (parameter "b")
- s - sender flag
- b - bind address
- c - connections amount
- C - message count
- S - message size
- p - port space (UDP default; IPoIB)
API definition files: rdma/rdma_cma.h and infiniband/verbs.h
Run
- Get source (if provided for binding) and destination addresses - convert the input addresses to socket presentation.
- Joining:
- For all connections: if a source address is specifically provided, bind the rdma_cm object to the corresponding network interface (this associates a source address with an rdma_cm identifier); if an unmapped MC address is used with a bind address provided, check the remote address and then bind.
- Poll on all the connection events and wait until all rdma_cm objects have joined the MC group.
- Send & receive:
- If sender: send the messages to all connection nodes (function “post_sends”).
- If receiver: poll the completion queue (function “poll_cqs”) until the messages arrive.
On ending - release network resources (for each connection: leave the multicast group and detach its associated QP from the group)
Code for Multicast Using RDMA_CM and IBV Verbs
Programming Examples Using RDMA Verbs
This chapter provides code examples using the RDMA Verbs.
Automatic Path Migration (APM)
Multicast Code Example Using RDMA CM
Shared Received Queue (SRQ)
Experimental APIs
Dynamically Connected Transport
The Dynamically Connected (DC) transport provides reliable transport services from a DC Initiator (DCI) to a DC Target (DCT). A DCI can send data to multiple targets on the same or different subnet, and a DCT can simultaneously service traffic from multiple DCIs. No explicit connections are set up by the user; the target DCT is identified by an address vector (similar to that used in UD transport), a DCT number, and a DC access key.
DC Usage Model
- Query device is used to detect whether the DC transport is supported and, if so, what its characteristics are
- User creates DCIs. The number of DCIs depends on the user's strategy for handling concurrent data transmissions.
- User defines a DC access key and initializes a DCT using this access key
- User can query the DCI with the routine ibv_exp_query_qp(), and can query the DCT with the ibv_exp_query_dct() routine.
- User can arm the DCT, so that an event is generated when a DC access key violation occurs.
- Send work requests are posted to the DCIs. Data can be sent to a different DCT only after all previous sends complete, so send CQEs can be used to detect such completions.
- The CQ associated with the DCT is used to detect data arrival.
- Destroy resources when done
Query Device
The function int ibv_exp_query_device(struct ibv_context *context, struct ibv_exp_device_attr *attr)
is used to query for device capabilities. The flag IBV_EXP_DEVICE_DC_TRANSPORT in the field exp_atomic_cap of the struct ibv_exp_device_attr defines if the DC transport is supported.
The fields,
int max_dc_req_rd_atom;
int max_dc_res_rd_atom;
in the same structure describe DC's atomic support characteristics.
Create DCT
/* create a DC target object */
struct ibv_dct *ibv_exp_create_dct(struct ibv_context *context,
struct ibv_exp_dct_init_attr *attr);
- context - Context to the InfiniBand device as returned from ibv_open_device.
- attr - Defines attributes of the DCT and includes:
- struct ibv_pd *pd - The PD to verify access validity with respect to protection domains
- struct ibv_cq *cq - CQ used to report receive completions
- struct ibv_srq *srq - The SRQ that will provide the received buffers. Note that the PD is not checked against the PD of the scatter entry. This check is done with the PD of the DC target.
- dc_key - A 64-bit key associated with the DCT
- port - The port number this DCT is bound to
- access flags - Semantics similar to RC QPs
- remote read
- remote write
- remote atomics
- min_rnr_timer - Minimum RNR NAK time required from the requester between successive requests of a message that was previously rejected due to insufficient receive buffers (IB spec 9.7.5.2.8)
- tclass - Used by packets sent by the DCT in case GRH is used
- flow_label - Used by packets sent by the DCT in case GRH is used
- mtu - MTU
- pkey_index - PKey index used by the DC target
- gid_index - GID index associated with the DCT. Used to verify incoming packets if GRH is used. This field is mandatory
- hop_limit - Used by packets sent by the DCT in case GRH is used
- Create flags
Destroy DCT
/* destroy a DCT object */
int ibv_exp_destroy_dct(struct ibv_exp_dct *dct);
Destroy a DC target. This call may take some time until all DCRs are disconnected.
Query DCT
/* query DCT attributes */
int ibv_exp_query_dct(struct ibv_exp_dct *dct, struct ibv_exp_dct_attr *attr);
Attributes queried are:
- state
- cq
- access_flags
- min_rnr_timer
- pd
- tclass
- flow_label
- dc_key
- mtu
- port
- pkey_index
- gid_index
- hop_limit
- key_violations
- pd
- srq
- cq
Arm DCT
A DC target can be armed to request notification when DC key violations occur. After return from a call to ibv_exp_arm_dct, the DC target is moved into the “ARMED” state. If a packet targeting this DCT with a wrong key is received, the DCT moves to the “FIRED” state and the event IBV_EXP_EVENT_DCT_KEY_VIOLATION is generated. The user can read these events by calling ibv_get_async_event. Events must be acked with ibv_ack_async_event.
struct ibv_exp_arm_attr {
uint32_t comp_mask;
};
int ibv_exp_arm_dct(struct ibv_exp_dct *dct,
struct ibv_exp_arm_attr *attr);
- dct - Pointer to a previously created DC target
- attr - Pointer to arm-DCT attributes. This struct has a single comp_mask field, which must be zero in this version
Create DCI
A DCI is created by calling ibv_exp_create_qp() with a new QP type, IBV_EXP_QPT_DC_INI. The semantics are similar to regular QPs. A DCI is an initiator endpoint which connects to DC targets. Matching rules are identical to those of QKEY for UD; however, the key is 64 bits. A DCI is not a responder; it is only an initiator.
The following are the valid state transitions for a DCI, with required and optional parameters:
From | To | Required | Optional |
Reset | Init | IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_DC_KEY | |
Init | Init | IBV_QP_PKEY_INDEX, IBV_QP_PORT, IBV_QP_ACCESS_FLAGS | |
Init | RTR | IBV_QP_AV, IBV_QP_PATH_MTU | IBV_QP_PKEY_INDEX, IBV_QP_DC_KEY |
RTR | RTS | IBV_QP_TIMEOUT, IBV_QP_RETRY_CNT, IBV_QP_RNR_RETRY, IBV_QP_MAX_QP_RD_ATOMIC | IBV_QP_ALT_PATH, IBV_QP_MIN_RNR_TIMER, IBV_QP_PATH_MIG_STATE |
RTS | RTS | | IBV_QP_ALT_PATH, IBV_QP_PATH_MIG_STATE, IBV_QP_MIN_RNR_TIMER |
Verbs API for Extended Atomics Support
The extended atomics capabilities provide support for performing Fetch&Add and masked Compare&Swap atomic operations on multiple fields. The figure below shows how the individual fields within the user-supplied-data field are specified.
In the figure above, the total operand size is N bits, with the length of each data field being four bits. The 1's in the mask indicate the termination of a data field. With the ConnectX® family of HCAs and Connect-IB®, there is always an implicit 1 in the mask.
Supported Hardware
The extended atomic operations are supported by ConnectX®-2 and subsequent hardware. ConnectX-2/ConnectX®-3 devices employ read-modify-write operations on regions that are sized as multiples of 64 bits with 64 bit alignment. Therefore, when operations are performed on user buffers that are smaller than 64 bits, the unmodified sections of such regions will be written back unmodified when the results are committed to user memory. Connect-IB® and subsequent devices operate on memory regions that are multiples of 32 or 64 bits, with natural alignment.
Verbs Interface Changes
Usage model:
- Query device to see if:
- Atomic operations are supported
- Endianness of atomic response
- Extended atomics are supported, and the data sizes supported
- Initialize QP for use with atomic operations, taking device capabilities into account
- Use the atomic operations
- Destroy the QP when done using it
Query Device Capabilities
The device capabilities flags enumeration is updated to reflect the support for extended atomic operations by adding the flag:
+ IBV_EXP_DEVICE_EXT_ATOMICS ,
and the device attribute comp mask enumeration ibv_exp_device_attr_comp_mask is updated with:
+ IBV_EXP_DEVICE_ATTR_EXT_ATOMIC_ARGS,
The device attributes struct, ibv_exp_device_attr, is modified by adding the field struct ibv_exp_ext_atomics_params ext_atom.
|
Atomic fetch&add operations on subsections of the operands are also supported, with max_fa_bit_boundary being the log-base-2 of the largest such subfield, in bytes. Log_max_atomic_inline is the log of the largest amount of atomic data, in bytes, that can be put in the work request, and includes the space for all required fields. For ConnectX and Connect-IB, the largest subsection supported is eight bytes.
Response Format
The returned data is formatted in units that correspond to the host's natural word size. For example, if extended atomics are used for a 16 byte field, and returned in big-endian format, each eight byte portion is arranged in big-endian format, regardless of the size the fields used in an association in a multi-field fetch-and-add operation.
QP Initialization
QP initialization needs additional information with respect to the sizes of atomic operations that will be supported inline. This is needed to ensure the QP is provisioned with sufficient send resources to support the posted WQEs.
The QP attribute enumeration comp-mask, ibv_exp_qp_init_attr_comp_mask, is expanded by adding
+ IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG ,
Send Work Request Changes
User-Mode Memory Registration (UMR)
This section describes User-Mode Memory Registration (UMR) which supports the creation of memory keys for non-contiguous memory regions. This includes the concatenation of arbitrary contiguous regions of memory, as well as regions with regular structure.
Three examples of non-contiguous regions of memory that are used to form new contiguous regions of memory are described below. Figure 2 shows an example where portions of three separate contiguous regions of memory are combined to create a single logically contiguous region of memory. The base address of the new memory region is defined by the user when the new memory key is defined.
The figure below shows a non-contiguous memory region with regular structure. This region is defined by a base address, a stride between adjacent elements, the extent of each element, and a repeat count.
The figure below shows an example where two non-contiguous memory regions are interleaved, using the repeat structure UMR.
Interfaces
The usage model for the UMR includes:
- Checking with ibv_exp_query_device whether UMR is supported
- If UMR is supported, checking struct ibv_exp_device_attr for its characteristics
- Using ibv_exp_create_mr() to create an uninitialized memory key for future UMR use
- Using ibv_exp_post_send() to define the new memory key. This can be posted to the same send queue that will use the memory key in future operations.
- Using the UMR as one would use any other memory key
- Using ibv_exp_post_send() to invalidate the UMR memory key
- Releasing the memory key with ibv_dereg_mr()
Device Capabilities
Device capabilities are queried to see if the UMR capability is supported and, if so, what its characteristics are. The routine used is:
int ibv_exp_query_device(struct ibv_context *context, struct ibv_exp_device_attr *attr)
The field struct ibv_exp_umr_caps umr_caps describes the UMR capabilities. This structure is defined as:
|
The fields added to struct ibv_exp_device_attr to support UMR include:
- exp_device_cap_flags - UMR support available if the flag IBV_EXP_DEVICE_ATTR_UMR is set.
- max_mkey_klm_list_size - maximum number of memory keys that may be input to UMR
- max_send_wqe_inline_klms - the largest number of KLMs that can be provided inline in the work request. When the list is larger than this, a buffer must be allocated via the struct ibv_mr *ibv_exp_reg_mr(struct ibv_exp_reg_mr_in *in) function and provided to the driver as part of the memory key creation
- max_umr_recursion_depth - memory keys created by UMR operations may be input to UMR memory key creation. This specifies the limit on how deep this recursion can be.
- max_umr_stride_dimension - The maximum number of independent dimensions that may be used with the regular structure UMR operations. The current limit is one.
QP Creation
To configure QP UMR support, the routine
ibv_qp * ibv_exp_create_qp(struct ibv_context *context, struct ibv_exp_qp_init_attr *qp_init_attr)
is used. Setting the attribute IBV_EXP_QP_CREATE_UMR in the exp_create_flags field of struct ibv_exp_qp_init_attr enables UMR support. The attribute IBV_EXP_QP_INIT_ATTR_MAX_INL_KLMS is set in the comp_mask field of struct ibv_exp_qp_init_attr, with the field max_inl_send_klms defining this number.
Memory Key Manipulation
To create an uninitialized memory key for future use, the routine
|
To query the resources associated with the memory key, the routine
|
Non-inline memory objects
When the list of memory keys input into the UMR memory key creation is too large to fit into the work request, a hardware accessible buffer needs to be provided in the posted send request. This buffer will be populated by the driver with the relevant memory objects.
|
Memory Key Initialization
The memory key is manipulated with the ibv_exp_post_send() routine. The opcodes IBV_EXP_WR_UMR_FILL and IBV_EXP_WR_UMR_INVALIDATE are used to define and invalidate, respectively, the memory key.
The struct ibv_exp_send_wr contains the following fields to support the UMR capabilities:
|
Cross-Channel Communications Support
Cross-Channel Communications adds support for work requests that are used for synchronizing communication between separate QPs, as well as support for data reductions. This functionality, for example, is sufficient for implementing MPI collective communication with a single post of work requests, checking only for completion of the full communication rather than for completion of individual work requests.
Terms relevant to the Cross-Channel Synchronization are defined in the following table:
Term | Description |
Cross Channel supported QP | QP that allows send_enable, recv_enable, wait, and reduction tasks. |
Managed send QP | Work requests in the corresponding send queues must be explicitly enabled before they can be executed. |
Managed receive QP | Work requests in the corresponding receive queues must be explicitly enabled before they can be executed. |
Master Queue | Queue that uses send_enable and/or recv_enable work requests to enable tasks in managed QP. A QP can be both master and managed QP. |
Wait task (n) | Task that completes when n completion tasks appear in the specified completion queue |
Send Enable task (n) | Enables the next n send tasks in the specified send queue to be executable. |
Receive Enable task (n) | Enables the next n receive tasks in the specified receive queue to be executable. |
Reduction operation | Data reduction operation to be executed by the HCA on specified data. |
Usage Model
- Creating completion queues, setting the ignore-overrun bit for the CQs that only hardware will monitor.
- Creating and configuring the relevant QPs, setting the flags indicating that Cross-Channel Synchronization work requests are supported, and the appropriate master and managed flags (based on planned QP usage). For example, this may happen when an MPI library creates a new communicator.
- Posting the task lists for the compound operations.
- Checking the appropriate queue for compound-operation completion (this requires requesting completion notification from the appropriate work request). For example, a user may set up a CQ that receives a completion notification for the work request whose completion indicates the entire collective operation has completed locally.
- Destroying the QPs and CQs created for Cross-Channel Synchronization operations once the application is done using them. For example, an MPI library may destroy these resources after it frees all the communicators using them.
Resource Initialization
Device Capabilities
Completion Queue
A completion queue (CQ) that will be used with Cross-Channel Synchronization operations needs to be marked as such at creation time. This CQ needs to be initialized with
|
QP Creation
To configure the QP for Cross-Channel use, the following function is used
|
The exp_create_flags that are available are
- IBV_EXP_QP_CREATE_CROSS_CHANNEL - This must be set for any QP to which cross-channel-synchronization work requests will be posted.
- IBV_EXP_QP_CREATE_MANAGED_SEND - This is set for a managed send QP, i.e. one for which send-enable operations are used to activate the posted send requests.
- IBV_EXP_QP_CREATE_MANAGED_RECV - This is set for a managed receive QP, i.e. one for which receive-enable operations are used to activate the posted receive requests.
Posting Request List
A single operation is defined by a set of work requests posted to multiple QPs, as described in the figure below.
The lists of tasks are NULL-terminated.
The routine
|
In addition, in the field exp_send_flags in ibv_exp_send_wr the flag IBV_EXP_SEND_WITH_CALC indicates the presence of a reduction operation, and IBV_EXP_SEND_WAIT_EN_LAST is used to signal the last wait task posted for a given CQ in the current task list.
For ibv_exp_calc_data_type the types
- IBV_EXP_CALC_DATA_TYPE_INT,
- IBV_EXP_CALC_DATA_TYPE_UINT,
- IBV_EXP_CALC_DATA_TYPE_FLOAT
are supported.
The supported data size for ibv_exp_data_size is IBV_EXP_CALC_DATA_SIZE_64_BIT.
New send opcodes are defined for the new work requests. These include:
- IBV_EXP_WR_SEND_ENABLE
- IBV_EXP_WR_RECV_ENABLE
- IBV_EXP_WR_CQE_WAIT
ConnectX-3/Connect-IB Data Endianness
The ConnectX-3 and Connect-IB HCAs expect to get the data in network order.