Contents

Preface

What is the difference between libibverbs and librdmacm?

Client and Server Operation

Rdma Verbs

Client Operation

Server Operation

Return Codes



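Relationship
  • rdma_cm depends on ib_verbs.
  • ib_verbs (libibverbs) and rdma_cm (librdmacm) are two userspace shared libraries. Neither is specific to Mellanox hardware; both are part of the upstream rdma-core package. In the APIs of both libraries, a queue pair (QP) plays a role loosely similar to a TCP socket.
  • rdma_cm is the connection manager library. It wraps or specializes parts of ib_verbs and accesses the hardware through the verbs API.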

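   For example, each wrapper fills in a struct ibv_send_wr and passes it to ibv_post_send():

   rdma_post_send   ->  ibv_post_send(qp, wr, bad_wr) with wr.opcode = IBV_WR_SEND

   rdma_post_read   ->  ibv_post_send(qp, wr, bad_wr) with wr.opcode = IBV_WR_RDMA_READ

   rdma_post_write  ->  ibv_post_send(qp, wr, bad_wr) with wr.opcode = IBV_WR_RDMA_WRITE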

 

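What is the difference between libibverbs and librdmacm?

infiniband/verbs.h defines ibv_post_send() and ibv_post_recv(), which post a work request (WR) to the send queue (SQ) and the receive queue (RQ) respectively. Which operation a send-side WR performs (send, RDMA write or RDMA read) is determined by the opcode field of the WR.

ibv_post_send() takes a struct ibv_send_wr, whose opcode field selects the operation, for example IBV_WR_SEND, IBV_WR_RDMA_WRITE or IBV_WR_RDMA_READ.

ibv_post_recv() takes a struct ibv_recv_wr, which has no opcode: receiving is the only action, so no operation code is needed. The send side, by contrast, has several kinds of operation; the three discussed here are send, RDMA write and RDMA read.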

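rdma/rdma_verbs.h provides rdma_post_send(), rdma_post_recv(), rdma_post_read() and rdma_post_write().

rdma_post_send(): posts a WR to the SQ of the QP. The buffer must be registered and its mr passed in, unless the data is sent inline (IBV_SEND_INLINE).

rdma_post_recv(): posts a WR to the RQ of the QP; requires the mr of the receive buffer.

rdma_post_read(): posts a WR to the SQ of the QP to perform an RDMA READ. It needs the remote address and rkey, the local destination address and length, and the mr of the local buffer.

rdma_post_write(): posts a WR to the SQ of the QP to perform an RDMA WRITE. It needs the remote address to be written and its rkey, the local address and length of the data to send, and the mr of the local buffer (again unless the data is sent inline).

So the four communication functions in rdma/rdma_verbs.h are thin wrappers around the two functions in infiniband/verbs.h:

ibv_post_send() corresponds to rdma_post_send(), rdma_post_read() and rdma_post_write(); ibv_post_recv() corresponds to rdma_post_recv().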

 

Client and Server Operation

(Source: https://linux.die.net/man/7/rdma_cm#:~:text=The%20rdma_cm%20supports%20the%20full%20range%20of%20verbs,an%20array%20of%20buffers%20for%20sending%20and%20receiving)

Rdma Verbs

The rdma_cm supports the full range of verbs available through the libibverbs library and interfaces. However, it also provides wrapper functions for some of the more commonly used verbs functionality. The full set of abstracted verb calls is:

rdma_reg_msgs - register an array of buffers for sending and receiving

rdma_reg_read - registers a buffer for RDMA read operations

rdma_reg_write - registers a buffer for RDMA write operations

rdma_dereg_mr - deregisters a memory region

rdma_post_recv - post a buffer to receive a message

rdma_post_send - post a buffer to send a message

rdma_post_read - post an RDMA to read data into a buffer

rdma_post_write - post an RDMA to send data from a buffer

rdma_post_recvv - post a vector of buffers to receive a message

rdma_post_sendv - post a vector of buffers to send a message

rdma_post_readv - post a vector of buffers to receive an RDMA read

rdma_post_writev - post a vector of buffers to send an RDMA write

rdma_post_ud_send - post a buffer to send a message on a UD QP

rdma_get_send_comp - get completion status for a send or RDMA operation

rdma_get_recv_comp - get information about a completed receive

Client Operation

This section provides a general overview of the basic operation for the active, or client, side of communication. This flow assumes asynchronous operation with low level call details shown. For synchronous operation, calls to rdma_create_event_channel, rdma_get_cm_event, rdma_ack_cm_event, and rdma_destroy_event_channel would be eliminated. Abstracted calls, such as rdma_create_ep, encapsulate several of these calls under a single API. Users may also refer to the example applications for code samples. A general connection flow would be:

rdma_getaddrinfo - retrieve address information of the destination

rdma_create_event_channel - create channel to receive events

rdma_create_id - allocate an rdma_cm_id, this is conceptually similar to a socket

rdma_resolve_addr - obtain a local RDMA device to reach the remote address

rdma_get_cm_event - wait for RDMA_CM_EVENT_ADDR_RESOLVED event

rdma_ack_cm_event - ack event

rdma_create_qp - allocate a QP for the communication

rdma_resolve_route - determine the route to the remote address

rdma_get_cm_event - wait for RDMA_CM_EVENT_ROUTE_RESOLVED event

rdma_ack_cm_event - ack event

rdma_connect - connect to the remote server

rdma_get_cm_event - wait for RDMA_CM_EVENT_ESTABLISHED event

rdma_ack_cm_event - ack event

Perform data transfers over connection

rdma_disconnect - tear-down connection

rdma_get_cm_event - wait for RDMA_CM_EVENT_DISCONNECTED event

rdma_ack_cm_event - ack event

rdma_destroy_qp - destroy the QP

rdma_destroy_id - release the rdma_cm_id

rdma_destroy_event_channel - release the event channel

An almost identical process is used to set up unreliable datagram (UD) communication between nodes. However, no actual connection is formed between QPs, so disconnection is not needed.

Although this example shows the client initiating the disconnect, either side of a connection may initiate the disconnect.

Server Operation

This section provides a general overview of the basic operation for the passive, or server, side of communication. A general connection flow would be:

rdma_create_event_channel - create channel to receive events

rdma_create_id - allocate an rdma_cm_id, this is conceptually similar to a socket

rdma_bind_addr - set the local port number to listen on

rdma_listen - begin listening for connection requests

rdma_get_cm_event - wait for RDMA_CM_EVENT_CONNECT_REQUEST event with a new rdma_cm_id

rdma_create_qp - allocate a QP for the communication on the new rdma_cm_id

rdma_accept - accept the connection request

rdma_ack_cm_event - ack event

rdma_get_cm_event - wait for RDMA_CM_EVENT_ESTABLISHED event

rdma_ack_cm_event - ack event

Perform data transfers over connection

rdma_get_cm_event - wait for RDMA_CM_EVENT_DISCONNECTED event

rdma_ack_cm_event - ack event

rdma_disconnect - tear-down connection

rdma_destroy_qp - destroy the QP

rdma_destroy_id - release the connected rdma_cm_id

rdma_destroy_id - release the listening rdma_cm_id

rdma_destroy_event_channel - release the event channel

Return Codes

= 0 - success

= -1 - error; see errno for more details

Most librdmacm functions return 0 to indicate success, and a -1 return value to indicate failure. If a function operates asynchronously, a return value of 0 means that the operation was successfully started. The operation could still complete in error; users should check the status of the related event. If the return value is -1, then errno will contain additional information regarding the reason for the failure.

Prior versions of the library would return -errno and not set errno for some cases related to ENOMEM, ENODEV, ENODATA, EINVAL, and EADDRNOTAVAIL codes. Applications that want to check these codes and have compatibility with prior library versions must manually set errno to the negative of the return code if it is < -1.
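The compatibility rule in the last paragraph can be captured in a small helper. normalize_rdma_ret is a hypothetical name, not part of librdmacm; the sketch simply applies the rule as stated: if the return value is below -1, treat it as -errno.

```c
#include <errno.h>

/*
 * Hypothetical helper: map a librdmacm return value onto the current
 * convention (0 on success, -1 with errno set on failure), also handling
 * older library versions that returned -errno directly.
 */
int normalize_rdma_ret(int ret)
{
    if (ret < -1) {
        errno = -ret;  /* old style: the return value was -errno */
        return -1;
    }
    return ret;        /* 0, or -1 with errno already set */
}
```

A call site would then wrap the library call, for example normalize_rdma_ret(rdma_listen(id, backlog)), and check errno in a single place.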